Re: [RFC PATCH 00/24] Control VQ support in vDPA

2020-09-24 Thread Stefan Hajnoczi
On Thu, Sep 24, 2020 at 11:21:01AM +0800, Jason Wang wrote:
> This series tries to add the support for control virtqueue in vDPA.

Please include documentation for both driver authors and vhost-vdpa
ioctl users. vhost-vdpa ioctls are only documented with a single
sentence. Please add full information on arguments, return values, and a
high-level explanation of the feature (like this cover letter) to
introduce the API.

What is the policy for using virtqueue groups? My guess is:
1. virtio_vdpa simply enables all virtqueue groups.
2. vhost_vdpa relies on userspace policy on how to use virtqueue groups.
   Are the semantics of virtqueue groups documented somewhere so
   userspace knows what to do? If a vDPA driver author decides to create
   N virtqueue groups, N/2 virtqueue groups, or just 1 virtqueue group,
   how will userspace know what to do?
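
For what it's worth, the mechanics of asking the kernel are the easy
part; it's the policy that needs documenting. A hedged sketch of the
discovery step, assuming the VHOST_VDPA_GET_VRING_GROUP ioctl from this
series with the usual struct vhost_vring_state argument (the final uAPI
may differ):

  #include <err.h>
  #include <sys/ioctl.h>
  #include <linux/vhost.h>

  /* Ask vhost-vdpa which virtqueue group a vring belongs to. */
  static unsigned int get_vring_group(int vhost_vdpa_fd, unsigned int vq_index)
  {
          struct vhost_vring_state state = { .index = vq_index };

          if (ioctl(vhost_vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &state) < 0)
                  err(1, "VHOST_VDPA_GET_VRING_GROUP");
          return state.num;
  }

Knowing the group number still doesn't tell userspace what the group is
for, which is the gap described above.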

Maybe a document is needed to describe the recommended device-specific
virtqueue groups that vDPA drivers should implement (e.g. "put the net
control vq into its own virtqueue group")?

This could become messy even with guidelines. For example, drivers might
be shipped that aren't usable for certain use cases simply because the
author didn't know that a particular virtqueue grouping is advantageous.

BTW I like how general this feature is. It seems to allow vDPA devices
to be split into sub-devices for further passthrough. Who will write the
first vDPA-on-vDPA driver? :)

Stefan



Re: [PATCH] virtio-blk: Use kobj_to_dev() instead of container_of()

2020-08-21 Thread Stefan Hajnoczi
On Fri, Aug 21, 2020 at 09:19:15AM +0800, Tian Tao wrote:
> Use kobj_to_dev() instead of container_of()
> 
> Signed-off-by: Tian Tao 
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 02/10] block: virtio-blk: check logical block size

2020-07-28 Thread Stefan Hajnoczi
On Tue, Jul 21, 2020 at 01:52:31PM +0300, Maxim Levitsky wrote:
> The Linux kernel only supports logical block sizes which are a power of
> two, at least 512 bytes and no more than PAGE_SIZE.
> 
> Check this instead of crashing later on.
> 
> Note that there is no need to check physical block size since it is
> only a hint, and virtio-blk already only supports power of two values.
> 
> Bugzilla link: https://bugzilla.redhat.com/show_bug.cgi?id=1664619
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  drivers/block/virtio_blk.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: Interesting qemu/virt-manager bug about the "rotational" attribute on virtio-blk disks

2020-07-23 Thread Stefan Hajnoczi
On Thu, Jul 16, 2020 at 11:33:44AM +0200, Stefano Garzarella wrote:
> +Cc Michael, Stefan, virtualization@lists.linux-foundation.org
> 
> On Thu, Jul 16, 2020 at 09:06:14AM +0100, Richard W.M. Jones wrote:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1857515
> > 
> > A virtio-blk disk which is backed by a raw file on an SSD shows
> > rotational = 1 inside the guest.
> > 
> > I assumed that qemu must have a "rotational" property for disks and
> > this would be communicated by virtio to the guest, but qemu and virtio
> > don't seem to have this.  Pretty surprising!  Is it called something
> > other than "rotational"?
> > 
> 
> I'm not sure if we need to add this property in QEMU, but in Linux
> I found these flags (include/linux/blkdev.h) for the block queues:
> 
> #define QUEUE_FLAG_NONROT 6   /* non-rotational device (SSD) */
> #define QUEUE_FLAG_VIRT   QUEUE_FLAG_NONROT /* paravirt device */
> 
> xen-blkfront driver is the only one that sets the QUEUE_FLAG_VIRT,
> should we do the same in the virtio-blk driver regardless of the backend?

The ability to control this flag would be interesting for performance
experiments.

The problem with changing the default is that regressions can be
expected. Certain workloads benefit while others regress.

I suggest:
1. Make it controllable so that QUEUE_FLAG_NONROT can be set or clear
   (not hardcoded to a single value).
2. The device can communicate the optimal setting from the host. The
   SCSI protocol already conveys this information. Virtio-blk needs a
   feature bit and possibly config space field.
3. Make it migration-safe. It needs to be configured explicitly so the
   value doesn't change suddenly across migration.
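
To make point 2 concrete, the driver side could look roughly like the
sketch below. Note that VIRTIO_BLK_F_ROTATIONAL and the "rotational"
config field are invented here for illustration; a real feature bit and
config layout would first have to go through the virtio spec process:

  /* Hypothetical: in virtblk_probe(), once the request queue q exists. */
  u8 rotational = 1; /* conservative default: assume rotational */

  if (virtio_has_feature(vdev, VIRTIO_BLK_F_ROTATIONAL))
          virtio_cread(vdev, struct virtio_blk_config, rotational,
                       &rotational);

  if (!rotational)
          blk_queue_flag_set(QUEUE_FLAG_NONROT, q);

Because the value is then configured explicitly by the management layer,
point 3 falls out too: it stays stable across migration.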

Stefan



Re: [PATCH] vhost/scsi: fix up req type endian-ness

2020-07-13 Thread Stefan Hajnoczi
On Fri, Jul 10, 2020 at 06:48:51AM -0400, Michael S. Tsirkin wrote:
> vhost/scsi doesn't handle type conversion correctly for the request
> type when using virtio 1.0 and up on BE or cross-endian platforms.
> 
> Fix it up using vhost_32_to_cpu.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH] vsock/virtio: annotate 'the_virtio_vsock' RCU pointer

2020-07-13 Thread Stefan Hajnoczi
On Fri, Jul 10, 2020 at 02:12:43PM +0200, Stefano Garzarella wrote:
> Commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free
> on the_virtio_vsock") starts to use RCU to protect 'the_virtio_vsock'
> pointer, but we forgot to annotate it.
> 
> This patch adds the annotation to fix the following sparse errors:
> 
> net/vmw_vsock/virtio_transport.c:73:17: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:73:17:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:73:17:    struct virtio_vsock *
> net/vmw_vsock/virtio_transport.c:171:17: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:171:17:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:171:17:    struct virtio_vsock *
> net/vmw_vsock/virtio_transport.c:207:17: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:207:17:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:207:17:    struct virtio_vsock *
> net/vmw_vsock/virtio_transport.c:561:13: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:561:13:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:561:13:    struct virtio_vsock *
> net/vmw_vsock/virtio_transport.c:612:9: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:612:9:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:612:9:    struct virtio_vsock *
> net/vmw_vsock/virtio_transport.c:631:9: error: incompatible types in comparison expression (different address spaces):
> net/vmw_vsock/virtio_transport.c:631:9:    struct virtio_vsock [noderef] __rcu *
> net/vmw_vsock/virtio_transport.c:631:9:    struct virtio_vsock *
> 
> Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
> Reported-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/virtio_transport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
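
(For readers following along: the annotation in question is simply
adding __rcu to the pointer declaration, presumably

  static struct virtio_vsock __rcu *the_virtio_vsock;

so that sparse knows the pointer must be accessed via the RCU helpers.)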

Reviewed-by: Stefan Hajnoczi 



Re: [RFC 0/3] virtio: NUMA-aware memory allocation

2020-06-30 Thread Stefan Hajnoczi
On Mon, Jun 29, 2020 at 11:28:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> > On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> > > 
> > > On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > > > These patches are not ready to be merged because I was unable to 
> > > > measure a
> > > > performance improvement. I'm publishing them so they are archived in 
> > > > case
> > > > someone picks up this work again in the future.
> > > > 
> > > > The goal of these patches is to allocate virtqueues and driver state 
> > > > from the
> > > > device's NUMA node for optimal memory access latency. Only guests with 
> > > > a vNUMA
> > > > topology and virtio devices spread across vNUMA nodes benefit from 
> > > > this.  In
> > > > other cases the memory placement is fine and we don't need to take NUMA 
> > > > into
> > > > account inside the guest.
> > > > 
> > > > These patches could be extended to virtio_net.ko and other devices in 
> > > > the
> > > > future. I only tested virtio_blk.ko.
> > > > 
> > > > The benchmark configuration was designed to trigger worst-case NUMA 
> > > > placement:
> > > >   * Physical NVMe storage controller on host NUMA node 0
> 
> It's possible that NUMA is not such a big deal for NVMe.
> And it's possible that the BIOS misconfigures ACPI, reporting NUMA
> placement incorrectly.
> I think that the best thing to try is to use a ramdisk
> on a specific numa node.

Using ramdisk is an interesting idea, thanks.

Stefan



Re: [RFC 0/3] virtio: NUMA-aware memory allocation

2020-06-29 Thread Stefan Hajnoczi
On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> 
> On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > These patches are not ready to be merged because I was unable to measure a
> > performance improvement. I'm publishing them so they are archived in case
> > someone picks up this work again in the future.
> > 
> > The goal of these patches is to allocate virtqueues and driver state from 
> > the
> > device's NUMA node for optimal memory access latency. Only guests with a 
> > vNUMA
> > topology and virtio devices spread across vNUMA nodes benefit from this.  In
> > other cases the memory placement is fine and we don't need to take NUMA into
> > account inside the guest.
> > 
> > These patches could be extended to virtio_net.ko and other devices in the
> > future. I only tested virtio_blk.ko.
> > 
> > The benchmark configuration was designed to trigger worst-case NUMA 
> > placement:
> >   * Physical NVMe storage controller on host NUMA node 0
> >   * IOThread pinned to host NUMA node 0
> >   * virtio-blk-pci device in vNUMA node 1
> >   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> >   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > 
> > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci 
> > device.
> > Applying these patches fixes memory placement so that virtqueues and driver
> > state is allocated in vNUMA node 1 where the virtio-blk-pci device is 
> > located.
> > 
> > The fio 4KB randread benchmark results do not show a significant 
> > improvement:
> > 
> > Name             IOPS      Error
> > virtio-blk       42373.79  ± 0.54%
> > virtio-blk-numa  42517.07  ± 0.79%
> 
> 
> I remember I did something similar in vhost by using page_to_nid() for
> descriptor ring. And I get little improvement as shown here.
> 
> Michael reminded me that it was probably because all the data was cached,
> so I suspect the test doesn't put sufficient stress on the cache ...

Yes, that sounds likely. If there's no real-world performance
improvement then I'm happy to leave these patches unmerged.

Stefan



[RFC 3/3] virtio-blk: use NUMA-aware memory allocation in probe

2020-06-25 Thread Stefan Hajnoczi
Allocate frequently-accessed data structures from the NUMA node
associated with this device to avoid slow cross-NUMA node memory
accesses.

Only the following memory allocations are made NUMA-aware:

1. Called during probe. If called in the data path then hopefully we're
   executing on a CPU in the same NUMA node as the device. If the CPU is
   not in the right NUMA node then it's unclear whether forcing memory
   allocations to use the device's NUMA node will increase or decrease
   performance.

2. Memory will be frequently accessed from the data path. There is no
   need to worry about data that is not accessed from
   performance-critical code paths.

Signed-off-by: Stefan Hajnoczi 
---
 drivers/block/virtio_blk.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9d21bf0f155e..40845e9ad3b1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -482,6 +482,7 @@ static int init_vq(struct virtio_blk *vblk)
unsigned short num_vqs;
struct virtio_device *vdev = vblk->vdev;
struct irq_affinity desc = { 0, };
+   int node = dev_to_node(&vdev->dev);
 
err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
   struct virtio_blk_config, num_queues,
@@ -491,7 +492,8 @@ static int init_vq(struct virtio_blk *vblk)
 
num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
 
-   vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
+   vblk->vqs = kmalloc_array_node(num_vqs, sizeof(*vblk->vqs),
+  GFP_KERNEL, node);
if (!vblk->vqs)
return -ENOMEM;
 
@@ -683,6 +685,7 @@ module_param_named(queue_depth, virtblk_queue_depth, uint, 
0444);
 
 static int virtblk_probe(struct virtio_device *vdev)
 {
+   int node = dev_to_node(&vdev->dev);
struct virtio_blk *vblk;
struct request_queue *q;
int err, index;
@@ -714,7 +717,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 
/* We need an extra sg elements at head and tail. */
sg_elems += 2;
-   vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
+   vdev->priv = vblk = kmalloc_node(sizeof(*vblk), GFP_KERNEL, node);
if (!vblk) {
err = -ENOMEM;
goto out_free_index;
-- 
2.26.2



[RFC 1/3] virtio-pci: use NUMA-aware memory allocation in probe

2020-06-25 Thread Stefan Hajnoczi
Allocate frequently-accessed data structures from the NUMA node
associated with this virtio-pci device. This avoids slow cross-NUMA node
memory accesses.

Only the following memory allocations are made NUMA-aware:

1. Called during probe. If called in the data path then hopefully we're
   executing on a CPU in the same NUMA node as the device. If the CPU is
   not in the right NUMA node then it's unclear whether forcing memory
   allocations to use the device's NUMA node will increase or decrease
   performance.

2. Memory will be frequently accessed from the data path. There is no
   need to worry about data that is not accessed from
   performance-critical code paths.

Signed-off-by: Stefan Hajnoczi 
---
 drivers/virtio/virtio_pci_common.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..cc6e49f9c698 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -178,11 +178,13 @@ static struct virtqueue *vp_setup_vq(struct virtio_device 
*vdev, unsigned index,
 u16 msix_vec)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   struct virtio_pci_vq_info *info = kmalloc(sizeof *info, GFP_KERNEL);
+   int node = dev_to_node(&vdev->dev);
+   struct virtio_pci_vq_info *info;
struct virtqueue *vq;
unsigned long flags;
 
/* fill out our structure that represents an active queue */
+   info = kmalloc_node(sizeof *info, GFP_KERNEL, node);
if (!info)
return ERR_PTR(-ENOMEM);
 
@@ -283,10 +285,12 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned nvqs,
struct irq_affinity *desc)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int node = dev_to_node(&vdev->dev);
u16 msix_vec;
int i, err, nvectors, allocated_vectors, queue_idx = 0;
 
-   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
+   vp_dev->vqs = kcalloc_node(nvqs, sizeof(*vp_dev->vqs),
+  GFP_KERNEL, node);
if (!vp_dev->vqs)
return -ENOMEM;
 
@@ -355,9 +359,11 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
unsigned nvqs,
const char * const names[], const bool *ctx)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int node = dev_to_node(&vdev->dev);
int i, err, queue_idx = 0;
 
-   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
+   vp_dev->vqs = kcalloc_node(nvqs, sizeof(*vp_dev->vqs),
+  GFP_KERNEL, node);
if (!vp_dev->vqs)
return -ENOMEM;
 
@@ -513,10 +519,12 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
const struct pci_device_id *id)
 {
struct virtio_pci_device *vp_dev, *reg_dev = NULL;
+   int node = dev_to_node(&pci_dev->dev);
int rc;
 
/* allocate our structure and fill it out */
-   vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
+   vp_dev = kzalloc_node(sizeof(struct virtio_pci_device),
+ GFP_KERNEL, node);
if (!vp_dev)
return -ENOMEM;
 
-- 
2.26.2



[RFC 0/3] virtio: NUMA-aware memory allocation

2020-06-25 Thread Stefan Hajnoczi
These patches are not ready to be merged because I was unable to measure a
performance improvement. I'm publishing them so they are archived in case
someone picks up this work again in the future.

The goal of these patches is to allocate virtqueues and driver state from the
device's NUMA node for optimal memory access latency. Only guests with a vNUMA
topology and virtio devices spread across vNUMA nodes benefit from this.  In
other cases the memory placement is fine and we don't need to take NUMA into
account inside the guest.

These patches could be extended to virtio_net.ko and other devices in the
future. I only tested virtio_blk.ko.

The benchmark configuration was designed to trigger worst-case NUMA placement:
 * Physical NVMe storage controller on host NUMA node 0
 * IOThread pinned to host NUMA node 0
 * virtio-blk-pci device in vNUMA node 1
 * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
 * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1

The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
node 1) so that memory is in the wrong NUMA node for the virtio-blk-pci device.
Applying these patches fixes memory placement so that virtqueues and driver
state is allocated in vNUMA node 1 where the virtio-blk-pci device is located.

The fio 4KB randread benchmark results do not show a significant improvement:

Name             IOPS      Error
virtio-blk       42373.79  ± 0.54%
virtio-blk-numa  42517.07  ± 0.79%

Stefan Hajnoczi (3):
  virtio-pci: use NUMA-aware memory allocation in probe
  virtio_ring: use NUMA-aware memory allocation in probe
  virtio-blk: use NUMA-aware memory allocation in probe

 include/linux/gfp.h|  2 +-
 drivers/block/virtio_blk.c |  7 +--
 drivers/virtio/virtio_pci_common.c | 16 
 drivers/virtio/virtio_ring.c   | 26 +-
 mm/page_alloc.c|  2 +-
 5 files changed, 36 insertions(+), 17 deletions(-)

--
2.26.2



[RFC 2/3] virtio_ring: use NUMA-aware memory allocation in probe

2020-06-25 Thread Stefan Hajnoczi
Allocate frequently-accessed data structures from the NUMA node
associated with this device to avoid slow cross-NUMA node memory
accesses.

Only the following memory allocations are made NUMA-aware:

1. Called during probe. If called in the data path then hopefully we're
   executing on a CPU in the same NUMA node as the device. If the CPU is
   not in the right NUMA node then it's unclear whether forcing memory
   allocations to use the device's NUMA node will increase or decrease
   performance.

2. Memory will be frequently accessed from the data path. There is no
   need to worry about data that is not accessed from
   performance-critical code paths.

This patch adds a non-meminit alloc_pages_exact_nid() caller so I've
removed the __meminit added by commit e19318116048 ("mm/page_alloc.c:
add __meminit to alloc_pages_exact_nid()").

Cc: Fabian Frederick 
Cc: Andrew Morton 
Cc: Mel Gorman 
Signed-off-by: Stefan Hajnoczi 
---
I have included the alloc_pages_exact_nid() __meminit removal in this
patch to provide context for reviewers.
---
 include/linux/gfp.h  |  2 +-
 drivers/virtio/virtio_ring.c | 26 +-
 mm/page_alloc.c  |  2 +-
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4aba4c86c626..9b69df707c7a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -563,7 +563,7 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
+void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
__get_free_pages((gfp_mask), 0)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 58b96baa8d48..d06b42309bed 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -276,7 +276,9 @@ static void *vring_alloc_queue(struct virtio_device *vdev, 
size_t size,
return dma_alloc_coherent(vdev->dev.parent, size,
  dma_handle, flag);
} else {
-   void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
+   int node = dev_to_node(&vdev->dev);
+   void *queue = alloc_pages_exact_nid(node, PAGE_ALIGN(size),
+   flag);
 
if (queue) {
phys_addr_t phys_addr = virt_to_phys(queue);
@@ -1567,6 +1569,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
struct vring_packed_desc_event *driver, *device;
dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
size_t ring_size_in_bytes, event_size_in_bytes;
+   int node = dev_to_node(&vdev->dev);
unsigned int i;
 
ring_size_in_bytes = num * sizeof(struct vring_packed_desc);
@@ -1591,7 +1594,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
if (!device)
goto err_device;
 
-   vq = kmalloc(sizeof(*vq), GFP_KERNEL);
+   vq = kmalloc_node(sizeof(*vq), GFP_KERNEL, node);
if (!vq)
goto err_vq;
 
@@ -1639,9 +1642,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->packed.event_flags_shadow = 0;
vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;
 
-   vq->packed.desc_state = kmalloc_array(num,
+   vq->packed.desc_state = kmalloc_array_node(num,
sizeof(struct vring_desc_state_packed),
-   GFP_KERNEL);
+   GFP_KERNEL,
+   node);
if (!vq->packed.desc_state)
goto err_desc_state;
 
@@ -1653,9 +1657,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
for (i = 0; i < num-1; i++)
vq->packed.desc_state[i].next = i + 1;
 
-   vq->packed.desc_extra = kmalloc_array(num,
+   vq->packed.desc_extra = kmalloc_array_node(num,
sizeof(struct vring_desc_extra_packed),
-   GFP_KERNEL);
+   GFP_KERNEL,
+   node);
if (!vq->packed.desc_extra)
goto err_desc_extra;
 
@@ -2059,13 +2064,14 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
void (*callback)(struct virtqueue *),
const char *name)
 {
+   int node = dev_to_node(&vdev->dev);
unsigned int i;
struct vring_virtqueue *vq;
 
if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
return NULL;
 
-   vq = kmalloc(sizeof(*vq), GFP_KERNEL);
+   vq = kmalloc_node(sizeof(*vq), GFP_KERNEL, node);
if (!vq)
return NULL;
 
@@ -2110,8 +2116,10 @@ s

Re: [RFC v9 09/11] vhost/scsi: switch to buf APIs

2020-06-22 Thread Stefan Hajnoczi
On Fri, Jun 19, 2020 at 08:23:00PM +0200, Eugenio Pérez wrote:
> @@ -1139,9 +1154,9 @@ vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
>   iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
>  
>   ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
> - if (likely(ret == sizeof(rsp)))
> - vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
> - else
> + if (likely(ret == sizeof(rsp))) {
> + vhost_scsi_signal_noinput(&vs->dev, vq, &vc->buf);
> + } else
>   pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n");
>  }

The curly brackets are not necessary, but the patch still looks fine:

Reviewed-by: Stefan Hajnoczi 



Re: [RFC v9 10/11] vhost/vsock: switch to the buf API

2020-06-22 Thread Stefan Hajnoczi
On Fri, Jun 19, 2020 at 08:23:01PM +0200, Eugenio Pérez wrote:
> From: "Michael S. Tsirkin" 
> 
> A straight-forward conversion.
> 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Eugenio Pérez 
> ---
>  drivers/vhost/vsock.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH V2] drivers/block: Use kobj_to_dev() API

2020-06-22 Thread Stefan Hajnoczi
On Sat, Jun 20, 2020 at 09:53:43AM +0800, Wang Qing wrote:
> Use kobj_to_dev() API instead of container_of().
> 
> Signed-off-by: Wang Qing 
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH] drivers\block: Use kobj_to_dev() API

2020-06-19 Thread Stefan Hajnoczi
On Fri, Jun 12, 2020 at 03:10:56PM +0800, Wang Qing wrote:
> Use kobj_to_dev() API instead of container_of().
> 
> Signed-off-by: Wang Qing 
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>  mode change 100644 => 100755 drivers/block/virtio_blk.c

Please fix the '\' -> '/' in the commit message. Looks good otherwise:

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH RFC v5 11/13] vhost/scsi: switch to buf APIs

2020-06-08 Thread Stefan Hajnoczi
On Sun, Jun 07, 2020 at 10:11:46AM -0400, Michael S. Tsirkin wrote:
> Switch to buf APIs. Doing this exposes a spec violation in vhost scsi:
> all used bufs are marked with length 0.
> Fix that is left for another day.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/scsi.c | 73 ++--
>  1 file changed, 44 insertions(+), 29 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH RFC v5 12/13] vhost/vsock: switch to the buf API

2020-06-08 Thread Stefan Hajnoczi
On Sun, Jun 07, 2020 at 10:11:49AM -0400, Michael S. Tsirkin wrote:
> A straight-forward conversion.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/vsock.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH RFC 11/13] vhost/scsi: switch to buf APIs

2020-06-05 Thread Stefan Hajnoczi
On Tue, Jun 02, 2020 at 09:06:20AM -0400, Michael S. Tsirkin wrote:
> Switch to buf APIs. Doing this exposes a spec violation in vhost scsi:
> all used bufs are marked with length 0.
> Fix that is left for another day.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/scsi.c | 73 ++--
>  1 file changed, 44 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index c39952243fd3..c426c4e899c7 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -71,8 +71,8 @@ struct vhost_scsi_inflight {
>  };
>  
>  struct vhost_scsi_cmd {
> - /* Descriptor from vhost_get_vq_desc() for virt_queue segment */
> - int tvc_vq_desc;
> + /* Descriptor from vhost_get_avail_buf() for virt_queue segment */
> + struct vhost_buf tvc_vq_desc;
>   /* virtio-scsi initiator task attribute */
>   int tvc_task_attr;
>   /* virtio-scsi response incoming iovecs */
> @@ -213,7 +213,7 @@ struct vhost_scsi {
>   * Context for processing request and control queue operations.
>   */
>  struct vhost_scsi_ctx {
> - int head;
> + struct vhost_buf buf;
>   unsigned int out, in;
>   size_t req_size, rsp_size;
>   size_t out_size, in_size;
> @@ -443,6 +443,20 @@ static int vhost_scsi_check_stop_free(struct se_cmd 
> *se_cmd)
>   return target_put_sess_cmd(se_cmd);
>  }
>  
> +/* Signal to guest that request finished with no input buffer. */
> +/* TODO calling this when writing into buffer and most likely a bug */
> +static void vhost_scsi_signal_noinput(struct vhost_dev *vdev,
> +   struct vhost_virtqueue *vq,
> +   struct vhost_buf *bufp)
> +{
> + struct vhost_buf buf = *bufp;
> +
> + buf.in_len = 0;
> + vhost_put_used_buf(vq, &buf);

Yes, this behavior differs from the QEMU virtio-scsi device
implementation. I think it's just a quirk that is probably my fault (I
guess I thought the length information is already encoded in the payload
SCSI headers so we have no use for the used descriptor length field).

Whether it's worth changing now is an interesting question. In theory
it would make vhost-scsi more spec compliant and guest drivers might be
happier (especially drivers for niche OSes that were only tested against
QEMU's virtio-scsi). On the other hand, it's a guest-visible change that
could break similar niche drivers that assume length is always 0.

I'd leave it as-is unless people hit issues that justify the risk of
changing it.
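
If someone does decide to change it, the fix itself would be small. A
hedged sketch against the vhost_buf API from this series, for the TMF
response path where sizeof(rsp) bytes really are written:

  struct vhost_buf buf = *bufp;

  /* Report the bytes written into the in buffers, per the spec. */
  buf.in_len = sizeof(struct virtio_scsi_ctrl_tmf_resp);
  vhost_put_used_buf(vq, &buf);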

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH RFC 12/13] vhost/vsock: switch to the buf API

2020-06-05 Thread Stefan Hajnoczi
On Tue, Jun 02, 2020 at 09:06:22AM -0400, Michael S. Tsirkin wrote:
> A straight-forward conversion.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/vhost/vsock.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)

Reviewed-by: Stefan Hajnoczi 



[PATCH v4] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-30 Thread Stefan Hajnoczi
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug.  This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:

  BUG: unable to handle kernel NULL pointer dereference at 0090
  IP: [] virtio_check_driver_offered_feature+0x10/0x90 
[virtio]
  PGD 80003a92f067 PUD 3a930067 PMD 0
  Oops:  [#1] SMP
  CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G   OE     
3.10.0-1062.el7.x86_64 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  task: 9be5fbfb8000 ti: 9be5fa89 task.ti: 9be5fa89
  RIP: 0010:[]  [] 
virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  RSP: 0018:9be5fa893dc8  EFLAGS: 00010246
  RAX: 9be5fc3f3400 RBX: 9be5fa893e30 RCX: 
  RDX:  RSI: 0004 RDI: 9be5fbc10b40
  RBP: 9be5fa893dc8 R08: 0301 R09: 0301
  R10:  R11:  R12: 9be5fdc24680
  R13: 9be5fbc10b40 R14: 9be5fbc10480 R15: 
  FS:  7f1bfb968740() GS:9be5ffc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0090 CR3: 3a894000 CR4: 00360ff0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   [] virtblk_getgeo+0x47/0x110 [virtio_blk]
   [] ? handle_mm_fault+0x39d/0x9b0
   [] blkdev_ioctl+0x1f5/0xa20
   [] block_ioctl+0x41/0x50
   [] do_vfs_ioctl+0x3a0/0x5a0
   [] SyS_ioctl+0xa1/0xc0

A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).

Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.

Fixes: 48e4043d4529523cbc7fa8dd745bd8e2c45ce1d3
   ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Clarify vdev_mutex usage [Stefano and Michael]

 drivers/block/virtio_blk.c | 86 ++
 1 file changed, 78 insertions(+), 8 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 93468b7c6701..9d21bf0f155e 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -33,6 +33,15 @@ struct virtio_blk_vq {
 } cacheline_aligned_in_smp;
 
 struct virtio_blk {
+   /*
+* This mutex must be held by anything that may run after
+* virtblk_remove() sets vblk->vdev to NULL.
+*
+* blk-mq, virtqueue processing, and sysfs attribute code paths are
+* shut down before vblk->vdev is set to NULL and therefore do not need
+* to hold this mutex.
+*/
+   struct mutex vdev_mutex;
struct virtio_device *vdev;
 
/* The disk structure for the kernel. */
@@ -44,6 +53,13 @@ struct virtio_blk {
/* Process context for config space updates */
struct work_struct config_work;
 
+   /*
+* Tracks references from block_device_operations open/release and
+* virtio_driver probe/remove so this object can be freed once no
+* longer in use.
+*/
+   refcount_t refs;
+
/* What host tells us, plus 2 for header & tailer. */
unsigned int sg_elems;
 
@@ -295,10 +311,55 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
return err;
 }
 
+static void virtblk_get(struct virtio_blk *vblk)
+{
+   refcount_inc(&vblk->refs);
+}
+
+static void virtblk_put(struct virtio_blk *vblk)
+{
+   if (refcount_dec_and_test(&vblk->refs)) {
+   ida_simple_remove(&vd_index_ida, vblk->index);
+   mutex_destroy(&vblk->vdev_mutex);
+   kfree(vblk);
+   }
+}
+
+static int virtblk_open(struct block_device *bd, fmode_t mode)
+{
+   struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = 0;
+
+   mutex_lock(&vblk->vdev_mutex);
+
+   if (vblk->vdev)
+   virtblk_get(vblk);
+   else
+   ret = -ENXIO;
+
+   mutex_unlock(&vblk->vdev_mutex);
+   return ret;
+}
+
+static void virtblk_release(struct gendisk *disk, fmode_t mode)
+{
+   struct virtio_blk *vblk = disk->private_data;
+
+   virtblk_put(vblk);
+}
+
 /* We provide getgeo only to please some old bootloader/partitioning tools */
 static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
 {
struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = 0;
+
+   mutex_lock(&vblk->vdev_mutex);
+
+   if (!vblk->vdev) {
+   ret = -ENXIO;
+   goto out;

Re: [PATCH v3] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-30 Thread Stefan Hajnoczi
On Thu, Apr 30, 2020 at 10:43:23AM +0200, Stefano Garzarella wrote:
> On Wed, Apr 29, 2020 at 05:53:45PM +0100, Stefan Hajnoczi wrote:
> > A userspace process holding a file descriptor to a virtio_blk device can
> > still invoke block_device_operations after hot unplug.  This leads to a
> > use-after-free accessing vblk->vdev in virtblk_getgeo() when
> > ioctl(HDIO_GETGEO) is invoked:
> > 
> >   BUG: unable to handle kernel NULL pointer dereference at 0090
> >   IP: [] virtio_check_driver_offered_feature+0x10/0x90 
> > [virtio]
> >   PGD 80003a92f067 PUD 3a930067 PMD 0
> >   Oops:  [#1] SMP
> >   CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G   OE    
> >  3.10.0-1062.el7.x86_64 #1
> >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> >   task: 9be5fbfb8000 ti: 9be5fa89 task.ti: 9be5fa89
> >   RIP: 0010:[]  [] 
> > virtio_check_driver_offered_feature+0x10/0x90 [virtio]
> >   RSP: 0018:9be5fa893dc8  EFLAGS: 00010246
> >   RAX: 9be5fc3f3400 RBX: 9be5fa893e30 RCX: 
> >   RDX:  RSI: 0004 RDI: 9be5fbc10b40
> >   RBP: 9be5fa893dc8 R08: 0301 R09: 0301
> >   R10:  R11:  R12: 9be5fdc24680
> >   R13: 9be5fbc10b40 R14: 9be5fbc10480 R15: 
> >   FS:  7f1bfb968740() GS:9be5ffc0() 
> > knlGS:
> >   CS:  0010 DS:  ES:  CR0: 80050033
> >   CR2: 0090 CR3: 3a894000 CR4: 00360ff0
> >   DR0:  DR1:  DR2: 
> >   DR3:  DR6: fffe0ff0 DR7: 0400
> >   Call Trace:
> >[] virtblk_getgeo+0x47/0x110 [virtio_blk]
> >[] ? handle_mm_fault+0x39d/0x9b0
> >[] blkdev_ioctl+0x1f5/0xa20
> >[] block_ioctl+0x41/0x50
> >[] do_vfs_ioctl+0x3a0/0x5a0
> >[] SyS_ioctl+0xa1/0xc0
> > 
> > A related problem is that virtblk_remove() leaks the vd_index_ida index
> > when something still holds a reference to vblk->disk during hot unplug.
> > This causes virtio-blk device names to be lost (vda, vdb, etc).
> > 
> > Fix these issues by protecting vblk->vdev with a mutex and reference
> > counting vblk so the vd_index_ida index can be removed in all cases.
> > 
> > Fixes: 48e4043d4529523cbc7fa8dd745bd8e2c45ce1d3
> >("virtio: add virtio disk geometry feature")
> > Reported-by: Lance Digby 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  drivers/block/virtio_blk.c | 87 ++
> >  1 file changed, 79 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 93468b7c6701..6f7f277495f4 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -33,6 +33,16 @@ struct virtio_blk_vq {
> >  } cacheline_aligned_in_smp;
> >  
> >  struct virtio_blk {
> > +   /*
> > +* vdev may be accessed without taking this mutex in blk-mq and
> > +* virtqueue code paths because virtblk_remove() stops them before vdev
> > +* is freed.
> > +*
> > +* Everything else must hold this mutex when accessing vdev and must
> > +* handle the case where vdev is NULL after virtblk_remove() has been
> > +* called.
> > +*/
> > +   struct mutex vdev_mutex;
> 
> The patch LGTM, I'm just worried about cache_type_store() and
> cache_type_show() because IIUC they aren't in blk-mq and virtqueue code
> paths, but they use vdev.
> 
> Do we have to take the mutex there too?

No, because del_gendisk() in virtblk_remove() deletes the sysfs
attributes before vblk->vdev is set to NULL.  kernfs deactivates the
kernfs nodes so that further read()/write() operations fail with ENODEV
(see fs/kernfs/dir.c and fs/kernfs/file.c).

I have tested that a userspace process with a sysfs attr file open
cannot access the attribute after virtblk_remove().
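
The test was along these lines (a sketch; the cache_type attribute path
is just an example and the unplug step depends on the hypervisor):

  #include <err.h>
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          char buf[16];
          /* Hold a sysfs attribute open across hot unplug. */
          int fd = open("/sys/block/vda/cache_type", O_RDONLY);

          if (fd < 0)
                  err(1, "open");

          /* Hot unplug the virtio-blk device now, then press Enter. */
          getchar();

          if (read(fd, buf, sizeof(buf)) < 0 && errno == ENODEV)
                  printf("PASS: read fails with ENODEV after unplug\n");
          return 0;
  }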

Stefan



[PATCH v3] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-29 Thread Stefan Hajnoczi
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug.  This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:

  BUG: unable to handle kernel NULL pointer dereference at 0090
  IP: [] virtio_check_driver_offered_feature+0x10/0x90 
[virtio]
  PGD 80003a92f067 PUD 3a930067 PMD 0
  Oops:  [#1] SMP
  CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G   OE     
3.10.0-1062.el7.x86_64 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  task: 9be5fbfb8000 ti: 9be5fa89 task.ti: 9be5fa89
  RIP: 0010:[]  [] 
virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  RSP: 0018:9be5fa893dc8  EFLAGS: 00010246
  RAX: 9be5fc3f3400 RBX: 9be5fa893e30 RCX: 
  RDX:  RSI: 0004 RDI: 9be5fbc10b40
  RBP: 9be5fa893dc8 R08: 0301 R09: 0301
  R10:  R11:  R12: 9be5fdc24680
  R13: 9be5fbc10b40 R14: 9be5fbc10480 R15: 
  FS:  7f1bfb968740() GS:9be5ffc0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 0090 CR3: 3a894000 CR4: 00360ff0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   [] virtblk_getgeo+0x47/0x110 [virtio_blk]
   [] ? handle_mm_fault+0x39d/0x9b0
   [] blkdev_ioctl+0x1f5/0xa20
   [] block_ioctl+0x41/0x50
   [] do_vfs_ioctl+0x3a0/0x5a0
   [] SyS_ioctl+0xa1/0xc0

A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).

Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.

Fixes: 48e4043d4529523cbc7fa8dd745bd8e2c45ce1d3
   ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby 
Signed-off-by: Stefan Hajnoczi 
---
 drivers/block/virtio_blk.c | 87 ++
 1 file changed, 79 insertions(+), 8 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 93468b7c6701..6f7f277495f4 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -33,6 +33,16 @@ struct virtio_blk_vq {
 } cacheline_aligned_in_smp;
 
 struct virtio_blk {
+   /*
+* vdev may be accessed without taking this mutex in blk-mq and
+* virtqueue code paths because virtblk_remove() stops them before vdev
+* is freed.
+*
+* Everything else must hold this mutex when accessing vdev and must
+* handle the case where vdev is NULL after virtblk_remove() has been
+* called.
+*/
+   struct mutex vdev_mutex;
struct virtio_device *vdev;
 
/* The disk structure for the kernel. */
@@ -44,6 +54,13 @@ struct virtio_blk {
/* Process context for config space updates */
struct work_struct config_work;
 
+   /*
+* Tracks references from block_device_operations open/release and
+* virtio_driver probe/remove so this object can be freed once no
+* longer in use.
+*/
+   refcount_t refs;
+
/* What host tells us, plus 2 for header & tailer. */
unsigned int sg_elems;
 
@@ -295,10 +312,55 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
return err;
 }
 
+static void virtblk_get(struct virtio_blk *vblk)
+{
+   refcount_inc(&vblk->refs);
+}
+
+static void virtblk_put(struct virtio_blk *vblk)
+{
+   if (refcount_dec_and_test(&vblk->refs)) {
+   ida_simple_remove(&vd_index_ida, vblk->index);
+   mutex_destroy(&vblk->vdev_mutex);
+   kfree(vblk);
+   }
+}
+
+static int virtblk_open(struct block_device *bd, fmode_t mode)
+{
+   struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = 0;
+
+   mutex_lock(&vblk->vdev_mutex);
+
+   if (vblk->vdev)
+   virtblk_get(vblk);
+   else
+   ret = -ENXIO;
+
+   mutex_unlock(&vblk->vdev_mutex);
+   return ret;
+}
+
+static void virtblk_release(struct gendisk *disk, fmode_t mode)
+{
+   struct virtio_blk *vblk = disk->private_data;
+
+   virtblk_put(vblk);
+}
+
 /* We provide getgeo only to please some old bootloader/partitioning tools */
 static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
 {
struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = 0;
+
+   mutex_lock(&vblk->vdev_mutex);
+
+   if (!vblk->vdev) {
+   ret = -ENXIO;
+   goto out;

Re: [PATCH v2] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-28 Thread Stefan Hajnoczi
On Tue, Apr 28, 2020 at 11:25:07AM -0400, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 03:30:09PM +0100, Stefan Hajnoczi wrote:
> > A userspace process holding a file descriptor to a virtio_blk device can
> > still invoke block_device_operations after hot unplug.  For example, a
> > program that has /dev/vdb open can call ioctl(HDIO_GETGEO) after hot
> > unplug to invoke virtblk_getgeo().
> 
> 
> which causes what? a use after free?

Yes, use after free.  I will include the kernel panic in the next
revision.

virtio_check_driver_offered_feature() accesses vdev->dev.driver, which
doesn't work after virtblk_remove() has freed vdev :).

> > 
> > Introduce a reference count in struct virtio_blk so that its lifetime
> > covers both virtio_driver probe/remove and block_device_operations
> > open/release users.  This ensures that block_device_operations functions
> > like virtblk_getgeo() can safely access struct virtio_blk.
> > 
> > Add remove_mutex to prevent block_device_operations functions from
> > accessing vblk->vdev during virtblk_remove() and let the safely check
> 
> let the -> let them?

Thanks, will fix.

> 
> > for !vblk->vdev after virtblk_remove() returns.
> > 
> > Switching to a reference count also solves the vd_index_ida leak where
> > vda, vdb, vdc, etc indices were lost when the device was hot unplugged
> > while the block device was still open.
> 
> Can you move this statement up so we list both issues (use after free
> and leak) upfront, then discuss the fix?

Thanks, will fix.

> 
> > 
> > Reported-by: Lance Digby 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > If someone has a simpler solution please let me know.  I looked at
> > various approaches including reusing device_lock(&vblk->vdev.dev) but
> > they were more complex and extending the lifetime of virtio_device after
> > remove() has been called seems questionable.
> > ---
> >  drivers/block/virtio_blk.c | 85 ++
> >  1 file changed, 77 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 93468b7c6701..3dd53b445cc1 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -44,6 +44,13 @@ struct virtio_blk {
> > /* Process context for config space updates */
> > struct work_struct config_work;
> >  
> > +   /*
> > +* Tracks references from block_device_operations open/release and
> > +* virtio_driver probe/remove so this object can be freed once no
> > +* longer in use.
> > +*/
> > +   refcount_t refs;
> > +
> > /* What host tells us, plus 2 for header & tailer. */
> > unsigned int sg_elems;
> >  
> > @@ -53,6 +60,9 @@ struct virtio_blk {
> > /* num of vqs */
> > int num_vqs;
> > struct virtio_blk_vq *vqs;
> > +
> > +   /* Provides mutual exclusion with virtblk_remove(). */
> 
> This is not the best way to document access rules.
> Which fields does this protect, exactly?
> I think it's just vdev. Right?
> Pls add to the comment.

Yes, vblk->vdev cannot be dereferenced after virtblk_remove() is
entered.

I'll document this mutex as protecting vdev.

> 
> > +   struct mutex remove_mutex;
> >  };
> >  
> >  struct virtblk_req {
> > @@ -295,10 +305,54 @@ static int virtblk_get_id(struct gendisk *disk, char 
> > *id_str)
> > return err;
> >  }
> >  
> > +static void virtblk_get(struct virtio_blk *vblk)
> > +{
> > +   refcount_inc(&vblk->refs);
> > +}
> > +
> > +static void virtblk_put(struct virtio_blk *vblk)
> > +{
> > +   if (refcount_dec_and_test(&vblk->refs)) {
> > +   ida_simple_remove(&vd_index_ida, vblk->index);
> > +   mutex_destroy(&vblk->remove_mutex);
> > +   kfree(vblk);
> > +   }
> > +}
> > +
> > +static int virtblk_open(struct block_device *bd, fmode_t mode)
> > +{
> > +   struct virtio_blk *vblk = bd->bd_disk->private_data;
> > +   int ret = -ENXIO;
> 
> 
> It's more common to do
> 
>   int ret = 0;
> 
> and on error:
>   ret = -ENXIO;
> 
> 
> let's do this.

Will fix.

> > +
> > +   mutex_lock(&vblk->remove_mutex);
> > +
> > +   if (vblk->vdev) {
> > +   virtblk_get(vblk);
> > +   ret = 0;
> > +   }
> 
> I prefer
>   else
>   ret = -ENXIO
> 
> here.

Got it.

> > +
> > +   mutex_unlock(&vblk->remove_mutex);
> > +   return ret;

[PATCH v2] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-28 Thread Stefan Hajnoczi
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug.  For example, a
program that has /dev/vdb open can call ioctl(HDIO_GETGEO) after hot
unplug to invoke virtblk_getgeo().

Introduce a reference count in struct virtio_blk so that its lifetime
covers both virtio_driver probe/remove and block_device_operations
open/release users.  This ensures that block_device_operations functions
like virtblk_getgeo() can safely access struct virtio_blk.

Add remove_mutex to prevent block_device_operations functions from
accessing vblk->vdev during virtblk_remove() and let the safely check
for !vblk->vdev after virtblk_remove() returns.

Switching to a reference count also solves the vd_index_ida leak where
vda, vdb, vdc, etc indices were lost when the device was hot unplugged
while the block device was still open.

Reported-by: Lance Digby 
Signed-off-by: Stefan Hajnoczi 
---
If someone has a simpler solution please let me know.  I looked at
various approaches including reusing device_lock(&vblk->vdev.dev) but
they were more complex and extending the lifetime of virtio_device after
remove() has been called seems questionable.
---
 drivers/block/virtio_blk.c | 85 ++
 1 file changed, 77 insertions(+), 8 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 93468b7c6701..3dd53b445cc1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -44,6 +44,13 @@ struct virtio_blk {
/* Process context for config space updates */
struct work_struct config_work;
 
+   /*
+* Tracks references from block_device_operations open/release and
+* virtio_driver probe/remove so this object can be freed once no
+* longer in use.
+*/
+   refcount_t refs;
+
/* What host tells us, plus 2 for header & tailer. */
unsigned int sg_elems;
 
@@ -53,6 +60,9 @@ struct virtio_blk {
/* num of vqs */
int num_vqs;
struct virtio_blk_vq *vqs;
+
+   /* Provides mutual exclusion with virtblk_remove(). */
+   struct mutex remove_mutex;
 };
 
 struct virtblk_req {
@@ -295,10 +305,54 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
return err;
 }
 
+static void virtblk_get(struct virtio_blk *vblk)
+{
+   refcount_inc(&vblk->refs);
+}
+
+static void virtblk_put(struct virtio_blk *vblk)
+{
+   if (refcount_dec_and_test(&vblk->refs)) {
+   ida_simple_remove(&vd_index_ida, vblk->index);
+   mutex_destroy(&vblk->remove_mutex);
+   kfree(vblk);
+   }
+}
+
+static int virtblk_open(struct block_device *bd, fmode_t mode)
+{
+   struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = -ENXIO;
+
+   mutex_lock(&vblk->remove_mutex);
+
+   if (vblk->vdev) {
+   virtblk_get(vblk);
+   ret = 0;
+   }
+
+   mutex_unlock(&vblk->remove_mutex);
+   return ret;
+}
+
+static void virtblk_release(struct gendisk *disk, fmode_t mode)
+{
+   struct virtio_blk *vblk = disk->private_data;
+
+   virtblk_put(vblk);
+}
+
 /* We provide getgeo only to please some old bootloader/partitioning tools */
 static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
 {
struct virtio_blk *vblk = bd->bd_disk->private_data;
+   int ret = -ENXIO;
+
+   mutex_lock(&vblk->remove_mutex);
+
+   if (!vblk->vdev) {
+   goto out;
+   }
 
/* see if the host passed in geometry config */
if (virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_GEOMETRY)) {
@@ -314,11 +368,17 @@ static int virtblk_getgeo(struct block_device *bd, struct 
hd_geometry *geo)
geo->sectors = 1 << 5;
geo->cylinders = get_capacity(bd->bd_disk) >> 11;
}
-   return 0;
+
+   ret = 0;
+out:
+   mutex_unlock(&vblk->remove_mutex);
+   return ret;
 }
 
 static const struct block_device_operations virtblk_fops = {
.owner  = THIS_MODULE,
+   .open = virtblk_open,
+   .release = virtblk_release,
.getgeo = virtblk_getgeo,
 };
 
@@ -655,6 +715,10 @@ static int virtblk_probe(struct virtio_device *vdev)
goto out_free_index;
}
 
+   /* This reference is dropped in virtblk_remove(). */
+   refcount_set(&vblk->refs, 1);
+   mutex_init(&vblk->remove_mutex);
+
vblk->vdev = vdev;
vblk->sg_elems = sg_elems;
 
@@ -820,8 +884,12 @@ static int virtblk_probe(struct virtio_device *vdev)
 static void virtblk_remove(struct virtio_device *vdev)
 {
struct virtio_blk *vblk = vdev->priv;
-   int index = vblk->index;
-   int refc;
+
+   /*
+* Virtqueue processing is stopped safely here but mutual exclusion is
+* needed for block_device_operations.
+*/
+   mutex_lock(&vblk->remove_mutex);
 
  

Re: [PATCH] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-24 Thread Stefan Hajnoczi
On Thu, Apr 23, 2020 at 07:13:38AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 23, 2020 at 01:37:17PM +0100, Stefan Hajnoczi wrote:
> > A virtio_blk block device can still be referenced after hot unplug by
> > userspace processes that hold the file descriptor.  In this case
> > virtblk_getgeo() can be invoked after virtblk_remove() was called.  For
> > example, a program that has /dev/vdb open can call ioctl(HDIO_GETGEO)
> > after hot unplug.
> > 
> > Fix this by clearing vblk->disk->private_data and checking that the
> > virtio_blk driver instance is still around in virtblk_getgeo().
> > 
> > Note that the virtblk_getgeo() function itself is guaranteed to remain
> > in memory after hot unplug because the virtio_blk module refcount is
> > still held while a block device reference exists.
> > 
> > Originally-by: Lance Digby 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  drivers/block/virtio_blk.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index 93468b7c6701..b50cdf37a6f7 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -300,6 +300,10 @@ static int virtblk_getgeo(struct block_device *bd, 
> > struct hd_geometry *geo)
> >  {
> > struct virtio_blk *vblk = bd->bd_disk->private_data;
> >  
> > +   /* Driver instance has been removed */
> > +   if (!vblk)
> > +   return -ENOTTY;
> 
> If this ever hits you have a nasty race condition and this is not
> going to fix it, as it could be removed just after this check as well.

Perhaps a better fix is to keep a refcount in struct virtio_blk and
implement block_device_operations->release() to decrement the refcount.

That way the struct virtio_blk stays around until all block device
references are dropped.  This is similar to nbd_put() in
drivers/block/nbd.c.
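
Roughly this shape (a sketch of the idea only; it ignores the mutual
exclusion against virtblk_remove() that a real fix would also need):

  static int virtblk_open(struct block_device *bd, fmode_t mode)
  {
          struct virtio_blk *vblk = bd->bd_disk->private_data;

          refcount_inc(&vblk->refs); /* dropped in virtblk_release() */
          return 0;
  }

  static void virtblk_release(struct gendisk *disk, fmode_t mode)
  {
          virtblk_put(disk->private_data); /* frees vblk on last ref */
  }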

Stefan



[PATCH] virtio-blk: handle block_device_operations callbacks after hot unplug

2020-04-23 Thread Stefan Hajnoczi
A virtio_blk block device can still be referenced after hot unplug by
userspace processes that hold the file descriptor.  In this case
virtblk_getgeo() can be invoked after virtblk_remove() was called.  For
example, a program that has /dev/vdb open can call ioctl(HDIO_GETGEO)
after hot unplug.

Fix this by clearing vblk->disk->private_data and checking that the
virtio_blk driver instance is still around in virtblk_getgeo().

Note that the virtblk_getgeo() function itself is guaranteed to remain
in memory after hot unplug because the virtio_blk module refcount is
still held while a block device reference exists.

Originally-by: Lance Digby 
Signed-off-by: Stefan Hajnoczi 
---
 drivers/block/virtio_blk.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 93468b7c6701..b50cdf37a6f7 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -300,6 +300,10 @@ static int virtblk_getgeo(struct block_device *bd, struct 
hd_geometry *geo)
 {
struct virtio_blk *vblk = bd->bd_disk->private_data;
 
+   /* Driver instance has been removed */
+   if (!vblk)
+   return -ENOTTY;
+
/* see if the host passed in geometry config */
if (virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_GEOMETRY)) {
virtio_cread(vblk->vdev, struct virtio_blk_config,
@@ -835,6 +839,7 @@ static void virtblk_remove(struct virtio_device *vdev)
vdev->config->reset(vdev);
 
refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref);
+   vblk->disk->private_data = NULL;
put_disk(vblk->disk);
vdev->config->del_vqs(vdev);
kfree(vblk->vqs);
-- 
2.25.1



Re: [PATCH net] vsock/virtio: postpone packet delivery to monitoring devices

2020-04-22 Thread Stefan Hajnoczi
On Tue, Apr 21, 2020 at 06:17:24PM +0200, Stefano Garzarella wrote:
> On Tue, Apr 21, 2020 at 04:42:46PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Apr 21, 2020 at 11:25:27AM +0200, Stefano Garzarella wrote:
> > > We delivering packets to monitoring devices, before to check if
> > > the virtqueue has enough space.
> > 
> > "We [are] delivering packets" and "before to check" -> "before
> > checking".  Perhaps it can be rewritten as:
> > 
> >   Packets are delivered to monitoring devices before checking if the
> >   virtqueue has enough space.
> > 
> 
> Yeah, it is better :-)
> 
> > > 
> > > If the virtqueue is full, the transmitting packet is queued up
> > > and it will be sent in the next iteration. This causes the same
> > > packet to be delivered multiple times to monitoring devices.
> > > 
> > > This patch fixes this issue, postponing the packet delivery
> > > to monitoring devices, only when it is properly queued in the
> > 
> > s/,//
> > 
> > > virqueue.
> > 
> > s/virqueue/virtqueue/
> > 
> 
> Thanks, I'll fix in the v2!
> 
> > > @@ -137,6 +135,11 @@ virtio_transport_send_pkt_work(struct work_struct 
> > > *work)
> > >   break;
> > >   }
> > >  
> > > + /* Deliver to monitoring devices all correctly transmitted
> > > +  * packets.
> > > +  */
> > > + virtio_transport_deliver_tap_pkt(pkt);
> > > +
> > 
> > The device may see the tx packet and therefore receive a reply to it
> > before we can call virtio_transport_deliver_tap_pkt().  Does this mean
> > that replies can now appear in the packet capture before the transmitted
> > packet?
> 
> hmm, you are right!
> 
> And the same thing can already happen in vhost-vsock where we call
> virtio_transport_deliver_tap_pkt() after the vhost_add_used(), right?
> 
> The vhost-vsock case can be fixed in a simple way, but here do you think
> we should serialize them? (e.g. mutex, spinlock)
> 
> In this case I'm worried about performance.
> 
> Or is there some virtqueue API to check availability?

Let's stick to the same semantics as Ethernet netdevs.  That way there
are no surprises to anyone who is familiar with Linux packet captures.
I don't know what those semantics are offhand, though; you'd need to
check the code :).

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] vsock/virtio: postpone packet delivery to monitoring devices

2020-04-21 Thread Stefan Hajnoczi
On Tue, Apr 21, 2020 at 11:25:27AM +0200, Stefano Garzarella wrote:
> We delivering packets to monitoring devices, before to check if
> the virtqueue has enough space.

"We [are] delivering packets" and "before to check" -> "before
checking".  Perhaps it can be rewritten as:

  Packets are delivered to monitoring devices before checking if the
  virtqueue has enough space.

> 
> If the virtqueue is full, the transmitting packet is queued up
> and it will be sent in the next iteration. This causes the same
> packet to be delivered multiple times to monitoring devices.
> 
> This patch fixes this issue, postponing the packet delivery
> to monitoring devices, only when it is properly queued in the

s/,//

> virqueue.

s/virqueue/virtqueue/

> @@ -137,6 +135,11 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>   break;
>   }
>  
> + /* Deliver to monitoring devices all correctly transmitted
> +  * packets.
> +  */
> + virtio_transport_deliver_tap_pkt(pkt);
> +

The device may see the tx packet and therefore receive a reply to it
before we can call virtio_transport_deliver_tap_pkt().  Does this mean
that replies can now appear in the packet capture before the transmitted
packet?
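
In other words, with this change the TX worker does roughly the
following (simplified, not the actual code):

        virtqueue_add_sgs(vq, ...);   /* device may already see the packet */
        /* a reply can arrive and hit the RX tap at this point */
        virtio_transport_deliver_tap_pkt(pkt);   /* TX packet tapped after */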


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v8 09/19] virtio: stop using legacy struct vring in kernel

2020-04-07 Thread Stefan Hajnoczi
On Mon, Apr 06, 2020 at 09:16:46PM -0400, Michael S. Tsirkin wrote:
> struct vring (in the uapi directory) and supporting APIs are kept
> around to solely avoid breaking old userspace builds.
> It's not actually part of the UAPI - it was kept in the UAPI
> header by mistake, and using it in kernel isn't necessary
> and prevents us from making changes safely.
> In particular, the APIs actually assume the legacy layout.
> 
> Add an internal kernel-only struct vring and
> switch everyone to use that.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/block/virtio_blk.c   |  1 +
>  include/linux/virtio.h   |  1 -
>  include/linux/virtio_ring.h  | 10 ++
>  include/linux/vringh.h   |  1 +
>  include/uapi/linux/virtio_ring.h | 26 --
>  5 files changed, 28 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/2] doc: zh_CN: facilitate translation for filesystems

2020-03-15 Thread Stefan Hajnoczi
On Sun, Mar 15, 2020 at 02:27:58AM -0700, Wang Wenhu wrote:
> This patch series sets up the basic facility for the translation work
> of the docs residing on filesystems into Chinese, indexing the filesystems
> directory and adding one indexed translation into it. The virtiofs.rst
> added is not only a translation itself but also an simple example that
> future developers would take.
> 
> The detailed diff info also shows the basic essential markups of
> the toctree and reStructuredText, at least for the most simple occasions.
> More translations of filesystems are on their way, and futher,
> of more subsystems.
> 
> ---
> Wang Wenhu (2):
>   doc: zh_CN: index files in filesystems subdirectory
>   doc: zh_CN: add translation for virtiofs
> 
>  Documentation/filesystems/index.rst   |  2 +
>  Documentation/filesystems/virtiofs.rst|  2 +
>  .../translations/zh_CN/filesystems/index.rst  | 29 +
>  .../zh_CN/filesystems/virtiofs.rst| 62 +++
>  Documentation/translations/zh_CN/index.rst|  1 +
>  5 files changed, 96 insertions(+)
>  create mode 100644 Documentation/translations/zh_CN/filesystems/index.rst
>  create mode 100644 Documentation/translations/zh_CN/filesystems/virtiofs.rst
> 
> -- 
> 2.17.1
> 

I am not a Chinese speaker but thank you for translating the
documentation!

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] VIRTIO adoption in other hypervisors

2020-02-28 Thread Stefan Hajnoczi
On Fri, Feb 28, 2020 at 10:16:21AM +, Alex Bennée wrote:
> I'm currently trying to get my head around virtio and was wondering how
> widespread adoption of virtio is amongst the various hypervisors and
> emulators out there.
> 
> Obviously I'm familiar with QEMU both via KVM and even when just doing
> plain emulation (although with some restrictions). As far as I'm aware
> the various Rust based VMMs have vary degrees of support for virtio
> devices over KVM as well. CrosVM specifically is embracing virtio for
> multi-process device emulation.
> 
> I believe there has been some development work for supporting VIRTIO on
> Xen although it seems to have stalled according to:
> 
>   https://wiki.xenproject.org/wiki/Virtio_On_Xen
> 
> Recently at KVM Forum there was Jan's talk about Inter-VM shared memory
> which proposed ivshmemv2 as a VIRTIO transport:
> 
>   https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/
> 
> As I understood it this would allow Xen (and other hypervisors) a simple
> way to be able to carry virtio traffic between guest and end point.
> 
> So some questions:
> 
>   - Am I missing anything out in that summary?

VirtualBox has virtio-net support:
https://www.virtualbox.org/manual/ch06.html

>   - How about HyperV and the OSX equivalent?

macOS has *guest* drivers for VIRTIO devices:
https://www.kraxel.org/blog/2019/06/macos-qemu-guest/

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] vsock: fix potential deadlock in transport->release()

2020-02-27 Thread Stefan Hajnoczi
On Wed, Feb 26, 2020 at 11:58:18AM +0100, Stefano Garzarella wrote:
> Some transports (hyperv, virtio) acquire the sock lock during the
> .release() callback.
> 
> In the vsock_stream_connect() we call vsock_assign_transport(); if
> the socket was previously assigned to another transport, the
> vsk->transport->release() is called, but the sock lock is already
> held in the vsock_stream_connect(), causing a deadlock reported by
> syzbot:
> 
> INFO: task syz-executor280:9768 blocked for more than 143 seconds.
>   Not tainted 5.6.0-rc1-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor280 D27912  9768   9766 0x
> Call Trace:
>  context_switch kernel/sched/core.c:3386 [inline]
>  __schedule+0x934/0x1f90 kernel/sched/core.c:4082
>  schedule+0xdc/0x2b0 kernel/sched/core.c:4156
>  __lock_sock+0x165/0x290 net/core/sock.c:2413
>  lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
>  virtio_transport_release+0xc4/0xd60 
> net/vmw_vsock/virtio_transport_common.c:832
>  vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
>  vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
>  __sys_connect_file+0x161/0x1c0 net/socket.c:1857
>  __sys_connect+0x174/0x1b0 net/socket.c:1874
>  __do_sys_connect net/socket.c:1885 [inline]
>  __se_sys_connect net/socket.c:1882 [inline]
>  __x64_sys_connect+0x73/0xb0 net/socket.c:1882
>  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> To avoid this issue, this patch remove the lock acquiring in the
> .release() callback of hyperv and virtio transports, and it holds
> the lock when we call vsk->transport->release() in the vsock core.
> 
> Reported-by: syzbot+731710996d79d0d58...@syzkaller.appspotmail.com
> Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/af_vsock.c| 20 
>  net/vmw_vsock/hyperv_transport.c|  3 ---
>  net/vmw_vsock/virtio_transport_common.c |  2 --
>  3 files changed, 12 insertions(+), 13 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/2] virtio-blk: improve handling of DMA mapping failures

2020-02-19 Thread Stefan Hajnoczi
On Thu, Feb 13, 2020 at 01:37:26PM +0100, Halil Pasic wrote:
> Two patches are handling new edge cases introduced by doing DMA mappings
> (which can fail) in virtio core.
> 
> I stumbled upon this while stress testing I/O for Protected Virtual
> Machines. I deliberately chose a tiny swiotlb size and have generated
> load with fio. With more than one virtio-blk disk in use I experienced
> hangs.
> 
> The goal of this series is to fix those hangs.
> 
> Halil Pasic (2):
>   virtio-blk: fix hw_queue stopped on arbitrary error
>   virtio-blk: improve virtqueue error to BLK_STS
> 
>  drivers/block/virtio_blk.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> 
> base-commit: 39bed42de2e7d74686a2d5a45638d6a5d7e7d473
> -- 
> 2.17.1
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 0/3] vsock: support network namespace

2020-01-21 Thread Stefan Hajnoczi
What should vsock_dev_do_ioctl() IOCTL_VM_SOCKETS_GET_LOCAL_CID return?
The answer is probably dependent on the caller's network namespace.

Ultimately we may need per-namespace transports.  Imagine assigning a
G2H transport to a specific network namespace.

vsock_stream_connect() needs to be namespace-aware so that other
namespaces cannot use the G2H transport to send a connection
establishment packet.
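
For example, something along these lines (hypothetical sketch, the
per-netns transport lookup helper does not exist today):

        case IOCTL_VM_SOCKETS_GET_LOCAL_CID: {
                struct net *net = current->nsproxy->net_ns;
                const struct vsock_transport *t = vsock_g2h_transport(net);

                cid = t ? t->get_local_cid() : VMADDR_CID_ANY;
                ...
        }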


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 1/3] vsock: add network namespace support

2020-01-21 Thread Stefan Hajnoczi
On Tue, Jan 21, 2020 at 09:31:42AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 01:59:07PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin  
> > > wrote:
> > > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella 
> > > > > > > > wrote:
> > > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > > From: Stefano Garzarella 
> > > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > > >
> > > > > > > > > > > This patch adds 'netns' module param to enable this new 
> > > > > > > > > > > feature
> > > > > > > > > > > (disabled by default), because it changes vsock's 
> > > > > > > > > > > behavior with
> > > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > > >
> > > > > > > > > > Sorry, no.
> > > > > > > > > >
> > > > > > > > > > I wonder if you can even design a legitimate, reasonable, 
> > > > > > > > > > use case
> > > > > > > > > > where these netns changes could break things.
> > > > > > > > >
> > > > > > > > > I forgot to mention the use case.
> > > > > > > > > I tried the RFC with Kata containers and we found that Kata 
> > > > > > > > > shim-v1
> > > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the 
> > > > > > > > > following
> > > > > > > > > processes involved:
> > > > > > > > > - kata-runtime (runs in the init_netns) opens 
> > > > > > > > > /dev/vhost-vsock and
> > > > > > > > >   passes it to qemu
> > > > > > > > > - kata-shim (runs in a container) wants to talk with the 
> > > > > > > > > guest but the
> > > > > > > > >   vsock device is assigned to the init_netns and kata-shim 
> > > > > > > > > runs in a
> > > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > > But, as you said, this could be a wrong design, indeed they 
> > > > > > > > > already
> > > > > > > > > found a fix, but I was not sure if others could have the same 
> > > > > > > > > issue.
> > > > > > > > >
> > > > > > > > > In this case, do you think it is acceptable to make this 
> > > > > > > > > change in
> > > > > > > > > the vsock's behavior with netns and ask the user to change 
> > > > > > > > > the design?
> > > > > > > >
> > > > > > > > David's question is what would be a usecase that's broken
> > > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > > >
> > > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > > I just reported a broken example that can be fixed with a 
> > > > > > > different
> > > > > > > design (due to the fact that before this series, vsock devices 
> > > > > > > were
> > > > > > > accessible to all netns).
> > > > > > >
> > > > > > > >
> > > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > > module parameter isn't that.
> > > > > > >
> > > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > 

Re: [PATCH net-next 1/3] vsock: add network namespace support

2020-01-21 Thread Stefan Hajnoczi
On Tue, Jan 21, 2020 at 06:14:48AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin  wrote:
> > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin  
> > > > wrote:
> > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella 
> > > > > > > wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > From: Stefano Garzarella 
> > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > >
> > > > > > > > > > This patch adds 'netns' module param to enable this new 
> > > > > > > > > > feature
> > > > > > > > > > (disabled by default), because it changes vsock's behavior 
> > > > > > > > > > with
> > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > >
> > > > > > > > > Sorry, no.
> > > > > > > > >
> > > > > > > > > I wonder if you can even design a legitimate, reasonable, use 
> > > > > > > > > case
> > > > > > > > > where these netns changes could break things.
> > > > > > > >
> > > > > > > > I forgot to mention the use case.
> > > > > > > > I tried the RFC with Kata containers and we found that Kata 
> > > > > > > > shim-v1
> > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the 
> > > > > > > > following
> > > > > > > > processes involved:
> > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock 
> > > > > > > > and
> > > > > > > >   passes it to qemu
> > > > > > > > - kata-shim (runs in a container) wants to talk with the guest 
> > > > > > > > but the
> > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs 
> > > > > > > > in a
> > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > But, as you said, this could be a wrong design, indeed they 
> > > > > > > > already
> > > > > > > > found a fix, but I was not sure if others could have the same 
> > > > > > > > issue.
> > > > > > > >
> > > > > > > > In this case, do you think it is acceptable to make this change 
> > > > > > > > in
> > > > > > > > the vsock's behavior with netns and ask the user to change the 
> > > > > > > > design?
> > > > > > >
> > > > > > > David's question is what would be a usecase that's broken
> > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > >
> > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > I just reported a broken example that can be fixed with a different
> > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > accessible to all netns).
> > > > > >
> > > > > > >
> > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > module parameter isn't that.
> > > > > >
> > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait 
> > > > > > for
> > > > > > a real case to come up.
> > > > > >
> > > > > > I'll try to see better if there's any particular case where we need
> > > > > > to disable netns in vsock.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Me neither. so what did you have in mind when you wrote:
> > > > > "could break existing applications"?
> > > >
> > > > I had in mind:
> > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > >older versions will not work with newer Linux.
> > >
> > > meaning they will keep not working, right?
> > 
> > Right, I mean without this series they work, with this series they work
> > only if the netns support is disabled or with a patch proposed but not
> > merged in kata.
> > 
> > >
> > > > 2. a single process running on init_netns that wants to communicate with
> > > >VMs handled by VMMs running in different netns, but this case can be
> > > >solved opening the /dev/vhost-vsock in the same netns of the process
> > > >that wants to communicate with the VMs (init_netns in this case), and
> > > >passing it to the VMM.
> > >
> > > again right now they just don't work, right?
> > 
> > Right, as above.
> > 
> > What do you recommend I do?
> > 
> > Thanks,
> > Stefano
> 
> If this breaks userspace, then we need to maintain compatibility.
> For example, have two devices, /dev/vhost-vsock and /dev/vhost-vsock-netns?

/dev/vhost-vsock-netns is cleaner and simpler than my suggestion.  I
like it!

This is nice for containers (say you want to run QEMU inside a container
on the host) because you can allow only /dev/vhost-vsock-netns inside
containers.  This prevents them from opening /dev/vhost-vsock to get
access to the initial network namespace.
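
A sketch of what that could look like in drivers/vhost/vsock.c (the new
device node name and its fops are made up here):

        static struct miscdevice vhost_vsock_netns_misc = {
                .minor = MISC_DYNAMIC_MINOR,
                .name = "vhost-vsock-netns",
                .fops = &vhost_vsock_netns_fops,
        };

open() on /dev/vhost-vsock would keep binding the device to init_net,
while open() on /dev/vhost-vsock-netns would do
get_net(current->nsproxy->net_ns).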

Stefan


signature.asc
Description: PGP signature

Re: [PATCH net-next 1/3] vsock: add network namespace support

2020-01-21 Thread Stefan Hajnoczi
On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin  wrote:
> > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin  
> > > wrote:
> > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > From: Stefano Garzarella 
> > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > >
> > > > > > > > > This patch adds 'netns' module param to enable this new 
> > > > > > > > > feature
> > > > > > > > > (disabled by default), because it changes vsock's behavior 
> > > > > > > > > with
> > > > > > > > > network namespaces and could break existing applications.
> > > > > > > >
> > > > > > > > Sorry, no.
> > > > > > > >
> > > > > > > > I wonder if you can even design a legitimate, reasonable, use 
> > > > > > > > case
> > > > > > > > where these netns changes could break things.
> > > > > > >
> > > > > > > I forgot to mention the use case.
> > > > > > > I tried the RFC with Kata containers and we found that Kata 
> > > > > > > shim-v1
> > > > > > > doesn't work (Kata shim-v2 works as is) because there are the 
> > > > > > > following
> > > > > > > processes involved:
> > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > >   passes it to qemu
> > > > > > > - kata-shim (runs in a container) wants to talk with the guest 
> > > > > > > but the
> > > > > > >   vsock device is assigned to the init_netns and kata-shim runs 
> > > > > > > in a
> > > > > > >   different netns, so the communication is not allowed
> > > > > > > But, as you said, this could be a wrong design, indeed they 
> > > > > > > already
> > > > > > > found a fix, but I was not sure if others could have the same 
> > > > > > > issue.
> > > > > > >
> > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > the vsock's behavior with netns and ask the user to change the 
> > > > > > > design?
> > > > > >
> > > > > > David's question is what would be a usecase that's broken
> > > > > > (as opposed to fixed) by enabling this by default.
> > > > >
> > > > > Yes, I got that. Thanks for clarifying.
> > > > > I just reported a broken example that can be fixed with a different
> > > > > design (due to the fact that before this series, vsock devices were
> > > > > accessible to all netns).
> > > > >
> > > > > >
> > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > module parameter isn't that.
> > > > >
> > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > a real case to come up.
> > > > >
> > > > > I'll try to see better if there's any particular case where we need
> > > > > to disable netns in vsock.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Me neither. so what did you have in mind when you wrote:
> > > > "could break existing applications"?
> > >
> > > I had in mind:
> > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > >older versions will not work with newer Linux.
> >
> > meaning they will keep not working, right?
> 
> Right, I mean without this series they work, with this series they work
> only if the netns support is disabled or with a patch proposed but not
> merged in kata.
> 
> >
> > > 2. a single process running on init_netns that wants to communicate with
> > >VMs handled by VMMs running in different netns, but this case can be
> > >solved opening the /dev/vhost-vsock in the same netns of the process
> > >that wants to communicate with the VMs (init_netns in this case), and
> > >passing it to the VMM.
> >
> > again right now they just don't work, right?
> 
> Right, as above.
> 
> What do you recommend I do?

Existing userspace applications must continue to work.

Guests are fine because G2H transports are always in the initial network
namespace.

On the host side we have a real case where Kata Containers and other
vsock users break.  Existing applications run in other network
namespaces and assume they can communicate over vsock (it's only
available in the initial network namespace by default).

It seems we cannot isolate new network namespaces from the initial
network namespace by default because it will break existing
applications.  That's a bummer.

There is one solution that maintains compatibility:

Introduce a per-namespace vsock isolation flag that can only transition
from false to true.  Once it becomes true it cannot be reset to false
anymore (for security).
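
Something along these lines (sketch only, the pernet struct and helper
are invented for illustration):

        struct vsock_net {
                bool isolated;  /* write-once: false -> true only */
        };

        static void vsock_net_set_isolated(struct net *net)
        {
                struct vsock_net *vn = net_generic(net, vsock_net_id);

                vn->isolated = true;    /* one-way transition */
        }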

When vsock isolation is false the initial network namespace is used for
CID and port addressing.

When vsock isolation 

Re: [PATCH v3 13/22] compat_ioctl: scsi: move ioctl handling into drivers

2020-01-02 Thread Stefan Hajnoczi
On Thu, Jan 02, 2020 at 03:55:31PM +0100, Arnd Bergmann wrote:
> Each driver calling scsi_ioctl() gets an equivalent compat_ioctl()
> handler that implements the same commands by calling scsi_compat_ioctl().
> 
> The scsi_cmd_ioctl() and scsi_cmd_blk_ioctl() functions are compatible
> at this point, so any driver that calls those can do so for both native
> and compat mode, with the argument passed through compat_ptr().
> 
> With this, we can remove the entries from fs/compat_ioctl.c.  The new
> code is larger, but should be easier to maintain and keep updated with
> newly added commands.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/block/virtio_blk.c |   3 +
>  drivers/scsi/ch.c  |   9 ++-
>  drivers/scsi/sd.c  |  50 ++
>  drivers/scsi/sg.c  |  44 -
>  drivers/scsi/sr.c  |  57 ++--
>  drivers/scsi/st.c  |  51 --
>  fs/compat_ioctl.c  | 132 +
>  7 files changed, 142 insertions(+), 204 deletions(-)

virtio_blk.c changes:

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3 00/11] VSOCK: add vsock_test test suite

2019-12-19 Thread Stefan Hajnoczi
On Wed, Dec 18, 2019 at 07:06:57PM +0100, Stefano Garzarella wrote:
> The vsock_diag.ko module already has a test suite but the core AF_VSOCK
> functionality has no tests. This patch series adds several test cases that
> exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
> half-closed connections, simultaneous connections).
> 
> The v1 of this series was originally sent by Stefan.
> 
> v3:
> - Patch 6:
>   * check the byte received in the recv_byte()
>   * use send(2)/recv(2) instead of write(2)/read(2) to test also flags
> (e.g. MSG_PEEK)
> - Patch 8:
>   * removed unnecessary control_expectln("CLOSED") [Stefan].
> - removed patches 9,10,11 added in the v2
> - new Patch 9 add parameters to list and skip tests (e.g. useful for vmci
>   that doesn't support half-closed socket in the host)
> - new Patch 10 prints a list of options in the help
> - new Patch 11 tests MSG_PEEK flags of recv(2)
> 
> v2: https://patchwork.ozlabs.org/cover/1140538/
> v1: https://patchwork.ozlabs.org/cover/847998/
> 
> Stefan Hajnoczi (7):
>   VSOCK: fix header include in vsock_diag_test
>   VSOCK: add SPDX identifiers to vsock tests
>   VSOCK: extract utility functions from vsock_diag_test.c
>   VSOCK: extract connect/accept functions from vsock_diag_test.c
>   VSOCK: add full barrier between test cases
>   VSOCK: add send_byte()/recv_byte() test utilities
>   VSOCK: add AF_VSOCK test cases
> 
> Stefano Garzarella (4):
>   vsock_test: wait for the remote to close the connection
>   testing/vsock: add parameters to list and skip tests
>   testing/vsock: print list of options and description
>   vsock_test: add SOCK_STREAM MSG_PEEK test
> 
>  tools/testing/vsock/.gitignore|   1 +
>  tools/testing/vsock/Makefile  |   9 +-
>  tools/testing/vsock/README|   3 +-
>  tools/testing/vsock/control.c |  15 +-
>  tools/testing/vsock/control.h |   2 +
>  tools/testing/vsock/timeout.h |   1 +
>  tools/testing/vsock/util.c| 376 +
>  tools/testing/vsock/util.h|  49 
>  tools/testing/vsock/vsock_diag_test.c | 202 --
>  tools/testing/vsock/vsock_test.c  | 379 ++
>  10 files changed, 883 insertions(+), 154 deletions(-)
>  create mode 100644 tools/testing/vsock/util.c
>  create mode 100644 tools/testing/vsock/util.h
>  create mode 100644 tools/testing/vsock/vsock_test.c
> 
> -- 
> 2.24.1
> 
> _______
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] virtio-blk: remove VIRTIO_BLK_F_SCSI support

2019-12-13 Thread Stefan Hajnoczi
On Thu, Dec 12, 2019 at 05:37:19PM +0100, Christoph Hellwig wrote:
> Since the need for a special flag to support SCSI passthrough on a
> block device was added in May 2017 the SCSI passthrough support in
> virtio-blk has been disabled.  It has always been a bad idea
> (just ask the original author..) and we have virtio-scsi for proper
> passthrough.  The feature also never made it into the virtio 1.0
> or later specifications.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/configs/guest.config |   1 -
>  drivers/block/Kconfig |  10 ---
>  drivers/block/virtio_blk.c| 115 +-
>  3 files changed, 1 insertion(+), 125 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 0/3] vsock: support network namespace

2019-12-03 Thread Stefan Hajnoczi
On Thu, Nov 28, 2019 at 06:15:16PM +0100, Stefano Garzarella wrote:
> Hi,
> now that we have multi-transport upstream, I started to take a look to
> support network namespace (netns) in vsock.
> 
> As we partially discussed in the multi-transport proposal [1], it could
> be nice to support network namespace in vsock to reach the following
> goals:
> - isolate host applications from guest applications using the same ports
>   with CID_ANY
> - assign the same CID of VMs running in different network namespaces
> - partition VMs between VMMs or at finer granularity
> 
> This preliminary implementation provides the following behavior:
> - packets received from the host (received by G2H transports) are
>   assigned to the default netns (init_net)
> - packets received from the guest (received by H2G - vhost-vsock) are
>   assigned to the netns of the process that opens /dev/vhost-vsock
>   (usually the VMM, qemu in my tests, opens the /dev/vhost-vsock)
> - for vmci I need some suggestions, because I don't know how to do
>   and test the same in the vmci driver, for now vmci uses the
>   init_net
> - loopback packets are exchanged only in the same netns
> 
> Questions:
> 1. Should we make configurable the netns (now it is init_net) where
>packets from the host should be delivered?

Yes, it should be possible to have multiple G2H (e.g. virtio-vsock)
devices and to assign them to different net namespaces.  Something like
net/core/dev.c:dev_change_net_namespace() will eventually be needed.
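
i.e. eventually a per-device API along the lines of (purely
hypothetical):

        int vsock_dev_change_net_namespace(struct vsock_device *dev,
                                           struct net *net);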

> 2. Should we provide an ioctl in vhost-vsock to configure the netns
>to use? (instead of using the netns of the process that opens
>/dev/vhost-vsock)

Creating the vhost-vsock instance in the process' net namespace makes
sense.  Maybe wait for a use case before adding an ioctl.
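
i.e. the usual pattern, roughly (sketch):

        static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
        {
                ...
                vsock->net = get_net(current->nsproxy->net_ns);
                ...
        }

with a matching put_net() on the release path.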

> 3. Should we provide a way to disable the netns support in vsock?

The code should follow CONFIG_NET_NS semantics.  I'm not sure what they
are exactly since struct net is always defined, regardless of whether
network namespaces are enabled.


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 4/6] vsock: add vsock_loopback transport

2019-11-22 Thread Stefan Hajnoczi
On Thu, Nov 21, 2019 at 04:25:17PM +0100, Stefano Garzarella wrote:
> On Thu, Nov 21, 2019 at 10:59:48AM +0100, Stefano Garzarella wrote:
> > On Thu, Nov 21, 2019 at 09:34:58AM +0000, Stefan Hajnoczi wrote:
> > > On Tue, Nov 19, 2019 at 12:01:19PM +0100, Stefano Garzarella wrote:
> > > > +static struct workqueue_struct *vsock_loopback_workqueue;
> > > > +static struct vsock_loopback *the_vsock_loopback;
> > > 
> > > the_vsock_loopback could be a static global variable (not a pointer) and
> > > vsock_loopback_workqueue could also be included in the struct.
> > > 
> > > The RCU pointer is really a way to synchronize vsock_loopback_send_pkt()
> > > and vsock_loopback_cancel_pkt() with module exit.  There is no other
> > > reason for using a pointer.
> > > 
> > > It's cleaner to implement the synchronization once in af_vsock.c (or
> > > virtio_transport_common.c) instead of making each transport do it.
> > > Maybe try_module_get() and related APIs provide the necessary semantics
> > > so that core vsock code can hold the transport module while it's being
> > > used to send/cancel a packet.
> > 
> > Right, the module cannot be unloaded while sockets are open, so here the
> > synchronization is not needed.
> > 
> > The synchronization comes from the virtio-vsock device that can be
> > hot-unplugged while sockets are still open, but that can't happen here.
> > 
> > I will remove the pointers and RCU in the v2.
> > 
> > Can I keep your R-b or do you prefer to watch v2 first?

I'd like to review v2.

> > > > +MODULE_ALIAS_NETPROTO(PF_VSOCK);
> > > 
> > > Why does this module define the alias for PF_VSOCK?  Doesn't another
> > > module already define this alias?
> > 
> > It is a way to load this module when PF_VSOCK is starting to be used.
> > MODULE_ALIAS_NETPROTO(PF_VSOCK) is already defined in vmci_transport
> > and hyperv_transport. IIUC it is used for the same reason.
> > 
> > In virtio_transport we don't need it because it will be loaded when
> > the PCI device is discovered.
> > 
> > Do you think there's a better way?
> > Should I include the vsock_loopback transport directly in af_vsock
> > without creating a new module?
> > 
> 
> That last thing I said may not be possible:
> I remembered that I tried, but DEPMOD found a cyclic dependency because
> vsock_transport use virtio_transport_common that use vsock, so if I
> include vsock_transport in the vsock module, DEPMOD is not happy.
> 
> Do you think it's okay in this case to keep MODULE_ALIAS_NETPROTO(PF_VSOCK)
> or is there a better way?

The reason I asked is because the semantics of duplicate module aliases
aren't clear to me.  Do all modules with the same alias get loaded?
Or just the first?  Or ...?
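
From memory, the alias machinery is just this (include/linux/net.h and
net/socket.c, worth double-checking):

        #define MODULE_ALIAS_NETPROTO(proto) \
                MODULE_ALIAS("net-pf-" __stringify(proto))

        /* __sock_create() */
        if (rcu_access_pointer(net_families[family]) == NULL)
                request_module("net-pf-%d", family);

so presumably modprobe loads every module that declares the alias, but
I have not verified that.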

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 5/6] vsock: use local transport when it is loaded

2019-11-21 Thread Stefan Hajnoczi
On Tue, Nov 19, 2019 at 12:01:20PM +0100, Stefano Garzarella wrote:
> @@ -420,9 +436,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, 
> struct vsock_sock *psk)
>   new_transport = transport_dgram;
>   break;
>   case SOCK_STREAM:
> - if (remote_cid <= VMADDR_CID_HOST ||
> - (transport_g2h &&
> -  remote_cid == transport_g2h->get_local_cid()))
> + if (vsock_use_local_transport(remote_cid))
> + new_transport = transport_local;
> + else if (remote_cid == VMADDR_CID_HOST ||
> +  remote_cid == VMADDR_CID_HYPERVISOR)
>   new_transport = transport_g2h;
>   else
>   new_transport = transport_h2g;

We used to send VMADDR_CID_RESERVED to the host.  Now we send
VMADDR_CID_RESERVED (LOCAL) to the guest when there is no
transport_local loaded?

If this is correct, is there a justification for this change?  It seems
safest to retain existing behavior.


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 0/6] vsock: add local transport support

2019-11-21 Thread Stefan Hajnoczi
On Tue, Nov 19, 2019 at 12:01:15PM +0100, Stefano Garzarella wrote:
> This series introduces a new transport (vsock_loopback) to handle
> local communication.
> This could be useful to test vsock core itself and to allow developers
> to test their applications without launching a VM.
> 
> Before this series, vmci and virtio transports allowed this behavior,
> but only in the guest.
> We are moving the loopback handling in a new transport, because it
> might be useful to provide this feature also in the host or when
> no H2G/G2H transports (hyperv, virtio, vmci) are loaded.
> 
> The user can use the loopback with the new VMADDR_CID_LOCAL (that
> replaces VMADDR_CID_RESERVED) in any condition.
> Otherwise, if the G2H transport is loaded, it can also use the guest
> local CID as previously supported by vmci and virtio transports.
> If G2H transport is not loaded, the user can also use VMADDR_CID_HOST
> for local communication.
> 
> Patch 1 is a cleanup to build virtio_transport_common without virtio
> Patch 2 adds the new VMADDR_CID_LOCAL, replacing VMADDR_CID_RESERVED
> Patch 3 adds a new feature flag to register a loopback transport
> Patch 4 adds the new vsock_loopback transport based on the loopback
> implementation of virtio_transport
> Patch 5 implements the logic to use the local transport for loopback
> communication
> Patch 6 removes the loopback from virtio_transport
> 
> @Jorgen: Do you think it might be a problem to replace
> VMADDR_CID_RESERVED with VMADDR_CID_LOCAL?
> 
> Thanks,
> Stefano
> 
> Stefano Garzarella (6):
>   vsock/virtio_transport_common: remove unused virtio header includes
>   vsock: add VMADDR_CID_LOCAL definition
>   vsock: add local transport support in the vsock core
>   vsock: add vsock_loopback transport
>   vsock: use local transport when it is loaded
>   vsock/virtio: remove loopback handling
> 
>  MAINTAINERS |   1 +
>  include/net/af_vsock.h  |   2 +
>  include/uapi/linux/vm_sockets.h |   8 +-
>  net/vmw_vsock/Kconfig   |  12 ++
>  net/vmw_vsock/Makefile  |   1 +
>  net/vmw_vsock/af_vsock.c|  49 +-
>  net/vmw_vsock/virtio_transport.c|  61 +--
>  net/vmw_vsock/virtio_transport_common.c |   3 -
>  net/vmw_vsock/vmci_transport.c  |   2 +-
>  net/vmw_vsock/vsock_loopback.c  | 217 
>  10 files changed, 283 insertions(+), 73 deletions(-)
>  create mode 100644 net/vmw_vsock/vsock_loopback.c

Please see my comments.  Otherwise:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 4/6] vsock: add vsock_loopback transport

2019-11-21 Thread Stefan Hajnoczi
On Tue, Nov 19, 2019 at 12:01:19PM +0100, Stefano Garzarella wrote:

Ideas for long-term changes below.

Reviewed-by: Stefan Hajnoczi 

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 760049454a23..c2a3dc3113ba 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17239,6 +17239,7 @@ F:net/vmw_vsock/diag.c
>  F:   net/vmw_vsock/af_vsock_tap.c
>  F:   net/vmw_vsock/virtio_transport_common.c
>  F:   net/vmw_vsock/virtio_transport.c
> +F:   net/vmw_vsock/vsock_loopback.c
>  F:   drivers/net/vsockmon.c
>  F:   drivers/vhost/vsock.c
>  F:   tools/testing/vsock/

At this point you are most active in virtio-vsock and I am reviewing
patches on a best-effort basis.  Feel free to add yourself as
maintainer.

> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> new file mode 100644
> index ..3d1c1a88305f
> --- /dev/null
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -0,0 +1,217 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * loopback transport for vsock using virtio_transport_common APIs
> + *
> + * Copyright (C) 2013-2019 Red Hat, Inc.
> + * Author: Asias He 
> + * Stefan Hajnoczi 
> + * Stefano Garzarella 
> + *
> + */
> +#include 
> +#include 
> +#include 
> +#include 

Is it time to rename the generic functionality in
virtio_transport_common.c?  This doesn't have anything to do with virtio
:).

> +
> +static struct workqueue_struct *vsock_loopback_workqueue;
> +static struct vsock_loopback *the_vsock_loopback;

the_vsock_loopback could be a static global variable (not a pointer) and
vsock_loopback_workqueue could also be included in the struct.

The RCU pointer is really a way to synchronize vsock_loopback_send_pkt()
and vsock_loopback_cancel_pkt() with module exit.  There is no other
reason for using a pointer.

It's cleaner to implement the synchronization once in af_vsock.c (or
virtio_transport_common.c) instead of making each transport do it.
Maybe try_module_get() and related APIs provide the necessary semantics
so that core vsock code can hold the transport module while it's being
used to send/cancel a packet.
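
Roughly like this (sketch; assumes the transport struct grows a
.module field):

        /* af_vsock.c */
        static int vsock_transport_send_pkt(struct vsock_sock *vsk, ...)
        {
                const struct vsock_transport *t = vsk->transport;
                int ret;

                if (!try_module_get(t->module))
                        return -ENODEV;

                ret = t->send_pkt(...);
                module_put(t->module);
                return ret;
        }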

> +MODULE_ALIAS_NETPROTO(PF_VSOCK);

Why does this module define the alias for PF_VSOCK?  Doesn't another
module already define this alias?


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 3/3] virtiofs: Use completions while waiting for queue to be drained

2019-11-13 Thread Stefan Hajnoczi
On Wed, Oct 30, 2019 at 11:07:19AM -0400, Vivek Goyal wrote:
> While we wait for queue to finish draining, use completions instead of
> uslee_range(). This is better way of waiting for event.

s/uslee_range()/usleep_range()/


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/3] virtiofs: Small Cleanups for 5.5

2019-11-13 Thread Stefan Hajnoczi
On Wed, Oct 30, 2019 at 11:07:16AM -0400, Vivek Goyal wrote:
> Hi Miklos,
> 
> Here are a few small cleanups for virtiofs for 5.5. I had received some
> comments from Michael Tsirkin on the original virtiofs patches and these
> cleanups are the result of those comments.
> 
> Thanks
> Vivek
> 
> Vivek Goyal (3):
>   virtiofs: Use a common function to send forget
>   virtiofs: Do not send forget request "struct list_head" element
>   virtiofs: Use completions while waiting for queue to be drained
> 
>  fs/fuse/virtio_fs.c | 204 ++--
>  1 file changed, 103 insertions(+), 101 deletions(-)
> 
> -- 
> 2.20.1
> 

There are typos in the commit descriptions but the code looks fine:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 2/3] virtiofs: Do not send forget request "struct list_head" element

2019-11-13 Thread Stefan Hajnoczi
On Wed, Oct 30, 2019 at 11:07:18AM -0400, Vivek Goyal wrote:
> We are sending whole of virtio_fs_foreget struct to the other end over

s/foreget/forget/


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -next] virtiofs: Fix old-style declaration

2019-11-12 Thread Stefan Hajnoczi
On Mon, Nov 11, 2019 at 08:23:59PM +0800, YueHaibing wrote:
> GCC expects the 'static' keyword to come first in a
> declaration, and we get warnings like this with "make W=1":
> 
> fs/fuse/virtio_fs.c:687:1: warning: 'static' is not at beginning of 
> declaration [-Wold-style-declaration]
> fs/fuse/virtio_fs.c:692:1: warning: 'static' is not at beginning of 
> declaration [-Wold-style-declaration]
> fs/fuse/virtio_fs.c:1029:1: warning: 'static' is not at beginning of 
> declaration [-Wold-style-declaration]
> 
> Signed-off-by: YueHaibing 
> ---
>  fs/fuse/virtio_fs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] virtiofs: Use static const, not const static

2019-11-12 Thread Stefan Hajnoczi
On Mon, Nov 11, 2019 at 08:15:45PM +0800, zhengbin wrote:
> Move the static keyword to the front of declarations, which resolves
> compiler warnings when building with "W=1":
> 
> fs/fuse/virtio_fs.c:687:1: warning: ‘static’ is not at beginning of 
> declaration [-Wold-style-declaration]
>  const static struct virtio_device_id id_table[] = {
>  ^
> fs/fuse/virtio_fs.c:692:1: warning: ‘static’ is not at beginning of 
> declaration [-Wold-style-declaration]
>  const static unsigned int feature_table[] = {};
>  ^
> fs/fuse/virtio_fs.c:1029:1: warning: ‘static’ is not at beginning of 
> declaration [-Wold-style-declaration]
>  const static struct fuse_iqueue_ops virtio_fs_fiq_ops = {
> 
> Reported-by: Hulk Robot 
> Signed-off-by: zhengbin 
> ---
> v1->v2: modify comment
>  fs/fuse/virtio_fs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] virtiofs: Use static const, not const static

2019-11-11 Thread Stefan Hajnoczi
On Mon, Nov 11, 2019 at 05:26:41PM +0800, zhengbin wrote:
> Move the static keyword to the front of declarations.

Please mention why this change is necessary in the commit description.

> 
> Reported-by: Hulk Robot 
> Signed-off-by: zhengbin 
> ---
>  fs/fuse/virtio_fs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index b77acea..2ac6818 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -684,12 +684,12 @@ static int virtio_fs_restore(struct virtio_device *vdev)
>  }
>  #endif /* CONFIG_PM_SLEEP */
> 
> -const static struct virtio_device_id id_table[] = {
> +static const struct virtio_device_id id_table[] = {
>   { VIRTIO_ID_FS, VIRTIO_DEV_ANY_ID },
>   {},
>  };
> 
> -const static unsigned int feature_table[] = {};
> +static const unsigned int feature_table[] = {};
> 
>  static struct virtio_driver virtio_fs_driver = {
>   .driver.name= KBUILD_MODNAME,
> @@ -1026,7 +1026,7 @@ __releases(fiq->lock)
>   }
>  }
> 
> -const static struct fuse_iqueue_ops virtio_fs_fiq_ops = {
> +static const struct fuse_iqueue_ops virtio_fs_fiq_ops = {
>   .wake_forget_and_unlock = virtio_fs_wake_forget_and_unlock,
>   .wake_interrupt_and_unlock  = virtio_fs_wake_interrupt_and_unlock,
>   .wake_pending_and_unlock= virtio_fs_wake_pending_and_unlock,
> --
> 2.7.4
> 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports

2019-10-27 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 11:55:48AM +0200, Stefano Garzarella wrote:
> All transports call __vsock_create() with the same parameters,
> most of them depending on the parent socket. In order to simplify
> the VSOCK core APIs exposed to the transports, this patch adds
> the vsock_create_connected() callable from transports to create
> a new socket when a connection request is received.
> We also unexported the __vsock_create().
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Stefano Garzarella 
> ---
>  include/net/af_vsock.h  |  5 +
>  net/vmw_vsock/af_vsock.c| 20 +---
>  net/vmw_vsock/hyperv_transport.c|  3 +--
>  net/vmw_vsock/virtio_transport_common.c |  3 +--
>  net/vmw_vsock/vmci_transport.c  |  3 +--
>  5 files changed, 17 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core

2019-10-27 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 11:55:47AM +0200, Stefano Garzarella wrote:
> virtio_transport and vmci_transport handle the buffer_size
> sockopts in a very similar way.
> 
> In order to support multiple transports, this patch moves this
> handling in the core to allow the user to change the options
> also if the socket is not yet assigned to any transport.
> 
> This patch also adds the '.notify_buffer_size' callback in the
> 'struct virtio_transport' in order to inform the transport,
> when the buffer_size is changed by the user. It is also useful
> to limit the 'buffer_size' requested (e.g. virtio transports).
> 
> Acked-by: Dexuan Cui 
> Signed-off-by: Stefano Garzarella 
> ---
> RFC -> v1:
> - changed .notify_buffer_size return to void (Stefan)
> - documented that .notify_buffer_size is called with sk_lock held (Stefan)
> ---
>  drivers/vhost/vsock.c   |  7 +-
>  include/linux/virtio_vsock.h| 15 +
>  include/net/af_vsock.h  | 15 ++---
>  net/vmw_vsock/af_vsock.c| 43 ++---
>  net/vmw_vsock/hyperv_transport.c| 36 ---
>  net/vmw_vsock/virtio_transport.c|  8 +--
>  net/vmw_vsock/virtio_transport_common.c | 79 ---
>  net/vmw_vsock/vmci_transport.c  | 86 +++--
>  net/vmw_vsock/vmci_transport.h  |  3 -
>  9 files changed, 65 insertions(+), 227 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active

2019-10-27 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 11:55:52AM +0200, Stefano Garzarella wrote:
> +static int __init vmci_transport_init(void)
> +{
> + int features = VSOCK_TRANSPORT_F_DGRAM;

Where is this variable used?


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next 00/14] vsock: add multi-transports support

2019-10-27 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 11:55:40AM +0200, Stefano Garzarella wrote:
> This series adds the multi-transports support to vsock, following
> this proposal: https://www.spinics.net/lists/netdev/msg575792.html
> 
> With the multi-transports support, we can use VSOCK with nested VMs
> (using also different hypervisors) loading both guest->host and
> host->guest transports at the same time.
> Before this series, vmci-transport supported this behavior but only
> using VMware hypervisor on L0, L1, etc.
> 
> RFC: https://patchwork.ozlabs.org/cover/1168442/
> RFC -> v1:
> - Added R-b/A-b from Dexuan and Stefan
> - Fixed comments and typos in several patches (Stefan)
> - Patch 7: changed .notify_buffer_size return to void (Stefan)
> - Added patch 8 to simplify the API exposed to the transports (Stefan)
> - Patch 11:
>   + documented VSOCK_TRANSPORT_F_* flags (Stefan)
>   + fixed vsock_assign_transport() when the socket is already assigned
>   + moved features outside of struct vsock_transport, and used as
> parameter of vsock_core_register() as a preparation of Patch 12
> - Removed "vsock: add 'transport_hg' to handle g2h\h2g transports" patch
> - Added patch 12 to register vmci_transport only when VMCI guest/host
>   are active

Has there been feedback from Jorgen or someone else from VMware?  A
Reviewed-by or Acked-by would be nice since this patch series affects
VMCI AF_VSOCK.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -next] virtiofs: remove unused variable 'fc'

2019-10-23 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 02:21:30PM +0800, YueHaibing wrote:
> fs/fuse/virtio_fs.c:983:20: warning:
>  variable fc set but not used [-Wunused-but-set-variable]
> 
> It is not used since commit 7ee1e2e631db ("virtiofs:
> No need to check fpq->connected state")
> 
> Signed-off-by: YueHaibing 
> ---
>  fs/fuse/virtio_fs.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] virtiofs: Remove set but not used variable 'fc'

2019-10-23 Thread Stefan Hajnoczi
On Wed, Oct 23, 2019 at 10:02:49AM +0800, zhengbin wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
> fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used 
> [-Wunused-but-set-variable]
> 
> It is not used since commit 7ee1e2e631db ("virtiofs:
> No need to check fpq->connected state")
> 
> Reported-by: Hulk Robot 
> Signed-off-by: zhengbin 
> ---
>  fs/fuse/virtio_fs.c | 2 --
>  1 file changed, 2 deletions(-)

Only affects the linux-next tree, not virtio-fs-dev or linux.git/master.
Same as "[PATCH -next] virtiofs: remove unused variable 'fc'"
(<20191023062130.23068-1-yuehaib...@huawei.com>).

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-17 Thread Stefan Hajnoczi
On Thu, Oct 17, 2019 at 09:42:53AM +0800, Jason Wang wrote:
> 
> On 2019/10/15 下午10:37, Stefan Hajnoczi wrote:
> > On Tue, Oct 15, 2019 at 11:37:17AM +0800, Jason Wang wrote:
> > > On 2019/10/15 上午1:49, Stefan Hajnoczi wrote:
> > > > On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:
> > > > > There is hardware that can do virtio datapath offloading while having
> > > > > its own control path. This path tries to implement a mdev based
> > > > > unified API to support using kernel virtio driver to drive those
> > > > > devices. This is done by introducing a new mdev transport for virtio
> > > > > (virtio_mdev) and register itself as a new kind of mdev driver. Then
> > > > > it provides a unified way for kernel virtio driver to talk with mdev
> > > > > device implementation.
> > > > > 
> > > > > Though the series only contains kernel driver support, the goal is to
> > > > > make the transport generic enough to support userspace drivers. This
> > > > > means vhost-mdev[1] could be built on top as well by reusing the
> > > > > transport.
> > > > > 
> > > > > A sample driver is also implemented which simulates a virtio-net
> > > > > loopback ethernet device on top of vringh + workqueue. This could be
> > > > > used as a reference implementation for real hardware drivers.
> > > > > 
> > > > > Considering the mdev framework only supports VFIO devices and drivers
> > > > > right now, this series also extends it to support other types. This is
> > > > > done by introducing a class id on the device and pairing it with the
> > > > > id_table claimed by the driver. On top, this series also decouples
> > > > > device-specific parent ops out of the common ones.
> > > > I was curious so I took a quick look and posted comments.
> > > > 
> > > > I guess this driver runs inside the guest since it registers virtio
> > > > devices?
> > > 
> > > It could run in either guest or host. But the main focus is to run in the
> > > host so that we can use virtio drivers in containers.
> > > 
> > > 
> > > > If this is used with physical PCI devices that support datapath
> > > > offloading then how are physical devices presented to the guest without
> > > > SR-IOV?
> > > 
> > > We will do control path mediation through vhost-mdev[1] and 
> > > vhost-vfio[2].
> > > Then we will present a full virtio compatible ethernet device for guest.
> > > 
> > > SR-IOV is not a must, any mdev device that implements the API defined in
> > > patch 5 can be used by this framework.
> > What I'm trying to understand is: if you want to present a virtio-pci
> > device to the guest (e.g. using vhost-mdev or vhost-vfio), then how is
> > that related to this patch series?
> 
> 
> This series introduce some infrastructure that would be used by vhost-mdev:
> 
> 1) allow new type of mdev devices/drivers other than vfio (through class_id
> and device ops)
> 
> 2) a set of virtio specific callbacks that will be used by both vhost-mdev
> and virtio-mdev defined in patch 5
> 
> Then vhost-mdev can be implemented on top: a new mdev class id but reuse the
> callback defined in 2. In this way the parent can provide a single set
> of callbacks (device ops) for both the kernel virtio driver (through
> virtio-mdev) and the userspace virtio driver (through vhost-mdev).

Okay, thanks for explaining!

Stefan



Re: [PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'

2019-10-16 Thread Stefan Hajnoczi
On Tue, Oct 15, 2019 at 05:00:51PM +0200, Stefano Garzarella wrote:
> The 'work' field was introduced with commit 06a8fc78367d0
> ("VSOCK: Introduce virtio_vsock_common.ko")
> but it is never used in the code, so we can remove it to save
> memory allocated in the per-packet 'struct virtio_vsock_pkt'
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 
> ---
>  include/linux/virtio_vsock.h | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-15 Thread Stefan Hajnoczi
On Tue, Oct 15, 2019 at 11:37:17AM +0800, Jason Wang wrote:
> 
> On 2019/10/15 1:49 AM, Stefan Hajnoczi wrote:
> > On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:
> > > There is hardware that can do virtio datapath offloading while having
> > > its own control path. This patch tries to implement an mdev based
> > > unified API to support using the kernel virtio driver to drive those
> > > devices. This is done by introducing a new mdev transport for virtio
> > > (virtio_mdev) that registers itself as a new kind of mdev driver. Then
> > > it provides a unified way for the kernel virtio driver to talk with the
> > > mdev device implementation.
> > > 
> > > Though the series only contains kernel driver support, the goal is to
> > > make the transport generic enough to support userspace drivers. This
> > > means vhost-mdev[1] could be built on top as well by reusing the
> > > transport.
> > > 
> > > A sample driver is also implemented which simulates a virtio-net
> > > loopback ethernet device on top of vringh + workqueue. This could be
> > > used as a reference implementation for real hardware drivers.
> > > 
> > > Considering the mdev framework only supports VFIO devices and drivers
> > > right now, this series also extends it to support other types. This is
> > > done by introducing a class id to the device and pairing it with the
> > > id_table claimed by the driver. On top, this series also decouples
> > > device specific parent ops out of the common ones.
> > I was curious so I took a quick look and posted comments.
> > 
> > I guess this driver runs inside the guest since it registers virtio
> > devices?
> 
> 
> It could run in either guest or host. But the main focus is to run in the
> host so that we can use virtio drivers in containers.
> 
> 
> > 
> > If this is used with physical PCI devices that support datapath
> > offloading then how are physical devices presented to the guest without
> > SR-IOV?
> 
> 
> We will do control path mediation through vhost-mdev[1] and vhost-vfio[2].
> Then we will present a full virtio compatible ethernet device for guest.
> 
> SR-IOV is not a must, any mdev device that implements the API defined in
> patch 5 can be used by this framework.

What I'm trying to understand is: if you want to present a virtio-pci
device to the guest (e.g. using vhost-mdev or vhost-vfio), then how is
that related to this patch series?

Does this mean this patch series is useful mostly for presenting virtio
devices to containers or the host?

Stefan



Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports

2019-10-15 Thread Stefan Hajnoczi
On Sat, Oct 12, 2019 at 06:38:46PM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 11, 2019 at 04:34:57PM +0200, Stefano Garzarella wrote:
> > On Fri, Oct 11, 2019 at 10:19:13AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote:
> > > > We are implementing a test suite for the VSOCK sockets and we discovered
> > > > that vmci_transport never allowed half-closed socket on the host side.
> > > > 
> > > > As Jorgen explained [1] this is due to the implementation of VMCI.
> > > > 
> > > > Since we want to have the same behaviour across all transports, this
> series adds a section in the "Implementation notes" to explain this
> > > > behaviour, and changes the vhost_transport to behave the same way.
> > > > 
> > > > [1] https://patchwork.ozlabs.org/cover/847998/#1831400
> > > 
> > > Half closed sockets are very useful, and lots of
> > > applications use tricks to swap a vsock for a tcp socket,
> > > which might as a result break.
> > 
> > Got it!
> > 
> > > 
> > > If VMCI really cares it can implement an ioctl to
> > > allow applications to detect that half closed sockets aren't supported.
> > > 
> > > It does not look like VMCI wants to bother (users do not read
> > > kernel implementation notes) so it does not really care.
> > > So why do we want to cripple other transports intentionally?
> > 
> > The main reason is that we are developing the test suite and we noticed
> > the mismatch. Since we want to make sure that applications behave in
> > the same way on different transports, we thought we would solve it that
> > way.
> > 
> > But what you are saying (also in the reply of the patches) is actually
> > quite right. Not being publicized, applications do not expect this behavior,
> > so please discard this series.
> > 
> > My problem during the tests, was trying to figure out if half-closed
> > sockets were supported or not, so as you say adding an IOCTL or maybe
> > better a getsockopt() could solve the problem.
> > 
> > What do you think?
> > 
> > Thanks,
> > Stefano
> 
> Sure, why not.

The aim is for applications using AF_VSOCK sockets to run on any
transport.  When the semantics differ between transports it creates a
compatibility problem.

That said, I do think keeping the standard sockets behavior is
reasonable.  If applications have problems on VMCI a sockopt may be
necessary :(.

Stefan



Re: [PATCH V3 0/7] mdev based hardware virtio offloading support

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:50PM +0800, Jason Wang wrote:
> There is hardware that can do virtio datapath offloading while having
> its own control path. This patch tries to implement an mdev based
> unified API to support using the kernel virtio driver to drive those
> devices. This is done by introducing a new mdev transport for virtio
> (virtio_mdev) that registers itself as a new kind of mdev driver. Then
> it provides a unified way for the kernel virtio driver to talk with the
> mdev device implementation.
> 
> Though the series only contains kernel driver support, the goal is to
> make the transport generic enough to support userspace drivers. This
> means vhost-mdev[1] could be built on top as well by reusing the
> transport.
> 
> A sample driver is also implemented which simulates a virtio-net
> loopback ethernet device on top of vringh + workqueue. This could be
> used as a reference implementation for real hardware drivers.
> 
> Considering the mdev framework only supports VFIO devices and drivers
> right now, this series also extends it to support other types. This is
> done by introducing a class id to the device and pairing it with the
> id_table claimed by the driver. On top, this series also decouples
> device specific parent ops out of the common ones.

I was curious so I took a quick look and posted comments.

I guess this driver runs inside the guest since it registers virtio
devices?

If this is used with physical PCI devices that support datapath
offloading then how are physical devices presented to the guest without
SR-IOV?

Stefan



Re: [PATCH V3 6/7] virtio: introduce a mdev based transport

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:56PM +0800, Jason Wang wrote:
> +struct virtio_mdev_device {
> + struct virtio_device vdev;
> + struct mdev_device *mdev;
> + unsigned long version;
> +
> + struct virtqueue **vqs;
> + /* The lock to protect virtqueue list */
> + spinlock_t lock;
> + struct list_head virtqueues;

Is this a list of struct virtio_mdev_vq_info?  Please document the
actual type in a comment.

> +static int virtio_mdev_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> + struct virtqueue *vqs[],
> + vq_callback_t *callbacks[],
> + const char * const names[],
> + const bool *ctx,
> + struct irq_affinity *desc)
> +{
> + struct virtio_mdev_device *vm_dev = to_virtio_mdev_device(vdev);
> + struct mdev_device *mdev = vm_get_mdev(vdev);
> + const struct virtio_mdev_device_ops *ops = mdev_get_dev_ops(mdev);
> + struct virtio_mdev_callback cb;
> + int i, err, queue_idx = 0;
> +
> + vm_dev->vqs = kmalloc_array(queue_idx, sizeof(*vm_dev->vqs),
> + GFP_KERNEL);

kmalloc_array(0, ...)?  I would have expected nvqs instead of queue_idx
(0).

What is the purpose of vm_dev->vqs, and does anything ever access it?
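
For reference, this is what I would have expected the allocation to look
like (my reading of the intent, not a confirmed fix):

	vm_dev->vqs = kmalloc_array(nvqs, sizeof(*vm_dev->vqs),
				    GFP_KERNEL);
	if (!vm_dev->vqs)
		return -ENOMEM;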



Re: [PATCH V3 5/7] mdev: introduce virtio device and its device ops

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 04:15:55PM +0800, Jason Wang wrote:
> + * @set_vq_cb:   Set the interrut calback function for

s/interrut/interrupt/

s/calback/callback/



Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

2019-10-14 Thread Stefan Hajnoczi
On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
> On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin  wrote:
> > On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
> > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > and pushed to the guest using the vring, are directly queued in
> > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > with a fixed size (4 KB).
> > > > >
> > > > > The maximum amount of memory used by each socket should be
> > > > > controlled by the credit mechanism.
> > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB
> > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > to avoid starvation of other sockets.
> > > > >
> > > > > This patch mitigates this issue copying the payload of small
> > > > > packets (< 128 bytes) into the buffer of last packet queued, in
> > > > > order to avoid wasting memory.
> > > > >
> > > > > Reviewed-by: Stefan Hajnoczi 
> > > > > Signed-off-by: Stefano Garzarella 
> > > >
> > > > This is good enough for net-next, but for net I think we
> > > > should figure out how to address the issue completely.
> > > > Can we make the accounting precise? What happens to
> > > > performance if we do?
> > > >
> > >
> > > Since I'm back from holidays, I'm restarting this thread to figure out
> > > how to address the issue completely.
> > >
> > > I did a better analysis of the credit mechanism that we implemented in
> > > virtio-vsock to get a clearer view and I'd share it with you:
> > >
> > > This issue affects only the "host->guest" path. In this case, when the
> > > host wants to send a packet to the guest, it uses a "free" buffer
> > > allocated by the guest (4KB).
> > > The "free" buffers available for the host are shared between all
> > > sockets; the credit mechanism, instead, is per-socket, I think to
> > > avoid the starvation of other sockets.
> > > The guest re-fills the "free" queue when the available buffers are
> > > less than half.
> > >
> > > Each peer have these variables in the per-socket state:
> > >/* local vars */
> > >buf_alloc/* max bytes usable by this socket
> > >[exposed to the other peer] */
> > >fwd_cnt  /* increased when RX packet is consumed by the
> > >user space [exposed to the other peer] */
> > >tx_cnt /* increased when TX packet is sent to the 
> > > other peer */
> > >
> > >/* remote vars  */
> > >peer_buf_alloc   /* peer's buf_alloc */
> > >peer_fwd_cnt /* peer's fwd_cnt */
> > >
> > > When a peer sends a packet, it increases the 'tx_cnt'; when the
> > > receiver consumes the packet (copy it to the user-space buffer), it
> > > increases the 'fwd_cnt'.
> > > Note: increments are made considering the payload length and not the
> > > buffer length.
> > >
> > > The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
> > > all packet headers or with an explicit CREDIT_UPDATE packet.
> > >
> > > The local 'buf_alloc' value can be modified by the user space using
> > > setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.
> > >
> > > Before to send a packet, the peer checks the space available:
> > >   credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
> > > and it will send up to credit_available bytes to the other peer.
> > >
> > > Possible solutions considering Michael's advice:
> > > 1. Use the buffer length instead of the payload length when we increment
> > >the counters:
> > >   - This approach will account precisely the memory used per socket.
> > >   - This requir
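
To make the credit check described above concrete, here is a
hypothetical helper (the variable names follow this description, not
any particular kernel struct):

static u64 vsock_credit_available(u64 peer_buf_alloc, u64 tx_cnt,
				  u64 peer_fwd_cnt)
{
	/* Bytes in flight: sent by us but not yet consumed by the peer */
	u64 in_flight = tx_cnt - peer_fwd_cnt;

	return peer_buf_alloc > in_flight ? peer_buf_alloc - in_flight : 0;
}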

Re: [RFC PATCH 07/13] vsock: handle buffer_size sockopts in the core

2019-10-11 Thread Stefan Hajnoczi
On Thu, Oct 10, 2019 at 11:32:54AM +0200, Stefano Garzarella wrote:
> On Wed, Oct 09, 2019 at 01:30:26PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Sep 27, 2019 at 01:26:57PM +0200, Stefano Garzarella wrote:
> > Another issue is that this patch drops the VIRTIO_VSOCK_MAX_BUF_SIZE
> > limit that used to be enforced by virtio_transport_set_buffer_size().
> > Now the limit is only applied at socket init time.  If the buffer size
> > is changed later then VIRTIO_VSOCK_MAX_BUF_SIZE can be exceeded.  If
> > that doesn't matter, why even bother with VIRTIO_VSOCK_MAX_BUF_SIZE
> > here?
> > 
> 
> The .notify_buffer_size() should avoid this issue, since it allows the
> transport to limit the buffer size requested after the initialization.
> 
> But again the min set by the user can not be respected and in the
> previous implementation we forced it to VIRTIO_VSOCK_MAX_BUF_SIZE.
> 
> Now we don't limit the min, but we guarantee only that vsk->buffer_size
> is lower than VIRTIO_VSOCK_MAX_BUF_SIZE.
> 
> Can that be an acceptable compromise?

I think so.

Setting buffer sizes was never tested or used much by userspace
applications that I'm aware of.  We should probably include tests for
changing buffer sizes in the test suite.

Stefan
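
A minimal sketch of such a test (SO_VM_SOCKETS_BUFFER_SIZE is the
existing sockopt; the helper name and error handling are assumptions):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

static void test_change_buffer_size(int fd)
{
	unsigned long long size = 2 * 1024 * 1024; /* request 2 MiB */

	/* vsock buffer size sockopts take level AF_VSOCK and a u64 value */
	if (setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
		       &size, sizeof(size)) < 0)
		perror("setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
}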



Re: [PATCH v2 07/11] VSOCK: add AF_VSOCK test cases

2019-10-09 Thread Stefan Hajnoczi
On Wed, Oct 09, 2019 at 12:03:53PM +0200, Stefano Garzarella wrote:
> Hi Stefan,
> I'm thinking about dividing this test into single applications, one
> for each test. Do you think it makes sense?
> Or is it just a useless complication?

I don't mind either way but personally I would leave it as a single
program.

Stefan



Re: [RFC PATCH 00/13] vsock: add multi-transports support

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:50PM +0200, Stefano Garzarella wrote:
> Hi all,
> this series adds the multi-transports support to vsock, following
> this proposal:
> https://www.spinics.net/lists/netdev/msg575792.html

Nice series!  I have left a few comments but overall it looks promising.

Stefan



Re: [RFC PATCH 13/13] vsock: fix bind() behaviour taking care of CID

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:27:03PM +0200, Stefano Garzarella wrote:
> When we are looking for a socket bound to a specific address,
> we also have to take into account the CID.
> 
> This patch is useful with multi-transports support because it
> allows the binding of the same port with different CID, and
> it prevents a connection to a wrong socket bound to the same
> port, but with different CID.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/af_vsock.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 12/13] vsock: prevent transport modules unloading

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:27:02PM +0200, Stefano Garzarella wrote:
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index c5f46b8242ce..750b62711b01 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -416,13 +416,28 @@ int vsock_assign_transport(struct vsock_sock *vsk, 
> struct vsock_sock *psk)
>   return -ESOCKTNOSUPPORT;
>   }
>  
> - if (!vsk->transport)
> + /* We increase the module refcnt to prevent the tranport unloading

s/tranport/transport/

Otherwise:

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 11/13] vsock: add 'transport_hg' to handle g2h\h2g transports

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:27:01PM +0200, Stefano Garzarella wrote:
> VMCI transport provides both g2h and h2g behaviors in a single
> transport.
> We are able to set (or not) the g2h behavior, detecting if we
> are in a VMware guest (or not), but the h2g feature is always set.
> This prevents loading other h2g transports while we are in a
> VMware guest.

In the vhost_vsock.ko case we only register the h2g transport when
userspace has loaded the module (by opening /dev/vhost-vsock).

VMCI has something kind of similar: /dev/vmci and the
vmci_host_active_users counter.  Maybe we can use this instead of
introducing the transport_hg concept?



Re: [RFC PATCH 10/13] vsock: add multi-transports support

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:27:00PM +0200, Stefano Garzarella wrote:
> RFC:
> - I'd like to move MODULE_ALIAS_NETPROTO(PF_VSOCK) to af_vsock.c.
>   @Jorgen could this break the VMware products?

What will cause the vmw_vsock_vmci_transport.ko module to be loaded
after you remove MODULE_ALIAS_NETPROTO(PF_VSOCK)?  Perhaps
drivers/misc/vmw_vmci/vmci_guest.c:vmci_guest_probe_device() could do
something when the guest driver loads.  There would need to be something
equivalent for the host side too.

This will solve another issue too.  Today the VMCI transport can be
loaded if an application creates an AF_VSOCK socket during early boot
before the virtio transport has been probed.  This happens because the
VMCI transport uses MODULE_ALIAS_NETPROTO(PF_VSOCK) *and* it does not
probe whether this system is actually a VMware guest.

If we instead load the core af_vsock.ko module and transports are only
loaded based on hardware feature probing (e.g. the presence of VMware
guest mode, a virtio PCI adapter, etc) then transports will be
well-behaved.

> - DGRAM sockets are handled as before, I don't know if make sense work
>   on it now, or when another transport will support DGRAM. The big
>   issues here is that we cannot link 1-1 a socket to transport as
>   for stream sockets since DGRAM is not connection-oriented.

Let's ignore DGRAM for now since only VMCI supports it and we therefore
do not require multi-transport support.

> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 86f8f463e01a..2a081d19e20d 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -94,7 +94,13 @@ struct vsock_transport_send_notify_data {
>   u64 data2; /* Transport-defined. */
>  };
>  
> +#define VSOCK_TRANSPORT_F_H2G		0x0001
> +#define VSOCK_TRANSPORT_F_G2H		0x0002
> +#define VSOCK_TRANSPORT_F_DGRAM	0x0004

Documentation comments, please.
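
Something along these lines would do (the wording is only a suggestion;
the values come from the patch):

/* Transport provides host->guest communication */
#define VSOCK_TRANSPORT_F_H2G		0x0001
/* Transport provides guest->host communication */
#define VSOCK_TRANSPORT_F_G2H		0x0002
/* Transport supports DGRAM sockets */
#define VSOCK_TRANSPORT_F_DGRAM		0x0004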

> +void vsock_core_unregister(const struct vsock_transport *t)
> +{
> + mutex_lock(&vsock_register_mutex);
> +
> + /* RFC-TODO: maybe we should check if there are open sockets
> +  * assigned to that transport and avoid the unregistration
> +  */

If unregister() is only called from module_exit() functions then holding
a reference to the transport module would be enough to prevent this
case.  The transport could only be removed once all sockets have been
destroyed (and dropped their transport module reference).



Re: [RFC PATCH 09/13] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:59PM +0200, Stefano Garzarella wrote:
> Remote peer is always the host, so we set VMADDR_CID_HOST as
> remote CID instead of VMADDR_CID_ANY.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/hyperv_transport.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 08/13] vsock: move vsock_insert_unbound() in the vsock_create()

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:58PM +0200, Stefano Garzarella wrote:
> vsock_insert_unbound() was called only when 'sock' parameter of
> __vsock_create() was not null. This only happened when
> __vsock_create() was called by vsock_create().
> 
> In order to simplify the multi-transports support, this patch
> moves vsock_insert_unbound() at the end of vsock_create().
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/af_vsock.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)

Maybe transports shouldn't call __vsock_create() directly.  They always
pass NULL as the parent socket, so we could have a more specific
function that transports call without a parent sock argument.  This
would eliminate any concern over moving vsock_insert_unbound() out of
this function.  In any case, I've checked the code and this patch is
correct.

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 07/13] vsock: handle buffer_size sockopts in the core

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:57PM +0200, Stefano Garzarella wrote:
> @@ -140,18 +145,11 @@ struct vsock_transport {
>   struct vsock_transport_send_notify_data *);
>   int (*notify_send_post_enqueue)(struct vsock_sock *, ssize_t,
>   struct vsock_transport_send_notify_data *);
> + int (*notify_buffer_size)(struct vsock_sock *, u64 *);

Is ->notify_buffer_size() called under lock_sock(sk)?  If yes, please
document it.

> +static void vsock_update_buffer_size(struct vsock_sock *vsk,
> +  const struct vsock_transport *transport,
> +  u64 val)
> +{
> + if (val > vsk->buffer_max_size)
> + val = vsk->buffer_max_size;
> +
> + if (val < vsk->buffer_min_size)
> + val = vsk->buffer_min_size;
> +
> + if (val != vsk->buffer_size &&
> + transport && transport->notify_buffer_size)
> + transport->notify_buffer_size(vsk, &val);

Why does this function return an int if we don't check the return value?

> diff --git a/net/vmw_vsock/virtio_transport_common.c 
> b/net/vmw_vsock/virtio_transport_common.c
> index fc046c071178..bac9e7430a2e 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -403,17 +403,13 @@ int virtio_transport_do_socket_init(struct vsock_sock 
> *vsk,
>   if (psk) {
>   struct virtio_vsock_sock *ptrans = psk->trans;
>  
> - vvs->buf_size   = ptrans->buf_size;
> - vvs->buf_size_min = ptrans->buf_size_min;
> - vvs->buf_size_max = ptrans->buf_size_max;
>   vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
> - } else {
> - vvs->buf_size = VIRTIO_VSOCK_DEFAULT_BUF_SIZE;
> - vvs->buf_size_min = VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE;
> - vvs->buf_size_max = VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE;
>   }
>  
> - vvs->buf_alloc = vvs->buf_size;
> + if (vsk->buffer_size > VIRTIO_VSOCK_MAX_BUF_SIZE)
> + vsk->buffer_size = VIRTIO_VSOCK_MAX_BUF_SIZE;

Hmm...this could be outside the [min, max] range.  I'm not sure how much
it matters.

Another issue is that this patch drops the VIRTIO_VSOCK_MAX_BUF_SIZE
limit that used to be enforced by virtio_transport_set_buffer_size().
Now the limit is only applied at socket init time.  If the buffer size
is changed later then VIRTIO_VSOCK_MAX_BUF_SIZE can be exceeded.  If
that doesn't matter, why even bother with VIRTIO_VSOCK_MAX_BUF_SIZE
here?
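
If the limit should keep applying after socket init, one option is to
clamp inside the transport callback instead; a sketch using the
notify_buffer_size() hook introduced by this series:

static int virtio_transport_notify_buffer_size(struct vsock_sock *vsk,
					       u64 *val)
{
	if (*val > VIRTIO_VSOCK_MAX_BUF_SIZE)
		*val = VIRTIO_VSOCK_MAX_BUF_SIZE;

	return 0;
}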




Re: [RFC PATCH 06/13] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:56PM +0200, Stefano Garzarella wrote:
> -const struct vsock_transport *vsock_core_get_transport(void)
> +const struct vsock_transport *vsock_core_get_transport(struct vsock_sock 
> *vsk)
>  {
>   /* vsock_register_mutex not taken since only the transport uses this
>* function and only while registered.
>*/
> - return transport_single;

This comment is about protecting transport_single.  It no longer applies
when using vsk->transport.  Please drop it.

Otherwise:

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 05/13] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:55PM +0200, Stefano Garzarella wrote:
> We are going to add 'struct vsock_sock *' parameter to
> virtio_transport_get_ops().
> 
> In some cases, like in the virtio_transport_reset_no_sock(),
> we don't have any socket assigned to the packet received,
> so we can't use the virtio_transport_get_ops().
> 
> In order to allow virtio_transport_reset_no_sock() to use the
> '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
> we add the 'struct virtio_transport *' to it and to its caller:
> virtio_transport_recv_pkt().
> 
> We moved the 'vhost_transport' and 'virtio_transport' definition,
> to pass their address to the virtio_transport_recv_pkt().
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  drivers/vhost/vsock.c   |  94 +++---
>  include/linux/virtio_vsock.h|   3 +-
>  net/vmw_vsock/virtio_transport.c| 160 
>  net/vmw_vsock/virtio_transport_common.c |  12 +-
>  4 files changed, 135 insertions(+), 134 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 04/13] vsock: add 'transport' member in the struct vsock_sock

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:54PM +0200, Stefano Garzarella wrote:
> As a preparation to support multiple transports, this patch adds
> the 'transport' member at the 'struct vsock_sock'.
> This new field is initialized during the creation in the
> __vsock_create() function.
> 
> This patch also renames the global 'transport' pointer to
> 'transport_single', since for now we're only supporting a single
> transport registered at run-time.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  include/net/af_vsock.h   |  1 +
>  net/vmw_vsock/af_vsock.c | 56 +++-
>  2 files changed, 39 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 01/13] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:51PM +0200, Stefano Garzarella wrote:
> The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
> commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is
> never used in the net/vmw_vsock/vmci_transport.c.
> 
> VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
> net/vmw_vsock/af_vsock.c
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  net/vmw_vsock/vmci_transport.c | 5 -
>  1 file changed, 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 03/13] vsock: remove include/linux/vm_sockets.h file

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:53PM +0200, Stefano Garzarella wrote:
> This header file now only includes the "uapi/linux/vm_sockets.h".
> We can include directly it when needed.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  include/linux/vm_sockets.h| 13 -
>  include/net/af_vsock.h|  2 +-
>  include/net/vsock_addr.h  |  2 +-
>  net/vmw_vsock/vmci_transport_notify.h |  1 -
>  4 files changed, 2 insertions(+), 16 deletions(-)
>  delete mode 100644 include/linux/vm_sockets.h

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH 02/13] vsock: remove vm_sockets_get_local_cid()

2019-10-09 Thread Stefan Hajnoczi
On Fri, Sep 27, 2019 at 01:26:52PM +0200, Stefano Garzarella wrote:
> vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
> We can replace it calling the virtio_transport_get_ops() and
> using the get_local_cid() callback registered by the transport.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  include/linux/vm_sockets.h  |  2 --
>  net/vmw_vsock/af_vsock.c| 10 --
>  net/vmw_vsock/virtio_transport_common.c |  2 +-
>  3 files changed, 1 insertion(+), 13 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v5 0/4] virtio-fs: shared file system for virtual machines

2019-09-12 Thread Stefan Hajnoczi
On Thu, Sep 12, 2019 at 10:14:11AM +0200, Miklos Szeredi wrote:
> On Wed, Sep 11, 2019 at 5:54 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> > > I've folded the series from Vivek and fixed a couple of TODO comments
> > > myself.  AFAICS two issues remain that need to be resolved in the short
> > > term, one way or the other: freeze/restore and full virtqueue.
> >
> > I have researched freeze/restore and come to the conclusion that it
> > needs to be a future feature.  It will probably come together with live
> > migration support for reasons mentioned below.
> >
> > Most virtio devices have fairly simple power management freeze/restore
> > functions that shut down the device and bring it back to the state held
> > in memory, respectively.  virtio-fs, as well as virtio-9p and
> > virtio-gpu, are different because they contain session state.  It is not
> > easily possible to bring back the state held in memory after the device
> > has been reset.
> >
> > The following areas of the FUSE protocol are stateful and need special
> > attention:
> >
> >  * FUSE_INIT - this is pretty easy, we must re-negotiate the same
> >settings as before.
> >
> >  * FUSE_LOOKUP -> fuse_inode (inode_map)
> >
> >The session contains a set of inode numbers that have been looked up
> >using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
> >implementation and vary across device reset.  Therefore we are unable
> >to restore the same inode numbers upon restore.
> >
> >The solution is persistent inode numbers in virtiofsd.  This is also
> >needed to make open_by_handle_at(2) work and probably for live
> >migration.
> >
> >  * FUSE_OPEN -> fh (fd_map)
> >
> >The session contains FUSE file handles for open files.  There is
> >currently no way of re-opening a file so that a specific fh is
> >returned.  A mechanism to do so probably isn't necessary if the
> >driver can update the fh to the new one produced by the device for
> >all open files instead.
> >
> >  * FUSE_OPENDIR -> fh (dirp_map)
> >
> >Same story as for FUSE_OPEN but for open directories.
> >
> >  * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and 
> > fcntl(F_OFD_GET/SETLK))
> >
> >The session contains file locks.  The driver must reacquire them upon
> >restore.  It's unclear what to do when locking fails.
> >
> > Live migration has the same problem since the FUSE session will be moved
> > to a new virtio-fs device instance.  It makes sense to tackle both
> > features together.  This is something that can be implemented in the
> > next year, but it's not a quick fix.
> 
> Right.   The question for now is: should the freeze silently succeed
> (as it seems to do now) or should it fail instead?
> 
> I guess normally freezing should be okay, as long as the virtiofsd
> remains connected while the system is frozen.
> 
> I tried to test this with "echo -n mem > /sys/power/state", which
> indeed resulted in the virtio_fs_freeze() callback being called.
> However, I couldn't find a way to wake up the system...

The issue occurs only on restore.  The core virtio driver code resets
the device so we lose state and cannot resume.

virtio-9p and virtio-gpu do not implement the .freeze() callback but
this is problematic since the system will think freeze succeeded.  It's
safer for virtio-fs to implement .freeze() and return -EOPNOTSUPP.

Can you squash in a trivial return -EOPNOTSUPP .freeze() function?

Thanks,
Stefan
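
For reference, the trivial callback I have in mind would look roughly
like this (assuming the usual .freeze hookup in virtio_fs_driver):

#ifdef CONFIG_PM_SLEEP
static int virtio_fs_freeze(struct virtio_device *vdev)
{
	/* The FUSE session state cannot be restored after device reset */
	return -EOPNOTSUPP;
}
#endif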



Re: [PATCH v5 0/4] virtio-fs: shared file system for virtual machines

2019-09-11 Thread Stefan Hajnoczi
On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> I've folded the series from Vivek and fixed a couple of TODO comments
> myself.  AFAICS two issues remain that need to be resolved in the short
> term, one way or the other: freeze/restore and full virtqueue.

I have researched freeze/restore and come to the conclusion that it
needs to be a future feature.  It will probably come together with live
migration support for reasons mentioned below.

Most virtio devices have fairly simple power management freeze/restore
functions that shut down the device and bring it back to the state held
in memory, respectively.  virtio-fs, as well as virtio-9p and
virtio-gpu, are different because they contain session state.  It is not
easily possible to bring back the state held in memory after the device
has been reset.

The following areas of the FUSE protocol are stateful and need special
attention:

 * FUSE_INIT - this is pretty easy, we must re-negotiate the same
   settings as before.

 * FUSE_LOOKUP -> fuse_inode (inode_map)

   The session contains a set of inode numbers that have been looked up
   using FUSE_LOOKUP.  They are ephemeral in the current virtiofsd
   implementation and vary across device reset.  Therefore we are unable
   to restore the same inode numbers upon restore.

   The solution is persistent inode numbers in virtiofsd.  This is also
   needed to make open_by_handle_at(2) work and probably for live
   migration.

 * FUSE_OPEN -> fh (fd_map)

   The session contains FUSE file handles for open files.  There is
   currently no way of re-opening a file so that a specific fh is
   returned.  A mechanism to do so probably isn't necessary if the
   driver can update the fh to the new one produced by the device for
   all open files instead.

 * FUSE_OPENDIR -> fh (dirp_map)

   Same story as for FUSE_OPEN but for open directories.

 * FUSE_GETLK/SETLK/SETLKW -> (inode->posix_locks and fcntl(F_OFD_GET/SETLK))

   The session contains file locks.  The driver must reacquire them upon
   restore.  It's unclear what to do when locking fails.

Live migration has the same problem since the FUSE session will be moved
to a new virtio-fs device instance.  It makes sense to tackle both
features together.  This is something that can be implemented in the
next year, but it's not a quick fix.

Stefan



Re: [PATCH v5 0/4] virtio-fs: shared file system for virtual machines

2019-09-11 Thread Stefan Hajnoczi
On Tue, Sep 10, 2019 at 05:12:02PM +0200, Miklos Szeredi wrote:
> Git tree for this version is available here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v5
> 
> Only post patches that actually add virtiofs (virtiofs-v5-base..virtiofs-v5).
> 
> I've folded the series from Vivek and fixed a couple of TODO comments
> myself.  AFAICS two issues remain that need to be resolved in the short
> term, one way or the other: freeze/restore and full virtqueue.

Thank you!  I am investigating freeze/restore.

Stefan



Re: [Virtio-fs] [PATCH 00/18] virtiofs: Fix various races and cleanups round 1

2019-09-09 Thread Stefan Hajnoczi
On Sun, Sep 08, 2019 at 07:53:55PM +0800, piaojun wrote:
> 
> 
> On 2019/9/6 19:52, Miklos Szeredi wrote:
> > On Fri, Sep 6, 2019 at 12:36 PM Stefan Hajnoczi  wrote:
> >>
> >> On Fri, Sep 06, 2019 at 10:15:14AM +0200, Miklos Szeredi wrote:
> >>> On Thu, Sep 5, 2019 at 9:49 PM Vivek Goyal  wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Michael Tsirkin pointed out issues w.r.t various locking related TODO
> >>>> items and races w.r.t device removal.
> >>>>
> >>>> In this first round of cleanups, I have taken care of most pressing
> >>>> issues.
> >>>>
> >>>> These patches apply on top of following.
> >>>>
> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git#virtiofs-v4
> >>>>
> >>>> I have tested these patches with mount/umount and device removal using
> >>>> qemu monitor. For example.
> >>>
> >>> Is device removal mandatory?  Can't this be made a non-removable
> >>> device?  Is there a good reason why removing the virtio-fs device
> >>> makes sense?
> >>
> >> Hot plugging and unplugging virtio PCI adapters is common.  I'd very
> >> much like removal to work from the beginning.
> > 
> > Can you give an example use case?
> 
> I think VirtFS migration needs hot plugging, or it may cause a QEMU
> crash or other problems.

Live migration is currently unsupported.  Hot unplugging the virtio-fs
device would allow the guest to live migrate successfully, so it's a
useful feature to work around the missing live migration support.

Is this what you mean?

Stefan



Re: [PATCH 16/18] virtiofs: Use virtio_fs_mutex for races w.r.t ->remove and mount path

2019-09-09 Thread Stefan Hajnoczi
On Fri, Sep 06, 2019 at 09:51:31AM -0400, Vivek Goyal wrote:
> On Fri, Sep 06, 2019 at 01:05:34PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 05, 2019 at 03:48:57PM -0400, Vivek Goyal wrote:
> > > It is possible that a mount is in progress and device is being removed at
> > > the same time. Use virtio_fs_mutex to avoid races.
> > > 
> > > This also takes care of bunch of races and removes some TODO items.
> > > 
> > > Signed-off-by: Vivek Goyal 
> > > ---
> > >  fs/fuse/virtio_fs.c | 32 ++--
> > >  1 file changed, 22 insertions(+), 10 deletions(-)
> > 
> > Let's move to a per-device mutex in the future.  That way a single
> > device that fails to drain/complete requests will not hang all other
> > virtio-fs instances.  This is fine for now.
> 
> Good point. For now I updated the patch so that it applies cleanly
> after previous two patches changed.
> 
> Subject: virtiofs: Use virtio_fs_mutex for races w.r.t ->remove and mount path
> 
> It is possible that a mount is in progress and device is being removed at
> the same time. Use virtio_fs_mutex to avoid races. 
> 
> This also takes care of bunch of races and removes some TODO items.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c |   32 ++--
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> Index: rhvgoyal-linux-fuse/fs/fuse/virtio_fs.c
> ===
> --- rhvgoyal-linux-fuse.orig/fs/fuse/virtio_fs.c  2019-09-06 
> 09:40:53.309245246 -0400
> +++ rhvgoyal-linux-fuse/fs/fuse/virtio_fs.c   2019-09-06 09:43:25.335245246 
> -0400
> @@ -13,7 +13,9 @@
>  #include 
>  #include "fuse_i.h"
>  
> -/* List of virtio-fs device instances and a lock for the list */
> +/* List of virtio-fs device instances and a lock for the list. Also provides
> + * mutual exclusion in device removal and mounting path
> + */
>  static DEFINE_MUTEX(virtio_fs_mutex);
>  static LIST_HEAD(virtio_fs_instances);
>  
> @@ -72,17 +74,19 @@ static void release_virtio_fs_obj(struct
>   kfree(vfs);
>  }
>  
> +/* Make sure virtio_fs_mutex is held */
>  static void virtio_fs_put(struct virtio_fs *fs)
>  {
> - mutex_lock(&virtio_fs_mutex);
>   kref_put(&fs->refcount, release_virtio_fs_obj);
> - mutex_unlock(&virtio_fs_mutex);
>  }
>  
>  static void virtio_fs_fiq_release(struct fuse_iqueue *fiq)
>  {
>   struct virtio_fs *vfs = fiq->priv;
> +
> + mutex_lock(&virtio_fs_mutex);
>   virtio_fs_put(vfs);
> + mutex_unlock(&virtio_fs_mutex);
>  }
>  
>  static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
> @@ -596,9 +600,8 @@ static void virtio_fs_remove(struct virt
>   struct virtio_fs *fs = vdev->priv;
>  
>   mutex_lock(&virtio_fs_mutex);
> + /* This device is going away. No one should get new reference */
>   list_del_init(&fs->list);
> - mutex_unlock(&virtio_fs_mutex);
> -
>   virtio_fs_stop_all_queues(fs);
>   virtio_fs_drain_all_queues(fs);
>   vdev->config->reset(vdev);
> @@ -607,6 +610,7 @@ static void virtio_fs_remove(struct virt
>   vdev->priv = NULL;
>   /* Put device reference on virtio_fs object */
>   virtio_fs_put(fs);
> + mutex_unlock(&virtio_fs_mutex);
>  }
>  
>  #ifdef CONFIG_PM_SLEEP
> @@ -978,10 +982,15 @@ static int virtio_fs_fill_super(struct s
>   .no_force_umount = true,
>   };
>  
> - /* TODO lock */
> - if (fs->vqs[VQ_REQUEST].fud) {
> - pr_err("virtio-fs: device already in use\n");
> - err = -EBUSY;
> + mutex_lock(&virtio_fs_mutex);
> +
> + /* After holding mutex, make sure virtiofs device is still there.
> +  * Though we are holding a reference to it, driver ->remove might
> +  * still have cleaned up virtual queues. In that case bail out.
> +  */
> + err = -EINVAL;
> + if (list_empty(&fs->list)) {
> + pr_info("virtio-fs: tag <%s> not found\n", fs->tag);
>   goto err;
>   }
>  
> @@ -1007,7 +1016,6 @@ static int virtio_fs_fill_super(struct s
>  
>   fc = fs->vqs[VQ_REQUEST].fud->fc;
>  
> - /* TODO take fuse_mutex around this loop? */
>   for (i = 0; i < fs->nvqs; i++) {
>   struct virtio_fs_vq *fsvq = &fs->vqs[i];
>  
> @@ -1020,6 +1028,7 @@ static int virtio_fs_fill_super(struct s
>   /* Previous unmount will stop all queues. Start these again */
>   virtio_fs_start_all_queues(fs);
>   fuse_send_init(fc, init_req);
> + mutex_unlock(&virtio_fs_mutex);
>   return 0;

Re: [PATCH 15/18] virtiofs: Make virtio_fs object refcounted

2019-09-09 Thread Stefan Hajnoczi
On Fri, Sep 06, 2019 at 09:50:32AM -0400, Vivek Goyal wrote:
> On Fri, Sep 06, 2019 at 01:03:09PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 05, 2019 at 03:48:56PM -0400, Vivek Goyal wrote:
> > > This object is used both by fuse_connection as well as the virtio
> > > device. So make this object reference counted and that makes it easy
> > > to define the life cycle of the object.
> > > 
> > > Now the device can be removed while the filesystem is still mounted.
> > > This will clean up all the virtqueues but the virtio_fs object will
> > > still be around and will be cleaned up when the filesystem is
> > > unmounted and sb/fc drops its reference.
> > > 
> > > Removing a device also stops all virt queues and any new request gets
> > > error -ENOTCONN. All existing in flight requests are drained before
> > > ->remove returns.
> > > 
> > > Signed-off-by: Vivek Goyal 
> > > ---
> > >  fs/fuse/virtio_fs.c | 52 +
> > >  1 file changed, 43 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 01bbf2c0e144..29ec2f5bbbe2 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -37,6 +37,7 @@ struct virtio_fs_vq {
> > >  
> > >  /* A virtio-fs device instance */
> > >  struct virtio_fs {
> > > + struct kref refcount;
> > >   struct list_head list;	/* on virtio_fs_instances */
> > >   char *tag;
> > >   struct virtio_fs_vq *vqs;
> > > @@ -63,6 +64,27 @@ static inline struct fuse_pqueue *vq_to_fpq(struct 
> > > virtqueue *vq)
> > >   return &vq_to_fsvq(vq)->fud->pq;
> > >  }
> > >  
> > > +static void release_virtiofs_obj(struct kref *ref)
> > > +{
> > > + struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount);
> > > +
> > > + kfree(vfs->vqs);
> > > + kfree(vfs);
> > > +}
> > > +
> > > +static void virtiofs_put(struct virtio_fs *fs)
> > 
> > Why do the two function names above contain "virtiofs" instead
> > of "virtio_fs"?  I'm not sure if this is intentional and is supposed to
> > mean something, but it's confusing.
> > 
> > > +{
> > > + mutex_lock(&virtio_fs_mutex);
> > > + kref_put(&fs->refcount, release_virtiofs_obj);
> > > + mutex_unlock(&virtio_fs_mutex);
> > > +}
> > > +
> > > +static void virtio_fs_put(struct fuse_iqueue *fiq)
> > 
> > Minor issue: this function name is confusingly similar to
> > virtiofs_put().  Please rename to virtio_fs_fiq_put().
> 
> Fixed with ->release semantics. Replaced "virtiofs" with "virtio_fs".
> 
> 
> Subject: virtiofs: Make virtio_fs object refcounted
> 
> This object is used both by fuse_connection as well as the virtio
> device. So make this object reference counted and that makes it easy
> to define the life cycle of the object.
> 
> Now the device can be removed while the filesystem is still mounted.
> This will clean up all the virtqueues but the virtio_fs object will
> still be around and will be cleaned up when the filesystem is unmounted
> and sb/fc drops its reference.
> 
> Removing a device also stops all virt queues and any new request gets
> error -ENOTCONN. All existing in flight requests are drained before
> ->remove returns.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c |   52 
> +++-
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> Index: rhvgoyal-linux-fuse/fs/fuse/virtio_fs.c
> ===
> --- rhvgoyal-linux-fuse.orig/fs/fuse/virtio_fs.c  2019-09-06 
> 09:24:21.177245246 -0400
> +++ rhvgoyal-linux-fuse/fs/fuse/virtio_fs.c   2019-09-06 09:40:53.309245246 
> -0400
> @@ -37,6 +37,7 @@ struct virtio_fs_vq {
>  
>  /* A virtio-fs device instance */
>  struct virtio_fs {
> + struct kref refcount;
>   struct list_head list;	/* on virtio_fs_instances */
>   char *tag;
>   struct virtio_fs_vq *vqs;
> @@ -63,6 +64,27 @@ static inline struct fuse_pqueue *vq_to_
>   return &vq_to_fsvq(vq)->fud->pq;
>  }
>  
> +static void release_virtio_fs_obj(struct kref *ref)
> +{
> + struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount);
> +
> + kfree(vfs->vqs);
> + kfree(vfs);
> +}
> +
> +static void virtio_fs_put(struct virtio_fs *fs)
> +{
> + mutex_lock(&virtio_fs_mutex);
> + kref_put(&fs->refcount, release_virtio_fs_obj);

Re: [PATCH 08/18] virtiofs: Drain all pending requests during ->remove time

2019-09-09 Thread Stefan Hajnoczi
On Fri, Sep 06, 2019 at 10:18:49AM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 06, 2019 at 10:17:05AM -0400, Vivek Goyal wrote:
> > On Fri, Sep 06, 2019 at 11:52:10AM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Sep 05, 2019 at 03:48:49PM -0400, Vivek Goyal wrote:
> > > > +static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
> > > > +{
> > > > +   WARN_ON(fsvq->in_flight < 0);
> > > > +
> > > > +   /* Wait for in flight requests to finish.*/
> > > > +   while (1) {
> > > > +   spin_lock(&fsvq->lock);
> > > > +   if (!fsvq->in_flight) {
> > > > +   spin_unlock(&fsvq->lock);
> > > > +   break;
> > > > +   }
> > > > +   spin_unlock(&fsvq->lock);
> > > > +   usleep_range(1000, 2000);
> > > > +   }
> > > 
> > > I think all contexts that call this allow sleeping so we could avoid
> > > usleep here.
> > 
> > usleep_range() is supposed to be used from non-atomic context.
> > 
> > https://github.com/torvalds/linux/blob/master/Documentation/timers/timers-howto.rst
> > 
> > What construct are you thinking of?
> > 
> > Vivek
> 
> completion + signal on vq callback?

Yes.  Time-based sleep() is sub-optimal because we could wake up exactly
when in_flight is decremented from the vq callback.  This avoids
unnecessary sleep wakeups and the extra time spent sleeping after
in_flight has been decremented.

Stefan
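
A rough sketch of the completion-based variant, assuming fsvq gains an
'in_flight_zero' completion that the vq callback completes when
in_flight drops to zero:

static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
{
	WARN_ON(fsvq->in_flight < 0);

	/* Woken by complete(&fsvq->in_flight_zero) in the vq callback */
	spin_lock(&fsvq->lock);
	if (!fsvq->in_flight) {
		spin_unlock(&fsvq->lock);
		return;
	}
	spin_unlock(&fsvq->lock);

	wait_for_completion(&fsvq->in_flight_zero);
}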



Re: [PATCH 18/18] virtiofs: Remove TODO item from virtio_fs_free_devs()

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:59PM -0400, Vivek Goyal wrote:
> virtio_fs_free_devs() is now called from ->kill_sb(). By this time
> all device queues have been quiesced. I am assuming that while
> ->kill_sb() is in progress, another mount instance will wait for
> it to finish (sb->s_umount mutex provides mutual exclusion).
> 
> W.r.t ->remove path, we should be fine as we are not touching vdev
> or virtqueues. And we have reference on virtio_fs object, so we know
> rest of the data structures are valid.
> 
> So I can't see the need of any additional locking yet.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 17/18] virtiofs: Remove TODO to quiesce/end_requests

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:58PM -0400, Vivek Goyal wrote:
> We now stop queues and drain all the pending requests from all virtqueues.
> So this is not a TODO anymore.
> 
> Got rid of incrementing fc->dev_count as well. It did not seem meaningful
> for virtio_fs.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 16/18] virtiofs: Use virtio_fs_mutex for races w.r.t ->remove and mount path

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:57PM -0400, Vivek Goyal wrote:
> It is possible that a mount is in progress and device is being removed at
> the same time. Use virtio_fs_mutex to avoid races.
> 
> This also takes care of bunch of races and removes some TODO items.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 32 ++--
>  1 file changed, 22 insertions(+), 10 deletions(-)

Let's move to a per-device mutex in the future.  That way a single
device that fails to drain/complete requests will not hang all other
virtio-fs instances.  This is fine for now.

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 15/18] virtiofs: Make virtio_fs object refcounted

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:56PM -0400, Vivek Goyal wrote:
> This object is used both by fuse_connection as well as the virtio
> device. So make this object reference counted and that makes it easy
> to define the life cycle of the object.
> 
> Now the device can be removed while the filesystem is still mounted.
> This will clean up all the virtqueues but the virtio_fs object will
> still be around and will be cleaned up when the filesystem is unmounted
> and sb/fc drops its reference.
> 
> Removing a device also stops all virt queues and any new request gets
> error -ENOTCONN. All existing in flight requests are drained before
> ->remove returns.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 52 +
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 01bbf2c0e144..29ec2f5bbbe2 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -37,6 +37,7 @@ struct virtio_fs_vq {
>  
>  /* A virtio-fs device instance */
>  struct virtio_fs {
> + struct kref refcount;
>   struct list_head list;	/* on virtio_fs_instances */
>   char *tag;
>   struct virtio_fs_vq *vqs;
> @@ -63,6 +64,27 @@ static inline struct fuse_pqueue *vq_to_fpq(struct 
> virtqueue *vq)
>   return &vq_to_fsvq(vq)->fud->pq;
>  }
>  
> +static void release_virtiofs_obj(struct kref *ref)
> +{
> + struct virtio_fs *vfs = container_of(ref, struct virtio_fs, refcount);
> +
> + kfree(vfs->vqs);
> + kfree(vfs);
> +}
> +
> +static void virtiofs_put(struct virtio_fs *fs)

Why do the two function names above contain "virtiofs" instead
of "virtio_fs"?  I'm not sure if this is intentional and is supposed to
mean something, but it's confusing.

> +{
> + mutex_lock(&virtio_fs_mutex);
> + kref_put(&fs->refcount, release_virtiofs_obj);
> + mutex_unlock(&virtio_fs_mutex);
> +}
> +
> +static void virtio_fs_put(struct fuse_iqueue *fiq)

Minor issue: this function name is confusingly similar to
virtiofs_put().  Please rename to virtio_fs_fiq_put().



Re: [PATCH 14/18] virtiofs: Add a fuse_iqueue operation to put() reference

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:55PM -0400, Vivek Goyal wrote:
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 85e2dcad68c1..04e2c000d63f 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -479,6 +479,11 @@ struct fuse_iqueue_ops {
>*/
>   void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq)
>   __releases(fiq->waitq.lock);
> +
> + /**
> +  * Put a reference on fiq_priv.

I'm a bit confused about fiq->priv's role in this.  The callback takes
struct fuse_iqueue *fiq as the argument, not void *priv, so it could
theoretically do more than just release priv.

I think one of the following would be clearer:

 /**
  * Drop a reference to fiq->priv.
  */
 void (*put_priv)(void *priv);

Or:

 /**
  * Clean up when fuse_iqueue is destroyed.
  */
 void (*release)(struct fuse_iqueue *fiq);

In the second case fuse_conn_put() shouldn't check fiq->priv.
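
With the release() variant, the virtio_fs side could be wired up along
these lines (a sketch, assuming fiq->priv holds the virtio_fs reference as
in this series):

 static void virtio_fs_fiq_release(struct fuse_iqueue *fiq)
 {
 	struct virtio_fs *vfs = fiq->priv;

 	virtio_fs_put(vfs);
 }

 static const struct fuse_iqueue_ops virtio_fs_fiq_ops = {
 	/* existing wake_*_and_unlock callbacks go here */
 	.release = virtio_fs_fiq_release,
 };

fuse_conn_put() would then call fiq->ops->release(fiq) whenever ops are
set, without peeking at fiq->priv.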



Re: [PATCH 13/18] virtiofs: Do not access virtqueue in request submission path

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:54PM -0400, Vivek Goyal wrote:
> In the request submission path it is possible that the virtqueue is
> already gone due to driver->remove(). So do not access it in dev_dbg();
> use pr_debug() instead.
> 
> If the virtqueue is gone, accessing it would result in a NULL pointer
> dereference.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
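
The change is of this shape (illustrative only, not the literal hunk; the
format string here is made up):

 -	dev_dbg(&vq->vdev->dev, "%s\n", __func__);
 +	pr_debug("virtio-fs: %s\n", __func__);

pr_debug() does not dereference the device, so it remains safe after the
virtqueue is gone.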

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 12/18] virtiofs: Use virtio_fs_free_devs() in error path

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:53PM -0400, Vivek Goyal wrote:
> We already have a helper to clean up fuse devices. Use that instead of
> duplicating the code.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 11/18] virtiofs: stop and drain queues after sending DESTROY

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:52PM -0400, Vivek Goyal wrote:
> During virtio_kill_sb() we first stop the forget queue and drain it, and
> then call fuse_kill_sb_anon(). This will result in sending a DESTROY
> request to the fuse server. Once finished, stop all the queues and drain
> one more time just to be sure, and then free up the devices.
> 
> Given that draining the queues will call flush_work() on various workers,
> remove this logic from virtio_free_devs().
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 10/18] virtiofs: Do not use device managed mem for virtio_fs and virtio_fs_vq

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:51PM -0400, Vivek Goyal wrote:
> These data structures should go away when the virtio_fs object goes away.
> When the device is going away, we just need to make sure the virtqueues
> can go away; after that none of the code accesses the vq and all requests
> get an error.
> 
> So allocate memory for virtio_fs and virtio_fs_vq normally and free it at
> the right time.
> 
> This patch still frees up the memory at device remove time. A later patch
> will make the virtio_fs object reference counted, and this memory will be
> freed when the last reference to the object is dropped.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
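
In other words, the allocations move from device-managed to plain calls,
roughly (illustrative, not the literal hunk):

 -	fs = devm_kzalloc(&vdev->dev, sizeof(*fs), GFP_KERNEL);
 +	fs = kzalloc(sizeof(*fs), GFP_KERNEL);

so the object's lifetime is no longer tied to the struct device and can be
ended explicitly (and, after a later patch, by refcount).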

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 09/18] virtiofs: Add a helper to start all the queues

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:50PM -0400, Vivek Goyal wrote:
> This just marks the queues as connected and ready to accept requests.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/fuse/virtio_fs.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
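
Presumably something along these lines (a sketch of the helper as
described; field names follow the rest of the series):

 static void virtio_fs_start_all_queues(struct virtio_fs *fs)
 {
 	struct virtio_fs_vq *fsvq;
 	int i;

 	for (i = 0; i < fs->nvqs; i++) {
 		fsvq = &fs->vqs[i];
 		spin_lock(&fsvq->lock);
 		fsvq->connected = true;
 		spin_unlock(&fsvq->lock);
 	}
 }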

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH 08/18] virtiofs: Drain all pending requests during ->remove time

2019-09-06 Thread Stefan Hajnoczi
On Thu, Sep 05, 2019 at 03:48:49PM -0400, Vivek Goyal wrote:
> +static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
> +{
> + WARN_ON(fsvq->in_flight < 0);
> +
> + /* Wait for in flight requests to finish. */
> + while (1) {
> + spin_lock(&fsvq->lock);
> + if (!fsvq->in_flight) {
> + spin_unlock(&fsvq->lock);
> + break;
> + }
> + spin_unlock(&fsvq->lock);
> + usleep_range(1000, 2000);
> + }

I think all contexts that call this allow sleeping so we could avoid
usleep here.
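
For example, a completion would let the caller sleep until the count drops
to zero. A sketch, assuming a hypothetical in_flight_zero completion added
to struct virtio_fs_vq:

 static void virtio_fs_drain_queue(struct virtio_fs_vq *fsvq)
 {
 	WARN_ON(fsvq->in_flight < 0);

 	spin_lock(&fsvq->lock);
 	if (fsvq->in_flight) {
 		reinit_completion(&fsvq->in_flight_zero);
 		spin_unlock(&fsvq->lock);
 		wait_for_completion(&fsvq->in_flight_zero);
 	} else {
 		spin_unlock(&fsvq->lock);
 	}
 }

with the completion side doing, under fsvq->lock:

 	if (!--fsvq->in_flight)
 		complete(&fsvq->in_flight_zero);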


