Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-23 Thread Siwei Liu
On Fri, Feb 23, 2018 at 2:38 PM, Jiri Pirko  wrote:
> Fri, Feb 23, 2018 at 11:22:36PM CET, losewe...@gmail.com wrote:
>
> [...]
>

 No, that's not what I was talking about of course. I thought you
 mentioned the upgrade scenario this patch would like to address is to
 use the bypass interface "to take the place of the original virtio,
 and get udev to rename the bypass to what the original virtio_net
 was". That is one of the possible upgrade paths for sure. However the
 upgrade path I was seeking is to use the bypass interface to take the
 place of original VF interface while retaining the name and network
 configs, which generally can be done simply with kernel upgrade. It
 would become limiting as this patch makes the bypass interface share
 the same virtio pci device with virtio backup. Can this bypass
 interface be made general to take place of any pci device other than
 virtio-net? This will be more helpful as the cloud users who have
 existing setup on VF interface don't have to recreate it on virtio-net
 and VF separately again.
>
> How that could work? If you have the VF netdev with all configuration
> including IPs and routes and whatever - now you want to do migration
> so you add virtio_net and do some weird in-driver bonding with it. But
> then, VF disappears and the VF netdev with that and also all
> configuration it had.
> I don't think this scenario is valid.

We are talking about making udev aware of the new virtio-bypass to
rebind the name of the old VF interface with supposedly virtio-bypass
*post the kernel upgrade*. Of course, this needs virtio-net backend to
supply the [bdf] info where the VF/PT device was located.

-Siwei


>
>
>>>
>>>
>>> Yes. This sounds interesting. Looks like you want an existing VM image with
>>> VF only configuration to get transparent live migration support by adding
>>> virtio_net with BACKUP feature.  We may need another feature bit to switch
>>> between these 2 options.
>>
>>Yes, that's what I was thinking about. I have been building something
>>like this before, and would like to get back after merging with your
>>patch.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-23 Thread Stephen Hemminger
(pruned to reduce thread)

On Wed, 21 Feb 2018 16:17:19 -0800
Alexander Duyck  wrote:

> >>> FWIW two solutions that immediately come to mind are to export "backup"
> >>> as phys_port_name of the backup virtio link and/or assign a name to the
> >>> master like you are doing already.  I think team uses team%d and bond
> >>> uses bond%d, soft naming of master devices seems quite natural in this
> >>> case.  
> >>
> >> I figured I had overlooked something like that.. Thanks for pointing
> >> this out. Okay so I think the phys_port_name approach might resolve
> >> the original issue. If I am reading things correctly what we end up
> >> with is the master showing up as "ens1" for example and the backup
> >> showing up as "ens1nbackup". Am I understanding that right?
> >>
> >> The problem with the team/bond%d approach is that it creates a new
> >> netdevice and so it would require guest configuration changes.
> >>  
> >>> IMHO phys_port_name == "backup" if BACKUP bit is set on slave virtio
> >>> link is quite neat.  
> >>
> >> I agree. For non-"backup" virtio_net devices would it be okay for us to
> >> just return -EOPNOTSUPP? I assume it would be and that way the legacy
> >> behavior could be maintained although the function still exists.
> >>  
>  - When the 'active' netdev is unplugged OR not present on a destination
>    system after live migration, the user will see 2 virtio_net netdevs.  
> >>>
> >>> That's necessary and expected, all configuration applies to the master
> >>> so master must exist.  
> >>
> >> With the naming issue resolved this is the only item left outstanding.
> >> This becomes a matter of form vs function.
> >>
> >> The main complaint about the "3 netdev" solution is that it is a bit
> >> confusing to have the 2 netdevs present if the VF isn't there. The idea
> >> is that having the extra "master" netdev there if there isn't really a
> >> bond is a bit ugly.  
> >
> > Is it this uglier in terms of user experience rather than
> > functionality? I don't want it dynamically changed between 2-netdev
> > and 3-netdev depending on the presence of VF. That gets back to my
> > original question and suggestion earlier: why not just hide the lower
> > netdevs from udev renaming and such? Which important observability
> > benefits users may get if exposing the lower netdevs?
> >
> > Thanks,
> > -Siwei  
> 
> The only real advantage to a 2 netdev solution is that it looks like
> the netvsc solution, however it doesn't behave like it since there are
> some features like XDP that may not function correctly if they are
> left enabled in the virtio_net interface.
> 
> As far as functionality the advantage of not hiding the lower devices
> is that they are free to be managed. The problem with pushing all of
> the configuration into the upper device is that you are limited to the
> intersection of the features of the lower devices. This can be
> limiting for some setups as some VFs support things like more queues,
> or better interrupt moderation options than others so trying to make
> everything work with one config would be ugly.
> 


Let's not make XDP the blocker for doing the best solution
from the end user point of view. XDP is just yet another offload
thing which needs to be handled.  The current backup device solution
used in netvsc doesn't handle the full range of offload options
(things like flow direction, DCB, etc); no one but the HW vendors
seems to care.
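
For reference, the phys_port_name idea quoted above (report "backup" when
the BACKUP bit is set, return -EOPNOTSUPP otherwise so legacy naming is
kept) could look roughly like the userspace sketch below. Everything
prefixed sketch_ is hypothetical and only mirrors the shape of an
ndo_get_phys_port_name callback; the "n<phys_port_name>" suffix follows
the "ens1nbackup" example from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver state; not the real virtnet_info. */
struct sketch_vnet {
	bool has_backup;	/* BACKUP feature bit negotiated with the host */
};

/*
 * Same shape as an ndo_get_phys_port_name callback: fill buf and return 0,
 * or return -EOPNOTSUPP so that non-backup devices keep their legacy names.
 */
static int sketch_get_phys_port_name(const struct sketch_vnet *vi,
				     char *buf, size_t len)
{
	static const char name[] = "backup";

	assert(vi && buf);
	if (!vi->has_backup || sizeof(name) > len)
		return -EOPNOTSUPP;
	memcpy(buf, name, sizeof(name));
	return 0;
}

/* Userspace side: append "n<phys_port_name>" to the slot-based name. */
static int sketch_ifname(const char *slot_name, const struct sketch_vnet *vi,
			 char *out, size_t len)
{
	char port[16];
	int n;

	if (sketch_get_phys_port_name(vi, port, sizeof(port)) == 0)
		n = snprintf(out, len, "%sn%s", slot_name, port);
	else
		n = snprintf(out, len, "%s", slot_name);
	return (n < 0 || (size_t)n >= len) ? -EINVAL : 0;
}
```

With a slot name of "ens1", a backup-capable device would be named
"ens1nbackup" while a legacy virtio_net device stays "ens1", so the master
can take the plain name without a rename collision.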


Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-23 Thread Stephen Hemminger
On Thu, 22 Feb 2018 13:30:12 -0800
Alexander Duyck  wrote:

> > Again, I understand your motivation. Yet I don't like your solution.
> > But if the decision is made to do this in-driver bonding, I would like
> > to see it being done in some generic way:
> > 1) share the same "in-driver bonding core" code with netvsc
> >    put to net/core.
> > 2) the "in-driver bonding core" will strictly limit the functionality,
> >    like active-backup mode only, one vf, one backup, vf netdev type
> >    check (so no one could enslave a tap or anything else)
> > If user would need something more, he should employ team/bond.  

Sharing would be good, but netvsc world would really like to only have
one visible network device.
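
To make the quoted constraints concrete, here is a rough userspace sketch
of such a restricted "in-driver bonding core": active-backup only, one VF,
one backup, and a device type check. All sketch_ names are hypothetical;
the MAC match comes from the cover letter's description of the VF having
the same MAC as the virtio_net device. This is not code from the patch or
from netvsc.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sketch_dev_type { SKETCH_DEV_VF, SKETCH_DEV_VIRTIO, SKETCH_DEV_TAP };

struct sketch_lower {
	enum sketch_dev_type type;
	unsigned char mac[6];
	bool link_up;
};

struct sketch_bypass {
	unsigned char mac[6];
	struct sketch_lower *active;	/* the VF, preferred for transmit */
	struct sketch_lower *backup;	/* the virtio_net device */
};

/* Accept at most one VF and one virtio backup; reject everything else. */
static int sketch_enslave(struct sketch_bypass *bp, struct sketch_lower *dev)
{
	assert(bp && dev);
	if (memcmp(bp->mac, dev->mac, sizeof(bp->mac)) != 0)
		return -EINVAL;

	switch (dev->type) {
	case SKETCH_DEV_VF:
		if (bp->active)
			return -EBUSY;
		bp->active = dev;
		return 0;
	case SKETCH_DEV_VIRTIO:
		if (bp->backup)
			return -EBUSY;
		bp->backup = dev;
		return 0;
	default:
		return -EINVAL;	/* e.g. a tap device */
	}
}

/* Active-backup policy: use the VF while its link is up, else the backup. */
static struct sketch_lower *sketch_tx_dev(const struct sketch_bypass *bp)
{
	if (bp->active && bp->active->link_up)
		return bp->active;
	if (bp->backup && bp->backup->link_up)
		return bp->backup;
	return NULL;
}
```

Anything beyond this (more slaves, other modes) is deliberately left to
team/bond, as suggested in the quoted text.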


Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-23 Thread Jiri Pirko
Fri, Feb 23, 2018 at 11:22:36PM CET, losewe...@gmail.com wrote:

[...]

>>>
>>> No, that's not what I was talking about of course. I thought you
>>> mentioned the upgrade scenario this patch would like to address is to
>>> use the bypass interface "to take the place of the original virtio,
>>> and get udev to rename the bypass to what the original virtio_net
>>> was". That is one of the possible upgrade paths for sure. However the
>>> upgrade path I was seeking is to use the bypass interface to take the
>>> place of original VF interface while retaining the name and network
>>> configs, which generally can be done simply with kernel upgrade. It
>>> would become limiting as this patch makes the bypass interface share
>>> the same virtio pci device with virtio backup. Can this bypass
>>> interface be made general to take place of any pci device other than
>>> virtio-net? This will be more helpful as the cloud users who have
>>> existing setup on VF interface don't have to recreate it on virtio-net
>>> and VF separately again.

How that could work? If you have the VF netdev with all configuration
including IPs and routes and whatever - now you want to do migration
so you add virtio_net and do some weird in-driver bonding with it. But
then, VF disappears and the VF netdev with that and also all
configuration it had.
I don't think this scenario is valid.


>>
>>
>> Yes. This sounds interesting. Looks like you want an existing VM image with
>> VF only configuration to get transparent live migration support by adding
>> virtio_net with BACKUP feature.  We may need another feature bit to switch
>> between these 2 options.
>
>Yes, that's what I was thinking about. I have been building something
>like this before, and would like to get back after merging with your
>patch.


Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-23 Thread Siwei Liu
On Wed, Feb 21, 2018 at 6:35 PM, Samudrala, Sridhar
 wrote:
> On 2/21/2018 5:59 PM, Siwei Liu wrote:
>>
>> On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck
>>  wrote:
>>>
>>> On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu  wrote:

 I haven't checked emails for days and did not realize the new revision
 had already come out. And thank you for the effort, this revision
 really looks to be a step forward towards our use case and is close to
 what we wanted to do. A few questions in line.

 On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck
  wrote:
>
> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski  wrote:
>>
>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
>>>
>>> Patch 2 is in response to the community request for a 3 netdev
>>> solution.  However, it creates some issues we'll get into in a
>>> moment.
>>> It extends virtio_net to use alternate datapath when available and
>>> registered. When BACKUP feature is enabled, virtio_net driver creates
>>> an additional 'bypass' netdev that acts as a master device and
>>> controls
>>> 2 slave devices.  The original virtio_net netdev is registered as
>>> 'backup' netdev and a passthru/vf device with the same MAC gets
>>> registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
>>> associated with the same 'pci' device.  The user accesses the network
>>> interface via 'bypass' netdev. The 'bypass' netdev chooses 'active'
>>> netdev
>>> as default for transmits when it is available with link up and
>>> running.
>>
>> Thank you for doing this.
>>
>>> We noticed a couple of issues with this approach during testing.
>>> - As both 'bypass' and 'backup' netdevs are associated with the same
>>>virtio pci device, udev tries to rename both of them with the same
>>> name
>>>and the 2nd rename will fail. This would be OK as long as the
>>> first netdev
>>>to be renamed is the 'bypass' netdev, but the order in which udev
>>> gets
>>>to rename the 2 netdevs is not reliable.
>>
>> Out of curiosity - why do you link the master netdev to the virtio
>> struct device?
>
> The basic idea of all this is that we wanted this to work with an
> existing VM image that was using virtio. As such we were trying to
> make it so that the bypass interface takes the place of the original
> virtio and get udev to rename the bypass to what the original
> virtio_net was.

 Could it also be made possible to take over the config from VF instead
 of virtio on an existing VM image? And get udev to rename the bypass
 netdev to what the original VF was. I don't say tightly binding the
 bypass master to only virtio or VF, but I think we should provide both
 options to support different upgrade paths. Possibly we could tweak
 the device tree layout to reuse the same PCI slot for the master
 bypass netdev, such that udev would not get confused when renaming the
 device. The VF needs to use a different function slot afterwards.
 Perhaps we might need a special multiseat-like QEMU device for that
 purpose?

 In our case we'll upgrade the config from VF to virtio-bypass directly.
>>>
>>> So if I am understanding what you are saying you are wanting to flip
>>> the backup interface from the virtio to a VF. The problem is that
>>> becomes a bit of a vendor lock-in solution since it would rely on a
>>> specific VF driver. I would agree with Jiri that we don't want to go
>>> down that path. We don't want every VF out there firing up its own
>>> separate bond. Ideally you want the hypervisor to be able to manage
>>> all of this which is why it makes sense to have virtio manage this and
>>> why this is associated with the virtio_net interface.
>>
>> No, that's not what I was talking about of course. I thought you
>> mentioned the upgrade scenario this patch would like to address is to
>> use the bypass interface "to take the place of the original virtio,
>> and get udev to rename the bypass to what the original virtio_net
>> was". That is one of the possible upgrade paths for sure. However the
>> upgrade path I was seeking is to use the bypass interface to take the
>> place of original VF interface while retaining the name and network
>> configs, which generally can be done simply with kernel upgrade. It
>> would become limiting as this patch makes the bypass interface share
>> the same virtio pci device with virtio backup. Can this bypass
>> interface be made general to take place of any pci device other than
>> virtio-net? This will be more helpful as the cloud users who have
>> existing setup on VF interface don't have to recreate it on virtio-net
>> and VF separately again.
>
>
> Yes. This sounds interesting. Looks like 

v4.16-rc2: virtio-block + ext4 lockdep splats / sleeping from invalid context

2018-02-23 Thread Mark Rutland
Hi all,

While fuzzing arm64/v4.16-rc2 with syzkaller, I simultaneously hit a
number of splats in the block layer:

* inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-R} usage in
  jbd2_trans_will_send_data_barrier

* BUG: sleeping function called from invalid context at mm/mempool.c:320

* WARNING: CPU: 0 PID: 0 at block/blk.h:297
  generic_make_request_checks+0x670/0x750

... I've included the full splats at the end of the mail.

These all happen in the context of the virtio block IRQ handler, so I
wonder if this calls something that doesn't expect to be called from IRQ
context. Is it valid to call blk_mq_complete_request() or
blk_mq_end_request() from an IRQ handler?

Syzkaller came up with a minimized reproducer, but it's a bit wacky (the
fcntl and bpf calls should have no practical effect), and I haven't
managed to come up with a C reproducer.

Any ideas?

Thanks,
Mark.


Syzkaller reproducer:
# {Threaded:true Collide:true Repeat:false Procs:1 Sandbox:setuid Fault:false 
FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true 
WaitRepeat:false Debug:false Repro:false}
mmap(&(0x7f00/0x24000)=nil, 0x24000, 0x3, 0x32, 0x, 0x0)
r0 = openat(0xff9c, &(0x7f019000-0x8)='./file0\x00', 0x42, 0x0)
fcntl$setstatus(r0, 0x4, 0x1)
ftruncate(r0, 0x400)
io_setup(0x1f, 

Re: [PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse

2018-02-23 Thread Andy Shevchenko
On Mon, Jan 22, 2018 at 8:12 AM, Jan Kiszka  wrote:

>  #include 
>  #include 
>  #include 
> +#include 

Keep it in order?


>  #include 
>  #include 
>  #include 
> +#include 

Ditto.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] virtio_ring: fix num_free handling in error case

2018-02-23 Thread Cornelia Huck
On Fri, 23 Feb 2018 19:41:30 +0800
Tiwei Bie  wrote:

> The vq->vq.num_free hasn't been changed when error happens,
> so it shouldn't be changed when handling the error.
> 
> Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
> Cc: Andy Lutomirski 
> Cc: Michael S. Tsirkin 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tiwei Bie 
> ---
>  drivers/virtio/virtio_ring.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index eb30f3e09a47..71458f493cf8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -428,8 +428,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
>   }
>  
> - vq->vq.num_free += total_sg;
> -
>   if (indirect)
>   kfree(desc);
>  

Reviewed-by: Cornelia Huck 


[PATCH] virtio_ring: fix num_free handling in error case

2018-02-23 Thread Tiwei Bie
The vq->vq.num_free hasn't been changed when error happens,
so it shouldn't be changed when handling the error.

Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
Cc: Andy Lutomirski 
Cc: Michael S. Tsirkin 
Cc: sta...@vger.kernel.org
Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e09a47..71458f493cf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -428,8 +428,6 @@ static inline int virtqueue_add(struct virtqueue *_vq,
i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
}
 
-   vq->vq.num_free += total_sg;
-
if (indirect)
kfree(desc);
 
-- 
2.11.0
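
As an illustration of the invariant behind this fix, the following is a
simplified userspace model, not the real virtqueue_add(): the free
descriptor count is only reduced once every buffer has been mapped, so the
error path has nothing to give back. The sketch_ names and the fail_at
fault-injection parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NO_FAULT	(~0u)

struct sketch_vq {
	unsigned int num_free;	/* free descriptors in the ring */
};

/*
 * Model of the add path. fail_at is the index of the buffer whose mapping
 * fails, or SKETCH_NO_FAULT for the success case.
 */
static int sketch_add(struct sketch_vq *vq, unsigned int total_sg,
		      unsigned int fail_at)
{
	unsigned int mapped = 0;
	unsigned int i;

	assert(vq);
	if (vq->num_free < total_sg)
		return -ENOSPC;

	for (i = 0; i < total_sg; i++) {
		if (i == fail_at)
			goto unmap_release;
		mapped++;
	}

	/* The only place where num_free is reduced: success. */
	vq->num_free -= total_sg;
	return 0;

unmap_release:
	/* Undo the mappings made so far. */
	while (mapped > 0)
		mapped--;
	/*
	 * num_free was never reduced on this path, so adding total_sg back
	 * here (what the removed lines did) would inflate the count.
	 */
	return -EIO;
}
```

With num_free starting at 8, a failed add of 3 buffers leaves it at 8; with
the removed "+=" it would have become 11, more than the ring can hold.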



[PATCH RFC 2/2] virtio_ring: support packed ring

2018-02-23 Thread Tiwei Bie
Signed-off-by: Tiwei Bie 
---
 drivers/virtio/virtio_ring.c | 699 +--
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 618 insertions(+), 89 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e09a47..393778a2f809 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,14 +58,14 @@
 
 struct vring_desc_state {
void *data; /* Data for callback. */
-   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   void *indir_desc;   /* Indirect descriptor, if any. */
+   int num;/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
struct virtqueue vq;
 
-   /* Actual memory layout for this queue */
-   struct vring vring;
+   bool packed;
 
/* Can we use weak barriers? */
bool weak_barriers;
@@ -87,11 +87,28 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
-
-   /* Last written value to avail->idx in guest byte order */
-   u16 avail_idx_shadow;
+   union {
+   /* Available for split ring */
+   struct {
+   /* Actual memory layout for this queue */
+   struct vring vring;
+
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /* Last written value to avail->idx in
+* guest byte order */
+   u16 avail_idx_shadow;
+   };
+
+   /* Available for packed ring */
+   struct {
+   /* Actual memory layout for this queue */
+   struct vring_packed vring_packed;
+   u8 wrap_counter : 1;
+   bool chaining;
+   };
+   };
 
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
  cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-   struct vring_desc *desc)
+static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
 {
+   u64 addr;
+   u32 len;
u16 flags;
 
if (!vring_use_dma_api(vq->vq.vdev))
return;
 
-   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+   if (vq->packed) {
+   struct vring_packed_desc *desc = _desc;
+
+   addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+   len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+   } else {
+   struct vring_desc *desc = _desc;
+
+   addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+   len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+   }
 
if (flags & VRING_DESC_F_INDIRECT) {
dma_unmap_single(vring_dma_dev(vq),
-virtio64_to_cpu(vq->vq.vdev, desc->addr),
-virtio32_to_cpu(vq->vq.vdev, desc->len),
+addr, len,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
dma_unmap_page(vring_dma_dev(vq),
-  virtio64_to_cpu(vq->vq.vdev, desc->addr),
-  virtio32_to_cpu(vq->vq.vdev, desc->len),
+  addr, len,
   (flags & VRING_DESC_F_WRITE) ?
   DMA_FROM_DEVICE : DMA_TO_DEVICE);
}
@@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+  unsigned int total_sg,
+  gfp_t gfp)
 {
struct vring_desc *desc;
unsigned int i;
@@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-   struct scatterlist *sgs[],
-   unsigned int total_sg,
-   unsigned int out_sgs,
-   

[PATCH RFC 1/2] virtio: introduce packed ring defines

2018-02-23 Thread Tiwei Bie
Signed-off-by: Tiwei Bie 
---
 include/uapi/linux/virtio_config.h | 18 +-
 include/uapi/linux/virtio_ring.h   | 68 ++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..e3d077ef5207 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START   28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 37
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,20 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED   34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER  35
+
+/*
+ * This feature indicates that drivers pass extra data (besides
+ * identifying the Virtqueue) in their device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA 36
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..77b1d4aeef72 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT  4
 
+#define VRING_DESC_F_AVAIL(b)  ((b) << 7)
+#define VRING_DESC_F_USED(b)   ((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -104,6 +107,36 @@ struct vring {
struct vring_used *used;
 };
 
+struct vring_packed_desc_event {
+   /* Descriptor Event Offset */
+   __virtio16 desc_event_off   : 15,
+   /* Descriptor Event Wrap Counter */
+  desc_event_wrap  : 1;
+   /* Descriptor Event Flags */
+   __virtio16 desc_event_flags : 2;
+};
+
+struct vring_packed_desc {
+   /* Buffer Address. */
+   __virtio64 addr;
+   /* Buffer Length. */
+   __virtio32 len;
+   /* Buffer ID. */
+   __virtio16 id;
+   /* The flags depending on descriptor type. */
+   __virtio16 flags;
+};
+
+struct vring_packed {
+   unsigned int num;
+
+   struct vring_packed_desc *desc;
+
+   struct vring_packed_desc_event *driver;
+
+   struct vring_packed_desc_event *device;
+};
+
 /* Alignment requirements for vring elements.
  * When using pre-virtio 1.0 layout, these fall out naturally.
  */
@@ -171,4 +204,39 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+/* The standard layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed
+ * {
+ * // The actual descriptors (16 bytes each)
+ * struct vring_packed_desc desc[num];
+ *
+ * // Padding to the next align boundary.
+ * char pad[];
+ *
+ * // Driver Event Suppression
+ * struct vring_packed_desc_event driver;
+ *
+ * // Device Event Suppression
+ * struct vring_packed_desc_event device;
+ * };
+ */
+
+static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
+void *p, unsigned long align)
+{
+   vr->num = num;
+   vr->desc = p;
+   vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
+   * num + align - 1) & ~(align - 1));
+   vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_packed_size(unsigned int num, unsigned long align)
+{
+   return ((sizeof(struct vring_packed_desc) * num + align - 1)
+   & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.14.1
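
To sanity-check the layout arithmetic in vring_packed_size() and
vring_packed_init() above, here is a small userspace sketch of the same
computation. It assumes a 16-byte descriptor (as the comment in the patch
states) and a 4-byte event suppression structure (two 16-bit fields), and
it aligns the offset rather than the address, which gives the same result
as the patch only when the base pointer is itself aligned. The sketch_
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, see the note above. */
#define SKETCH_DESC_SIZE	16u
#define SKETCH_EVENT_SIZE	4u

/* Round x up to the next multiple of align (align must be a power of 2). */
static size_t sketch_align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Offset of the driver event suppression area from the ring base. */
static size_t sketch_driver_offset(unsigned int num, size_t align)
{
	return sketch_align_up((size_t)SKETCH_DESC_SIZE * num, align);
}

/* Total size: descriptors, padding, then driver and device event areas. */
static size_t sketch_packed_size(unsigned int num, size_t align)
{
	return sketch_driver_offset(num, align) + 2 * SKETCH_EVENT_SIZE;
}
```

For num = 256 and align = 4096 the descriptors fill exactly one page, so
the driver area starts at offset 4096 and the total is 4104 bytes; for
num = 3 and align = 64 the 48 bytes of descriptors are padded to 64 and the
total is 72 bytes.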



[PATCH RFC 0/2] Packed ring for virtio

2018-02-23 Thread Tiwei Bie
Hello everyone,

This RFC implements a subset of packed ring which is described at
https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd08.pdf

The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
Minor changes are needed for the vhost code, e.g. to kick the guest.

It's not a complete implementation, here is what's missing:

- Device area and driver area
- VIRTIO_RING_F_INDIRECT_DESC
- VIRTIO_F_NOTIFICATION_DATA
- Virtio devices except net are not tested
- See FIXME in the code for more details

Thanks!

Best regards,
Tiwei Bie

Tiwei Bie (2):
  virtio: introduce packed ring defines
  virtio_ring: support packed ring

 drivers/virtio/virtio_ring.c   | 699 -
 include/linux/virtio_ring.h|   8 +-
 include/uapi/linux/virtio_config.h |  18 +-
 include/uapi/linux/virtio_ring.h   |  68 
 4 files changed, 703 insertions(+), 90 deletions(-)

-- 
2.14.1
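
For readers new to the packed layout, here is a minimal sketch of how the
VRING_DESC_F_AVAIL and VRING_DESC_F_USED bits from patch 1 combine with a
wrap counter, as I read the spec draft linked above: the driver publishes a
descriptor with AVAIL equal to its wrap counter and USED set to the
inverse, and treats a descriptor as used once both bits equal the wrap
counter. The two macros are copied from the patch; the sketch_ helpers are
hypothetical and are not taken from patch 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_DESC_F_AVAIL(b)	((b) << 7)
#define VRING_DESC_F_USED(b)	((b) << 15)

/* Driver side: has the device marked this descriptor as used? */
static bool sketch_is_used(uint16_t flags, bool wrap)
{
	bool avail = (flags & VRING_DESC_F_AVAIL(1u)) != 0;
	bool used = (flags & VRING_DESC_F_USED(1u)) != 0;

	return avail == used && used == wrap;
}

/* Driver side: flag bits for making a descriptor available. */
static uint16_t sketch_avail_flags(bool wrap)
{
	uint16_t flags = (uint16_t)(VRING_DESC_F_AVAIL(wrap ? 1u : 0u) |
				    VRING_DESC_F_USED(wrap ? 0u : 1u));

	/* A freshly published descriptor must never already look used. */
	assert(!sketch_is_used(flags, wrap));
	return flags;
}
```

Because the expected bit values flip each time the ring wraps, stale flags
from the previous lap are not mistaken for new entries.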
