Hello Michael, 

Thanks for doing this. ;-)

> Le 16 sept. 2016 à 00:39, Michael S. Tsirkin <m...@redhat.com> a écrit :
> Let's start a discussion around an alternative ring layout.
> This has been in my kvm forum 2016 presentation.
> The idea is to have a r/w descriptor in a ring structure,
> replacing the used and available ring, index and descriptor
> buffer.
> * Descriptor ring:
> Guest adds descriptors with unique index values and DESC_HW set in flags.
> Host overwrites used descriptors with correct len, index, and DESC_HW
> clear.  Flags are always set/cleared last.
> #define DESC_HW 0x0080
> struct desc {
>        __le64 addr;
>        __le32 len;
>        __le16 index;
>        __le16 flags;
> };
> When DESC_HW is set, descriptor belongs to device. When it is clear,
> it belongs to the driver.

I think the simplified ring layout is a good idea. 
It reduces the number of indirections, and will probably help pre-fetching data 
in order to avoid cache misses.

I am not completely convinced by the flag-based approach instead of the 
index-based approach though.
With the index-based approach, you have only one-read to do in order
to know up to what point in the queue you can go.

With this flag, every-time you look at a new descriptor, before doing anything 
you will have to look at the flags.
Meaning that you will have to wait for the result of a test on a very newly 
fetched piece of memory.

So maybe it would be interesting to still have a used_index and available_index 
(put in different cache lines).

> * Scatter/gather support
> We can use 3 bits to set direction
> and to chain s/g entries in a request, same as virtio 1.0:
> /* This marks a buffer as continuing via the next field. */
> #define VRING_DESC_F_NEXT       1
> /* This marks a buffer as write-only (otherwise read-only). */
> #define VRING_DESC_F_WRITE      2
> Unlike virtio 1.0, all descriptors must have distinct ID
> values.
> * Indirect buffers
> Can be done like in virtio 1.0:
> /* This means the buffer contains a list of buffer descriptors. */
> In the descriptors in the indirect buffers, I think we should drop index
> field altogether, just put s/g entries one after the other.
> Also, length in the indirect descriptor should mark
> the list of the chain.
> virtio 1.0 seems to allow a s/g entry followed by
> an indirect descriptor. This does not seem useful,
> so let's not allow that anymore.

First of all, if I understand correctly, VRING_DESC_F_NEXT doesn't use a 'next' 
field, and rather just
say that the next descriptor in the queue is part of the same message. I think 
this is really, really, good.

Paolo proposed in a later mail to drop VRING_DESC_F_NEXT altogether, and I think
he is right at least in the sense that having two solutions for doing almost 
the same thing 
is not necessary (IMHO, even harmful).

But, VRING_DESC_F_INDIRECT adds a layer of indirection and makes prefetching a 
more complicated.
The reason why VRING_DESC_F_INDIRECT was useful in the first place is that 
virtio 1.0 
didn't have enough descriptors in the main queue (it is equal to the queue 

Therefore, ideally, I'd rather have a longer queue and use VRING_DESC_F_NEXT 
only and drop

And if delay spent in the queue is an issue. I think it's driver's job to avoid 
bufferbloat if necessary 
by adapting the number of descriptors is allows the device to use.

> * Batching descriptors:
> virtio 1.0 allows passing a batch of descriptors in both directions, by
> incrementing the used/avail index by values > 1.  We can support this by
> tagging first/middle/last descriptors in the flags field. For
> comptibility, polarity is reversed so bit is set for non-first and
> non-last descriptors in a batch:
> #define BATCH_NOT_FIRST 0x0010
> #define BATCH_NOT_LAST  0x0020
> In other words only descriptor in batch is 0000.
> Batch does not have to be processed together, so
> !first/!last flags can be changed when descriptor is used.

I don't exactly understand why this is necessary.
Is there any added value by knowing that a bunch of messages have been written 
the same time.

As mentioned above, I'd rather keep the index approach instead of those flags.

> * Processing descriptors in and out of order
> Device processing all descriptors in order can simply flip
> the DESC_HW bit as it is done with descriptors.
> Device can process descriptors out of order, and write them out in order
> as they are used, overwriting descriptors that are there.
> Device must not use a descriptor until DESC_HW is set.
> It is only required to look at the first descriptor
> submitted, but it is allowed to look anywhere
> in the ring. This might allow parallel processing.
> Driver must not overwrite a descriptor until DESC_HW is clear.
> It is only required to look at the first descriptor
> submitted, but it is allowed to look anywhere
> in the ring. This might allow parallel processing.
> * Interrupt/event suppression
> virtio 1.0 has two mechanisms for suppression but only
> one can be used at a time. Let's pack them together
> in a structure - one for interrupts, one for notifications:
> struct event {
>       __le16 idx;
>       __le16 flags;
> }
> Flags can be used like in virtio 1.0, by storing a special
> value there:
> #define VRING_F_EVENT_ENABLE  0x0
> or it can be used to switch to event idx:
> #define VRING_F_EVENT_IDX     0x2
> in that case, interrupt triggers when descriptor with
> a given index value is used.

Is there a reason why we would keep both ways and not just use the IDX one ?


- Pierre

Reply via email to