RE: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-29 Thread Lars Ganrot
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: 28. november 2017 15:37

> > If the device always returns buffers in order, then couldn't the driver
> > skip the step of reading the used.ring for read-only buffers (e.g. TX
> > for net devices)? The used.idx tells how many buffers were returned, and
> > since they are returned in the same order as the driver sent them, it
> > knows what their indices are. This would then save one cache-miss in the
> > old structure too.
> 
> True. That would be another variant to support though.
> 
> I doubt it'll outperform this one but I didn't test it specifically. Care
> trying to implement it?

Agreed, and I don't see it as competing with the packed ring. However, if
there is low-hanging fruit that improves performance of the non-ring
structure (in at least some significant use cases), it could be worth
considering as part of a rev 1.1 specification too. I'll see what can be
done for a prototype.

> 
> > And a follow-up question would then be: if a device always returns
> > buffers in order, does the v1.0 specification not require drivers to
> > reuse descriptors in the same order as they are returned? I think
> > 3.2.1.1 implies that at least. If so, wouldn't new descriptors always
> > be placed back-to-back in the descriptor table (contiguous memory)?
> 
> You probably mean this:
> 1. Get the next free descriptor table entry, d
> 
> and you interpret "next" here as "next in ring order".
> 
> I'm not sure everyone follows this interpretation though.
> 
> E.g. Linux does:
>
> static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>                        void **ctx)
> {
> ...
>         vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
>         vq->free_head = head;
>
> So descriptors are added at the head of the free list. "Next" is
> interpreted as next on this list. E.g. with a single request in flight,
> it looks like a single descriptor will keep getting reused.
> 

I guess there isn't an explicit enough requirement in v1.0 to claim right
or wrong with regard to this. Enforcing it could, however, be made part of
a driver requirement imposed by the new IN_ORDER feature bit. Thus the
IN_ORDER feature bit for the non-ring layout would be defined to require
that descriptor indices are always processed in order by both the device
and the driver.

My reason for this is to ensure that new descriptors are placed in a
contiguous range of the descriptor table, which should improve the L1$
prefetcher hit rate for batching, and also provide a means for efficient
DMA in case of HW offload. With knowledge of the number of elements in
each buffer, it might also be possible to calculate the descriptor index
range to DMA.

BR,

-Lars

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-28 Thread Michael S. Tsirkin
On Sat, Nov 25, 2017 at 07:55:53PM +, Lars Ganrot wrote:
> Hi Michael and others,
> 
> Hopefully my question below follows the forum format.
> 
> BR,
> 
> -Lars
> >
> > \subsection{In-order use of descriptors} \label{sec:Packed Virtqueues /
> > In-order use of descriptors}
> >
> > Some devices always use descriptors in the same order in which they
> > have been made available. These devices can offer the VIRTIO_F_IN_ORDER
> > feature. If negotiated, this knowledge allows devices to notify the use
> > of a batch of buffers to the driver by only writing out a single used
> > descriptor with the Buffer ID corresponding to the last descriptor in
> > the batch.
> >
> This VIRTIO_F_IN_ORDER feature bit answers one of my earlier questions;
> however, I'm curious whether it couldn't also be usable in the non-ring
> Virtqueue?

Yes, it could.

> If the device always returns buffers in order, then couldn't the driver skip 
> the step of reading the used.ring for read-only buffers  (e.g. TX for net 
> devices)? The used.idx tells how many buffers were returned, and since they 
> are returned in the same order as the driver sent them, it knows what their 
> indices are. This would then save one cache-miss in the old structure too.

True. That would be another variant to support though.

I doubt it'll outperform this one but I didn't test it
specifically. Care trying to implement it?

> And a follow-up question would then be: if a device always returns
> buffers in order, does the v1.0 specification not require drivers to
> reuse descriptors in the same order as they are returned? I think 3.2.1.1
> implies that at least. If so, wouldn't new descriptors always be placed
> back-to-back in the descriptor table (contiguous memory)?

You probably mean this:
1. Get the next free descriptor table entry, d

and you interpret "next" here as "next in ring order".

I'm not sure everyone follows this interpretation though.

E.g. Linux does:
static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
                       void **ctx)
{
...
        vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
        vq->free_head = head;

So descriptors are added at head of the free list.  Next is interpreted
as next on this list.  E.g. with a single request in flight, it looks
like a single descriptor will keep getting reused.

-- 
MST




Re: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-27 Thread Michael S. Tsirkin
On Fri, Nov 24, 2017 at 10:00:39AM +, Dhanoa, Kully wrote:
> I've embedded my original comments in the text below for archiving purposes.
> 
> Michael, 
> I could not find your responses to my comments in the document.
> 
> If you could provide them, it would help me to understand the proposal better.
> 
> Thanks
> Kully
> 
> -Original Message-
> From: virtio-dev@lists.oasis-open.org 
> [mailto:virtio-dev@lists.oasis-open.org] On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, November 22, 2017 2:56 PM
> To: virtio-dev@lists.oasis-open.org
> Subject: [virtio-dev] [PATCH RFC] packed ring layout spec v5
> 
> No functional changes since v4. Added some clarifications in response to 
> questions by Kully.
> 
> This is reasonably complete functionally, from spec point of view we need
> - more conformance statements
> - pseudo-code
> - discussion of memory barriers
> - rearrange existing (1.0) layout discussion to make it fit
>   in a single chapter
> 
> Kully, you sent a PDF with what looks like more feature suggestions.  For 
> example, a suggestion to use a separate memory location for used entries - 
> which as was suggested is also helpful for RDMA.
> 
> Please post them to list in text format, for now I would like to know whether 
> we can agree on the below and add more features on top gated by a feature bit.
> 
> 
> 
> 
> \section{Packed Virtqueues}\label{sec:Basic Facilities of a Virtio Device / 
> Packed Virtqueues}
> 
> Packed virtqueues are an alternative compact virtqueue layout using
> read-write memory, that is, memory that is both read and written by both
> host and guest.
> 
> Packed virtqueues support up to $2^{14}$ queues, with up to $2^{15}$
> entries each.
> 
> With current transports, queues are located in guest memory allocated by 
> driver.
> Each packed virtqueue consists of three parts:
> 
> \begin{itemize}
> \item Descriptor Ring
> \item Device Event Suppression
> \item Driver Event Suppression
> \end{itemize}
> 
> The Descriptor Ring in turn consists of descriptors, where each
> descriptor can contain the following parts:
> 
> \begin{itemize}
> \item Buffer ID
> \item Buffer Address
> \item Buffer Length
> \item Flags
> \end{itemize}
> 
> A buffer consists of zero or more device-readable physically-contiguous 
> elements followed by zero or more physically-contiguous device-writable 
> elements (each buffer has at least one element).
> 
> When the driver wants to send such a buffer to the device, it writes at least 
> one available descriptor describing elements of the buffer into the 
> Descriptor Ring.  The descriptor(s) are associated with a buffer by means of 
> a Buffer ID stored within the descriptor.
> 
> Driver then notifies the device. When the device has finished processing the 
> buffer, it writes a used device descriptor including the Buffer ID into the 
> Descriptor Ring (overwriting a driver descriptor previously made available), 
> and sends an interrupt.
> 
> Descriptor Ring is used in a circular manner: driver writes descriptors into 
> the ring in order. After reaching end of ring, the next descriptor is placed 
> at head of the ring.  Once ring is full of driver descriptors, driver stops 
> sending new requests and waits for device to start processing descriptors and 
> to write out some used descriptors before making new driver descriptors 
> available.
> 
> Similarly, device reads descriptors from the ring in order and detects that a 
> driver descriptor has been made available.  As processing of descriptors is 
> completed used descriptors are written by the device back into the ring.
> 
> Note: after reading driver descriptors and starting their processing in 
> order, device might complete their processing out of order.  Used device 
> descriptors are written in the order in which their processing is complete.
> 
> Device Event Suppression data structure is read-only by the device. It 
> includes information for reducing the number of device events - i.e. 
> interrupts to driver.
> 
> [KSD] Where is this structure located? VM memory OR device memory? I
> presume VM memory but text should explicitly say so.
>
> Driver Event Suppression data structure is write-only by the device. It
> includes information for reducing the number of driver events - i.e.
> notifications to device.
>
> [KSD] Similarly, where is this structure located? VM memory OR device
> memory? I presume VM memory but text should explicitly say so.
> 

A separate section dealing with memory barriers would be a good place to
include this.

We don't say this about 1.0 ring layout anywhere though, do we?

> \subsection{Available and Used Ring Wrap Counters} \label{sec:Packed
> Virtqueues / Available and Used Ring Wrap Counters} Each of the driver
> and the device is expected to maintain, internally, a single-bit ring
> wrap counter initialized to 1.
>
> The counter maintained by the driver is called the Available Ring Wrap
> Counter. Driver changes its value each time it makes 

RE: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-25 Thread Lars Ganrot
Hi Michael and others,

Hopefully my question below follows the forum format.

BR,

-Lars
> 
> \subsection{In-order use of descriptors} \label{sec:Packed Virtqueues / 
> In-order
> use of descriptors}
> 
> Some devices always use descriptors in the same order in which they have been
> made available. These devices can offer the VIRTIO_F_IN_ORDER feature. If
> negotiated, this knowledge allows devices to notify the use of a batch of 
> buffers
> to the driver by only writing out a single used descriptor with the Buffer ID
> corresponding to the last descriptor in the batch.
>  
This VIRTIO_F_IN_ORDER feature bit answers one of my earlier questions;
however, I'm curious whether it couldn't also be usable in the non-ring
Virtqueue?

If the device always returns buffers in order, then couldn't the driver skip 
the step of reading the used.ring for read-only buffers  (e.g. TX for net 
devices)? The used.idx tells how many buffers were returned, and since they are 
returned in the same order as the driver sent them, it knows what their indices 
are. This would then save one cache-miss in the old structure too.

And a follow-up question would then be: if a device always returns buffers
in order, does the v1.0 specification not require drivers to reuse
descriptors in the same order as they are returned? I think 3.2.1.1
implies that at least. If so, wouldn't new descriptors always be placed
back-to-back in the descriptor table (contiguous memory)?




Re: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-24 Thread Michael S. Tsirkin
On Fri, Nov 24, 2017 at 10:00:39AM +, Dhanoa, Kully wrote:
> I've embedded my original comments in the text below for archiving purposes.
> 
> Michael, 
> I could not find your responses to my comments in the document.
> 
> If you could provide them, it would help me to understand the proposal better.
> 
> Thanks
> Kully

Thanks, I'll respond. For the future, we use the following
format during discussions:

> Original text
Comment

You are also expected to snip large chunks of text on which you do not
comment. People are assumed to keep the whole thread around for reference.


-- 
MST




RE: [virtio-dev] [PATCH RFC] packed ring layout spec v5

2017-11-24 Thread Dhanoa, Kully
I've embedded my original comments in the text below for archiving purposes.

Michael, 
I could not find your responses to my comments in the document.

If you could provide them, it would help me to understand the proposal better.

Thanks
Kully

-Original Message-
From: virtio-dev@lists.oasis-open.org [mailto:virtio-dev@lists.oasis-open.org] 
On Behalf Of Michael S. Tsirkin
Sent: Wednesday, November 22, 2017 2:56 PM
To: virtio-dev@lists.oasis-open.org
Subject: [virtio-dev] [PATCH RFC] packed ring layout spec v5

No functional changes since v4. Added some clarifications in response to 
questions by Kully.

This is reasonably complete functionally, from spec point of view we need
- more conformance statements
- pseudo-code
- discussion of memory barriers
- rearrange existing (1.0) layout discussion to make it fit
  in a single chapter

Kully, you sent a PDF with what looks like more feature suggestions. For
example, a suggestion to use a separate memory location for used entries -
which, as was suggested, is also helpful for RDMA.

Please post them to the list in text format; for now I would like to know
whether we can agree on the below and add more features on top, gated by a
feature bit.




\section{Packed Virtqueues}\label{sec:Basic Facilities of a Virtio Device / 
Packed Virtqueues}

Packed virtqueues are an alternative compact virtqueue layout using
read-write memory, that is, memory that is both read and written by both
host and guest.

Packed virtqueues support up to $2^{14}$ queues, with up to $2^{15}$
entries each.

With current transports, queues are located in guest memory allocated by driver.
Each packed virtqueue consists of three parts:

\begin{itemize}
\item Descriptor Ring
\item Device Event Suppression
\item Driver Event Suppression
\end{itemize}

The Descriptor Ring in turn consists of descriptors, where each descriptor
can contain the following parts:

\begin{itemize}
\item Buffer ID
\item Buffer Address
\item Buffer Length
\item Flags
\end{itemize}

A buffer consists of zero or more device-readable physically-contiguous 
elements followed by zero or more physically-contiguous device-writable 
elements (each buffer has at least one element).

When the driver wants to send such a buffer to the device, it writes at least 
one available descriptor describing elements of the buffer into the Descriptor 
Ring.  The descriptor(s) are associated with a buffer by means of a Buffer ID 
stored within the descriptor.

Driver then notifies the device. When the device has finished processing the 
buffer, it writes a used device descriptor including the Buffer ID into the 
Descriptor Ring (overwriting a driver descriptor previously made available), 
and sends an interrupt.

Descriptor Ring is used in a circular manner: driver writes descriptors into 
the ring in order. After reaching end of ring, the next descriptor is placed at 
head of the ring.  Once ring is full of driver descriptors, driver stops 
sending new requests and waits for device to start processing descriptors and 
to write out some used descriptors before making new driver descriptors 
available.

Similarly, device reads descriptors from the ring in order and detects that a 
driver descriptor has been made available.  As processing of descriptors is 
completed used descriptors are written by the device back into the ring.

Note: after reading driver descriptors and starting their processing in order, 
device might complete their processing out of order.  Used device descriptors 
are written in the order in which their processing is complete.

Device Event Suppression data structure is read-only by the device. It includes 
information for reducing the number of device events - i.e. interrupts to 
driver.

[KSD] Where is this structure located? VM memory OR device memory? I presume VM 
memory but text should explicitly say so.

Driver Event Suppression data structure is write-only by the device. It 
includes information for reducing the number of driver events - i.e. 
notifications to device.

[KSD] Similarly, where is this structure located? VM memory OR device memory? I 
presume VM memory but text should explicitly say so.


\subsection{Available and Used Ring Wrap Counters} \label{sec:Packed Virtqueues
/ Available and Used Ring Wrap Counters} Each of the driver and the device is
expected to maintain, internally, a single-bit ring wrap counter initialized to
1.

The counter maintained by the driver is called the Available Ring Wrap Counter.
Driver changes its value each time it makes available the last descriptor in
the ring (after making the last descriptor available).

The counter maintained by the device is called the Used Ring Wrap Counter.  
Device changes its value each time it uses the last descriptor in the ring 
(after marking the last descriptor used).

It is easy to see that the Available Ring Wrap Counter in the driver matches
the Used Ring Wrap Counter in the device when both are processing the same