Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
On Tue, Feb 27, 2018 at 09:49:28AM +0800, Tiwei Bie wrote:
> struct {
>   __le16 desc_event_off : 15,
>  desc_event_wrap : 1;
>   __le16 desc_event_flags : 2;
> };

I decided on this format in the end. Thanks for the suggestion!

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH] introduction: document bitfield notation

2018-02-28 Thread Michael S. Tsirkin
Bitfields are a useful and familiar way to specify sub-byte structure
layout. The only issue is that bitfield order isn't portable across
architectures.  Document that we list bitfields from least to
most significant one, and warn about portability issues.

Signed-off-by: Michael S. Tsirkin 
---
 introduction.tex | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/introduction.tex b/introduction.tex
index 979881e..3cb7a70 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -157,5 +157,23 @@ in little-endian byte order.
 in big-endian byte order.
 \end{description}
 
+When documenting sub-byte data fields, C-like bitfield notation
+is used. Fields within an integer are always listed in order,
+from the least significant to the most significant bit.
+
+For example:
+\begin{lstlisting}
+be16 A : 15;
+be16 B : 1;
+\end{lstlisting}
+documents the value A stored in the low 15 bit of a 16 bit
+integer and the value B stored in the high bit of the 16 bit
+integer, the integer in turn using the big-endian byte order.
+
+Note that this notation typically matches the way bitfields are
+packed by C compilers on little-endian architectures but not the
+way bitfields are packed by C compilers on big-endian
+architectures.
+
 \newpage
 
-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-28 Thread Michael S. Tsirkin
On Wed, Feb 28, 2018 at 08:25:01PM +0100, Jiri Pirko wrote:
> Wed, Feb 28, 2018 at 04:45:39PM CET, m...@redhat.com wrote:
> >On Wed, Feb 28, 2018 at 04:11:31PM +0100, Jiri Pirko wrote:
> >> Wed, Feb 28, 2018 at 03:32:44PM CET, m...@redhat.com wrote:
> >> >On Wed, Feb 28, 2018 at 08:08:39AM +0100, Jiri Pirko wrote:
> >> >> Tue, Feb 27, 2018 at 10:41:49PM CET, kubak...@wp.pl wrote:
> >> >> >On Tue, 27 Feb 2018 13:16:21 -0800, Alexander Duyck wrote:
> >> >> >> Basically we need some sort of PCI or PCIe topology mapping for the
> >> >> >> devices that can be translated into something we can communicate over
> >> >> >> the communication channel. 
> >> >> >
> >> >> >Hm.  This is probably a completely stupid idea, but if we need to
> >> >> >start marshalling configuration requests/hints maybe the entire problem
> >> >> >could be solved by opening a netlink socket from hypervisor?  Even make
> >> >> >teamd run on the hypervisor side...
> >> >> 
> >> >> Interesting. That would be more trickier then just to fwd 1 genetlink
> >> >> socket to the hypervisor.
> >> >> 
> >> >> Also, I think that the solution should handle multiple guest oses. What
> >> >> I'm thinking about is some generic bonding description passed over some
> >> >> communication channel into vm. The vm either use it for configuration,
> >> >> or ignores it if it is not smart enough/updated enough.
> >> >
> >> >For sure, we could build virtio-bond to pass that info to guests.
> >> 
> >> What do you mean by "virtio-bond". virtio_net extension?
> >
> >I mean a new device supplying topology information to guests,
> >with updates whenever VMs are started, stopped or migrated.
> 
> Good. Any idea how that device would look like? Also, any idea how to
> handle in in kernel and how to pass along this info to userspace?
> Is there anything similar out there?
> 
> Thanks!

E.g. balloon is used to pass hints about amount of memory
guest should use. We could do something similar.

I imagine device can send a configuration interrupt
on each topology change. Kernel wakes up userspace pollers.
Userspace starts doing reads from a char device and
figures out what changed.

Which info is needed there? I am not sure.
How about list of MAC/VLAN addresses coupled to list of
devices to queue on (specified by mac? by PCI address)?

Or do we ever need to go higher level and make decisions
based on IP addresses as well?

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-02-28 Thread Alexander Duyck
On Wed, Feb 28, 2018 at 2:59 PM, Alex Williamson
 wrote:
> On Wed, 28 Feb 2018 09:49:21 -0800
> Alexander Duyck  wrote:
>
>> On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
>>  wrote:
>> > On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
>> >  wrote:
>> >> On Tue, 27 Feb 2018 11:06:54 -0800
>> >> Alexander Duyck  wrote:
>> >>
>> >>> From: Alexander Duyck 
>> >>>
>> >>> This patch is meant to add support for SR-IOV on devices when the VFs are
>> >>> not managed by the kernel. Examples of recent patches attempting to do 
>> >>> this
>> >>> include:
>> >>
>> >> It appears to enable sriov when the _pf_ is not managed by the
>> >> kernel, but by "managed" we mean that either there is no pf driver or
>> >> the pf driver doesn't provide an sriov_configure callback,
>> >> intentionally or otherwise.
>> >>
>> >>> virto - https://patchwork.kernel.org/patch/10241225/
>> >>> pci-stub - https://patchwork.kernel.org/patch/10109935/
>> >>> vfio - https://patchwork.kernel.org/patch/10103353/
>> >>> uio - https://patchwork.kernel.org/patch/9974031/
>> >>
>> >> So is the goal to get around the issues with enabling sriov on each of
>> >> the above drivers by doing it under the covers or are you really just
>> >> trying to enable sriov for a truly unmanage (no pf driver) case?  For
>> >> example, should a driver explicitly not wanting sriov enabled implement
>> >> a dummy sriov_configure function?
>> >>
>> >>> Since this is quickly blowing up into a multi-driver problem it is 
>> >>> probably
>> >>> best to implement this solution in one spot.
>> >>>
>> >>> This patch is an attempt to do that. What we do with this patch is 
>> >>> provide
>> >>> a generic call to enable SR-IOV in the case that the PF driver is either
>> >>> not present, or the PF driver doesn't support configuring SR-IOV.
>> >>>
>> >>> A new sysfs value called sriov_unmanaged_autoprobe has been added. This
>> >>> value is used as the drivers_autoprobe setting of the VFs when they are
>> >>> being managed by an external entity such as userspace or device firmware
>> >>> instead of being managed by the kernel.
>> >>
>> >> Documentation/ABI/testing/sysfs-bus-pci update is missing.
>> >
>> > I can make sure to update that in the next version.
>> >
>> >>> One side effect of this change is that the sriov_drivers_autoprobe and
>> >>> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is
>> >>> disabled. Attempts to update them when SR-IOV is in use will only update
>> >>> the local value and will not update sriov->autoprobe.
>> >>
>> >> And we expect users to understand when sriov_drivers_autoprobe applies
>> >> vs sriov_unmanaged_autoprobe, even though they're using the same
>> >> interfaces to enable sriov?  Are all combinations expected to work, ex.
>> >> unmanaged sriov is enabled, a native pf driver loads, vfs work?  Not
>> >> only does it seems like there's opportunity to use this incorrectly, I
>> >> think maybe it might be difficult to use correctly.
>> >>
>> >>> I based my patch set originally on the patch by Mark Rustad but there 
>> >>> isn't
>> >>> much left after going through and cleaning out the bits that were no 
>> >>> longer
>> >>> needed, and after incorporating the feedback from David Miller.
>> >>>
>> >>> I have included the authors of the original 4 patches above in the Cc 
>> >>> here.
>> >>> My hope is to get feedback and/or review on if this works for their use
>> >>> cases.
>> >>>
>> >>> Cc: Mark Rustad 
>> >>> Cc: Maximilian Heyne 
>> >>> Cc: Liang-Min Wang 
>> >>> Cc: David Woodhouse 
>> >>> Signed-off-by: Alexander Duyck 
>> >>> ---
>> >>>  drivers/pci/iov.c|   27 +++-
>> >>>  drivers/pci/pci-driver.c |2 +
>> >>>  drivers/pci/pci-sysfs.c  |   62 
>> >>> +-
>> >>>  drivers/pci/pci.h|4 ++-
>> >>>  include/linux/pci.h  |1 +
>> >>>  5 files changed, 86 insertions(+), 10 deletions(-)
>> >>>
>> >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> >>> index 677924ae0350..7b8858bd8d03 100644
>> >>> --- a/drivers/pci/iov.c
>> >>> +++ b/drivers/pci/iov.c
>> >>> @@ -446,6 +446,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>> >>>   pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, >vf_device);
>> >>>   iov->pgsz = pgsz;
>> >>>   iov->self = dev;
>> >>> + iov->autoprobe = true;
>> >>>   iov->drivers_autoprobe = true;
>> >>>   pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, >cap);
>> >>>   pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, >link);
>> >>> @@ -643,8 +644,11 @@ void pci_restore_iov_state(struct pci_dev *dev)
>> >>>   */
>> >>>  void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool auto_probe)
>> >>>  {

Re: [virtio-dev] Re: [virtio] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
On Tue, Feb 27, 2018 at 06:03:01PM +0100, Halil Pasic wrote:
> 
> 
> On 02/27/2018 03:11 PM, Michael S. Tsirkin wrote:
> >> [..]
> > +
> > +\devicenormative{\subsection}{The Virtqueue Descriptor Table}{Basic 
> > Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue 
> > Descriptor Table}
> > +A device MUST NOT write to a device-readable buffer, and a device 
> > SHOULD NOT
> > +read a device-writable buffer.
> > +A device MUST NOT use a descriptor unless it observes
> > +VIRTQ_DESC_F_AVAIL bit in its \field{flags} being changed.
>  I don't really understand this. How does the device observe
>  the VIRTQ_DESC_F_AVAIL bit being changed?
> >>> By reading the descriptor.
> >>>
> >> :) My point is: to observe a change one usually either needs at
> >> least one reading before and at least one reading after the change,
> >> or one needs to know that a certain reading means change. The latter
> >> is possible if we know that at the beginning of the time frame under
> >> consideration (t_0) only a certain set of values,let's say B like before,
> >> is possible, and after the change only a certain other set of values
> >> let's say A like after, is possible, and A and B are disjunctive (
> >> $A \cap B = \emtyset$).
> > Well each descriptor is read each time ring wraps around,
> > and the bit value changes each time ring wraps around.
> > For example device knows it's zero initialized so
> > if it reads bit value as 1 it knows the bit value has changed.
> > 
> > 
> 
> Yeah I kind of understand but I would like having a more straightforward
> formulation here (than changes).
> 
> BTW does this mean that the vhost implementation (that is:
> 
> +static bool desc_is_avail(struct vhost_virtqueue *vq,
> +   struct vring_desc_packed *desc)
> +{
> + if (vq->used_wrap_counter)
> + if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
> + return true;
> + if (vq->used_wrap_counter == false)
> + if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
> + return true;
> +
> + return false;
> +}
> 
> ) is needlessly looking at the 'used' bit? (I think that is the case.)
> 
> Bottom line is: I would like avail/used protocol described in a less
> ambiguous fashion.
> 
> However if I'm the only one who finds this aspect hard to understand,
> the problem probably lies with me and not with the text. I can accept
> that too.

I don't want to over-specify it. There are many options.
For example, if driver sets ID to a value != 0 then
when it sees ID != 0 it knows it has been used.

I added pseudo-code for the driver, hopefully that is sufficient.

> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio] Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
On Wed, Feb 28, 2018 at 02:59:51PM +0100, Halil Pasic wrote:
> 
> 
> On 02/28/2018 02:42 PM, Jens Freimann wrote:
> > On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:
> >>
> >>
> >> On 02/27/2018 11:23 AM, Jens Freimann wrote:
> >>> On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:
>  On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:
> > > +?? vq->driver_event.flags = 0x1;
> > > +?? memory_barrier();
> > > +
> > > +?? flags = d->flags;
> > > +?? bool avail = flags & (1 << 
> > > VIRTQ_DESC_F_AVAIL);
> > > +?? bool used = flags & (1 << 
> > > VIRTQ_DESC_F_USED);
> > > +?? if (avail != used) {
> > > +?? break;
> > > +?? }
> > > +
> > > +?? vq->driver_event.flags = 0x2;
> > > +?? }
> > > +
> > > +?? read_memory_barrier();
> >
> > Now with the condition avail != used a freshly (that is zero 
> > initialized)
> > ring would appear all used. And we would do process_buffer(d) for the
> > whole ring if this code happens to get executed. Do we have to make
> > sure that this does not happen?
> 
>  I'll have to think about this.
> >>>
> >>> With the wrap counter initialized to 1 descriptors would not be seen
> >>> as used.
> >>
> >> Looking at the code... In vhost:
> >>
> >> +static bool desc_is_avail(struct vhost_virtqueue *vq,
> >> +  struct vring_desc_packed *desc)
> >> +{
> >> +    if (vq->used_wrap_counter)
> >> +    if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
> >> +    return true;
> >> +    if (vq->used_wrap_counter == false)
> >> +    if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
> >> +    return true;
> >> +
> >> +    return false;
> >> +}
> >>
> >> Here the bit pattern corresponding to available depends on the
> >> value of the wrap counter. I kind of anticipated this, but I could not
> >> find it defined in this spec.
> >>
> >> OTOH in guest:
> >>
> >> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> >> +{
> >> +    u16 last_used, flags;
> >> +    bool avail, used;
> >> +
> >> +    if (vq->vq.num_free == vq->vring.num)
> >> +    return false;
> >> +
> >> +    last_used = vq->last_used_idx;
> >> +    flags = virtio16_to_cpu(vq->vq.vdev,
> >> +    vq->vring_packed.desc[last_used].flags);
> >> +    avail = flags & VRING_DESC_F_AVAIL(1);
> >> +    used = flags & VRING_DESC_F_USED(1);
> >> +
> >> +    return avail == used;
> >> +}
> >>
> >> if the next to be used descriptor is actually used does not depend on
> >> any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
> >> condition. This extra condition is basically 'there are outstanding
> >> descriptors' and those are obviously either 'available' or yet to be 
> >> observed
> >> 'used' descriptors. Right after initialization is covered by this extra
> >> condition. And obviously if the descriptor in question is still available
> >> flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
> >> with the extra condition we are right there where we want to be.
> >>
> >> But I could not find the extra condition in the spec.
> >>
> >> With that said, I also want to point out that I don't understand
> >> your statement Jens. I don't see a way to express the condition 
> >> corresponding
> >> to more_used_packed() using the driver wrap counter (avail_wrap_count).
> >> Of course a wrap counter that wraps when last_used wraps could be used
> >> to (but no such counter is mentioned here AFAIU).
> > 
> > Not sure I get this.
> > I was merely saying that when descriptor flags are initialized to 0
> > and the wrap counters to 1, then it is not the case that the driver
> > would see all descriptors as used because it takes the wrap counter
> > into account.
> > 
> 
> Please point me to the paragraph where it's written  how is the wrap
> counter to be taken into account when trying to determine if the
> desc_ring[last_used] (the descriptor we are polling) is used or not.
> 
> Nothing like that being specified (or at least I could not find it)
> was actually what I got hooked on.
> 
> Regards,
> Halil

So the spec just says this: if you see avail flag change,
descriptor is available. If you see used flag change, it
is used.

As value of the flag is also known (equals the wrap bit)
that is one way to detect change.


> 
> 
> 
> -
> To unsubscribe from this mail list, you must leave the OASIS TC that 
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


[virtio-dev] Re: [virtio] Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:
> 
> 
> On 02/27/2018 11:23 AM, Jens Freimann wrote:
> > On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:
> >> On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:
> >>> > +    vq->driver_event.flags = 0x1;
> >>> > +    memory_barrier();
> >>> > +
> >>> > +    flags = d->flags;
> >>> > +    bool avail = flags & (1 << VIRTQ_DESC_F_AVAIL);
> >>> > +    bool used = flags & (1 << VIRTQ_DESC_F_USED);
> >>> > +    if (avail != used) {
> >>> > +    break;
> >>> > +    }
> >>> > +
> >>> > +    vq->driver_event.flags = 0x2;
> >>> > +    }
> >>> > +
> >>> > +    read_memory_barrier();
> >>>
> >>> Now with the condition avail != used a freshly (that is zero initialized)
> >>> ring would appear all used. And we would do process_buffer(d) for the
> >>> whole ring if this code happens to get executed. Do we have to make
> >>> sure that this does not happen?
> >>
> >> I'll have to think about this.
> > 
> > With the wrap counter initialized to 1 descriptors would not be seen
> > as used.
> 
> Looking at the code... In vhost:
> 
> +static bool desc_is_avail(struct vhost_virtqueue *vq,
> +   struct vring_desc_packed *desc)
> +{
> + if (vq->used_wrap_counter)
> + if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
> + return true;
> + if (vq->used_wrap_counter == false)
> + if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
> + return true;
> +
> + return false;
> +}
> 
> Here the bit pattern corresponding to available depends on the
> value of the wrap counter. I kind of anticipated this, but I could not
> find it defined in this spec.
> 
> OTOH in guest:
> 
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> + u16 last_used, flags;
> + bool avail, used;
> +
> + if (vq->vq.num_free == vq->vring.num)
> + return false;
> +
> + last_used = vq->last_used_idx;
> + flags = virtio16_to_cpu(vq->vq.vdev,
> + vq->vring_packed.desc[last_used].flags);
> + avail = flags & VRING_DESC_F_AVAIL(1);
> + used = flags & VRING_DESC_F_USED(1);
> +
> + return avail == used;
> +}
> 
> if the next to be used descriptor is actually used does not depend on
> any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
> condition. This extra condition is basically 'there are outstanding
> descriptors' and those are obviously either 'available' or yet to be observed
> 'used' descriptors. Right after initialization is covered by this extra
> condition. And obviously if the descriptor in question is still available
> flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
> with the extra condition we are right there where we want to be.
> 
> But I could not find the extra condition in the spec.
> 
> With that said, I also want to point out that I don't understand
> your statement Jens. I don't see a way to express the condition corresponding
> to more_used_packed() using the driver wrap counter (avail_wrap_count).
> Of course a wrap counter that wraps when last_used wraps could be used
> to (but no such counter is mentioned here AFAIU).

I added such a counter in the pseudo-code.


> >>
> >>
> >>> I was under the impression that this whole wrap counter exercise is
> >>> to be able to distinguish these cases.
> >>>
> >>> BTW tools/virtio/ringtest/ring.c has a single flag bit to indicate
> >>> available/used and does not have these wrap counters AFAIR.
> >>
> >> A single flag is fine if there's not s/g support and all descriptors are
> >> written out.  Wrap counters are needed if we are to support skipping
> >> descriptors because of s/g or in order.
> >>
> >>
> >>> Also for split virtqueues  a descriptor has three possible states:
> >>> * available
> >>> * used
> >>> * free
> >>>
> >>> I wonder if it's the same for packed, and if, how do I recognize
> >>> free descriptors (that is descriptors that are neither available
> >>> nor used.
> >>
> >> I'll think about this.
> >>
> >>> I'm pretty much confused on how this scheme with the available
> >>> and used wrap counters (or device and driver wrap counters is
> >>> supposed to work). A working implementation in C would really help
> >>> me to understand this.
> >>
> >> DPDK based implementation has been posted.
> > 
> > vhost and guest drivers have also been posted.
> > guest: https://lkml.org/lkml/2018/2/23/242
> > vhost: https://lkml.org/lkml/2018/2/13/1102
> > 
> 
> Thanks a lot. I've already found vhost myself but you saved me
> some searching with the other one ;).
> 
> > regards,
> > Jens
> >>
> >>> > +    process_buffer(d);
> >>> > +    vq->next_used++;
> >>> > +    if (vq->next_used >= vq->size) {
> >>> > +    vq->next_used = 0;
> 

[virtio-dev] [PATCH v9 10/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
Performance analysis of this is in my kvm forum 2016 presentation.  The
idea is to have a r/w descriptor in a ring structure, replacing the used
and available ring, index and descriptor buffer.

This is also easier for devices to implement than the 1.0 layout.
Several more enhancements will be necessary to actually make this
efficient for devices to use.

Signed-off-by: Michael S. Tsirkin 
Acked-by: Cornelia Huck 
---
 content.tex |  28 ++-
 packed-ring.tex | 680 
 2 files changed, 705 insertions(+), 3 deletions(-)
 create mode 100644 packed-ring.tex

diff --git a/content.tex b/content.tex
index e1e30a0..73f40b7 100644
--- a/content.tex
+++ b/content.tex
@@ -263,8 +263,20 @@ these parts (following \ref{sec:Basic Facilities of a 
Virtio Device / Split Virt
 
 \end{note}
 
+Two formats are supported: Split Virtqueues (see \ref{sec:Basic
+Facilities of a Virtio Device / Split
+Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device /
+Split Virtqueues}) and Packed Virtqueues (see \ref{sec:Basic
+Facilities of a Virtio Device / Packed
+Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device /
+Packed Virtqueues}).
+
+Every driver and device supports either the Packed or the Split
+Virtqueue format, or both.
+
 \input{split-ring.tex}
 
+\input{packed-ring.tex}
 \chapter{General Initialization And Device Operation}\label{sec:General 
Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
@@ -5215,10 +5227,15 @@ Currently these device-independent feature bits defined:
 \begin{description}
   \item[VIRTIO_F_RING_INDIRECT_DESC (28)] Negotiating this feature indicates
   that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
-  flag set, as described in \ref{sec:Basic Facilities of a Virtio Device / 
Virtqueues / The Virtqueue Descriptor Table / Indirect 
Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Descriptor Table / Indirect Descriptors}.
-
+  flag set, as described in \ref{sec:Basic Facilities of a Virtio
+Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
+Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather 
Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather 
Support}.
   \item[VIRTIO_F_RING_EVENT_IDX(29)] This feature enables the 
\field{used_event}
-  and the \field{avail_event} fields as described in \ref{sec:Basic Facilities 
of a Virtio Device / Virtqueues / Virtqueue Interrupt Suppression} and 
\ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used 
Ring}.
+  and the \field{avail_event} fields as described in
+\ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue 
Interrupt Suppression}, \ref{sec:Basic Facilities of a Virtio Device / 
Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver 
and Device Event Suppression}.
+
 
   \item[VIRTIO_F_VERSION_1(32)] This indicates compliance with this
 specification, giving a simple way to detect legacy devices or drivers.
@@ -5228,6 +5245,9 @@ Currently these device-independent feature bits defined:
   addresses in memory.  If this feature bit is set to 0, then the device emits
   physical addresses which are not translated further, even though an IOMMU
   may be present.
+  \item[VIRTIO_F_RING_PACKED(34)] This feature indicates
+  support for the packed virtqueue layout as described in
+  \ref{sec:Basic Facilities of a Virtio Device / Packed 
Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed 
Virtqueues}.
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -5241,6 +5261,8 @@ passed to the device into physical addresses in memory.  
If
 VIRTIO_F_IOMMU_PLATFORM is not offered, then a driver MUST pass only physical
 addresses to the device.
 
+A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
diff --git a/packed-ring.tex b/packed-ring.tex
new file mode 100644
index 000..2943df0
--- /dev/null
+++ b/packed-ring.tex
@@ -0,0 +1,680 @@
+\section{Packed Virtqueues}\label{sec:Basic Facilities of a Virtio Device / 
Packed Virtqueues}
+
+Packed virtqueues is an alternative compact virtqueue layout using
+read-write memory, that is memory that is both read and written
+by both host and guest.
+
+Use of packed virtqueues is negotiated by the VIRTIO_F_RING_PACKED
+feature bit.
+
+Packed virtqueues support up to $2^{15}$ entries each.
+
+With current transports, virtqueues are located in guest memory
+allocated by driver.
+Each packed virtqueue consists of three parts:
+

[virtio-dev] [PATCH v9 01/16] introduction: document bitfield notation

2018-02-28 Thread Michael S. Tsirkin
Bitfields are a useful and familiar way to specify sub-byte structure
layout. The only issue is that bitfield order isn't portable across
architectures.  Document that we list bitfields from least to
most significant one, and warn about portability issues.

Signed-off-by: Michael S. Tsirkin 
---
 introduction.tex | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/introduction.tex b/introduction.tex
index 979881e..3cb7a70 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -157,5 +157,23 @@ in little-endian byte order.
 in big-endian byte order.
 \end{description}
 
+When documenting sub-byte data fields, C-like bitfield notation
+is used. Fields within an integer are always listed in order,
+from the least significant to the most significant bit.
+
+For example:
+\begin{lstlisting}
+be16 A : 15;
+be16 B : 1;
+\end{lstlisting}
+documents the value A stored in the low 15 bit of a 16 bit
+integer and the value B stored in the high bit of the 16 bit
+integer, the integer in turn using the big-endian byte order.
+
+Note that this notation typically matches the way bitfields are
+packed by C compilers on little-endian architectures but not the
+way bitfields are packed by C compilers on big-endian
+architectures.
+
 \newpage
 
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 07/16] content: generalize rest of text

2018-02-28 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
 content.tex | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/content.tex b/content.tex
index 9fc9673..5634c7d 100644
--- a/content.tex
+++ b/content.tex
@@ -1467,8 +1467,7 @@ All register values are organized as Little Endian.
   }
   \hline 
   \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
-Queue size is the number of elements in the queue, therefore in each
-of the Descriptor Table, the Available Ring and the Used Ring.
+Queue size is the number of elements in the queue.
 Writing to this register notifies the device what size of the
 queue the driver will use. This applies to the queue selected by
 writing to \field{QueueSel}.
@@ -1491,9 +1490,9 @@ All register values are organized as Little Endian.
 caused the device interrupt to be asserted.
 The following events are possible:
 \begin{description}
-  \item[Used Ring Update] - bit 0 - the interrupt was asserted
-because the device has updated the Used
-Ring in at least one of the active virtual queues.
+  \item[Used Buffer Update] - bit 0 - the interrupt was asserted
+because the device has used a buffer
+in at least one of the active virtual queues.
   \item [Configuration Change] - bit 1 - the interrupt was
 asserted because the configuration of the device has changed.
 \end{description}
@@ -1642,9 +1641,8 @@ The driver will typically initialize the virtual queue in 
the following way:
\field{QueueNumMax}. If the returned value is zero (0x0) the
queue is not available.
 
-\item Allocate and zero the queue pages, making sure the memory
-   is physically contiguous. It is recommended to align the
-   Used Ring to an optimal boundary (usually the page size).
+\item Allocate and zero the queue memory, making sure the memory
+   is physically contiguous.
 
 \item Notify the device about the queue size by writing the size to
\field{QueueNum}.
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 13/16] split-ring: in order feature

2018-02-28 Thread Michael S. Tsirkin
For a split ring, require that drivers use descriptors in order too.
This allows devices to skip reading the available ring.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
Reviewed-by: Stefan Hajnoczi 
---
 split-ring.tex | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/split-ring.tex b/split-ring.tex
index 87ecee2..df278fe 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -203,6 +203,10 @@ struct virtq_desc {
 The number of descriptors in the table is defined by the queue size
 for this virtqueue: this is the maximum possible descriptor chain length.
 
+If VIRTIO_F_IN_ORDER has been negotiated, driver uses
+descriptors in ring order: starting from offset 0 in the table,
+and wrapping around at the end of the table.
+
 \begin{note}
 The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
 referred to this structure as vring_desc, and the constants as
@@ -218,6 +222,12 @@ purposes).
 Drivers MUST NOT add a descriptor chain over than $2^{32}$ bytes long in total;
 this implies that loops in the descriptor chain are forbidden!
 
+If VIRTIO_F_IN_ORDER has been negotiated, and when making a
+descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
+$x$ in the table available to the device, driver MUST set
+\field{next} to $0$ for the last descriptor in the table
+(where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
+
 \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio 
Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 
 Some devices benefit by concurrently dispatching a large number
@@ -247,6 +257,10 @@ chained by \field{next}. An indirect descriptor without a 
valid \field{next}
 A single indirect descriptor
 table can include both device-readable and device-writable descriptors.
 
+If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
+use sequential indices, in-order: index 0 followed by index 1
+followed by index 2, etc.
+
 \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a 
Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect 
Descriptors}
 The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
 VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
@@ -259,6 +273,10 @@ the device.
 A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
 in \field{flags}.
 
+If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
+MUST appear sequentially, with \field{next} taking the value
+of 1 for the 1st descriptor, 2 for the 2nd one, etc.
+
 \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a 
Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect 
Descriptors}
 The device MUST ignore the write-only flag (\field{flags}\_DESC_F_WRITE) 
in the descriptor that refers to an indirect table.
 
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 04/16] content: move virtqueue operation description

2018-02-28 Thread Michael S. Tsirkin
virtqueue operation description is specific to the virtqueue
format. Move it out to split-ring.tex and update all
references.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
 conformance.tex |   4 +-
 content.tex | 171 +++-
 split-ring.tex  | 181 ++--
 3 files changed, 185 insertions(+), 171 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index f59e360..55d17b4 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -40,9 +40,9 @@ A driver MUST conform to the following normative statements:
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Virtqueue Interrupt Suppression}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Used Ring}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Virtqueue Notification Suppression}
+\item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Supplying Buffers to The Device / Updating idx}
+\item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Supplying Buffers to The Device / Notifying The Device}
 \item \ref{drivernormative:General Initialization And Device Operation / 
Device Initialization}
-\item \ref{drivernormative:General Initialization And Device Operation / 
Device Operation / Supplying Buffers to The Device / Updating idx}
-\item \ref{drivernormative:General Initialization And Device Operation / 
Device Operation / Supplying Buffers to The Device / Notifying The Device}
 \item \ref{drivernormative:General Initialization And Device Operation / 
Device Cleanup}
 \item \ref{drivernormative:Reserved Feature Bits}
 \end{itemize}
diff --git a/content.tex b/content.tex
index 5b4c4e9..3b4579e 100644
--- a/content.tex
+++ b/content.tex
@@ -337,167 +337,14 @@ And Device Operation / Device Initialization / Set 
DRIVER-OK}.
 
 \section{Device Operation}\label{sec:General Initialization And Device 
Operation / Device Operation}
 
-There are two parts to device operation: supplying new buffers to
-the device, and processing used buffers from the device.
-
-\begin{note} As an
-example, the simplest virtio network device has two virtqueues: the
-transmit virtqueue and the receive virtqueue. The driver adds
-outgoing (device-readable) packets to the transmit virtqueue, and then
-frees them after they are used. Similarly, incoming (device-writable)
-buffers are added to the receive virtqueue, and processed after
-they are used.
-\end{note}
-
-\subsection{Supplying Buffers to The Device}\label{sec:General Initialization 
And Device Operation / Device Operation / Supplying Buffers to The Device}
-
-The driver offers buffers to one of the device's virtqueues as follows:
-
-\begin{enumerate}
-\item\label{itm:General Initialization And Device Operation / Device Operation 
/ Supplying Buffers to The Device / Place Buffers} The driver places the buffer 
into free descriptor(s) in the
-   descriptor table, chaining as necessary (see \ref{sec:Basic Facilities of a 
Virtio Device / Virtqueues / The Virtqueue Descriptor Table}~\nameref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table}).
-
-\item\label{itm:General Initialization And Device Operation / Device Operation 
/ Supplying Buffers to The Device / Place Index} The driver places the index of 
the head of the descriptor chain
-   into the next ring entry of the available ring.
-
-\item Steps \ref{itm:General Initialization And Device Operation / Device 
Operation / Supplying Buffers to The Device / Place Buffers} and 
\ref{itm:General Initialization And Device Operation / Device Operation / 
Supplying Buffers to The Device / Place Index} MAY be performed repeatedly if 
batching
-  is possible.
-
-\item The driver performs suitable a memory barrier to ensure the device sees
-  the updated descriptor table and available ring before the next
-  step.
-
-\item The available \field{idx} is increased by the number of
-  descriptor chain heads added to the available ring.
-
-\item The driver performs a suitable memory barrier to ensure that it updates
-  the \field{idx} field before checking for notification suppression.
-
-\item If notifications are not suppressed, the driver notifies the device
-of the new available buffers.
-\end{enumerate}
-
-Note that the above code does not take precautions against the
-available ring buffer wrapping around: this is not possible since
-the ring buffer is the same size as the descriptor table, so step
-(1) will prevent such a condition.
-
-In addition, the maximum queue size is 32768 (the highest power
-of 2 which fits in 16 bits), so the 16-bit \field{idx} value can always
-distinguish between a full and empty buffer.
+When operating the device, each field in the device configuration
+space can be changed by either the driver or the device.
 
-What 

[virtio-dev] [PATCH v9 00/16] packed ring layout spec

2018-02-28 Thread Michael S. Tsirkin
This addresses comments on v8.

Thanks a lot to all reviewers of v8 - I hope we are
finally there or almost there.


A compiled version can
be found under https://github.com/oasis-tcs/virtio-docs.git

virtio-v1.1-packed-wd09-diff.pdf  virtio-v1.1-packed-wd09.pdf

for redline and clean versions, respectively.
If you are interested in changes from v8, that's in

virtio-v1.1-w08-to-wd09-diff.pdf

in this directory.

Note: please do not try to edit the pdf and post comments
in the edited file. Please post comments in a text
format, as pdfs are not archived with the list.

TODO: support for actual passthrough devices will likely
require more new features, such as requirement for
stronger memory barriers.

Note: should this proposal be accepted and approved, one or more
  claims disclosed to the TC admin and listed on the Virtio TC
  IPR page https://github.com/oasis-tcs/virtio-admin/blob/master/IPR.md
  might become Essential Claims.

Changes from v8:

 packed-ring: clarify wording on s/g conformance
 split-ring: typo: aligment
 packed-ring: typo: aligment
 packed-ring: clarify multi-buffer requests
 packed-ring: necessary, not sufficient condition
 packed-ring: support skipping used descritors
 packed-ring: fix pseudo code
 packed-ring: format pseudocode using spaces
 packed-ring: drop << in pseudocode
 packed-ring: clarify event suppression format
 packed-ring: chained descriptors are always on
 VIRTIO_F_NOTIFICATION_DATA: switch to bitfields
 introduction: document bitfield notation
 VIRTIO_F_NOTIFICATION_DATA: minor cleanups
 content: VIRTIO_F_NOTIFICATION_DATA is option
 VIRTIO_F_NOTIFICATION_DATA: improve formatting

Changes from v7:
- new notitfication_data feature, supported for all
  transports and formats
- addressed all outstanding comments

Changes from v6:
- isolate in-order feature to a separate set of patches
  (reduces scope in case there's more discussion around it)
- support in-order option for split rings
- update all references to available/used ring in spec
  to a format-agnostic terminology
- minor changes to event suppression format
- minor changes to notification format
- lots of new conformance clauses

Changes from v5:
- scope reductions (see below). We can add more
  features down the road, hopefully reduced scope will be enough
  to finalize spec soon.
- cleanup and integrate in the spec
- pseudo-code

Deferred features:
- dropped _F_DESC_LIST, 1.0 includes this unconditionally, we
  can do same
- dropped event structure change notifications - needed for
  efficient hardware implementations but let's add this on top

3 1st patches just move text around so all virtio 1.0
things are in the same place. 2 last ones add the new layout

Option to mark descriptors as not generating events isn't
yet implemented. Again, let's add this on top.

I also note that for hardware implementations, a different
set of memory barriers is needed. Again, let's add this on top

Michael S. Tsirkin (16):
  introduction: document bitfield notation
  content: move 1.0 queue format out to a separate section
  content: move ring text out to a separate file
  content: move virtqueue operation description
  content: len -> used length, used ring -> vq
  content: generalize transport ring part naming
  content: generalize rest of text
  split-ring: generalize text
  split-ring: typo: aligment
  packed virtqueues: more efficient virtqueue layout
  content: in-order buffer use
  packed-ring: add in order support
  split-ring: in order feature
  VIRTIO_F_NOTIFICATION_DATA: extra data to devices
  makediff: update to show diff from master
  REVISION: set to 1.1 packed wd09

 REVISION |   2 +-
 conformance.tex  |   5 +-
 content.tex  | 919 ++-
 introduction.tex |  18 ++
 makediff.sh  |   3 +-
 notifications.c  |   3 +
 packed-ring.tex  | 704 ++
 split-ring.tex   | 689 +
 8 files changed, 1655 insertions(+), 688 deletions(-)
 create mode 100644 notifications.c
 create mode 100644 packed-ring.tex
 create mode 100644 split-ring.tex

-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 02/16] content: move 1.0 queue format out to a separate section

2018-02-28 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
 content.tex | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/content.tex b/content.tex
index c7ef7fd..4483a4b 100644
--- a/content.tex
+++ b/content.tex
@@ -230,7 +230,30 @@ result.
 The mechanism for bulk data transport on virtio devices is
 pretentiously called a virtqueue. Each device can have zero or more
 virtqueues\footnote{For example, the simplest network device has one virtqueue 
for
-transmit and one for receive.}.  Each queue has a 16-bit queue size
+transmit and one for receive.}.
+
+Driver makes requests available to device by adding
+an available buffer to the queue - i.e. adding a buffer
+describing the request to a virtqueue, and optionally triggering
+a driver event - i.e. sending a notification to the device.
+
+Device executes the requests and - when complete - adds
+a used buffer to the queue - i.e. lets the driver
+know by marking the buffer as used. Device can then trigger
+a device event - i.e. send an interrupt to the driver.
+
+For queue operation detail, see \ref{sec:Basic Facilities of a Virtio Device / 
Split Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Split 
Virtqueues}.
+
+\section{Split Virtqueues}\label{sec:Basic Facilities of a Virtio Device / 
Split Virtqueues}
+The split virtqueue format is the original format used by legacy
+virtio devices.  The split virtqueue format separates the
+virtqueue into several parts, where each part is write-able by
+either the driver or the device, but not both. Multiple
+locations need to be updated when making a buffer available
+and when marking it as used.
+
+
+Each queue has a 16-bit queue size
 parameter, which sets the number of entries and implies the total size
 of the queue.
 
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 05/16] content: len -> used length, used ring -> vq

2018-02-28 Thread Michael S. Tsirkin
Document buffer used len and use that terminology everywhere in the
generic section.

Further, drop the 'used ring' terminology and just say virtqueue.

Reviewed-by: Cornelia Huck 
Signed-off-by: Michael S. Tsirkin 
---
 content.tex | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/content.tex b/content.tex
index 3b4579e..4350ecf 100644
--- a/content.tex
+++ b/content.tex
@@ -242,7 +242,8 @@ a used buffer to the queue - i.e. lets the driver
 know by marking the buffer as used. Device can then trigger
 a device event - i.e. send an interrupt to the driver.
 
-For queue operation detail, see \ref{sec:Basic Facilities of a Virtio Device / 
Split Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Split 
Virtqueues}.
+Device reports the number of bytes it has written to memory for
+each buffer it uses. This is referred to as ``used length''.
 
 \input{split-ring.tex}
 
@@ -1304,7 +1305,7 @@ The driver interrupt handler would typically:
 \begin{itemize}
   \item Read the ISR Status field, which will reset it to zero.
   \item If the lower bit is set:
-look through the used rings of all virtqueues for the
+look through all virtqueues for the
 device, to see if any progress has been made by the device
 which requires servicing.
   \item If the second lower bit is set:
@@ -1313,8 +1314,7 @@ The driver interrupt handler would typically:
   \item If MSI-X capability is enabled:
 \begin{itemize}
   \item
-Look through the used rings of
-all virtqueues mapped to that MSI-X vector for the
+Look through all virtqueues mapped to that MSI-X vector for the
 device, to see if any progress has been made by the device
 which requires servicing.
   \item
@@ -1728,8 +1728,7 @@ nor behaviour:
   }
   \hline
   \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
-Queue size is the number of elements in the queue, therefore size
-of the descriptor table and both available and used rings.
+Queue size is the number of elements in the queue.
 Writing to this register notifies the device what size of the
 queue the driver will use. This applies to the queue selected by
 writing to \field{QueueSel}.
@@ -2709,7 +2708,7 @@ when VIRTIO_NET_F_MRG_RXBUF was negotiated; without that 
feature the
 structure was 2 bytes shorter.
 
 When using the legacy interface, the driver SHOULD ignore the
-\field{len} value in used ring entries for the transmit queues
+used length for the transmit queues
 and the controlq queue.
 \begin{note}
 Historically, some devices put
@@ -2868,8 +2867,8 @@ Often a driver will suppress transmission virtqueue 
interrupts
 and check for used packets in the transmit path of following
 packets.
 
-The normal behavior in this interrupt handler is to retrieve and
-new descriptors from the used ring and free the corresponding
+The normal behavior in this interrupt handler is to retrieve
+used buffers from the virtqueue and free the corresponding
 headers and packets.
 
 \subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network 
Device / Device Operation / Setting Up Receive Buffers}
@@ -2936,7 +2935,7 @@ Processing incoming packets involves:
   This allows receipt of large packets without having to allocate large
   buffers: a packet that does not fit in a single buffer can flow
   over to the next buffer, and so on. In this case, there will be
-  at least \field{num_buffers} in the used ring, and the device
+  at least \field{num_buffers} used buffers in the virtqueue, and the device
   chains them together to form a single packet in a way similar to
   how it would store it in a single buffer spread over multiple
   descriptors.
@@ -2990,8 +2989,8 @@ MUST use all buffers but the last (i.e. the first 
$num_buffers -
 supplied by the driver.
 
 The device MUST use all buffers used by a single receive
-packet together, by atomically incrementing \field{idx} in the
-used ring by the \field{num_buffers} value.
+packet together, such that at least \field{num_buffers} are
+observed by driver as used.
 
 If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device MUST set
 \field{flags} to zero and SHOULD supply a fully checksummed
@@ -3378,7 +3377,8 @@ The device MUST queue packets only on any receiveq1 
before the
 VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command.
 
 The device MUST NOT queue packets on receive queues greater than
-\field{virtqueue_pairs} once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET 
command in the used ring.
+\field{virtqueue_pairs} once it has placed the
+VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command in a used buffer.
 
 \subparagraph{Legacy Interface: Automatic receive steering in multiqueue 
mode}\label{sec:Device Types / Network Device / Device Operation / Control 
Virtqueue / Automatic receive steering in multiqueue mode / Legacy Interface: 
Automatic receive steering 

[virtio-dev] [PATCH v9 14/16] VIRTIO_F_NOTIFICATION_DATA: extra data to devices

2018-02-28 Thread Michael S. Tsirkin
Motivation for the new feature is included in the text.

Signed-off-by: Michael S. Tsirkin 
---
 content.tex  | 141 ---
 introduction.tex |   4 +-
 notifications.c  |   3 ++
 3 files changed, 140 insertions(+), 8 deletions(-)
 create mode 100644 notifications.c

diff --git a/content.tex b/content.tex
index c57a918..4261913 100644
--- a/content.tex
+++ b/content.tex
@@ -283,9 +283,77 @@ Packed Virtqueues}).
 Every driver and device supports either the Packed or the Split
 Virtqueue format, or both.
 
+\subsection{Driver notifications} \label{sec:Virtqueues / Driver notifications}
+Driver is sometimes required to notify the device after
+making changes to the virtqueue.
+
+When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
+this notification involves sending the
+virtqueue number to the device (depending on the transport).
+
+However, some devices benefit from ability to find out the number of
+available descriptors in the ring, and whether to send
+interrupts to drivers without accessing virtqueue in memory:
+for efficiency or as a debugging aid.
+
+To help with these optimizations, when VIRTIO_F_NOTIFICATION_DATA
+has been negotiated, driver notifications to the device include
+the following information:
+
+\begin{description}
+\item [VQ number]
+\item [Offset]
+  Within the ring where the next available ring entry
+  will be written.
+  Without VIRTIO_F_RING_PACKED this refers to the
+  15 least significant bits of the available index.
+  With VIRTIO_F_RING_PACKED this refers to the offset
+  (in units of descritor entries)
+  within the descriptor ring where the next available
+  descriptor will be written.
+\item [Wrap Counter]
+  With VIRTIO_F_RING_PACKED this is the wrap counter
+  referring to the next available descriptor.
+  Without VIRTIO_F_RING_PACKED this is the most significant bit
+  of the available index.
+\end{description}
+
+Note that driver can trigger multiple notifications even without
+making any more changes to the ring. When VIRTIO_F_NOTIFICATION_DATA
+has been negotiated, these notifications would then have
+identical \field{Offset} and \field{Wrap Counter} values.
+
 \input{split-ring.tex}
 
 \input{packed-ring.tex}
+
+\subsubsection{Driver notifications}
+
+\label{sec:Packed Virtqueues / Driver notifications}
+Whenever not suppressed by Device Event Suppression,
+driver is required to notify the device after
+making changes to the virtqueue.
+
+Some devices benefit from ability to find out the number of
+available descriptors in the ring, and whether to send
+interrupts to drivers without accessing virtqueue in memory:
+for efficiency or as a debugging aid.
+
+To help with these optimizations, driver notifications
+to the device include the following information:
+
+\begin{itemize}
+\item VQ number
+\item Offset (in units of descriptor size) within the ring
+  where the next available descriptor will be written
+\item Wrap Counter referring to the next available
+  descriptor
+\end{itemize}
+
+Note that driver can trigger multiple notifications even without
+making any more changes to the ring. These would then have
+identical \field{Offset} and \field{Wrap Counter} values.
+
 \chapter{General Initialization And Device Operation}\label{sec:General 
Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
@@ -862,7 +930,9 @@ the same Queue Notify address for all queues.
 \devicenormative{\paragraph}{Notification capability}{Virtio Transport Options 
/ Virtio Over PCI Bus / PCI Device Layout / Notification capability}
 The device MUST present at least one notification capability.
 
-The \field{cap.offset} MUST be 2-byte aligned.  
+For devices not offering VIRTIO_F_NOTIFICATION_DATA:
+
+The \field{cap.offset} MUST be 2-byte aligned.
 
 The device MUST either present \field{notify_off_multiplier} as an even power 
of 2,
 or present \field{notify_off_multiplier} as 0.
@@ -876,6 +946,23 @@ For all queues, the value \field{cap.length} presented by 
the device MUST satisf
 cap.length >= queue_notify_off * notify_off_multiplier + 2
 \end{lstlisting}
 
+For devices offering VIRTIO_F_NOTIFICATION_DATA:
+
+The device MUST either present \field{notify_off_multiplier} as a
+number that is a power of 2 that is also a multiple 4,
+or present \field{notify_off_multiplier} as 0.
+
+The \field{cap.offset} MUST be 4-byte aligned.
+
+The value \field{cap.length} presented by the device MUST be at least 4
+and MUST be large enough to support queue notification offsets
+for all supported queues in all possible configurations.
+
+For all queues, the value \field{cap.length} presented by the device MUST 
satisfy:
+\begin{lstlisting}
+cap.length >= queue_notify_off * notify_off_multiplier + 4
+\end{lstlisting}
+
 \subsubsection{ISR status capability}\label{sec:Virtio Transport Options / 
Virtio Over PCI Bus / PCI 

[virtio-dev] [PATCH v9 03/16] content: move ring text out to a separate file

2018-02-28 Thread Michael S. Tsirkin
Will be easier to manage this way.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
 content.tex| 499 +
 split-ring.tex | 498 
 2 files changed, 499 insertions(+), 498 deletions(-)
 create mode 100644 split-ring.tex

diff --git a/content.tex b/content.tex
index 4483a4b..5b4c4e9 100644
--- a/content.tex
+++ b/content.tex
@@ -244,504 +244,7 @@ a device event - i.e. send an interrupt to the driver.
 
 For queue operation detail, see \ref{sec:Basic Facilities of a Virtio Device / 
Split Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Split 
Virtqueues}.
 
-\section{Split Virtqueues}\label{sec:Basic Facilities of a Virtio Device / 
Split Virtqueues}
-The split virtqueue format is the original format used by legacy
-virtio devices.  The split virtqueue format separates the
-virtqueue into several parts, where each part is write-able by
-either the driver or the device, but not both. Multiple
-locations need to be updated when making a buffer available
-and when marking it as used.
-
-
-Each queue has a 16-bit queue size
-parameter, which sets the number of entries and implies the total size
-of the queue.
-
-Each virtqueue consists of three parts:
-
-\begin{itemize}
-\item Descriptor Table
-\item Available Ring
-\item Used Ring
-\end{itemize}
-
-where each part is physically-contiguous in guest memory,
-and has different alignment requirements.
-
-The memory aligment and size requirements, in bytes, of each part of the
-virtqueue are summarized in the following table:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Virtqueue Part& Alignment & Size \\
-\hline \hline
-Descriptor Table  & 16& $16 * $(Queue Size) \\
-\hline
-Available Ring& 2 & $6 + 2 * $(Queue Size) \\
- \hline
-Used Ring & 4 & $6 + 8 * $(Queue Size) \\
- \hline
-\end{tabular}
-
-The Alignment column gives the minimum alignment for each part
-of the virtqueue.
-
-The Size column gives the total number of bytes for each
-part of the virtqueue.
-
-Queue Size corresponds to the maximum number of buffers in the
-virtqueue\footnote{For example, if Queue Size is 4 then at most 4 buffers
-can be queued at any given time.}.  Queue Size value is always a
-power of 2.  The maximum Queue Size value is 32768.  This value
-is specified in a bus-specific way.
-
-When the driver wants to send a buffer to the device, it fills in
-a slot in the descriptor table (or chains several together), and
-writes the descriptor index into the available ring.  It then
-notifies the device. When the device has finished a buffer, it
-writes the descriptor index into the used ring, and sends an interrupt.
-
-\drivernormative{\subsection}{Virtqueues}{Basic Facilities of a Virtio Device 
/ Virtqueues}
-The driver MUST ensure that the physical address of the first byte
-of each virtqueue part is a multiple of the specified alignment value
-in the above table.
-
-\subsection{Legacy Interfaces: A Note on Virtqueue Layout}\label{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on 
Virtqueue Layout}
-
-For Legacy Interfaces, several additional
-restrictions are placed on the virtqueue layout:
-
-Each virtqueue occupies two or more physically-contiguous pages
-(usually defined as 4096 bytes, but depending on the transport;
-henceforth referred to as Queue Align)
-and consists of three parts:
-
-\begin{tabular}{|l|l|l|}
-\hline
-Descriptor Table & Available Ring (\ldots padding\ldots) & Used Ring \\
-\hline
-\end{tabular}
-
-The bus-specific Queue Size field controls the total number of bytes
-for the virtqueue.
-When using the legacy interface, the transitional
-driver MUST retrieve the Queue Size field from the device
-and MUST allocate the total number of bytes for the virtqueue
-according to the following formula (Queue Align given in qalign and
-Queue Size given in qsz):
-
-\begin{lstlisting}
-#define ALIGN(x) (((x) + qalign) & ~qalign)
-static inline unsigned virtq_size(unsigned int qsz)
-{
- return ALIGN(sizeof(struct virtq_desc)*qsz + sizeof(u16)*(3 + qsz))
-  + ALIGN(sizeof(u16)*3 + sizeof(struct virtq_used_elem)*qsz);
-}
-\end{lstlisting}
-
-This wastes some space with padding.
-When using the legacy interface, both transitional
-devices and drivers MUST use the following virtqueue layout
-structure to locate elements of the virtqueue:
-
-\begin{lstlisting}
-struct virtq {
-// The actual descriptors (16 bytes each)
-struct virtq_desc desc[ Queue Size ];
-
-// A ring of available descriptor heads with free-running index.
-struct virtq_avail avail;
-
-// Padding to the next Queue Align boundary.
-u8 pad[ Padding ];
-
-// A ring of used descriptor heads with free-running index.
-struct virtq_used used;
-};
-\end{lstlisting}
-
-\subsection{Legacy Interfaces: A 

[virtio-dev] [PATCH v9 06/16] content: generalize transport ring part naming

2018-02-28 Thread Michael S. Tsirkin
Replace descriptor table/available ring/used ring
with descriptor area/driver area/device area
in all transports.

Document what's in which area.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
---
 content.tex| 61 ++
 split-ring.tex |  6 +++---
 2 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/content.tex b/content.tex
index 4350ecf..9fc9673 100644
--- a/content.tex
+++ b/content.tex
@@ -245,6 +245,24 @@ a device event - i.e. send an interrupt to the driver.
 Device reports the number of bytes it has written to memory for
 each buffer it uses. This is referred to as ``used length''.
 
+Each virtqueue can consist of up to 3 parts:
+\begin{itemize}
+\item Descriptor Area - used for describing buffers
+\item Driver Area - extra data supplied by driver to the device
+\item Device Area - extra data supplied by device to driver
+\end{itemize}
+
+\begin{note}
+Note that previous versions of this spec used different names for
+these parts (following \ref{sec:Basic Facilities of a Virtio Device / Split 
Virtqueues}):
+\begin{itemize}
+\item Descriptor Table - for the Descriptor Area
+\item Available Ring - for the Driver Area
+\item Used Ring - for the Device Area
+\end{itemize}
+
+\end{note}
+
 \input{split-ring.tex}
 
 \chapter{General Initialization And Device Operation}\label{sec:General 
Initialization And Device Operation}
@@ -667,8 +685,8 @@ struct virtio_pci_common_cfg {
 le16 queue_enable;  /* read-write */
 le16 queue_notify_off;  /* read-only for driver */
 le64 queue_desc;/* read-write */
-le64 queue_avail;   /* read-write */
-le64 queue_used;/* read-write */
+le64 queue_driver;  /* read-write */
+le64 queue_device;  /* read-write */
 };
 \end{lstlisting}
 
@@ -728,13 +746,13 @@ struct virtio_pci_common_cfg {
 \end{note}
 
 \item[\field{queue_desc}]
-The driver writes the physical address of Descriptor Table here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+The driver writes the physical address of Descriptor Area here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
 
-\item[\field{queue_avail}]
-The driver writes the physical address of Available Ring here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+\item[\field{queue_driver}]
+The driver writes the physical address of Driver Area here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
 
-\item[\field{queue_used}]
-The driver writes the physical address of Used Ring here.  See section 
\ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
+\item[\field{queue_device}]
+The driver writes the physical address of Device Area here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio 
Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common 
configuration structure layout}
@@ -1496,24 +1514,24 @@ All register values are organized as Little Endian.
 See also p. \ref{sec:Virtio Transport Options / Virtio Over MMIO / 
MMIO-specific Initialization And Device Operation / Device 
Initialization}~\nameref{sec:Virtio Transport Options / Virtio Over MMIO / 
MMIO-specific Initialization And Device Operation / Device Initialization}.
   }
   \hline 
-  \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Table 64 
bit long physical address}{0x080}{0x084}{W}{%
+  \mmiodreg{QueueDescLow}{QueueDescHigh}{Virtual queue's Descriptor Area 64 
bit long physical address}{0x080}{0x084}{W}{%
 Writing to these two registers (lower 32 bits of the address
 to \field{QueueDescLow}, higher 32 bits to \field{QueueDescHigh}) notifies
-the device about location of the Descriptor Table of the queue
+the device about location of the Descriptor Area of the queue
 selected by writing to \field{QueueSel} register.
   }
   \hline 
-  \mmiodreg{QueueAvailLow}{QueueAvailHigh}{Virtual queue's Available Ring 64 
bit long physical address}{0x090}{0x094}{W}{%
+  \mmiodreg{QueueDriverLow}{QueueDriverHigh}{Virtual queue's Driver Area 64 
bit long physical address}{0x090}{0x094}{W}{%
 Writing to these two registers (lower 32 bits of the address
 to \field{QueueAvailLow}, higher 32 bits to \field{QueueAvailHigh}) 
notifies
-the device about location of the Available Ring of the queue
+the device about location of the Driver Area of the queue
 selected by writing to \field{QueueSel}.
   }
   \hline 
-  \mmiodreg{QueueUsedLow}{QueueUsedHigh}{Virtual queue's Used Ring 64 bit long 
physical address}{0x0a0}{0x0a4}{W}{%
+  \mmiodreg{QueueDeviceLow}{QueueDeviceHigh}{Virtual queue's 

[virtio-dev] [PATCH v9 15/16] makediff: update to show diff from master

2018-02-28 Thread Michael S. Tsirkin
Down the road, when we are close to releasing
v1.1 we will want to show diff from cs04. But for now,
it's handy to generate the diff from master,
this way each new feature is redlined separately.

Signed-off-by: Michael S. Tsirkin 
---
 makediff.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/makediff.sh b/makediff.sh
index 1dd75d4..2ef92fc 100755
--- a/makediff.sh
+++ b/makediff.sh
@@ -9,7 +9,7 @@ export DATESTR=${DATESTR:-`cat REVISION-DATE`}
 MAIN=$1
 PATH=.:${PATH}
 cur="$PWD"
-oldrev=`git rev-list -1 origin/tags/v1.0-cs03`
+oldrev=`git rev-list -1 origin/master`
 newrev=`git rev-list -1 HEAD`
 rm -fr old new
 git clone $PWD old
@@ -19,7 +19,6 @@ while read -r rev; do
echo "Applying $rev"
git cherry-pick `git rev-list -1 -F --grep "$rev" $newrev` || exit 1
 done << 'EOF'
-headerfile: rename virtio_ring to virtio queue
 EOF
 
 #mv specvars.tex specvars-orig.tex
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 08/16] split-ring: generalize text

2018-02-28 Thread Michael S. Tsirkin
Update generic text to talk about available/used buffers, not rings.
Move some split-ring specific text to the correct section.

Update conformance section with link to the new conformance clause.

Signed-off-by: Michael S. Tsirkin 
---
 conformance.tex |  1 +
 content.tex | 10 --
 split-ring.tex  |  4 
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 55d17b4..e4efe33 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -38,6 +38,7 @@ A driver MUST conform to the following normative statements:
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Descriptor Table}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Descriptor Table / Indirect Descriptors}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Virtqueue Interrupt Suppression}
+\item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Available Ring}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
The Virtqueue Used Ring}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Virtqueue Notification Suppression}
 \item \ref{drivernormative:Basic Facilities of a Virtio Device / Virtqueues / 
Supplying Buffers to The Device / Updating idx}
diff --git a/content.tex b/content.tex
index 5634c7d..e1e30a0 100644
--- a/content.tex
+++ b/content.tex
@@ -381,12 +381,10 @@ of a device are live once the device has been reset.
 
 \drivernormative{\subsection}{Device Cleanup}{General Initialization And 
Device Operation / Device Cleanup}
 
-A driver MUST NOT alter descriptor table entries which have been
-exposed in the available ring (and not marked consumed by the device
-in the used ring) of a live virtqueue.
-
-A driver MUST NOT decrement the available \field{idx} on a live virtqueue (ie.
-there is no way to ``unexpose'' buffers).
+A driver MUST NOT alter virtqueue entries for exposed buffers -
+i.e. buffers which have been
+made available to the device (and not been used by the device)
+of a live virtqueue.
 
 Thus a driver MUST ensure a virtqueue isn't live (by device reset) before 
removing exposed buffers.
 
diff --git a/split-ring.tex b/split-ring.tex
index 9601a53..a594d41 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -296,6 +296,10 @@ referred to this structure as vring_avail, and the 
constant as
 VRING_AVAIL_F_NO_INTERRUPT, but the layout and value were identical.
 \end{note}
 
+\drivernormative{\subsubsection}{The Virtqueue Available Ring}{Basic 
Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}
+A driver MUST NOT decrement the available \field{idx} on a virtqueue (ie.
+there is no way to ``unexpose'' buffers).
+
 \subsection{Virtqueue Interrupt Suppression}\label{sec:Basic Facilities of a 
Virtio Device / Virtqueues / Virtqueue Interrupt Suppression}
 
 If the VIRTIO_F_EVENT_IDX feature bit is not negotiated,
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 12/16] packed-ring: add in order support

2018-02-28 Thread Michael S. Tsirkin
Support in-order requests for packed rings.
This allows selective write-out of used descriptors.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
Reviewed-by: Stefan Hajnoczi 
---
 packed-ring.tex | 24 
 1 file changed, 24 insertions(+)

diff --git a/packed-ring.tex b/packed-ring.tex
index 2943df0..12bab67 100644
--- a/packed-ring.tex
+++ b/packed-ring.tex
@@ -272,6 +272,30 @@ Buffer ID is also reserved and is ignored by the device.
 In Descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
 is reserved and is ignored by the device.
 
+\subsection{In-order use of descriptors}
+\label{sec:Packed Virtqueues / In-order use of descriptors}
+
+Some devices always use descriptors in the same order in which
+they have been made available. These devices can offer the
+VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
+devices to notify the use of a batch of buffers to the driver by
+only writing out a single used descriptor with the Buffer ID
+corresponding to the last descriptor in the batch.
+
+Device then skips forward in the ring according to the size of
+the batch. Driver needs to look up the used Buffer ID and
+calculate the batch size to be able to advance to where the next
+used descriptor will be written by the device.
+
+This will result in the used descriptor overwriting the first
+available descriptor in the batch, the used descriptor for the
+next batch overwriting the first available descriptor in the next
+batch, etc.
+
+The skipped buffers (for which no used descriptor was written)
+are assumed to have been used (read or written) by the
+device completely.
+
 \subsection{Multi-buffer requests}
 \label{sec:Packed Virtqueues / Multi-buffer requests}
 Some devices combine multiple buffers as part of processing of a
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 09/16] split-ring: typo: aligment

2018-02-28 Thread Michael S. Tsirkin
---
 split-ring.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/split-ring.tex b/split-ring.tex
index a594d41..87ecee2 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -23,7 +23,7 @@ Each virtqueue consists of three parts:
 where each part is physically-contiguous in guest memory,
 and has different alignment requirements.
 
-The memory aligment and size requirements, in bytes, of each part of the
+The memory alignment and size requirements, in bytes, of each part of the
 virtqueue are summarized in the following table:
 
 \begin{tabular}{|l|l|l|}
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PATCH v9 11/16] content: in-order buffer use

2018-02-28 Thread Michael S. Tsirkin
Using descriptors in-order is sometimes beneficial.  Add an option for
that - per-format detail allowing more optimizations will be added by
follow-up patches.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Cornelia Huck 
Reviewed-by: Stefan Hajnoczi 
---
 content.tex | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/content.tex b/content.tex
index 73f40b7..c57a918 100644
--- a/content.tex
+++ b/content.tex
@@ -245,6 +245,15 @@ a device event - i.e. send an interrupt to the driver.
 Device reports the number of bytes it has written to memory for
 each buffer it uses. This is referred to as ``used length''.
 
+Device is not generally required to use buffers in
+the same order in which they have been made available
+by the driver.
+
+Some devices always use descriptors in the same order in which
+they have been made available. These devices can offer the
+VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
+might allow optimizations or simplify driver and/or device code.
+
 Each virtqueue can consist of up to 3 parts:
 \begin{itemize}
 \item Descriptor Area - used for describing buffers
@@ -5248,6 +5257,9 @@ Descriptors} and \ref{sec:Packed Virtqueues / Indirect 
Flag: Scatter-Gather Supp
   \item[VIRTIO_F_RING_PACKED(34)] This feature indicates
   support for the packed virtqueue layout as described in
   \ref{sec:Basic Facilities of a Virtio Device / Packed 
Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed 
Virtqueues}.
+  \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
+  that all buffers are used by the device in the same
+  order in which they have been made available.
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -5273,6 +5285,9 @@ translates bus addresses from the device into physical 
addresses in memory.
 A device MAY fail to operate further if VIRTIO_F_IOMMU_PLATFORM is not
 accepted.
 
+If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
+buffers in the same order in which they have been available.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature 
Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device

2018-02-28 Thread Michael S. Tsirkin
On Wed, Feb 28, 2018 at 08:08:39AM +0100, Jiri Pirko wrote:
> Tue, Feb 27, 2018 at 10:41:49PM CET, kubak...@wp.pl wrote:
> >On Tue, 27 Feb 2018 13:16:21 -0800, Alexander Duyck wrote:
> >> Basically we need some sort of PCI or PCIe topology mapping for the
> >> devices that can be translated into something we can communicate over
> >> the communication channel. 
> >
> >Hm.  This is probably a completely stupid idea, but if we need to
> >start marshalling configuration requests/hints maybe the entire problem
> >could be solved by opening a netlink socket from hypervisor?  Even make
> >teamd run on the hypervisor side...
> 
> Interesting. That would be more trickier then just to fwd 1 genetlink
> socket to the hypervisor.
> 
> Also, I think that the solution should handle multiple guest oses. What
> I'm thinking about is some generic bonding description passed over some
> communication channel into vm. The vm either use it for configuration,
> or ignores it if it is not smart enough/updated enough.

For sure, we could build virtio-bond to pass that info to guests.

Such an advisory mechanism would not be a replacement for the mandatory
passthrough fallback flag proposed, but OTOH it's much more flexible.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Jens Freimann

On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:



On 02/27/2018 11:23 AM, Jens Freimann wrote:

On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:

On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:

> +?? vq->driver_event.flags = 0x1;
> +?? memory_barrier();
> +
> +?? flags = d->flags;
> +?? bool avail = flags & (1 << 
VIRTQ_DESC_F_AVAIL);
> +?? bool used = flags & (1 << VIRTQ_DESC_F_USED);
> +?? if (avail != used) {
> +?? break;
> +?? }
> +
> +?? vq->driver_event.flags = 0x2;
> +?? }
> +
> +?? read_memory_barrier();

Now with the condition avail != used a freshly (that is zero initialized)
ring would appear all used. And we would do process_buffer(d) for the
whole ring if this code happens to get executed. Do we have to make
sure that this does not happen?


I'll have to think about this.


With the wrap counter initialized to 1 descriptors would not be seen
as used.


Looking at the code... In vhost:

+static bool desc_is_avail(struct vhost_virtqueue *vq,
+ struct vring_desc_packed *desc)
+{
+   if (vq->used_wrap_counter)
+   if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
+   return true;
+   if (vq->used_wrap_counter == false)
+   if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
+   return true;
+
+   return false;
+}

Here the bit pattern corresponding to available depends on the
value of the wrap counter. I kind of anticipated this, but I could not
find it defined in this spec.

OTOH in guest:

+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+   u16 last_used, flags;
+   bool avail, used;
+
+   if (vq->vq.num_free == vq->vring.num)
+   return false;
+
+   last_used = vq->last_used_idx;
+   flags = virtio16_to_cpu(vq->vq.vdev,
+   vq->vring_packed.desc[last_used].flags);
+   avail = flags & VRING_DESC_F_AVAIL(1);
+   used = flags & VRING_DESC_F_USED(1);
+
+   return avail == used;
+}

if the next to be used descriptor is actually used does not depend on
any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
condition. This extra condition is basically 'there are outstanding
descriptors' and those are obviously either 'available' or yet to be observed
'used' descriptors. Right after initialization is covered by this extra
condition. And obviously if the descriptor in question is still available
flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
with the extra condition we are right there where we want to be.

But I could not find the extra condition in the spec.

With that said, I also want to point out that I don't understand
your statement Jens. I don't see a way to express the condition corresponding
to more_used_packed() using the driver wrap counter (avail_wrap_count).
Of course a wrap counter that wraps when last_used wraps could be used
to (but no such counter is mentioned here AFAIU).


Not sure I get this. 


I was merely saying that when descriptor flags are initialized to 0
and the wrap counters to 1, then it is not the case that the driver
would see all descriptors as used because it takes the wrap counter
into account.


regards,
Jens 


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Jens Freimann

On Tue, Feb 27, 2018 at 06:03:01PM +0100, Halil Pasic wrote:



On 02/27/2018 03:11 PM, Michael S. Tsirkin wrote:

[..]

+
+\devicenormative{\subsection}{The Virtqueue Descriptor Table}{Basic Facilities 
of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table}
+A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
+read a device-writable buffer.
+A device MUST NOT use a descriptor unless it observes
+VIRTQ_DESC_F_AVAIL bit in its \field{flags} being changed.

I don't really understand this. How does the device observe
the VIRTQ_DESC_F_AVAIL bit being changed?

By reading the descriptor.


:) My point is: to observe a change one usually either needs at
least one reading before and at least one reading after the change,
or one needs to know that a certain reading means change. The latter
is possible if we know that at the beginning of the time frame under
consideration (t_0) only a certain set of values,let's say B like before,
is possible, and after the change only a certain other set of values
let's say A like after, is possible, and A and B are disjunctive (
$A \cap B = \emtyset$).

Well each descriptor is read each time ring wraps around,
and the bit value changes each time ring wraps around.
For example device knows it's zero initialized so
if it reads bit value as 1 it knows the bit value has changed.




Yeah I kind of understand but I would like having a more straightforward
formulation here (than changes).

BTW does this mean that the vhost implementation (that is:

+static bool desc_is_avail(struct vhost_virtqueue *vq,
+ struct vring_desc_packed *desc)
+{
+   if (vq->used_wrap_counter)
+   if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
+   return true;
+   if (vq->used_wrap_counter == false)
+   if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
+   return true;
+
+   return false;
+}

) is needlessly looking at the 'used' bit? (I think that is the case.)


Why do you think so? I assume with "used bit" you refer to the device
ring wrap counter. It needs to be looked at because if device looks
only at the descriptor flags it could be that they are different (so
available != used), but actually this descriptor was just skipped
previously. This could happen when descriptors are used in-order
(VIRTIO_F_IN_ORDER) and only the first descriptor of a batch is marked
as used. 


Should we mention that in the section about the counters?

regards,
Jens 



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Halil Pasic


On 02/28/2018 02:42 PM, Jens Freimann wrote:
> On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:
>>
>>
>> On 02/27/2018 11:23 AM, Jens Freimann wrote:
>>> On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:
 On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:
> > +?? vq->driver_event.flags = 0x1;
> > +?? memory_barrier();
> > +
> > +?? flags = d->flags;
> > +?? bool avail = flags & (1 << 
> > VIRTQ_DESC_F_AVAIL);
> > +?? bool used = flags & (1 << 
> > VIRTQ_DESC_F_USED);
> > +?? if (avail != used) {
> > +?? break;
> > +?? }
> > +
> > +?? vq->driver_event.flags = 0x2;
> > +?? }
> > +
> > +?? read_memory_barrier();
>
> Now with the condition avail != used a freshly (that is zero initialized)
> ring would appear all used. And we would do process_buffer(d) for the
> whole ring if this code happens to get executed. Do we have to make
> sure that this does not happen?

 I'll have to think about this.
>>>
>>> With the wrap counter initialized to 1 descriptors would not be seen
>>> as used.
>>
>> Looking at the code... In vhost:
>>
>> +static bool desc_is_avail(struct vhost_virtqueue *vq,
>> +  struct vring_desc_packed *desc)
>> +{
>> +    if (vq->used_wrap_counter)
>> +    if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
>> +    return true;
>> +    if (vq->used_wrap_counter == false)
>> +    if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
>> +    return true;
>> +
>> +    return false;
>> +}
>>
>> Here the bit pattern corresponding to available depends on the
>> value of the wrap counter. I kind of anticipated this, but I could not
>> find it defined in this spec.
>>
>> OTOH in guest:
>>
>> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
>> +{
>> +    u16 last_used, flags;
>> +    bool avail, used;
>> +
>> +    if (vq->vq.num_free == vq->vring.num)
>> +    return false;
>> +
>> +    last_used = vq->last_used_idx;
>> +    flags = virtio16_to_cpu(vq->vq.vdev,
>> +    vq->vring_packed.desc[last_used].flags);
>> +    avail = flags & VRING_DESC_F_AVAIL(1);
>> +    used = flags & VRING_DESC_F_USED(1);
>> +
>> +    return avail == used;
>> +}
>>
>> if the next to be used descriptor is actually used does not depend on
>> any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
>> condition. This extra condition is basically 'there are outstanding
>> descriptors' and those are obviously either 'available' or yet to be observed
>> 'used' descriptors. Right after initialization is covered by this extra
>> condition. And obviously if the descriptor in question is still available
>> flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
>> with the extra condition we are right there where we want to be.
>>
>> But I could not find the extra condition in the spec.
>>
>> With that said, I also want to point out that I don't understand
>> your statement Jens. I don't see a way to express the condition corresponding
>> to more_used_packed() using the driver wrap counter (avail_wrap_count).
>> Of course a wrap counter that wraps when last_used wraps could be used
>> to (but no such counter is mentioned here AFAIU).
> 
> Not sure I get this.
> I was merely saying that when descriptor flags are initialized to 0
> and the wrap counters to 1, then it is not the case that the driver
> would see all descriptors as used because it takes the wrap counter
> into account.
> 

Please point me to the paragraph where it's written  how is the wrap
counter to be taken into account when trying to determine if the
desc_ring[last_used] (the descriptor we are polling) is used or not.

Nothing like that being specified (or at least I could not find it)
was actually what I got hooked on.

Regards,
Halil




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] two versioning related changes

2018-02-28 Thread Cornelia Huck
On Tue, 13 Feb 2018 17:27:39 +0100
Cornelia Huck  wrote:

> On Tue, 13 Feb 2018 13:44:06 +0100
> Halil Pasic  wrote:
> 
> > v1->v2:
> > * fixed ungrammatical commit title for #1
> > * added r-b's
> > * added bug tracker references
> > 
> > Halil Pasic (2):
> >   ccw: be more precise about the semantic of revision 1
> >   introduction: simplify the designation of legacy
> > 
> >  content.tex  |2 +-
> >  introduction.tex |4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >   
> 
> Changes look fine to me.
> 
> Some minor nits for the issues:
> - I think the reported versions should only refer to released versions
>   (and not to the review versions?)
> - Please add a "Proposed patch" reference in the Proposal: field.
> 
> Otherwise, I think this is ready for voting.

ping - do we want to start voting for this soon?

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Michael S. Tsirkin
On Wed, Feb 28, 2018 at 02:59:51PM +0100, Halil Pasic wrote:
> 
> 
> On 02/28/2018 02:42 PM, Jens Freimann wrote:
> > On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:
> >>
> >>
> >> On 02/27/2018 11:23 AM, Jens Freimann wrote:
> >>> On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:
>  On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:
> > > +?? vq->driver_event.flags = 0x1;
> > > +?? memory_barrier();
> > > +
> > > +?? flags = d->flags;
> > > +?? bool avail = flags & (1 << 
> > > VIRTQ_DESC_F_AVAIL);
> > > +?? bool used = flags & (1 << 
> > > VIRTQ_DESC_F_USED);
> > > +?? if (avail != used) {
> > > +?? break;
> > > +?? }
> > > +
> > > +?? vq->driver_event.flags = 0x2;
> > > +?? }
> > > +
> > > +?? read_memory_barrier();
> >
> > Now with the condition avail != used a freshly (that is zero 
> > initialized)
> > ring would appear all used. And we would do process_buffer(d) for the
> > whole ring if this code happens to get executed. Do we have to make
> > sure that this does not happen?
> 
>  I'll have to think about this.
> >>>
> >>> With the wrap counter initialized to 1 descriptors would not be seen
> >>> as used.
> >>
> >> Looking at the code... In vhost:
> >>
> >> +static bool desc_is_avail(struct vhost_virtqueue *vq,
> >> +  struct vring_desc_packed *desc)
> >> +{
> >> +    if (vq->used_wrap_counter)
> >> +    if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
> >> +    return true;
> >> +    if (vq->used_wrap_counter == false)
> >> +    if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
> >> +    return true;
> >> +
> >> +    return false;
> >> +}
> >>
> >> Here the bit pattern corresponding to available depends on the
> >> value of the wrap counter. I kind of anticipated this, but I could not
> >> find it defined in this spec.
> >>
> >> OTOH in guest:
> >>
> >> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> >> +{
> >> +    u16 last_used, flags;
> >> +    bool avail, used;
> >> +
> >> +    if (vq->vq.num_free == vq->vring.num)
> >> +    return false;
> >> +
> >> +    last_used = vq->last_used_idx;
> >> +    flags = virtio16_to_cpu(vq->vq.vdev,
> >> +    vq->vring_packed.desc[last_used].flags);
> >> +    avail = flags & VRING_DESC_F_AVAIL(1);
> >> +    used = flags & VRING_DESC_F_USED(1);
> >> +
> >> +    return avail == used;
> >> +}
> >>
> >> if the next to be used descriptor is actually used does not depend on
> >> any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
> >> condition. This extra condition is basically 'there are outstanding
> >> descriptors' and those are obviously either 'available' or yet to be 
> >> observed
> >> 'used' descriptors. Right after initialization is covered by this extra
> >> condition. And obviously if the descriptor in question is still available
> >> flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
> >> with the extra condition we are right there where we want to be.
> >>
> >> But I could not find the extra condition in the spec.
> >>
> >> With that said, I also want to point out that I don't understand
> >> your statement Jens. I don't see a way to express the condition 
> >> corresponding
> >> to more_used_packed() using the driver wrap counter (avail_wrap_count).
> >> Of course a wrap counter that wraps when last_used wraps could be used
> >> to (but no such counter is mentioned here AFAIU).
> > 
> > Not sure I get this.
> > I was merely saying that when descriptor flags are initialized to 0
> > and the wrap counters to 1, then it is not the case that the driver
> > would see all descriptors as used because it takes the wrap counter
> > into account.
> > 
> 
> Please point me to the paragraph where it's written  how is the wrap
> counter to be taken into account when trying to determine if the
> desc_ring[last_used] (the descriptor we are polling) is used or not.
> 
> Nothing like that being specified (or at least I could not find it)
> was actually what I got hooked on.
> 
> Regards,
> Halil
> 

Maintaining an internal "last used wrap counter" is one way to
detect ring entry change. Another would be to maintain
a per-entry "last used flag".

I should probably do this in pseudo-code, and maybe add an
implementation note.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] [PATCH] content: document balloon memory statistics

2018-02-28 Thread Tomáš Golembiovský
On Tue, 27 Feb 2018 21:56:04 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Feb 21, 2018 at 06:44:17PM +0100, Tomáš Golembiovský wrote:
> > On Wed, 21 Feb 2018 11:33:28 +
> > Stefan Hajnoczi  wrote:
> >   
> > > On Wed, Feb 21, 2018 at 12:11:35AM +0100, Tomáš Golembiovský wrote:  
> > > > On Tue, 20 Feb 2018 17:15:44 +
> > > > Stefan Hajnoczi  wrote:
> > > >   
> > > > > On Mon, Feb 19, 2018 at 02:10:29PM +0100, Tomáš Golembiovský wrote:  
> > > > > > +\item[VIRTIO_BALLOON_S_CACHES (7)] The amount of memory currently 
> > > > > > used for
> > > > > > +  caching files and other data from disk (in bytes).
> > > > > >  \end{description}
> > > > > 
> > > > > This one is somewhat vague.  I'm not sure if it means guest page cache
> > > > > specifically or something more general.
> > > > > 
> > > > > On the other hand, the numbers tend to be OS specific...  
> > > > 
> > > > We should come up with a definition that is as definite as possible
> > > > while leaving room for all systems to substitute a value that is
> > > > natively available. So if anyone has a better formulation then please
> > > > advise. The original intent was not to create a Linux specific field.
> > > > 
> > > > On Linux the value corresponds to Buffers + Cached + SwapCached memory
> > > > stats. Correct me if I'm wrong but that should be all clean pages that
> > > > have a counterpart on disk and can be quickly reclaimed and used for
> > > > something else (without additional IO).  
> > > 
> > > Thanks for clarifying.  Based on what you posted, how about:
> > > 
> > > "The amount of memory, in bytes, that can be quickly reclaimed without
> > > additional I/O.  Typically these pages are used for caching files from
> > > disk."
> > > 
> > > ?  
> > 
> > Sounds reasonable, thanks
> > 
> > Tomas  
> 
> OK - will you post v2 then?

I already have:
https://lists.oasis-open.org/archives/virtio-dev/201802/msg00141.html

> 
> >   
> > >   
> > > > On Windows this roughly corresponds to the standby list. Or maybe
> > > > standby list plus something else. I suppose there may be some
> > > > alternative on BSD too.
> > > > 
> > > > Tomas
> > > > 
> > > > 
> > > > -- 
> > > > Tomáš Golembiovský   
> > 
> > 
> > -- 
> > Tomáš Golembiovský   


-- 
Tomáš Golembiovský 

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Tiwei Bie
On Tue, Feb 27, 2018 at 10:17:17PM +0200, Michael S. Tsirkin wrote:
> On Tue, Feb 27, 2018 at 09:49:28AM +0800, Tiwei Bie wrote:
> > On Mon, Feb 26, 2018 at 10:38:13PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Feb 26, 2018 at 06:51:11PM +0800, Tiwei Bie wrote:
> > > > On Sun, Feb 25, 2018 at 08:49:10PM +0200, Michael S. Tsirkin wrote:
> > > > > On Sat, Feb 24, 2018 at 01:17:08PM +0800, Tiwei Bie wrote:
> > > > > > On Fri, Feb 16, 2018 at 09:24:12AM +0200, Michael S. Tsirkin wrote:
> > > > [...]
> > > > > > > +\subsection{Event Suppression Structure Format}\label{sec:Basic
> > > > > > > +Facilities of a Virtio Device / Packed Virtqueues / Event 
> > > > > > > Suppression Structure
> > > > > > > +Format}
> > > > > > > +
> > > > > > > +The following structure is used to reduce the number of
> > > > > > > +notifications sent between driver and device.
> > > > > > > +
> > > > > > > +\begin{lstlisting}
> > > > > > > +__le16 desc_event_off : 15; /* Descriptor Event Offset */
> > > > > > > +intdesc_event_wrap : 1; /* Descriptor Event Wrap Counter */
> > > > > > 
> > > > > > Is this `int` a typo?
> > > > > 
> > > > > It's a single bit so I think it does not matter.
> > > > > What type would you like me to use instead?
> > > > 
> > > > It looks a bit strange to use different types here, and
> > > > that's why I asked. If there is no particular reason to
> > > > use `int` here, maybe it's better to keep using __le16.
> > > > 
> > > > Besides, just for fun. For C language, I checked gcc and
> > > > clang. It seems that `int desc_event_wrap:1;` is a signed
> > > > type. So, e.g. `p->desc_event_wrap == 1` is always false.
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > 
> > > I'll switch to u8 here, IMHO le16 for a single bit
> > > is really confusing. There's no byte order for a single byte.
> > 
> > Sorry, I just realized that I misunderstood your point
> > previously.. Just to double check, `desc_event_off` and
> > `desc_event_wrap` are not in the same 2 bytes?
> > 
> > Previously I thought both of them are in the first 2
> > bytes. As it said it's a structure, and the fields are
> > defined in a way very similar to the bit field in C.
> > 
> > In C,
> > 
> > struct {
> > __le16 desc_event_off : 15;
> > intdesc_event_wrap : 1;
> > __le16 desc_event_flags : 2;
> > };
> > 
> > struct {
> > __le16 desc_event_off : 15;
> > __le16 desc_event_wrap : 1;
> > __le16 desc_event_flags : 2;
> > };
> > 
> > struct {
> > __le16 desc_event_off : 15,
> >desc_event_wrap : 1;
> > __le16 desc_event_flags : 2;
> > };
> > 
> > All above means `desc_event_off` and `desc_event_wrap`
> > are in the first 2 bytes. So I thought the `int` is a
> > typo. And I thought they are in the first 2 bytes (which
> > is little-endian).
> >
> > Best regards,
> > Tiwei Bie
> 
> Yes but desc_event_wrap itself is completely within the
> second byte. So the question of byte-order does not arise.
> 
> Right, and above is also identical to:
> 
>  struct {
>   __le16 desc_event_off : 15,
>   u8 desc_event_wrap : 1;
>   __le16 desc_event_flags : 2;
>  };
> 
> introduction explains:
> 
> \item[u8, u16, u32, u64] An unsigned integer of the specified length in bits.
> 
> \item[le16, le32, le64] An unsigned integer of the specified length in bits,
> in little-endian byte order.
> 

I've got your point now. Thanks! ;-)
BTW, maybe it's better to remove the `__` prefix for
`__le16` to keep consistency.

Best regards,
Tiwei Bie

> 
> 
> > > 
> > > > > 
> > > > > > > +__le16 desc_event_flags : 2; /* Descriptor Event Flags */
> > > > > > > +\end{lstlisting}
> > > > [...]

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] two versioning related changes

2018-02-28 Thread Halil Pasic


On 02/28/2018 04:56 PM, Cornelia Huck wrote:
> On Tue, 13 Feb 2018 17:27:39 +0100
> Cornelia Huck  wrote:
> 
>> On Tue, 13 Feb 2018 13:44:06 +0100
>> Halil Pasic  wrote:
>>
>>> v1->v2:
>>> * fixed ungrammatical commit title for #1
>>> * added r-b's
>>> * added bug tracker references
>>>
>>> Halil Pasic (2):
>>>   ccw: be more precise about the semantic of revision 1
>>>   introduction: simplify the designation of legacy
>>>
>>>  content.tex  |2 +-
>>>  introduction.tex |4 ++--
>>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>>   
>>
>> Changes look fine to me.
>>
>> Some minor nits for the issues:
>> - I think the reported versions should only refer to released versions
>>   (and not to the review versions?)
>> - Please add a "Proposed patch" reference in the Proposal: field.
>>
>> Otherwise, I think this is ready for voting.
> 
> ping - do we want to start voting for this soon?
> 

Thanks for the heads up. I will update jira today.

Regards,
Halil


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Halil Pasic


On 02/28/2018 04:40 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2018 at 02:59:51PM +0100, Halil Pasic wrote:
>>
>>
>> On 02/28/2018 02:42 PM, Jens Freimann wrote:
>>> On Tue, Feb 27, 2018 at 12:29:11PM +0100, Halil Pasic wrote:


 On 02/27/2018 11:23 AM, Jens Freimann wrote:
> On Mon, Feb 26, 2018 at 11:05:14PM +0200, Michael S. Tsirkin wrote:
>> On Mon, Feb 26, 2018 at 06:19:21PM +0100, Halil Pasic wrote:
 +?? vq->driver_event.flags = 0x1;
 +?? memory_barrier();
 +
 +?? flags = d->flags;
 +?? bool avail = flags & (1 << 
 VIRTQ_DESC_F_AVAIL);
 +?? bool used = flags & (1 << 
 VIRTQ_DESC_F_USED);
 +?? if (avail != used) {
 +?? break;
 +?? }
 +
 +?? vq->driver_event.flags = 0x2;
 +?? }
 +
 +?? read_memory_barrier();
>>>
>>> Now with the condition avail != used a freshly (that is zero 
>>> initialized)
>>> ring would appear all used. And we would do process_buffer(d) for the
>>> whole ring if this code happens to get executed. Do we have to make
>>> sure that this does not happen?
>>
>> I'll have to think about this.
>
> With the wrap counter initialized to 1 descriptors would not be seen
> as used.

 Looking at the code... In vhost:

 +static bool desc_is_avail(struct vhost_virtqueue *vq,
 +  struct vring_desc_packed *desc)
 +{
 +    if (vq->used_wrap_counter)
 +    if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
 +    return true;
 +    if (vq->used_wrap_counter == false)
 +    if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
 +    return true;
 +
 +    return false;
 +}

 Here the bit pattern corresponding to available depends on the
 value of the wrap counter. I kind of anticipated this, but I could not
 find it defined in this spec.

 OTOH in guest:

 +static inline bool more_used_packed(const struct vring_virtqueue *vq)
 +{
 +    u16 last_used, flags;
 +    bool avail, used;
 +
 +    if (vq->vq.num_free == vq->vring.num)
 +    return false;
 +
 +    last_used = vq->last_used_idx;
 +    flags = virtio16_to_cpu(vq->vq.vdev,
 +    vq->vring_packed.desc[last_used].flags);
 +    avail = flags & VRING_DESC_F_AVAIL(1);
 +    used = flags & VRING_DESC_F_USED(1);
 +
 +    return avail == used;
 +}

 if the next to be used descriptor is actually used does not depend on
 any wrap counter, but has vq->vq.num_free == vq->vring.num as an extra
 condition. This extra condition is basically 'there are outstanding
 descriptors' and those are obviously either 'available' or yet to be 
 observed
 'used' descriptors. Right after initialization is covered by this extra
 condition. And obviously if the descriptor in question is still available
 flags & VRING_DESC_F_AVAIL(1) != flags & VRING_DESC_F_USED(1) holds, so
 with the extra condition we are right there where we want to be.

 But I could not find the extra condition in the spec.

 With that said, I also want to point out that I don't understand
 your statement Jens. I don't see a way to express the condition 
 corresponding
 to more_used_packed() using the driver wrap counter (avail_wrap_count).
 Of course a wrap counter that wraps when last_used wraps could be used
 to (but no such counter is mentioned here AFAIU).
>>>
>>> Not sure I get this.
>>> I was merely saying that when descriptor flags are initialized to 0
>>> and the wrap counters to 1, then it is not the case that the driver
>>> would see all descriptors as used because it takes the wrap counter
>>> into account.
>>>
>>
>> Please point me to the paragraph where it's written  how is the wrap
>> counter to be taken into account when trying to determine if the
>> desc_ring[last_used] (the descriptor we are polling) is used or not.
>>
>> Nothing like that being specified (or at least I could not find it)
>> was actually what I got hooked on.
>>
>> Regards,
>> Halil
>>
> 
> Maintaining an internal "last used wrap counter" is one way to
> detect ring entry change. Another would be to maintain
> a per-entry "last used flag".
> 
> I should probably do this in pseudo-code, and maybe add an
> implementation note.
> 

Thanks! I will revisit this once you have a proposed solution.

Regards,
Halil


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For 

Re: [virtio-dev] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout

2018-02-28 Thread Cornelia Huck
On Wed, 28 Feb 2018 17:20:29 +0200
"Michael S. Tsirkin"  wrote:

> I agree.  I think I will stop using the bitfields - they seem to
> cause too much confusion. Just
> 
> struct event {
>   __le16 event_desc; /* bits 0-14: desc_off. 15 - desc_wrap. */
>   __le16 event_flags; /* bits 0-1: event_flags, 2-15 - reserved */
> };

Looks sensible; but while you're add it, drop the leading underscores
as well? The spec only talks about 'le16' and friends.

[Also, reserved-and-to-be-ignored or reserved-and-must-be-0?]

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH] pci-iov: Add support for unmanaged SR-IOV

2018-02-28 Thread Alexander Duyck
On Tue, Feb 27, 2018 at 2:25 PM, Alexander Duyck
 wrote:
> On Tue, Feb 27, 2018 at 1:40 PM, Alex Williamson
>  wrote:
>> On Tue, 27 Feb 2018 11:06:54 -0800
>> Alexander Duyck  wrote:
>>
>>> From: Alexander Duyck 
>>>
>>> This patch is meant to add support for SR-IOV on devices when the VFs are
>>> not managed by the kernel. Examples of recent patches attempting to do this
>>> include:
>>
>> It appears to enable sriov when the _pf_ is not managed by the
>> kernel, but by "managed" we mean that either there is no pf driver or
>> the pf driver doesn't provide an sriov_configure callback,
>> intentionally or otherwise.
>>
>>> virto - https://patchwork.kernel.org/patch/10241225/
>>> pci-stub - https://patchwork.kernel.org/patch/10109935/
>>> vfio - https://patchwork.kernel.org/patch/10103353/
>>> uio - https://patchwork.kernel.org/patch/9974031/
>>
>> So is the goal to get around the issues with enabling sriov on each of
>> the above drivers by doing it under the covers or are you really just
>> trying to enable sriov for a truly unmanage (no pf driver) case?  For
>> example, should a driver explicitly not wanting sriov enabled implement
>> a dummy sriov_configure function?
>>
>>> Since this is quickly blowing up into a multi-driver problem it is probably
>>> best to implement this solution in one spot.
>>>
>>> This patch is an attempt to do that. What we do with this patch is provide
>>> a generic call to enable SR-IOV in the case that the PF driver is either
>>> not present, or the PF driver doesn't support configuring SR-IOV.
>>>
>>> A new sysfs value called sriov_unmanaged_autoprobe has been added. This
>>> value is used as the drivers_autoprobe setting of the VFs when they are
>>> being managed by an external entity such as userspace or device firmware
>>> instead of being managed by the kernel.
>>
>> Documentation/ABI/testing/sysfs-bus-pci update is missing.
>
> I can make sure to update that in the next version.
>
>>> One side effect of this change is that the sriov_drivers_autoprobe and
>>> sriov_unmanaged_autoprobe will only apply their updates when SR-IOV is
>>> disabled. Attempts to update them when SR-IOV is in use will only update
>>> the local value and will not update sriov->autoprobe.
>>
>> And we expect users to understand when sriov_drivers_autoprobe applies
>> vs sriov_unmanaged_autoprobe, even though they're using the same
>> interfaces to enable sriov?  Are all combinations expected to work, ex.
>> unmanaged sriov is enabled, a native pf driver loads, vfs work?  Not
>> only does it seems like there's opportunity to use this incorrectly, I
>> think maybe it might be difficult to use correctly.
>>
>>> I based my patch set originally on the patch by Mark Rustad but there isn't
>>> much left after going through and cleaning out the bits that were no longer
>>> needed, and after incorporating the feedback from David Miller.
>>>
>>> I have included the authors of the original 4 patches above in the Cc here.
>>> My hope is to get feedback and/or review on if this works for their use
>>> cases.
>>>
>>> Cc: Mark Rustad 
>>> Cc: Maximilian Heyne 
>>> Cc: Liang-Min Wang 
>>> Cc: David Woodhouse 
>>> Signed-off-by: Alexander Duyck 
>>> ---
>>>  drivers/pci/iov.c|   27 +++-
>>>  drivers/pci/pci-driver.c |2 +
>>>  drivers/pci/pci-sysfs.c  |   62 
>>> +-
>>>  drivers/pci/pci.h|4 ++-
>>>  include/linux/pci.h  |1 +
>>>  5 files changed, 86 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>> index 677924ae0350..7b8858bd8d03 100644
>>> --- a/drivers/pci/iov.c
>>> +++ b/drivers/pci/iov.c
>>> @@ -446,6 +446,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>>>   pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, >vf_device);
>>>   iov->pgsz = pgsz;
>>>   iov->self = dev;
>>> + iov->autoprobe = true;
>>>   iov->drivers_autoprobe = true;
>>>   pci_read_config_dword(dev, pos + PCI_SRIOV_CAP, >cap);
>>>   pci_read_config_byte(dev, pos + PCI_SRIOV_FUNC_LINK, >link);
>>> @@ -643,8 +644,11 @@ void pci_restore_iov_state(struct pci_dev *dev)
>>>   */
>>>  void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool auto_probe)
>>>  {
>>> - if (dev->is_physfn)
>>> + if (dev->is_physfn) {
>>>   dev->sriov->drivers_autoprobe = auto_probe;
>>> + if (!dev->sriov->num_VFs)
>>> + dev->sriov->autoprobe = auto_probe;
>>
>> Why is dev->sriov->autoprobe set any time other than immediately prior
>> to enabling VFs?
>
> My concern here was drivers that are still floating around with the
> old module parameter option for enabling SR-IOV. In the unlikely event
> that somebody was to use such a