[virtio-dev] virtio balloon Spec Question

2018-02-01 Thread Jonathan Helman

Hi,

I see that an email sent 2 years ago contains a change that, from what I 
can tell, never made it into the virtio specification: 
https://lists.oasis-open.org/archives/virtio-comment/201602/msg4.html


This makes the spec inconsistent with the Linux kernel:

Spec (Section 5.5.6.3): 
http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1-2520006
Kernel: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/virtio_balloon.h?h=v4.15


I am working on a change which will add #defines for 7 and 8 to the 
virtio_balloon.h header file in the Linux kernel, and am hence planning 
to submit spec changes to virtio. My change requires the change in the 
email above to be applied to the spec (since that has the #define for 6).


Am I missing this change somewhere?  And if not, could the spec be 
updated with the change from the email above?


Thank you,
Jon Helman

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v25 0/2] Virtio-balloon: support free page reporting

2018-02-01 Thread Michael S. Tsirkin
On Thu, Jan 25, 2018 at 05:14:04PM +0800, Wei Wang wrote:
> This patch series is separated from the previous "Virtio-balloon
> Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
> implemented by this series enables the virtio-balloon driver to report
> hints of guest free pages to the host. It can be used to accelerate live
> migration of VMs. Here is an introduction of this usage:
> 
> Live migration needs to transfer the VM's memory from the source machine
> to the destination round by round. For the 1st round, all the VM's memory
> is transferred. From the 2nd round, only the pieces of memory that were
> written by the guest (after the 1st round) are transferred. One method
> that is popularly used by the hypervisor to track which part of memory is
> written is to write-protect all the guest memory.
> 
> The second feature enables the optimization of the 1st round memory
> transfer - the hypervisor can skip the transfer of guest free pages in the
> 1st round. It is not concerned that the memory pages are used after they
> are given to the hypervisor as a hint of the free pages, because they will
> be tracked by the hypervisor and transferred in the next round if they are
> used and written.

Could you post performance numbers please?

> ChangeLog:
> v24->v25:
> - mm: change walk_free_mem_block to return 0 (instead of true) on
>   completing the report, and return a non-zero value from the
>   callabck, which stops the reporting.
> - virtio-balloon:
> - use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
> - avoid __virtio_clear_bit when bailing out;
> - a new method to avoid reporting the some cmd id to host twice
> - destroy_workqueue can cancel free page work when the feature is
>   negotiated;
> - fail probe when the free page vq size is less than 2.
> v23->v24:
> - change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
>   VIRTIO_BALLOON_F_FREE_PAGE_HINT
> - kick when vq->num_free < half full, instead of "= half full"
> - replace BUG_ON with bailing out
> - check vb->balloon_wq in probe(), if null, bail out
> - add a new feature bit for page poisoning
> - solve the corner case that one cmd id being sent to host twice
> v22->v23:
> - change to kick the device when the vq is half-way full;
> - open-code batch_free_page_sg into add_one_sg;
> - change cmd_id from "uint32_t" to "__virtio32";
> - reserver one entry in the vq for teh driver to send cmd_id, instead
>   of busywaiting for an available entry;
> - add "stop_update" check before queue_work for prudence purpose for
>   now, will have a separate patch to discuss this flag check later;
> - init_vqs: change to put some variables on stack to have simpler
>   implementation;
> - add destroy_workqueue(vb->balloon_wq);
> 
> v21->v22:
> - add_one_sg: some code and comment re-arrangement
> - send_cmd_id: handle a cornercase
> 
> For previous ChangeLog, please reference
> https://lwn.net/Articles/743660/
> 
> Wei Wang (2):
>   mm: support reporting free page blocks
>   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> 
>  drivers/virtio/virtio_balloon.c | 251 
> ++--
>  include/linux/mm.h  |   6 +
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_alloc.c |  96 ++
>  4 files changed, 324 insertions(+), 36 deletions(-)
> 
> -- 
> 2.7.4

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-01 Thread Michael S. Tsirkin
On Thu, Jan 25, 2018 at 08:28:52PM +0900, Tetsuo Handa wrote:
> On 2018/01/25 12:32, Wei Wang wrote:
> > On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:
> >> On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
> >> +
> >> +static void report_free_page_func(struct work_struct *work)
> >> +{
> >> +struct virtio_balloon *vb;
> >> +unsigned long flags;
> >> +
> >> +vb = container_of(work, struct virtio_balloon, report_free_page_work);
> >> +
> >> +/* Start by sending the obtained cmd id to the host with an outbuf */
> >> +send_cmd_id(vb, >start_cmd_id);
> >> +
> >> +/*
> >> + * Set start_cmd_id to VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID to
> >> + * indicate a new request can be queued.
> >> + */
> >> +spin_lock_irqsave(>stop_update_lock, flags);
> >> +vb->start_cmd_id = cpu_to_virtio32(vb->vdev,
> >> +VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> >> +spin_unlock_irqrestore(>stop_update_lock, flags);
> >> +
> >> +walk_free_mem_block(vb, 0, _balloon_send_free_pages);
> >> Can you teach walk_free_mem_block to return the && of all
> >> return calls, so caller knows whether it completed?
> > 
> > There will be two cases that can cause walk_free_mem_block to return 
> > without completing:
> > 1) host requests to stop in advance
> > 2) vq->broken
> > 
> > How about letting walk_free_mem_block simply return the value returned by 
> > its callback (i.e. virtio_balloon_send_free_pages)?
> > 
> > For host requests to stop, it returns "1", and the above only bails out 
> > when walk_free_mem_block return a "< 0" value.
> 
> I feel that virtio_balloon_send_free_pages is doing too heavy things.
> 
> It can be called for many times with IRQ disabled. Number of times
> it is called depends on amount of free pages (and fragmentation state).
> Generally, more free pages, more calls.
> 
> Then, why don't you allocate some pages for holding all pfn values
> and then call walk_free_mem_block() only for storing pfn values
> and then send pfn values without disabling IRQ?

So going over it, at some level we are talking about optimizations here.

I'll go over it one last time but at some level we might
be able to make progress faster if we start by enabling
the functionality in question.
If there are no other issues, I'm inclined to merge.

If nothing else, this
will make it possible for people to experiment and
report sources of overhead.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-01 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 05:43:22PM +0800, Wei Wang wrote:
> 3) Hints means the pages are quite likely to be free pages (no guarantee).
> If the pages given to host are going to be freed, then we really couldn't
> call them hints, they are true free pages. Ballooning needs true free pages,
> while live migration needs hints, would you agree with this?

It's an interesting point, I'm convinced by it.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio] [PATCH v7 08/11] packed virtqueues: more efficient virtqueue layout

2018-02-01 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 11:11:28AM +0100, Cornelia Huck wrote:
> On Thu, 1 Feb 2018 11:05:35 +0800
> Tiwei Bie  wrote:
> 
> > On Tue, Jan 30, 2018 at 09:40:35PM +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 30, 2018 at 02:50:44PM +0100, Cornelia Huck wrote:  
> > > > On Tue, 23 Jan 2018 02:01:07 +0200
> > > > "Michael S. Tsirkin"  wrote:
> 
> > > > > +\subsubsection{Driver notifications}
> > > > > +\label{sec:Packed Virtqueues / Driver notifications}
> > > > > +Whenever not suppressed by Device Event Suppression,
> > > > > +driver is required to notify the device after
> > > > > +making changes to the virtqueue.
> > > > > +
> > > > > +Some devices benefit from ability to find out the number of
> > > > > +available descriptors in the ring, and whether to send
> > > > > +interrupts to drivers without accessing virtqueue in memory:
> > > > > +for efficiency or as a debugging aid.
> > > > > +
> > > > > +To help with these optimizations, driver notifications
> > > > > +to the device include the following information:
> > > > > +
> > > > > +\begin{itemize}
> > > > > +\item VQ number
> > > > > +\item Offset (in units of descriptor size) within the ring
> > > > > +  where the next available descriptor will be written
> > > > > +\item Wrap Counter referring to the next available
> > > > > +  descriptor
> > > > > +\end{itemize}
> > > > > +
> > > > > +Note that driver can trigger multiple notifications even without
> > > > > +making any more changes to the ring. These would then have
> > > > > +identical \field{Offset} and \field{Wrap Counter} values.  
> > > > 
> > > > (...)
> > > >   
> > > > > +\subsection{Driver Notification Format}\label{sec:Basic
> > > > > +Facilities of a Virtio Device / Packed Virtqueues / Driver 
> > > > > Notification Format}
> > > > > +
> > > > > +The following structure is used to notify device of
> > > > > +device events - i.e. available descriptors:
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +__le16 vqn;
> > > > > +__le16 next_off : 15;
> > > > > +intnext_wrap : 1;
> > > > > +\end{lstlisting}  
> > > > 
> > > > (...)
> > > >   
> > > > > +\subsubsection{Notifying The Device}\label{sec:Basic Facilities
> > > > > +of a Virtio Device / Packed Virtqueues / Supplying Buffers to The 
> > > > > Device / Notifying The Device}
> > > > > +
> > > > > +The actual method of device notification is bus-specific, but 
> > > > > generally
> > > > > +it can be expensive.  So the device MAY suppress such notifications 
> > > > > if it
> > > > > +doesn't need them, using the Driver Event Suppression structure
> > > > > +as detailed in section \ref{sec:Basic
> > > > > +Facilities of a Virtio Device / Packed Virtqueues / Event
> > > > > +Suppression Structure Format}.
> > > > > +
> > > > > +The driver has to be careful to expose the new \field{flags}
> > > > > +value before checking if notifications are suppressed.  
> > > > 
> > > > This is all I could find regarding notifications, and it leaves me
> > > > puzzled how notifications are actually supposed to work; especially,
> > > > where that driver notification structure is supposed to be relayed.
> > > > 
> > > > I'm obviously coming from a ccw perspective, but I don't think that pci
> > > > is all that different (well, hopefully).
> > > > 
> > > > Up to now, we notified for a certain virtqueue -- i.e., the device
> > > > driver notified the device that there is something to process for a
> > > > certain queue. ccw uses the virtqueue number in a gpr for a hypercall,
> > > > pci seems to use a write to the config space IIUC. With the packed
> > > > layout, we have more payload per notification. We should be able to put
> > > > it in the same gpr than the virtqueue for ccw (if needed, with some
> > > > compat magic, or with a new hypercall, which would be ugly but doable).
> > > > Not sure how this is supposed to work with pci.
> > > > 
> > > > Has there been any prototyping done to implement this in qemu + KVM?
> > > > I'm unsure how this will work with ioeventfds, which just trigger.  
> > > 
> > > The PCI MMIO version would just trigger on access to a specific
> > > address, ignoring all data in there. PIO would need something
> > > like a data mask so it can ignore everything except the vq #.
> > > 
> > > This is helpful for hardware offloads but I'm open to
> > > making this PCI specific or deferring until we have
> > > explicit support for hardware offloads.
> > > 
> > > What do you think?
> > >   
> > 
> > Hi,
> > 
> > I prefer to keep it (at least for PCI) and refine it if
> > necessary.
> > 
> > Because one of the important goals of packed ring is to
> > be hardware friendly. Supporting tail pointer is one of
> > the important things to make it hardware friendly. More
> > details could be found in Kully's below mail (I've done
> > some slight reformatting):
> > 
> > - START -
> 
> 
> 
> > - END -
> 
> So, my takeaway is here:
> 
> - Having this information (or a 

[virtio-dev] Re: [virtio] [PATCH v7 09/11] content: in-order buffer use

2018-02-01 Thread Cornelia Huck
On Tue, 23 Jan 2018 02:01:08 +0200
"Michael S. Tsirkin"  wrote:

> Using descriptors in-order is sometimes benefitial.  Add an option for

s/benefitial/beneficial/

> that - per-format detail allowing more optimizations will be added by
> follow-up patches.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  content.tex | 15 +++
>  1 file changed, 15 insertions(+)

Reviewed-by: Cornelia Huck 

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio] [PATCH v7 08/11] packed virtqueues: more efficient virtqueue layout

2018-02-01 Thread Cornelia Huck
On Thu, 1 Feb 2018 11:05:35 +0800
Tiwei Bie  wrote:

> On Tue, Jan 30, 2018 at 09:40:35PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 30, 2018 at 02:50:44PM +0100, Cornelia Huck wrote:  
> > > On Tue, 23 Jan 2018 02:01:07 +0200
> > > "Michael S. Tsirkin"  wrote:

> > > > +\subsubsection{Driver notifications}
> > > > +\label{sec:Packed Virtqueues / Driver notifications}
> > > > +Whenever not suppressed by Device Event Suppression,
> > > > +driver is required to notify the device after
> > > > +making changes to the virtqueue.
> > > > +
> > > > +Some devices benefit from ability to find out the number of
> > > > +available descriptors in the ring, and whether to send
> > > > +interrupts to drivers without accessing virtqueue in memory:
> > > > +for efficiency or as a debugging aid.
> > > > +
> > > > +To help with these optimizations, driver notifications
> > > > +to the device include the following information:
> > > > +
> > > > +\begin{itemize}
> > > > +\item VQ number
> > > > +\item Offset (in units of descriptor size) within the ring
> > > > +  where the next available descriptor will be written
> > > > +\item Wrap Counter referring to the next available
> > > > +  descriptor
> > > > +\end{itemize}
> > > > +
> > > > +Note that driver can trigger multiple notifications even without
> > > > +making any more changes to the ring. These would then have
> > > > +identical \field{Offset} and \field{Wrap Counter} values.  
> > > 
> > > (...)
> > >   
> > > > +\subsection{Driver Notification Format}\label{sec:Basic
> > > > +Facilities of a Virtio Device / Packed Virtqueues / Driver 
> > > > Notification Format}
> > > > +
> > > > +The following structure is used to notify device of
> > > > +device events - i.e. available descriptors:
> > > > +
> > > > +\begin{lstlisting}
> > > > +__le16 vqn;
> > > > +__le16 next_off : 15;
> > > > +intnext_wrap : 1;
> > > > +\end{lstlisting}  
> > > 
> > > (...)
> > >   
> > > > +\subsubsection{Notifying The Device}\label{sec:Basic Facilities
> > > > +of a Virtio Device / Packed Virtqueues / Supplying Buffers to The 
> > > > Device / Notifying The Device}
> > > > +
> > > > +The actual method of device notification is bus-specific, but generally
> > > > +it can be expensive.  So the device MAY suppress such notifications if 
> > > > it
> > > > +doesn't need them, using the Driver Event Suppression structure
> > > > +as detailed in section \ref{sec:Basic
> > > > +Facilities of a Virtio Device / Packed Virtqueues / Event
> > > > +Suppression Structure Format}.
> > > > +
> > > > +The driver has to be careful to expose the new \field{flags}
> > > > +value before checking if notifications are suppressed.  
> > > 
> > > This is all I could find regarding notifications, and it leaves me
> > > puzzled how notifications are actually supposed to work; especially,
> > > where that driver notification structure is supposed to be relayed.
> > > 
> > > I'm obviously coming from a ccw perspective, but I don't think that pci
> > > is all that different (well, hopefully).
> > > 
> > > Up to now, we notified for a certain virtqueue -- i.e., the device
> > > driver notified the device that there is something to process for a
> > > certain queue. ccw uses the virtqueue number in a gpr for a hypercall,
> > > pci seems to use a write to the config space IIUC. With the packed
> > > layout, we have more payload per notification. We should be able to put
> > > it in the same gpr than the virtqueue for ccw (if needed, with some
> > > compat magic, or with a new hypercall, which would be ugly but doable).
> > > Not sure how this is supposed to work with pci.
> > > 
> > > Has there been any prototyping done to implement this in qemu + KVM?
> > > I'm unsure how this will work with ioeventfds, which just trigger.  
> > 
> > The PCI MMIO version would just trigger on access to a specific
> > address, ignoring all data in there. PIO would need something
> > like a data mask so it can ignore everything except the vq #.
> > 
> > This is helpful for hardware offloads but I'm open to
> > making this PCI specific or deferring until we have
> > explicit support for hardware offloads.
> > 
> > What do you think?
> >   
> 
> Hi,
> 
> I prefer to keep it (at least for PCI) and refine it if
> necessary.
> 
> Because one of the important goals of packed ring is to
> be hardware friendly. Supporting tail pointer is one of
> the important things to make it hardware friendly. More
> details could be found in Kully's below mail (I've done
> some slight reformatting):
> 
> - START -



> - END -

So, my takeaway is here:

- Having this information (or a variant of it) available on
  notification is useful.
- The specifics on how to convey the info are still a bit unsettled.

I think this should be optionally available to any transport (i.e. not
pci-specific). What about the following wording:

"Driver notifications to the device include the virtqueue number.