Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-06 Thread Tyler Sanderson via Virtualization
On Sun, Oct 6, 2019 at 4:48 AM Michael S. Tsirkin  wrote:

> On Sun, Oct 06, 2019 at 10:30:40AM +0200, David Hildenbrand wrote:
> > Please note the "use outside of a testing or debugging environment is
> > not recommended". Usually you want a "soft" version of this, e.g., via
> > the OOM handler (so only drop parts of the cache, not all).
>
> Right. We'll need something softer I guess. By how much, I don't know.
>

It should be left up to the guest system administrator to configure the
trade off between performance and cache memory reclaimed.
And the primary performance metric would be cache hit rate.
So maybe some configuration option that lets you drop cache pages that have
a hit rate less than X per minute.
Or more simply: Pages that haven't been hit in the last X seconds.

The other way to do it would be to set a strict cache size limit, but that
would be workload dependent and toilsome to tune correctly, so it seems
like a worse choice.


>
> --
> MST
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-06 Thread Michael S. Tsirkin
On Sun, Oct 06, 2019 at 10:30:40AM +0200, David Hildenbrand wrote:
> Please note the "use outside of a testing or debugging environment is
> not recommended". Usually you want a "soft" version of this, e.g., via
> the OOM handler (so only drop parts of the cache, not all).

Right. We'll need something softer I guess. By how much, I don't know.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-06 Thread David Hildenbrand
On 04.10.19 21:03, Tyler Sanderson wrote:
> I think DEFLATE_ON_OOM makes sense conceptually, it's just that the
> implementation doesn't play well with the rest of memory management
> under memory pressure.
> It could probably be fixed with enough effort, but IMO free page hinting
> gets 90% of the benefit without poking the dark corners of memory
> management and so is a net win.
> 
> The obvious place where free page hinting falls short (as David pointed
> out above) is that it can't pressure the page cache.

One solution is to move the page cache to the hypervisor, e.g., using
emulated NVDIMMs or virtio-pmem.

> Would it be possible to add a mechanism that explicitly causes page
> cache to shrink without requiring the system to be under memory pressure?
> 

We do have a sysctl "drop_caches" which calls
iterate_supers(drop_pagecache_sb, NULL) and drop_slab().

doc/Documentation/sysctl/vm.txt:
==

drop_caches

Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes.  Once dropped, their
memory becomes free.

To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects.
To increase the number of objects freed by this operation, the user may run
`sync' prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
number of dirty objects on the system and create more candidates to be
dropped.

This file is not a means to control the growth of the various kernel caches
(inodes, dentries, pagecache, etc...)  These objects are automatically
reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems.  Since it discards cached
objects, it may cost a significant amount of I/O and CPU to recreate the
dropped objects, especially if they were under heavy use.  Because of this,
use outside of a testing or debugging environment is not recommended.

You may see informational messages in your kernel log when this file is
used:

cat (1234): drop_caches: 3

These are informational only.  They do not mean that anything is wrong
with your system.  To disable them, echo 4 (bit 2) into drop_caches.

==

Please note the "use outside of a testing or debugging environment is
not recommended". Usually you want a "soft" version of this, e.g., via
the OOM handler (so only drop parts of the cache, not all).

-- 

Thanks,

David / dhildenb
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-05 Thread Michael S. Tsirkin
On Fri, Oct 04, 2019 at 12:03:43PM -0700, Tyler Sanderson wrote:
> I think DEFLATE_ON_OOM makes sense conceptually, it's just that the
> implementation doesn't play well with the rest of memory management under
> memory pressure.
> It could probably be fixed with enough effort, but IMO free page hinting gets
> 90% of the benefit without poking the dark corners of memory management and so
> is a net win.
> 
> The obvious place where free page hinting falls short (as David pointed out
> above) is that it can't pressure the page cache.
> Would it be possible to add a mechanism that explicitly causes page cache to
> shrink without requiring the system to be under memory pressure?

Which API would you call to shrink it?


> On Fri, Oct 4, 2019 at 1:56 AM David Hildenbrand  wrote:
> 
> On 04.10.19 10:35, Michael S. Tsirkin wrote:
> > On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> >> On 04.10.19 01:15, Tyler Sanderson wrote:
> >>> I was mistaken, the problem with overcommit accounting is not fixed by
> >>> the change to shrinker interface.
> >>> This means that large allocations are stopped even if they could
> succeed
> >>> by deflating the balloon.
> >>
> >> Please note that some people use the balloon for actual memory unplug -
> >> so initiating to deflate the balloon under any circumstances is
> >> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> >> set - however that is barely the case (at least in the setups I know :)
> ).
> >>
> >> So yes, free page reporting is a different thing, because it really is
> >> used to "hint" and not to "agree to unplug" in any scenario.
> >>
> >> --
> >>
> >> Thanks,
> >>
> >
> >
> > VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
> > at the spec level either. For example, when will we inflate again?
> > Current code does this at the next interrupt, which requires
> > host to somehow know it's time to inflate.
> >
> 
> The host has access to memory stats of the guest, so it could come up
> with some heuristics - but I do agree that is not well thought through -
> one reason why it is barely used :)
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-04 Thread David Hildenbrand
On 04.10.19 10:35, Michael S. Tsirkin wrote:
> On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
>> On 04.10.19 01:15, Tyler Sanderson wrote:
>>> I was mistaken, the problem with overcommit accounting is not fixed by
>>> the change to shrinker interface.
>>> This means that large allocations are stopped even if they could succeed
>>> by deflating the balloon.
>>
>> Please note that some people use the balloon for actual memory unplug -
>> so initiating to deflate the balloon under any circumstances is
>> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
>> set - however that is barely the case (at least in the setups I know :) ).
>>
>> So yes, free page reporting is a different thing, because it really is
>> used to "hint" and not to "agree to unplug" in any scenario.
>>
>> -- 
>>
>> Thanks,
>>
> 
> 
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
> at the spec level either. For example, when will we inflate again?
> Current code does this at the next interrupt, which requires
> host to somehow know it's time to inflate.
> 

The host has access to memory stats of the guest, so it could come up
with some heuristics - but I do agree that is not well thought through -
one reason why it is barely used :)

-- 

Thanks,

David / dhildenb
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-04 Thread Michael S. Tsirkin
On Fri, Oct 04, 2019 at 10:06:03AM +0200, David Hildenbrand wrote:
> On 04.10.19 01:15, Tyler Sanderson wrote:
> > I was mistaken, the problem with overcommit accounting is not fixed by
> > the change to shrinker interface.
> > This means that large allocations are stopped even if they could succeed
> > by deflating the balloon.
> 
> Please note that some people use the balloon for actual memory unplug -
> so initiating to deflate the balloon under any circumstances is
> undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
> set - however that is barely the case (at least in the setups I know :) ).
> 
> So yes, free page reporting is a different thing, because it really is
> used to "hint" and not to "agree to unplug" in any scenario.
> 
> -- 
> 
> Thanks,
> 


VIRTIO_BALLOON_F_DEFLATE_ON_OOM isn't really well thought through
at the spec level either. For example, when will we inflate again?
Current code does this at the next interrupt, which requires
host to somehow know it's time to inflate.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-04 Thread David Hildenbrand
On 04.10.19 01:15, Tyler Sanderson wrote:
> I was mistaken, the problem with overcommit accounting is not fixed by
> the change to shrinker interface.
> This means that large allocations are stopped even if they could succeed
> by deflating the balloon.

Please note that some people use the balloon for actual memory unplug -
so initiating to deflate the balloon under any circumstances is
undesired. It's different with "VIRTIO_BALLOON_F_DEFLATE_ON_OOM" being
set - however that is barely the case (at least in the setups I know :) ).

So yes, free page reporting is a different thing, because it really is
used to "hint" and not to "agree to unplug" in any scenario.

-- 

Thanks,

David / dhildenb
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-10-03 Thread Michael S. Tsirkin
On Thu, Oct 03, 2019 at 11:27:46AM -0700, Tyler Sanderson wrote:
> Sorry for the slow reply, I did some verification on my end. See responses
> inline.
> 
> On Mon, Sep 16, 2019 at 12:26 AM David Hildenbrand  wrote:
> 
> On 16.09.19 03:41, Wei Wang wrote:
> > On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
> >> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >> (commit
> >> <https://github.com/torvalds/linux/commit/
> 86a559787e6f5cf662c081363f64a20cad654195#
> diff-fd202acf694d9eba19c8c64da3e480c9>).
> >>
> >>
> >> My understanding is that this mechanism works similarly to the
> >> existing inflate/deflate queues. Pages are allocated by the guest and
> >> then reported on VQ_FREE_PAGE.
> >>
> >> Question: Is there a limit to how many pages will be allocated? What
> >> controls the amount of memory pressure applied?
> >
> > No control for the limit currently. The implementation reports all the
> > guest free pages to host.
> > The main usage for this feature so far is to have guest skip sending
> > those guest free pages
> > (the more, the better) during live migration.
> 
> How does this differ from the regular inflate/deflate queue?
> Also, couldn't you simply skip sending pages that do not have host pages
> backing them (assuming pages added to the balloon are unbacked to reclaim the
> memory)?

Yes but putting most guest memory into the balloon would
slow the guest down significantly.


> 
> >
> >
> >>
> >> In my experience with virtio balloon there are problems with the
> >> mechanisms that are supposed to deflate the balloon in response to
> >> memory pressure (e.g. OOM notifier).
> >
> > What problem did you see? We've also changed balloon to use memory
> shrinker,
> > did you see the problem with shrinker as well?
> 
> Yes, I've observed problems both before and after the shrinker change 
> (although
> different problems).
> Before the shrinker change, the overcommit accounting feature gets in the way
> and prevents allocations, even when the balloon could be deflated. The OOM
> notifier is never invoked so the balloon driver's hook into the OOM notifier 
> is
> useless.
> After the shrinker change the overcommit accounting problem is fixed, but I
> have still found that forcibly deflating the balloon under memory pressure is
> slow enough that random allocations can still fail (is there a timeout for
> allocations?).
> For example, I've seen:
> tysand@vm ~ $ fallocate -l 5G d/foo    // d is tmpfs mount. This command 
> causes
> balloon to require deflation.
> tysand@vm grep Mem /proc/meminfo
> MemTotal:        8172852 kB
> MemFree:          138932 kB
> MemAvailable:      83428 kB
> tysand@vm ~ $ grep Mem /proc/meminfo
> free(): invalid pointer
> -bash: wait_for: No record of process 5415
> free(): invalid pointer
> 
> Or similarly, I've seen SSH terminate with:
> tysand@vm:~$ grep Mem /proc/meminfo
> *** stack smashing detected ***:  terminated
> 
> Presumably the stack smashing and "free(): invalid pointer" are caused by
> malloc returning null in those programs and the programs not handling it
> correctly.
> 
> Notably I don't see the fallocate command fail. Usually only other processes.
> 
> 
> >
> >>
> >> It seems an ideal balloon interface would allow the guest to round
> >> robin through free guest physical pages, allowing the host to unback
> >> them, but never having more than a few pages allocated to the balloon
> >> at any one time. For example:
> >> 1. Guest allocates 1 page and notifies balloon device of this page's
> >> address.
> >> 2. Host debacks the received page.
> >> 3. Guest frees the page.
> >> 4. Repeat at #1, but ensure that different pages are allocated each
> time.
> >
> > Probably you need a mechanism to "ensure" different pages to be
> allocated.
> > The current implementation (having balloon hold the allocated pages)
> could
> > be thought of as one mechanism (it is simple).
> >
> >>
> >> This way the "balloon size" is never more than a few pages and does
> >> not create memory pressure. However the difficulty is in ensuring each
> >> set of sent pages is disjoint from previously sent pages. Is there a
> >> mechanism to round-robin allocations through all of

Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-09-16 Thread David Hildenbrand
On 16.09.19 03:41, Wei Wang wrote:
> On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
>> Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT 
>> (commit 
>> <https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>).
>>  
>>
>>
>> My understanding is that this mechanism works similarly to the 
>> existing inflate/deflate queues. Pages are allocated by the guest and 
>> then reported on VQ_FREE_PAGE.
>>
>> Question: Is there a limit to how many pages will be allocated? What 
>> controls the amount of memory pressure applied?
> 
> No control for the limit currently. The implementation reports all the 
> guest free pages to host.
> The main usage for this feature so far is to have guest skip sending 
> those guest free pages
> (the more, the better) during live migration.
> 
> 
>>
>> In my experience with virtio balloon there are problems with the 
>> mechanisms that are supposed to deflate the balloon in response to 
>> memory pressure (e.g. OOM notifier).
> 
> What problem did you see? We've also changed balloon to use memory shrinker,
> did you see the problem with shrinker as well?
> 
>>
>> It seems an ideal balloon interface would allow the guest to round 
>> robin through free guest physical pages, allowing the host to unback 
>> them, but never having more than a few pages allocated to the balloon 
>> at any one time. For example:
>> 1. Guest allocates 1 page and notifies balloon device of this page's 
>> address.
>> 2. Host debacks the received page.
>> 3. Guest frees the page.
>> 4. Repeat at #1, but ensure that different pages are allocated each time.
> 
> Probably you need a mechanism to "ensure" different pages to be allocated.
> The current implementation (having balloon hold the allocated pages) could
> be thought of as one mechanism (it is simple).
> 
>>
>> This way the "balloon size" is never more than a few pages and does 
>> not create memory pressure. However the difficulty is in ensuring each 
>> set of sent pages is disjoint from previously sent pages. Is there a 
>> mechanism to round-robin allocations through all of guest physical 
>> memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?

There are use cases where you really want memory pressure (page cache is
the prime example). Anyhow, I think the use case you want the
"round-robin allocations" for is better tackled by "free page reporting"
(used to be called "free page hinting") currently discussed on various
lists.

"allowing the host to unback them, but never having more than a few
pages allocated to the balloon at any one time." is similar to what
"free page reporting" does. We decided to only report bigger pages
(avoid splitting up THP in the hypervisor, overhead) and only
temporarily pull out a fixed amount of pages (16) from the page
allocator to avoid false-OOM. Guaranteeing forward progress (similar to
what you describe) is one important key concept.

-- 

Thanks,

David / dhildenb
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-09-15 Thread Wei Wang

On 09/14/2019 02:36 AM, Tyler Sanderson wrote:
Hello, I'm curious about the intent of VIRTIO_BALLOON_F_FREE_PAGE_HINT 
(commit 
<https://github.com/torvalds/linux/commit/86a559787e6f5cf662c081363f64a20cad654195#diff-fd202acf694d9eba19c8c64da3e480c9>). 



My understanding is that this mechanism works similarly to the 
existing inflate/deflate queues. Pages are allocated by the guest and 
then reported on VQ_FREE_PAGE.


Question: Is there a limit to how many pages will be allocated? What 
controls the amount of memory pressure applied?


No control for the limit currently. The implementation reports all the 
guest free pages to host.
The main usage for this feature so far is to have guest skip sending 
those guest free pages

(the more, the better) during live migration.




In my experience with virtio balloon there are problems with the 
mechanisms that are supposed to deflate the balloon in response to 
memory pressure (e.g. OOM notifier).


What problem did you see? We've also changed balloon to use memory shrinker,
did you see the problem with shrinker as well?



It seems an ideal balloon interface would allow the guest to round 
robin through free guest physical pages, allowing the host to unback 
them, but never having more than a few pages allocated to the balloon 
at any one time. For example:
1. Guest allocates 1 page and notifies balloon device of this page's 
address.

2. Host debacks the received page.
3. Guest frees the page.
4. Repeat at #1, but ensure that different pages are allocated each time.


Probably you need a mechanism to "ensure" different pages to be allocated.
The current implementation (having balloon hold the allocated pages) could
be thought of as one mechanism (it is simple).



This way the "balloon size" is never more than a few pages and does 
not create memory pressure. However the difficulty is in ensuring each 
set of sent pages is disjoint from previously sent pages. Is there a 
mechanism to round-robin allocations through all of guest physical 
memory? Does VIRTIO_BALLOON_F_FREE_PAGE_HINT enable this?




AFAIK, no such round-robin allocation so far. This may need the page 
allocation to have states recording

the allocation history.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-28 Thread Christian Borntraeger


On 28.12.2018 04:12, Wei Wang wrote:
> On 12/27/2018 08:03 PM, Christian Borntraeger wrote:
>> On 27.08.2018 03:32, Wei Wang wrote:
>>>   static int init_vqs(struct virtio_balloon *vb)
>>>   {
>>> -    struct virtqueue *vqs[3];
>>> -    vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
>>> };
>>> -    static const char * const names[] = { "inflate", "deflate", "stats" };
>>> -    int err, nvqs;
>>> +    struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
>>> +    vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
>>> +    const char *names[VIRTIO_BALLOON_VQ_MAX];
>>> +    int err;
>>>
>>>   /*
>>> - * We expect two virtqueues: inflate and deflate, and
>>> - * optionally stat.
>>> + * Inflateq and deflateq are used unconditionally. The names[]
>>> + * will be NULL if the related feature is not enabled, which will
>>> + * cause no allocation for the corresponding virtqueue in find_vqs.
>>>    */
>> This might be true for virtio-pci, but it is not for virtio-ccw.
> 
> Hi Christian,
> 
> 
> Please try the fix patches: https://lkml.org/lkml/2018/12/27/336

See answer to that thread. It fixes the random boot crashes.
There is still the regression that ballooning does no longer work on
s390 (see the call trace).

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-27 Thread Wei Wang

On 12/27/2018 08:03 PM, Christian Borntraeger wrote:

On 27.08.2018 03:32, Wei Wang wrote:

  static int init_vqs(struct virtio_balloon *vb)
  {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+   vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+   const char *names[VIRTIO_BALLOON_VQ_MAX];
+   int err;

/*
-* We expect two virtqueues: inflate and deflate, and
-* optionally stat.
+* Inflateq and deflateq are used unconditionally. The names[]
+* will be NULL if the related feature is not enabled, which will
+* cause no allocation for the corresponding virtqueue in find_vqs.
 */

This might be true for virtio-pci, but it is not for virtio-ccw.


Hi Christian,


Please try the fix patches: https://lkml.org/lkml/2018/12/27/336

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-27 Thread Christian Borntraeger
On 27.08.2018 03:32, Wei Wang wrote:
>  static int init_vqs(struct virtio_balloon *vb)
>  {
> - struct virtqueue *vqs[3];
> - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
> };
> - static const char * const names[] = { "inflate", "deflate", "stats" };
> - int err, nvqs;
> + struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
> + vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
> + const char *names[VIRTIO_BALLOON_VQ_MAX];
> + int err;
> 
>   /*
> -  * We expect two virtqueues: inflate and deflate, and
> -  * optionally stat.
> +  * Inflateq and deflateq are used unconditionally. The names[]
> +  * will be NULL if the related feature is not enabled, which will
> +  * cause no allocation for the corresponding virtqueue in find_vqs.
>*/

This might be true for virtio-pci, but it is not for virtio-ccw.

> - nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> - err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
> + callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
> + names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
> + callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
> + names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
> + names[VIRTIO_BALLOON_VQ_STATS] = NULL;
> + names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> +
> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> + names[VIRTIO_BALLOON_VQ_STATS] = "stats";
> + callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
> + }
> +
> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
> + callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> + }
> +
> + err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
[...]

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v37 1/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-08-26 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.
Host may also send a stop cmd id to the guest to stop the reporting.

VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest
reporting.
VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
the reported pages are ready to be freed.

Why does the guest free the reported pages when host tells it is ready to
free?
This is because freeing pages appears to be expensive for live migration.
free_pages() dirties memory very quickly and makes the live migraion not
converge in some cases. So it is good to delay the free_page operation
when the migration is done, and host sends a command to guest about that.

Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
VIRTIO_BALLOON_CMD_ID_STOP?
This is because live migration is usually done in several rounds. At the
end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
the guest to stop (or say pause) the reporting. The guest resumes the
reporting when it receives a new command id at the beginning of the next
round. So we need a new cmd id to distinguish between "stop reporting" and
"ready to free the reported pages".

TODO:
- Add a batch page allocation API to amortize the allocation overhead.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
 drivers/virtio/virtio_balloon.c | 364 
 include/uapi/linux/virtio_balloon.h |   5 +
 2 files changed, 336 insertions(+), 33 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d1c1f62..a185678 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -41,13 +41,34 @@
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+__GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+   (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
 #ifdef CONFIG_BALLOON_COMPACTION
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -57,6 +78,18 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The list of allocated free pages, waiting to be given back to mm */
+   struct list_head free_page_list;
+   spinlock_t free_page_list_lock;
+   /* The number of free page blocks on the above list */
+   unsigned long num_free_page_blocks;
+   /* The cmd id received from host */
+   u32 cmd_id_received;
+   /* The cmd id that is actively in use */
+   __virtio32 cmd_id_active;
+   /* Buffer to store the stop sign */
+   __virtio32 cmd_id_stop;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct 

[PATCH v36 3/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-07-20 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
obtained one by one from the mm free list via the regular allocation
function. The allocated pages are given back to mm after they are put onto
the vq.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register. When
the guest starts to report, it first sends a start cmd to host via the
free page vq, which acks to host the cmd id received. When the guest
finishes reporting free pages, a stop cmd is sent to host via the vq.

TODO:
- Add a batch page allocation API to amortize the allocation overhead.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
 drivers/virtio/virtio_balloon.c | 331 +---
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 307 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c6fd406..82cd497 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,14 @@
 #define DEFAULT_BALLOON_PAGES_TO_SHRINK 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG (__GFP_NORETRY | __GFP_NOWARN | \
+__GFP_NOMEMALLOC)
+/* The order of free page blocks to report to host */
+#define VIRTIO_BALLOON_FREE_PAGE_ORDER (MAX_ORDER - 1)
+/* The size of a free page block in bytes */
+#define VIRTIO_BALLOON_FREE_PAGE_SIZE \
+   (1 << (VIRTIO_BALLOON_FREE_PAGE_ORDER + PAGE_SHIFT))
+
 static unsigned long balloon_pages_to_shrink = DEFAULT_BALLOON_PAGES_TO_SHRINK;
 module_param(balloon_pages_to_shrink, ulong, 0600);
 MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on memory presure");
@@ -50,9 +58,22 @@ MODULE_PARM_DESC(balloon_pages_to_shrink, "pages to free on 
memory presure");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -62,6 +83,16 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The list of allocated free pages, waiting to be given back to mm */
+   struct list_head free_page_list;
+   spinlock_t free_page_list_lock;
+   /* The cmd id received from host */
+   u32 cmd_id_received;
+   /* The cmd id that is actively in use */
+   __virtio32 cmd_id_active;
+   /* Buffer to store the stop sign */
+   __virtio32 cmd_id_stop;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -325,17 +356,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -352,6 +372,52 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id,

[PATCH v35 3/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-07-10 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register.

As the first step here, virtio-balloon only reports free page hints from
the max order (i.e. 10) free page list to host. This has generated similar
good results as reporting all free page hints during our tests.

When the guest starts to report, it first sends a start cmd to host via
the free page vq, which acks to host the cmd id received, and tells it the
hint size (e.g. 4MB each on x86). When the guest finishes the reporting,
a stop cmd is sent to host via the vq.

TODO:
- support reporting free page hints from smaller order free page lists
  when there is a need/request from users.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 399 +---
 include/uapi/linux/virtio_balloon.h |  11 +
 2 files changed, 384 insertions(+), 26 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9356a1a..8754154 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -43,6 +43,14 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+/* The order used to allocate a buffer to load free page hints */
+#define VIRTIO_BALLOON_HINT_BUF_ORDER (MAX_ORDER - 1)
+/* The number of pages a hint buffer has */
+#define VIRTIO_BALLOON_HINT_BUF_PAGES (1 << VIRTIO_BALLOON_HINT_BUF_ORDER)
+/* The size of a hint buffer in bytes */
+#define VIRTIO_BALLOON_HINT_BUF_SIZE (VIRTIO_BALLOON_HINT_BUF_PAGES << \
+ PAGE_SHIFT)
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -51,9 +59,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +84,15 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* Command buffers to start and stop the reporting of hints to host */
+   struct virtio_balloon_free_page_hints_cmd cmd_start;
+   struct virtio_balloon_free_page_hints_cmd cmd_stop;
+
+   /* The cmd id received from host */
+   u32 cmd_id_received;
+   /* The cmd id that is actively in use */
+   u32 cmd_id_active;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -326,17 +356,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -353,6 +372,35 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_

Re: [virtio-dev] Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-27 Thread Michael S. Tsirkin
On Wed, Jun 27, 2018 at 01:27:55PM +0800, Wei Wang wrote:
> On 06/27/2018 11:58 AM, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:
> > > On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> > > > > On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > > > > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > > > > > 
> > > > > > > > > > > + if (!arrays)
> > > > > > > > > > > + return NULL;
> > > > > > > > > > > +
> > > > > > > > > > > + for (i = 0; i < max_array_num; i++) {
> > > > > > > > > > So we are getting a ton of memory here just to free it up a 
> > > > > > > > > > bit later.
> > > > > > > > > > Why doesn't get_from_free_page_list get the pages from free 
> > > > > > > > > > list for us?
> > > > > > > > > > We could also avoid the 1st allocation then - just build a 
> > > > > > > > > > list
> > > > > > > > > > of these.
> > > > > > > > > That wouldn't be a good choice for us. If we check how the 
> > > > > > > > > regular
> > > > > > > > > allocation works, there are many many things we need to 
> > > > > > > > > consider when pages
> > > > > > > > > are allocated to users.
> > > > > > > > > For example, we need to take care of the nr_free
> > > > > > > > > counter, we need to check the watermark and perform the 
> > > > > > > > > related actions.
> > > > > > > > > Also the folks working on arch_alloc_page to monitor page 
> > > > > > > > > allocation
> > > > > > > > > activities would get a surprise..if page allocation is 
> > > > > > > > > allowed to work in
> > > > > > > > > this way.
> > > > > > > > > 
> > > > > > > > mm/ code is well positioned to handle all this correctly.
> > > > > > > I'm afraid that would be a re-implementation of the alloc 
> > > > > > > functions,
> > > > > > A re-factoring - you can share code. The main difference is locking.
> > > > > > 
> > > > > > > and
> > > > > > > that would be much more complex than what we have. I think your 
> > > > > > > idea of
> > > > > > > passing a list of pages is better.
> > > > > > > 
> > > > > > > Best,
> > > > > > > Wei
> > > > > > How much memory is this allocating anyway?
> > > > > > 
> > > > > For every 2TB memory that the guest has, we allocate 4MB.
> > > > Hmm I guess I'm missing something, I don't see it:
> > > > 
> > > > 
> > > > +   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
> > > > +   entries_per_page = PAGE_SIZE / sizeof(__le64);
> > > > +   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
> > > > +   max_array_num = max_entries / entries_per_array +
> > > > +   !!(max_entries % entries_per_array);
> > > > 
> > > > Looks like you always allocate the max number?
> > > Yes. We allocated the max number and then free what's not used.
> > > For example, a 16TB guest, we allocate Four 4MB buffers and pass the 4
> > > buffers to get_from_free_page_list. If it uses 3, then the remaining 1 
> > > "4MB
> > > buffer" will end up being freed.
> > > 
> > > For today's guests, max_array_num is usually 1.
> > > 
> > > Best,
> > > Wei
> > I see, it's based on total ram pages. It's reasonable but might
> > get out of sync if memory is onlined quickly. So you want to
> > detect that there's more free memory than can fit and
> > retry the reporting.
> > 
> 
> 
> - AFAIK, memory hotplug isn't expected to happen during live migration
> today. Hypervisors (e.g. QEMU) explicitly forbid this.

That's a temporary limitation.

> - Allocating buffers based on total ram pages already gives some headroom
> for newly plugged memory if that could happen in any case. Also, we can
> think about why people plug in more memory - usually because the existing
> memory isn't enough, which implies that the free page list is very likely to
> be close to empty.

Or maybe because guest is expected to use more memory.

> - This method could be easily scaled if people really need more headroom for
> hot-plugged memory. For example, calculation based on "X * total_ram_pages",
> X could be a number passed from the hypervisor.

All this in place of a simple retry loop within guest?

> - This is an optimization feature, and reporting less free memory in that
> rare case doesn't hurt anything.

People working on memory hotplug can't be expected to worry about
balloon. And maintainers have other things to do than debug hard to
trigger failure reports from the field.

> 
> So I think it is good to start from a fundamental implementation, which
> doesn't confuse people, and complexities can be added when there is a real
> need in the future.
> 
> Best,
> Wei

The usefulness of the whole patchset hasn't been proven in the field yet.
The more uncovered corner cases there are, the higher the chance that
it will turn out not to be useful after all.

> 

Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/27/2018 11:58 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:

On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?


For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?

Yes. We allocated the max number and then free what's not used.
For example, a 16TB guest, we allocate Four 4MB buffers and pass the 4
buffers to get_from_free_page_list. If it uses 3, then the remaining 1 "4MB
buffer" will end up being freed.

For today's guests, max_array_num is usually 1.

Best,
Wei

I see, it's based on total ram pages. It's reasonable but might
get out of sync if memory is onlined quickly. So you want to
detect that there's more free memory than can fit and
retry the reporting.




- AFAIK, memory hotplug isn't expected to happen during live migration 
today. Hypervisors (e.g. QEMU) explicitly forbid this.


- Allocating buffers based on total ram pages already gives some 
headroom for newly plugged memory if that could happen in any case. 
Also, we can think about why people plug in more memory - usually 
because the existing memory isn't enough, which implies that the free 
page list is very likely to be close to empty.


- This method could be easily scaled if people really need more headroom 
for hot-plugged memory. For example, calculation based on "X * 
total_ram_pages", X could be a number passed from the hypervisor.


- This is an optimization feature, and reporting less free memory in 
that rare case doesn't hurt anything.


So I think it is good to start from a fundamental implementation, which 
doesn't confuse people, and complexities can be added when there is a 
real need in the future.


Best,
Wei


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:
> On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> > > On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > > > 
> > > > > > > > > + if (!arrays)
> > > > > > > > > + return NULL;
> > > > > > > > > +
> > > > > > > > > + for (i = 0; i < max_array_num; i++) {
> > > > > > > > So we are getting a ton of memory here just to free it up a bit 
> > > > > > > > later.
> > > > > > > > Why doesn't get_from_free_page_list get the pages from free 
> > > > > > > > list for us?
> > > > > > > > We could also avoid the 1st allocation then - just build a list
> > > > > > > > of these.
> > > > > > > That wouldn't be a good choice for us. If we check how the regular
> > > > > > > allocation works, there are many many things we need to consider 
> > > > > > > when pages
> > > > > > > are allocated to users.
> > > > > > > For example, we need to take care of the nr_free
> > > > > > > counter, we need to check the watermark and perform the related 
> > > > > > > actions.
> > > > > > > Also the folks working on arch_alloc_page to monitor page 
> > > > > > > allocation
> > > > > > > activities would get a surprise..if page allocation is allowed to 
> > > > > > > work in
> > > > > > > this way.
> > > > > > > 
> > > > > > mm/ code is well positioned to handle all this correctly.
> > > > > I'm afraid that would be a re-implementation of the alloc functions,
> > > > A re-factoring - you can share code. The main difference is locking.
> > > > 
> > > > > and
> > > > > that would be much more complex than what we have. I think your idea 
> > > > > of
> > > > > passing a list of pages is better.
> > > > > 
> > > > > Best,
> > > > > Wei
> > > > How much memory is this allocating anyway?
> > > > 
> > > For every 2TB memory that the guest has, we allocate 4MB.
> > Hmm I guess I'm missing something, I don't see it:
> > 
> > 
> > +   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
> > +   entries_per_page = PAGE_SIZE / sizeof(__le64);
> > +   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
> > +   max_array_num = max_entries / entries_per_array +
> > +   !!(max_entries % entries_per_array);
> > 
> > Looks like you always allocate the max number?
> 
> Yes. We allocated the max number and then free what's not used.
> For example, a 16TB guest, we allocate Four 4MB buffers and pass the 4
> buffers to get_from_free_page_list. If it uses 3, then the remaining 1 "4MB
> buffer" will end up being freed.
> 
> For today's guests, max_array_num is usually 1.
> 
> Best,
> Wei

I see, it's based on total ram pages. It's reasonable but might
get out of sync if memory is onlined quickly. So you want to
detect that there's more free memory than can fit and
retry the reporting.

> 
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?


For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?


Yes. We allocated the max number and then free what's not used.
For example, a 16TB guest, we allocate Four 4MB buffers and pass the 4 
buffers to get_from_free_page_list. If it uses 3, then the remaining 1 
"4MB buffer" will end up being freed.


For today's guests, max_array_num is usually 1.

Best,
Wei





___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > 
> > > > > > > + if (!arrays)
> > > > > > > + return NULL;
> > > > > > > +
> > > > > > > + for (i = 0; i < max_array_num; i++) {
> > > > > > So we are getting a ton of memory here just to free it up a bit 
> > > > > > later.
> > > > > > Why doesn't get_from_free_page_list get the pages from free list 
> > > > > > for us?
> > > > > > We could also avoid the 1st allocation then - just build a list
> > > > > > of these.
> > > > > That wouldn't be a good choice for us. If we check how the regular
> > > > > allocation works, there are many many things we need to consider when 
> > > > > pages
> > > > > are allocated to users.
> > > > > For example, we need to take care of the nr_free
> > > > > counter, we need to check the watermark and perform the related 
> > > > > actions.
> > > > > Also the folks working on arch_alloc_page to monitor page allocation
> > > > > activities would get a surprise..if page allocation is allowed to 
> > > > > work in
> > > > > this way.
> > > > > 
> > > > mm/ code is well positioned to handle all this correctly.
> > > I'm afraid that would be a re-implementation of the alloc functions,
> > A re-factoring - you can share code. The main difference is locking.
> > 
> > > and
> > > that would be much more complex than what we have. I think your idea of
> > > passing a list of pages is better.
> > > 
> > > Best,
> > > Wei
> > How much memory is this allocating anyway?
> > 
> 
> For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?


> This is the same
> for both cases.
> For today's guests, usually there will be only one 4MB allocated and passed
> to get_from_free_page_list.
> 
> Best,
> Wei
> 
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?



For every 2TB memory that the guest has, we allocate 4MB. This is the 
same for both cases.
For today's guests, usually there will be only one 4MB allocated and 
passed to get_from_free_page_list.


Best,
Wei



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > 
> 
> > > 
> > > > 
> > > > > + if (!arrays)
> > > > > + return NULL;
> > > > > +
> > > > > + for (i = 0; i < max_array_num; i++) {
> > > > So we are getting a ton of memory here just to free it up a bit later.
> > > > Why doesn't get_from_free_page_list get the pages from free list for us?
> > > > We could also avoid the 1st allocation then - just build a list
> > > > of these.
> > > That wouldn't be a good choice for us. If we check how the regular
> > > allocation works, there are many many things we need to consider when 
> > > pages
> > > are allocated to users.
> > > For example, we need to take care of the nr_free
> > > counter, we need to check the watermark and perform the related actions.
> > > Also the folks working on arch_alloc_page to monitor page allocation
> > > activities would get a surprise..if page allocation is allowed to work in
> > > this way.
> > > 
> > mm/ code is well positioned to handle all this correctly.
> 
> I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.

> and
> that would be much more complex than what we have. I think your idea of
> passing a list of pages is better.
> 
> Best,
> Wei

How much memory is this allocating anyway?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:








+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.


I'm afraid that would be a re-implementation of the alloc functions, and 
that would be much more complex than what we have. I think your idea of 
passing a list of pages is better.


Best,
Wei






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-25 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> On 06/26/2018 09:37 AM, Michael S. Tsirkin wrote:
> > On Mon, Jun 25, 2018 at 08:05:10PM +0800, Wei Wang wrote:
> > 
> > > @@ -326,17 +353,6 @@ static void stats_handle_request(struct 
> > > virtio_balloon *vb)
> > >   virtqueue_kick(vq);
> > >   }
> > > -static void virtballoon_changed(struct virtio_device *vdev)
> > > -{
> > > - struct virtio_balloon *vb = vdev->priv;
> > > - unsigned long flags;
> > > -
> > > - spin_lock_irqsave(>stop_update_lock, flags);
> > > - if (!vb->stop_update)
> > > - queue_work(system_freezable_wq, >update_balloon_size_work);
> > > - spin_unlock_irqrestore(>stop_update_lock, flags);
> > > -}
> > > -
> > >   static inline s64 towards_target(struct virtio_balloon *vb)
> > >   {
> > >   s64 target;
> > > @@ -353,6 +369,35 @@ static inline s64 towards_target(struct 
> > > virtio_balloon *vb)
> > >   return target - vb->num_pages;
> > >   }
> > > +static void virtballoon_changed(struct virtio_device *vdev)
> > > +{
> > > + struct virtio_balloon *vb = vdev->priv;
> > > + unsigned long flags;
> > > + s64 diff = towards_target(vb);
> > > +
> > > + if (diff) {
> > > + spin_lock_irqsave(>stop_update_lock, flags);
> > > + if (!vb->stop_update)
> > > + queue_work(system_freezable_wq,
> > > +>update_balloon_size_work);
> > > + spin_unlock_irqrestore(>stop_update_lock, flags);
> > > + }
> > > +
> > > + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > > + virtio_cread(vdev, struct virtio_balloon_config,
> > > +  free_page_report_cmd_id, >cmd_id_received);
> > > + if (vb->cmd_id_received !=
> > > + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID &&
> > > + vb->cmd_id_received != vb->cmd_id_active) {
> > > + spin_lock_irqsave(>stop_update_lock, flags);
> > > + if (!vb->stop_update)
> > > + queue_work(vb->balloon_wq,
> > > +>report_free_page_work);
> > > + spin_unlock_irqrestore(>stop_update_lock, flags);
> > > + }
> > > + }
> > > +}
> > > +
> > >   static void update_balloon_size(struct virtio_balloon *vb)
> > >   {
> > >   u32 actual = vb->num_pages;
> > > @@ -425,44 +470,253 @@ static void update_balloon_size_func(struct 
> > > work_struct *work)
> > >   queue_work(system_freezable_wq, work);
> > >   }
> > > +static void free_page_vq_cb(struct virtqueue *vq)
> > > +{
> > > + unsigned int len;
> > > + void *buf;
> > > + struct virtio_balloon *vb = vq->vdev->priv;
> > > +
> > > + while (1) {
> > > + buf = virtqueue_get_buf(vq, );
> > > +
> > > + if (!buf || buf == >cmd_start || buf == >cmd_stop)
> > > + break;
> > If there's any buffer after this one we might never get another
> > callback.
> 
> I think every used buffer can get the callback, because host takes from the
> arrays one by one, and puts back each with a vq notify.

It's probabky racy even in this case. Besides, host is free to do it in
any way that's legal in spec.

> 
> 
> > > + free_pages((unsigned long)buf, ARRAY_ALLOC_ORDER);
> > > + }
> > > +}
> > > +
> > >   static int init_vqs(struct virtio_balloon *vb)
> > >   {
> > > - struct virtqueue *vqs[3];
> > > - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
> > > };
> > > - static const char * const names[] = { "inflate", "deflate", "stats" };
> > > - int err, nvqs;
> > > + struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
> > > + vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
> > > + const char *names[VIRTIO_BALLOON_VQ_MAX];
> > > + struct scatterlist sg;
> > > + int ret;
> > >   /*
> > > -  * We expect two virtqueues: inflate and deflate, and
> > > -  * optionally stat.
> > > +  * Inflateq and deflateq are used unconditionally. The names[]
> > > +  * will be NULL if the related feature is 

Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-25 Thread Wei Wang

On 06/26/2018 09:37 AM, Michael S. Tsirkin wrote:

On Mon, Jun 25, 2018 at 08:05:10PM +0800, Wei Wang wrote:


@@ -326,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
  }
  
-static void virtballoon_changed(struct virtio_device *vdev)

-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
  static inline s64 towards_target(struct virtio_balloon *vb)
  {
s64 target;
@@ -353,6 +369,35 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
  }
  
+static void virtballoon_changed(struct virtio_device *vdev)

+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID &&
+   vb->cmd_id_received != vb->cmd_id_active) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
  static void update_balloon_size(struct virtio_balloon *vb)
  {
u32 actual = vb->num_pages;
@@ -425,44 +470,253 @@ static void update_balloon_size_func(struct work_struct 
*work)
queue_work(system_freezable_wq, work);
  }
  
+static void free_page_vq_cb(struct virtqueue *vq)

+{
+   unsigned int len;
+   void *buf;
+   struct virtio_balloon *vb = vq->vdev->priv;
+
+   while (1) {
+   buf = virtqueue_get_buf(vq, );
+
+   if (!buf || buf == >cmd_start || buf == >cmd_stop)
+   break;

If there's any buffer after this one we might never get another
callback.


I think every used buffer can get the callback, because host takes from 
the arrays one by one, and puts back each with a vq notify.





+   free_pages((unsigned long)buf, ARRAY_ALLOC_ORDER);
+   }
+}
+
  static int init_vqs(struct virtio_balloon *vb)
  {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+   vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+   const char *names[VIRTIO_BALLOON_VQ_MAX];
+   struct scatterlist sg;
+   int ret;
  
  	/*

-* We expect two virtqueues: inflate and deflate, and
-* optionally stat.
+* Inflateq and deflateq are used unconditionally. The names[]
+* will be NULL if the related feature is not enabled, which will
+* cause no allocation for the corresponding virtqueue in find_vqs.
 */
-   nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
-   err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
-   if (err)
-   return err;
+   callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
+   names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
+   callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
+   names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
+   names[VIRTIO_BALLOON_VQ_STATS] = NULL;
+   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
  
-	vb->inflate_vq = vqs[0];

-   vb->deflate_vq = vqs[1];
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
-   struct scatterlist sg;
-   unsigned int num_stats;
-   vb->stats_vq = vqs[2];
+   names[VIRTIO_BALLOON_VQ_STATS] = "stats";
+   callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
+   }
  
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {

+   names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
+   callbacks[VIRTIO_BALLOON_

Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-25 Thread Michael S. Tsirkin
On Mon, Jun 25, 2018 at 08:05:10PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd id
> to the guest via the free_page_report_cmd_id configuration register.
> 
> As the first step here, virtio-balloon only reports free page hints from
> the max order (i.e. 10) free page list to host. This has generated similar
> good results as reporting all free page hints during our tests.
> 
> When the guest starts to report, it first sends a start cmd to host via
> the free page vq, which acks to host the cmd id received, and tells it the
> hint size (e.g. 4MB each on x86). When the guest finishes the reporting,
> a stop cmd is sent to host via the vq.
> 
> TODO:
> - support reporting free page hints from smaller order free page lists
>   when there is a need/request from users.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> ---
>  drivers/virtio/virtio_balloon.c | 347 
> 
>  include/uapi/linux/virtio_balloon.h |  11 ++
>  2 files changed, 322 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 6b237e3..d05f0ba 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -43,6 +43,11 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +/* The order used to allocate an array to load free page hints */
> +#define ARRAY_ALLOC_ORDER (MAX_ORDER - 1)
> +/* The size of an array in bytes */
> +#define ARRAY_ALLOC_SIZE ((1 << ARRAY_ALLOC_ORDER) << PAGE_SHIFT)
> +


Pls prefix macros so we can figure out they are local ones.

>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> @@ -51,9 +56,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +81,15 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* Command buffers to start and stop the reporting of hints to host */
> + struct virtio_balloon_free_page_hints_cmd cmd_start;
> + struct virtio_balloon_free_page_hints_cmd cmd_stop;
> +
> + /* The cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is actively in use */
> + uint32_t cmd_id_active;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  

You want u32 types.

> @@ -326,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, >update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -353,6 +369,35 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop

[PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-25 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd id
to the guest via the free_page_report_cmd_id configuration register.

As the first step here, virtio-balloon only reports free page hints from
the max order (i.e. 10) free page list to host. This has generated similar
good results as reporting all free page hints during our tests.

When the guest starts to report, it first sends a start cmd to host via
the free page vq, which acks to host the cmd id received, and tells it the
hint size (e.g. 4MB each on x86). When the guest finishes the reporting,
a stop cmd is sent to host via the vq.

TODO:
- support reporting free page hints from smaller order free page lists
  when there is a need/request from users.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 347 
 include/uapi/linux/virtio_balloon.h |  11 ++
 2 files changed, 322 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..d05f0ba 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -43,6 +43,11 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+/* The order used to allocate an array to load free page hints */
+#define ARRAY_ALLOC_ORDER (MAX_ORDER - 1)
+/* The size of an array in bytes */
+#define ARRAY_ALLOC_SIZE ((1 << ARRAY_ALLOC_ORDER) << PAGE_SHIFT)
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -51,9 +56,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +81,15 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* Command buffers to start and stop the reporting of hints to host */
+   struct virtio_balloon_free_page_hints_cmd cmd_start;
+   struct virtio_balloon_free_page_hints_cmd cmd_stop;
+
+   /* The cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is actively in use */
+   uint32_t cmd_id_active;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -326,17 +353,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -353,6 +369,35 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID &&
+   vb->cmd_id_received != vb->cmd_id_active) {
+   spin

Re: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-20 Thread Michael S. Tsirkin
On Wed, Jun 20, 2018 at 09:11:39AM +, Wang, Wei W wrote:
> On Tuesday, June 19, 2018 10:43 PM, Michael S. Tsirk wrote:
> > On Tue, Jun 19, 2018 at 08:13:37PM +0800, Wei Wang wrote:
> > > On 06/19/2018 11:05 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 19, 2018 at 01:06:48AM +, Wang, Wei W wrote:
> > > > > On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > > > > > On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> > > > > > > Not necessarily, I think. We have min(4m_page_blocks / 512,
> > > > > > > 1024) above,
> > > > > > so the maximum memory that can be reported is 2TB. For larger
> > guests, e.g.
> > > > > > 4TB, the optimization can still offer 2TB free memory (better
> > > > > > than no optimization).
> > > > > >
> > > > > > Maybe it's better, maybe it isn't. It certainly muddies the waters 
> > > > > > even
> > more.
> > > > > > I'd rather we had a better plan. From that POV I like what
> > > > > > Matthew Wilcox suggested for this which is to steal the necessary # 
> > > > > > of
> > entries off the list.
> > > > > Actually what Matthew suggested doesn't make a difference here.
> > > > > That method always steal the first free page blocks, and sure can
> > > > > be changed to take more. But all these can be achieved via kmalloc
> > > > I'd do get_user_pages really. You don't want pages split, etc.
> > 
> > Oops sorry. I meant get_free_pages .
> 
> Yes, we can use __get_free_pages, and the max allocation is MAX_ORDER - 1, 
> which can report up to 2TB free memory. 
> 
> "getting two pages isn't harder", do you mean passing two arrays (two 
> allocations by get_free_pages(,MAX_ORDER -1)) to the mm API?

Yes, or generally a list of pages with as many as needed.


> Please see if the following logic aligns to what you think:
> 
> uint32_t i, max_hints, hints_per_page, hints_per_array, total_arrays;
> unsigned long *arrays;
>  
>  /*
>  * Each array size is MAX_ORDER_NR_PAGES. If one array is not enough 
> to
>  * store all the hints, we need to allocate multiple arrays.
>  * max_hints: the max number of 4MB free page blocks
>  * hints_per_page: the number of hints each page can store
>  * hints_per_array: the number of hints an array can store
>  * total_arrays: the number of arrays we need
>  */
> max_hints = totalram_pages / MAX_ORDER_NR_PAGES;
> hints_per_page = PAGE_SIZE / sizeof(__le64);
> hints_per_array = hints_per_page * MAX_ORDER_NR_PAGES;
> total_arrays = max_hints /  hints_per_array +
>!!(max_hints % hints_per_array);
> arrays = kmalloc(total_arrays * sizeof(unsigned long), GFP_KERNEL);
> for (i = 0; i < total_arrays; i++) {
> arrays[i] = __get_free_pages(__GFP_ATOMIC | __GFP_NOMEMALLOC, 
> MAX_ORDER - 1);
> 
>   if (!arrays[i])
>   goto out;
> }
> 
> 
> - the mm API needs to be changed to support storing hints to multiple 
> separated arrays offered by the caller.
> 
> Best,
> Wei

Yes. And add an API to just count entries so we know how many arrays to 
allocate.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-20 Thread Wang, Wei W
On Tuesday, June 19, 2018 10:43 PM, Michael S. Tsirk wrote:
> On Tue, Jun 19, 2018 at 08:13:37PM +0800, Wei Wang wrote:
> > On 06/19/2018 11:05 AM, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 01:06:48AM +, Wang, Wei W wrote:
> > > > On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > > > > On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> > > > > > Not necessarily, I think. We have min(4m_page_blocks / 512,
> > > > > > 1024) above,
> > > > > so the maximum memory that can be reported is 2TB. For larger
> guests, e.g.
> > > > > 4TB, the optimization can still offer 2TB free memory (better
> > > > > than no optimization).
> > > > >
> > > > > Maybe it's better, maybe it isn't. It certainly muddies the waters 
> > > > > even
> more.
> > > > > I'd rather we had a better plan. From that POV I like what
> > > > > Matthew Wilcox suggested for this which is to steal the necessary # of
> entries off the list.
> > > > Actually what Matthew suggested doesn't make a difference here.
> > > > That method always steal the first free page blocks, and sure can
> > > > be changed to take more. But all these can be achieved via kmalloc
> > > I'd do get_user_pages really. You don't want pages split, etc.
> 
> Oops sorry. I meant get_free_pages .

Yes, we can use __get_free_pages, and the max allocation is MAX_ORDER - 1, 
which can report up to 2TB free memory. 

"getting two pages isn't harder", do you mean passing two arrays (two 
allocations by get_free_pages(,MAX_ORDER -1)) to the mm API?

Please see if the following logic aligns to what you think:

uint32_t i, max_hints, hints_per_page, hints_per_array, total_arrays;
unsigned long *arrays;
 
 /*
 * Each array size is MAX_ORDER_NR_PAGES. If one array is not enough to
 * store all the hints, we need to allocate multiple arrays.
 * max_hints: the max number of 4MB free page blocks
 * hints_per_page: the number of hints each page can store
 * hints_per_array: the number of hints an array can store
 * total_arrays: the number of arrays we need
 */
max_hints = totalram_pages / MAX_ORDER_NR_PAGES;
hints_per_page = PAGE_SIZE / sizeof(__le64);
hints_per_array = hints_per_page * MAX_ORDER_NR_PAGES;
total_arrays = max_hints /  hints_per_array +
   !!(max_hints % hints_per_array);
arrays = kmalloc(total_arrays * sizeof(unsigned long), GFP_KERNEL);
for (i = 0; i < total_arrays; i++) {
arrays[i] = __get_free_pages(__GFP_ATOMIC | __GFP_NOMEMALLOC, 
MAX_ORDER - 1);

  if (!arrays[i])
  goto out;
}


- the mm API needs to be changed to support storing hints to multiple separated 
arrays offered by the caller.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-19 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 08:13:37PM +0800, Wei Wang wrote:
> On 06/19/2018 11:05 AM, Michael S. Tsirkin wrote:
> > On Tue, Jun 19, 2018 at 01:06:48AM +, Wang, Wei W wrote:
> > > On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > > > On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> > > > > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) 
> > > > > above,
> > > > so the maximum memory that can be reported is 2TB. For larger guests, 
> > > > e.g.
> > > > 4TB, the optimization can still offer 2TB free memory (better than no
> > > > optimization).
> > > > 
> > > > Maybe it's better, maybe it isn't. It certainly muddies the waters even 
> > > > more.
> > > > I'd rather we had a better plan. From that POV I like what Matthew 
> > > > Wilcox
> > > > suggested for this which is to steal the necessary # of entries off the 
> > > > list.
> > > Actually what Matthew suggested doesn't make a difference here. That 
> > > method always steal the first free page blocks, and sure can be changed 
> > > to take more. But all these can be achieved via kmalloc
> > I'd do get_user_pages really. You don't want pages split, etc.

Oops sorry. I meant get_free_pages .

> 
> > > by the caller which is more prudent and makes the code more 
> > > straightforward. I think we don't need to take that risk unless the MM 
> > > folks strongly endorse that approach.
> > > 
> > > The max size of the kmalloc-ed memory is 4MB, which gives us the 
> > > limitation that the max free memory to report is 2TB. Back to the 
> > > motivation of this work, the cloud guys want to use this optimization to 
> > > accelerate their guest live migration. 2TB guests are not common in 
> > > today's clouds. When huge guests become common in the future, we can 
> > > easily tweak this API to fill hints into scattered buffer (e.g. several 
> > > 4MB arrays passed to this API) instead of one as in this version.
> > > 
> > > This limitation doesn't cause any issue from functionality perspective. 
> > > For the extreme case like a 100TB guest live migration which is 
> > > theoretically possible today, this optimization helps skip 2TB of its 
> > > free memory. This result is that it may reduce only 2% live migration 
> > > time, but still better than not skipping the 2TB (if not using the 
> > > feature).
> > Not clearly better, no, since you are slowing the guest.
> 
> Not really. Live migration slows down the guest itself. It seems that the
> guest spends a little extra time reporting free pages, but in return the
> live migration time gets reduced a lot, which makes the guest endure less
> from live migration. (there is no drop of the workload performance when
> using the optimization in the tests)

My point was you can't say what is better without measuring.
Without special limitations you have hint overhead vs migration
overhead. I think we need to  build to scale to huge guests.
We might discover scalability problems down the road,
but no sense in building in limitations straight away.

> 
> 
> > 
> > 
> > > So, for the first release of this feature, I think it is better to have 
> > > the simpler and more straightforward solution as we have now, and clearly 
> > > document why it can report up to 2TB free memory.
> > No one has the time to read documentation about how an internal flag
> > within a device works. Come on, getting two pages isn't much harder
> > than a single one.
> 
> > > > If that doesn't fly, we can allocate out of the loop and just retry 
> > > > with more
> > > > pages.
> > > > 
> > > > > On the other hand, large guests being large mostly because the guests 
> > > > > need
> > > > to use large memory. In that case, they usually won't have that much 
> > > > free
> > > > memory to report.
> > > > 
> > > > And following this logic small guests don't have a lot of memory to 
> > > > report at
> > > > all.
> > > > Could you remind me why are we considering this optimization then?
> > > If there is a 3TB guest, it is 3TB not 2TB mostly because it would need 
> > > to use e.g. 2.5TB memory from time to time. In the worst case, it only 
> > > has 0.5TB free memory to report, but reporting 0.5TB with this 
> > > optimization is better than no optimization. (and the current 2TB 
> > > limitation isn't a limitation for the 3TB guest in this case)
> > I'd rather not spend time writing up random limitations.
> 
> This is not a random limitation. It would be more clear to see the code.

Users don't see code though, that's the point.

Exporting internal limitations from code to users isn't great.


> Also I'm not sure how get_user_pages could be used in our case, and what you
> meant by "getting two pages". I'll post out a new version, and we can
> discuss on the code.

Sorry, I meant get_free_pages.

> 
> Best,
> Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-19 Thread Wei Wang

On 06/19/2018 11:05 AM, Michael S. Tsirkin wrote:

On Tue, Jun 19, 2018 at 01:06:48AM +, Wang, Wei W wrote:

On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:

On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:

Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,

so the maximum memory that can be reported is 2TB. For larger guests, e.g.
4TB, the optimization can still offer 2TB free memory (better than no
optimization).

Maybe it's better, maybe it isn't. It certainly muddies the waters even more.
I'd rather we had a better plan. From that POV I like what Matthew Wilcox
suggested for this which is to steal the necessary # of entries off the list.

Actually what Matthew suggested doesn't make a difference here. That method 
always steal the first free page blocks, and sure can be changed to take more. 
But all these can be achieved via kmalloc

I'd do get_user_pages really. You don't want pages split, etc.



by the caller which is more prudent and makes the code more straightforward. I 
think we don't need to take that risk unless the MM folks strongly endorse that 
approach.

The max size of the kmalloc-ed memory is 4MB, which gives us the limitation 
that the max free memory to report is 2TB. Back to the motivation of this work, 
the cloud guys want to use this optimization to accelerate their guest live 
migration. 2TB guests are not common in today's clouds. When huge guests become 
common in the future, we can easily tweak this API to fill hints into scattered 
buffer (e.g. several 4MB arrays passed to this API) instead of one as in this 
version.

This limitation doesn't cause any issue from functionality perspective. For the 
extreme case like a 100TB guest live migration which is theoretically possible 
today, this optimization helps skip 2TB of its free memory. This result is that 
it may reduce only 2% live migration time, but still better than not skipping 
the 2TB (if not using the feature).

Not clearly better, no, since you are slowing the guest.


Not really. Live migration slows down the guest itself. It seems that 
the guest spends a little extra time reporting free pages, but in return 
the live migration time gets reduced a lot, which makes the guest endure 
less from live migration. (there is no drop of the workload performance 
when using the optimization in the tests)








So, for the first release of this feature, I think it is better to have the 
simpler and more straightforward solution as we have now, and clearly document 
why it can report up to 2TB free memory.

No one has the time to read documentation about how an internal flag
within a device works. Come on, getting two pages isn't much harder
than a single one.


  

If that doesn't fly, we can allocate out of the loop and just retry with more
pages.


On the other hand, large guests being large mostly because the guests need

to use large memory. In that case, they usually won't have that much free
memory to report.

And following this logic small guests don't have a lot of memory to report at
all.
Could you remind me why are we considering this optimization then?

If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to use 
e.g. 2.5TB memory from time to time. In the worst case, it only has 0.5TB free 
memory to report, but reporting 0.5TB with this optimization is better than no 
optimization. (and the current 2TB limitation isn't a limitation for the 3TB 
guest in this case)

I'd rather not spend time writing up random limitations.


This is not a random limitation. It would be more clear to see the code. 
Also I'm not sure how get_user_pages could be used in our case, and what 
you meant by "getting two pages". I'll post out a new version, and we 
can discuss on the code.



Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-18 Thread Michael S. Tsirkin
On Tue, Jun 19, 2018 at 01:06:48AM +, Wang, Wei W wrote:
> On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> > On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> > > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,
> > so the maximum memory that can be reported is 2TB. For larger guests, e.g.
> > 4TB, the optimization can still offer 2TB free memory (better than no
> > optimization).
> > 
> > Maybe it's better, maybe it isn't. It certainly muddies the waters even 
> > more.
> > I'd rather we had a better plan. From that POV I like what Matthew Wilcox
> > suggested for this which is to steal the necessary # of entries off the 
> > list.
> 
> Actually what Matthew suggested doesn't make a difference here. That method 
> always steal the first free page blocks, and sure can be changed to take 
> more. But all these can be achieved via kmalloc

I'd do get_user_pages really. You don't want pages split, etc.

> by the caller which is more prudent and makes the code more straightforward. 
> I think we don't need to take that risk unless the MM folks strongly endorse 
> that approach.
> 
> The max size of the kmalloc-ed memory is 4MB, which gives us the limitation 
> that the max free memory to report is 2TB. Back to the motivation of this 
> work, the cloud guys want to use this optimization to accelerate their guest 
> live migration. 2TB guests are not common in today's clouds. When huge guests 
> become common in the future, we can easily tweak this API to fill hints into 
> scattered buffer (e.g. several 4MB arrays passed to this API) instead of one 
> as in this version.
> 
> This limitation doesn't cause any issue from functionality perspective. For 
> the extreme case like a 100TB guest live migration which is theoretically 
> possible today, this optimization helps skip 2TB of its free memory. This 
> result is that it may reduce only 2% live migration time, but still better 
> than not skipping the 2TB (if not using the feature).

Not clearly better, no, since you are slowing the guest.


> So, for the first release of this feature, I think it is better to have the 
> simpler and more straightforward solution as we have now, and clearly 
> document why it can report up to 2TB free memory.

No one has the time to read documentation about how an internal flag
within a device works. Come on, getting two pages isn't much harder
than a single one.

> 
>  
> > If that doesn't fly, we can allocate out of the loop and just retry with 
> > more
> > pages.
> > 
> > > On the other hand, large guests being large mostly because the guests need
> > to use large memory. In that case, they usually won't have that much free
> > memory to report.
> > 
> > And following this logic small guests don't have a lot of memory to report 
> > at
> > all.
> > Could you remind me why are we considering this optimization then?
> 
> If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to 
> use e.g. 2.5TB memory from time to time. In the worst case, it only has 0.5TB 
> free memory to report, but reporting 0.5TB with this optimization is better 
> than no optimization. (and the current 2TB limitation isn't a limitation for 
> the 3TB guest in this case)

I'd rather not spend time writing up random limitations.


> Best,
> Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [virtio-dev] Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-18 Thread Wang, Wei W
On Monday, June 18, 2018 10:29 AM, Michael S. Tsirkin wrote:
> On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> > Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above,
> so the maximum memory that can be reported is 2TB. For larger guests, e.g.
> 4TB, the optimization can still offer 2TB free memory (better than no
> optimization).
> 
> Maybe it's better, maybe it isn't. It certainly muddies the waters even more.
> I'd rather we had a better plan. From that POV I like what Matthew Wilcox
> suggested for this which is to steal the necessary # of entries off the list.

Actually what Matthew suggested doesn't make a difference here. That method 
always steal the first free page blocks, and sure can be changed to take more. 
But all these can be achieved via kmalloc by the caller which is more prudent 
and makes the code more straightforward. I think we don't need to take that 
risk unless the MM folks strongly endorse that approach.

The max size of the kmalloc-ed memory is 4MB, which gives us the limitation 
that the max free memory to report is 2TB. Back to the motivation of this work, 
the cloud guys want to use this optimization to accelerate their guest live 
migration. 2TB guests are not common in today's clouds. When huge guests become 
common in the future, we can easily tweak this API to fill hints into scattered 
buffer (e.g. several 4MB arrays passed to this API) instead of one as in this 
version.

This limitation doesn't cause any issue from functionality perspective. For the 
extreme case like a 100TB guest live migration which is theoretically possible 
today, this optimization helps skip 2TB of its free memory. This result is that 
it may reduce only 2% live migration time, but still better than not skipping 
the 2TB (if not using the feature).

So, for the first release of this feature, I think it is better to have the 
simpler and more straightforward solution as we have now, and clearly document 
why it can report up to 2TB free memory.


 
> If that doesn't fly, we can allocate out of the loop and just retry with more
> pages.
> 
> > On the other hand, large guests being large mostly because the guests need
> to use large memory. In that case, they usually won't have that much free
> memory to report.
> 
> And following this logic small guests don't have a lot of memory to report at
> all.
> Could you remind me why are we considering this optimization then?

If there is a 3TB guest, it is 3TB not 2TB mostly because it would need to use 
e.g. 2.5TB memory from time to time. In the worst case, it only has 0.5TB free 
memory to report, but reporting 0.5TB with this optimization is better than no 
optimization. (and the current 2TB limitation isn't a limitation for the 3TB 
guest in this case)

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-17 Thread Michael S. Tsirkin
On Sat, Jun 16, 2018 at 01:09:44AM +, Wang, Wei W wrote:
> On Friday, June 15, 2018 10:29 PM, Michael S. Tsirkin wrote:
> > On Fri, Jun 15, 2018 at 02:11:23PM +, Wang, Wei W wrote:
> > > On Friday, June 15, 2018 7:42 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> > > > > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature
> > > > > indicates the support of reporting hints of guest free pages to host 
> > > > > via
> > virtio-balloon.
> > > > >
> > > > > Host requests the guest to report free page hints by sending a
> > > > > command to the guest via setting the
> > > > VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT
> > > > > bit of the host_cmd config register.
> > > > >
> > > > > As the first step here, virtio-balloon only reports free page
> > > > > hints from the max order (10) free page list to host. This has
> > > > > generated similar good results as reporting all free page hints during
> > our tests.
> > > > >
> > > > > TODO:
> > > > > - support reporting free page hints from smaller order free page lists
> > > > >   when there is a need/request from users.
> > > > >
> > > > > Signed-off-by: Wei Wang 
> > > > > Signed-off-by: Liang Li 
> > > > > Cc: Michael S. Tsirkin 
> > > > > Cc: Michal Hocko 
> > > > > Cc: Andrew Morton 
> > > > > ---
> > > > >  drivers/virtio/virtio_balloon.c | 187
> > +--
> > > > -
> > > > >  include/uapi/linux/virtio_balloon.h |  13 +++
> > > > >  2 files changed, 163 insertions(+), 37 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_balloon.c
> > > > > b/drivers/virtio/virtio_balloon.c index 6b237e3..582a03b 100644
> > > > > --- a/drivers/virtio/virtio_balloon.c
> > > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > > @@ -43,6 +43,9 @@
> > > > >  #define OOM_VBALLOON_DEFAULT_PAGES 256  #define
> > > > > VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> > > > >
> > > > > +/* The size of memory in bytes allocated for reporting free page
> > > > > +hints */ #define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> > > > > +
> > > > >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > > > > module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > > > > MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > > >
> > > > Doesn't this limit memory size of the guest we can report?
> > > > Apparently to several gigabytes ...
> > > > OTOH huge guests with lots of free memory is exactly where we would
> > > > gain the most ...
> > >
> > > Yes, the 16-page array can report up to 32GB (each page can hold 512
> > addresses of 4MB free page blocks, i.e. 2GB free memory per page) free
> > memory to host. It is not flexible.
> > >
> > > How about allocating the buffer according to the guest memory size
> > > (proportional)? That is,
> > >
> > > /* Calculates the maximum number of 4MB (equals to 1024 pages) free
> > > pages blocks that the system can have */ 4m_page_blocks =
> > > totalram_pages / 1024;
> > >
> > > /* Allocating one page can hold 512 free page blocks, so calculates
> > > the number of pages that can hold those 4MB blocks. And this
> > > allocation should not exceed 1024 pages */ pages_to_allocate =
> > > min(4m_page_blocks / 512, 1024);
> > >
> > > For a 2TB guests, which has 2^19 page blocks (4MB each), we will allocate
> > 1024 pages as the buffer.
> > >
> > > When the guest has large memory, it should be easier to succeed in
> > allocation of large buffer. If that allocation fails, that implies that 
> > nothing
> > would be got from the 4MB free page list.
> > >
> > > I think the proportional allocation is simpler compared to other
> > > approaches like
> > > - scattered buffer, which will complicate the get_from_free_page_list
> > > implementation;
> > > - one buffer to call get_from_free_page_list multiple times, which needs
> > get_from_free_page_list to maintain states.. also too complicated.
> > >
> > > Best,
> > > Wei
> > >
> > 
> > That's 

RE: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-15 Thread Wang, Wei W
On Friday, June 15, 2018 10:29 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 15, 2018 at 02:11:23PM +, Wang, Wei W wrote:
> > On Friday, June 15, 2018 7:42 PM, Michael S. Tsirkin wrote:
> > > On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> > > > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature
> > > > indicates the support of reporting hints of guest free pages to host via
> virtio-balloon.
> > > >
> > > > Host requests the guest to report free page hints by sending a
> > > > command to the guest via setting the
> > > VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT
> > > > bit of the host_cmd config register.
> > > >
> > > > As the first step here, virtio-balloon only reports free page
> > > > hints from the max order (10) free page list to host. This has
> > > > generated similar good results as reporting all free page hints during
> our tests.
> > > >
> > > > TODO:
> > > > - support reporting free page hints from smaller order free page lists
> > > >   when there is a need/request from users.
> > > >
> > > > Signed-off-by: Wei Wang 
> > > > Signed-off-by: Liang Li 
> > > > Cc: Michael S. Tsirkin 
> > > > Cc: Michal Hocko 
> > > > Cc: Andrew Morton 
> > > > ---
> > > >  drivers/virtio/virtio_balloon.c | 187
> +--
> > > -
> > > >  include/uapi/linux/virtio_balloon.h |  13 +++
> > > >  2 files changed, 163 insertions(+), 37 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c
> > > > b/drivers/virtio/virtio_balloon.c index 6b237e3..582a03b 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -43,6 +43,9 @@
> > > >  #define OOM_VBALLOON_DEFAULT_PAGES 256  #define
> > > > VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> > > >
> > > > +/* The size of memory in bytes allocated for reporting free page
> > > > +hints */ #define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> > > > +
> > > >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > > > module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > > > MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > >
> > > Doesn't this limit memory size of the guest we can report?
> > > Apparently to several gigabytes ...
> > > OTOH huge guests with lots of free memory is exactly where we would
> > > gain the most ...
> >
> > Yes, the 16-page array can report up to 32GB (each page can hold 512
> addresses of 4MB free page blocks, i.e. 2GB free memory per page) free
> memory to host. It is not flexible.
> >
> > How about allocating the buffer according to the guest memory size
> > (proportional)? That is,
> >
> > /* Calculates the maximum number of 4MB (equals to 1024 pages) free
> > pages blocks that the system can have */ 4m_page_blocks =
> > totalram_pages / 1024;
> >
> > /* Allocating one page can hold 512 free page blocks, so calculates
> > the number of pages that can hold those 4MB blocks. And this
> > allocation should not exceed 1024 pages */ pages_to_allocate =
> > min(4m_page_blocks / 512, 1024);
> >
> > For a 2TB guests, which has 2^19 page blocks (4MB each), we will allocate
> 1024 pages as the buffer.
> >
> > When the guest has large memory, it should be easier to succeed in
> allocation of large buffer. If that allocation fails, that implies that 
> nothing
> would be got from the 4MB free page list.
> >
> > I think the proportional allocation is simpler compared to other
> > approaches like
> > - scattered buffer, which will complicate the get_from_free_page_list
> > implementation;
> > - one buffer to call get_from_free_page_list multiple times, which needs
> get_from_free_page_list to maintain states.. also too complicated.
> >
> > Best,
> > Wei
> >
> 
> That's more reasonable, but question remains what to do if that value
> exceeds MAX_ORDER. I'd say maybe tell host we can't report it.

Not necessarily, I think. We have min(4m_page_blocks / 512, 1024) above, so the 
maximum memory that can be reported is 2TB. For larger guests, e.g. 4TB, the 
optimization can still offer 2TB free memory (better than no optimization).

On the other hand, large guests being large mostly because the guests need to 
use large memory. In that case, they usually won't have that much free memory 
to report.

> 
> Also allocating it with GFP_KERNEL is out. You only want to take it off the 
> free
> list. So I guess __GFP_NOMEMALLOC and __GFP_ATOMIC.

Sounds good, thanks.

> Also you can't allocate this on device start. First totalram_pages can change.
> Second that's too much memory to tie up forever.

Yes, makes sense.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-15 Thread Michael S. Tsirkin
On Fri, Jun 15, 2018 at 02:11:23PM +, Wang, Wei W wrote:
> On Friday, June 15, 2018 7:42 PM, Michael S. Tsirkin wrote:
> > On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> > > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates
> > > the support of reporting hints of guest free pages to host via 
> > > virtio-balloon.
> > >
> > > Host requests the guest to report free page hints by sending a command
> > > to the guest via setting the
> > VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT
> > > bit of the host_cmd config register.
> > >
> > > As the first step here, virtio-balloon only reports free page hints
> > > from the max order (10) free page list to host. This has generated
> > > similar good results as reporting all free page hints during our tests.
> > >
> > > TODO:
> > > - support reporting free page hints from smaller order free page lists
> > >   when there is a need/request from users.
> > >
> > > Signed-off-by: Wei Wang 
> > > Signed-off-by: Liang Li 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Michal Hocko 
> > > Cc: Andrew Morton 
> > > ---
> > >  drivers/virtio/virtio_balloon.c | 187 +--
> > -
> > >  include/uapi/linux/virtio_balloon.h |  13 +++
> > >  2 files changed, 163 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_balloon.c
> > > b/drivers/virtio/virtio_balloon.c index 6b237e3..582a03b 100644
> > > --- a/drivers/virtio/virtio_balloon.c
> > > +++ b/drivers/virtio/virtio_balloon.c
> > > @@ -43,6 +43,9 @@
> > >  #define OOM_VBALLOON_DEFAULT_PAGES 256  #define
> > > VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> > >
> > > +/* The size of memory in bytes allocated for reporting free page
> > > +hints */ #define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> > > +
> > >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > > module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > > MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> > 
> > Doesn't this limit memory size of the guest we can report?
> > Apparently to several gigabytes ...
> > OTOH huge guests with lots of free memory is exactly where we would gain
> > the most ...
> 
> Yes, the 16-page array can report up to 32GB (each page can hold 512 
> addresses of 4MB free page blocks, i.e. 2GB free memory per page) free memory 
> to host. It is not flexible.
> 
> How about allocating the buffer according to the guest memory size 
> (proportional)? That is,
> 
> /* Calculates the maximum number of 4MB (equals to 1024 pages) free pages 
> blocks that the system can have */
> 4m_page_blocks = totalram_pages / 1024;
> 
> /* Allocating one page can hold 512 free page blocks, so calculates the 
> number of pages that can hold those 4MB blocks. And this allocation should 
> not exceed 1024 pages */
> pages_to_allocate = min(4m_page_blocks / 512, 1024);
> 
> For a 2TB guests, which has 2^19 page blocks (4MB each), we will allocate 
> 1024 pages as the buffer.
> 
> When the guest has large memory, it should be easier to succeed in allocation 
> of large buffer. If that allocation fails, that implies that nothing would be 
> got from the 4MB free page list.
> 
> I think the proportional allocation is simpler compared to other approaches 
> like 
> - scattered buffer, which will complicate the get_from_free_page_list 
> implementation;
> - one buffer to call get_from_free_page_list multiple times, which needs 
> get_from_free_page_list to maintain states.. also too complicated.
> 
> Best,
> Wei
> 

That's more reasonable, but question remains what to do if that value
exceeds MAX_ORDER. I'd say maybe tell host we can't report it.

Also allocating it with GFP_KERNEL is out. You only want to take
it off the free list. So I guess __GFP_NOMEMALLOC and __GFP_ATOMIC.

Also you can't allocate this on device start. First totalram_pages can
change. Second that's too much memory to tie up forever.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-15 Thread Wang, Wei W
On Friday, June 15, 2018 7:42 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates
> > the support of reporting hints of guest free pages to host via 
> > virtio-balloon.
> >
> > Host requests the guest to report free page hints by sending a command
> > to the guest via setting the
> VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT
> > bit of the host_cmd config register.
> >
> > As the first step here, virtio-balloon only reports free page hints
> > from the max order (10) free page list to host. This has generated
> > similar good results as reporting all free page hints during our tests.
> >
> > TODO:
> > - support reporting free page hints from smaller order free page lists
> >   when there is a need/request from users.
> >
> > Signed-off-by: Wei Wang 
> > Signed-off-by: Liang Li 
> > Cc: Michael S. Tsirkin 
> > Cc: Michal Hocko 
> > Cc: Andrew Morton 
> > ---
> >  drivers/virtio/virtio_balloon.c | 187 +--
> -
> >  include/uapi/linux/virtio_balloon.h |  13 +++
> >  2 files changed, 163 insertions(+), 37 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c
> > b/drivers/virtio/virtio_balloon.c index 6b237e3..582a03b 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -43,6 +43,9 @@
> >  #define OOM_VBALLOON_DEFAULT_PAGES 256  #define
> > VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> >
> > +/* The size of memory in bytes allocated for reporting free page
> > +hints */ #define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> > +
> >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> 
> Doesn't this limit memory size of the guest we can report?
> Apparently to several gigabytes ...
> OTOH huge guests with lots of free memory is exactly where we would gain
> the most ...

Yes, the 16-page array can report up to 32GB (each page can hold 512 addresses 
of 4MB free page blocks, i.e. 2GB free memory per page) free memory to host. It 
is not flexible.

How about allocating the buffer according to the guest memory size 
(proportional)? That is,

/* Calculates the maximum number of 4MB (equals to 1024 pages) free pages 
blocks that the system can have */
4m_page_blocks = totalram_pages / 1024;

/* Allocating one page can hold 512 free page blocks, so calculates the number 
of pages that can hold those 4MB blocks. And this allocation should not exceed 
1024 pages */
pages_to_allocate = min(4m_page_blocks / 512, 1024);

For a 2TB guests, which has 2^19 page blocks (4MB each), we will allocate 1024 
pages as the buffer.

When the guest has large memory, it should be easier to succeed in allocation 
of large buffer. If that allocation fails, that implies that nothing would be 
got from the 4MB free page list.

I think the proportional allocation is simpler compared to other approaches 
like 
- scattered buffer, which will complicate the get_from_free_page_list 
implementation;
- one buffer to call get_from_free_page_list multiple times, which needs 
get_from_free_page_list to maintain states.. also too complicated.

Best,
Wei



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-15 Thread Michael S. Tsirkin
On Fri, Jun 15, 2018 at 12:43:11PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a command
> to the guest via setting the VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT bit
> of the host_cmd config register.
> 
> As the first step here, virtio-balloon only reports free page hints from
> the max order (10) free page list to host. This has generated similar good
> results as reporting all free page hints during our tests.
> 
> TODO:
> - support reporting free page hints from smaller order free page lists
>   when there is a need/request from users.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> ---
>  drivers/virtio/virtio_balloon.c | 187 
> +---
>  include/uapi/linux/virtio_balloon.h |  13 +++
>  2 files changed, 163 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 6b237e3..582a03b 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -43,6 +43,9 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +/* The size of memory in bytes allocated for reporting free page hints */
> +#define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
> +
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");

Doesn't this limit memory size of the guest we can report?
Apparently to several gigabytes ...
OTOH huge guests with lots of free memory is exactly
where we would gain the most ...

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v33 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-14 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a command
to the guest via setting the VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT bit
of the host_cmd config register.

As the first step here, virtio-balloon only reports free page hints from
the max order (10) free page list to host. This has generated similar good
results as reporting all free page hints during our tests.

TODO:
- support reporting free page hints from smaller order free page lists
  when there is a need/request from users.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
Cc: Andrew Morton 
---
 drivers/virtio/virtio_balloon.c | 187 +---
 include/uapi/linux/virtio_balloon.h |  13 +++
 2 files changed, 163 insertions(+), 37 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..582a03b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -43,6 +43,9 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+/* The size of memory in bytes allocated for reporting free page hints */
+#define FREE_PAGE_HINT_MEM_SIZE (PAGE_SIZE * 16)
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -51,9 +54,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +79,8 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   struct virtio_balloon_free_page_hints *hints;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -326,17 +344,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -353,6 +360,32 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   uint32_t host_cmd;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   virtio_cread(vdev, struct virtio_balloon_config, host_cmd, _cmd);
+   if ((host_cmd & VIRTIO_BALLOON_HOST_CMD_FREE_PAGE_HINT) &&
+    virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -425,44 +458,98 @@ static void update_balloon_size_func(struct work_struct 
*work)
queue_work(system_freezable_wq, work);
 }
 
+static void free_page_vq_cb(struct virtqueue *vq)
+{
+   unsigned int unused;
+
+   while (virtqueue_get_buf(vq, ))
+   ;
+}
+
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { 

[PATCH v32 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-09 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 288 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 256 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..197f865 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is actively in use */
+   __virtio32 cmd_id_active;
+   /* Buffer to store the stop sign */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -419,44 +456,196 @@ static void update_balloon_size_func(struct work_struct 
*work)
queue_work(system_freezable_wq, work);
 }
 
+static void free_page_vq_cb(struct virtqueue *vq)
+{
+   unsigned int unused;
+
+   while (vir

Re: [PATCH v31 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-09 Thread Wei Wang

On 04/09/2018 02:03 PM, Michael S. Tsirkin wrote:

On Fri, Apr 06, 2018 at 08:17:23PM +0800, Wei Wang wrote:

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>


Pretty good by now, Minor comments below.


Thanks for the comments.




---
  drivers/virtio/virtio_balloon.c | 272 +++-
  include/uapi/linux/virtio_balloon.h |   4 +
  2 files changed, 240 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..aef73ee 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
  static struct vfsmount *balloon_mnt;
  #endif
  
+enum virtio_balloon_vq {

+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
  struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
  
  	/* The balloon servicing is delegated to a freezable workqueue. */

struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
  
+	/* The new cmd id received from host */

+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;

I'd prefer cmd_id_active but it's not critical.


OK, will change.



+
+static void report_free_page_func(struct work_struct *work)
+{
+   struct virtio_balloon *vb;
+   struct virtqueue *vq;
+   unsigned int unused;
+   int ret;
+
+   vb = container_of(work, struct virtio_balloon, report_free_page_work);
+   vq = vb->free_page_vq;
+
+   /* Start by sending the received cmd id to host with an outbuf. */
+   ret = send_start_cmd_id(vb, vb->cmd_id_received);
+   if (unlikely(ret))
+   goto err;
+
+   ret = walk_free_mem_block(vb, 0, _balloon_send_free_pages);
+   if (unlikely(ret == -EIO))
+   goto err;
why is EIO special? I think you should special-case EINTR maybe.


Actually EINTR isn't an error which needs to bail out. That's just the 
case that the vq is full, that hint isn't added. Maybe it is not 
necessary to treat the "vq full" case as an error.
How about just returning "0" when the vq is full, instead of returning 
"EINTR"? (The next hint will continue to be added)







+
+   /* End by sending a stop id to host with an outbuf. */
+   ret = send_stop_cmd_id(vb);
+   if (likely(!ret)) {

What happens on failure? Don't we need to detach anyway?


Yes. Please see below, we could make some more change.



+   /* Ending: detach all the used buffers from the vq. */
+   while (vq->num_free != virtqueue_get_vring_size(vq))
+   virtqueue_get_buf(vq, );

This isn't all that happens here. It also waits for buffers to
be consumed. Is this by design? And why is it good idea to
busy poll while doing it?


Because host and guest operations happen asynchronously. When the guest 
reaches here, host may have not put anything to the vq yet. How about 
doing this via the free page vq handler?
Host will send a free page vq interrupt before exiting the optimization. 
I think this would be nicer.



Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v31 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-09 Thread Michael S. Tsirkin
On Fri, Apr 06, 2018 at 08:17:23PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host polls the free
> page vq after sending the starting cmd id, so the guest doesn't need to
> kick after filling an element to the vq.
> 
> Host may also requests the guest to stop the reporting in advance by
> sending the stop cmd id to the guest via the configuration register.
> 
> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> Signed-off-by: Liang Li <liang.z...@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Michal Hocko <mho...@kernel.org>


Pretty good by now, Minor comments below.

> ---
>  drivers/virtio/virtio_balloon.c | 272 
> +++-
>  include/uapi/linux/virtio_balloon.h |   4 +
>  2 files changed, 240 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index dfe5684..aef73ee 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,13 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;

I'd prefer cmd_id_active but it's not critical.

> + /* Buffer to store the stop sign */
> + __virtio32 stop_cmd_id;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, >update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + spin_unlock_irqrestore(>stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, >cmd_id_received);
> + if (vb->cmd_id_received !=
> + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb-&

[PATCH v31 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-06 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 272 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 240 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..aef73ee 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+   /* Buffer to store the stop sign */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -421,42 +458,178 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-  

Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-05 Thread Michael S. Tsirkin
On Thu, Apr 05, 2018 at 03:47:28PM +, Wang, Wei W wrote:
> On Thursday, April 5, 2018 10:04 PM, Michael S. Tsirkin wrote:
> > On Thu, Apr 05, 2018 at 02:05:03AM +, Wang, Wei W wrote:
> > > On Thursday, April 5, 2018 9:12 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> > > > > On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > > > > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > > I'm afraid the driver couldn't be aware if the added hints are
> > > > > stale or not,
> > > >
> > > >
> > > > No - I mean that driver has code that compares two values and stops
> > > > reporting. Can one of the values be stale?
> > >
> > > The driver compares "vb->cmd_id_use != vb->cmd_id_received" to decide
> > > if it needs to stop reporting hints, and cmd_id_received is what the
> > > driver reads from host (host notifies the driver to read for the
> > > latest value). If host sends a new cmd id, it will notify the guest to
> > > read again. I'm not sure how that could be a stale cmd id (or maybe I
> > > misunderstood your point here?)
> > >
> > > Best,
> > > Wei
> > 
> > The comparison is done in one thread, the update in another one.
> 
> I think this isn't something that could be solved by adding a lock,
> unless host waits for the driver's ACK about finishing the update
> (this is not agreed in the QEMU part discussion).
> 
> Actually virtio_balloon has F_IOMMU_PLATFORM disabled, maybe we don't
> need to worry about that using DMA api case (we only have gpa added to
> the vq, and having some entries stay in the vq seems fine). For this
> feature, I think it would not work with F_IOMMU enabled either.

Adding a code comment explaining all this might be a good idea.

> If there is any further need (I couldn't think of a need so far), I
> think we could consider to let host inject a vq interrupt at some
> point, and then the driver handler can do the virtqueue_get_buf work.
> 
> Best,
> Wei




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-05 Thread Wang, Wei W
On Thursday, April 5, 2018 10:04 PM, Michael S. Tsirkin wrote:
> On Thu, Apr 05, 2018 at 02:05:03AM +, Wang, Wei W wrote:
> > On Thursday, April 5, 2018 9:12 AM, Michael S. Tsirkin wrote:
> > > On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> > > > On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > > > > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > > > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > I'm afraid the driver couldn't be aware if the added hints are
> > > > stale or not,
> > >
> > >
> > > No - I mean that driver has code that compares two values and stops
> > > reporting. Can one of the values be stale?
> >
> > The driver compares "vb->cmd_id_use != vb->cmd_id_received" to decide
> > if it needs to stop reporting hints, and cmd_id_received is what the
> > driver reads from host (host notifies the driver to read for the
> > latest value). If host sends a new cmd id, it will notify the guest to
> > read again. I'm not sure how that could be a stale cmd id (or maybe I
> > misunderstood your point here?)
> >
> > Best,
> > Wei
> 
> The comparison is done in one thread, the update in another one.

I think this isn't something that could be solved by adding a lock, unless host 
waits for the driver's ACK about finishing the update (this is not agreed in 
the QEMU part discussion).

Actually virtio_balloon has F_IOMMU_PLATFORM disabled, maybe we don't need to 
worry about that using DMA api case (we only have gpa added to the vq, and 
having some entries stay in the vq seems fine). For this feature, I think it 
would not work with F_IOMMU enabled either.

If there is any further need (I couldn't think of a need so far), I think we 
could consider to let host inject a vq interrupt at some point, and then the 
driver handler can do the virtqueue_get_buf work.

Best,
Wei

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-05 Thread Michael S. Tsirkin
On Thu, Apr 05, 2018 at 02:05:03AM +, Wang, Wei W wrote:
> On Thursday, April 5, 2018 9:12 AM, Michael S. Tsirkin wrote:
> > On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> > > On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > I'm afraid the driver couldn't be aware if the added hints are stale
> > > or not,
> > 
> > 
> > No - I mean that driver has code that compares two values and stops
> > reporting. Can one of the values be stale?
> 
> The driver compares "vb->cmd_id_use != vb->cmd_id_received" to decide if it 
> needs to stop reporting hints, and cmd_id_received is what the driver reads 
> from host (host notifies the driver to read for the latest value). If host 
> sends a new cmd id, it will notify the guest to read again. I'm not sure how 
> that could be a stale cmd id (or maybe I misunderstood your point here?)
> 
> Best,
> Wei

The comparison is done in one thread, the update in another one.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Wang, Wei W
On Thursday, April 5, 2018 9:12 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> > On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > I'm afraid the driver couldn't be aware if the added hints are stale
> > or not,
> 
> 
> No - I mean that driver has code that compares two values and stops
> reporting. Can one of the values be stale?

The driver compares "vb->cmd_id_use != vb->cmd_id_received" to decide if it 
needs to stop reporting hints, and cmd_id_received is what the driver reads 
from host (host notifies the driver to read for the latest value). If host 
sends a new cmd id, it will notify the guest to read again. I'm not sure how 
that could be a stale cmd id (or maybe I misunderstood your point here?)

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Michael S. Tsirkin
On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > > +static int add_one_sg(struct virtqueue *vq, unsigned long pfn,
> > > > > +uint32_t len) {
> > > > > + struct scatterlist sg;
> > > > > + unsigned int unused;
> > > > > +
> > > > > + sg_init_table(, 1);
> > > > > + sg_set_page(, pfn_to_page(pfn), len, 0);
> > > > > +
> > > > > + /* Detach all the used buffers from the vq */
> > > > > + while (virtqueue_get_buf(vq, ))
> > > > > + ;
> > > > > +
> > > > > + /*
> > > > > +  * Since this is an optimization feature, losing a couple of 
> > > > > free
> > > > > +  * pages to report isn't important. We simply return without 
> > > > > adding
> > > > > +  * the page hint if the vq is full.
> > > > why not stop scanning of following pages though?
> > >
> > > Because continuing to send hints is a way to deliver the maximum
> > > possible hints to host. For example, host may have a delay in taking
> > > hints at some point, and then it resumes to take hints soon. If the
> > > driver does not stop when the vq is full, it will be able to put more
> > > hints to the vq once the vq has available entries to add.
> > 
> > What this appears to be is just lack of coordination between host and guest.
> > 
> > But meanwhile you are spending cycles walking the list uselessly.
> > Instead of trying nilly-willy, the standard thing to do is to wait for host 
> > to
> > consume an entry and proceed.
> > 
> > Coding it up might be tricky, so it's probably acceptable as is for now, but
> > please replace the justification about with a TODO entry that we should
> > synchronize with the host.
> 
> Thanks. I plan to add
> 
> TODO: The current implementation could be further improved by stopping the 
> reporting when the vq is full and continuing the reporting when host notifies 
> that there are available entries for the driver to add.

... that entries have been used.

> 
> > 
> > 
> > >
> > > >
> > > > > +  * We are adding one entry each time, which essentially results 
> > > > > in no
> > > > > +  * memory allocation, so the GFP_KERNEL flag below can be 
> > > > > ignored.
> > > > > +  * Host works by polling the free page vq for hints after 
> > > > > sending the
> > > > > +  * starting cmd id, so the driver doesn't need to kick after 
> > > > > filling
> > > > > +  * the vq.
> > > > > +  * Lastly, there is always one entry reserved for the cmd id to 
> > > > > use.
> > > > > +  */
> > > > > + if (vq->num_free > 1)
> > > > > + return virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtio_balloon_send_free_pages(void *opaque, unsigned long
> > pfn,
> > > > > +unsigned long nr_pages)
> > > > > +{
> > > > > + struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> > > > > + uint32_t len = nr_pages << PAGE_SHIFT;
> > > > > +
> > > > > + /*
> > > > > +  * If a stop id or a new cmd id was just received from host, 
> > > > > stop
> > > > > +  * the reporting, and return 1 to indicate an active stop.
> > > > > +  */
> > > > > + if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb-
> > >cmd_id_received)
> > > > > + return 1;
> > 
> > functions returning int should return 0 or -errno on failure, positive 
> > return
> > code should indicate progress.
> > 
> > If you want a boolean, use bool pls.
> 
> OK. I plan to change 1  to -EBUSY to indicate the case that host actively 
> asks the driver to stop reporting (This makes the callback return value type 
> consistent with walk_free_mem_block). 
> 

something like EINTR might be a better fit.

> 
> > 
> > 
> > > > > +
> > > > this access to cmd_id_use and cmd_id_received without locks bothers
> > > > me. Pls document why it's safe.
> > >
> > > OK. Probably we could add below to the above comments:
> > >
> > > cmd_id_use and cmd_id_received don't need to be accessed under locks
> > > because the reporting does not have to stop immediately before
> > > cmd_id_received is changed (i.e. when host requests to stop). That is,
> > > reporting more hints after host requests to stop isn't an issue for
> > > this optimization feature, because host will simply drop the stale
> > > hints next time when it needs a new reporting.
> > 
> > What about the other direction? Can this observe a stale value and exit
> > erroneously?
> 
> I'm afraid the driver couldn't be aware if the added hints are stale or not,


No - I mean that driver has code that compares two values
and stops reporting. Can one of the values be stale?

> because host and guest actions happen asynchronously. That is, 

RE: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Wang, Wei W
On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > +static int add_one_sg(struct virtqueue *vq, unsigned long pfn,
> > > > +uint32_t len) {
> > > > +   struct scatterlist sg;
> > > > +   unsigned int unused;
> > > > +
> > > > +   sg_init_table(, 1);
> > > > +   sg_set_page(, pfn_to_page(pfn), len, 0);
> > > > +
> > > > +   /* Detach all the used buffers from the vq */
> > > > +   while (virtqueue_get_buf(vq, ))
> > > > +   ;
> > > > +
> > > > +   /*
> > > > +* Since this is an optimization feature, losing a couple of 
> > > > free
> > > > +* pages to report isn't important. We simply return without 
> > > > adding
> > > > +* the page hint if the vq is full.
> > > why not stop scanning of following pages though?
> >
> > Because continuing to send hints is a way to deliver the maximum
> > possible hints to host. For example, host may have a delay in taking
> > hints at some point, and then it resumes to take hints soon. If the
> > driver does not stop when the vq is full, it will be able to put more
> > hints to the vq once the vq has available entries to add.
> 
> What this appears to be is just lack of coordination between host and guest.
> 
> But meanwhile you are spending cycles walking the list uselessly.
> Instead of trying nilly-willy, the standard thing to do is to wait for host to
> consume an entry and proceed.
> 
> Coding it up might be tricky, so it's probably acceptable as is for now, but
> please replace the justification about with a TODO entry that we should
> synchronize with the host.

Thanks. I plan to add

TODO: The current implementation could be further improved by stopping the 
reporting when the vq is full and continuing the reporting when host notifies 
that there are available entries for the driver to add.


> 
> 
> >
> > >
> > > > +* We are adding one entry each time, which essentially results 
> > > > in no
> > > > +* memory allocation, so the GFP_KERNEL flag below can be 
> > > > ignored.
> > > > +* Host works by polling the free page vq for hints after 
> > > > sending the
> > > > +* starting cmd id, so the driver doesn't need to kick after 
> > > > filling
> > > > +* the vq.
> > > > +* Lastly, there is always one entry reserved for the cmd id to 
> > > > use.
> > > > +*/
> > > > +   if (vq->num_free > 1)
> > > > +   return virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static int virtio_balloon_send_free_pages(void *opaque, unsigned long
> pfn,
> > > > +  unsigned long nr_pages)
> > > > +{
> > > > +   struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> > > > +   uint32_t len = nr_pages << PAGE_SHIFT;
> > > > +
> > > > +   /*
> > > > +* If a stop id or a new cmd id was just received from host, 
> > > > stop
> > > > +* the reporting, and return 1 to indicate an active stop.
> > > > +*/
> > > > +   if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb-
> >cmd_id_received)
> > > > +   return 1;
> 
> functions returning int should return 0 or -errno on failure, positive return
> code should indicate progress.
> 
> If you want a boolean, use bool pls.

OK. I plan to change 1  to -EBUSY to indicate the case that host actively asks 
the driver to stop reporting (This makes the callback return value type 
consistent with walk_free_mem_block). 



> 
> 
> > > > +
> > > this access to cmd_id_use and cmd_id_received without locks bothers
> > > me. Pls document why it's safe.
> >
> > OK. Probably we could add below to the above comments:
> >
> > cmd_id_use and cmd_id_received don't need to be accessed under locks
> > because the reporting does not have to stop immediately before
> > cmd_id_received is changed (i.e. when host requests to stop). That is,
> > reporting more hints after host requests to stop isn't an issue for
> > this optimization feature, because host will simply drop the stale
> > hints next time when it needs a new reporting.
> 
> What about the other direction? Can this observe a stale value and exit
> erroneously?

I'm afraid the driver couldn't be aware if the added hints are stale or not, 
because host and guest actions happen asynchronously. That is, host side 
iothread stops taking hints as soon as the migration thread asks to stop, it 
doesn't wait for any ACK from the driver to stop (as we discussed before, host 
couldn't always assume that the driver is in a responsive state).

Btw, we also don't need to worry about any memory left in the vq, since only 
addresses are added to the vq, there is no real memory allocations.

Best,
Wei

Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Michael S. Tsirkin
On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > +static int add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t 
> > > len)
> > > +{
> > > + struct scatterlist sg;
> > > + unsigned int unused;
> > > +
> > > + sg_init_table(, 1);
> > > + sg_set_page(, pfn_to_page(pfn), len, 0);
> > > +
> > > + /* Detach all the used buffers from the vq */
> > > + while (virtqueue_get_buf(vq, ))
> > > + ;
> > > +
> > > + /*
> > > +  * Since this is an optimization feature, losing a couple of free
> > > +  * pages to report isn't important. We simply return without adding
> > > +  * the page hint if the vq is full.
> > why not stop scanning of following pages though?
> 
> Because continuing to send hints is a way to deliver the maximum possible
> hints to host. For example, host may have a delay in taking hints at some
> point, and then it resumes to take hints soon. If the driver does not stop
> when the vq is full, it will be able to put more hints to the vq once the vq
> has available entries to add.

What this appears to be is just lack of coordination between
host and guest.

But meanwhile you are spending cycles walking the list uselessly.
Instead of trying nilly-willy, the standard thing to do
is to wait for host to consume an entry and proceed.

Coding it up might be tricky, so it's probably acceptable as is
for now, but please replace the justification about with
a TODO entry that we should synchronize with the host.


> 
> > 
> > > +  * We are adding one entry each time, which essentially results in no
> > > +  * memory allocation, so the GFP_KERNEL flag below can be ignored.
> > > +  * Host works by polling the free page vq for hints after sending the
> > > +  * starting cmd id, so the driver doesn't need to kick after filling
> > > +  * the vq.
> > > +  * Lastly, there is always one entry reserved for the cmd id to use.
> > > +  */
> > > + if (vq->num_free > 1)
> > > + return virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int virtio_balloon_send_free_pages(void *opaque, unsigned long 
> > > pfn,
> > > +unsigned long nr_pages)
> > > +{
> > > + struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> > > + uint32_t len = nr_pages << PAGE_SHIFT;
> > > +
> > > + /*
> > > +  * If a stop id or a new cmd id was just received from host, stop
> > > +  * the reporting, and return 1 to indicate an active stop.
> > > +  */
> > > + if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb->cmd_id_received)
> > > + return 1;

functions returning int should return 0 or -errno on failure,
positive return code should indicate progress.

If you want a boolean, use bool pls.


> > > +
> > this access to cmd_id_use and cmd_id_received without locks
> > bothers me. Pls document why it's safe.
> 
> OK. Probably we could add below to the above comments:
> 
> cmd_id_use and cmd_id_received don't need to be accessed under locks because
> the reporting does not have to stop immediately before cmd_id_received is
> changed (i.e. when host requests to stop). That is, reporting more hints
> after host requests to stop isn't an issue for this optimization feature,
> because host will simply drop the stale hints next time when it needs a new
> reporting.

What about the other direction? Can this observe a stale value and
exit erroneously?

> 
> 
> 
> > 
> > > + return add_one_sg(vb->free_page_vq, pfn, len);
> > > +}
> > > +
> > > +static int send_start_cmd_id(struct virtio_balloon *vb, uint32_t cmd_id)
> > > +{
> > > + struct scatterlist sg;
> > > + struct virtqueue *vq = vb->free_page_vq;
> > > +
> > > + vb->cmd_id_use = cpu_to_virtio32(vb->vdev, cmd_id);
> > > + sg_init_one(, >cmd_id_use, sizeof(vb->cmd_id_use));
> > > + return virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL);
> > > +}
> > > +
> > > +static int send_stop_cmd_id(struct virtio_balloon *vb)
> > > +{
> > > + struct scatterlist sg;
> > > + struct virtqueue *vq = vb->free_page_vq;
> > > +
> > > + sg_init_one(, >stop_cmd_id, sizeof(vb->cmd_id_use));
> > why the inconsistency?
> 
> Thanks, will make it consistent.
> 
> Best,
> Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-03 Thread Wei Wang

On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:

On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:

+static int add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len)
+{
+   struct scatterlist sg;
+   unsigned int unused;
+
+   sg_init_table(, 1);
+   sg_set_page(, pfn_to_page(pfn), len, 0);
+
+   /* Detach all the used buffers from the vq */
+   while (virtqueue_get_buf(vq, ))
+   ;
+
+   /*
+* Since this is an optimization feature, losing a couple of free
+* pages to report isn't important. We simply return without adding
+* the page hint if the vq is full.

why not stop scanning of following pages though?


Because continuing to send hints is a way to deliver the maximum 
possible hints to host. For example, host may have a delay in taking 
hints at some point, and then it resumes to take hints soon. If the 
driver does not stop when the vq is full, it will be able to put more 
hints to the vq once the vq has available entries to add.






+* We are adding one entry each time, which essentially results in no
+* memory allocation, so the GFP_KERNEL flag below can be ignored.
+* Host works by polling the free page vq for hints after sending the
+* starting cmd id, so the driver doesn't need to kick after filling
+* the vq.
+* Lastly, there is always one entry reserved for the cmd id to use.
+*/
+   if (vq->num_free > 1)
+   return virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
+
+   return 0;
+}
+
+static int virtio_balloon_send_free_pages(void *opaque, unsigned long pfn,
+  unsigned long nr_pages)
+{
+   struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
+   uint32_t len = nr_pages << PAGE_SHIFT;
+
+   /*
+* If a stop id or a new cmd id was just received from host, stop
+* the reporting, and return 1 to indicate an active stop.
+*/
+   if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb->cmd_id_received)
+   return 1;
+

this access to cmd_id_use and cmd_id_received without locks
bothers me. Pls document why it's safe.


OK. Probably we could add below to the above comments:

cmd_id_use and cmd_id_received don't need to be accessed under locks 
because the reporting does not have to stop immediately before 
cmd_id_received is changed (i.e. when host requests to stop). That is, 
reporting more hints after host requests to stop isn't an issue for this 
optimization feature, because host will simply drop the stale hints next 
time when it needs a new reporting.








+   return add_one_sg(vb->free_page_vq, pfn, len);
+}
+
+static int send_start_cmd_id(struct virtio_balloon *vb, uint32_t cmd_id)
+{
+   struct scatterlist sg;
+   struct virtqueue *vq = vb->free_page_vq;
+
+   vb->cmd_id_use = cpu_to_virtio32(vb->vdev, cmd_id);
+   sg_init_one(, >cmd_id_use, sizeof(vb->cmd_id_use));
+   return virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL);
+}
+
+static int send_stop_cmd_id(struct virtio_balloon *vb)
+{
+   struct scatterlist sg;
+   struct virtqueue *vq = vb->free_page_vq;
+
+   sg_init_one(, >stop_cmd_id, sizeof(vb->cmd_id_use));

why the inconsistency?


Thanks, will make it consistent.

Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-03 Thread Michael S. Tsirkin
On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host polls the free
> page vq after sending the starting cmd id, so the guest doesn't need to
> kick after filling an element to the vq.
> 
> Host may also requests the guest to stop the reporting in advance by
> sending the stop cmd id to the guest via the configuration register.
> 
> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> Signed-off-by: Liang Li <liang.z...@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Michal Hocko <mho...@kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 257 
> +++-
>  include/uapi/linux/virtio_balloon.h |   4 +
>  2 files changed, 225 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index dfe5684..18d24a4 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,13 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;
> + /* Buffer to store the stop sign */
> + __virtio32 stop_cmd_id;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, >update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + spin_unlock_irqrestore(>stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, >cmd_id_received);
> + if (vb->cmd_id_received !=
> + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(vb->balloon_wq,
> +>report_free_page_work);

[PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-03 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 257 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..18d24a4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+   /* Buffer to store the stop sign */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { &quo

[PATCH v29 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-25 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 257 +++-
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 225 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..18d24a4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+   /* Buffer to store the stop sign */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-  

[PATCH v28 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-08 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 245 ++--
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 213 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..39ecce3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate&qu

Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang

On 02/07/2018 12:34 PM, Michael S. Tsirkin wrote:

On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
  drivers/virtio/virtio_balloon.c | 255 +++-
  include/uapi/linux/virtio_balloon.h |   7 +
  mm/page_poison.c|   6 +
  3 files changed, 232 insertions(+), 36 deletions(-)

Resend Change:
- Expose page_poisoning_enabled to kernel modules

RESEND tag is for reposting unchanged patches.
you want to post a v27, and you want the mm patch
as a separate one, so you can get an ack on it from
someone on linux-mm.

In fact, I would probably add reporting the poison value as
a separate feature/couple of patches.



OK. I have made them separate patches in v27. Thanks a lot for reviewing 
so many versions, I learned a lot from the comments and discussion.


Best,
Wei

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v27 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 245 ++--
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 213 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..39ecce3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate&qu

Re: [PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Michael S. Tsirkin
On Wed, Feb 07, 2018 at 11:01:06AM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free page hints by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host polls the free
> page vq after sending the starting cmd id, so the guest doesn't need to
> kick after filling an element to the vq.
> 
> Host may also requests the guest to stop the reporting in advance by
> sending the stop cmd id to the guest via the configuration register.
> 
> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> Signed-off-by: Liang Li <liang.z...@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Michal Hocko <mho...@kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 255 
> +++-
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c|   6 +
>  3 files changed, 232 insertions(+), 36 deletions(-)
> 
> Resend Change:
>   - Expose page_poisoning_enabled to kernel modules

RESEND tag is for reposting unchanged patches.
you want to post a v27, and you want the mm patch
as a separate one, so you can get an ack on it from
someone on linux-mm.

In fact, I would probably add reporting the poison value as
a separate feature/couple of patches.

> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..5476725 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,11 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, >update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + spin_unlock_irqrestore(>stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, >cmd_id_recei

[PATCH v26 2/2 RESEND] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-06 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 255 +++-
 include/uapi/linux/virtio_balloon.h |   7 +
 mm/page_poison.c|   6 +
 3 files changed, 232 insertions(+), 36 deletions(-)

Resend Change:
- Expose page_poisoning_enabled to kernel modules

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..5476725 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callback

[PATCH v26 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-04 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.

Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 255 +++-
 include/uapi/linux/virtio_balloon.h |   7 +
 2 files changed, 226 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..5476725 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,155 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate&qu

Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-01 Thread Michael S. Tsirkin
On Thu, Jan 25, 2018 at 08:28:52PM +0900, Tetsuo Handa wrote:
> On 2018/01/25 12:32, Wei Wang wrote:
> > On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:
> >> On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
> >> +
> >> +static void report_free_page_func(struct work_struct *work)
> >> +{
> >> +struct virtio_balloon *vb;
> >> +unsigned long flags;
> >> +
> >> +vb = container_of(work, struct virtio_balloon, report_free_page_work);
> >> +
> >> +/* Start by sending the obtained cmd id to the host with an outbuf */
> >> +send_cmd_id(vb, >start_cmd_id);
> >> +
> >> +/*
> >> + * Set start_cmd_id to VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID to
> >> + * indicate a new request can be queued.
> >> + */
> >> +spin_lock_irqsave(>stop_update_lock, flags);
> >> +vb->start_cmd_id = cpu_to_virtio32(vb->vdev,
> >> +VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> >> +spin_unlock_irqrestore(>stop_update_lock, flags);
> >> +
> >> +walk_free_mem_block(vb, 0, _balloon_send_free_pages);
> >> Can you teach walk_free_mem_block to return the && of all
> >> return calls, so caller knows whether it completed?
> > 
> > There will be two cases that can cause walk_free_mem_block to return 
> > without completing:
> > 1) host requests to stop in advance
> > 2) vq->broken
> > 
> > How about letting walk_free_mem_block simply return the value returned by 
> > its callback (i.e. virtio_balloon_send_free_pages)?
> > 
> > For host requests to stop, it returns "1", and the above only bails out 
> > when walk_free_mem_block return a "< 0" value.
> 
> I feel that virtio_balloon_send_free_pages is doing too heavy things.
> 
> It can be called for many times with IRQ disabled. Number of times
> it is called depends on amount of free pages (and fragmentation state).
> Generally, more free pages, more calls.
> 
> Then, why don't you allocate some pages for holding all pfn values
> and then call walk_free_mem_block() only for storing pfn values
> and then send pfn values without disabling IRQ?

So going over it, at some level we are talking about optimizations here.

I'll go over it one last time but at some level we might
be able to make progress faster if we start by enabling
the functionality in question.
If there are no other issues, I'm inclined to merge.

If nothing else, this
will make it possible for people to experiment and
report sources of overhead.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-01 Thread Michael S. Tsirkin
On Thu, Feb 01, 2018 at 05:43:22PM +0800, Wei Wang wrote:
> 3) Hints means the pages are quite likely to be free pages (no guarantee).
> If the pages given to host are going to be freed, then we really couldn't
> call them hints, they are true free pages. Ballooning needs true free pages,
> while live migration needs hints, would you agree with this?

It's an interesting point, I'm convinced by it.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-02-01 Thread Wei Wang

On 01/31/2018 07:44 AM, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 11:31:19AM +0800, Wei Wang wrote:

On 01/26/2018 10:42 AM, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:

On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:

On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:


The controversy is that the free list is not static
once the lock is dropped, so everything is dynamically changing, including
the state that was recorded. The method we are using is more prudent, IMHO.
How about taking the fundamental solution, and seek to improve incrementally
in the future?


Best,
Wei

I'd like to see kicks happen outside the spinlock. kick with a spinlock
taken looks like a scalability issue that won't be easy to
reproduce but hurt workloads at random unexpected times.


Is that "kick inside the spinlock" the only concern you have? I think we can
remove the kick actually. If we check how the host side works, it is
worthwhile to let the host poll the virtqueue after it receives the cmd id
from the guest (kick for cmd id isn't within the lock).


Best,
Wei

So really there are different ways to put free page hints to use.

The current interface requires host to do dirty tracking
for all memory, and it's more or less useless for
things like freeing host memory.

So while your project's needs seem to be addressed, I'm
still a bit disappointed that so little collaboration
happened with e.g. Nitesh's project, to the point where
you don't even CC him on patches.


Isn't "ni...@redhat.com" Nitesh? Actually it's been cc-ed long time ago.

I think we should at least see the performance numbers and a working 
prototype from them (I remember they lack the host side implementation).


Btw, this feature is requested by many customers of Linux (not our own 
project's need). They want to use this feature to optimize their *live 
migration*. Hope the community could understand our need.




So I'm kind of trying to bridge this a bit - I would
like the interfaces that we build to at least superficially
look like they might be reusable for other uses of hinting.

Imagine that you don't have dirty tracking on the host.
What would it take to still use hinting information,
e.g. to call MADV_FREE on the pages guest gives us?

I think you need to kick and you need to wait for
host to consume the hint before page is reused.
And we know madvise takes a lot of time sometimes,
so locking out the free list does not sound like a
good idea.

That's why I was talking about kick out of lock,
so that eventually we can reuse that for hinting
and actually wait for an interrupt.

So how about we take a bunch of pages out of the free list, move them to
the balloon, kick (and optionally wait for host to consume), them move
them back? Preferably to end of the list? This will also make things
like sorting them much easier as you can just put them in a binary tree
or something.

For when we need to be careful to make sure we don't
create an OOM situation with this out of thin air,
and for when you can't give everything to host in one go,
you might want some kind of notifier that tells you
that you need to return pages to the free list ASAP.

How'd this sound?



I think the above is a duplicate function of ballooning, though there 
are some differences. Please see below my concerns and different thoughts:


1) From the previous discussion, the only acceptable method to get pages 
from mm is to do alloc() (btw, we are not getting pages in this patch, 
we are getting hints). The above sounds like we are going to take pages 
from the free list without mm's awareness. I'm not sure if you would be 
ready to convince the mm folks that this idea is allowed.


2) If the guest has 8G free memory, how much can virtio-balloon take 
with the above method? For example, if virtio-balloon only takes 1G, 
with 7G left in mm. The next moment, it is possible that something comes 
out and needs to use 7.5GB. I think it is barely possible to ensure that 
the amount of memory we take to virtio-balloon won't affect the system.


3) Hints means the pages are quite likely to be free pages (no 
guarantee). If the pages given to host are going to be freed, then we 
really couldn't call them hints, they are true free pages. Ballooning 
needs true free pages, while live migration needs hints, would you agree 
with this? From the perspective of features, they are two different 
features, and should be gated with two feature bits and separated 
implementations. Mixing them would cause many unexpected issues (e.g. 
the case when the two features function at the same time)


4) If we want to add another function of ballooning, how is this better 
than the existing ballooning? The difference I can see is the current 
ballooning takes free pages via alloc(), while the above hacks into the 
free page list.



Best,
Wei
___
Virtualization mailing list

Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-30 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 11:31:19AM +0800, Wei Wang wrote:
> On 01/26/2018 10:42 AM, Michael S. Tsirkin wrote:
> > On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:
> > > On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:
> > > > 
> 
> > > The controversy is that the free list is not static
> > > once the lock is dropped, so everything is dynamically changing, including
> > > the state that was recorded. The method we are using is more prudent, 
> > > IMHO.
> > > How about taking the fundamental solution, and seek to improve 
> > > incrementally
> > > in the future?
> > > 
> > > 
> > > Best,
> > > Wei
> > I'd like to see kicks happen outside the spinlock. kick with a spinlock
> > taken looks like a scalability issue that won't be easy to
> > reproduce but hurt workloads at random unexpected times.
> > 
> 
> Is that "kick inside the spinlock" the only concern you have? I think we can
> remove the kick actually. If we check how the host side works, it is
> worthwhile to let the host poll the virtqueue after it receives the cmd id
> from the guest (kick for cmd id isn't within the lock).
> 
> 
> Best,
> Wei

So really there are different ways to put free page hints to use.

The current interface requires host to do dirty tracking
for all memory, and it's more or less useless for
things like freeing host memory.

So while your project's needs seem to be addressed, I'm
still a bit disappointed that so little collaboration
happened with e.g. Nitesh's project, to the point where
you don't even CC him on patches.

So I'm kind of trying to bridge this a bit - I would
like the interfaces that we build to at least superficially
look like they might be reusable for other uses of hinting.

Imagine that you don't have dirty tracking on the host.
What would it take to still use hinting information,
e.g. to call MADV_FREE on the pages guest gives us?

I think you need to kick and you need to wait for
host to consume the hint before page is reused.
And we know madvise takes a lot of time sometimes,
so locking out the free list does not sound like a
good idea.

That's why I was talking about kick out of lock,
so that eventually we can reuse that for hinting
and actually wait for an interrupt.

So how about we take a bunch of pages out of the free list, move them to
the balloon, kick (and optionally wait for host to consume), them move
them back? Preferably to end of the list? This will also make things
like sorting them much easier as you can just put them in a binary tree
or something.

For when we need to be careful to make sure we don't
create an OOM situation with this out of thin air,
and for when you can't give everything to host in one go,
you might want some kind of notifier that tells you
that you need to return pages to the free list ASAP.

How'd this sound?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-26 Thread Tetsuo Handa
On 2018/01/26 12:31, Wei Wang wrote:
> On 01/26/2018 10:42 AM, Michael S. Tsirkin wrote:
>> On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:
>>> On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:
 On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:

> 
>>> The controversy is that the free list is not static
>>> once the lock is dropped, so everything is dynamically changing, including
>>> the state that was recorded. The method we are using is more prudent, IMHO.
>>> How about taking the fundamental solution, and seek to improve incrementally
>>> in the future?
>>>
>>>
>>> Best,
>>> Wei
>> I'd like to see kicks happen outside the spinlock. kick with a spinlock
>> taken looks like a scalability issue that won't be easy to
>> reproduce but hurt workloads at random unexpected times.
>>
> 
> Is that "kick inside the spinlock" the only concern you have? I think we can 
> remove the kick actually. If we check how the host side works, it is 
> worthwhile to let the host poll the virtqueue after it receives the cmd id 
> from the guest (kick for cmd id isn't within the lock).

We should start from the worst case.

+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.

Making decision based on performance numbers of idle guests is dangerous.
There might be busy CPUs waiting for zone->lock.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Wei Wang

On 01/26/2018 10:42 AM, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:

On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:

On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:




The controversy is that the free list is not static
once the lock is dropped, so everything is dynamically changing, including
the state that was recorded. The method we are using is more prudent, IMHO.
How about taking the fundamental solution, and seek to improve incrementally
in the future?


Best,
Wei

I'd like to see kicks happen outside the spinlock. kick with a spinlock
taken looks like a scalability issue that won't be easy to
reproduce but hurt workloads at random unexpected times.



Is that "kick inside the spinlock" the only concern you have? I think we 
can remove the kick actually. If we check how the host side works, it is 
worthwhile to let the host poll the virtqueue after it receives the cmd 
id from the guest (kick for cmd id isn't within the lock).



Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:
> On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:
> > > +
> > > +static void report_free_page_func(struct work_struct *work)
> > > +{
> > > + struct virtio_balloon *vb;
> > > + int ret;
> > > +
> > > + vb = container_of(work, struct virtio_balloon, report_free_page_work);
> > > +
> > > + /* Start by sending the received cmd id to host with an outbuf */
> > > + ret = send_cmd_id(vb, vb->cmd_id_received);
> > > + if (unlikely(ret))
> > > + goto err;
> > > +
> > > + ret = walk_free_mem_block(vb, 0, _balloon_send_free_pages);
> > > + if (unlikely(ret < 0))
> > > + goto err;
> > > +
> > > + /* End by sending a stop id to host with an outbuf */
> > > + ret = send_cmd_id(vb, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> > > + if (likely(!ret))
> > > + return;
> > > +err:
> > > + dev_err(>vdev->dev, "%s failure: free page vq is broken\n",
> > > + __func__);
> > > +}
> > > +
> > So that's very simple, but it only works well if the whole
> > free list fits in the queue or host processes the queue faster
> > than the guest. What if it doesn't?
> 
> This is the case that the virtqueue gets full, and I think we've agreed that
> this is an optimization feature and losing some hints to report isn't
> important, right?
> 
> Actually, in the tests, there is no chance to see the ring is full. If we
> check the host patches that were shared before, the device side operation is
> quite simple, it just clears the related bits from the bitmap, and then
> continues to take entries from the virtqueue till the virtqueue gets empty.
> 
> 
> > If we had restartability you could just drop the lock
> > and wait for a vq interrupt to make more progress, which
> > would be better I think.
> > 
> 
> Restartability means that caller needs to record the state where it was when
> it stopped last time.

See my comment on the mm patch: if you rotate the previously reported
pages towards the end, then you mostly get restartability for free,
if only per zone.
The only thing remaining will be stopping at a page you already reported.

There aren't many zones so restartability wrt zones is kind of
trivial.

> The controversy is that the free list is not static
> once the lock is dropped, so everything is dynamically changing, including
> the state that was recorded. The method we are using is more prudent, IMHO.
> How about taking the fundamental solution, and seek to improve incrementally
> in the future?
> 
> 
> Best,
> Wei

I'd like to see kicks happen outside the spinlock. kick with a spinlock
taken looks like a scalability issue that won't be easy to
reproduce but hurt workloads at random unexpected times.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Michael S. Tsirkin
On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free pages by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host may also requests
> the guest to stop the reporting in advance by sending the stop cmd id to
> the guest via the configuration register.
> 
> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> Signed-off-by: Liang Li <liang.z...@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Michal Hocko <mho...@kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 251 
> ++--
>  include/uapi/linux/virtio_balloon.h |   7 +
>  2 files changed, 222 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..114985b 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +enum virtio_balloon_vq {
> + VIRTIO_BALLOON_VQ_INFLATE,
> + VIRTIO_BALLOON_VQ_DEFLATE,
> + VIRTIO_BALLOON_VQ_STATS,
> + VIRTIO_BALLOON_VQ_FREE_PAGE,
> + VIRTIO_BALLOON_VQ_MAX
> +};
> +
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +76,11 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* The new cmd id received from host */
> + uint32_t cmd_id_received;
> + /* The cmd id that is in use */
> + __virtio32 cmd_id_use;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon 
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> - unsigned long flags;
> -
> - spin_lock_irqsave(>stop_update_lock, flags);
> - if (!vb->stop_update)
> - queue_work(system_freezable_wq, >update_balloon_size_work);
> - spin_unlock_irqrestore(>stop_update_lock, flags);
> -}
> -
>  static inline s64 towards_target(struct virtio_balloon *vb)
>  {
>   s64 target;
> @@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon 
> *vb)
>   return target - vb->num_pages;
>  }
>  
> +static void virtballoon_changed(struct virtio_device *vdev)
> +{
> + struct virtio_balloon *vb = vdev->priv;
> + unsigned long flags;
> + s64 diff = towards_target(vb);
> +
> + if (diff) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(system_freezable_wq,
> +>update_balloon_size_work);
> + spin_unlock_irqrestore(>stop_update_lock, flags);
> + }
> +
> + if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_cread(vdev, struct virtio_balloon_config,
> +  free_page_report_cmd_id, >cmd_id_received);
> + if (vb->cmd_id_received !=
> + VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
> + spin_lock_irqsave(>stop_update_lock, flags);
> + if (!vb->stop_update)
> + queue_work(vb->balloon_wq,
> +>report_free_page_work);
> + spin_unlock_irqrestore(>stop_update_lock, flags);
> + }
> + }
> +}
> +
>  static void update_balloon_size(s

Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Wei Wang

On 01/25/2018 07:28 PM, Tetsuo Handa wrote:

On 2018/01/25 12:32, Wei Wang wrote:

On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:

On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
+
+static void report_free_page_func(struct work_struct *work)
+{
+struct virtio_balloon *vb;
+unsigned long flags;
+
+vb = container_of(work, struct virtio_balloon, report_free_page_work);
+
+/* Start by sending the obtained cmd id to the host with an outbuf */
+send_cmd_id(vb, >start_cmd_id);
+
+/*
+ * Set start_cmd_id to VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID to
+ * indicate a new request can be queued.
+ */
+spin_lock_irqsave(>stop_update_lock, flags);
+vb->start_cmd_id = cpu_to_virtio32(vb->vdev,
+VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+spin_unlock_irqrestore(>stop_update_lock, flags);
+
+walk_free_mem_block(vb, 0, _balloon_send_free_pages);
Can you teach walk_free_mem_block to return the && of all
return calls, so caller knows whether it completed?

There will be two cases that can cause walk_free_mem_block to return without 
completing:
1) host requests to stop in advance
2) vq->broken

How about letting walk_free_mem_block simply return the value returned by its 
callback (i.e. virtio_balloon_send_free_pages)?

For host requests to stop, it returns "1", and the above only bails out when 
walk_free_mem_block return a "< 0" value.

I feel that virtio_balloon_send_free_pages is doing too heavy things.

It can be called for many times with IRQ disabled. Number of times
it is called depends on amount of free pages (and fragmentation state).
Generally, more free pages, more calls.

Then, why don't you allocate some pages for holding all pfn values
and then call walk_free_mem_block() only for storing pfn values
and then send pfn values without disabling IRQ?


We have actually tried many methods for this feature before, and what 
you suggested is one of them, and you could also find the related 
discussion in earlier versions. In addition to the complexity of that 
method (if thinking deeper along that line), I can share the performance 
(the live migration time) comparison of that method with this one in 
this patch: ~405ms vs. ~260 ms.


The things that you worried about have also been discussed actually. The 
strategy is that we start with something fundamental and increase 
incrementally (if you check earlier versions, we also have a method 
which makes the lock finer granularity, but we decided to leave this to 
the future improvement for prudence purpose). If possible, please let 
Michael review this patch, he already knows all those things. We will 
finish this feature as soon as possible, and then discuss with you about 
another one if you want. Thanks.


Best,
Wei



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Tetsuo Handa
On 2018/01/25 12:32, Wei Wang wrote:
> On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:
>> On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
>> +
>> +static void report_free_page_func(struct work_struct *work)
>> +{
>> +struct virtio_balloon *vb;
>> +unsigned long flags;
>> +
>> +vb = container_of(work, struct virtio_balloon, report_free_page_work);
>> +
>> +/* Start by sending the obtained cmd id to the host with an outbuf */
>> +send_cmd_id(vb, >start_cmd_id);
>> +
>> +/*
>> + * Set start_cmd_id to VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID to
>> + * indicate a new request can be queued.
>> + */
>> +spin_lock_irqsave(>stop_update_lock, flags);
>> +vb->start_cmd_id = cpu_to_virtio32(vb->vdev,
>> +VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
>> +spin_unlock_irqrestore(>stop_update_lock, flags);
>> +
>> +walk_free_mem_block(vb, 0, _balloon_send_free_pages);
>> Can you teach walk_free_mem_block to return the && of all
>> return calls, so caller knows whether it completed?
> 
> There will be two cases that can cause walk_free_mem_block to return without 
> completing:
> 1) host requests to stop in advance
> 2) vq->broken
> 
> How about letting walk_free_mem_block simply return the value returned by its 
> callback (i.e. virtio_balloon_send_free_pages)?
> 
> For host requests to stop, it returns "1", and the above only bails out when 
> walk_free_mem_block return a "< 0" value.

I feel that virtio_balloon_send_free_pages is doing too heavy things.

It can be called for many times with IRQ disabled. Number of times
it is called depends on amount of free pages (and fragmentation state).
Generally, more free pages, more calls.

Then, why don't you allocate some pages for holding all pfn values
and then call walk_free_mem_block() only for storing pfn values
and then send pfn values without disabling IRQ?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Wei Wang

On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:

On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
  


What is this doing? Basically handling the case where vq is broken?
It's kind of ugly to tweak feature bits, most code assumes they never
change.  Please just return an error to caller instead and handle it
there.

You can then avoid sprinking the check for the feature bit
all over the code.



One thing I don't like about this one is that the previous request
will still try to run to completion.

And it all seems pretty complex.

How about:
- pass cmd id to a queued work
- queued work gets that cmd id, stores a copy and uses that,
   re-checking periodically - stop if cmd id changes:
   will replace  report_free_page too since that's set to
   stop.

This means you do not reuse the queued cmd id also
for the buffer - which is probably for the best.


Thanks for the suggestion. Please have a check how it's implemented in v25.
Just a little reminder that work queue has internally ensured that there 
is no re-entrant of the same queued function.


Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-25 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free pages by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host may also requests
the guest to stop the reporting in advance by sending the stop cmd id to
the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 251 ++--
 include/uapi/linux/virtio_balloon.h |   7 +
 2 files changed, 222 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..114985b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+enum virtio_balloon_vq {
+   VIRTIO_BALLOON_VQ_INFLATE,
+   VIRTIO_BALLOON_VQ_DEFLATE,
+   VIRTIO_BALLOON_VQ_STATS,
+   VIRTIO_BALLOON_VQ_FREE_PAGE,
+   VIRTIO_BALLOON_VQ_MAX
+};
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,11 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* The new cmd id received from host */
+   uint32_t cmd_id_received;
+   /* The cmd id that is in use */
+   __virtio32 cmd_id_use;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -316,17 +334,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_size_work);
-   spin_unlock_irqrestore(>stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -343,6 +350,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+   unsigned long flags;
+   s64 diff = towards_target(vb);
+
+   if (diff) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(system_freezable_wq,
+  >update_balloon_size_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+
+   if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+   virtio_cread(vdev, struct virtio_balloon_config,
+free_page_report_cmd_id, >cmd_id_received);
+   if (vb->cmd_id_received !=
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+   spin_lock_irqsave(>stop_update_lock, flags);
+   if (!vb->stop_update)
+   queue_work(vb->balloon_wq,
+  >report_free_page_work);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+   }
+   }
+}
+
 static void update_balloon_size(struct virtio_balloon *vb)
 {
u32 actual = vb->num_pages;
@@ -417,42 +452,151 @@ static void update_balloon_size_func(struct work_struct 
*work)
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-   struct virtqueue *vqs[3];
-   vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request 
};
-   static const char * const names[] = { "inflate", "deflate", "stats" };
-   int err, nvqs;
+   struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+   vq_callback_t *callbacks

Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-24 Thread Wei Wang

On 01/25/2018 01:15 AM, Michael S. Tsirkin wrote:

On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
+
+static void report_free_page_func(struct work_struct *work)
+{
+   struct virtio_balloon *vb;
+   unsigned long flags;
+
+   vb = container_of(work, struct virtio_balloon, report_free_page_work);
+
+   /* Start by sending the obtained cmd id to the host with an outbuf */
+   send_cmd_id(vb, >start_cmd_id);
+
+   /*
+* Set start_cmd_id to VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID to
+* indicate a new request can be queued.
+*/
+   spin_lock_irqsave(>stop_update_lock, flags);
+   vb->start_cmd_id = cpu_to_virtio32(vb->vdev,
+   VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+   spin_unlock_irqrestore(>stop_update_lock, flags);
+
+   walk_free_mem_block(vb, 0, _balloon_send_free_pages);
Can you teach walk_free_mem_block to return the && of all
return calls, so caller knows whether it completed?


There will be two cases that can cause walk_free_mem_block to return 
without completing:

1) host requests to stop in advance
2) vq->broken

How about letting walk_free_mem_block simply return the value returned 
by its callback (i.e. virtio_balloon_send_free_pages)?


For host requests to stop, it returns "1", and the above only bails out 
when walk_free_mem_block return a "< 0" value.


Best,
Wei
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-24 Thread Michael S. Tsirkin
On Wed, Jan 24, 2018 at 06:42:42PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free pages by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host may also requests
> the guest to stop the reporting in advance by sending the stop cmd id to
> the guest via the configuration register.
> 
> Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> Signed-off-by: Liang Li <liang.z...@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Michal Hocko <mho...@kernel.org>
> ---
>  drivers/virtio/virtio_balloon.c | 265 
> +++-
>  include/uapi/linux/virtio_balloon.h |   7 +
>  2 files changed, 236 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..4440873 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,9 +51,21 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>  
> +/* The number of virtqueues supported by virtio-balloon */
> +#define VIRTIO_BALLOON_VQ_NUM4
> +#define VIRTIO_BALLOON_VQ_ID_INFLATE 0
> +#define VIRTIO_BALLOON_VQ_ID_DEFLATE 1
> +#define VIRTIO_BALLOON_VQ_ID_STATS   2
> +#define VIRTIO_BALLOON_VQ_ID_FREE_PAGE   3
> +

Please do an enum instead of defines. VQ_ID can be just VQ
(it's not an ID, it's just the number).


>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +75,13 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* Start to report free pages */
> + bool report_free_page;
> + /* Stores the cmd id given by host to start the free page reporting */
> + __virtio32 start_cmd_id;
> + /* Stores STOP_ID as a sign to tell host that the reporting is done */
> + __virtio32 stop_cmd_id;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -281,6 +300,53 @@ static unsigned int update_balloon_stats(struct 
> virtio_balloon *vb)
>   return idx;
>  }
>  
> +static int add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len)
> +{
> + struct scatterlist sg;
> + unsigned int unused;
> + int ret = 0;
> +
> + sg_init_table(, 1);
> + sg_set_page(, pfn_to_page(pfn), len, 0);
> +
> + /* Detach all the used buffers from the vq */
> + while (virtqueue_get_buf(vq, ))
> + ;
> +
> + /*
> +  * Since this is an optimization feature, losing a couple of free
> +  * pages to report isn't important. We simply return without adding
> +  * the page if the vq is full.
> +  * We are adding one entry each time, which essentially results in no
> +  * memory allocation, so the GFP_KERNEL flag below can be ignored.
> +  * There is always one entry reserved for the cmd id to use.
> +  */
> + if (vq->num_free > 1)
> + ret = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> +
> + if (vq->num_free < virtqueue_get_vring_size(vq) / 2)
> + virtqueue_kick(vq);
> +
> + return ret;
> +}
> +
> +static void send_cmd_id(struct virtio_balloon *vb, __virtio32 *cmd_id)
> +{
> + struct scatterlist sg;
> + struct virtqueue *vq = vb->free_page_vq;
> +
> + if (unlikely(!virtio_has_feature(vb->vdev,
> +  VIRTIO_BALLOON_F_FREE_PAGE_HINT)))
> + return;
> +
> + sg_init_one(, cmd_id, sizeof(*cmd_id));
> +
> + if (virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL))
> + __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_FREE_PA

[PATCH v24 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-24 Thread Wei Wang
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free pages by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host may also requests
the guest to stop the reporting in advance by sending the stop cmd id to
the guest via the configuration register.

Signed-off-by: Wei Wang <wei.w.w...@intel.com>
Signed-off-by: Liang Li <liang.z...@intel.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Michal Hocko <mho...@kernel.org>
---
 drivers/virtio/virtio_balloon.c | 265 +++-
 include/uapi/linux/virtio_balloon.h |   7 +
 2 files changed, 236 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index a1fb52c..4440873 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,21 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+/* The number of virtqueues supported by virtio-balloon */
+#define VIRTIO_BALLOON_VQ_NUM  4
+#define VIRTIO_BALLOON_VQ_ID_INFLATE   0
+#define VIRTIO_BALLOON_VQ_ID_DEFLATE   1
+#define VIRTIO_BALLOON_VQ_ID_STATS 2
+#define VIRTIO_BALLOON_VQ_ID_FREE_PAGE 3
+
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +75,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* Start to report free pages */
+   bool report_free_page;
+   /* Stores the cmd id given by host to start the free page reporting */
+   __virtio32 start_cmd_id;
+   /* Stores STOP_ID as a sign to tell host that the reporting is done */
+   __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -281,6 +300,53 @@ static unsigned int update_balloon_stats(struct 
virtio_balloon *vb)
return idx;
 }
 
+static int add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len)
+{
+   struct scatterlist sg;
+   unsigned int unused;
+   int ret = 0;
+
+   sg_init_table(, 1);
+   sg_set_page(, pfn_to_page(pfn), len, 0);
+
+   /* Detach all the used buffers from the vq */
+   while (virtqueue_get_buf(vq, ))
+   ;
+
+   /*
+* Since this is an optimization feature, losing a couple of free
+* pages to report isn't important. We simply return without adding
+* the page if the vq is full.
+* We are adding one entry each time, which essentially results in no
+* memory allocation, so the GFP_KERNEL flag below can be ignored.
+* There is always one entry reserved for the cmd id to use.
+*/
+   if (vq->num_free > 1)
+   ret = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
+
+   if (vq->num_free < virtqueue_get_vring_size(vq) / 2)
+   virtqueue_kick(vq);
+
+   return ret;
+}
+
+static void send_cmd_id(struct virtio_balloon *vb, __virtio32 *cmd_id)
+{
+   struct scatterlist sg;
+   struct virtqueue *vq = vb->free_page_vq;
+
+   if (unlikely(!virtio_has_feature(vb->vdev,
+    VIRTIO_BALLOON_F_FREE_PAGE_HINT)))
+   return;
+
+   sg_init_one(, cmd_id, sizeof(*cmd_id));
+
+   if (virtqueue_add_outbuf(vq, , 1, vb, GFP_KERNEL))
+   __virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+
+   virtqueue_kick(vq);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the 
hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -316,17 +382,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, >update_balloon_