Re: [PATCH v6 1/2] Create debugfs file with virtio balloon usage information

2022-08-09 Thread Michael S. Tsirkin
On Tue, Jul 26, 2022 at 02:08:27PM +, Alexander Atanasov wrote:
> Allow the guest to know how much it is ballooned by the host
> and how that memory is accounted.
> 
> It is useful when debugging out of memory conditions,
> as well for userspace processes that monitor the memory pressure
> and for nested virtualization.

Hmm. debugging is ok. monitoring/nested use-cases probably
call for sysfs/procfs.


> Depending on the deflate on oom flag, memory is accounted in two ways.
> If the flag is set, the inflated pages are accounted as used memory.
> If the flag is not set, the inflated pages are subtracted from totalram.
> To make clear how inflated pages are accounted, the value is sign-prefixed:
> negative means subtracted from totalram, positive means
> they are accounted as used.
> 
> Signed-off-by: Alexander Atanasov 
> ---
>  drivers/virtio/virtio_balloon.c | 59 +
>  1 file changed, 59 insertions(+)
> 
> V2:
>   - fixed coding style
>   - removed pretty print
> V3:
>   - removed duplicate of features
>   - made the comment about ballooned_pages clearer
>   - convert host pages to balloon pages
> V4:
>   - added a define for BALLOON_PAGE_SIZE to make things clear
> V5:
>   - Made the calculations work properly for both ways of memory accounting
> with or without deflate_on_oom
>   - dropped comment
> V6:
>   - reduced the file to minimum
>   - unify accounting
> 
> I didn't understand whether you agree to change the accounting to be the same,
> so the following part 2 is about it.
> 
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f4c34a2a6b8e..97d3b29cb9f1 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/debugfs.h>
>  #include 
>  #include 
>  #include 
> @@ -731,6 +732,59 @@ static void report_free_page_func(struct work_struct *work)
>   }
>  }
>  
> +/*
> + * DEBUGFS Interface
> + */
> +#ifdef CONFIG_DEBUG_FS
> +
> +/**
> + * virtio_balloon_debug_show - shows statistics of balloon operations.
> + * @f: pointer to the &struct seq_file.
> + * @offset: ignored.
> + *
> + * Provides the statistics that can be accessed in virtio-balloon in the debugfs.
> + *
> + * Return: zero on success or an error code.
> + */
> +
> +static int virtio_balloon_debug_show(struct seq_file *f, void *offset)
> +{
> + struct virtio_balloon *vb = f->private;
> + s64 num_pages = vb->num_pages << (VIRTIO_BALLOON_PFN_SHIFT - 10);
> +
> + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> + num_pages = -num_pages;
> +
> + seq_printf(f, "inflated: %lld kB\n", num_pages);
> +
> + return 0;
> +}

Can't we just have two attributes, without hacks like this one?

> +
> +DEFINE_SHOW_ATTRIBUTE(virtio_balloon_debug);
> +
> +static void virtio_balloon_debugfs_init(struct virtio_balloon *b)
> +{
> + debugfs_create_file("virtio-balloon", 0444, NULL, b,
> + &virtio_balloon_debug_fops);
> +}
> +
> +static void virtio_balloon_debugfs_exit(struct virtio_balloon *b)
> +{
> + debugfs_remove(debugfs_lookup("virtio-balloon", NULL));
> +}
> +
> +#else
> +
> +static inline void virtio_balloon_debugfs_init(struct virtio_balloon *b)
> +{
> +}
> +
> +static inline void virtio_balloon_debugfs_exit(struct virtio_balloon *b)
> +{
> +}
> +
> +#endif   /* CONFIG_DEBUG_FS */
> +
>  #ifdef CONFIG_BALLOON_COMPACTION
>  /*
>   * virtballoon_migratepage - perform the balloon page migration on behalf of
> @@ -1019,6 +1073,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  
>   if (towards_target(vb))
>   virtballoon_changed(vdev);
> +
> + virtio_balloon_debugfs_init(vb);
> +
>   return 0;
>  
>  out_unregister_oom:
> @@ -1065,6 +1122,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  {
>   struct virtio_balloon *vb = vdev->priv;
>  
> + virtio_balloon_debugfs_exit(vb);
> +
>   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
> + page_reporting_unregister(&vb->pr_dev_info);
>   if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -- 
> 2.25.1
> 
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] how the ballooned memory should be accounted by the drivers inside the guests? (was:[PATCH v6 1/2] Create debugfs file with virtio balloon usage information)

2022-08-09 Thread David Hildenbrand
On 09.08.22 11:36, Alexander Atanasov wrote:
> Hello,
> 
> On 2.08.22 16:48, David Hildenbrand wrote:

 In case of Hyper-V I remember a customer BUG report that requested that
 exact behavior, however, I'm not able to locate the BZ quickly.
 [1] 
 https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
 (note that I can't easily find the original mail in the archives)
>>>
>>> VMWare does not, Xen do, HV do (but it didn't) - Virtio does both.
>>>
>>> For me the confusion comes from mixing ballooning and hot plug.
>>
>> For example, QEMU (and even libvirt) doesn't even have built in support
>> for any kind of automatic balloon resizing on guest memory pressure (and
>> I'm happy that we don't implement any such heuristics). As a user/admin,
>> all you can do is manually adjust the logical VM size by requesting to
>> inflate/deflate the balloon. For virtio-balloon, we cannot derive what
>> the hypervisor/admin might or might not do -- and whether the admin
>> intends to use memory ballooning for memory hotunplug or for optimizing
>> memory overcommit.
> 
> Is the lack of proper hotplug/unplug leading the admins to use it like 
> this?

Yes. Especially unplug is tricky and hard to get working reliably in
practice because of unmovable pages. Ballooning is an easy hack to get
unplug somewhat working.

For reference: under Windows ballooning was (and maybe still mostly is)
the only way to unplug memory again. Unplug of DIMMs is not supported.

> I really don't understand why you don't like automatic resizing - 
> both Hyper-V and VMWare do it?

You need heuristics to guess what the guest might be doing next and
deflate fast enough to avoid any kind of OOMs in the guest. I pretty
much dislike that concept, because it just screams to be fragile.

In short: I don't like ballooning, to me it's a technology from ancient
times before we were able to do any better. In comparison, I do like
free page reporting for optimizing memory overcommit, but it still has
some drawbacks (caches consuming too much memory), that people are
working on to improve. So we're stuck with ballooning for now for some
use cases.

Personally, I consider any balloon extensions (inflate/deflate, not
things like free page reporting) a waste of time and effort, but that's
just my humble opinion. So I keep reviewing and maintaining them ;)

> 
>> As another example, HV dynamic memory actually combines memory hotplug
>> with memory ballooning: use memory hotplug to add more memory on demand
>> and use memory ballooning to logically unplug memory again.
> 
> Have both as options - min/max memory, percentage free to keep as 
> minimum, fixed size and have hot add - a kind of hot plug to go above 
> the initial max - and try to manage it in a dynamic way.
> 
>> The VMWare balloon is a bit special, because it actually usually
>> implements deflate-on-oom semantics in the hypervisor. IIRC, the
>> hypervisor will actually adjust the balloon size based on guest memory
>> stats, and there isn't really an interface to manually set the balloon
>> size for an admin. But I might be wrong regarding the latter.
> 
> For me this is what makes sense - you have a limited amount of
> physical RAM that you want to be used optimally by the guests.
> Waiting for the admin to adjust the balloon would not give very
> optimal results.

Exactly. For the use case of optimizing memory overcommit in the
hypervisor, you want deflate-on-oom semantics if you go with balloon
inflation/deflation.

> 
>>
>>>
>>> Ballooning is like a heap inside the guest from which the host can
>>> allocate/deallocate pages, if there is a mechanism for the guest to ask
>>> the host for more/to free/ pages or the host have a heuristic to monitor
>>> the guest and inflate/deflate the guest it is a matter of implementation.
>>
>> Please don't assume that the use case for memory ballooning is only
>> optimizing memory overcommit in the hypervisor under memory pressure.
> 
> I understood that - currently it is also used as a way to do 
> hotplug/unplug. The question is whether that is the right way to use it.

People use it like that, and we have no control over that. I've heard of
people using hotplug of DIMMs to increase VM memory and balloon
inflation to hotunplug memory, to decrease VM memory.

> 
>>>
>>> Hot plug is adding to MemTotal and it is not a random event either in
>>> real or virtual environments - so you can act upon it. MemTotal goes
>>> down on hot unplug and if pages get marked as faulty RAM.
>>
>> "not a random event either" -- sure, with ppc dlpar, xen balloon, hv
>> balloon or virtio-mem ... which all are able to hotplug memory fairly
>> randomly based on hypervisor decisions.
>>
>> In physical environments, it's not really a random event, I agree.
> 
> Even if it is not manual - if they do hotplug it is ok - you can track 
> hotplug events. But you can not track balloon events.

I was already asking myself in the past if there should be 

Re: [RFC] how the ballooned memory should be accounted by the drivers inside the guests? (was:[PATCH v6 1/2] Create debugfs file with virtio balloon usage information)

2022-08-02 Thread David Hildenbrand
>>
>> In case of Hyper-V I remember a customer BUG report that requested that
>> exact behavior, however, I'm not able to locate the BZ quickly.
>> [1] 
>> https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
>> (note that I can't easily find the original mail in the archives)
> 
> VMWare does not, Xen do, HV do (but it didn't) - Virtio does both.
> 
> For me the confusion comes from mixing ballooning and hot plug.

For example, QEMU (and even libvirt) doesn't even have built in support
for any kind of automatic balloon resizing on guest memory pressure (and
I'm happy that we don't implement any such heuristics). As a user/admin,
all you can do is manually adjust the logical VM size by requesting to
inflate/deflate the balloon. For virtio-balloon, we cannot derive what
the hypervisor/admin might or might not do -- and whether the admin
intends to use memory ballooning for memory hotunplug or for optimizing
memory overcommit.

As another example, HV dynamic memory actually combines memory hotplug
with memory ballooning: use memory hotplug to add more memory on demand
and use memory ballooning to logically unplug memory again.

The VMWare balloon is a bit special, because it actually usually
implements deflate-on-oom semantics in the hypervisor. IIRC, the
hypervisor will actually adjust the balloon size based on guest memory
stats, and there isn't really an interface to manually set the balloon
size for an admin. But I might be wrong regarding the latter.

> 
> Ballooning is like a heap inside the guest from which the host can 
> allocate/deallocate pages, if there is a mechanism for the guest to ask 
> the host for more/to free/ pages or the host have a heuristic to monitor 
> the guest and inflate/deflate the guest it is a matter of implementation.

Please don't assume that the use case for memory ballooning is only
optimizing memory overcommit in the hypervisor under memory pressure.

> 
> Hot plug is adding to MemTotal and it is not a random event either in 
> real or virtual environments - so you can act upon it. MemTotal goes 
> down on hot unplug and if pages get marked as faulty RAM.

"not a random event either" -- sure, with ppc dlpar, xen balloon, hv
balloon or virtio-mem ... which all are able to hotplug memory fairly
randomly based on hypervisor decisions.

In physical environments, it's not really a random event, I agree.

> 
> Historically MemTotal is a stable value (I agree with most of David 
> Stevens' points) and user space expects it to be stable, 
> initialized at startup, and not to change.

Just like some apps are not prepared for memory hot(un)plug. Some apps
just don't work in environments with variable physical memory sizes:
examples include databases, where memory ballooning might essentially be
completely useless (there is a paper about application-aware memory
ballooning for that exact use case).

> 
> Used is what changes and that is what user space expects to change.
> 
> Deflate on oom might have been a mistake but it is there, and if anything 
> depends on changing MemTotal it will be broken by that option. How 
> can that be fixed?

I didn't quite get your concern here. Deflate-on-oom in virtio-balloon
won't adjust MemTotal, so under which condition would be something broken?

> 
> I agree that the host can not reclaim what is marked as used  but should 
> it be able to ? May be it will be good to teach oom killer that there 
> can be such ram that can not be reclaimed.
> 
>> Note: I suggested under [1] to expose inflated pages via /proc/meminfo
>> directly. We could do that consistently over all balloon drivers ...
>> doesn't sound too crazy.
> 
> Initially I wanted to do exactly this BUT:
> - some drivers prefer to expose some more internal information in the file.

They always can have an extended debugfs interface in addition.

> - a lot of user space is using meminfo so better keep it as is to avoid 
> breaking something, ballooning is not very frequently used.

We can always extend. Just recently, we exposed Zswap data:

commit f6498b776d280b30a4614d8261840961e993c2c8
Author: Johannes Weiner 
Date:   Thu May 19 14:08:53 2022 -0700

mm: zswap: add basic meminfo and vmstat coverage


Exposing information about inflated pages in a generic way doesn't sound
completely wrong to me, but there might be people that object.

> 
> 
> Please, share your view on how the ballooned memory should be accounted by 
> the drivers inside the guests so we can work towards consistent behaviour:
> 
> Should the inflated memory be accounted as Used or MemTotal be adjusted?

I hope I was able to make it clear that it completely depends on how
memory ballooning is actually intended to be used. It's not uncommon to
use it as form of fake memory hotunplug, where that memory is actually
gone for good and won't simply come back when under memory pressure.

> 
> Should the inflated memory be added to /proc/meminfo ?

My gut feeling is 

Re: [PATCH v6 1/2] Create debugfs file with virtio balloon usage information

2022-08-01 Thread David Hildenbrand
>>> +
>>> +   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>>> +   num_pages = -num_pages;
>> With VIRTIO_BALLOON_F_DEFLATE_ON_OOM this will now always report "0".
>>
>> which would be the same as "num_pages = 0;" and would deserve a comment
>> explaining why we don't indicate that as "inflated: ".
>>
>> Thanks.
> 
> Except if "now" refers to some commit that I am missing, it does not report 0.

/me facepalm

I read "-= num_pages"

> 
> 
> with qemu-system-x86_64  -enable-kvm -device 
> virtio-balloon,id=balloon0,deflate-on-oom=on,notify_on_empty=on,page-per-vq=on
>  -m 3G 
> 
> / # cat /sys/kernel/debug/virtio-balloon
> inflated: 0 kB
> / # QEMU 4.2.1 monitor - type 'help' for more information
> (qemu) balloon 1024
> (qemu)
> 
> / # cat /sys/kernel/debug/virtio-balloon
> inflated: 2097152 kB
> / #
> 
> with qemu-system-x86_64  -enable-kvm -device 
> virtio-balloon,id=balloon0,deflate-on-oom=off,notify_on_empty=on,page-per-vq=on
>  
> -m 3G ...
> 
> / # cat /sys/kernel/debug/virtio-balloon
> inflated: 0 kB
> / # QEMU 4.2.1 monitor - type 'help' for more information
> (qemu) balloon 1024
> (qemu)
> / # cat /sys/kernel/debug/virtio-balloon
> inflated: -2097152 kB

What's the rationale of making it negative?

> 
> To join the threads:
> 
>>> Always account inflated memory as used for both cases - with and
>>> without deflate on oom. Do not change total ram which can confuse
>>> userspace and users.
> 
>> Sorry, but NAK.
> 
> Ok.
> 
>> This would affect existing users / user space / balloon stats. For example,
>> HV just recently switched to properly using adjust_managed_page_count()
> 
> 
> I am wondering what the rationale behind this is; I have never seen such users
> that expect it to work like this. Do you have any pointers to such users, so
> I can understand why they do so?

We adjust total pages and managed pages to simulate what memory is
actually available to the system (just like during memory hot(un)plug).
Even though the pages are "allocated" by the driver, they are actually
unusable for the system, just as if they would have been offlined.
Strictly speaking, the guest OS can kill as many processes as it wants,
it cannot reclaim that memory, as it's logically no longer available.

There is nothing (valid, well, except driver unloading) the guest can do
to reuse these pages. The hypervisor has to get involved first to grant
access to some of these pages again (deflate the balloon).

It's different with deflate-on-oom: the guest will *itself* decide to
reuse inflated pages to deflate them. So the allocated pages can become
back usable easily. There was a recent discussion for virtio-balloon to
change that behavior when it's known that the hypervisor essentially
implements "deflate-on-oom" by looking at guest memory stats and
adjusting the balloon size accordingly; however, as long as we don't
know what the hypervisor does or doesn't do, we have to keep the
existing behavior.

Note that most balloon drivers under Linux share that behavior.

In case of Hyper-V I remember a customer BUG report that requested that
exact behavior, however, I'm not able to locate the BZ quickly.


[1]
https://lists.linuxfoundation.org/pipermail/virtualization/2021-November/057767.html
(note that I can't easily find the original mail in the archives)

Note: I suggested under [1] to expose inflated pages via /proc/meminfo
directly. We could do that consistently over all balloon drivers ...
doesn't sound too crazy.

-- 
Thanks,

David / dhildenb


Re: [PATCH v6 1/2] Create debugfs file with virtio balloon usage information

2022-08-01 Thread David Hildenbrand
On 26.07.22 16:08, Alexander Atanasov wrote:
> Allow the guest to know how much it is ballooned by the host
> and how that memory is accounted.
> 
> It is useful when debugging out of memory conditions,
> as well for userspace processes that monitor the memory pressure
> and for nested virtualization.
> 
> Depending on the deflate on oom flag, memory is accounted in two ways.
> If the flag is set, the inflated pages are accounted as used memory.
> If the flag is not set, the inflated pages are subtracted from totalram.
> To make clear how inflated pages are accounted, the value is sign-prefixed:
> negative means subtracted from totalram, positive means
> they are accounted as used.
> 
> Signed-off-by: Alexander Atanasov 
> ---
>  drivers/virtio/virtio_balloon.c | 59 +
>  1 file changed, 59 insertions(+)
> 
> V2:
>   - fixed coding style
>   - removed pretty print
> V3:
>   - removed duplicate of features
>   - made the comment about ballooned_pages clearer
>   - convert host pages to balloon pages
> V4:
>   - added a define for BALLOON_PAGE_SIZE to make things clear
> V5:
>   - Made the calculations work properly for both ways of memory accounting
> with or without deflate_on_oom
>   - dropped comment
> V6:
>   - reduced the file to minimum
>   - unify accounting
> 
> I didn't understand whether you agree to change the accounting to be the same,
> so the following part 2 is about it.
> 
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f4c34a2a6b8e..97d3b29cb9f1 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/debugfs.h>
>  #include 
>  #include 
>  #include 
> @@ -731,6 +732,59 @@ static void report_free_page_func(struct work_struct *work)
>   }
>  }
>  
> +/*
> + * DEBUGFS Interface
> + */
> +#ifdef CONFIG_DEBUG_FS
> +
> +/**
> + * virtio_balloon_debug_show - shows statistics of balloon operations.
> + * @f: pointer to the &struct seq_file.
> + * @offset: ignored.
> + *
> + * Provides the statistics that can be accessed in virtio-balloon in the debugfs.
> + *
> + * Return: zero on success or an error code.
> + */
> +

Superfluous empty line. Personally, I'd just get rid of this
(comparatively obvious) documentation completely.

> +static int virtio_balloon_debug_show(struct seq_file *f, void *offset)
> +{
> + struct virtio_balloon *vb = f->private;
> + s64 num_pages = vb->num_pages << (VIRTIO_BALLOON_PFN_SHIFT - 10);

Rather call this "inflated_kb" then, it's no longer "pages".

> +
> + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> + num_pages = -num_pages;

With VIRTIO_BALLOON_F_DEFLATE_ON_OOM this will now always report "0".

which would be the same as "num_pages = 0;" and would deserve a comment
explaining why we don't indicate that as "inflated: ".

Thanks.

-- 
Thanks,

David / dhildenb
