Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-05 Thread David Hildenbrand
On 05.01.21 10:20, Michal Hocko wrote:
> On Mon 04-01-21 15:00:31, Dave Hansen wrote:
>> On 1/4/21 12:11 PM, David Hildenbrand wrote:
>>>> Yeah, it certainly can't be the default, but it *is* useful for
>>>> things where we know that there are no cache benefits to zeroing
>>>> close to where the memory is allocated.
>>>>
>>>> The trick is opting into it somehow, either in a process or a VMA.
>>>>
>>> The patch set is mostly trying to optimize starting a new process. So
>>> process/vma doesn't really work.
>>
>> Let's say you have a system-wide tunable that says: pre-zero pages and
>> keep 10GB of them around.  Then, you opt a process in to dipping into
>> that pool with a process-wide flag or an madvise() call.  You could
>> even have the flag be inherited across execve() if you wanted helper
>> apps to be able to set the policy and access the pool, the way numactl
>> works.
> 
> While possible, it sounds quite heavyweight to me. The page allocator
> would have to somehow maintain those pre-zeroed pages. This pool would
> also become a very scarce resource very soon, because everybody just
> wants to run faster. So this would open many more interesting questions.

Agreed.

> 
> A global all-or-nothing knob sounds like an easier solution to use and
> maintain, to me.

I mean, that brings me back to my original suggestion: just use
hugetlbfs and implement some sort of pre-zeroing there (a worker thread,
or whatever). Most vfio users should already be better off using hugepages.

It's a "pool of pages" already. Selected users use it. I really don't
see a need to extend the buddy with something like that.

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 1/4/21 12:11 PM, David Hildenbrand wrote:
>> Yeah, it certainly can't be the default, but it *is* useful for
>> things where we know that there are no cache benefits to zeroing
>> close to where the memory is allocated.
>> 
>> The trick is opting into it somehow, either in a process or a VMA.
>> 
> The patch set is mostly trying to optimize starting a new process. So
> process/vma doesn't really work.

Let's say you have a system-wide tunable that says: pre-zero pages and
keep 10GB of them around.  Then, you opt a process in to dipping into
that pool with a process-wide flag or an madvise() call.  You could
even have the flag be inherited across execve() if you wanted helper
apps to be able to set the policy and access the pool, the way numactl
works.
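A toy model of the pool described above, in C to match the patch. Everything here is hypothetical: TOY_POOL_CAP stands in for the "keep 10GB around" tunable, prezero_optin stands in for the process-wide flag or madvise() opt-in, and none of the names are kernel API. It also makes Michal's scarcity concern concrete: once the pool is empty, opted-in processes fall back to zeroing at allocation time like everyone else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_POOL_CAP  8   /* hypothetical stand-in for the pool-size tunable */

struct toy_page { unsigned char data[TOY_PAGE_SIZE]; };

struct toy_pool {
    struct toy_page *zeroed[TOY_POOL_CAP]; /* pre-zeroed pages, used LIFO */
    size_t count;
};

struct toy_proc {
    bool prezero_optin;   /* hypothetical per-process opt-in flag */
};

/* Background side: zero a page and park it in the pool if there is room.
 * Returns false (and does nothing) when the pool is already full. */
static bool pool_refill(struct toy_pool *pool, struct toy_page *pg)
{
    if (pool->count == TOY_POOL_CAP)
        return false;
    memset(pg->data, 0, sizeof(pg->data));
    pool->zeroed[pool->count++] = pg;
    return true;
}

/* Allocation side: an opted-in process takes a pre-zeroed page when one is
 * available; otherwise (not opted in, or pool empty) the fallback page is
 * zeroed right here, at allocation time. */
static struct toy_page *alloc_zeroed(struct toy_pool *pool,
                                     const struct toy_proc *proc,
                                     struct toy_page *fallback)
{
    if (proc->prezero_optin && pool->count > 0)
        return pool->zeroed[--pool->count];
    memset(fallback->data, 0, sizeof(fallback->data));
    return fallback;
}
```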

Dan makes a very good point about using filesystems for this, though.
It wouldn't be rocket science to set up a special tmpfs mount just for
VM memory and pre-zero it from userspace.  For qemu, you'd need to teach
the management layer to hand out zeroed files via mem-path=.  Heck, if
you taught MADV_FREE how to handle tmpfs, you could even pre-zero *and*
get the memory back quickly if those files ended up over-sized somehow.
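A minimal userspace sketch of the tmpfs idea, assuming a dedicated tmpfs mount (for example a hypothetical /mnt/vm-mem) whose files are later handed to qemu via mem-path=. prezero_backing_file() is a hypothetical helper, not part of qemu or any management layer. It writes zeros explicitly rather than relying on posix_fallocate(), so the zeroing is unambiguously done here, before the VM starts.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (truncating any old contents) a backing file of `size` bytes and
 * fill it with zeros, so its pages already exist and are zeroed before the
 * VM is launched.  Returns an open fd on success, -1 on failure. */
static int prezero_backing_file(const char *path, off_t size)
{
    static const char zeros[65536]; /* static storage is zero-initialized */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (off_t off = 0; off < size; ) {
        size_t chunk = sizeof(zeros);
        if ((off_t)chunk > size - off)
            chunk = (size_t)(size - off);   /* last, partial chunk */
        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        off += n;
    }
    return fd;
}
```

On a real system the path would point into the tmpfs mount reserved for VM memory; the pre-zeroing could then run ahead of time (or in the background) and the management layer would hand out only files that have completed it.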

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dan Williams
On Mon, Jan 4, 2021 at 12:11 PM David Hildenbrand  wrote:
>
>
> > On 04.01.2021 at 20:52, Dave Hansen wrote:
> >
> > On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> >>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> >>> On 12/21/20 8:30 AM, Liang Li wrote:
> >>>> --- a/include/linux/page-flags.h
> >>>> +++ b/include/linux/page-flags.h
> >>>> @@ -137,6 +137,9 @@ enum pageflags {
> >>>>  #endif
> >>>>  #ifdef CONFIG_64BIT
> >>>>   PG_arch_2,
> >>>> +#endif
> >>>> +#ifdef CONFIG_PREZERO_PAGE
> >>>> + PG_zero,
> >>>>  #endif
> >>>>   __NR_PAGEFLAGS,
> >>>
> >>> I don't think this is worth a generic page->flags bit.
> >>>
> >>> There's a ton of space in 'struct page' for pages that are in the
> >>> allocator.  Can't we use some of that space?
> >>
> >> I was going to object to that too, but I think the entire approach is
> >> flawed and needs to be thrown out.  It just nukes the caches in extremely
> >> subtle and hard to measure ways, lowering overall system performance.
> >
> > Yeah, it certainly can't be the default, but it *is* useful for things
> > where we know that there are no cache benefits to zeroing close to where
> > the memory is allocated.
> >
> > The trick is opting into it somehow, either in a process or a VMA.
> >
>
> The patch set is mostly trying to optimize starting a new process. So 
> process/vma doesn't really work.
>
> I still wonder whether tmpfs/shmem could somehow be used to cover the
> original use case of starting a new VM fast (or rebooting an existing one,
> which involves restarting the process).

If it's rebooting a VM, then file-backed memory should be able to skip
the zeroing, because the stale data is only exposed to the VM itself. If
the memory is being repurposed for a new VM, then the file needs to be
reallocated / re-zeroed just like in the anonymous case.
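The reuse-versus-repurpose distinction can be sketched as a small decision on the management side. This assumes the layer that hands out backing files tracks which VM last used each one; vm_backing, owner_vm and both helpers are hypothetical. Re-zeroing is done by truncating the file to zero and extending it again: POSIX guarantees the extended range reads back as zeros, and on tmpfs the old pages are released, so fresh zeroed pages are allocated as the guest touches them.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one file-backed guest memory region. */
struct vm_backing {
    int fd;          /* backing file, e.g. on a dedicated tmpfs mount */
    off_t size;      /* guest memory size in bytes */
    int owner_vm;    /* id of the VM that last used this file, -1 if none */
};

/* Open (creating if needed) and size a backing file with no owner yet.
 * Returns 0 on success, -1 on failure. */
static int backing_open(struct vm_backing *b, const char *path, off_t size)
{
    b->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (b->fd < 0)
        return -1;
    b->size = size;
    b->owner_vm = -1;
    if (ftruncate(b->fd, size) != 0) {
        close(b->fd);
        b->fd = -1;
        return -1;
    }
    return 0;
}

/* Prepare the backing file for `vm_id`.
 * - Same VM rebooting: stale data is only visible to that VM, so the
 *   contents are left alone and no zeroing is done.
 * - Different VM: discard old contents by truncating to 0 and extending
 *   back to full size; the extended range reads back as zeros.
 * Returns 0 on success, -1 on failure. */
static int prepare_backing(struct vm_backing *b, int vm_id)
{
    if (b->owner_vm == vm_id)
        return 0;
    if (ftruncate(b->fd, 0) != 0 || ftruncate(b->fd, b->size) != 0)
        return -1;
    b->owner_vm = vm_id;
    return 0;
}
```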

Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread David Hildenbrand

> On 04.01.2021 at 20:52, Dave Hansen wrote:
> 
> On 1/4/21 11:27 AM, Matthew Wilcox wrote:
>>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>>> On 12/21/20 8:30 AM, Liang Li wrote:
>>>> --- a/include/linux/page-flags.h
>>>> +++ b/include/linux/page-flags.h
>>>> @@ -137,6 +137,9 @@ enum pageflags {
>>>>  #endif
>>>>  #ifdef CONFIG_64BIT
>>>>   PG_arch_2,
>>>> +#endif
>>>> +#ifdef CONFIG_PREZERO_PAGE
>>>> + PG_zero,
>>>>  #endif
>>>>   __NR_PAGEFLAGS,
>>> 
>>> I don't think this is worth a generic page->flags bit.
>>> 
>>> There's a ton of space in 'struct page' for pages that are in the
>>> allocator.  Can't we use some of that space?
>> 
>> I was going to object to that too, but I think the entire approach is
>> flawed and needs to be thrown out.  It just nukes the caches in extremely
>> subtle and hard to measure ways, lowering overall system performance.
> 
> Yeah, it certainly can't be the default, but it *is* useful for things
> where we know that there are no cache benefits to zeroing close to where
> the memory is allocated.
> 
> The trick is opting into it somehow, either in a process or a VMA.
> 

The patch set is mostly trying to optimize starting a new process. So 
process/vma doesn't really work.

I still wonder whether tmpfs/shmem could somehow be used to cover the
original use case of starting a new VM fast (or rebooting an existing one,
which involves restarting the process).


Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>> On 12/21/20 8:30 AM, Liang Li wrote:
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -137,6 +137,9 @@ enum pageflags {
>>>  #endif
>>>  #ifdef CONFIG_64BIT
>>> PG_arch_2,
>>> +#endif
>>> +#ifdef CONFIG_PREZERO_PAGE
>>> +   PG_zero,
>>>  #endif
>>> __NR_PAGEFLAGS,
>>
>> I don't think this is worth a generic page->flags bit.
>>
>> There's a ton of space in 'struct page' for pages that are in the
>> allocator.  Can't we use some of that space?
> 
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out.  It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.

Yeah, it certainly can't be the default, but it *is* useful for things
where we know that there are no cache benefits to zeroing close to where
the memory is allocated.

The trick is opting into it somehow, either in a process or a VMA.


Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dan Williams
On Mon, Jan 4, 2021 at 11:28 AM Matthew Wilcox  wrote:
>
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> > On 12/21/20 8:30 AM, Liang Li wrote:
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -137,6 +137,9 @@ enum pageflags {
> > >  #endif
> > >  #ifdef CONFIG_64BIT
> > > PG_arch_2,
> > > +#endif
> > > +#ifdef CONFIG_PREZERO_PAGE
> > > +   PG_zero,
> > >  #endif
> > > __NR_PAGEFLAGS,
> >
> > I don't think this is worth a generic page->flags bit.
> >
> > There's a ton of space in 'struct page' for pages that are in the
> > allocator.  Can't we use some of that space?
>
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out.  It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.

At a minimum, the performance analysis should try to quantify that
externalized cost; that overhead certainly went somewhere. Maybe it
would be acceptable if this work were limited to running when the CPU
would otherwise go idle, but that might mean it never runs in practice...
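One way to make this concrete is to give a background zeroer explicit counters and an idle-only gate. This is a toy sketch, not the patch's mechanism: the structs, the per-tick budget and the cpu_idle input are all assumptions. A large busy_ticks relative to pages_zeroed would be the "never runs in practice" case. The counters only capture the direct zeroing work; the cache pollution described earlier in the thread would need separate measurement (for example with hardware performance counters).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical free-page record; `zeroed` marks pages already cleared. */
struct toy_free_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool zeroed;
};

/* Counters that make the "externalized" cost visible. */
struct prezero_stats {
    unsigned long pages_zeroed;  /* pages cleared by the background path */
    unsigned long bytes_zeroed;  /* bytes written doing so */
    unsigned long busy_ticks;    /* ticks where the CPU was busy, so no work */
};

/* One step of a background zeroer that only runs when the CPU would
 * otherwise be idle, and clears at most `budget` pages per step.
 * Returns the number of pages zeroed in this step. */
static size_t idle_zero_tick(struct toy_free_page *pages, size_t n,
                             size_t budget, bool cpu_idle,
                             struct prezero_stats *st)
{
    size_t done = 0;

    if (!cpu_idle) {
        st->busy_ticks++;
        return 0;
    }
    for (size_t i = 0; i < n && done < budget; i++) {
        if (pages[i].zeroed)
            continue;
        memset(pages[i].data, 0, TOY_PAGE_SIZE);
        pages[i].zeroed = true;
        done++;
    }
    st->pages_zeroed += done;
    st->bytes_zeroed += done * TOY_PAGE_SIZE;
    return done;
}
```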


Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Matthew Wilcox
On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> On 12/21/20 8:30 AM, Liang Li wrote:
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -137,6 +137,9 @@ enum pageflags {
> >  #endif
> >  #ifdef CONFIG_64BIT
> > PG_arch_2,
> > +#endif
> > +#ifdef CONFIG_PREZERO_PAGE
> > +   PG_zero,
> >  #endif
> > __NR_PAGEFLAGS,
> 
> I don't think this is worth a generic page->flags bit.
> 
> There's a ton of space in 'struct page' for pages that are in the
> allocator.  Can't we use some of that space?

I was going to object to that too, but I think the entire approach is
flawed and needs to be thrown out.  It just nukes the caches in extremely
subtle and hard to measure ways, lowering overall system performance.


Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO

2021-01-04 Thread Dave Hansen
On 12/21/20 8:30 AM, Liang Li wrote:
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -137,6 +137,9 @@ enum pageflags {
>  #endif
>  #ifdef CONFIG_64BIT
>   PG_arch_2,
> +#endif
> +#ifdef CONFIG_PREZERO_PAGE
> + PG_zero,
>  #endif
>   __NR_PAGEFLAGS,

I don't think this is worth a generic page->flags bit.

There's a ton of space in 'struct page' for pages that are in the
allocator.  Can't we use some of that space?
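A toy illustration of this suggestion, in the patch's own language: rather than spending a global page->flags bit, keep the "zeroed" state in descriptor space that only has meaning while the page sits in the allocator. toy_page_desc and its helpers are hypothetical and far simpler than the real struct page; the _Static_assert just checks that the free-page state fits in the space the in-use fields already occupy.

```c
#include <stdbool.h>

/* Fields meaningful only while the page is in use (hypothetical). */
struct toy_used_info {
    void *mapping;
    unsigned long index;
};

/* Fields meaningful only while the page is free in the allocator. */
struct toy_buddy_info {
    unsigned int order;
    bool zeroed;            /* the would-be PG_zero, kept here instead */
};

/* Simplified page descriptor: NOT the kernel's struct page. */
struct toy_page_desc {
    unsigned long flags;    /* shared flag bits, untouched by the overlay */
    union {
        struct toy_used_info used;
        struct toy_buddy_info buddy;
    } u;
};

_Static_assert(sizeof(struct toy_buddy_info) <= sizeof(struct toy_used_info),
               "free-page state must fit in space the in-use fields occupy");

/* Record allocator-private state when a page is freed. */
static void toy_mark_free(struct toy_page_desc *p, unsigned int order,
                          bool zeroed)
{
    p->u.buddy.order = order;
    p->u.buddy.zeroed = zeroed;
}

/* Only valid while the page is free in the allocator. */
static bool toy_free_page_is_zeroed(const struct toy_page_desc *p)
{
    return p->u.buddy.zeroed;
}
```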