Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On 05.01.21 10:20, Michal Hocko wrote:
> On Mon 04-01-21 15:00:31, Dave Hansen wrote:
>> On 1/4/21 12:11 PM, David Hildenbrand wrote:
>>>> Yeah, it certainly can't be the default, but it *is* useful for
>>>> things where we know that there are no cache benefits to zeroing
>>>> close to where the memory is allocated.
>>>>
>>>> The trick is opting into it somehow, either in a process or a VMA.
>>>>
>>> The patch set is mostly trying to optimize starting a new process. So
>>> process/vma doesn't really work.
>>
>> Let's say you have a system-wide tunable that says: pre-zero pages and
>> keep 10GB of them around. Then, you opt in a process to being allowed
>> to dip into that pool with a process-wide flag or an madvise() call.
>> You could even have the flag be inherited across execve() if you wanted
>> to have helper apps be able to set the policy and access the pool like
>> how numactl works.
>
> While possible, it sounds quite heavy weight to me. Page allocator would
> have to somehow maintain those pre-zeroed pages. This pool will also
> become a very scarce resource very soon because everybody just wants to
> run faster. So this would open many more interesting questions.

Agreed.

>
> A global knob with all or nothing sounds like an easier to use and
> maintain solution to me.

I mean, that brings me back to my original suggestion: just use
hugetlbfs and implement some sort of pre-zeroing there (worker thread,
whatever). Most vfio users should already be better off using
hugepages.

It's a "pool of pages" already. Selected users use it. I really don't
see a need to extend the buddy with something like that.

--
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
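David's alternative keeps pre-zeroing inside a separate pool that only selected users draw from, such as hugetlbfs, with a worker doing the clearing. The following is a toy userspace model of that idea, not hugetlbfs code: `toy_pool`, `prezero_worker_step()` and `pool_alloc_zeroed()` are hypothetical names, the page size is shrunk to 4 KiB, and the "worker" is just a function that a background thread would call repeatedly. It shows the two allocation paths: a page the worker already cleared, or a dirty page zeroed inline at allocation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096   /* toy size; real hugetlbfs pages are 2 MiB or 1 GiB */
#define TOY_POOL_PAGES 8

struct toy_hpage {
    unsigned char data[TOY_PAGE_SIZE];
    bool in_use;
    bool zeroed;   /* free and known to contain only zeros */
};

struct toy_pool {
    struct toy_hpage pages[TOY_POOL_PAGES];
};

/* One pass of the hypothetical background worker: zero up to `budget`
 * free, dirty pages. Returns how many pages it cleared. */
static size_t prezero_worker_step(struct toy_pool *pool, size_t budget)
{
    assert(pool != NULL);
    size_t done = 0;
    for (size_t i = 0; i < TOY_POOL_PAGES && done < budget; i++) {
        struct toy_hpage *p = &pool->pages[i];
        if (!p->in_use && !p->zeroed) {
            memset(p->data, 0, sizeof p->data);
            p->zeroed = true;
            done++;
        }
    }
    return done;
}

/* Hand out a zeroed page: prefer one the worker already cleared, otherwise
 * pay the zeroing cost here. Returns NULL when the pool is exhausted. */
static struct toy_hpage *pool_alloc_zeroed(struct toy_pool *pool, bool *prezeroed)
{
    struct toy_hpage *dirty = NULL;
    for (size_t i = 0; i < TOY_POOL_PAGES; i++) {
        struct toy_hpage *p = &pool->pages[i];
        if (p->in_use)
            continue;
        if (p->zeroed) {
            *prezeroed = true;
            p->in_use = true;
            p->zeroed = false;   /* the user is about to dirty it */
            return p;
        }
        if (dirty == NULL)
            dirty = p;
    }
    if (dirty == NULL)
        return NULL;
    memset(dirty->data, 0, sizeof dirty->data);   /* inline zeroing */
    *prezeroed = false;
    dirty->in_use = true;
    return dirty;
}

static void pool_free(struct toy_hpage *p)
{
    p->in_use = false;   /* contents stay stale until the worker zeroes them */
}
```

Because the pool is a fixed, separately reserved set of pages, the buddy allocator itself is left untouched, which is the property David argues for.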
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On 1/4/21 12:11 PM, David Hildenbrand wrote:
>> Yeah, it certainly can't be the default, but it *is* useful for
>> things where we know that there are no cache benefits to zeroing
>> close to where the memory is allocated.
>>
>> The trick is opting into it somehow, either in a process or a VMA.
>>
> The patch set is mostly trying to optimize starting a new process. So
> process/vma doesn't really work.

Let's say you have a system-wide tunable that says: pre-zero pages and
keep 10GB of them around. Then, you opt in a process to being allowed
to dip into that pool with a process-wide flag or an madvise() call.
You could even have the flag be inherited across execve() if you wanted
to have helper apps be able to set the policy and access the pool like
how numactl works.

Dan makes a very good point about using filesystems for this, though.
It wouldn't be rocket science to set up a special tmpfs mount just for
VM memory and pre-zero it from userspace. For qemu, you'd need to teach
the management layer to hand out zeroed files via mem-path=. Heck, if
you taught MADV_FREE how to handle tmpfs, you could even pre-zero *and*
get the memory back quickly if those files ended up over-sized somehow.
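A minimal sketch of the tmpfs approach Dave describes, assuming Linux and glibc 2.27 or later: `memfd_create()` stands in for a file on a dedicated tmpfs mount, and the helper names `make_prezeroed_file()` and `file_is_zeroed()` are hypothetical. Writing once to every page makes the kernel allocate and zero the shmem pages up front, so a later user of the file finds them resident and zero-filled. Handing the file to qemu via mem-path= and the MADV_FREE idea are not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>    /* memfd_create(), MFD_CLOEXEC, mmap(), munmap() */
#include <sys/types.h>
#include <unistd.h>      /* ftruncate(), sysconf(), close() */

/* Create a shmem-backed file of `size` bytes and pay the zeroing cost now:
 * the first write to each page makes the kernel allocate a zero-filled page,
 * and storing 0 leaves it zero. Returns the fd, or -1 on error. */
static int make_prezeroed_file(size_t size)
{
    assert(size > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = memfd_create("prezeroed-vm-mem", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (size_t off = 0; off < size; off += (size_t)page)
        ((volatile unsigned char *)p)[off] = 0;   /* fault the page in */

    munmap(p, size);   /* pages remain in the file's page cache */
    return fd;
}

/* What a VM started on this file would see: every byte is zero. */
static bool file_is_zeroed(int fd, size_t size)
{
    unsigned char *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return false;
    bool ok = true;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            ok = false;
            break;
        }
    }
    munmap(p, size);
    return ok;
}
```

Whether this saves time at VM start depends on the pages staying resident between preparation and use rather than being reclaimed or swapped out in the meantime.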
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On Mon, Jan 4, 2021 at 12:11 PM David Hildenbrand wrote:
>
>
> > On 04.01.2021 at 20:52, Dave Hansen wrote:
> >
> > On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> >>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> >>> On 12/21/20 8:30 AM, Liang Li wrote:
> >>>> --- a/include/linux/page-flags.h
> >>>> +++ b/include/linux/page-flags.h
> >>>> @@ -137,6 +137,9 @@ enum pageflags {
> >>>>  #endif
> >>>>  #ifdef CONFIG_64BIT
> >>>>  	PG_arch_2,
> >>>> +#endif
> >>>> +#ifdef CONFIG_PREZERO_PAGE
> >>>> +	PG_zero,
> >>>>  #endif
> >>>>  	__NR_PAGEFLAGS,
> >>>
> >>> I don't think this is worth a generic page->flags bit.
> >>>
> >>> There's a ton of space in 'struct page' for pages that are in the
> >>> allocator. Can't we use some of that space?
> >>
> >> I was going to object to that too, but I think the entire approach is
> >> flawed and needs to be thrown out. It just nukes the caches in extremely
> >> subtle and hard to measure ways, lowering overall system performance.
> >
> > Yeah, it certainly can't be the default, but it *is* useful for things
> > where we know that there are no cache benefits to zeroing close to where
> > the memory is allocated.
> >
> > The trick is opting into it somehow, either in a process or a VMA.
> >
>
> The patch set is mostly trying to optimize starting a new process. So
> process/vma doesn't really work.
>
> I still wonder if tmpfs/shmem could somehow be used to cover the
> original use case of starting a new vm fast (or rebooting an existing one
> involving restarting the process).

If it's rebooting a VM, then file-backed memory should be able to skip
the zeroing, because the stale data is only exposed to that same VM. If
the memory is being repurposed for a new VM, then the file needs to be
reallocated / re-zeroed just like the anonymous case.
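Dan's rule can be sketched with a shmem-backed file, again assuming Linux with glibc 2.27+ and using `memfd_create()` in place of a file on a tmpfs mount. `make_backing_file()` and `prepare_backing_for_vm()` are hypothetical helpers: a reboot of the same VM keeps the old contents, while repurposing the memory for a different VM punches out the whole range with `fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)`, after which reads return zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* fallocate(), FALLOC_FL_* (with _GNU_SOURCE) */
#include <stdbool.h>
#include <sys/mman.h>    /* memfd_create(), MFD_CLOEXEC */
#include <sys/types.h>
#include <unistd.h>      /* ftruncate(), pread(), pwrite(), close() */

/* A shmem-backed file standing in for a VM's memory. */
static int make_backing_file(off_t size)
{
    int fd = memfd_create("vm-backing", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Same VM rebooting: stale contents are only visible to that VM, so skip
 * zeroing. Different VM: free the old pages; holes in tmpfs read as zeros
 * and newly faulted pages are zero-filled. Returns 0 on success. */
static int prepare_backing_for_vm(int fd, off_t size, bool same_vm)
{
    assert(size > 0);
    if (same_vm)
        return 0;
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, size);
}
```

Punching the hole frees the old pages rather than zeroing them in place, so the zeroing cost returns when the new VM faults the pages in, just as in the anonymous case.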
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
> On 04.01.2021 at 20:52, Dave Hansen wrote:
>
> On 1/4/21 11:27 AM, Matthew Wilcox wrote:
>>> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>>> On 12/21/20 8:30 AM, Liang Li wrote:
>>>> --- a/include/linux/page-flags.h
>>>> +++ b/include/linux/page-flags.h
>>>> @@ -137,6 +137,9 @@ enum pageflags {
>>>>  #endif
>>>>  #ifdef CONFIG_64BIT
>>>>  	PG_arch_2,
>>>> +#endif
>>>> +#ifdef CONFIG_PREZERO_PAGE
>>>> +	PG_zero,
>>>>  #endif
>>>>  	__NR_PAGEFLAGS,
>>>
>>> I don't think this is worth a generic page->flags bit.
>>>
>>> There's a ton of space in 'struct page' for pages that are in the
>>> allocator. Can't we use some of that space?
>>
>> I was going to object to that too, but I think the entire approach is
>> flawed and needs to be thrown out. It just nukes the caches in extremely
>> subtle and hard to measure ways, lowering overall system performance.
>
> Yeah, it certainly can't be the default, but it *is* useful for things
> where we know that there are no cache benefits to zeroing close to where
> the memory is allocated.
>
> The trick is opting into it somehow, either in a process or a VMA.
>

The patch set is mostly trying to optimize starting a new process. So
process/vma doesn't really work.

I still wonder if tmpfs/shmem could somehow be used to cover the
original use case of starting a new vm fast (or rebooting an existing one
involving restarting the process).
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On 1/4/21 11:27 AM, Matthew Wilcox wrote:
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
>> On 12/21/20 8:30 AM, Liang Li wrote:
>>> --- a/include/linux/page-flags.h
>>> +++ b/include/linux/page-flags.h
>>> @@ -137,6 +137,9 @@ enum pageflags {
>>>  #endif
>>>  #ifdef CONFIG_64BIT
>>>  	PG_arch_2,
>>> +#endif
>>> +#ifdef CONFIG_PREZERO_PAGE
>>> +	PG_zero,
>>>  #endif
>>>  	__NR_PAGEFLAGS,
>>
>> I don't think this is worth a generic page->flags bit.
>>
>> There's a ton of space in 'struct page' for pages that are in the
>> allocator. Can't we use some of that space?
>
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out. It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.

Yeah, it certainly can't be the default, but it *is* useful for things
where we know that there are no cache benefits to zeroing close to where
the memory is allocated.

The trick is opting into it somehow, either in a process or a VMA.
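The opt-in Dave describes amounts to a decision made at allocation time. In this sketch, `toy_task`, `toy_vma` and `pick_zero_source()` are hypothetical names; the per-process and per-VMA flags stand in for whatever interface would set them, and `prezeroed_available` stands in for a count of pre-zeroed pages. Everyone who has not opted in keeps the default of zeroing right before use, where the freshly written cache lines are most likely to be useful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in state; none of these names exist in the kernel. */
struct toy_task { bool prezero_optin; };   /* process-wide flag */
struct toy_vma  { bool prezero_optin; };   /* per-VMA flag, e.g. from an madvise()-style hint */

enum zero_source { ZERO_INLINE, ZERO_FROM_POOL };

/* Default: zero inline, close to where the memory is used. Only an
 * opted-in process or VMA takes a pre-zeroed page, and only while the
 * pool still has one. */
static enum zero_source pick_zero_source(const struct toy_task *task,
                                         const struct toy_vma *vma,
                                         size_t prezeroed_available)
{
    assert(task != NULL && vma != NULL);
    bool opted_in = task->prezero_optin || vma->prezero_optin;
    if (opted_in && prezeroed_available > 0)
        return ZERO_FROM_POOL;
    return ZERO_INLINE;
}
```

Falling back to inline zeroing when the pool is empty keeps opted-in callers correct even when the pool runs dry.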
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On Mon, Jan 4, 2021 at 11:28 AM Matthew Wilcox wrote:
>
> On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> > On 12/21/20 8:30 AM, Liang Li wrote:
> > > --- a/include/linux/page-flags.h
> > > +++ b/include/linux/page-flags.h
> > > @@ -137,6 +137,9 @@ enum pageflags {
> > >  #endif
> > >  #ifdef CONFIG_64BIT
> > >  	PG_arch_2,
> > > +#endif
> > > +#ifdef CONFIG_PREZERO_PAGE
> > > +	PG_zero,
> > >  #endif
> > >  	__NR_PAGEFLAGS,
> >
> > I don't think this is worth a generic page->flags bit.
> >
> > There's a ton of space in 'struct page' for pages that are in the
> > allocator. Can't we use some of that space?
>
> I was going to object to that too, but I think the entire approach is
> flawed and needs to be thrown out. It just nukes the caches in extremely
> subtle and hard to measure ways, lowering overall system performance.

At a minimum, the performance analysis should try to quantify that
externalized cost. Certainly that overhead went somewhere. Maybe it
would be acceptable if this overhead were limited to running when the
CPU would otherwise go idle, but then it might never run in practice...
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On Mon, Jan 04, 2021 at 11:19:13AM -0800, Dave Hansen wrote:
> On 12/21/20 8:30 AM, Liang Li wrote:
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -137,6 +137,9 @@ enum pageflags {
> >  #endif
> >  #ifdef CONFIG_64BIT
> >  	PG_arch_2,
> > +#endif
> > +#ifdef CONFIG_PREZERO_PAGE
> > +	PG_zero,
> >  #endif
> >  	__NR_PAGEFLAGS,
>
> I don't think this is worth a generic page->flags bit.
>
> There's a ton of space in 'struct page' for pages that are in the
> allocator. Can't we use some of that space?

I was going to object to that too, but I think the entire approach is
flawed and needs to be thrown out. It just nukes the caches in extremely
subtle and hard to measure ways, lowering overall system performance.
Re: [RFC v2 PATCH 4/4] mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
On 12/21/20 8:30 AM, Liang Li wrote:
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -137,6 +137,9 @@ enum pageflags {
>  #endif
>  #ifdef CONFIG_64BIT
>  	PG_arch_2,
> +#endif
> +#ifdef CONFIG_PREZERO_PAGE
> +	PG_zero,
>  #endif
>  	__NR_PAGEFLAGS,

I don't think this is worth a generic page->flags bit.

There's a ton of space in 'struct page' for pages that are in the
allocator. Can't we use some of that space?
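Dave's alternative to a new page->flags bit is to keep the "already zeroed" marker in the part of 'struct page' that only has meaning while the page sits in the allocator. The toy layout below illustrates that idea with a union; it is NOT the real struct page, and every name in it (`toy_page`, `toy_free_page()`, `toy_page_prezeroed()`, `toy_alloc_page()`) is hypothetical.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdbool.h>

/* Toy model only: this is NOT the real struct page layout. */
struct toy_page {
    unsigned long flags;         /* bits shared by every page state; scarce */
    union {
        struct {                 /* while the page is allocated and in use */
            void *mapping;
            unsigned long index;
        } used;
        struct {                 /* while the page sits on a free list */
            unsigned int order;
            bool zeroed;         /* hypothetical: instead of a PG_zero bit */
        } buddy;
    };
};

/* The free-page state must fit in space the page already has. */
static_assert(sizeof(((struct toy_page *)0)->buddy) <=
              sizeof(((struct toy_page *)0)->used),
              "free-page state must not grow the struct");

/* Put a page on the (toy) free list, recording whether it is zeroed. */
static void toy_free_page(struct toy_page *page, unsigned int order, bool zeroed)
{
    page->buddy.order = order;
    page->buddy.zeroed = zeroed;
}

/* Only meaningful while the page is free. */
static bool toy_page_prezeroed(const struct toy_page *page)
{
    return page->buddy.zeroed;
}

/* Allocation reuses the same bytes, overwriting the free-page hint. */
static void toy_alloc_page(struct toy_page *page, void *mapping, unsigned long index)
{
    page->used.mapping = mapping;
    page->used.index = index;
}
```

The trade-off is that the hint disappears once the page is allocated, so an allocator built this way would have to read it before reusing the union for in-use state.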