Re: [v3 0/9] parallelized "struct page" zeroing

2017-06-01 Thread Michal Hocko
On Wed 31-05-17 23:35:48, Pasha Tatashin wrote: > >OK, so why cannot we make zero_struct_page 8x 8B stores, other arches > >would do memset. You said it would be slower but would that be > >measurable? I am sorry to be so persistent here but I would be really > >happier if this didn't depend on

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-31 Thread Pasha Tatashin
OK, so why cannot we make zero_struct_page 8x 8B stores, other arches would do memset. You said it would be slower but would that be measurable? I am sorry to be so persistent here but I would be really happier if this didn't depend on the deferred initialization. If this is absolutely a no-go
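The question quoted above (eight 8-byte stores versus memset()) can be made concrete with a small userspace sketch. Everything here is an assumption for illustration: `struct fake_page` is a hypothetical 64-byte stand-in, not the kernel's `struct page`, and the function names are made up. It only shows that both approaches leave the same bytes behind; it says nothing about which is faster on a given architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for struct page; the real layout differs. */
struct fake_page {
    uint64_t words[8];
};

static_assert(sizeof(struct fake_page) == 64,
              "sketch assumes a 64-byte structure");

/* Zero the structure with eight explicit 8-byte stores. */
static inline void zero_fake_page_stores(struct fake_page *p)
{
    p->words[0] = 0;
    p->words[1] = 0;
    p->words[2] = 0;
    p->words[3] = 0;
    p->words[4] = 0;
    p->words[5] = 0;
    p->words[6] = 0;
    p->words[7] = 0;
}

/* Generic fallback: let memset() choose how to clear the 64 bytes. */
static inline void zero_fake_page_memset(struct fake_page *p)
{
    memset(p, 0, sizeof(*p));
}

/* Returns 1 if every byte of the structure is zero, 0 otherwise. */
static inline int fake_page_is_zero(const struct fake_page *p)
{
    static const struct fake_page zero; /* zero-initialized by definition */
    return memcmp(p, &zero, sizeof(zero)) == 0;
}
```

Whether the store variant is measurably slower or faster than memset() is exactly what the thread is debating, so any conclusion would have to come from measurements on the target hardware.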

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-31 Thread David Miller
From: Michal Hocko Date: Wed, 31 May 2017 18:31:31 +0200 > On Tue 30-05-17 13:16:50, Pasha Tatashin wrote: >> >Could you be more specific? E.g. how are other stores done in >> >__init_single_page safe then? I am sorry to be dense here but how does >> >the full 64B store differ from other stores

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-31 Thread Michal Hocko
On Tue 30-05-17 13:16:50, Pasha Tatashin wrote: > >Could you be more specific? E.g. how are other stores done in > >__init_single_page safe then? I am sorry to be dense here but how does > >the full 64B store differ from other stores done in the same function. > > Hi Michal, > > It is safe to do

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-30 Thread Pasha Tatashin
Could you be more specific? E.g. how are other stores done in __init_single_page safe then? I am sorry to be dense here but how does the full 64B store differ from other stores done in the same function. Hi Michal, It is safe to do regular 8-byte and smaller stores (stx, st, sth, stb) without

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-29 Thread Michal Hocko
On Fri 26-05-17 12:45:55, Pasha Tatashin wrote: > Hi Michal, > > I have considered your proposals: > > 1. Making memset(0) unconditional inside __init_single_page() is not going > to work because it slows down SPARC, and ppc64. On SPARC even the BSTI > optimization that I have proposed earlier

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-26 Thread Pasha Tatashin
Hi Michal, I have considered your proposals: 1. Making memset(0) unconditional inside __init_single_page() is not going to work because it slows down SPARC, and ppc64. On SPARC even the BSTI optimization that I have proposed earlier won't work, because after consulting with other engineers I

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-16 Thread Benjamin Herrenschmidt
On Fri, 2017-05-12 at 13:37 -0400, David Miller wrote: > > Right now it is larger, but what I suggested is to add a new optimized > > routine just for this case, which would do STBI for 64-bytes but > > without membar (do membar at the end of memmap_init_zone() and > > deferred_init_memmap() > >

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-16 Thread Michal Hocko
On Mon 15-05-17 16:44:26, Pasha Tatashin wrote: > On 05/15/2017 03:38 PM, Michal Hocko wrote: > >I do not think this is the right approach. Your measurements just show > >that sparc could have a more optimized memset for small sizes. If you > >keep the same memset only for the parallel

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-15 Thread Pasha Tatashin
On 05/15/2017 03:38 PM, Michal Hocko wrote: On Mon 15-05-17 14:12:10, Pasha Tatashin wrote: Hi Michal, After looking at your suggested memblock_virt_alloc_core() change again, I decided to keep what I have. I do not want to inline memblock_virt_alloc_internal(), because it is not a

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-15 Thread Michal Hocko
On Mon 15-05-17 14:12:10, Pasha Tatashin wrote: > Hi Michal, > > After looking at your suggested memblock_virt_alloc_core() change again, I > decided to keep what I have. I do not want to inline > memblock_virt_alloc_internal(), because it is not a performance critical > path, and by inlining it

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-15 Thread Pasha Tatashin
Hi Michal, After looking at your suggested memblock_virt_alloc_core() change again, I decided to keep what I have. I do not want to inline memblock_virt_alloc_internal(), because it is not a performance critical path, and by inlining it we will unnecessarily increase the text size on all

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-12 Thread David Miller
From: Pasha Tatashin Date: Fri, 12 May 2017 13:24:52 -0400 > Right now it is larger, but what I suggested is to add a new optimized > routine just for this case, which would do STBI for 64-bytes but > without membar (do membar at the end of memmap_init_zone() and > deferred_init_memmap() > >

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-12 Thread Pasha Tatashin
On 05/12/2017 12:57 PM, David Miller wrote: From: Pasha Tatashin Date: Thu, 11 May 2017 16:59:33 -0400 We should either keep memset() only for deferred struct pages as what I have in my patches. Another option is to add a new function struct_page_clear() which would default to memset() and

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-12 Thread David Miller
From: Pasha Tatashin Date: Thu, 11 May 2017 16:59:33 -0400 > We should either keep memset() only for deferred struct pages as what > I have in my patches. > > Another option is to add a new function struct_page_clear() which > would default to memset() and to something else on platforms that >

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-12 Thread David Miller
From: Pasha Tatashin Date: Thu, 11 May 2017 16:47:05 -0400 > So, moving memset() into __init_single_page() benefits Intel. I am > actually surprised why memset() is so slow on intel when it is called > from memblock. But, hurts SPARC, I guess these membars at the end of > memset() kills the

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-11 Thread Pasha Tatashin
We should either keep memset() only for deferred struct pages as what I have in my patches. Another option is to add a new function struct_page_clear() which would default to memset() and to something else on platforms that decide to optimize it. On SPARC it would call STBIs, and we would
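The struct_page_clear() idea described above (default to memset(), let a platform substitute something else) might look roughly like the sketch below. This is a userspace illustration under stated assumptions: `struct fake_page` is a hypothetical 64-byte stand-in, `init_fake_page()` is a made-up caller, and the `#ifndef` hook is only one common way to allow an override, not something the thread confirms was adopted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for struct page; the real layout differs. */
struct fake_page {
    uint64_t words[8];
};

static_assert(sizeof(struct fake_page) == 64,
              "sketch assumes a 64-byte structure");

/*
 * Assumed override hook: a platform header included before this point
 * could define struct_page_clear to supply its own routine (for example
 * one built on block-initializing stores). Without an override, the
 * generic version simply calls memset().
 */
#ifndef struct_page_clear
static inline void struct_page_clear(struct fake_page *p)
{
    memset(p, 0, sizeof(*p));
}
#define struct_page_clear struct_page_clear
#endif

/* Made-up caller: clear the structure, then set the fields that matter. */
static inline void init_fake_page(struct fake_page *p, uint64_t flags)
{
    struct_page_clear(p);
    p->words[0] = flags;
}
```

The caller does not need to know which variant is in use, which is the point of the proposal: the choice of clearing strategy stays a per-platform decision.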

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-11 Thread Pasha Tatashin
Have you measured that? I do not think it would be super hard to measure. I would be quite surprised if this added much if anything at all as the whole struct page should be in the cache line already. We do set reference count and other struct members. Almost nobody should be looking at our page

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-11 Thread David Miller
From: Michal Hocko Date: Thu, 11 May 2017 10:05:38 +0200 > Anyway, do you agree that doing the struct page initialization along > with other writes to it shouldn't add a measurable overhead comparing > to pre-zeroing of larger block of struct pages? We already have an > exclusive cache line and

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-11 Thread Michal Hocko
On Wed 10-05-17 11:19:43, David S. Miller wrote: > From: Michal Hocko > Date: Wed, 10 May 2017 16:57:26 +0200 > > > Have you measured that? I do not think it would be super hard to > > measure. I would be quite surprised if this added much if anything at > > all as the whole struct page should

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Matthew Wilcox
On Wed, May 10, 2017 at 02:00:26PM -0400, David Miller wrote: > From: Matthew Wilcox > Date: Wed, 10 May 2017 10:17:03 -0700 > > On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote: > >> I guess it might be clearer if you understand what the block > >> initializing stores do on sparc64.

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread David Miller
From: Matthew Wilcox Date: Wed, 10 May 2017 10:17:03 -0700 > On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote: >> From: Michal Hocko >> Date: Wed, 10 May 2017 16:57:26 +0200 >> >> > Have you measured that? I do not think it would be super hard to >> > measure. I would be quite

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Matthew Wilcox
On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote: > From: Michal Hocko > Date: Wed, 10 May 2017 16:57:26 +0200 > > > Have you measured that? I do not think it would be super hard to > > measure. I would be quite surprised if this added much if anything at > > all as the whole struct

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread David Miller
From: Pasha Tatashin Date: Wed, 10 May 2017 11:01:40 -0400 > Perhaps you are right, and I will measure on x86. But, I suspect hit > can become unacceptable on some platforms: there is an overhead of > calling a function, even if it is leaf-optimized, and there is an > overhead in memset() to

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread David Miller
From: Michal Hocko Date: Wed, 10 May 2017 16:57:26 +0200 > Have you measured that? I do not think it would be super hard to > measure. I would be quite surprised if this added much if anything at > all as the whole struct page should be in the cache line already. We do > set reference count and

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Pasha Tatashin
On 05/10/2017 10:57 AM, Michal Hocko wrote: On Wed 10-05-17 09:42:22, Pasha Tatashin wrote: Well, I didn't object to this particular part. I was mostly concerned about http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com and the "zero" argument for other

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Michal Hocko
On Wed 10-05-17 09:42:22, Pasha Tatashin wrote: > > > >Well, I didn't object to this particular part. I was mostly concerned > >about > >http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com > >and the "zero" argument for other functions. I guess we can do without >

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Pasha Tatashin
Well, I didn't object to this particular part. I was mostly concerned about http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatas...@oracle.com and the "zero" argument for other functions. I guess we can do without that. I _think_ that we should simply _always_ initialize the

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-10 Thread Michal Hocko
On Tue 09-05-17 14:54:50, Pasha Tatashin wrote: [...] > >The implementation just looks too large to what I would expect. E.g. do > >we really need to add zero argument to the large part of the memblock > >API? Wouldn't it be easier to simply export memblock_virt_alloc_internal > >(or its tiny

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-09 Thread Pasha Tatashin
Hi Michal, I like the idea of postponing the zeroing from the allocation to the init time. To be honest the improvement looks much larger than I would expect (Btw. this should be a part of the changelog rather than a outside link). The improvements are larger, because this time was never
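The idea of postponing the zeroing from allocation time to init time can be sketched in userspace as follows. This is an analogy only, with assumptions throughout: malloc() stands in for an allocator that does not zero, `struct fake_page` and all function names are hypothetical, and the range split merely hints at how independent ranges could be handed to different workers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for struct page; the real layout differs. */
struct fake_page {
    uint64_t flags;
    uint64_t refcount;
    uint64_t pad[6];
};

/* Allocate the array without zeroing it (malloc rather than calloc). */
static inline struct fake_page *alloc_fake_memmap(size_t n)
{
    return malloc(n * sizeof(struct fake_page));
}

/* Per-entry init: zero the entry here, then set the fields that matter. */
static inline void init_one(struct fake_page *p)
{
    memset(p, 0, sizeof(*p));
    p->refcount = 1;
}

/*
 * Initialize entries [start, end). Because the zeroing happens inside
 * the per-entry init, disjoint ranges can be processed independently.
 */
static inline void init_range(struct fake_page *map, size_t start, size_t end)
{
    for (size_t i = start; i < end; i++)
        init_one(&map[i]);
}
```

A caller would allocate once and then call init_range() on each sub-range; in this sketch the ranges run sequentially, but nothing in init_range() depends on entries outside its own range.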

Re: [v3 0/9] parallelized "struct page" zeroing

2017-05-09 Thread Michal Hocko
On Fri 05-05-17 13:03:07, Pavel Tatashin wrote:
> Changelog:
> v2 - v3
> - Addressed David's comments about one change per patch:
>   * Split changes to platforms into 4 patches
>   * Made "do not zero vmemmap_buf" a separate patch
> v1 - v2
> -

[v3 0/9] parallelized "struct page" zeroing

2017-05-05 Thread Pavel Tatashin
Changelog:
v2 - v3
- Addressed David's comments about one change per patch:
  * Split changes to platforms into 4 patches
  * Made "do not zero vmemmap_buf" a separate patch
v1 - v2
- Per request, added s390 to deferred "struct page"
