Re: [v5 09/15] sparc64: optimized struct page zeroing
Hi Sam,

Thank you for looking at this. I will update the patch description and, as
you suggested, replace memset() with a static assert in the next iteration.

Pasha

On 08/04/2017 01:37 AM, Sam Ravnborg wrote:
> Hi Pavel.
>
> On Thu, Aug 03, 2017 at 05:23:47PM -0400, Pavel Tatashin wrote:
>> Add an optimized mm_zero_struct_page(), so struct page's are zeroed without
>> calling memset(). We do eight regular stores, thus avoid cost of membar.
>
> The commit message no longer reflects the implementation
> and should be updated.
>
>>
>> Signed-off-by: Pavel Tatashin
>> Reviewed-by: Steven Sistare
>> Reviewed-by: Daniel Jordan
>> Reviewed-by: Bob Picco
>> ---
>>  arch/sparc/include/asm/pgtable_64.h | 32
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
>> index 6fbd931f0570..be47537e84c5 100644
>> --- a/arch/sparc/include/asm/pgtable_64.h
>> +++ b/arch/sparc/include/asm/pgtable_64.h
>> @@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
>>  extern struct page *mem_map_zero;
>>  #define ZERO_PAGE(vaddr)	(mem_map_zero)
>>
>> +/* This macro must be updated when the size of struct page grows above 80
>> + * or reduces below 64.
>> + * The idea that compiler optimizes out switch() statement, and only
>> + * leaves clrx instructions or memset() call.
>> + */
>> +#define mm_zero_struct_page(pp) do {					\
>> +	unsigned long *_pp = (void *)(pp);				\
>> +									\
>> +	/* Check that struct page is 8-byte aligned */			\
>> +	BUILD_BUG_ON(sizeof(struct page) & 7);				\
>
> It would also be good to catch sizeof > 80 so we do not silently migrate
> to the suboptimal version (silent at build time).
> Can you catch at build time if the size is not any of 64, 72, 80,
> and simplify the below a little?
>
> 	Sam
Re: [v5 09/15] sparc64: optimized struct page zeroing
Hi Pavel.

On Thu, Aug 03, 2017 at 05:23:47PM -0400, Pavel Tatashin wrote:
> Add an optimized mm_zero_struct_page(), so struct page's are zeroed without
> calling memset(). We do eight regular stores, thus avoid cost of membar.

The commit message no longer reflects the implementation
and should be updated.

>
> Signed-off-by: Pavel Tatashin
> Reviewed-by: Steven Sistare
> Reviewed-by: Daniel Jordan
> Reviewed-by: Bob Picco
> ---
>  arch/sparc/include/asm/pgtable_64.h | 32
>  1 file changed, 32 insertions(+)
>
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 6fbd931f0570..be47537e84c5 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
>  extern struct page *mem_map_zero;
>  #define ZERO_PAGE(vaddr)	(mem_map_zero)
>
> +/* This macro must be updated when the size of struct page grows above 80
> + * or reduces below 64.
> + * The idea that compiler optimizes out switch() statement, and only
> + * leaves clrx instructions or memset() call.
> + */
> +#define mm_zero_struct_page(pp) do {					\
> +	unsigned long *_pp = (void *)(pp);				\
> +									\
> +	/* Check that struct page is 8-byte aligned */			\
> +	BUILD_BUG_ON(sizeof(struct page) & 7);				\

It would also be good to catch sizeof > 80 so we do not silently migrate
to the suboptimal version (silent at build time).
Can you catch at build time if the size is not any of 64, 72, 80,
and simplify the below a little?

	Sam
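For illustration, the build-time check requested above might look roughly like the following. This is a user-space sketch, not the code from any later revision of the patch: struct demo_page, DEMO_BUILD_BUG_ON() and demo_zero_page() are hypothetical stand-ins for struct page, BUILD_BUG_ON() and mm_zero_struct_page(), and C11 static_assert is assumed in place of the kernel macro.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: ten 8-byte words, 80 bytes. */
struct demo_page {
	uint64_t words[10];
};

/* Stand-in for the kernel's BUILD_BUG_ON(): break the build if cond holds. */
#define DEMO_BUILD_BUG_ON(cond) static_assert(!(cond), "build-time check failed")

static inline void demo_zero_page(struct demo_page *page)
{
	uint64_t *pp = (uint64_t *)page;

	/* Refuse to compile for any size other than the three handled below. */
	DEMO_BUILD_BUG_ON(sizeof(struct demo_page) != 64 &&
			  sizeof(struct demo_page) != 72 &&
			  sizeof(struct demo_page) != 80);

	/* sizeof is a constant, so the compiler can keep a single path. */
	switch (sizeof(struct demo_page)) {
	case 80:
		pp[9] = 0;	/* fallthrough */
	case 72:
		pp[8] = 0;	/* fallthrough */
	case 64:
		pp[7] = 0;
		pp[6] = 0;
		pp[5] = 0;
		pp[4] = 0;
		pp[3] = 0;
		pp[2] = 0;
		pp[1] = 0;
		pp[0] = 0;
		break;
	}
}
```

With the size check done at build time, the default: branch with memset() and the run-time warning is no longer needed; changing words[10] to words[11] in the sketch makes compilation fail instead of silently taking a slower path.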
[v5 09/15] sparc64: optimized struct page zeroing
Add an optimized mm_zero_struct_page(), so struct page's are zeroed without
calling memset(). We do eight regular stores, thus avoid cost of membar.

Signed-off-by: Pavel Tatashin
Reviewed-by: Steven Sistare
Reviewed-by: Daniel Jordan
Reviewed-by: Bob Picco
---
 arch/sparc/include/asm/pgtable_64.h | 32
 1 file changed, 32 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931f0570..be47537e84c5 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
 extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)	(mem_map_zero)

+/* This macro must be updated when the size of struct page grows above 80
+ * or reduces below 64.
+ * The idea that compiler optimizes out switch() statement, and only
+ * leaves clrx instructions or memset() call.
+ */
+#define mm_zero_struct_page(pp) do {					\
+	unsigned long *_pp = (void *)(pp);				\
+									\
+	/* Check that struct page is 8-byte aligned */			\
+	BUILD_BUG_ON(sizeof(struct page) & 7);				\
+									\
+	switch (sizeof(struct page)) {					\
+	case 80:							\
+		_pp[9] = 0;	/* fallthrough */			\
+	case 72:							\
+		_pp[8] = 0;	/* fallthrough */			\
+	case 64:							\
+		_pp[7] = 0;						\
+		_pp[6] = 0;						\
+		_pp[5] = 0;						\
+		_pp[4] = 0;						\
+		_pp[3] = 0;						\
+		_pp[2] = 0;						\
+		_pp[1] = 0;						\
+		_pp[0] = 0;						\
+		break;		/* no fallthrough */			\
+	default:							\
+		pr_warn_once("suboptimal mm_zero_struct_page");		\
+		memset(_pp, 0, sizeof(struct page));			\
+	}								\
+} while (0)
+
 /* PFNs are real physical page numbers.  However, mem_map only begins to record
  * per-page information starting at pfn_base.  This is to handle systems where
  * the first physical page in the machine is at some huge physical address,
--
2.13.4
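To make the fallthrough logic of the macro easier to follow, here is a rough user-space rendering of the same switch as an ordinary function. It is only a sketch: demo_zero_words() is a hypothetical helper, and it takes the size as a run-time argument, whereas in the patch sizeof(struct page) is a compile-time constant so only one path should remain after optimization. The return value (number of words cleared by the unrolled path, 0 for the memset() fallback) is added purely so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch. Zeroes `size` bytes at pp.
 * Returns the number of 8-byte words cleared by the unrolled stores,
 * or 0 when the generic memset() fallback was taken.
 */
static size_t demo_zero_words(uint64_t *pp, size_t size)
{
	switch (size) {
	case 80:
		pp[9] = 0;	/* fallthrough */
	case 72:
		pp[8] = 0;	/* fallthrough */
	case 64:
		pp[7] = 0;
		pp[6] = 0;
		pp[5] = 0;
		pp[4] = 0;
		pp[3] = 0;
		pp[2] = 0;
		pp[1] = 0;
		pp[0] = 0;
		return size / 8;
	default:
		/* Any other size: correct but not unrolled. */
		memset(pp, 0, size);
		return 0;
	}
}
```

Entering at case 72, for instance, clears word 8 and then falls through into the eight stores for words 7 down to 0, leaving word 9 untouched.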