Re: [v5 09/15] sparc64: optimized struct page zeroing

2017-08-04 Thread Pasha Tatashin

Hi Sam,

Thank you for looking at this. I will update the patch description and, as 
you suggested, replace the memset() fallback with a static assert in the 
next iteration.
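
Roughly what I have in mind (an untested sketch; the final version may
differ):

#define mm_zero_struct_page(pp) do {				\
	unsigned long *_pp = (void *)(pp);			\
								\
	/* Check that struct page is 64, 72, or 80 bytes */	\
	BUILD_BUG_ON(sizeof(struct page) & 7);			\
	BUILD_BUG_ON(sizeof(struct page) < 64);			\
	BUILD_BUG_ON(sizeof(struct page) > 80);			\
								\
	switch (sizeof(struct page)) {				\
	case 80:						\
		_pp[9] = 0;	/* fallthrough */		\
	case 72:						\
		_pp[8] = 0;	/* fallthrough */		\
	default:						\
		_pp[7] = 0;					\
		_pp[6] = 0;					\
		_pp[5] = 0;					\
		_pp[4] = 0;					\
		_pp[3] = 0;					\
		_pp[2] = 0;					\
		_pp[1] = 0;					\
		_pp[0] = 0;					\
	}							\
} while (0)

With the size pinned to 64, 72, or 80 at build time, both the memset()
fallback and the runtime warning go away.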


Pasha

On 08/04/2017 01:37 AM, Sam Ravnborg wrote:

Hi Pavel.

On Thu, Aug 03, 2017 at 05:23:47PM -0400, Pavel Tatashin wrote:

Add an optimized mm_zero_struct_page(), so struct pages are zeroed without
calling memset(). We do eight regular stores, thus avoiding the cost of a membar.


The commit message no longer reflects the implementation
and should be updated.



Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
---
  arch/sparc/include/asm/pgtable_64.h | 32 ++++++++++++++++++++++++++++++++
  1 file changed, 32 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931f0570..be47537e84c5 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
  extern struct page *mem_map_zero;
  #define ZERO_PAGE(vaddr)  (mem_map_zero)
  
+/* This macro must be updated when the size of struct page grows above 80
+ * or reduces below 64.
+ * The idea is that the compiler optimizes out the switch() statement and
+ * leaves only clrx instructions or a memset() call.
+ */
+#define mm_zero_struct_page(pp) do {				\
+   unsigned long *_pp = (void *)(pp);  \
+   \
+   /* Check that struct page is 8-byte aligned */  \
+   BUILD_BUG_ON(sizeof(struct page) & 7);  \

It would also be good to catch sizeof(struct page) > 80 at build time, so
we do not silently fall back to the suboptimal version (the fallback is
silent at build time).
Can you catch, at build time, any size that is not one of 64, 72, or 80,
and simplify the code below a little?

Sam



Re: [v5 09/15] sparc64: optimized struct page zeroing

2017-08-03 Thread Sam Ravnborg
Hi Pavel.

On Thu, Aug 03, 2017 at 05:23:47PM -0400, Pavel Tatashin wrote:
> Add an optimized mm_zero_struct_page(), so struct pages are zeroed without
> calling memset(). We do eight regular stores, thus avoiding the cost of a membar.

The commit message no longer reflects the implementation
and should be updated.

> 
> Signed-off-by: Pavel Tatashin 
> Reviewed-by: Steven Sistare 
> Reviewed-by: Daniel Jordan 
> Reviewed-by: Bob Picco 
> ---
>  arch/sparc/include/asm/pgtable_64.h | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 6fbd931f0570..be47537e84c5 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
>  extern struct page *mem_map_zero;
>  #define ZERO_PAGE(vaddr) (mem_map_zero)
>  
> +/* This macro must be updated when the size of struct page grows above 80
> + * or reduces below 64.
> + * The idea is that the compiler optimizes out the switch() statement and
> + * leaves only clrx instructions or a memset() call.
> + */
> +#define mm_zero_struct_page(pp) do {				\
> + unsigned long *_pp = (void *)(pp);  \
> + \
> + /* Check that struct page is 8-byte aligned */  \
> + BUILD_BUG_ON(sizeof(struct page) & 7);  \
It would also be good to catch sizeof(struct page) > 80 at build time, so
we do not silently fall back to the suboptimal version (the fallback is
silent at build time).
Can you catch, at build time, any size that is not one of 64, 72, or 80,
and simplify the code below a little?

Sam


[v5 09/15] sparc64: optimized struct page zeroing

2017-08-03 Thread Pavel Tatashin
Add an optimized mm_zero_struct_page(), so struct pages are zeroed without
calling memset(). We do eight regular stores, thus avoiding the cost of a membar.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
---
 arch/sparc/include/asm/pgtable_64.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931f0570..be47537e84c5 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -230,6 +230,38 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
 extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)   (mem_map_zero)
 
+/* This macro must be updated when the size of struct page grows above 80
+ * or reduces below 64.
+ * The idea is that the compiler optimizes out the switch() statement and
+ * leaves only clrx instructions or a memset() call.
+ */
+#define mm_zero_struct_page(pp) do {					\
+   unsigned long *_pp = (void *)(pp);  \
+   \
+   /* Check that struct page is 8-byte aligned */  \
+   BUILD_BUG_ON(sizeof(struct page) & 7);  \
+   \
+   switch (sizeof(struct page)) {  \
+   case 80:\
+   _pp[9] = 0; /* fallthrough */   \
+   case 72:\
+   _pp[8] = 0; /* fallthrough */   \
+   case 64:\
+   _pp[7] = 0; \
+   _pp[6] = 0; \
+   _pp[5] = 0; \
+   _pp[4] = 0; \
+   _pp[3] = 0; \
+   _pp[2] = 0; \
+   _pp[1] = 0; \
+   _pp[0] = 0; \
+   break;  /* no fallthrough */\
+   default:\
+   pr_warn_once("suboptimal mm_zero_struct_page"); \
+   memset(_pp, 0, sizeof(struct page));\
+   }   \
+} while (0)
+
 /* PFNs are real physical page numbers.  However, mem_map only begins to record
  * per-page information starting at pfn_base.  This is to handle systems where
  * the first physical page in the machine is at some huge physical address,
-- 
2.13.4

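For context, the override above pairs with a generic fallback in common
code; a sketch of the expected shape (reconstructed, not quoted from the
series):

/* include/linux/mm.h, sketch: arches that do not provide an optimized
 * mm_zero_struct_page() fall back to a plain memset().
 */
#ifndef mm_zero_struct_page
#define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
#endif

A caller such as the memmap initialization path can then write
mm_zero_struct_page(page) everywhere and transparently pick up the
clrx-based version on sparc64.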