Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-13 Thread Michael Ellerman
Christophe LEROY  writes:

> Hi Anton,
>
> Le 04/04/2017 à 00:00, Anton Blanchard a écrit :
>> Hi Christophe,
>>
 -  if (user_mode(regs))
 +  if (!is_exec && user_mode(regs))
>>>
>>> Shouldn't it also check 'is_write' ?
>>> If it is a store, is_write should be set, shouldn't it ?
>>
>> Thanks, Ben had the same suggestion. I'll add that further optimisation
>> in a subsequent patch.
>>
>
> For your information, I made some benchmark test using 'perf stat' with 
> your app on MPC8321 and MPC885, and I got the following results:

 MPC8321:
 before47386  faults 
 after 35181  faults-12205
 is_write  35181  faults-12205 

So that's good.

 MPC885:
 before:  176067  dTLB-load-misses 
   52722  iTLB-load-misses 
   25718  faults 
 after:   152462  dTLB-load-misses-23605
   52715  iTLB-load-misses-7
   19611  faults   -6107

 is_write:147162  dTLB-load-misses-28905
   52716  iTLB-load-misses-6
   19610  faults   -6108

Also good, and shows that is_write idea would be even better.

cheers


Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-12 Thread Christophe LEROY

Hi Anton,

Le 04/04/2017 à 00:00, Anton Blanchard a écrit :

Hi Christophe,


-   if (user_mode(regs))
+   if (!is_exec && user_mode(regs))


Shouldn't it also check 'is_write' ?
If it is a store, is_write should be set, shouldn't it ?


Thanks, Ben had the same suggestion. I'll add that further optimisation
in a subsequent patch.

Anton



For your information, I made some benchmark test using 'perf stat' with 
your app on MPC8321 and MPC885, and I got the following results:


MPC8321 before the change:

 Performance counter stats for './fault 1000' (10 runs):

   4491.971466  cpu-clock (msec) 
  ( +-  0.03% )
 47386  faults 
  ( +-  0.02% )


   4.727864465 seconds time elapsed 
 ( +-  0.17% )



MPC8321 after your change:

 Performance counter stats for './fault 1000' (10 runs):

   4278.738845  cpu-clock (msec) 
  ( +-  0.02% )
 35181  faults 
  ( +-  0.02% )


   4.504443891 seconds time elapsed 
 ( +-  0.19% )



MPC8321 after changing !is_exec by is_write

 Performance counter stats for './fault 1000' (10 runs):

   4268.187261  cpu-clock (msec) 
  ( +-  0.03% )
 35181  faults 
  ( +-  0.01% )


   4.489207922 seconds time elapsed 
 ( +-  0.20% )






MPC885 before the change:

 Performance counter stats for './fault 500' (10 runs):

 726605854  cpu-cycles 
  ( +-  0.03% )
176067  dTLB-load-misses 
  ( +-  0.08% )
 52722  iTLB-load-misses 
  ( +-  0.01% )
 25718  faults 
  ( +-  0.03% )


   5.795924654 seconds time elapsed 
 ( +-  0.14% )



MPC885 after your change:

 Performance counter stats for './fault 500' (10 runs):

 711233251  cpu-cycles 
  ( +-  0.04% )
152462  dTLB-load-misses 
  ( +-  0.09% )
 52715  iTLB-load-misses 
  ( +-  0.01% )
 19611  faults 
  ( +-  0.02% )


   5.673784606 seconds time elapsed 
 ( +-  0.14% )



MPC885 after changing !is_exec by is_write

 Performance counter stats for './fault 500' (10 runs):

 710904083  cpu-cycles 
  ( +-  0.05% )
147162  dTLB-load-misses 
  ( +-  0.06% )
 52716  iTLB-load-misses 
  ( +-  0.01% )
 19610  faults 
  ( +-  0.02% )


   5.672091139 seconds time elapsed 
 ( +-  0.15% )




Christophe


Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-03 Thread Anton Blanchard
Hi Christophe,

> > -   if (user_mode(regs))
> > +   if (!is_exec && user_mode(regs))  
> 
> Shouldn't it also check 'is_write' ?
> If it is a store, is_write should be set, shouldn't it ?

Thanks, Ben had the same suggestion. I'll add that further optimisation
in a subsequent patch.

Anton


Re: [PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-03 Thread LEROY Christophe

Anton Blanchard  a écrit :


From: Anton Blanchard 

Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.

Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!

A simple exec microbenchmark runs 6% faster on POWER8 with this fix:

 #include 
 #include 
 #include 

int main(int argc, char *argv[])
{
unsigned long left = atol(argv[1]);
char leftstr[16];

if (left-- == 0)
return 0;

sprintf(leftstr, "%ld", left);
execlp(argv[0], argv[0], leftstr, NULL);
perror("exec failed\n");

return 0;
}

Pass the number of iterations on the command line (eg 1) and time
how long it takes to execute.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fd6484fc2fa9..3a7d580fdc59 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned  
long address,

 * can result in fault, which will cause a deadlock when called with
 * mmap_sem held
 */
-   if (user_mode(regs))
+   if (!is_exec && user_mode(regs))


Shouldn't it also check 'is_write' ?
If it is a store, is_write should be set, shouldn't it ?

Christophe


store_update_sp = store_updates_sp(regs);

if (user_mode(regs))
--
2.11.0





[PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-04-03 Thread Anton Blanchard
From: Anton Blanchard 

Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.

Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!

A simple exec microbenchmark runs 6% faster on POWER8 with this fix:

 #include 
 #include 
 #include 

int main(int argc, char *argv[])
{
unsigned long left = atol(argv[1]);
char leftstr[16];

if (left-- == 0)
return 0;

sprintf(leftstr, "%ld", left);
execlp(argv[0], argv[0], leftstr, NULL);
perror("exec failed\n");

return 0;
}

Pass the number of iterations on the command line (eg 1) and time
how long it takes to execute.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fd6484fc2fa9..3a7d580fdc59 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
address,
 * can result in fault, which will cause a deadlock when called with
 * mmap_sem held
 */
-   if (user_mode(regs))
+   if (!is_exec && user_mode(regs))
store_update_sp = store_updates_sp(regs);
 
if (user_mode(regs))
-- 
2.11.0



[PATCH] powerpc: Avoid taking a data miss on every userspace instruction miss

2017-03-30 Thread Anton Blanchard
From: Anton Blanchard 

Early on in do_page_fault() we call store_updates_sp(), regardless of
the type of exception. For an instruction miss this doesn't make
sense, because we only use this information to detect if a data miss
is the result of a stack expansion instruction or not.

Worse still, it results in a data miss within every userspace
instruction miss handler, because we try and load the very instruction
we are about to install a pte for!

A simple exec microbenchmark runs 6% faster on POWER8 with this fix:

int main(int argc, char *argv[])
{
unsigned long left = atol(argv[1]);
char leftstr[16];

if (left-- == 0)
return 0;

sprintf(leftstr, "%ld", left);
execlp(argv[0], argv[0], leftstr, NULL);
perror("exec failed\n");

return 0;
}

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fd6484fc2fa9..3a7d580fdc59 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -287,7 +287,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
address,
 * can result in fault, which will cause a deadlock when called with
 * mmap_sem held
 */
-   if (user_mode(regs))
+   if (!is_exec && user_mode(regs))
store_update_sp = store_updates_sp(regs);
 
if (user_mode(regs))
-- 
2.11.0