Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-10 Thread Christophe Leroy




Le 10/02/2021 à 03:00, Nicholas Piggin a écrit :

Excerpts from Christophe Leroy's message of February 10, 2021 3:03 am:



Le 09/02/2021 à 15:31, David Laight a écrit :

From: Segher Boessenkool

Sent: 09 February 2021 13:51

On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:

What if you did this?



+static inline struct task_struct *get_current(void)
+{
+   register struct task_struct *task asm ("r2");
+
+   return task;
+}


Local register asm variables are *only* guaranteed to live in that
register as operands to an asm.  See

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
("The only supported use" etc.)

You can do something like

static inline struct task_struct *get_current(void)
{
register struct task_struct *task asm ("r2");

asm("" : "+r"(task));

return task;
}

which makes sure that "task" actually is in r2 at the point of that asm.


If "r2" always contains current (and is never assigned by the compiler)
why not use a global register variable for it?




The change proposed by Nick doesn't solve the issue.


It seemed to change code generation in a simple test case, oh well.



The problem is that at the begining of the function we have:

unsigned long *ti_flagsp = _thread_info()->flags;

When the function uses ti_flagsp for the first time, it does use 112(r2)

Then the function calls some other functions.

Most likely because the function could update 'current', GCC copies r2 into 
r30, so that if r2 get
changed by the called function, ti_flagsp is still based on the previous value 
of current.

Allthough we know r2 wont change, GCC doesn't know it. And in order to save r2 
into r30, it needs to
save r30 in the stack.


By using _thread_info()->flags directly instead of this intermediaite 
ti_flagsp pointer, GCC
uses r2 instead instead of doing a copy.


Nick, I don't understand the reason why you need that 'ti_flagsp' local var.


Just to save typing, I don't mind your patch I was just wondering if
current could be improved in general.



Thanks,

I'll post v6 of it as a follow-up of yesterday's two remaining follow-up 
patches.

Christophe


Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of February 10, 2021 3:03 am:
> 
> 
> Le 09/02/2021 à 15:31, David Laight a écrit :
>> From: Segher Boessenkool
>>> Sent: 09 February 2021 13:51
>>>
>>> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
 What if you did this?
>>>
 +static inline struct task_struct *get_current(void)
 +{
 +  register struct task_struct *task asm ("r2");
 +
 +  return task;
 +}
>>>
>>> Local register asm variables are *only* guaranteed to live in that
>>> register as operands to an asm.  See
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
>>> ("The only supported use" etc.)
>>>
>>> You can do something like
>>>
>>> static inline struct task_struct *get_current(void)
>>> {
>>> register struct task_struct *task asm ("r2");
>>>
>>> asm("" : "+r"(task));
>>>
>>> return task;
>>> }
>>>
>>> which makes sure that "task" actually is in r2 at the point of that asm.
>> 
>> If "r2" always contains current (and is never assigned by the compiler)
>> why not use a global register variable for it?
>> 
> 
> 
> The change proposed by Nick doesn't solve the issue.

It seemed to change code generation in a simple test case, oh well.

> 
> The problem is that at the begining of the function we have:
> 
>   unsigned long *ti_flagsp = _thread_info()->flags;
> 
> When the function uses ti_flagsp for the first time, it does use 112(r2)
> 
> Then the function calls some other functions.
> 
> Most likely because the function could update 'current', GCC copies r2 into 
> r30, so that if r2 get 
> changed by the called function, ti_flagsp is still based on the previous 
> value of current.
> 
> Allthough we know r2 wont change, GCC doesn't know it. And in order to save 
> r2 into r30, it needs to 
> save r30 in the stack.
> 
> 
> By using _thread_info()->flags directly instead of this intermediaite 
> ti_flagsp pointer, GCC 
> uses r2 instead instead of doing a copy.
> 
> 
> Nick, I don't understand the reason why you need that 'ti_flagsp' local var.

Just to save typing, I don't mind your patch I was just wondering if 
current could be improved in general.

Thanks,
Nick


RE: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread David Laight
From: Christophe Leroy 
> Sent: 09 February 2021 17:04
> 
> Le 09/02/2021 à 15:31, David Laight a écrit :
> > From: Segher Boessenkool
> >> Sent: 09 February 2021 13:51
> >>
> >> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> >>> What if you did this?
> >>
> >>> +static inline struct task_struct *get_current(void)
> >>> +{
> >>> + register struct task_struct *task asm ("r2");
> >>> +
> >>> + return task;
> >>> +}
> >>
> >> Local register asm variables are *only* guaranteed to live in that
> >> register as operands to an asm.  See
> >>
> >> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> >> ("The only supported use" etc.)
> >>
> >> You can do something like
> >>
> >> static inline struct task_struct *get_current(void)
> >> {
> >>register struct task_struct *task asm ("r2");
> >>
> >>asm("" : "+r"(task));
> >>
> >>return task;
> >> }
> >>
> >> which makes sure that "task" actually is in r2 at the point of that asm.
> >
> > If "r2" always contains current (and is never assigned by the compiler)
> > why not use a global register variable for it?
> >
> 
> 
> The change proposed by Nick doesn't solve the issue.
> 
> The problem is that at the begining of the function we have:
> 
>   unsigned long *ti_flagsp = _thread_info()->flags;
> 
> When the function uses ti_flagsp for the first time, it does use 112(r2)
> 
> Then the function calls some other functions.
> 
> Most likely because the function could update 'current', GCC copies r2 into 
> r30, so that if r2 get
> changed by the called function, ti_flagsp is still based on the previous 
> value of current.
> 
> Allthough we know r2 wont change, GCC doesn't know it. And in order to save 
> r2 into r30, it needs to
> save r30 in the stack.
> 
> 
> By using _thread_info()->flags directly instead of this intermediaite 
> ti_flagsp pointer, GCC
> uses r2 instead instead of doing a copy.

Does marking current_thread_info() 'pure' (I think that the right one)
work - so that gcc knows its result doesn't depend on external data
and that it doesn't change external data.

Although I'm not 100% how well those attributes actually work.

> Nick, I don't understand the reason why you need that 'ti_flagsp' local var.

Probably to save typing.

I sometimes reload locals after function calls.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread Christophe Leroy




Le 09/02/2021 à 15:31, David Laight a écrit :

From: Segher Boessenkool

Sent: 09 February 2021 13:51

On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:

What if you did this?



+static inline struct task_struct *get_current(void)
+{
+   register struct task_struct *task asm ("r2");
+
+   return task;
+}


Local register asm variables are *only* guaranteed to live in that
register as operands to an asm.  See
   
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
("The only supported use" etc.)

You can do something like

static inline struct task_struct *get_current(void)
{
register struct task_struct *task asm ("r2");

asm("" : "+r"(task));

return task;
}

which makes sure that "task" actually is in r2 at the point of that asm.


If "r2" always contains current (and is never assigned by the compiler)
why not use a global register variable for it?




The change proposed by Nick doesn't solve the issue.

The problem is that at the begining of the function we have:

unsigned long *ti_flagsp = _thread_info()->flags;

When the function uses ti_flagsp for the first time, it does use 112(r2)

Then the function calls some other functions.

Most likely because the function could update 'current', GCC copies r2 into r30, so that if r2 get 
changed by the called function, ti_flagsp is still based on the previous value of current.


Allthough we know r2 wont change, GCC doesn't know it. And in order to save r2 into r30, it needs to 
save r30 in the stack.



By using _thread_info()->flags directly instead of this intermediaite ti_flagsp pointer, GCC 
uses r2 instead instead of doing a copy.



Nick, I don't understand the reason why you need that 'ti_flagsp' local var.

Christophe


RE: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread David Laight
From: Segher Boessenkool
> Sent: 09 February 2021 13:51
> 
> On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> > What if you did this?
> 
> > +static inline struct task_struct *get_current(void)
> > +{
> > +   register struct task_struct *task asm ("r2");
> > +
> > +   return task;
> > +}
> 
> Local register asm variables are *only* guaranteed to live in that
> register as operands to an asm.  See
>   
> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> ("The only supported use" etc.)
> 
> You can do something like
> 
> static inline struct task_struct *get_current(void)
> {
>   register struct task_struct *task asm ("r2");
> 
>   asm("" : "+r"(task));
> 
>   return task;
> }
> 
> which makes sure that "task" actually is in r2 at the point of that asm.

If "r2" always contains current (and is never assigned by the compiler)
why not use a global register variable for it?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-09 Thread Segher Boessenkool
On Tue, Feb 09, 2021 at 12:36:20PM +1000, Nicholas Piggin wrote:
> What if you did this?

> +static inline struct task_struct *get_current(void)
> +{
> + register struct task_struct *task asm ("r2");
> +
> + return task;
> +}

Local register asm variables are *only* guaranteed to live in that
register as operands to an asm.  See
  
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
("The only supported use" etc.)

You can do something like

static inline struct task_struct *get_current(void)
{
register struct task_struct *task asm ("r2");

asm("" : "+r"(task));

return task;
}

which makes sure that "task" actually is in r2 at the point of that asm.


Segher


Re: [PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-08 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of February 9, 2021 1:10 am:
> By saving the pointer pointing to thread_info.flags, gcc copies r2
> in a non-volatile register.
> 
> We know 'current' doesn't change, so avoid that intermediaite pointer.
> 
> Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles)
> 
> On PPC64, gcc seems to know that 'current' is not changing, and it keeps
> it in a non volatile register to avoid multiple read of 'current' in paca.
> 
> Signed-off-by: Christophe Leroy 

What if you did this?

---
 arch/powerpc/include/asm/current.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/current.h 
b/arch/powerpc/include/asm/current.h
index bbfb94800415..59ab327972a5 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -23,16 +23,19 @@ static inline struct task_struct *get_current(void)
 
return task;
 }
-#define currentget_current()
 
 #else
 
-/*
- * We keep `current' in r2 for speed.
- */
-register struct task_struct *current asm ("r2");
+static inline struct task_struct *get_current(void)
+{
+   register struct task_struct *task asm ("r2");
+
+   return task;
+}
 
 #endif
 
+#define currentget_current()
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CURRENT_H */
-- 


[PATCH v5 20/22] powerpc/syscall: Avoid storing 'current' in another pointer

2021-02-08 Thread Christophe Leroy
By saving the pointer pointing to thread_info.flags, gcc copies r2
in a non-volatile register.

We know 'current' doesn't change, so avoid that intermediaite pointer.

Reduces null_syscall benchmark by 2 cycles (322 => 320 cycles)

On PPC64, gcc seems to know that 'current' is not changing, and it keeps
it in a non volatile register to avoid multiple read of 'current' in paca.

Signed-off-by: Christophe Leroy 
---
v5: Also in interrupt exit prepare
---
 arch/powerpc/kernel/interrupt.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 8c38e8c95be2..c89a8eac3e24 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -223,7 +223,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
   struct pt_regs *regs,
   long scv)
 {
-   unsigned long *ti_flagsp = _thread_info()->flags;
unsigned long ti_flags;
unsigned long ret = 0;
 
@@ -241,7 +240,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
/* Check whether the syscall is issued inside a restartable sequence */
rseq_syscall(regs);
 
-   ti_flags = *ti_flagsp;
+   ti_flags = current_thread_info()->flags;
 
if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) {
if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL {
@@ -255,7 +254,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
ret = _TIF_RESTOREALL;
else
regs->gpr[3] = r3;
-   clear_bits(_TIF_PERSYSCALL_MASK, ti_flagsp);
+   clear_bits(_TIF_PERSYSCALL_MASK, _thread_info()->flags);
} else {
regs->gpr[3] = r3;
}
@@ -268,7 +267,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
local_irq_disable();
 
 again:
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
local_irq_enable();
if (ti_flags & _TIF_NEED_RESCHED) {
@@ -284,7 +283,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
do_notify_resume(regs, ti_flags);
}
local_irq_disable();
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
}
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) {
@@ -339,10 +338,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long 
r3,
 #ifndef CONFIG_PPC_BOOK3E_64 /* BOOK3E not yet using this */
 notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, 
unsigned long msr)
 {
-#ifdef CONFIG_PPC_BOOK3E
-   struct thread_struct *ts = >thread;
-#endif
-   unsigned long *ti_flagsp = _thread_info()->flags;
unsigned long ti_flags;
unsigned long flags;
unsigned long ret = 0;
@@ -365,7 +360,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
local_irq_save(flags);
 
 again:
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
while (unlikely(ti_flags & (_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM))) {
local_irq_enable(); /* returning to user: may enable */
if (ti_flags & _TIF_NEED_RESCHED) {
@@ -376,7 +371,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
do_notify_resume(regs, ti_flags);
}
local_irq_disable();
-   ti_flags = READ_ONCE(*ti_flagsp);
+   ti_flags = READ_ONCE(current_thread_info()->flags);
}
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && IS_ENABLED(CONFIG_PPC_FPU)) {
@@ -407,13 +402,13 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
}
 
 #ifdef CONFIG_PPC_BOOK3E
-   if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
+   if (unlikely(current->thread.debug.dbcr0 & DBCR0_IDM)) {
/*
 * Check to see if the dbcr0 register is set up to debug.
 * Use the internal debug mode bit to do this.
 */
mtmsr(mfmsr() & ~MSR_DE);
-   mtspr(SPRN_DBCR0, ts->debug.dbcr0);
+   mtspr(SPRN_DBCR0, current->thread.debug.dbcr0);
mtspr(SPRN_DBSR, -1);
}
 #endif
@@ -438,7 +433,6 @@ void preempt_schedule_irq(void);
 
 notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, 
unsigned long msr)
 {
-   unsigned long *ti_flagsp = _thread_info()->flags;
unsigned long flags;
unsigned long ret = 0;
 #ifdef CONFIG_PPC64
@@ -461,8 +455,8 @@ notrace