Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-04 Thread Michael Ellerman
Breno Leitao  writes:

> Currently tsk->thread->load_vec and load_fp are not initialized during a
> task creation, which set garbage to these variables (non-zero value).
>
> These variables will be checked later at restore_math() to validate if the
> FP and vectors are being utilized. Since these values might be non-zero,
> the restore_math() will continue to save the FP and vectors even if they
> were never utilized before the userspace application. load_fp and load_vec
> counters will then overflow and the FP and Altivec will be finally
> disabled, but before that condition is reached (counter overflow) several
> context switches restored FP and vector registers without need, causing a
> performance degradation.
>
> Signed-off-by: Breno Leitao 
> Signed-off-by: Gustavo Romero 

Thanks, I tweaked the wording a little and added:

Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: sta...@vger.kernel.org # v4.6+

cheers


Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-04 Thread Breno Leitao
On Sun, Jun 04, 2017 at 11:38:14AM +1000, Anton Blanchard wrote:
> On Sat, 3 Jun 2017 19:42:14 -0300
> Breno Leitao  wrote:
> 
> > Hi Anton,
> > 
> > On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > > Hi Breno,
> > >   
> > > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > > during a task creation, which set garbage to these variables
> > > > (non-zero value).  
> > > 
> > > Nice catch! It seems like we should zero load_tm too though?  
> > 
> > Yes, it seems we need to zero load_tm also, since it does not seem to
> > be zeroed anywhere else.
> > 
> > But I did some tests, and load_tm is always zero after start_thread()
> > is being called.
> > 
> > In fact, start_thread() is being called and pt_regs->load_tm is
> > already zero since the function start.
> > 
> > I also wrote a SystemTap script[1] to investigate it better, and I've
> > never seen a single load_tm != 0 in a my machine. I tested on both
> > POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> > have garbage all the time)
> > 
> > Any idea if this is just occasional event, or, if there is someone
> > zeroing it in an obscure code?
> 
> Quite likely no one uses TM :) Try:

In fact, I had tested with TM[1] and haven't seen any issue, but I was not
calling a nested application (through execve() syscall). Somehow if I
call  "$ ./tm_application ; /bin/true", I do not see a non-zero load_tm
in the new task->thread.

On the other side, I see the corruption with your test case, mainly if I
sleep after 'tbegin.' and before execlp(), giving a chance to have
load_tm incremented, and this value is being inherited in the new
task->thread.

This is obviously wrong, I will send a patch to have it fixed.

Thanks for the guidance!

[1] https://github.com/leitao/htm_torture


Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-03 Thread Anton Blanchard
On Sat, 3 Jun 2017 19:42:14 -0300
Breno Leitao  wrote:

> Hi Anton,
> 
> On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> > Hi Breno,
> >   
> > > Currently tsk->thread->load_vec and load_fp are not initialized
> > > during a task creation, which set garbage to these variables
> > > (non-zero value).  
> > 
> > Nice catch! It seems like we should zero load_tm too though?  
> 
> Yes, it seems we need to zero load_tm also, since it does not seem to
> be zeroed anywhere else.
> 
> But I did some tests, and load_tm is always zero after start_thread()
> is being called.
> 
> In fact, start_thread() is being called and pt_regs->load_tm is
> already zero since the function start.
> 
> I also wrote a SystemTap script[1] to investigate it better, and I've
> never seen a single load_tm != 0 in a my machine. I tested on both
> POWER8 bare metal and KVM guests. (load_vec and load_fp happened to
> have garbage all the time)
> 
> Any idea if this is just occasional event, or, if there is someone
> zeroing it in an obscure code?

Quite likely no one uses TM :) Try:

#include 

int main(void)
{
__builtin_tbegin(0);
execlp("/bin/true", "/bin/true", NULL);
}

Anton


Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-03 Thread Breno Leitao
Hi Anton,

On Sat, Jun 03, 2017 at 08:04:11AM +1000, Anton Blanchard wrote:
> Hi Breno,
> 
> > Currently tsk->thread->load_vec and load_fp are not initialized
> > during a task creation, which set garbage to these variables
> > (non-zero value).
> 
> Nice catch! It seems like we should zero load_tm too though?

Yes, it seems we need to zero load_tm also, since it does not seem to be
zeroed anywhere else.

But I did some tests, and load_tm is always zero after start_thread()
is being called.

In fact, start_thread() is being called and pt_regs->load_tm is already
zero since the function start.

I also wrote a SystemTap script[1] to investigate it better, and I've
never seen a single load_tm != 0 in a my machine. I tested on both
POWER8 bare metal and KVM guests. (load_vec and load_fp happened to have
garbage all the time)

Any idea if this is just occasional event, or, if there is someone
zeroing it in an obscure code?

[1] 
https://github.com/leitao/htm_torture/blob/master/systemtap/load_tm_at_start_thread.stap


[PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-02 Thread Breno Leitao
Currently tsk->thread->load_vec and load_fp are not initialized during a
task creation, which set garbage to these variables (non-zero value).

These variables will be checked later at restore_math() to validate if the
FP and vectors are being utilized. Since these values might be non-zero,
the restore_math() will continue to save the FP and vectors even if they
were never utilized before the userspace application. load_fp and load_vec
counters will then overflow and the FP and Altivec will be finally
disabled, but before that condition is reached (counter overflow) several
context switches restored FP and vector registers without need, causing a
performance degradation.

Signed-off-by: Breno Leitao 
Signed-off-by: Gustavo Romero 
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a9435397eab8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
 #ifdef CONFIG_VSX
current->thread.used_vsr = 0;
 #endif
+   current->thread.load_fp = 0;
memset(>thread.fp_state, 0, sizeof(current->thread.fp_state));
current->thread.fp_save_area = NULL;
 #ifdef CONFIG_ALTIVEC
@@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
current->thread.vr_save_area = NULL;
current->thread.vrsave = 0;
current->thread.used_vr = 0;
+   current->thread.load_vec = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
memset(current->thread.evr, 0, sizeof(current->thread.evr));
-- 
2.11.0



Re: [PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-02 Thread Anton Blanchard
Hi Breno,

> Currently tsk->thread->load_vec and load_fp are not initialized
> during a task creation, which set garbage to these variables
> (non-zero value).

Nice catch! It seems like we should zero load_tm too though?

Acked-by: Anton Blanchard 

Anton

> These variables will be checked later at restore_math() to validate
> if the FP and vectors are being utilized. Since these values might be
> non-zero, the restore_math() will continue to save the FP and vectors
> even if they were never utilized before the userspace application.
> load_fp and load_vec counters will then overflow and the FP and
> Altivec will be finally disabled, but before that condition is
> reached (counter overflow) several context switches restored FP and
> vector registers without need, causing a performance degradation.
> 
> Signed-off-by: Breno Leitao 
> Signed-off-by: Gustavo Romero 
> ---
>  arch/powerpc/kernel/process.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/process.c
> b/arch/powerpc/kernel/process.c index baae104b16c7..a9435397eab8
> 100644 --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) #ifdef CONFIG_VSX
>   current->thread.used_vsr = 0;
>  #endif
> + current->thread.load_fp = 0;
>   memset(>thread.fp_state, 0,
> sizeof(current->thread.fp_state)); current->thread.fp_save_area =
> NULL; #ifdef CONFIG_ALTIVEC
> @@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs,
> unsigned long start, unsigned long sp) current->thread.vr_save_area =
> NULL; current->thread.vrsave = 0;
>   current->thread.used_vr = 0;
> + current->thread.load_vec = 0;
>  #endif /* CONFIG_ALTIVEC */
>  #ifdef CONFIG_SPE
>   memset(current->thread.evr, 0, sizeof(current->thread.evr));



[PATCH] powerpc/kernel: improve FP and vector registers restoration

2017-06-02 Thread Breno Leitao
Currently tsk->thread->load_vec and load_fp are not initialized during a
task creation, which set garbage to these variables (non-zero value).

These variables will be checked later at restore_math() to validate if the
FP and vectors are being utilized. Since these values might be non-zero,
the restore_math() will continue to save the FP and vectors even if they
were never utilized before the userspace application. load_fp and load_vec
counters will then overflow and the FP and Altivec will be finally
disabled, but before that condition is reached (counter overflow) several
context switches restored FP and vector registers without need, causing a
performance degradation.

Signed-off-by: Breno Leitao 
Signed-off-by: Gustavo Romero 
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index baae104b16c7..a9435397eab8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1666,6 +1666,7 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
 #ifdef CONFIG_VSX
current->thread.used_vsr = 0;
 #endif
+   current->thread.load_fp = 0;
memset(>thread.fp_state, 0, sizeof(current->thread.fp_state));
current->thread.fp_save_area = NULL;
 #ifdef CONFIG_ALTIVEC
@@ -1674,6 +1675,7 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
current->thread.vr_save_area = NULL;
current->thread.vrsave = 0;
current->thread.used_vr = 0;
+   current->thread.load_vec = 0;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_SPE
memset(current->thread.evr, 0, sizeof(current->thread.evr));
-- 
2.11.0