Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2017-02-23 Thread Borislav Petkov
On Thu, Feb 23, 2017 at 06:28:01PM +0100, Peter Zijlstra wrote:
> I think Boris and Thomas were talking about moving the entire c1e
> detection out of there.

Yeah, see

07c94a381253 ("x86/amd: Simplify AMD E400 aware idle routine")

tglx and I got rid of reading that MSR in the idle routine.
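
For reference, the idle routine after that commit looks roughly like this
(paraphrased for illustration -- see the commit itself for the exact
code). The MSR read is gone from the idle path, so no MSR tracepoint can
fire from the RCU-idle section anymore:

static void amd_e400_idle(void)
{
	/*
	 * X86_BUG_AMD_APIC_C1E is only established at runtime via the
	 * ACPI handshake, hence a boot_cpu_has_bug() check rather than
	 * a compile-time static one.
	 */
	if (!boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
		default_idle();
		return;
	}

	tick_broadcast_enter();
	default_idle();

	/* Leaving broadcast mode must happen with interrupts disabled. */
	local_irq_disable();
	tick_broadcast_exit();
	local_irq_enable();
}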

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2017-02-23 Thread Peter Zijlstra
On Thu, Feb 23, 2017 at 01:24:34PM +0100, Jiri Olsa wrote:
> On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> > 
> > I think I prefer something like the below, that only annotates the one
> > RDMSR in question, instead of all of them.
> > 
> > 
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 0888a879120f..d6c6aa80675f 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
> > if (!amd_e400_c1e_detected) {
> > u32 lo, hi;
> >  
> > -   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> > +   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
> >  
> > if (lo & K8_INTP_C1E_ACTIVE_MASK) {
> > amd_e400_c1e_detected = true;
> 
> hum, I might have missed some other solution in discussion,
> and can't see this one being pulled in.. should I resend this?
> 

I think Boris and Thomas were talking about moving the entire c1e
detection out of there.
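
For context: RCU_NONIDLE() momentarily makes RCU watch again around the
wrapped statement. In kernels of this vintage it expands to roughly
(simplified):

#define RCU_NONIDLE(a) \
	do { \
		rcu_irq_enter_irqson(); \
		do { a; } while (0); \
		rcu_irq_exit_irqson(); \
	} while (0)

That exits the extended quiescent state around the traced rdmsr(), so
the tracepoint's RCU dereference becomes legal and the lockdep splat
goes away.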


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2017-02-23 Thread Andi Kleen
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 0888a879120f..d6c6aa80675f 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
> > if (!amd_e400_c1e_detected) {
> > u32 lo, hi;
> >  
> > -   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> > +   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
> >  
> > if (lo & K8_INTP_C1E_ACTIVE_MASK) {
> > amd_e400_c1e_detected = true;
> 
> hum, I might have missed some other solution in discussion,
> and can't see this one being pulled in.. should I resend this?

I suppose you could also use the new notrace msr variant.
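
Illustration only -- the helper name below is made up, not necessarily
the variant meant: a notrace read issues RDMSR directly instead of going
through the traced helpers that call do_trace_read_msr(), so nothing on
the idle path touches RCU:

static __always_inline u64 rdmsr_no_trace(unsigned int msr)
{
	unsigned int lo, hi;

	asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
	return ((u64)hi << 32) | lo;
}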

BTW this check is quite annoying.

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2017-02-23 Thread Jiri Olsa
On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> > hi,
> > Jan hit following output when msr tracepoints are enabled on amd server:
> > 
> > [   91.585653] ===
> > [   91.589840] [ INFO: suspicious RCU usage. ]
> > [   91.594025] 4.9.0-rc1+ #1 Not tainted
> > [   91.597691] ---
> > [   91.601877] ./arch/x86/include/asm/msr-trace.h:42 suspicious rcu_dereference_check() usage!
> > [   91.610222] 
> > [   91.610222] other info that might help us debug this:
> > [   91.610222] 
> > [   91.618224] 
> > [   91.618224] RCU used illegally from idle CPU!
> > [   91.618224] rcu_scheduler_active = 1, debug_locks = 0
> > [   91.629081] RCU used illegally from extended quiescent state!
> > [   91.634820] no locks held by swapper/1/0.
> > [   91.638832] 
> > [   91.638832] stack backtrace:
> > [   91.643192] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc1+ #1
> > [   91.649457] Hardware name: empty empty/S3992, BIOS 'V2.03   ' 05/09/2008
> > [   91.656159]  c900018fbdf8 813ed43c 88017ede8000 0001
> > [   91.663637]  c900018fbe28 810fdcd7 880233f95dd0 c0010055
> > [   91.671107]    c900018fbe58 814297ac
> > [   91.678560] Call Trace:
> > [   91.681022]  [] dump_stack+0x85/0xc9
> > [   91.686164]  [] lockdep_rcu_suspicious+0xe7/0x120
> > [   91.692429]  [] do_trace_read_msr+0x14c/0x1b0
> > [   91.698349]  [] native_read_msr+0x32/0x40
> > [   91.703921]  [] amd_e400_idle+0x7e/0x110
> > [   91.709407]  [] arch_cpu_idle+0xf/0x20
> > [   91.714720]  [] default_idle_call+0x23/0x40
> > [   91.720467]  [] cpu_startup_entry+0x1da/0x2b0
> > [   91.726387]  [] start_secondary+0x17f/0x1f0
> > 
> > 
> > it got away with attached change.. but this rcu logic
> > is far beyond me, so it's just a wild guess.. ;-)
> 
> I think I prefer something like the below, that only annotates the one
> RDMSR in question, instead of all of them.
> 
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0888a879120f..d6c6aa80675f 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
>   if (!amd_e400_c1e_detected) {
>   u32 lo, hi;
>  
> - rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> + RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
>  
>   if (lo & K8_INTP_C1E_ACTIVE_MASK) {
>   amd_e400_c1e_detected = true;

hum, I might have missed some other solution in discussion,
and can't see this one being pulled in.. should I resend this?

thanks,
jirka
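
(For readers following along: the guess is in the right neighborhood.
Reduced to its essentials -- this is a simplification, not the exact
kernel source -- a tracepoint fetches its callback list like so:

	it_func_ptr = rcu_dereference_sched(tp->funcs);

and under CONFIG_PROVE_RCU that dereference effectively performs

	RCU_LOCKDEP_WARN(!rcu_is_watching(),
			 "suspicious rcu_dereference_check() usage");

The idle loop is an RCU extended quiescent state, where
rcu_is_watching() is false -- hence the msr-trace.h:42 splat above.)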


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-30 Thread Thomas Gleixner
On Wed, 30 Nov 2016, Borislav Petkov wrote:
> On Wed, Nov 30, 2016 at 09:54:58AM +0100, Thomas Gleixner wrote:
> > Right, that's the safe bet. But I'm quite sure that the C1E crap only
> > starts to work _after_ ACPI initialization.
> 
> Yap, I think it is an ACPI decision whether to enter C1E or not. And all
> those boxes which are unaffected - they actually are but since the FW
> doesn't enter C1E, they don't hit the bug.
> 
> > tick_force_broadcast() is irreversible
> 
> So if I let the cores in that small window use amd_e400_idle() and
> then when I detect the machine doesn't enter C1E after all, I do
> tick_broadcast_exit() on all cores in amd_e400_c1e_mask, then clear it
> and free it, that would work?
> 
> Or do you see a better solution?

Start with the bug cleared and select amd_e400_idle(), which will then use
default_idle() and not do any of the broadcast crap.

After ACPI gets initialized check the C1E misfeature and if it's detected,
then set the CPU BUG and send an IPI to all online CPUs.

I think this is safe because the CPU running the ACPI initialization is not
in idle and if any of the other CPUs would be stuck due to the C1E
enablement, then the IPI will kick them out.

At least worth a try.
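
Sketched in code, that flow would be something like the below (function
names and the IPI mechanism are invented for illustration; this is not a
merged patch):

static void amd_e400_kick(void *unused)
{
	/* Empty: the IPI itself kicks the CPU out of halt. */
}

/* Run once, after acpi_init() has completed. */
static void amd_e400_c1e_detect(void)
{
	u32 lo, hi;

	rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
	if (!(lo & K8_INTP_C1E_ACTIVE_MASK))
		return;	/* firmware never enters C1E, keep default_idle() */

	/* Flag the erratum so the idle routine starts handling it ... */
	set_cpu_bug(&boot_cpu_data, X86_BUG_AMD_APIC_C1E);

	/* ... and IPI all other online CPUs so none stays stuck in an
	 * idle state entered before the bug bit was set. */
	smp_call_function(amd_e400_kick, NULL, 0);
}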

Thanks,

tglx


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-30 Thread Borislav Petkov
On Wed, Nov 30, 2016 at 09:54:58AM +0100, Thomas Gleixner wrote:
> Right, that's the safe bet. But I'm quite sure that the C1E crap only
> starts to work _after_ ACPI initialization.

Yap, I think it is an ACPI decision whether to enter C1E or not. And all
those boxes which are unaffected - they actually are affected, but since
the FW doesn't enter C1E, they don't hit the bug.

> tick_force_broadcast() is irreversible

So if I let the cores in that small window use amd_e400_idle() and
then when I detect the machine doesn't enter C1E after all, I do
tick_broadcast_exit() on all cores in amd_e400_c1e_mask, then clear it
and free it, that would work?

Or do you see a better solution?

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-30 Thread Thomas Gleixner
On Wed, 30 Nov 2016, Borislav Petkov wrote:
> On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> > The issue is that you obviously start with the assumption that the machine
> > has this bug. As a consequence the machine is brute forced into tick
> > broadcast mode, which cannot be reverted when you clear that misfeature
> > after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> > forced into broadcast mode, which is not what you want.
> > 
> > As far as I understood the whole magic, this C1E misfeature takes only
> > effect _after_ ACPI has been initialized. So instead of setting the bug in
> > early boot and therefore forcing the broadcast nonsense, we should only set
> > it when ACPI has actually detected it.
> 
> Problem is, select_idle_routine() runs a lot earlier than acpi_init() so
> there's a window where we don't definitively know yet whether the box is
> actually going to enter C1E or not.
> 
>   [ I presume the reason why we have to do the proper detection after
> ACPI has been initialized is because the frickelware decides whether
> to do C1E entry or not and then sets those bits in the MSR (or not). ]
> 
> If in that window we enter idle and we're on an affected machine and we
> *don't* switch to broadcast mode, we risk not waking up from C1E, i.e.,
> the main reason this fix was even done.
> 
> So, if we "prematurely" switch to broadcast mode on the affected CPUs,
> we're ok, it will be detected properly later and we're in broadcast
> mode already.

Right, that's the safe bet. But I'm quite sure that the C1E crap only
starts to work _after_ ACPI initialization.

> Now, on those machines which are not affected and we clear
> X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking
> of maybe doing amd_e400_remove_cpu() and clearing that e400 mask and
> even freeing it so that they can do default_idle().
> 
> But you're saying tick_broadcast_enter() is irreversible?

tick_force_broadcast() is irreversible

Thanks,

tglx



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-30 Thread Borislav Petkov
On Tue, Nov 29, 2016 at 02:59:01PM +0100, Thomas Gleixner wrote:
> The issue is that you obviously start with the assumption that the machine
> has this bug. As a consequence the machine is brute forced into tick
> broadcast mode, which cannot be reverted when you clear that misfeature
> after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
> forced into broadcast mode, which is not what you want.
> 
> As far as I understood the whole magic, this C1E misfeature takes only
> effect _after_ ACPI has been initialized. So instead of setting the bug in
> early boot and therefore forcing the broadcast nonsense, we should only set
> it when ACPI has actually detected it.

Problem is, select_idle_routine() runs a lot earlier than acpi_init() so
there's a window where we don't definitively know yet whether the box is
actually going to enter C1E or not.

  [ I presume the reason why we have to do the proper detection after
ACPI has been initialized is because the frickelware decides whether
to do C1E entry or not and then sets those bits in the MSR (or not). ]

If in that window we enter idle and we're on an affected machine and we
*don't* switch to broadcast mode, we risk not waking up from C1E, i.e.,
the main reason this fix was even done.

So, if we "prematurely" switch to broadcast mode on the affected CPUs,
we're ok, it will be detected properly later and we're in broadcast
mode already.

Now, on those machines which are not affected and we clear
X86_BUG_AMD_APIC_C1E because they don't enter C1E at all, I was thinking
of maybe doing amd_e400_remove_cpu() and clearing that e400 mask and
even freeing it so that they can do default_idle().

But you're saying tick_broadcast_enter() is irreversible?

Thanks.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-29 Thread Jiri Olsa
On Tue, Nov 29, 2016 at 02:16:49PM +0100, Borislav Petkov wrote:
> On Mon, Nov 21, 2016 at 05:06:54PM +0100, Borislav Petkov wrote:
> > IOW, what's the worst thing that can happen if we did this below?
> > 
> > We basically get rid of the detection and switch the timer to broadcast
> > mode immediately on the halting CPU.
> > 
> > amd_e400_idle() is behind an "if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))"
> > check so it will run on the affected CPUs only...
> > 
> > Thoughts?
> 
> Actually, here's a better version. The E400 detection works only after
> ACPI has been enabled so we piggyback the end of acpi_init().
> 
> We don't need the MSR read now - we do
> 
>   if (static_cpu_has_bug(X86_BUG_AMD_APIC_C1E))
> 
> on the idle path which is as fast as it gets.
> 
> Any complaints about this before I go and test it everywhere?
> 
> It builds and boots in my guest here ok, not that it means a whole lot.
> 
> The good news is, I have collected a fleet of boxes which all have that
> erratum so testing should be pretty reliable. Something that doesn't
happen every time!

+1, one test server over here.. will check ;-)

jirka


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-29 Thread Thomas Gleixner
On Tue, 29 Nov 2016, Borislav Petkov wrote:
> On Mon, Nov 21, 2016 at 05:06:54PM +0100, Borislav Petkov wrote:
> > IOW, what's the worst thing that can happen if we did this below?
> > 
> > We basically get rid of the detection and switch the timer to broadcast
> > mode immediately on the halting CPU.
> > 
> > amd_e400_idle() is behind an "if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))"
> > check so it will run on the affected CPUs only...
> > 
> > Thoughts?
> 
> Actually, here's a better version. The E400 detection works only after
> ACPI has been enabled so we piggyback the end of acpi_init().
> 
> We don't need the MSR read now - we do
> 
>   if (static_cpu_has_bug(X86_BUG_AMD_APIC_C1E))
> 
> on the idle path which is as fast as it gets.
> 
> Any complaints about this before I go and test it everywhere?

The issue is that you obviously start with the assumption that the machine
has this bug. As a consequence the machine is brute forced into tick
broadcast mode, which cannot be reverted when you clear that misfeature
after ACPI init. So in case of !NOHZ and !HIGHRES the periodic tick is
forced into broadcast mode, which is not what you want.

As far as I understood the whole magic, this C1E misfeature takes only
effect _after_ ACPI has been initialized. So instead of setting the bug in
early boot and therefore forcing the broadcast nonsense, we should only set
it when ACPI has actually detected it.

Thanks,

tglx


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-29 Thread Borislav Petkov
On Mon, Nov 21, 2016 at 05:06:54PM +0100, Borislav Petkov wrote:
> IOW, what's the worst thing that can happen if we did this below?
> 
> We basically get rid of the detection and switch the timer to broadcast
> mode immediately on the halting CPU.
> 
> amd_e400_idle() is behind an "if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))"
> check so it will run on the affected CPUs only...
> 
> Thoughts?

Actually, here's a better version. The E400 detection works only after
ACPI has been enabled so we piggyback the end of acpi_init().

We don't need the MSR read now - we do

if (static_cpu_has_bug(X86_BUG_AMD_APIC_C1E))

on the idle path which is as fast as it gets.

Any complaints about this before I go and test it everywhere?

It builds and boots in my guest here ok, not that it means a whole lot.

The good news is, I have collected a fleet of boxes which all have that
erratum so testing should be pretty reliable. Something that doesn't
happen every time!

:-)

---
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 5391b0ae7cc3..aa8c879e814f 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -94,7 +94,8 @@ static inline unsigned int acpi_processor_cstate_check(unsigned int max_cstate)
boot_cpu_data.x86_model <= 0x05 &&
boot_cpu_data.x86_mask < 0x0A)
return 1;
-   else if (amd_e400_c1e_detected)
+
+   else if (boot_cpu_has(X86_BUG_AMD_APIC_C1E))
return 1;
else
return max_cstate;
@@ -131,6 +132,7 @@ static inline bool acpi_has_cpu_in_madt(void)
return !!acpi_lapic;
 }
 
+extern void __init __weak arch_post_acpi_init(void);
 #else /* !CONFIG_ACPI */
 
 #define acpi_lapic 0
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 1d2b69fc0ceb..b3918609f6f1 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -204,6 +204,7 @@ static __always_inline __pure bool _static_cpu_has(u16 bit)
 
 #define static_cpu_has_bug(bit)	static_cpu_has((bit))
 #define boot_cpu_has_bug(bit)	cpu_has_bug(&boot_cpu_data, (bit))
+#define setup_clear_cpu_bug(bit)   setup_clear_cpu_cap((bit))
 
 #define MAX_CPU_FEATURES   (NCAPINTS * 32)
 #define cpu_have_feature   boot_cpu_has
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 1f6a92903b09..d98c5c432209 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -636,7 +636,6 @@ extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void init_amd_e400_c1e_mask(void);
 
 extern unsigned long   boot_option_idle_override;
-extern bool	amd_e400_c1e_detected;
 
 enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
 IDLE_POLL};
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ee023919e476..56653b7322f9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -289,9 +289,6 @@ void stop_this_cpu(void *dummy)
halt();
 }
 
-bool amd_e400_c1e_detected;
-EXPORT_SYMBOL(amd_e400_c1e_detected);
-
 static cpumask_var_t amd_e400_c1e_mask;
 
 void amd_e400_remove_cpu(int cpu)
@@ -301,26 +298,14 @@ void amd_e400_remove_cpu(int cpu)
 }
 
 /*
- * AMD Erratum 400 aware idle routine. We check for C1E active in the interrupt
- * pending message MSR. If we detect C1E, then we handle it the same
- * way as C3 power states (local apic timer and TSC stop)
+ * AMD Erratum 400 aware idle routine.
+ *
+ *
+ * We handle it the same way as C3 power states (local apic timer and TSC stop)
  */
 static void amd_e400_idle(void)
 {
-   if (!amd_e400_c1e_detected) {
-   u32 lo, hi;
-
-   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
-
-   if (lo & K8_INTP_C1E_ACTIVE_MASK) {
-   amd_e400_c1e_detected = true;
-   if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
-   mark_tsc_unstable("TSC halt in AMD C1E");
-   pr_info("System has AMD C1E enabled\n");
-   }
-   }
-
-   if (amd_e400_c1e_detected) {
+   if (static_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
int cpu = smp_processor_id();
 
if (!cpumask_test_cpu(cpu, amd_e400_c1e_mask)) {
@@ -419,6 +404,30 @@ void __init init_amd_e400_c1e_mask(void)
	zalloc_cpumask_var(&amd_e400_c1e_mask, GFP_KERNEL);
 }
 
+void __init arch_post_acpi_init(void)
+{
+   u32 lo, hi;
+
+   if (!boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E))
+   return;
+
+   /*
+* AMD E400 detection needs to happen *after* ACPI has been enabled. We
+* check for C1E active in the interrupt pending message MSR. If we detect
+* C1E enabled, machine is affected.
+*/
+   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+
+   if (!(lo & 


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-28 Thread Andi Kleen
> Well, actually it's been optimized a lot from the first instances.
> Note, tracepoints need to be quite generic, so where it can be optimized
> is not obvious.

There is quite a bit of stuff that doesn't change from trace point to
trace point. So you could just cache all these decisions into a single
per cpu variable, and invalidate if anything changes the conditions.
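
A minimal sketch of that caching scheme (all names invented): collapse
the rarely-changing per-event checks into one per-CPU flag which the
slow and configuration paths invalidate:

static DEFINE_PER_CPU(bool, trace_fastpath_ok);

/* Call from any path that changes filters, triggers, buffers, ... */
static void trace_fastpath_invalidate(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		per_cpu(trace_fastpath_ok, cpu) = false;
}

/* Hot path: one test instead of many separate flag/pointer checks. */
static bool trace_fastpath_usable(void)
{
	return this_cpu_read(trace_fastpath_ok);
}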

> > 
> > Functions with # of instructions:
> > 
> > 25 trace_event_raw_event_sched_switch
> 
> This is from the TRACE_EVENT macro.
> 
> if (trace_trigger_soft_disabled(trace_file))\
> return; \

Cache?

> if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
> trace_event_ignore_this_pid(trace_file))
> return NULL;
> 
> *current_rb = trace_file->tr->trace_buffer.buffer;

Indirection could be cached.

> 
> if ((trace_file->flags &
>  (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
> (entry = this_cpu_read(trace_buffered_event))) {

Multiple tests of trace_file->flags in different functions
could all be folded into a single test.

> 
> 
> Note: Some of these if statements can be encapsulated with jump labels, as 
> they
> are false pretty much all of the time.
> 
> 
> /* Try to use the per cpu buffer first */
> val = this_cpu_inc_return(trace_buffered_event_cnt);
> if (val == 1) {
> trace_event_setup(entry, type, flags, pc);
> entry->array[0] = len;
> return entry;
> }
> this_cpu_dec(trace_buffered_event_cnt);

This can be pre cached in a fast path. If true just point per cpu
variable directly to buffer and invalidate if buffer changes or is full.

> preempt_disable_notrace();
> 
> if (unlikely(atomic_read(&buffer->record_disabled)))
> goto out;

Cache. 
> 
> cpu = raw_smp_processor_id();
> 
> if (unlikely(!cpumask_test_cpu(cpu, buffer->cpumask)))
> goto out;

Not needed for per cpu buffers in a fast path.

> 
> cpu_buffer = buffer->buffers[cpu];
> 
> if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
> goto out;
cache

> 
> if (unlikely(length > BUF_MAX_DATA_SIZE))
> goto out;
> 
> if (unlikely(trace_recursive_lock(cpu_buffer)))
> goto out;

locking should invalidate state on this cpu, so shouldn't be needed

> u64 diff;
> 
> rb_start_commit(cpu_buffer);
> 
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
> /*
>  * Due to the ability to swap a cpu buffer from a buffer
>  * it is possible it was swapped before we committed.
>  * (committing stops a swap). We check for it here and
>  * if it happened, we have to fail the write.
>  */
> barrier();
> if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
> local_dec(&cpu_buffer->committing);
> local_dec(&cpu_buffer->commits);
> return NULL;
> }

swapping should invalidate state centrally (and then run slow path like
current path)

> /*
>  * We allow for interrupts to reenter here and do a trace.
>  * If one does, it will cause this original code to loop
>  * back here. Even with heavy interrupts happening, this
>  * should only happen a few times in a row. If this happens
>  * 1000 times in a row, there must be either an interrupt
>  * storm or we have something buggy.
>  * Bail!
>  */
> if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
> goto out_fail;

didn't you disable interrupts earlier?

> /* Did the write stamp get updated already? */
> if (likely(info.ts >= cpu_buffer->write_stamp)) {
> info.delta = diff;
> if (unlikely(test_time_stamp(info.delta)))
> rb_handle_timestamp(cpu_buffer, &info);
> }

Not sure why a write stamp is needed. isn't that the last few entries?

...

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-28 Thread Steven Rostedt
On Mon, 28 Nov 2016 13:48:25 -0800
Andi Kleen  wrote:

> > I took a look at this and forced some more functions to be inlined. I
> > did a little tweaking here and there. Could you pull my tree and see if
> > things are better?  I don't currently have the hardware to run this
> > myself.
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> > 
> >   branch: ftrace/core  
> 
> Didn't help much unfortunately. I think you need a real fast path here.

Well, actually it's been optimized a lot from the first instances.
Note, tracepoints need to be quite generic, so where it can be optimized
is not obvious.

> 
> http://halobates.de/tracepoint-trace2
> 
> $ wc -l tracepoint-trace2
> 595 tracepoint-trace2
> 
> Opcode Histogram: 
> 
> 201 mov
>  51 push
>  51 pop
>  34 test
>  32 cmp
>  26 jz
>  24 jnz
>  24 and
>  16 ret
>  16 lea
>  15 call
>  12 add
>  10 jmp
> 
> Functions with # of instructions:
> 
> 25 trace_event_raw_event_sched_switch

This is from the TRACE_EVENT macro.

static notrace void \
trace_event_raw_event_##call(void *__data, proto)   \
{   \
struct trace_event_file *trace_file = __data;   \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
struct trace_event_buffer fbuffer;  \
struct trace_event_raw_##call *entry;   \
int __data_size;\
\
if (trace_trigger_soft_disabled(trace_file))\
return; \
\
__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
\
entry = trace_event_buffer_reserve(&fbuffer, trace_file,\
 sizeof(*entry) + __data_size); \


>  19 trace_event_buffer_reserve

void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
 struct trace_event_file *trace_file,
 unsigned long len)
{
struct trace_event_call *event_call = trace_file->event_call;

if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
trace_event_ignore_this_pid(trace_file))
return NULL;

local_save_flags(fbuffer->flags);
fbuffer->pc = preempt_count();
/*
 * If CONFIG_PREEMPT is enabled, then the tracepoint itself disables
 * preemption (adding one to the preempt_count). Since we are
 * interested in the preempt_count at the time the tracepoint was
 * hit, we need to subtract one to offset the increment.
 */
if (IS_ENABLED(CONFIG_PREEMPT))
fbuffer->pc--;
fbuffer->trace_file = trace_file;

fbuffer->event =
trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
event_call->event.type, len,
fbuffer->flags, fbuffer->pc);


>  21 trace_event_buffer_lock_reserve

struct ring_buffer_event *
trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
  struct trace_event_file *trace_file,
  int type, unsigned long len,
  unsigned long flags, int pc)
{
struct ring_buffer_event *entry;
int val;

*current_rb = trace_file->tr->trace_buffer.buffer;

if ((trace_file->flags &
 (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
(entry = this_cpu_read(trace_buffered_event))) {


Note: Some of these if statements can be encapsulated with jump labels, as they
are false pretty much all of the time.
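
A minimal sketch of that jump-label pattern (the key name is invented;
the slow/configuration paths would flip it with static_branch_enable()
and static_branch_disable()):

static DEFINE_STATIC_KEY_FALSE(event_file_special);

static bool event_needs_slow_path(void)
{
	/*
	 * Compiles to a patched-out NOP on the hot path until a filter,
	 * trigger or pid filter is installed and enables the key.
	 */
	return static_branch_unlikely(&event_file_special);
}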


/* Try to use the per cpu buffer first */
val = this_cpu_inc_return(trace_buffered_event_cnt);
if (val == 1) {
trace_event_setup(entry, type, flags, pc);
entry->array[0] = len;
return entry;
}
this_cpu_dec(trace_buffered_event_cnt);
}

entry = trace_buffer_lock_reserve(*current_rb,
 type, len, flags, pc);


>  57 ring_buffer_lock_reserve

struct ring_buffer_event *
ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
int cpu;

/* If we are tracing schedule, we don't want to recurse */
preempt_disable_notrace();


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-28 Thread Andi Kleen
> I took a look at this and forced some more functions to be inlined. I
> did a little tweaking here and there. Could you pull my tree and see if
> things are better?  I don't currently have the hardware to run this
> myself.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> 
>   branch: ftrace/core

Didn't help much unfortunately. I think you need a real fast path here.

http://halobates.de/tracepoint-trace2

$ wc -l tracepoint-trace2
595 tracepoint-trace2

Opcode Histogram: 

201 mov
 51 push
 51 pop
 34 test
 32 cmp
 26 jz
 24 jnz
 24 and
 16 ret
 16 lea
 15 call
 12 add
 10 jmp

Functions with # of instructions:

25 trace_event_raw_event_sched_switch
 19 trace_event_buffer_reserve
 21 trace_event_buffer_lock_reserve
 57 ring_buffer_lock_reserve
  3 trace_clock_local
  3 sched_clock
 24 native_sched_clock
  3 sched_clock
  2 trace_clock_local
 11 ring_buffer_lock_reserve
 81 __rb_reserve_next
 11 ring_buffer_lock_reserve
  6 trace_event_buffer_lock_reserve
 18 ring_buffer_event_data
  5 trace_event_buffer_lock_reserve
 38 tracing_generic_entry_update
 11 trace_event_buffer_lock_reserve
  5 trace_event_buffer_reserve
 18 ring_buffer_event_data
  4 trace_event_buffer_reserve
 28 trace_event_raw_event_sched_switch
 30 trace_event_buffer_commit
 19 trace_buffer_unlock_commit_regs
 11 ring_buffer_unlock_commit
 85 rb_commit
 18 ring_buffer_unlock_commit
 10 trace_buffer_unlock_commit_regs
  3 ftrace_trace_userstack
  7 trace_buffer_unlock_commit_regs
 11 trace_event_buffer_commit
  8 trace_event_raw_event_sched_switch

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-23 Thread Steven Rostedt
On Mon, 21 Nov 2016 10:37:00 -0800
Andi Kleen  wrote:

> > It tries to be optimized. I "unoptimized" it a while back to pull out
> > all the inlines that were done in the tracepoint itself. That is, the
> > trace_() function is inlined in the code itself. By
> > breaking that up a bit, I was able to save a bunch of text because the
> > tracepoints were bloating the kernel tremendously.  
> 
> Just adding a few inlines won't fix the gigantic bloat that is currently
> there. See the PT trace I posted earlier (it was even truncated, it's
> actually worse). Just a single enabled trace point took about a us.
> 
> POPF can cause some serializion but it won't be more than a few tens
> of cycles, which would be a few percent at best.
> 
> Here is it again untruncated:
> 
> http://halobates.de/tracepoint-trace
> 

I took a look at this and forced some more functions to be inlined. I
did a little tweaking here and there. Could you pull my tree and see if
things are better?  I don't currently have the hardware to run this
myself.

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git

  branch: ftrace/core


Thanks!

-- Steve



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-22 Thread Andi Kleen
On Tue, Nov 22, 2016 at 09:39:45AM -0500, Steven Rostedt wrote:
> On Mon, 21 Nov 2016 10:37:00 -0800
> Andi Kleen  wrote:
> 
> 
> > Here it is again, untruncated:
> > 
> > http://halobates.de/tracepoint-trace
> 
> BTW, what tool did you use to generate this?

perf and Intel PT, with the disassembler patches added and udis86
installed:

https://git.kernel.org/cgit/linux/kernel/git/ak/linux-misc.git/log/?h=perf/disassembler-2
http://udis86.sourceforge.net/

You need a CPU that supports PT, like Broadwell, Skylake, Goldmont.

# enable trace point
perf record -e intel_pt// -a sleep X
perf script --itrace=i0ns --ns -F cpu,ip,time,asm,sym

On Skylake you can also get more accurate timing by enabling cycle timing
(at the cost of some overhead)

perf record -e intel_pt/cyc=1,cyc_thresh=1/ -a ...

The earlier trace didn't have that enabled.

-Andi



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-22 Thread Steven Rostedt
On Mon, 21 Nov 2016 10:37:00 -0800
Andi Kleen  wrote:


> Here it is again, untruncated:
> 
> http://halobates.de/tracepoint-trace

BTW, what tool did you use to generate this?

-- Steve


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-22 Thread Paul E. McKenney
On Mon, Nov 21, 2016 at 12:44:45PM -0800, Andi Kleen wrote:
> > > http://halobates.de/tracepoint-trace
> > 
> > There's a lot of push and pop regs due to function calling. There's
> > places that inlines can still improve things, and perhaps even some
> > likely unlikelys well placed.
> 
> Assuming you avoid all the push/pop and all the call/ret this would only be
> ~25% of the total instructions. There is just far too much logic and
> computation in there.
> 
> % awk ' { print $5 } '  tracepoint-trace | sort | uniq -c | sort -rn
> 222 mov
>  57 push
>  57 pop
>  35 test
>  34 cmp
>  32 and
>  28 jz
>  25 jnz
>  21 ret
>  20 call
>  16 lea
>  11 add

Hmmm...  It does indeed look like some performance analysis would be good...

Thanx, Paul



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Andi Kleen
> > http://halobates.de/tracepoint-trace
> 
> There's a lot of push and pop regs due to function calling. There's
> places that inlines can still improve things, and perhaps even some
> likely unlikelys well placed.

Assuming you avoid all the push/pop and all the call/ret this would only be
~25% of the total instructions. There is just far too much logic and
computation in there.

% awk ' { print $5 } '  tracepoint-trace | sort | uniq -c | sort -rn
222 mov
 57 push
 57 pop
 35 test
 34 cmp
 32 and
 28 jz
 25 jnz
 21 ret
 20 call
 16 lea
 11 add

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Paul E. McKenney
On Mon, Nov 21, 2016 at 01:24:38PM -0500, Steven Rostedt wrote:
> 
> Paul,
> 
> 
> On Mon, 21 Nov 2016 12:55:01 -0500
> Steven Rostedt  wrote:
> 
> > On Mon, 21 Nov 2016 18:18:53 +0100
> > Peter Zijlstra  wrote:
> > 
> > > Its not ftrace as such though, its RCU, ftrace simply uses RCU to avoid
> > > locking, as one does.  
> > 
> > Just to be clear, as ftrace in the kernel mostly represents function
> > tracing, which doesn't use RCU. This is a tracepoint feature.
> > 
> > > 
> > > Biggest objection would be that the rcu_irq_enter_irqson() thing does
> > > POPF and rcu_irq_exit_irqson() does again. So wrapping every tracepoint
> > > with that is quite a few cycles.  
> > 
> > Agree. Even though this ends up being a whack-a-mole(TM) fix, I'm not
> > concerned enough to put a heavy weight rcu idle code in for all
> > tracepoints.
> > 
> > Although, what about a percpu flag that can be checked in the
> > tracepoint code to see if it should enable RCU or not?
> > 
> > Hmm, I wonder if "rcu_is_watching()" is light enough to have in all
> > tracepoints?
> 
> Is it possible to make rcu_is_watching() an inlined call to prevent the
> overhead of doing a function call? This way we could use this in the
> fast path of the tracepoint.

It would mean exposing the rcu_dynticks structure to the rest of the
kernel, but I guess that wouldn't be the end of the world.  Are you
calling rcu_is_watching() or __rcu_is_watching()?  The latter is
appropriate if preemption is already disabled.
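
As a strawman, an inline variant might look like this (sketch only: it
assumes the dynticks counter can be exported, and rcu_dynticks_pub /
rcu_is_watching_fast are made-up names, not the actual interface):

#include <linux/atomic.h>
#include <linux/percpu.h>

/* odd while RCU watches this CPU, even while it idles (dynticks style) */
struct rcu_dynticks_pub {
	atomic_t dynticks;
};
DECLARE_PER_CPU(struct rcu_dynticks_pub, rcu_dynticks_pub);

/* caller must have preemption disabled, as with __rcu_is_watching() */
static inline bool rcu_is_watching_fast(void)
{
	return atomic_read(&this_cpu_ptr(&rcu_dynticks_pub)->dynticks) & 0x1;
}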

Thanx, Paul



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 10:37:00 -0800
Andi Kleen  wrote:


> Just adding a few inlines won't fix the gigantic bloat that is currently
> there. See the PT trace I posted earlier (it was even truncated, it's
> actually worse). Just a single enabled trace point took about a us.

If you want to play with that, enable "Add tracepoint that benchmarks
tracepoints", and enable /sys/kernel/debug/tracing/events/benchmark

I just did on my box, and I have a max of 1.27 us, average 0.184 us.

Thus, on average, a tracepoint takes 184 nanoseconds.

Note, a cold-cache tracepoint (the first one traced, which is not
saved in the "max") was 3.3 us.

Yes it can still be improved, and I do spend time doing that.

> 
> POPF can cause some serializion but it won't be more than a few tens
> of cycles, which would be a few percent at best.
> 
> Here is it again untruncated:
> 
> http://halobates.de/tracepoint-trace

There's a lot of push and pop regs due to function calling. There's
places that inlines can still improve things, and perhaps even some
likely unlikelys well placed.

-- Steve


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 10:37:00 -0800
Andi Kleen  wrote:

> Ok so how should tracing in idle code work then in your opinion?

As I suggested already: if we can get a lightweight rcu_is_watching()
then we can do the rcu_idle work when needed, and not when we don't
need it. Sure, this will add more branches, but that's still cheaper
than popf.
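
Roughly this shape, i.e. pay for the irqson dance only when RCU is not
already watching (sketch; trace_event_guarded() and emit_event() are
illustrative names, not existing interfaces):

#include <linux/rcupdate.h>

static inline void trace_event_guarded(void (*emit_event)(void))
{
	/* cheap percpu test; true everywhere except idle/offline contexts */
	bool rcuidle = !rcu_is_watching();

	if (rcuidle)
		rcu_irq_enter_irqson();		/* slow path, costs a POPF */

	emit_event();				/* the normal tracepoint body */

	if (rcuidle)
		rcu_irq_exit_irqson();		/* slow path, costs a POPF */
}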

-- Steve



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Andi Kleen
> It tries to be optimized. I "unoptimized" it a while back to pull out
> all the inlines that were done in the tracepoint itself. That is, the
> trace_<tracepoint>() function is inlined in the code itself. By
> breaking that up a bit, I was able to save a bunch of text because the
> tracepoints were bloating the kernel tremendously.

Just adding a few inlines won't fix the gigantic bloat that is currently
there. See the PT trace I posted earlier (it was even truncated, it's
actually worse). Just a single enabled trace point took about a us.

POPF can cause some serialization but it won't be more than a few tens
of cycles, which would be a few percent at best.

Here it is again, untruncated:

http://halobates.de/tracepoint-trace

$ wc -l tracepoint-trace 
640 tracepoint-trace
$ head -1 tracepoint-trace
[001] 264171.903577780:  810d6040 trace_event_raw_event_sched_switch  push %rbp
$ tail -1 tracepoint-trace
[001] 264171.903578780:  810d6117 trace_event_raw_event_sched_switch  ret


> There can be more optimization done too. But just because it's not
> optimized to the best it can be (which should be our goal) is no
> excuse to bloat it more with popf!

Ok so how should tracing in idle code work then in your opinion?

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt

Paul,


On Mon, 21 Nov 2016 12:55:01 -0500
Steven Rostedt  wrote:

> On Mon, 21 Nov 2016 18:18:53 +0100
> Peter Zijlstra  wrote:
> 
> > Its not ftrace as such though, its RCU, ftrace simply uses RCU to avoid
> > locking, as one does.  
> 
> Just to be clear, as ftrace in the kernel mostly represents function
> tracing, which doesn't use RCU. This is a tracepoint feature.
> 
> > 
> > Biggest objection would be that the rcu_irq_enter_irqson() thing does
> > POPF and rcu_irq_exit_irqson() does again. So wrapping every tracepoint
> > with that is quite a few cycles.  
> 
> Agree. Even though this ends up being a whack-a-mole(TM) fix, I'm not
> concerned enough to put a heavy weight rcu idle code in for all
> tracepoints.
> 
> Although, what about a percpu flag that can be checked in the
> tracepoint code to see if it should enable RCU or not?
> 
> Hmm, I wonder if "rcu_is_watching()" is light enough to have in all
> tracepoints?

Is it possible to make rcu_is_watching() an inlined call to prevent the
overhead of doing a function call? This way we could use this in the
fast path of the tracepoint.

-- Steve



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 10:06:54 -0800
Andi Kleen  wrote:

> > And a popf can be much more expensive than any of these. You should
> > know, not all instructions are equal.
> > 
> > Using perf, I've seen popf take up almost 30% of a function the size of
> > this.  
> 
> In any case it's a small fraction of the 600+ instructions which are currently
> executed for every enabled trace point.
> 
> If ftrace was actually optimized code this would make some sense, but it
> clearly isn't ...

It tries to be optimized. I "unoptimized" it a while back to pull out
all the inlines that were done in the tracepoint itself. That is, the
trace_<tracepoint>() function is inlined in the code itself. By
breaking that up a bit, I was able to save a bunch of text because the
tracepoints were bloating the kernel tremendously.
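
For reference, the stub that stays inlined at each call site is roughly
this shape (simplified sketch of what the tracepoint macros generate;
do_trace_sched_switch() here stands in for the out-of-line body):

/* sketch of the DECLARE_TRACE()/TRACE_EVENT() output, simplified */
static inline void trace_sched_switch(bool preempt,
				      struct task_struct *prev,
				      struct task_struct *next)
{
	/* disabled tracepoint: the static key patches this to a NOP */
	if (static_key_false(&__tracepoint_sched_switch.key))
		do_trace_sched_switch(preempt, prev, next);
}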

There can be more optimization done too. But just because it's not
optimized to the best it can be (which should be our goal) is no
excuse to bloat it more with popf!

-- Steve


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Andi Kleen
> And a popf can be much more expensive than any of these. You should
> know, not all instructions are equal.
> 
> Using perf, I've seen popf take up almost 30% of a function the size of
> this.

In any case it's a small fraction of the 600+ instructions which are currently
executed for every enabled trace point.

If ftrace was actually optimized code this would make some sense, but it
clearly isn't ...

-Andi


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 09:45:04 -0800
Andi Kleen  wrote:

> On Mon, Nov 21, 2016 at 06:18:53PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 21, 2016 at 09:06:13AM -0800, Andi Kleen wrote:  
> > > > > it got away with attached change.. but this rcu logic
> > > > > is far beyond me, so it's just wild guess.. ;-)  
> > > > 
> > > > I think I prefer something like the below, that only annotates the one
> > > > RDMSR in question, instead of all of them.  
> > > 
> > > It would be far better to just fix trace points so that they always work.
> > > 
> > > This whole thing is a travesty: we have tens of thousands of lines of code in
> > > ftrace to support tracing in NMIs, but then "debug features"[1] like
> > > this come around and make trace points unusable for far more code than
> > > just the NMI handlers.  
> > 
> > Not sure the idle handlers are more code than we have NMI code, but yes,
> > its annoying.
> > 
> > Its not ftrace as such though, its RCU, ftrace simply uses RCU to avoid
> > locking, as one does.
> > 
> > Biggest objection would be that the rcu_irq_enter_irqson() thing does
> > POPF and rcu_irq_exit_irqson() does again. So wrapping every tracepoint
> > with that is quite a few cycles.  
> 
> This is only when the trace point is enabled right?
> 
> ftrace is already executing a lot of instructions when enabled. Adding two
> POPF won't be the end of the world.

And a popf can be much more expensive than any of these. You should
know, not all instructions are equal.

Using perf, I've seen popf take up almost 30% of a function the size of
this.

-- Steve

> 
> -Andi
> 
> 
> 264171.903577780:  810d6040 trace_event_raw_event_sched_switch  push %rbp
> 264171.903577780:  810d6041 trace_event_raw_event_sched_switch  mov %rsp, %rbp
> 264171.903577780:  810d6044 trace_event_raw_event_sched_switch  push %r15
> 264171.903577780:  810d6046 trace_event_raw_event_sched_switch  push %r14
> 264171.903577780:  810d6048 trace_event_raw_event_sched_switch  push %r13
> 264171.903577780:  810d604a trace_event_raw_event_sched_switch  push %r12
> 264171.903577780:  810d604c trace_event_raw_event_sched_switch  mov %esi, %r13d
> 264171.903577780:  810d604f trace_event_raw_event_sched_switch  push %rbx
> 264171.903577780:  810d6050 trace_event_raw_event_sched_switch  mov %rdi, %r12
> 264171.903577780:  810d6053 trace_event_raw_event_sched_switch  mov %rdx, %r14
> 264171.903577780:  810d6056 trace_event_raw_event_sched_switch  mov %rcx, %r15
> 264171.903577780:  810d6059 trace_event_raw_event_sched_switch  sub $0x30, %rsp
> 264171.903577780:  810d605d trace_event_raw_event_sched_switch  mov 0x48(%rdi), %rbx
> 264171.903577780:  810d6061 trace_event_raw_event_sched_switch  test $0x80, %bl
> 264171.903577780:  810d6064 trace_event_raw_event_sched_switch  jnz 0x810d608c
> 264171.903577780:  810d6066 trace_event_raw_event_sched_switch  test $0x40, %bl
> 264171.903577780:  810d6069 trace_event_raw_event_sched_switch  jnz trace_event_raw_event_sched_switch+216
> 264171.903577780:  810d606f trace_event_raw_event_sched_switch  test $0x20, %bl
> 264171.903577780:  810d6072 trace_event_raw_event_sched_switch  jz 0x810d6083
> 264171.903577780:  810d6083 trace_event_raw_event_sched_switch  and $0x1, %bh
> 264171.903577780:  810d6086 trace_event_raw_event_sched_switch  jnz trace_event_raw_event_sched_switch+237
> 264171.903578113:  810d608c trace_event_raw_event_sched_switch  mov $0x40, %edx
> 264171.903578113:  810d6091 trace_event_raw_event_sched_switch  mov %r12, %rsi
> 264171.903578113:  810d6094 trace_event_raw_event_sched_switch  mov %rsp, %rdi
> 264171.903578113:  810d6097 trace_event_raw_event_sched_switch  call trace_event_buffer_reserve
> 264171.903578113:  8114f150 trace_event_buffer_reserve  test $0x1, 0x49(%rsi)
> 264171.903578113:  8114f154 trace_event_buffer_reserve  push %rbp
> 264171.903578113:  8114f155 trace_event_buffer_reserve  mov 0x10(%rsi), %r10
> 264171.903578113:  8114f159 trace_event_buffer_reserve  mov %rsp, %rbp
> 264171.903578113:  8114f15c trace_event_buffer_reserve  push %rbx
> 264171.903578113:  8114f15d trace_event_buffer_reserve  jz 0x8114f17e
> 264171.903578113:  8114f17e trace_event_buffer_reserve  mov %rdx, %rcx
> 264171.903578113:  8114f181 trace_event_buffer_reserve  mov %rdi, %rbx
> 264171.903578113:  8114f184 trace_event_buffer_reserve  pushfq
> 264171.903578113:  8114f185 trace_event_buffer_reserve  pop %r8
> 

Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 18:18:53 +0100
Peter Zijlstra  wrote:

> Its not ftrace as such though, its RCU, ftrace simply uses RCU to avoid
> locking, as one does.

Just to be clear, as ftrace in the kernel mostly represents function
tracing, which doesn't use RCU. This is a tracepoint feature.

> 
> Biggest objection would be that the rcu_irq_enter_irqson() thing does
> POPF and rcu_irq_exit_irqson() does again. So wrapping every tracepoint
> with that is quite a few cycles.

Agree. Even though this ends up being a whack-a-mole(TM) fix, I'm not
concerned enough to put heavyweight rcu idle code in for all
tracepoints.

Although, what about a percpu flag that can be checked in the
tracepoint code to see if it should enable RCU or not?

Hmm, I wonder if "rcu_is_watching()" is light enough to have in all
tracepoints?

-- Steve


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Andi Kleen
On Mon, Nov 21, 2016 at 06:18:53PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 09:06:13AM -0800, Andi Kleen wrote:
> > > > it got away with attached change.. but this rcu logic
> > > > is far beyond me, so it's just wild guess.. ;-)
> > > 
> > > I think I prefer something like the below, that only annotates the one
> > > RDMSR in question, instead of all of them.
> > 
> > It would be far better to just fix trace points so that they always work.
> > 
> > This whole thing is a travesty: we have tens of thousands of lines of code in
> > ftrace to support tracing in NMIs, but then "debug features"[1] like
> > this come around and make trace points unusable for far more code than
> > just the NMI handlers.
> 
> Not sure the idle handlers are more code than we have NMI code, but yes,
> its annoying.
> 
> Its not ftrace as such though, its RCU, ftrace simply uses RCU to avoid
> locking, as one does.
> 
> Biggest objection would be that the rcu_irq_enter_irqson() thing does
> POPF and rcu_irq_exit_irqson() does again. So wrapping every tracepoint
> with that is quite a few cycles.

This is only when the trace point is enabled, right?

ftrace is already executing a lot of instructions when enabled. Adding two POPF
won't be the end of the world.

-Andi


264171.903577780:  810d6040 trace_event_raw_event_sched_switch  push %rbp
264171.903577780:  810d6041 trace_event_raw_event_sched_switch  mov %rsp, %rbp
264171.903577780:  810d6044 trace_event_raw_event_sched_switch  push %r15
264171.903577780:  810d6046 trace_event_raw_event_sched_switch  push %r14
264171.903577780:  810d6048 trace_event_raw_event_sched_switch  push %r13
264171.903577780:  810d604a trace_event_raw_event_sched_switch  push %r12
264171.903577780:  810d604c trace_event_raw_event_sched_switch  mov %esi, %r13d
264171.903577780:  810d604f trace_event_raw_event_sched_switch  push %rbx
264171.903577780:  810d6050 trace_event_raw_event_sched_switch  mov %rdi, %r12
264171.903577780:  810d6053 trace_event_raw_event_sched_switch  mov %rdx, %r14
264171.903577780:  810d6056 trace_event_raw_event_sched_switch  mov %rcx, %r15
264171.903577780:  810d6059 trace_event_raw_event_sched_switch  sub $0x30, %rsp
264171.903577780:  810d605d trace_event_raw_event_sched_switch  mov 0x48(%rdi), %rbx
264171.903577780:  810d6061 trace_event_raw_event_sched_switch  test $0x80, %bl
264171.903577780:  810d6064 trace_event_raw_event_sched_switch  jnz 0x810d608c
264171.903577780:  810d6066 trace_event_raw_event_sched_switch  test $0x40, %bl
264171.903577780:  810d6069 trace_event_raw_event_sched_switch  jnz trace_event_raw_event_sched_switch+216
264171.903577780:  810d606f trace_event_raw_event_sched_switch  test $0x20, %bl
264171.903577780:  810d6072 trace_event_raw_event_sched_switch  jz 0x810d6083
264171.903577780:  810d6083 trace_event_raw_event_sched_switch  and $0x1, %bh
264171.903577780:  810d6086 trace_event_raw_event_sched_switch  jnz trace_event_raw_event_sched_switch+237
264171.903578113:  810d608c trace_event_raw_event_sched_switch  mov $0x40, %edx
264171.903578113:  810d6091 trace_event_raw_event_sched_switch  mov %r12, %rsi
264171.903578113:  810d6094 trace_event_raw_event_sched_switch  mov %rsp, %rdi
264171.903578113:  810d6097 trace_event_raw_event_sched_switch  call trace_event_buffer_reserve
264171.903578113:  8114f150 trace_event_buffer_reserve  test $0x1, 0x49(%rsi)
264171.903578113:  8114f154 trace_event_buffer_reserve  push %rbp
264171.903578113:  8114f155 trace_event_buffer_reserve  mov 0x10(%rsi), %r10
264171.903578113:  8114f159 trace_event_buffer_reserve  mov %rsp, %rbp
264171.903578113:  8114f15c trace_event_buffer_reserve  push %rbx
264171.903578113:  8114f15d trace_event_buffer_reserve  jz 0x8114f17e
264171.903578113:  8114f17e trace_event_buffer_reserve  mov %rdx, %rcx
264171.903578113:  8114f181 trace_event_buffer_reserve  mov %rdi, %rbx
264171.903578113:  8114f184 trace_event_buffer_reserve  pushfq
264171.903578113:  8114f185 trace_event_buffer_reserve  pop %r8
264171.903578113:  8114f187 trace_event_buffer_reserve  mov %gs:0x7eebd2c1(%rip), %r9d
264171.903578113:  8114f18f trace_event_buffer_reserve  and $0x7fff, %r9d
264171.903578113:  8114f196 trace_event_buffer_reserve  mov %r8, 0x20(%rdi)
264171.903578113:  8114f19a trace_event_buffer_reserve  mov %rsi, 0x10(%rdi)
264171.903578113:  8114f19e trace_event_buffer_reserve  mov %r9d, 0x28(%rdi)
264171.903578113:  8114f1a2 trace_event_buffer_reserve  mov 0x40(%r10), %edx
264171.903578113:  8114f1a6 trace_event_buffer_reserve  call trace_event_buffer_lock_reserve
264171.903578113:  81142210 

Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 09:06:13AM -0800, Andi Kleen wrote:
> > > it got away with attached change.. but this rcu logic
> > > is far beyond me, so it's just wild guess.. ;-)
> > 
> > I think I prefer something like the below, that only annotates the one
> > RDMSR in question, instead of all of them.
> 
> It would be far better to just fix trace points so that they always work.
> 
> This whole thing is a travesty: we have tens of thousands of lines of code in
> ftrace to support tracing in NMIs, but then "debug features"[1] like
> this come around and make trace points unusable for far more code than
> just the NMI handlers.

Not sure the idle handlers are more code than we have NMI code, but yes,
it's annoying.

It's not ftrace as such though, it's RCU; ftrace simply uses RCU to avoid
locking, as one does.

Biggest objection would be that the rcu_irq_enter_irqson() thing does
a POPF and rcu_irq_exit_irqson() does one again. So wrapping every
tracepoint with that is quite a few cycles.
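
The pattern behind that cost is roughly this (simplified sketch of the
irqson wrappers, inferred from the behaviour described above):

void rcu_irq_enter_irqson(void)
{
	unsigned long flags;

	local_irq_save(flags);		/* PUSHF; POP; CLI */
	rcu_irq_enter();
	local_irq_restore(flags);	/* POPF -- the expensive part */
}

rcu_irq_exit_irqson() mirrors it, so a tracepoint wrapped this way pays
for two POPFs.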



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Andi Kleen
> > it got away with attached change.. but this rcu logic
> > is far beyond me, so it's just wild guess.. ;-)
> 
> I think I prefer something like the below, that only annotates the one
> RDMSR in question, instead of all of them.

It would be far better to just fix trace points so that they always work.

This whole thing is a travesty: we have tens of thousands of lines of code in
ftrace to support tracing in NMIs, but then "debug features"[1] like
this come around and make trace points unusable for far more code than
just the NMI handlers.

-Andi

[1] unclear what they actually "debug" here.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Borislav Petkov
+ tglx.

On Mon, Nov 21, 2016 at 04:41:04PM +0100, Peter Zijlstra wrote:
> Right, I just wondered about the !c1e present case. In that case we'll
> forever read the msr in the hope of finding that bit set, and we never
> will.

Well, erratum 400 flag is set on everything F10h from model 0x2 onwards,
which is basically everything that's still out there. And we set it for
most of the K8s too. So I don't think there's an affected machine out
there for which we don't enable that erratum, so I'd assume it is turned
on everywhere.

IOW, what's the worst thing that can happen if we did this below?

We basically get rid of the detection and switch the timer to broadcast
mode immediately on the halting CPU.

amd_e400_idle() is behind an "if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))"
check so it will run on the affected CPUs only...

Thoughts?

---
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a879120f..7ab9d0254b4d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -354,41 +354,30 @@ void amd_e400_remove_cpu(int cpu)
  */
 static void amd_e400_idle(void)
 {
-   if (!amd_e400_c1e_detected) {
-   u32 lo, hi;
+   int cpu = smp_processor_id();
 
-   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+   if (!static_cpu_has(X86_FEATURE_NONSTOP_TSC))
+   mark_tsc_unstable("TSC halt in AMD C1E");
 
-   if (lo & K8_INTP_C1E_ACTIVE_MASK) {
-   amd_e400_c1e_detected = true;
-   if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
-   mark_tsc_unstable("TSC halt in AMD C1E");
-   pr_info("System has AMD C1E enabled\n");
-   }
-   }
-
-   if (amd_e400_c1e_detected) {
-   int cpu = smp_processor_id();
+   pr_info("System has AMD C1E enabled\n");
 
-   if (!cpumask_test_cpu(cpu, amd_e400_c1e_mask)) {
-   cpumask_set_cpu(cpu, amd_e400_c1e_mask);
-   /* Force broadcast so ACPI can not interfere. */
-   tick_broadcast_force();
-   pr_info("Switch to broadcast mode on CPU%d\n", cpu);
-   }
-   tick_broadcast_enter();
+   if (!cpumask_test_cpu(cpu, amd_e400_c1e_mask)) {
+   cpumask_set_cpu(cpu, amd_e400_c1e_mask);
+   /* Force broadcast so ACPI can not interfere. */
+   tick_broadcast_force();
+   pr_info("Switch to broadcast mode on CPU%d\n", cpu);
+   }
+   tick_broadcast_enter();
 
-   default_idle();
+   default_idle();
 
-   /*
-* The switch back from broadcast mode needs to be
-* called with interrupts disabled.
-*/
-   local_irq_disable();
-   tick_broadcast_exit();
-   local_irq_enable();
-   } else
-   default_idle();
+   /*
+* The switch back from broadcast mode needs to be
+* called with interrupts disabled.
+*/
+   local_irq_disable();
+   tick_broadcast_exit();
+   local_irq_enable();
 }
 
 /*

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 04:35:38PM +0100, Borislav Petkov wrote:
> On Mon, Nov 21, 2016 at 03:37:16PM +0100, Peter Zijlstra wrote:
> > Yeah, but this one does a printk() when it hits the condition it checks
> > for, so not tracing it would be fine I think.
> 
> FWIW, there's already:
> 
> static inline void notrace
> native_write_msr(unsigned int msr, u32 low, u32 high)
> {
> __native_write_msr_notrace(msr, low, high);
> if (msr_tracepoint_active(__tracepoint_write_msr))
> do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
> }
> 
> so could be mirrored.

Yeah, I know. I caused that to appear ;-)

> > Also, Boris, why do we need to redo that rdmsr until we see that bit
> > set? Can't we simply do the rdmsr once and then be done with it?
> 
> Hmm, so the way I read the definition of those bits -
> K8_INTP_C1E_ACTIVE_MASK - it looks like they get set when the cores
> enter halt:
> 
> 28 C1eOnCmpHalt: C1E on chip multi-processing halt. Read-write.
> 
> 1=When all cores of a processor have entered the halt state, the processor
> ^^^
> generates an IO cycle as specified by IORd, IOMsgData, and IOMsgAddr.
> When this bit is set, SmiOnCmpHalt and IntPndMsg must be 0, otherwise
> the behavior is undefined. For revision DA-C, this bit is only supported
> for dual-core processors. For revision C3 and E, this bit is supported
> for any number of cores. See 2.4.3.3.3 [Hardware Initiated C1E].
> 
> 27 SmiOnCmpHalt: SMI on chip multi-processing halt. Read-write.
> 
> 1=When all cores of the processor have entered the halt state, the
> ^
> 
> processor generates an SMI-trigger IO cycle as specified by IORd,
> IOMsgData, and IOMsgAddr. When this bit is set C1eOnCmpHalt and
> IntPndMsg must be 0, otherwise the behavior is undefined. The status is
> stored in SMMFEC4[SmiOnCmpHaltSts]. See 2.4.3.3.1 [SMI Initiated C1E].
> 
> But the thing is, we do it only once until amd_e400_c1e_detected is true.
> 
> FWIW, I'd love it if we could do
> 
>   if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))
> 
> but I'm afraid we need to look at that MSR at least once until we set
> the boolean.
> 
> I could ask if that is really the case and whether we can detect it
> differently...
> 
> In any case, I have a box here in case you want me to test patches.

Right, I just wondered about the !c1e present case. In that case we'll
forever read the msr in the hope of finding that bit set, and we never
will.

Don't care too much, just idle curiosity. Let me do the rdmsr_notrace()
thing.
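
Something along these lines, mirroring the native_write_msr() quoted
above (sketch only: helper names and the tracepoint hooks are
illustrative, the eventual patch may differ):

static inline u64 notrace __native_rdmsr_notrace(unsigned int msr)
{
	u32 low, high;

	asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (msr));
	return low | ((u64)high << 32);
}

static inline u64 notrace native_rdmsr(unsigned int msr)
{
	u64 val = __native_rdmsr_notrace(msr);

	if (msr_tracepoint_active(__tracepoint_read_msr))
		do_trace_read_msr(msr, val, 0);
	return val;
}

amd_e400_idle() can then use the _notrace variant directly, keeping the
tracepoint out of the idle path.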


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Borislav Petkov
On Mon, Nov 21, 2016 at 03:37:16PM +0100, Peter Zijlstra wrote:
> Yeah, but this one does a printk() when it hits the condition it checks
> for, so not tracing it would be fine I think.

FWIW, there's already:

static inline void notrace
native_write_msr(unsigned int msr, u32 low, u32 high)
{
__native_write_msr_notrace(msr, low, high);
if (msr_tracepoint_active(__tracepoint_write_msr))
do_trace_write_msr(msr, ((u64)high << 32 | low), 0);
}

so could be mirrored.

> Also, Boris, why do we need to redo that rdmsr until we see that bit
> set? Can't we simply do the rdmsr once and then be done with it?

Hmm, so the way I read the definition of those bits -
K8_INTP_C1E_ACTIVE_MASK - it looks like they get set when the cores
enter halt:

28 C1eOnCmpHalt: C1E on chip multi-processing halt. Read-write.

1=When all cores of a processor have entered the halt state, the processor
^^^
generates an IO cycle as specified by IORd, IOMsgData, and IOMsgAddr.
When this bit is set, SmiOnCmpHalt and IntPndMsg must be 0, otherwise
the behavior is undefined. For revision DA-C, this bit is only supported
for dual-core processors. For revision C3 and E, this bit is supported
for any number of cores. See 2.4.3.3.3 [Hardware Initiated C1E].

27 SmiOnCmpHalt: SMI on chip multi-processing halt. Read-write.

1=When all cores of the processor have entered the halt state, the
^

processor generates an SMI-trigger IO cycle as specified by IORd,
IOMsgData, and IOMsgAddr. When this bit is set C1eOnCmpHalt and
IntPndMsg must be 0, otherwise the behavior is undefined. The status is
stored in SMMFEC4[SmiOnCmpHaltSts]. See 2.4.3.3.1 [SMI Initiated C1E].
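
Those are bits 28 and 27 of MSR 0xc0010055, which is what the mask in
msr-index.h encodes (roughly, from a tree of that vintage):

#define MSR_K8_INT_PENDING_MSG  0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000      /* bits 28 and 27 */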

But the thing is, we only do it until amd_e400_c1e_detected is true.

FWIW, I'd love it if we could do

if (cpu_has_bug(c, X86_BUG_AMD_APIC_C1E))

but I'm afraid we need to look at that MSR at least once before we can set
the boolean.

I could ask if that is really the case and whether we can detect it
differently...
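
For context, X86_BUG_AMD_APIC_C1E itself is already set at boot from the
AMD erratum table, along these lines (paraphrasing the amd.c of that era,
so treat it as a sketch):

        if (cpu_has_amd_erratum(c, amd_erratum_400))
                set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);

so the open question is only whether the MSR read can be avoided, not
whether the affected CPUs can be identified.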

In any case, I have a box here in case you want me to test patches.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 09:15:43AM -0500, Steven Rostedt wrote:
> On Mon, 21 Nov 2016 13:58:30 +0100
> Peter Zijlstra  wrote:
> 
> > On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> > 
> > > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > > index 0888a879120f..d6c6aa80675f 100644
> > > > --- a/arch/x86/kernel/process.c
> > > > +++ b/arch/x86/kernel/process.c
> > > > @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
> > > > if (!amd_e400_c1e_detected) {
> > > > u32 lo, hi;
> > > >  
> > > > -   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> > > > +   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
> > > >  
> > > > if (lo & K8_INTP_C1E_ACTIVE_MASK) {
> > > > amd_e400_c1e_detected = true;  
> > 
> > OK, so while looking at this again, I don't like this either :/
> > 
> > Problem with this one is that it always adds the RCU fiddling overhead,
> > even when we're not tracing.
> > 
> > I could do an rdmsr_notrace() for this one, dunno if it's important.
> 
> But that would defeat the point of tracing rdmsr. What about:
> 
>   /* tracepoints require RCU enabled */
>   if (trace_read_msr_enabled())
>   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
>   else
>   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);

Yeah, but this one does a printk() when it hits the condition it checks
for, so not tracing it would be fine I think.

Also, Boris, why do we need to redo that rdmsr until we see that bit
set? Can't we simply do the rdmsr once and then be done with it?


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Steven Rostedt
On Mon, 21 Nov 2016 13:58:30 +0100
Peter Zijlstra  wrote:

> On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> 
> > > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > > index 0888a879120f..d6c6aa80675f 100644
> > > --- a/arch/x86/kernel/process.c
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
> > >   if (!amd_e400_c1e_detected) {
> > >   u32 lo, hi;
> > >  
> > > - rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> > > + RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
> > >  
> > >   if (lo & K8_INTP_C1E_ACTIVE_MASK) {
> > >   amd_e400_c1e_detected = true;  
> 
> OK, so while looking at this again, I don't like this either :/
> 
> Problem with this one is that it always adds the RCU fiddling overhead,
> even when we're not tracing.
> 
> I could do an rdmsr_notrace() for this one, dunno if it's important.

But that would defeat the point of tracing rdmsr. What about:

/* tracepoints require RCU enabled */
if (trace_read_msr_enabled())
RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
else
rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);

-- Steve


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:

> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 0888a879120f..d6c6aa80675f 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
> > if (!amd_e400_c1e_detected) {
> > u32 lo, hi;
> >  
> > -   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> > +   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
> >  
> > if (lo & K8_INTP_C1E_ACTIVE_MASK) {
> > amd_e400_c1e_detected = true;

OK, so while looking at this again, I don't like this either :/

Problem with this one is that it always adds the RCU fiddling overhead,
even when we're not tracing.

I could do an rdmsr_notrace() for this one, dunno if it's important.


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Jiri Olsa
On Mon, Nov 21, 2016 at 12:31:31PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 12:22:29PM +0100, Jiri Olsa wrote:
> > On Mon, Nov 21, 2016 at 10:42:27AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> > > > On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> > > 
> > > > > I think I prefer something like the below, that only annotates the one
> > > > > RDMSR in question, instead of all of them.
> > > > 
> > > > I was wondering about that, but haven't found RCU_NONIDLE 
> > > 
> > > It's in rcupdate.h ;-)
> > 
> > it was too late in the morning.. ;-) I'm assuming you'll queue this patch.. ?
> 
> Probably, I still have to make pretty the uncore patch from last week,
> and I have a PEBS patch that needs looking at :-)
> 
> BTW, if you have any opinion on that PEBS register set crap, do holler
> :-)
> 
>   lkml.kernel.org/r/20161117171731.gv3...@twins.programming.kicks-ass.net

ook, will check

thanks,
jirka


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 12:22:29PM +0100, Jiri Olsa wrote:
> On Mon, Nov 21, 2016 at 10:42:27AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> > > On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> > 
> > > > I think I prefer something like the below, that only annotates the one
> > > > RDMSR in question, instead of all of them.
> > > 
> > > I was wondering about that, but haven't found RCU_NONIDLE 
> > 
> > It's in rcupdate.h ;-)
> 
> it was too late in the morning.. ;-) I'm assuming you'll queue this patch.. ?

Probably, I still have to make the uncore patch from last week pretty,
and I have a PEBS patch that needs looking at :-)

BTW, if you have any opinion on that PEBS register set crap, do holler
:-)

  lkml.kernel.org/r/20161117171731.gv3...@twins.programming.kicks-ass.net


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Jiri Olsa
On Mon, Nov 21, 2016 at 10:42:27AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> > On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> 
> > > I think I prefer something like the below, that only annotates the one
> > > RDMSR in question, instead of all of them.
> > 
> > I was wondering about that, but haven't found RCU_NONIDLE 
> 
> It's in rcupdate.h ;-)

it was too late in the morning.. ;-) I'm assuming you'll queue this patch.. ?

thanks,
jirka


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Paul E. McKenney
On Mon, Nov 21, 2016 at 10:43:21AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 01:02:25AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> > > 
> > > it got away with attached change.. but this rcu logic
> > > is far beyond me, so it's just wild guess.. ;-)
> > 
> > If in idle, the _rcuidle() is needed, so:
> 
> Well, the point is, only this one rdmsr user is in idle, all the others
> are not, so we should not be annotating _all_ of them, should we?

Fair enough!

Thanx, Paul



Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 01:02:25AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> > 
> > it got away with attached change.. but this rcu logic
> > is far beyond me, so it's just wild guess.. ;-)
> 
> If in idle, the _rcuidle() is needed, so:

Well, the point is, only this one rdmsr user is in idle, all the others
are not, so we should not be annotating _all_ of them, should we?


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 10:34:25AM +0100, Jiri Olsa wrote:
> On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:

> > I think I prefer something like the below, that only annotates the one
> > RDMSR in question, instead of all of them.
> 
> I was wondering about that, but haven't found RCU_NONIDLE 

It's in rcupdate.h ;-)
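
For reference, RCU_NONIDLE() in the rcupdate.h of that vintage is roughly:

#define RCU_NONIDLE(a) \
        do { \
                rcu_irq_enter_irqson();         /* make RCU watch again */ \
                do { a; } while (0); \
                rcu_irq_exit_irqson(); \
        } while (0)

i.e. it momentarily pulls the CPU out of the RCU-idle extended quiescent
state around the wrapped statement, which is exactly what the tracepoint
needs.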


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Jiri Olsa
On Mon, Nov 21, 2016 at 10:28:50AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> > hi,
> > Jan hit the following output when MSR tracepoints are enabled on an AMD server:
> > 
> > [   91.585653] ===
> > [   91.589840] [ INFO: suspicious RCU usage. ]
> > [   91.594025] 4.9.0-rc1+ #1 Not tainted
> > [   91.597691] ---
> > [   91.601877] ./arch/x86/include/asm/msr-trace.h:42 suspicious 
> > rcu_dereference_check() usage!
> > [   91.610222] 
> > [   91.610222] other info that might help us debug this:
> > [   91.610222] 
> > [   91.618224] 
> > [   91.618224] RCU used illegally from idle CPU!
> > [   91.618224] rcu_scheduler_active = 1, debug_locks = 0
> > [   91.629081] RCU used illegally from extended quiescent state!
> > [   91.634820] no locks held by swapper/1/0.
> > [   91.638832] 
> > [   91.638832] stack backtrace:
> > [   91.643192] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc1+ #1
> > [   91.649457] Hardware name: empty empty/S3992, BIOS 'V2.03   ' 05/09/2008
> > [   91.656159]  c900018fbdf8 813ed43c 88017ede8000 
> > 0001
> > [   91.663637]  c900018fbe28 810fdcd7 880233f95dd0 
> > c0010055
> > [   91.671107]    c900018fbe58 
> > 814297ac
> > [   91.678560] Call Trace:
> > [   91.681022]  [] dump_stack+0x85/0xc9
> > [   91.686164]  [] lockdep_rcu_suspicious+0xe7/0x120
> > [   91.692429]  [] do_trace_read_msr+0x14c/0x1b0
> > [   91.698349]  [] native_read_msr+0x32/0x40
> > [   91.703921]  [] amd_e400_idle+0x7e/0x110
> > [   91.709407]  [] arch_cpu_idle+0xf/0x20
> > [   91.714720]  [] default_idle_call+0x23/0x40
> > [   91.720467]  [] cpu_startup_entry+0x1da/0x2b0
> > [   91.726387]  [] start_secondary+0x17f/0x1f0
> > 
> > 
> > it went away with the attached change.. but this RCU logic
> > is far beyond me, so it's just a wild guess.. ;-)
> 
> I think I prefer something like the below, that only annotates the one
> RDMSR in question, instead of all of them.

I was wondering about that, but haven't found RCU_NONIDLE 

thanks,
jirka

> 
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0888a879120f..d6c6aa80675f 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -357,7 +357,7 @@ static void amd_e400_idle(void)
>   if (!amd_e400_c1e_detected) {
>   u32 lo, hi;
>  
> - rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
> + RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
>  
>   if (lo & K8_INTP_C1E_ACTIVE_MASK) {
>   amd_e400_c1e_detected = true;


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Peter Zijlstra
On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> hi,
> Jan hit the following output when MSR tracepoints are enabled on an AMD server:
> 
> [   91.585653] ===
> [   91.589840] [ INFO: suspicious RCU usage. ]
> [   91.594025] 4.9.0-rc1+ #1 Not tainted
> [   91.597691] ---
> [   91.601877] ./arch/x86/include/asm/msr-trace.h:42 suspicious 
> rcu_dereference_check() usage!
> [   91.610222] 
> [   91.610222] other info that might help us debug this:
> [   91.610222] 
> [   91.618224] 
> [   91.618224] RCU used illegally from idle CPU!
> [   91.618224] rcu_scheduler_active = 1, debug_locks = 0
> [   91.629081] RCU used illegally from extended quiescent state!
> [   91.634820] no locks held by swapper/1/0.
> [   91.638832] 
> [   91.638832] stack backtrace:
> [   91.643192] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc1+ #1
> [   91.649457] Hardware name: empty empty/S3992, BIOS 'V2.03   ' 05/09/2008
> [   91.656159]  c900018fbdf8 813ed43c 88017ede8000 
> 0001
> [   91.663637]  c900018fbe28 810fdcd7 880233f95dd0 
> c0010055
> [   91.671107]    c900018fbe58 
> 814297ac
> [   91.678560] Call Trace:
> [   91.681022]  [] dump_stack+0x85/0xc9
> [   91.686164]  [] lockdep_rcu_suspicious+0xe7/0x120
> [   91.692429]  [] do_trace_read_msr+0x14c/0x1b0
> [   91.698349]  [] native_read_msr+0x32/0x40
> [   91.703921]  [] amd_e400_idle+0x7e/0x110
> [   91.709407]  [] arch_cpu_idle+0xf/0x20
> [   91.714720]  [] default_idle_call+0x23/0x40
> [   91.720467]  [] cpu_startup_entry+0x1da/0x2b0
> [   91.726387]  [] start_secondary+0x17f/0x1f0
> 
> 
> it went away with the attached change.. but this RCU logic
> is far beyond me, so it's just a wild guess.. ;-)

I think I prefer something like the below, that only annotates the one
RDMSR in question, instead of all of them.


diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a879120f..d6c6aa80675f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -357,7 +357,7 @@ static void amd_e400_idle(void)
if (!amd_e400_c1e_detected) {
u32 lo, hi;
 
-   rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
+   RCU_NONIDLE(rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi));
 
if (lo & K8_INTP_C1E_ACTIVE_MASK) {
amd_e400_c1e_detected = true;


Re: [BUG] msr-trace.h:42 suspicious rcu_dereference_check() usage!

2016-11-21 Thread Paul E. McKenney
On Mon, Nov 21, 2016 at 01:53:43AM +0100, Jiri Olsa wrote:
> hi,
> Jan hit the following output when MSR tracepoints are enabled on an AMD server:
> 
> [   91.585653] ===
> [   91.589840] [ INFO: suspicious RCU usage. ]
> [   91.594025] 4.9.0-rc1+ #1 Not tainted
> [   91.597691] ---
> [   91.601877] ./arch/x86/include/asm/msr-trace.h:42 suspicious 
> rcu_dereference_check() usage!
> [   91.610222] 
> [   91.610222] other info that might help us debug this:
> [   91.610222] 
> [   91.618224] 
> [   91.618224] RCU used illegally from idle CPU!
> [   91.618224] rcu_scheduler_active = 1, debug_locks = 0
> [   91.629081] RCU used illegally from extended quiescent state!
> [   91.634820] no locks held by swapper/1/0.
> [   91.638832] 
> [   91.638832] stack backtrace:
> [   91.643192] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc1+ #1
> [   91.649457] Hardware name: empty empty/S3992, BIOS 'V2.03   ' 05/09/2008
> [   91.656159]  c900018fbdf8 813ed43c 88017ede8000 
> 0001
> [   91.663637]  c900018fbe28 810fdcd7 880233f95dd0 
> c0010055
> [   91.671107]    c900018fbe58 
> 814297ac
> [   91.678560] Call Trace:
> [   91.681022]  [] dump_stack+0x85/0xc9
> [   91.686164]  [] lockdep_rcu_suspicious+0xe7/0x120
> [   91.692429]  [] do_trace_read_msr+0x14c/0x1b0
> [   91.698349]  [] native_read_msr+0x32/0x40
> [   91.703921]  [] amd_e400_idle+0x7e/0x110
> [   91.709407]  [] arch_cpu_idle+0xf/0x20
> [   91.714720]  [] default_idle_call+0x23/0x40
> [   91.720467]  [] cpu_startup_entry+0x1da/0x2b0
> [   91.726387]  [] start_secondary+0x17f/0x1f0
> 
> 
> it went away with the attached change.. but this RCU logic
> is far beyond me, so it's just a wild guess.. ;-)

If in idle, the _rcuidle() is needed, so:

Acked-by: Paul E. McKenney 

> thanks,
> jirka
> 
> 
> ---
> diff --git a/arch/x86/lib/msr.c b/arch/x86/lib/msr.c
> index d1dee753b949..ca15becea3b6 100644
> --- a/arch/x86/lib/msr.c
> +++ b/arch/x86/lib/msr.c
> @@ -115,14 +115,14 @@ int msr_clear_bit(u32 msr, u8 bit)
>  #ifdef CONFIG_TRACEPOINTS
>  void do_trace_write_msr(unsigned msr, u64 val, int failed)
>  {
> - trace_write_msr(msr, val, failed);
> + trace_write_msr_rcuidle(msr, val, failed);
>  }
>  EXPORT_SYMBOL(do_trace_write_msr);
>  EXPORT_TRACEPOINT_SYMBOL(write_msr);
> 
>  void do_trace_read_msr(unsigned msr, u64 val, int failed)
>  {
> - trace_read_msr(msr, val, failed);
> + trace_read_msr_rcuidle(msr, val, failed);
>  }
>  EXPORT_SYMBOL(do_trace_read_msr);
>  EXPORT_TRACEPOINT_SYMBOL(read_msr);
> 
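
The _rcuidle() variants the patch switches to are generated by the
tracepoint macros; conceptually they differ from the plain
trace_read_msr()/trace_write_msr() only in re-enabling RCU around the
probe calls, roughly (a sketch only, the real body comes from
__DECLARE_TRACE() in include/linux/tracepoint.h):

static inline void trace_read_msr_rcuidle(unsigned msr, u64 val, int failed)
{
        if (static_key_false(&__tracepoint_read_msr.key)) {
                rcu_irq_enter_irqson(); /* leave the extended quiescent state */
                /* iterate and call the registered probes with (msr, val, failed) */
                rcu_irq_exit_irqson();
        }
}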


