Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-16 Thread Don Zickus
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
> > Also, I checked cpuid on the system with Neharlem processor where I
> > have never seen CondChg bit is set.
> > 
> > [root@localhost ~]# ./cpuid -r
> > CPU 0:
> >0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
> > edx=0x49656e69
> >0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
> > edx=0xbfebfbff
> > 
> >0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
> > edx=0x0603
> > ^^
> > So, cpuid tells that CondChg bit is supported on this processor, too.
> 
> Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
> stuff that.

Just to add before I forget, this problem has been around for awhile as it
was explained to me.  The reason why it was never reported is because (in
our customer case), the nmi_watchdog clears the register after about 10
seconds after boot.  Most machines do not tend to send external NMIs the
first 10 seconds after booting.  Our customer saw it because he happened
to purposely press his external NMI button to trigger a panic with the
nmi_watchdog disabled and the watchdog happened to be disabled because we
were debugging a kdump problem.

Cheers,
Don



> 
> > > In any case, the proposed patch seems fine, just needs a better
> > > changelog.
> > > 
> > 
> > I see.
> > 
> > I'll write that the problem is that any NMI could be robbed by NMI
> > watchdog explicitly. Now only patch title says this explicitly. This
> > is your first comment.
> 
> Yeah, since that is the actual problem, its good to be clear on that.
> 
> > About CondChgd bit, I cannot write more than I see on actual
> > system. If it's necessary to describe more about CondChgd bit, it
> > would be appreciated if someone tell me more information about it.
> 
> I think we've found all 2 sentences the SDM has about that and unless
> someone from Intel is going to come and explain why they wasted precious
> silicon on this I suppose it will remain a mystery. No need to update on
> that.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-16 Thread Don Zickus
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
> > Also, I checked cpuid on the system with Neharlem processor where I
> > have never seen CondChg bit is set.
> > 
> > [root@localhost ~]# ./cpuid -r
> > CPU 0:
> >0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
> > edx=0x49656e69
> >0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
> > edx=0xbfebfbff
> > 
> >0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
> > edx=0x0603
> > ^^
> > So, cpuid tells that CondChg bit is supported on this processor, too.
> 
> Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
> stuff that.
> 
> > > In any case, the proposed patch seems fine, just needs a better
> > > changelog.
> > > 
> > 
> > I see.
> > 
> > I'll write that the problem is that any NMI could be robbed by NMI
> > watchdog explicitly. Now only patch title says this explicitly. This
> > is your first comment.
> 
> Yeah, since that is the actual problem, its good to be clear on that.
> 
> > About CondChgd bit, I cannot write more than I see on actual
> > system. If it's necessary to describe more about CondChgd bit, it
> > would be appreciated if someone tell me more information about it.
> 
> I think we've found all 2 sentences the SDM has about that and unless
> someone from Intel is going to come and explain why they wasted precious
> silicon on this I suppose it will remain a mystery. No need to update on
> that.

Just to add to the mix, we (Red Hat) has a customer with the same
problem.  I told them to fight it out with Intel to figure out why that
bit is non-zero at boot.  Partly because I didn't feel like send a patch
upstream and feel the wrath of Peter Z. descend upon me. :-)

So if this patch is acceptable, I would ack it as it fixes our customer's
problem too.

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-16 Thread Don Zickus
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote:
 On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
  Also, I checked cpuid on the system with Neharlem processor where I
  have never seen CondChg bit is set.
  
  [root@localhost ~]# ./cpuid -r
  CPU 0:
 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
  edx=0x49656e69
 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
  edx=0xbfebfbff
  snip
 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
  edx=0x0603
  ^^
  So, cpuid tells that CondChg bit is supported on this processor, too.
 
 Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
 stuff that.
 
   In any case, the proposed patch seems fine, just needs a better
   changelog.
   
  
  I see.
  
  I'll write that the problem is that any NMI could be robbed by NMI
  watchdog explicitly. Now only patch title says this explicitly. This
  is your first comment.
 
 Yeah, since that is the actual problem, its good to be clear on that.
 
  About CondChgd bit, I cannot write more than I see on actual
  system. If it's necessary to describe more about CondChgd bit, it
  would be appreciated if someone tell me more information about it.
 
 I think we've found all 2 sentences the SDM has about that and unless
 someone from Intel is going to come and explain why they wasted precious
 silicon on this I suppose it will remain a mystery. No need to update on
 that.

Just to add to the mix, we (Red Hat) has a customer with the same
problem.  I told them to fight it out with Intel to figure out why that
bit is non-zero at boot.  Partly because I didn't feel like send a patch
upstream and feel the wrath of Peter Z. descend upon me. :-)

So if this patch is acceptable, I would ack it as it fixes our customer's
problem too.

Cheers,
Don

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-16 Thread Don Zickus
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote:
 On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
  Also, I checked cpuid on the system with Neharlem processor where I
  have never seen CondChg bit is set.
  
  [root@localhost ~]# ./cpuid -r
  CPU 0:
 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
  edx=0x49656e69
 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
  edx=0xbfebfbff
  snip
 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
  edx=0x0603
  ^^
  So, cpuid tells that CondChg bit is supported on this processor, too.
 
 Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
 stuff that.

Just to add before I forget, this problem has been around for awhile as it
was explained to me.  The reason why it was never reported is because (in
our customer case), the nmi_watchdog clears the register after about 10
seconds after boot.  Most machines do not tend to send external NMIs the
first 10 seconds after booting.  Our customer saw it because he happened
to purposely press his external NMI button to trigger a panic with the
nmi_watchdog disabled and the watchdog happened to be disabled because we
were debugging a kdump problem.

Cheers,
Don



 
   In any case, the proposed patch seems fine, just needs a better
   changelog.
   
  
  I see.
  
  I'll write that the problem is that any NMI could be robbed by NMI
  watchdog explicitly. Now only patch title says this explicitly. This
  is your first comment.
 
 Yeah, since that is the actual problem, its good to be clear on that.
 
  About CondChgd bit, I cannot write more than I see on actual
  system. If it's necessary to describe more about CondChgd bit, it
  would be appreciated if someone tell me more information about it.
 
 I think we've found all 2 sentences the SDM has about that and unless
 someone from Intel is going to come and explain why they wasted precious
 silicon on this I suppose it will remain a mystery. No need to update on
 that.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
> Also, I checked cpuid on the system with Neharlem processor where I
> have never seen CondChg bit is set.
> 
> [root@localhost ~]# ./cpuid -r
> CPU 0:
>0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
> edx=0x49656e69
>0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
> edx=0xbfebfbff
> 
>0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
> edx=0x0603
> ^^
> So, cpuid tells that CondChg bit is supported on this processor, too.

Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
stuff that.

> > In any case, the proposed patch seems fine, just needs a better
> > changelog.
> > 
> 
> I see.
> 
> I'll write that the problem is that any NMI could be robbed by NMI
> watchdog explicitly. Now only patch title says this explicitly. This
> is your first comment.

Yeah, since that is the actual problem, its good to be clear on that.

> About CondChgd bit, I cannot write more than I see on actual
> system. If it's necessary to describe more about CondChgd bit, it
> would be appreciated if someone tell me more information about it.

I think we've found all 2 sentences the SDM has about that and unless
someone from Intel is going to come and explain why they wasted precious
silicon on this I suppose it will remain a mystery. No need to update on
that.


pgpd7mAscmkpo.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread HATAYAMA Daisuke
From: Peter Zijlstra 
Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI 
handling
Date: Wed, 11 Jun 2014 13:54:13 +0200

> On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote:
>> > I'm not sure about exact behavior of CondChgd bit, in particular when
>> > this bit is set. Although I read Intel System Programmer's Manual to
>> > figure out but I have yet completed that. At least, I think ignoring
>> > CondChgd bit should be enough for NMI watchdog perspective.
>> 
>> So yes, the SDM lists the bit as existing but never once mentions it
>> outside of that, and its been doing that at least back to 2008.
>> 
>> Ooh, I found it:
>> 
>>   "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
>> indicate changes to the state of performance monitoring hardware (see
>> Figure 18-29)."
>> 
>> Which is of course completely useless, not to mention inconsistent with
>> the later CondChgd name.
>> 
>> HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
>> like having that set on boot? 
> 
> Matt found in the MSR listing for GLOBAL_STATUS:
> 
> 63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0] 
> > 0
> 
> Which brings us to a grand total of 3 different names for this bit.
> 
> If it indeed does what it says on the tin, set every time the status
> changes its like the most useless bit ever and I wonder why people
> bothered to spend silicon on it.
> 

Yes, I didn't mention in patch description this, but I reached the
same conclusion. The description confuses me because the desciption
and the behaviour of CondChg bit I see on the actual system is not
coincide.

Also, I checked cpuid on the system with Neharlem processor where I
have never seen CondChg bit is set.

[root@localhost ~]# ./cpuid -r
CPU 0:
   0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff

   0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603
^^
So, cpuid tells that CondChg bit is supported on this processor, too.

> In any case, the proposed patch seems fine, just needs a better
> changelog.
> 

I see.

I'll write that the problem is that any NMI could be robbed by NMI
watchdog explicitly. Now only patch title says this explicitly. This
is your first comment.

About CondChgd bit, I cannot write more than I see on actual
system. If it's necessary to describe more about CondChgd bit, it
would be appreciated if someone tell me more information about it.

Thanks.
HATAYAMA, Daisuke
N‹§²ζμrΈ›yϊθšΨb²X¬ΆΗ§vΨ^–)ήΊ{.nΗ+‰·₯Š{±‘κηzX§Ά›‘ά¨}©ž²Ζ 
zΪ:+v‰¨Ύ«‘κηzZ+€Κ+zf£’·hšˆ§~†­†Ϋi�ϋΰzΉ�w₯’Έ?™¨θ­Ϊ&’)ί’f”ω^jΗ«y§m…α@A«aΆΪ�
0Άμh�ε’i

Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread HATAYAMA Daisuke
From: Peter Zijlstra 
Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI 
handling
Date: Wed, 11 Jun 2014 10:54:48 +0200

> On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote:
>> Currently, a NMI handler for NMI watchdog may falsely handle any NMI
>> signaled for different purpose if CondChgd bit in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
>> 
>> This commit deals with the issue simply by ignoring CondChgd bit.
>> 
>> Here is explanation in detail.
>> 
>> On x86 NMI watchdog uses performance monitoring feature to
>> periodically signal NMI each time performance counter gets overflowed.
>> 
>> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
>> handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner
>> of a given NMI by looking at overflow status bits in
>> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
>> handles the given NMI as its own NMI.
>> 
>> The problem is that intel_pmu_handle_irq() doesn't distinguish
>> CondChgd bit from other bits. Unlike the other status bits, CondChgd
>> bit doesn't represent overflow status for performance counters. Thus,
>> CondChgd bit cannot be thought of as a mark indicating a given NMI is
>> NMI watchdog's.
> 
> So what was the problem? It ate another NMI?
> 

Yes.

The handler for NMI watchdog is called in NMI_LOCAL. This is done in
earlier timing than NMI_UNKNOWN, and if some of the handlers in
NMI_LOCAL handles at least one NMI, then handlers in NMI_UNKNOWN are
not called. Like this:

handled = nmi_handle(NMI_LOCAL, regs, b2b);
__this_cpu_add(nmi_stats.normal, handled);
if (handled) {
/*
 * There are cases when a NMI handler handles multiple
 * events in the current NMI.  One of these events may
 * be queued for in the next NMI.  Because the event is
 * already handled, the next NMI will result in an unknown
 * NMI.  Instead lets flag this for a potential NMI to
 * swallow.
 */
if (handled > 1)
__this_cpu_write(swallow_nmi, true);
return;
}

For example, we use unknown NMI to make system panic by enabling
kernel.unkown_nmi_panic. Without this fix, unknown NMI is robbed by
NMI handler for NMI watchdog.

>> I noticed this behavior on systems with Ivy Bridge processors: Intel
>> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
>> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
>> in the beginning at boot. (then the CondChgd bit is cleared by next
>> wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.)
>> 
>> On the other hand, on older processors such as Nehalem, CondChgd bit
>> is not set in the beginning at boot.
>> 
>> I'm not sure about exact behavior of CondChgd bit, in particular when
>> this bit is set. Although I read Intel System Programmer's Manual to
>> figure out but I have yet completed that. At least, I think ignoring
>> CondChgd bit should be enough for NMI watchdog perspective.
> 
> So yes, the SDM lists the bit as existing but never once mentions it
> outside of that, and its been doing that at least back to 2008.
> 
> Ooh, I found it:
> 
>   "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
> indicate changes to the state of performance monitoring hardware (see
> Figure 18-29)."
> 
> Which is of course completely useless, not to mention inconsistent with
> the later CondChgd name.
> 
> HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
> like having that set on boot? 

--
Thanks.
HATAYAMA, Daisuke


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread HATAYAMA Daisuke
From: Peter Zijlstra pet...@infradead.org
Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI 
handling
Date: Wed, 11 Jun 2014 10:54:48 +0200

 On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote:
 Currently, a NMI handler for NMI watchdog may falsely handle any NMI
 signaled for different purpose if CondChgd bit in
 MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
 
 This commit deals with the issue simply by ignoring CondChgd bit.
 
 Here is explanation in detail.
 
 On x86 NMI watchdog uses performance monitoring feature to
 periodically signal NMI each time performance counter gets overflowed.
 
 intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
 handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner
 of a given NMI by looking at overflow status bits in
 MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
 handles the given NMI as its own NMI.
 
 The problem is that intel_pmu_handle_irq() doesn't distinguish
 CondChgd bit from other bits. Unlike the other status bits, CondChgd
 bit doesn't represent overflow status for performance counters. Thus,
 CondChgd bit cannot be thought of as a mark indicating a given NMI is
 NMI watchdog's.
 
 So what was the problem? It ate another NMI?
 

Yes.

The handler for NMI watchdog is called in NMI_LOCAL. This is done in
earlier timing than NMI_UNKNOWN, and if some of the handlers in
NMI_LOCAL handles at least one NMI, then handlers in NMI_UNKNOWN are
not called. Like this:

handled = nmi_handle(NMI_LOCAL, regs, b2b);
__this_cpu_add(nmi_stats.normal, handled);
if (handled) {
/*
 * There are cases when a NMI handler handles multiple
 * events in the current NMI.  One of these events may
 * be queued for in the next NMI.  Because the event is
 * already handled, the next NMI will result in an unknown
 * NMI.  Instead lets flag this for a potential NMI to
 * swallow.
 */
if (handled  1)
__this_cpu_write(swallow_nmi, true);
return;
}

For example, we use unknown NMI to make system panic by enabling
kernel.unkown_nmi_panic. Without this fix, unknown NMI is robbed by
NMI handler for NMI watchdog.

 I noticed this behavior on systems with Ivy Bridge processors: Intel
 Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
 CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
 in the beginning at boot. (then the CondChgd bit is cleared by next
 wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.)
 
 On the other hand, on older processors such as Nehalem, CondChgd bit
 is not set in the beginning at boot.
 
 I'm not sure about exact behavior of CondChgd bit, in particular when
 this bit is set. Although I read Intel System Programmer's Manual to
 figure out but I have yet completed that. At least, I think ignoring
 CondChgd bit should be enough for NMI watchdog perspective.
 
 So yes, the SDM lists the bit as existing but never once mentions it
 outside of that, and its been doing that at least back to 2008.
 
 Ooh, I found it:
 
   The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
 indicate changes to the state of performance monitoring hardware (see
 Figure 18-29).
 
 Which is of course completely useless, not to mention inconsistent with
 the later CondChgd name.
 
 HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
 like having that set on boot? 

--
Thanks.
HATAYAMA, Daisuke


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread HATAYAMA Daisuke
From: Peter Zijlstra pet...@infradead.org
Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI 
handling
Date: Wed, 11 Jun 2014 13:54:13 +0200

 On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote:
  I'm not sure about exact behavior of CondChgd bit, in particular when
  this bit is set. Although I read Intel System Programmer's Manual to
  figure out but I have yet completed that. At least, I think ignoring
  CondChgd bit should be enough for NMI watchdog perspective.
 
 So yes, the SDM lists the bit as existing but never once mentions it
 outside of that, and its been doing that at least back to 2008.
 
 Ooh, I found it:
 
   The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
 indicate changes to the state of performance monitoring hardware (see
 Figure 18-29).
 
 Which is of course completely useless, not to mention inconsistent with
 the later CondChgd name.
 
 HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
 like having that set on boot? 
 
 Matt found in the MSR listing for GLOBAL_STATUS:
 
 63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0] 
  0
 
 Which brings us to a grand total of 3 different names for this bit.
 
 If it indeed does what it says on the tin, set every time the status
 changes its like the most useless bit ever and I wonder why people
 bothered to spend silicon on it.
 

Yes, I didn't mention in patch description this, but I reached the
same conclusion. The description confuses me because the desciption
and the behaviour of CondChg bit I see on the actual system is not
coincide.

Also, I checked cpuid on the system with Neharlem processor where I
have never seen CondChg bit is set.

[root@localhost ~]# ./cpuid -r
CPU 0:
   0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff
snip
   0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603
^^
So, cpuid tells that CondChg bit is supported on this processor, too.

 In any case, the proposed patch seems fine, just needs a better
 changelog.
 

I see.

I'll write that the problem is that any NMI could be robbed by NMI
watchdog explicitly. Now only patch title says this explicitly. This
is your first comment.

About CondChgd bit, I cannot write more than I see on actual
system. If it's necessary to describe more about CondChgd bit, it
would be appreciated if someone tell me more information about it.

Thanks.
HATAYAMA, Daisuke
N‹§²ζμrΈ›yϊθšΨb²X¬ΆΗ§vΨ^–)ήΊ{.nΗ+‰·₯Š{±‘κηzX§Ά›‘ά¨}©ž²Ζ 
zΪj:+v‰¨Ύ«‘κηzZ+€Κ+zf£’·hšˆ§~†­†Ϋi�ϋΰzΉ�w₯’Έ?™¨θ­Ϊ’)ί’f”ω^jΗ«y§m…α@A«aΆΪ�
0Άμh�ε’i

Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-12 Thread Peter Zijlstra
On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote:
 Also, I checked cpuid on the system with Neharlem processor where I
 have never seen CondChg bit is set.
 
 [root@localhost ~]# ./cpuid -r
 CPU 0:
0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e 
 edx=0x49656e69
0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd 
 edx=0xbfebfbff
 snip
0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x 
 edx=0x0603
 ^^
 So, cpuid tells that CondChg bit is supported on this processor, too.

Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird
stuff that.

  In any case, the proposed patch seems fine, just needs a better
  changelog.
  
 
 I see.
 
 I'll write that the problem is that any NMI could be robbed by NMI
 watchdog explicitly. Now only patch title says this explicitly. This
 is your first comment.

Yeah, since that is the actual problem, its good to be clear on that.

 About CondChgd bit, I cannot write more than I see on actual
 system. If it's necessary to describe more about CondChgd bit, it
 would be appreciated if someone tell me more information about it.

I think we've found all 2 sentences the SDM has about that and unless
someone from Intel is going to come and explain why they wasted precious
silicon on this I suppose it will remain a mystery. No need to update on
that.


pgpd7mAscmkpo.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Matt Fleming
On Wed, 11 Jun, at 01:54:13PM, Peter Zijlstra wrote:
> 
> Matt found in the MSR listing for GLOBAL_STATUS:
> 
> 63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0] 
> > 0
> 
> Which brings us to a grand total of 3 different names for this bit.

Right, I'm flinging emails around to get this fixed in the SDM, so that
the answer to,

Q: "How many kernel developers does it take to find the definition of
this bit?"

is less than 4.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote:
> > I'm not sure about exact behavior of CondChgd bit, in particular when
> > this bit is set. Although I read Intel System Programmer's Manual to
> > figure out but I have yet completed that. At least, I think ignoring
> > CondChgd bit should be enough for NMI watchdog perspective.
> 
> So yes, the SDM lists the bit as existing but never once mentions it
> outside of that, and its been doing that at least back to 2008.
> 
> Ooh, I found it:
> 
>   "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
> indicate changes to the state of performance monitoring hardware (see
> Figure 18-29)."
> 
> Which is of course completely useless, not to mention inconsistent with
> the later CondChgd name.
> 
> HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
> like having that set on boot? 

Matt found in the MSR listing for GLOBAL_STATUS:

63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0] > 0

Which brings us to a grand total of 3 different names for this bit.

If it indeed does what it says on the tin, set every time the status
changes its like the most useless bit ever and I wonder why people
bothered to spend silicon on it.

In any case, the proposed patch seems fine, just needs a better
changelog.



pgp0LX_msu2vf.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote:
> Currently, a NMI handler for NMI watchdog may falsely handle any NMI
> signaled for different purpose if CondChgd bit in
> MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
> 
> This commit deals with the issue simply by ignoring CondChgd bit.
> 
> Here is explanation in detail.
> 
> On x86 NMI watchdog uses performance monitoring feature to
> periodically signal NMI each time performance counter gets overflowed.
> 
> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
> handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner
> of a given NMI by looking at overflow status bits in
> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
> handles the given NMI as its own NMI.
> 
> The problem is that intel_pmu_handle_irq() doesn't distinguish
> CondChgd bit from other bits. Unlike the other status bits, CondChgd
> bit doesn't represent overflow status for performance counters. Thus,
> CondChgd bit cannot be thought of as a mark indicating a given NMI is
> NMI watchdog's.

So what was the problem? It ate another NMI?

> I noticed this behavior on systems with Ivy Bridge processors: Intel
> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
> in the beginning at boot. (then the CondChgd bit is cleared by next
> wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.)
> 
> On the other hand, on older processors such as Nehalem, CondChgd bit
> is not set in the beginning at boot.
> 
> I'm not sure about exact behavior of CondChgd bit, in particular when
> this bit is set. Although I read Intel System Programmer's Manual to
> figure out but I have yet completed that. At least, I think ignoring
> CondChgd bit should be enough for NMI watchdog perspective.

So yes, the SDM lists the bit as existing but never once mentions it
outside of that, and its been doing that at least back to 2008.

Ooh, I found it:

  "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
indicate changes to the state of performance monitoring hardware (see
Figure 18-29)."

Which is of course completely useless, not to mention inconsistent with
the later CondChgd name.

HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
like having that set on boot? 


pgp50ejCGf1_9.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote:
 Currently, a NMI handler for NMI watchdog may falsely handle any NMI
 signaled for different purpose if CondChgd bit in
 MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
 
 This commit deals with the issue simply by ignoring CondChgd bit.
 
 Here is explanation in detail.
 
 On x86 NMI watchdog uses performance monitoring feature to
 periodically signal NMI each time performance counter gets overflowed.
 
 intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
 handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner
 of a given NMI by looking at overflow status bits in
 MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
 handles the given NMI as its own NMI.
 
 The problem is that intel_pmu_handle_irq() doesn't distinguish
 CondChgd bit from other bits. Unlike the other status bits, CondChgd
 bit doesn't represent overflow status for performance counters. Thus,
 CondChgd bit cannot be thought of as a mark indicating a given NMI is
 NMI watchdog's.

So what was the problem? It ate another NMI?

 I noticed this behavior on systems with Ivy Bridge processors: Intel
 Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
 CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
 in the beginning at boot. (then the CondChgd bit is cleared by next
 wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.)
 
 On the other hand, on older processors such as Nehalem, CondChgd bit
 is not set in the beginning at boot.
 
 I'm not sure about exact behavior of CondChgd bit, in particular when
 this bit is set. Although I read Intel System Programmer's Manual to
 figure out but I have yet completed that. At least, I think ignoring
 CondChgd bit should be enough for NMI watchdog perspective.

So yes, the SDM lists the bit as existing but never once mentions it
outside of that, and its been doing that at least back to 2008.

Ooh, I found it:

  The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
indicate changes to the state of performance monitoring hardware (see
Figure 18-29).

Which is of course completely useless, not to mention inconsistent with
the later CondChgd name.

HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
like having that set on boot? 


pgp50ejCGf1_9.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Peter Zijlstra
On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote:
  I'm not sure about exact behavior of CondChgd bit, in particular when
  this bit is set. Although I read Intel System Programmer's Manual to
  figure out but I have yet completed that. At least, I think ignoring
  CondChgd bit should be enough for NMI watchdog perspective.
 
 So yes, the SDM lists the bit as existing but never once mentions it
 outside of that, and its been doing that at least back to 2008.
 
 Ooh, I found it:
 
   The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to
 indicate changes to the state of performance monitoring hardware (see
 Figure 18-29).
 
 Which is of course completely useless, not to mention inconsistent with
 the later CondChgd name.
 
 HPA, can you explain wtf that bit does and why hatayama-san's ivb feels
 like having that set on boot? 

Matt found in the MSR listing for GLOBAL_STATUS:

63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0]  0

Which brings us to a grand total of 3 different names for this bit.

If it indeed does what it says on the tin, set every time the status
changes its like the most useless bit ever and I wonder why people
bothered to spend silicon on it.

In any case, the proposed patch seems fine, just needs a better
changelog.



pgp0LX_msu2vf.pgp
Description: PGP signature


Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling

2014-06-11 Thread Matt Fleming
On Wed, 11 Jun, at 01:54:13PM, Peter Zijlstra wrote:
 
 Matt found in the MSR listing for GLOBAL_STATUS:
 
 63 CondChg: status bits of this register has changed.  If CPUID.0AH: EAX[7:0] 
  0
 
 Which brings us to a grand total of 3 different names for this bit.

Right, I'm flinging emails around to get this fixed in the SDM, so that
the answer to,

Q: How many kernel developers does it take to find the definition of
this bit?

is less than 4.

-- 
Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/