Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote: > On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: > > Also, I checked cpuid on the system with Neharlem processor where I > > have never seen CondChg bit is set. > > > > [root@localhost ~]# ./cpuid -r > > CPU 0: > >0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e > > edx=0x49656e69 > >0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd > > edx=0xbfebfbff > > > >0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x > > edx=0x0603 > > ^^ > > So, cpuid tells that CondChg bit is supported on this processor, too. > > Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird > stuff that. Just to add before I forget, this problem has been around for awhile as it was explained to me. The reason why it was never reported is because (in our customer case), the nmi_watchdog clears the register after about 10 seconds after boot. Most machines do not tend to send external NMIs the first 10 seconds after booting. Our customer saw it because he happened to purposely press his external NMI button to trigger a panic with the nmi_watchdog disabled and the watchdog happened to be disabled because we were debugging a kdump problem. Cheers, Don > > > > In any case, the proposed patch seems fine, just needs a better > > > changelog. > > > > > > > I see. > > > > I'll write that the problem is that any NMI could be robbed by NMI > > watchdog explicitly. Now only patch title says this explicitly. This > > is your first comment. > > Yeah, since that is the actual problem, its good to be clear on that. > > > About CondChgd bit, I cannot write more than I see on actual > > system. If it's necessary to describe more about CondChgd bit, it > > would be appreciated if someone tell me more information about it. > > I think we've found all 2 sentences the SDM has about that and unless > someone from Intel is going to come and explain why they wasted precious > silicon on this I suppose it will remain a mystery. No need to update on > that. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote: > On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: > > Also, I checked cpuid on the system with Neharlem processor where I > > have never seen CondChg bit is set. > > > > [root@localhost ~]# ./cpuid -r > > CPU 0: > >0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e > > edx=0x49656e69 > >0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd > > edx=0xbfebfbff > > > >0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x > > edx=0x0603 > > ^^ > > So, cpuid tells that CondChg bit is supported on this processor, too. > > Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird > stuff that. > > > > In any case, the proposed patch seems fine, just needs a better > > > changelog. > > > > > > > I see. > > > > I'll write that the problem is that any NMI could be robbed by NMI > > watchdog explicitly. Now only patch title says this explicitly. This > > is your first comment. > > Yeah, since that is the actual problem, its good to be clear on that. > > > About CondChgd bit, I cannot write more than I see on actual > > system. If it's necessary to describe more about CondChgd bit, it > > would be appreciated if someone tell me more information about it. > > I think we've found all 2 sentences the SDM has about that and unless > someone from Intel is going to come and explain why they wasted precious > silicon on this I suppose it will remain a mystery. No need to update on > that. Just to add to the mix, we (Red Hat) has a customer with the same problem. I told them to fight it out with Intel to figure out why that bit is non-zero at boot. Partly because I didn't feel like send a patch upstream and feel the wrath of Peter Z. descend upon me. :-) So if this patch is acceptable, I would ack it as it fixes our customer's problem too. Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote: On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: Also, I checked cpuid on the system with Neharlem processor where I have never seen CondChg bit is set. [root@localhost ~]# ./cpuid -r CPU 0: 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff snip 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603 ^^ So, cpuid tells that CondChg bit is supported on this processor, too. Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird stuff that. In any case, the proposed patch seems fine, just needs a better changelog. I see. I'll write that the problem is that any NMI could be robbed by NMI watchdog explicitly. Now only patch title says this explicitly. This is your first comment. Yeah, since that is the actual problem, its good to be clear on that. About CondChgd bit, I cannot write more than I see on actual system. If it's necessary to describe more about CondChgd bit, it would be appreciated if someone tell me more information about it. I think we've found all 2 sentences the SDM has about that and unless someone from Intel is going to come and explain why they wasted precious silicon on this I suppose it will remain a mystery. No need to update on that. Just to add to the mix, we (Red Hat) has a customer with the same problem. I told them to fight it out with Intel to figure out why that bit is non-zero at boot. Partly because I didn't feel like send a patch upstream and feel the wrath of Peter Z. descend upon me. :-) So if this patch is acceptable, I would ack it as it fixes our customer's problem too. Cheers, Don -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 09:37:16AM +0200, Peter Zijlstra wrote: On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: Also, I checked cpuid on the system with Neharlem processor where I have never seen CondChg bit is set. [root@localhost ~]# ./cpuid -r CPU 0: 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff snip 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603 ^^ So, cpuid tells that CondChg bit is supported on this processor, too. Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird stuff that. Just to add before I forget, this problem has been around for awhile as it was explained to me. The reason why it was never reported is because (in our customer case), the nmi_watchdog clears the register after about 10 seconds after boot. Most machines do not tend to send external NMIs the first 10 seconds after booting. Our customer saw it because he happened to purposely press his external NMI button to trigger a panic with the nmi_watchdog disabled and the watchdog happened to be disabled because we were debugging a kdump problem. Cheers, Don In any case, the proposed patch seems fine, just needs a better changelog. I see. I'll write that the problem is that any NMI could be robbed by NMI watchdog explicitly. Now only patch title says this explicitly. This is your first comment. Yeah, since that is the actual problem, its good to be clear on that. About CondChgd bit, I cannot write more than I see on actual system. If it's necessary to describe more about CondChgd bit, it would be appreciated if someone tell me more information about it. I think we've found all 2 sentences the SDM has about that and unless someone from Intel is going to come and explain why they wasted precious silicon on this I suppose it will remain a mystery. No need to update on that. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: > Also, I checked cpuid on the system with Neharlem processor where I > have never seen CondChg bit is set. > > [root@localhost ~]# ./cpuid -r > CPU 0: >0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e > edx=0x49656e69 >0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd > edx=0xbfebfbff > >0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x > edx=0x0603 > ^^ > So, cpuid tells that CondChg bit is supported on this processor, too. Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird stuff that. > > In any case, the proposed patch seems fine, just needs a better > > changelog. > > > > I see. > > I'll write that the problem is that any NMI could be robbed by NMI > watchdog explicitly. Now only patch title says this explicitly. This > is your first comment. Yeah, since that is the actual problem, its good to be clear on that. > About CondChgd bit, I cannot write more than I see on actual > system. If it's necessary to describe more about CondChgd bit, it > would be appreciated if someone tell me more information about it. I think we've found all 2 sentences the SDM has about that and unless someone from Intel is going to come and explain why they wasted precious silicon on this I suppose it will remain a mystery. No need to update on that. pgpd7mAscmkpo.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
From: Peter Zijlstra Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling Date: Wed, 11 Jun 2014 13:54:13 +0200 > On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote: >> > I'm not sure about exact behavior of CondChgd bit, in particular when >> > this bit is set. Although I read Intel System Programmer's Manual to >> > figure out but I have yet completed that. At least, I think ignoring >> > CondChgd bit should be enough for NMI watchdog perspective. >> >> So yes, the SDM lists the bit as existing but never once mentions it >> outside of that, and its been doing that at least back to 2008. >> >> Ooh, I found it: >> >> "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to >> indicate changes to the state of performance monitoring hardware (see >> Figure 18-29)." >> >> Which is of course completely useless, not to mention inconsistent with >> the later CondChgd name. >> >> HPA, can you explain wtf that bit does and why hatayama-san's ivb feels >> like having that set on boot? > > Matt found in the MSR listing for GLOBAL_STATUS: > > 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] > > 0 > > Which brings us to a grand total of 3 different names for this bit. > > If it indeed does what it says on the tin, set every time the status > changes its like the most useless bit ever and I wonder why people > bothered to spend silicon on it. > Yes, I didn't mention in patch description this, but I reached the same conclusion. The description confuses me because the desciption and the behaviour of CondChg bit I see on the actual system is not coincide. Also, I checked cpuid on the system with Neharlem processor where I have never seen CondChg bit is set. [root@localhost ~]# ./cpuid -r CPU 0: 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603 ^^ So, cpuid tells that CondChg bit is supported on this processor, too. > In any case, the proposed patch seems fine, just needs a better > changelog. > I see. I'll write that the problem is that any NMI could be robbed by NMI watchdog explicitly. Now only patch title says this explicitly. This is your first comment. About CondChgd bit, I cannot write more than I see on actual system. If it's necessary to describe more about CondChgd bit, it would be appreciated if someone tell me more information about it. Thanks. HATAYAMA, Daisuke N§²ζμrΈyϊθΨb²X¬ΆΗ§vΨ^)ήΊ{.nΗ+·₯{±κηzX§Ά‘ά¨}©²Ζ zΪ:+v¨Ύ«κηzZ+Κ+zf£’·h§~Ϋi�ϋΰzΉ�w₯’Έ?¨θΪ&’)ί’fω^jΗ«y§m α@A«aΆΪ� 0Άμh�εi
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
From: Peter Zijlstra Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling Date: Wed, 11 Jun 2014 10:54:48 +0200 > On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote: >> Currently, a NMI handler for NMI watchdog may falsely handle any NMI >> signaled for different purpose if CondChgd bit in >> MSR_CORE_PERF_GLOBAL_STATUS MSR is set. >> >> This commit deals with the issue simply by ignoring CondChgd bit. >> >> Here is explanation in detail. >> >> On x86 NMI watchdog uses performance monitoring feature to >> periodically signal NMI each time performance counter gets overflowed. >> >> intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI >> handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner >> of a given NMI by looking at overflow status bits in >> MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it >> handles the given NMI as its own NMI. >> >> The problem is that intel_pmu_handle_irq() doesn't distinguish >> CondChgd bit from other bits. Unlike the other status bits, CondChgd >> bit doesn't represent overflow status for performance counters. Thus, >> CondChgd bit cannot be thought of as a mark indicating a given NMI is >> NMI watchdog's. > > So what was the problem? It ate another NMI? > Yes. The handler for NMI watchdog is called in NMI_LOCAL. This is done in earlier timing than NMI_UNKNOWN, and if some of the handlers in NMI_LOCAL handles at least one NMI, then handlers in NMI_UNKNOWN are not called. Like this: handled = nmi_handle(NMI_LOCAL, regs, b2b); __this_cpu_add(nmi_stats.normal, handled); if (handled) { /* * There are cases when a NMI handler handles multiple * events in the current NMI. One of these events may * be queued for in the next NMI. Because the event is * already handled, the next NMI will result in an unknown * NMI. Instead lets flag this for a potential NMI to * swallow. */ if (handled > 1) __this_cpu_write(swallow_nmi, true); return; } For example, we use unknown NMI to make system panic by enabling kernel.unkown_nmi_panic. Without this fix, unknown NMI is robbed by NMI handler for NMI watchdog. >> I noticed this behavior on systems with Ivy Bridge processors: Intel >> Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, >> CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set >> in the beginning at boot. (then the CondChgd bit is cleared by next >> wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.) >> >> On the other hand, on older processors such as Nehalem, CondChgd bit >> is not set in the beginning at boot. >> >> I'm not sure about exact behavior of CondChgd bit, in particular when >> this bit is set. Although I read Intel System Programmer's Manual to >> figure out but I have yet completed that. At least, I think ignoring >> CondChgd bit should be enough for NMI watchdog perspective. > > So yes, the SDM lists the bit as existing but never once mentions it > outside of that, and its been doing that at least back to 2008. > > Ooh, I found it: > > "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to > indicate changes to the state of performance monitoring hardware (see > Figure 18-29)." > > Which is of course completely useless, not to mention inconsistent with > the later CondChgd name. > > HPA, can you explain wtf that bit does and why hatayama-san's ivb feels > like having that set on boot? -- Thanks. HATAYAMA, Daisuke
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
From: Peter Zijlstra pet...@infradead.org Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling Date: Wed, 11 Jun 2014 10:54:48 +0200 On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote: Currently, a NMI handler for NMI watchdog may falsely handle any NMI signaled for different purpose if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. So what was the problem? It ate another NMI? Yes. The handler for NMI watchdog is called in NMI_LOCAL. This is done in earlier timing than NMI_UNKNOWN, and if some of the handlers in NMI_LOCAL handles at least one NMI, then handlers in NMI_UNKNOWN are not called. Like this: handled = nmi_handle(NMI_LOCAL, regs, b2b); __this_cpu_add(nmi_stats.normal, handled); if (handled) { /* * There are cases when a NMI handler handles multiple * events in the current NMI. One of these events may * be queued for in the next NMI. Because the event is * already handled, the next NMI will result in an unknown * NMI. Instead lets flag this for a potential NMI to * swallow. */ if (handled 1) __this_cpu_write(swallow_nmi, true); return; } For example, we use unknown NMI to make system panic by enabling kernel.unkown_nmi_panic. Without this fix, unknown NMI is robbed by NMI handler for NMI watchdog. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. (then the CondChgd bit is cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.) On the other hand, on older processors such as Nehalem, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out but I have yet completed that. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. So yes, the SDM lists the bit as existing but never once mentions it outside of that, and its been doing that at least back to 2008. Ooh, I found it: The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-29). Which is of course completely useless, not to mention inconsistent with the later CondChgd name. HPA, can you explain wtf that bit does and why hatayama-san's ivb feels like having that set on boot? -- Thanks. HATAYAMA, Daisuke
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
From: Peter Zijlstra pet...@infradead.org Subject: Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling Date: Wed, 11 Jun 2014 13:54:13 +0200 On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote: I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out but I have yet completed that. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. So yes, the SDM lists the bit as existing but never once mentions it outside of that, and its been doing that at least back to 2008. Ooh, I found it: The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-29). Which is of course completely useless, not to mention inconsistent with the later CondChgd name. HPA, can you explain wtf that bit does and why hatayama-san's ivb feels like having that set on boot? Matt found in the MSR listing for GLOBAL_STATUS: 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] 0 Which brings us to a grand total of 3 different names for this bit. If it indeed does what it says on the tin, set every time the status changes its like the most useless bit ever and I wonder why people bothered to spend silicon on it. Yes, I didn't mention in patch description this, but I reached the same conclusion. The description confuses me because the desciption and the behaviour of CondChg bit I see on the actual system is not coincide. Also, I checked cpuid on the system with Neharlem processor where I have never seen CondChg bit is set. [root@localhost ~]# ./cpuid -r CPU 0: 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff snip 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603 ^^ So, cpuid tells that CondChg bit is supported on this processor, too. In any case, the proposed patch seems fine, just needs a better changelog. I see. I'll write that the problem is that any NMI could be robbed by NMI watchdog explicitly. Now only patch title says this explicitly. This is your first comment. About CondChgd bit, I cannot write more than I see on actual system. If it's necessary to describe more about CondChgd bit, it would be appreciated if someone tell me more information about it. Thanks. HATAYAMA, Daisuke N§²ζμrΈyϊθΨb²X¬ΆΗ§vΨ^)ήΊ{.nΗ+·₯{±κηzX§Ά‘ά¨}©²Ζ zΪj:+v¨Ύ«κηzZ+Κ+zf£’·h§~Ϋi�ϋΰzΉ�w₯’Έ?¨θΪ’)ί’fω^jΗ«y§m α@A«aΆΪ� 0Άμh�εi
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Thu, Jun 12, 2014 at 04:00:11PM +0900, HATAYAMA Daisuke wrote: Also, I checked cpuid on the system with Neharlem processor where I have never seen CondChg bit is set. [root@localhost ~]# ./cpuid -r CPU 0: 0x 0x00: eax=0x000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69 0x0001 0x00: eax=0x000206e6 ebx=0x40200800 ecx=0x00bce3bd edx=0xbfebfbff snip 0x000a 0x00: eax=0x07300403 ebx=0x0044 ecx=0x edx=0x0603 ^^ So, cpuid tells that CondChg bit is supported on this processor, too. Yeah, I can't remember ever seeing that bit on nhm/wsm either. Weird stuff that. In any case, the proposed patch seems fine, just needs a better changelog. I see. I'll write that the problem is that any NMI could be robbed by NMI watchdog explicitly. Now only patch title says this explicitly. This is your first comment. Yeah, since that is the actual problem, its good to be clear on that. About CondChgd bit, I cannot write more than I see on actual system. If it's necessary to describe more about CondChgd bit, it would be appreciated if someone tell me more information about it. I think we've found all 2 sentences the SDM has about that and unless someone from Intel is going to come and explain why they wasted precious silicon on this I suppose it will remain a mystery. No need to update on that. pgpd7mAscmkpo.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, 11 Jun, at 01:54:13PM, Peter Zijlstra wrote: > > Matt found in the MSR listing for GLOBAL_STATUS: > > 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] > > 0 > > Which brings us to a grand total of 3 different names for this bit. Right, I'm flinging emails around to get this fixed in the SDM, so that the answer to, Q: "How many kernel developers does it take to find the definition of this bit?" is less than 4. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote: > > I'm not sure about exact behavior of CondChgd bit, in particular when > > this bit is set. Although I read Intel System Programmer's Manual to > > figure out but I have yet completed that. At least, I think ignoring > > CondChgd bit should be enough for NMI watchdog perspective. > > So yes, the SDM lists the bit as existing but never once mentions it > outside of that, and its been doing that at least back to 2008. > > Ooh, I found it: > > "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to > indicate changes to the state of performance monitoring hardware (see > Figure 18-29)." > > Which is of course completely useless, not to mention inconsistent with > the later CondChgd name. > > HPA, can you explain wtf that bit does and why hatayama-san's ivb feels > like having that set on boot? Matt found in the MSR listing for GLOBAL_STATUS: 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] > 0 Which brings us to a grand total of 3 different names for this bit. If it indeed does what it says on the tin, set every time the status changes its like the most useless bit ever and I wonder why people bothered to spend silicon on it. In any case, the proposed patch seems fine, just needs a better changelog. pgp0LX_msu2vf.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote: > Currently, a NMI handler for NMI watchdog may falsely handle any NMI > signaled for different purpose if CondChgd bit in > MSR_CORE_PERF_GLOBAL_STATUS MSR is set. > > This commit deals with the issue simply by ignoring CondChgd bit. > > Here is explanation in detail. > > On x86 NMI watchdog uses performance monitoring feature to > periodically signal NMI each time performance counter gets overflowed. > > intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI > handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner > of a given NMI by looking at overflow status bits in > MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it > handles the given NMI as its own NMI. > > The problem is that intel_pmu_handle_irq() doesn't distinguish > CondChgd bit from other bits. Unlike the other status bits, CondChgd > bit doesn't represent overflow status for performance counters. Thus, > CondChgd bit cannot be thought of as a mark indicating a given NMI is > NMI watchdog's. So what was the problem? It ate another NMI? > I noticed this behavior on systems with Ivy Bridge processors: Intel > Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, > CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set > in the beginning at boot. (then the CondChgd bit is cleared by next > wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.) > > On the other hand, on older processors such as Nehalem, CondChgd bit > is not set in the beginning at boot. > > I'm not sure about exact behavior of CondChgd bit, in particular when > this bit is set. Although I read Intel System Programmer's Manual to > figure out but I have yet completed that. At least, I think ignoring > CondChgd bit should be enough for NMI watchdog perspective. So yes, the SDM lists the bit as existing but never once mentions it outside of that, and its been doing that at least back to 2008. Ooh, I found it: "The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-29)." Which is of course completely useless, not to mention inconsistent with the later CondChgd name. HPA, can you explain wtf that bit does and why hatayama-san's ivb feels like having that set on boot? pgp50ejCGf1_9.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, Jun 11, 2014 at 04:30:28PM +0900, HATAYAMA Daisuke wrote: Currently, a NMI handler for NMI watchdog may falsely handle any NMI signaled for different purpose if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. So what was the problem? It ate another NMI? I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. (then the CondChgd bit is cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0.) On the other hand, on older processors such as Nehalem, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out but I have yet completed that. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. So yes, the SDM lists the bit as existing but never once mentions it outside of that, and its been doing that at least back to 2008. Ooh, I found it: The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-29). Which is of course completely useless, not to mention inconsistent with the later CondChgd name. HPA, can you explain wtf that bit does and why hatayama-san's ivb feels like having that set on boot? pgp50ejCGf1_9.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, Jun 11, 2014 at 10:54:48AM +0200, Peter Zijlstra wrote: I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out but I have yet completed that. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. So yes, the SDM lists the bit as existing but never once mentions it outside of that, and its been doing that at least back to 2008. Ooh, I found it: The IA32_PERF_GLOBAL_STATUS MSR also provides a ‘sticky bit’ to indicate changes to the state of performance monitoring hardware (see Figure 18-29). Which is of course completely useless, not to mention inconsistent with the later CondChgd name. HPA, can you explain wtf that bit does and why hatayama-san's ivb feels like having that set on boot? Matt found in the MSR listing for GLOBAL_STATUS: 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] 0 Which brings us to a grand total of 3 different names for this bit. If it indeed does what it says on the tin, set every time the status changes its like the most useless bit ever and I wonder why people bothered to spend silicon on it. In any case, the proposed patch seems fine, just needs a better changelog. pgp0LX_msu2vf.pgp Description: PGP signature
Re: [PATCH] perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
On Wed, 11 Jun, at 01:54:13PM, Peter Zijlstra wrote: Matt found in the MSR listing for GLOBAL_STATUS: 63 CondChg: status bits of this register has changed. If CPUID.0AH: EAX[7:0] 0 Which brings us to a grand total of 3 different names for this bit. Right, I'm flinging emails around to get this fixed in the SDM, so that the answer to, Q: How many kernel developers does it take to find the definition of this bit? is less than 4. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/