Re: [RFC PATCH 3/3] Convert mce_disabled

2012-10-12 Thread Naveen N. Rao
On 10/10/2012 07:50 PM, Borislav Petkov wrote: From: Borislav Petkov borislav.pet...@amd.com Not-Signed-off-by: Borislav Petkov borislav.pet...@amd.com --- arch/x86/include/asm/mce.h | 9 + arch/x86/kernel/cpu/mcheck/mce.c | 12 +--- arch/x86/lguest/boot.c |

Re: [RFC PATCH 3/3] Convert mce_disabled

2012-10-14 Thread Naveen N. Rao
On 10/12/2012 05:26 PM, Borislav Petkov wrote: On Fri, Oct 12, 2012 at 04:20:40PM +0530, Naveen N. Rao wrote: Hi Boris, Thanks for getting to this before I could! Ah ok, I thought you wasn't interested in doing this anymore :). Sorry - just got sidetracked a bit, I'm afraid :) I had

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 04:29 PM, Borislav Petkov wrote: +static struct dev_ext_attribute dev_attr_bios_cmci_threshold = { + __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL), + mce_bios_cmci_threshold Ok, I just noticed this (we must've missed it during review) but why is this

Re: [PATCH 2/5] x86, MCA: Convert dont_log_ce, banks and tolerant

2012-10-17 Thread Naveen N. Rao
Apart from a few nits below, patch series: Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com Regards, Naveen On 10/17/2012 04:43 PM, Borislav Petkov wrote: From: Borislav Petkov borislav.pet...@amd.com Move those MCA configuration variables into struct mca_config and adjust the places

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 06:39 PM, Borislav Petkov wrote: On Wed, Oct 17, 2012 at 04:57:30PM +0530, Naveen N. Rao wrote: On 10/17/2012 04:29 PM, Borislav Petkov wrote: +static struct dev_ext_attribute dev_attr_bios_cmci_threshold = { + __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 10:58 PM, Luck, Tony wrote: BUT (squared) do you even really need to know that thresholds were set? You could look at bits {52:38} in the MCi_STATUS information for the bank to see how many corrected errors had been logged. Ah, nice. I think we should be able to use this instead
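
Tony's suggestion above, reading the corrected error count out of bits {52:38} of MCi_STATUS, can be sketched roughly as follows. The macro and function names are made up for illustration and are not taken from the patch; only the bit range comes from the thread.

```c
#include <stdint.h>

/* Hypothetical names: bits 52:38 of MCi_STATUS, a 15-bit field. */
#define MCI_STATUS_CEC_SHIFT 38
#define MCI_STATUS_CEC_MASK  0x7fffULL

/* Extract the corrected error count from a raw MCi_STATUS value. */
static inline unsigned int mci_status_cec(uint64_t status)
{
	return (unsigned int)((status >> MCI_STATUS_CEC_SHIFT) & MCI_STATUS_CEC_MASK);
}
```

A caller would pass in the 64-bit status value read from the bank and compare the result against whatever threshold it cares about.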

[PATCH v2 RESEND] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-07-18 Thread Naveen N. Rao
interrupts are already disabled, instead of perf_event_disable(). Reported-by: Edjunior Barbosa Machado emach...@linux.vnet.ibm.com Signed-off-by: K.Prasad prasad.krish...@gmail.com Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- include/linux/perf_event.h|2 ++ kernel/events

[PATCH RESEND] PPC Hardware Breakpoints: Fix incorrect pointer access

2012-07-18 Thread Naveen N. Rao
If arch_validate_hwbkpt_settings() fails, bp->ctx won't be valid and the kernel panics. Add a check to fix this. Reported-by: Edjunior Barbosa Machado emach...@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/powerpc/kernel/hw_breakpoint.c |2 +- 1 file

Re: [PATCH v2 RESEND] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-07-19 Thread Naveen N. Rao
On 07/18/2012 05:27 PM, Frederic Weisbecker wrote: On Wed, Jul 18, 2012 at 04:00:46PM +0530, Naveen N. Rao wrote: Please find v2 of the patch from Prasad, based on Peter Zijlstra's feedback. This applies on top of v3.5-rc7. This has been tested and found to work fine by Edjunior. Regards

Re: [PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-06 Thread Naveen N. Rao
On 09/06/2012 12:39 AM, Tony Luck wrote: On Wed, Sep 5, 2012 at 11:47 AM, Andi Kleen a...@firstfloor.org wrote: On Wed, Sep 05, 2012 at 04:02:37PM +0530, Naveen N. Rao wrote: All the current mce tunables are now available under /sys/devices/system/machinecheck. Start using this new location

Re: [PATCH 2/3] x86/mce: Pack boolean MCE flags into a structure

2012-09-06 Thread Naveen N. Rao
On 09/06/2012 12:26 AM, Tony Luck wrote: On Wed, Sep 5, 2012 at 3:22 AM, Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote: Many MCE flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Before this patch: size arch/x86

Re: [PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-06 Thread Naveen N. Rao
On 09/06/2012 05:58 PM, Andi Kleen wrote: The change is still under discussion. Stage one is to add the new global pathnames in addition to keeping the old per-cpu ones. Also fix all utilities (just mcelog(8) as far as we know) to prefer the new paths. But why do you even want to change it?

[PATCH v2] x86/mce: Honour bios-set CMCI threshold

2012-09-10 Thread Naveen N. Rao
, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. v2: Just separating out the patch. I will send a separate patch for consolidating the MCE boot flags. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-09-21 Thread Naveen N. Rao
Hi Tony, Can you kindly take in this patch if there are no further comments? Thanks, Naveen On 09/12/2012 05:55 PM, Naveen N. Rao wrote: The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option

[PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-09-12 Thread Naveen N. Rao
, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. v3: Updated messages as per Tony's inputs. v2: Just separating out the patch. I will send a separate patch for consolidating the MCE boot flags. Signed-off-by: Naveen N. Rao naveen.n

Re: [PATCH v3] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-08-28 Thread Naveen N. Rao
On 08/16/2012 01:46 PM, Peter Zijlstra wrote: On Wed, 2012-08-15 at 20:42 +0200, Frederic Weisbecker wrote: On Wed, Aug 15, 2012 at 11:07:01PM +0530, Naveen N. Rao wrote: Hi Frederick, Did you get a chance to take a look at this? Regards, Naveen Yeah, I'm ok with the patch. Peter, are you

[PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

2012-08-29 Thread Naveen N. Rao
to be updated to use the new path. However, if we ever get to a point where cpu0 can be offlined, these tools will need to be updated anyway (as they mostly hardcode machinecheck0 currently) Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/kernel/cpu/mcheck/mce.c | 46

Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

2012-08-29 Thread Naveen N. Rao
On 08/29/2012 03:43 PM, Borislav Petkov wrote: On Wed, Aug 29, 2012 at 01:11:55PM +0530, Naveen N. Rao wrote: All the MCE attributes currently exported via sysfs appear under /sys/devices/system/machinecheck/machinecheckn/. Pretty much all of these are global in nature and not specific

Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

2012-08-29 Thread Naveen N. Rao
On 08/29/2012 04:10 PM, Borislav Petkov wrote: On Wed, Aug 29, 2012 at 03:56:04PM +0530, Naveen N. Rao wrote: Hmmm.. Can't we just deprecate these? ;) Perhaps we can consider adding newer tunables in the right place. In case you haven't noticed yet: I'm all on your side. Yup, I know :) I

Re: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location

2012-08-30 Thread Naveen N. Rao
On 08/29/2012 08:13 PM, Luck, Tony wrote: Note: I'm not sure if it's ok to change sysfs entries and this does break userspace tools that depend on the current path for some of these attributes. So, they will need to be updated to use the new path. However, if we ever get to a point where cpu0

[PATCH 0/3] x86:mce: Some cleanups and bios-set CMCI thresholds

2012-09-05 Thread Naveen N. Rao
of -tip. Thanks, Naveen --- Naveen N. Rao (3): x86/mce: Make sysfs tunables available globally across all cpus x86/mce: Pack boolean MCE flags into a structure x86/mce: Honour bios-set CMCI threshold Documentation/x86/x86_64/boot-options.txt |5 + Documentation/x86/x86_64

[PATCH 2/3] x86/mce: Pack boolean MCE flags into a structure

2012-09-05 Thread Naveen N. Rao
Many MCE flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/include/asm/mce.h|2 - arch/x86/kernel/cpu/mcheck/mce-internal.h
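
As a rough illustration of the space argument in this patch description, the sketch below compares a struct of int flags with the same flags packed into one-bit fields. The struct and field names are hypothetical and do not correspond to the actual MCE variables.

```c
#include <stddef.h>

/* Hypothetical flag names, for illustration only. */
struct flags_as_ints {
	int flag_a;
	int flag_b;
	int flag_c;
	int flag_d;
};

/* Same flags, one bit each. Unsigned so that a set flag reads back as 1. */
struct flags_as_bits {
	unsigned int flag_a : 1;
	unsigned int flag_b : 1;
	unsigned int flag_c : 1;
	unsigned int flag_d : 1;
};

/* Bytes saved by packing; the exact figure depends on the ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct flags_as_ints) - sizeof(struct flags_as_bits);
}
```

One trade-off worth keeping in mind: a bitfield member has no address, so it cannot be handed to helpers that take a pointer to an int.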

[PATCH 3/3] x86/mce: Honour bios-set CMCI threshold

2012-09-05 Thread Naveen N. Rao
, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt |5 arch/x86/kernel/cpu/mcheck/mce-internal.h |3 +- arch/x86/kernel/cpu

[PATCH 1/3] x86/mce: Make sysfs tunables available globally across all cpus

2012-09-05 Thread Naveen N. Rao
documentation to also point to the new location so that user-space tools can pick up on the new location. We would eventually want to remove these from the per-cpu location. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/machinecheck |4 ++-- arch/x86

[PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-05 Thread Naveen N. Rao
All the current mce tunables are now available under /sys/devices/system/machinecheck. Start using this new location, but fall back to the older per-cpu location so that we continue working with older kernels. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- README |2
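
The fallback described here might look something like the sketch below. This is not the mcelog code: the function name is invented, and the base directory is a parameter only so the sketch can be tried against a scratch directory. In real use it would be /sys/devices/system/machinecheck, with machinecheck0 as the older per-cpu location.

```c
#include <stdio.h>

/*
 * Open an MCE tunable, preferring the new global location and falling
 * back to the per-cpu one used by older kernels. Hypothetical helper.
 */
static FILE *open_mce_tunable(const char *base, const char *name, const char *mode)
{
	char path[512];
	FILE *f;

	/* New global location: <base>/<name> */
	snprintf(path, sizeof(path), "%s/%s", base, name);
	f = fopen(path, mode);
	if (f)
		return f;

	/* Older kernels: per-cpu location, <base>/machinecheck0/<name> */
	snprintf(path, sizeof(path), "%s/machinecheck0/%s", base, name);
	return fopen(path, mode);
}
```

Trying the new path first means the tool keeps working if the per-cpu entries are eventually removed, while older kernels are still covered by the second attempt.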

Re: [PATCH v3] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-08-15 Thread Naveen N. Rao
Hi Frederick, Did you get a chance to take a look at this? Regards, Naveen On 08/02/2012 01:46 PM, Naveen N. Rao wrote: Hi Frederick, I've added a check to make sure we are targeting the current task. This applies on top of v3.5. Kindly review. Thanks, Naveen History: v3: Added check to make

[PATCH] x86: mce: Honour bios-set CMCI threshold

2012-08-22 Thread Naveen N. Rao
, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt |5 arch/x86/include/asm/mce.h|1 + arch/x86/kernel/cpu

Re: [PATCH] x86: mce: Honour bios-set CMCI threshold

2012-08-23 Thread Naveen N. Rao
On 08/22/2012 06:16 PM, Borislav Petkov wrote: On Wed, Aug 22, 2012 at 06:00:54PM +0530, Naveen N. Rao wrote: The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, allows bios

Re: [PATCH] x86: mce: Honour bios-set CMCI threshold

2012-08-26 Thread Naveen N. Rao
On 08/22/2012 06:16 PM, Borislav Petkov wrote: On Wed, Aug 22, 2012 at 06:00:54PM +0530, Naveen N. Rao wrote: The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, allows bios

Re: [PATCH] x86: mce: Honour bios-set CMCI threshold

2012-08-27 Thread Naveen N. Rao
On 08/27/2012 02:42 PM, Borislav Petkov wrote: On Thu, Aug 23, 2012 at 05:26:09PM +0530, Naveen N. Rao wrote: Sure - sounds like a good idea. Further, a #define could eliminate the need to change other references, but I'm not sure that's generally acceptable #define mce_bios_cmci_threshold

[PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-27 Thread Naveen N. Rao
Many MCE boot flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/include/asm/mce.h | 11 +++- arch/x86/kernel/cpu/mcheck/mce.c

[PATCH 2/2] x86/mce: Honour bios-set CMCI threshold

2012-08-27 Thread Naveen N. Rao
, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Changes: - Use the mce_boot_flags structure. - Expose bios_cmci_threshold via sysfs. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt

Re: [PATCH 2/2] x86/mce: Honour bios-set CMCI threshold

2012-08-27 Thread Naveen N. Rao
On 08/27/2012 08:18 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 04:55:12PM +0530, Naveen N. Rao wrote: The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot option, which if passed, allows bios

Re: [PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-27 Thread Naveen N. Rao
On 08/27/2012 08:06 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 04:55:03PM +0530, Naveen N. Rao wrote: Many MCE boot flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Signed-off-by: Naveen N. Rao naveen.n

Re: [PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-27 Thread Naveen N. Rao
On 08/27/2012 09:17 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 09:05:46PM +0530, Naveen N. Rao wrote: + +extern struct mce_boot_flags mce_boot_flags; Why do we need that extern thing? So that this is visible across mce.c and mce_intel.c? Ok. But if you move the struct to mce

Re: [PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-27 Thread Naveen N. Rao
On 08/27/2012 10:04 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 09:31:11PM +0530, Naveen N. Rao wrote: On 08/27/2012 09:17 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 09:05:46PM +0530, Naveen N. Rao wrote: + +extern struct mce_boot_flags mce_boot_flags; Why do we need

Re: [PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-28 Thread Naveen N. Rao
On 08/27/2012 07:48 PM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 03:58:59PM +0200, Andi Kleen wrote: On Mon, Aug 27, 2012 at 04:55:03PM +0530, Naveen N. Rao wrote: Many MCE boot flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save

Re: [PATCH 1/2] x86/mce: Pack boolean MCE boot flags into a structure

2012-08-28 Thread Naveen N. Rao
On 08/28/2012 01:48 AM, Borislav Petkov wrote: On Mon, Aug 27, 2012 at 10:44:40PM +0530, Naveen N. Rao wrote: Looks good. Infact, I had actually added mce_ser and mce_disabled into the bitfield, but backed off not wanting to overdo. We could pull in all the other configuration parameters

[PATCH v3] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-08-02 Thread Naveen N. Rao
target current task] Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- include/linux/perf_event.h|2 ++ kernel/events/core.c |2 +- kernel/events/hw_breakpoint.c | 11 ++- 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/include/linux

Re: [PATCH v2 RESEND] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-07-25 Thread Naveen N. Rao
On 07/19/2012 04:46 PM, Naveen N. Rao wrote: On 07/18/2012 05:27 PM, Frederic Weisbecker wrote: On Wed, Jul 18, 2012 at 04:00:46PM +0530, Naveen N. Rao wrote: Please find v2 of the patch from Prasad, based on Peter Zijlstra's feedback. This applies on top of v3.5-rc7. This has been tested

Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.

2013-07-25 Thread Naveen N. Rao
...@linux.intel.com, Naveen N. Rao naveen.n@linux.vnet.ibm.com Subject: Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors. Gah ... there is another bug in that unaffected thread entry. The check for MCG_STATUS should be for RIPV=1 *and* EIPV=0 I set MCGMASK

Re: [PATCH 0/2] machine check decode fixes

2013-07-25 Thread Naveen N. Rao
parsing 'UC' errors arch/x86/include/asm/mce.h| 13 +++-- arch/x86/kernel/cpu/mcheck/mce-severity.c | 4 ++-- 2 files changed, 13 insertions(+), 4 deletions(-) Series Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com - Naveen -- To unsubscribe from this list

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-25 Thread Naveen N. Rao
On 07/24/2013 10:53 PM, Joe Perches wrote: On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote: On 2013/07/22 11:01PM, Borislav Petkov wrote: From: Borislav Petkov b...@suse.de [5.525861] ERST: Can not request iomem region 0xc7eff000-0x c7f0 for ERST. This needs

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 06:23 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote: So, I looked at ghes_edac and it basically seems to boil down to trace_mc_event. But, this only seems to expose the APEI data as a string and doesn't look to really make all the fields

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 11:26 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote: Userspace still needs the EDAC sysfs, in order to identify how the memory is organized, and do the proper memory labels association. What edac_ghes does is to fill those sysfs

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote: But, this only seems to expose the APEI data as a string and doesn't look to really make all the fields available to user-space in a raw manner. Not sure how well this can be utilised by a user-space tool. Do you have suggestions on how we can

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote: Em Tue, 13 Aug 2013 17:06:14 +0530 Naveen N. Rao naveen.n@linux.vnet.ibm.com escreveu: On 08/12/2013 11:26 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote: Userspace still needs the EDAC

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote: Em Tue, 13 Aug 2013 17:11:18 +0530 Naveen N. Rao naveen.n@linux.vnet.ibm.com escreveu: On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote: But, this only seems to expose the APEI data as a string and doesn't look to really make all

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 06:12 PM, Borislav Petkov wrote: On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote: You're right - my trace point makes all the data provided by apei as-is to userspace. However, ghes_edac seems to squash some of this data into a string when reporting through mc_event

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-14 Thread Naveen N. Rao
On 08/13/2013 11:09 PM, Luck, Tony wrote: In the meantime, like Boris suggests, I think we can have a different trace event for raw APEI reports - userspace can use it as it pleases. Once ghes_edac gets better, users can decide whether they want raw APEI reports or the EDAC-processed version

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-14 Thread Naveen N. Rao
On 08/13/2013 11:28 PM, Borislav Petkov wrote: On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote: If I'm not mistaken, even for systems that have EDAC drivers, it looks to me like EDAC can't really decode to the DIMM given what is provided by the bios in the APEI report currently

Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report

2013-07-10 Thread Naveen N. Rao
On 07/09/2013 01:56 AM, Luck, Tony wrote: I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird stuff. Or do you think we might still have to deal with a string of APEI messages? Agreed - and I don't think this patch can help with a string of APEI messages either. So yes, I

Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-10 Thread Naveen N. Rao
failure scenarios. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- drivers/acpi/apei/ghes.c | 38 +- include/linux/mm.h |1 + mm/memory-failure.c |5 - 3 files changed, 34 insertions(+), 10 deletions(-) diff --git

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-19 Thread Naveen N. Rao
On 06/19/2013 03:59 AM, Tony Luck wrote: On Mon, Jun 17, 2013 at 11:43 PM, Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote: + if (bank >= mca_cfg.banks) { + pr_info("mce_disable_bank: Invalid MCA bank %d ignored.\n", bank); Let's have a FW_BUG in that message to point

[PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-19 Thread Naveen N. Rao
through a boot option. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/include/asm/mce.h|3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++ arch/x86/kernel/cpu/mcheck/mce.c | 25 ++ arch/x86/kernel/cpu/mcheck/mce_intel.c

[PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-19 Thread Naveen N. Rao
Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt |5 + arch/x86/include/asm/acpi.h |2 ++ arch/x86/kernel/acpi/boot.c |5

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-19 Thread Naveen N. Rao
On 06/19/2013 11:34 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote: Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 01:18 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote: Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- Documentation/x86/x86_64/boot-options.txt

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 01:09 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote: The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/21/2013 12:59 AM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote: We need this bitfield to prevent enabling CMCI in future cmci_discover() invocations. See usage in cmci_discover() further below. So?! /* Skip banks in firmware first mode

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/21/2013 02:27 AM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote: This won't work across cpu offline/online, right? We will end up _not_ enabling CMCI on certain banks where we should have. Huh, don't understand. cmci_discover runs on each CPU

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 02:58 AM, Luck, Tony wrote: Ok, where is that semantics? What in a CPER record does say this error should tell you that you need to offline the containing page and I'm telling you this exactly only once? Error Severity 0, i.e. Recoverable? Naveen - this one is for you (or for

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 01:04 PM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote: Exactly, but mce_poll_banks also doesn't have bits set for banks on which CMCI is enabled. Let's say we have a cpu with 2 banks (not shared), none of which work in FF mode. Both

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 02:06 PM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote: Yes, but I'm afraid this won't work either - mce_banks_owned is cleared during cpu offline. This is necessary since a cmci rediscover is triggered on cpu offline, so that if this bank

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 12:57 PM, Borislav Petkov wrote: On Thu, Jun 20, 2013 at 10:11:27PM +, Luck, Tony wrote: - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it looks like we could consider this as an

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-31 Thread Naveen N. Rao
On 07/25/2013 11:02 PM, Bjorn Helgaas wrote: On Thu, Jul 25, 2013 at 5:23 AM, Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote: On 07/24/2013 10:53 PM, Joe Perches wrote: On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote: On 2013/07/22 11:01PM, Borislav Petkov wrote: From

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-31 Thread Naveen N. Rao
On 07/29/2013 08:52 PM, Borislav Petkov wrote: @@ -186,8 +186,8 @@ static int erst_exec_stall(struct apei_exec_context *ctx, if (ctx->value > FIRMWARE_MAX_STALL) { if (!in_nmi()) - pr_warning(FW_WARN ERST_PFX - Too long stall

Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.

2013-07-31 Thread Naveen N. Rao
On 07/25/2013 11:31 PM, Luck, Tony wrote: MCESEV( + PANIC, Action required but kernel thread is not continuable, + SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR), + MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV,

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-08-01 Thread Naveen N. Rao
On 07/31/2013 11:30 PM, Bjorn Helgaas wrote: On Wed, Jul 31, 2013 at 3:46 AM, Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote: My key question was about why we are using a field width of 10 implying a 32-bit value, rather than a field width of 18 as suggested by the data type
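
To make the field-width question concrete: with the "#" flag the "0x" prefix counts toward the width, so a width of 10 leaves room for 8 hex digits (32 bits) and a width of 18 leaves room for 16 (64 bits). The sketch below uses plain C snprintf rather than the kernel's printk, and the helper names are invented for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Width 10 = "0x" + 8 hex digits, enough for a 32-bit value. */
static int fmt_addr32(char *buf, size_t len, uint32_t v)
{
	return snprintf(buf, len, "%#010x", (unsigned int)v);
}

/* Width 18 = "0x" + 16 hex digits, enough for a 64-bit value. */
static int fmt_addr64(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%#018llx", (unsigned long long)v);
}
```

Formatting 0xc7eff000 with the first helper gives 0xc7eff000, and with the second gives 0x00000000c7eff000.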

Re: [PATCH] Changes to the ACPI/APEI/EINJ debugfs interface

2013-11-05 Thread Naveen N. Rao
scripts we maintain the old behaviour if flags remains set at zero (or is reset to 0). Signed-off-by: Tony Luck tony.l...@intel.com Patch Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com with a small change below. --- diff --git a/Documentation/acpi/apei/einj.txt b

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-17 Thread Naveen N. Rao
On 10/16/2013 07:09 AM, Chen Gong wrote: On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote: Date: Tue, 15 Oct 2013 23:47:23 +0530 From: Naveen N. Rao naveen.n@linux.vnet.ibm.com To: Chen, Gong gong.c...@linux.intel.com Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel

Re: [PATCH v2 6/9] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

2013-10-17 Thread Naveen N. Rao
| 5 ++--- include/linux/cper.h | 11 +-- 5 files changed, 18 insertions(+), 12 deletions(-) Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com Regards, Naveen

Re: Extended H/W error log driver

2013-10-17 Thread Naveen N. Rao
On 10/16/2013 12:53 AM, Borislav Petkov wrote: On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote: +2 ;) You're counting for 2 people, huh? That's me raising both my hands :) :-) While at it, I wonder if we're better off calling these Hardware events rather than Hardware

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-18 Thread Naveen N. Rao
On 10/16/2013 04:19 PM, Borislav Petkov wrote: Btw, I don't know what's the problem but when I hit reply-to-all to your emails, mutt drops your email address from the To: and makes the CC: list become the To: list. Strange. I'm seeing the same thing. Looking at the headers, Chen Gong's email

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-18 Thread Naveen N. Rao
Gong, This mail seems to have missed copying you given the header issues. Thanks, Naveen On 10/17/2013 05:51 PM, Naveen N. Rao wrote: On 10/16/2013 07:09 AM, Chen Gong wrote: On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote: Date: Tue, 15 Oct 2013 23:47:23 +0530 From: Naveen N

Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: Keep up only the most important fields for memory error reporting. The detail information will be moved to perf/trace interface. Suggested-by: Tony Luck tony.l...@intel.com Signed-off-by: Chen, Gong gong.c...@linux.intel.com Reviewed-by: Mauro Carvalho

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: This H/W error log driver (a.k.a eMCA driver) is implemented based on http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html After errors are captured, more valuable information can be got via this new

Re: [PATCH v3 2/9] ACPI, CPER: Update cper info

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: To prepare for the following patches and make related definition more clear, update some definitions about CPER. v2 - v1: Update some more definitions suggested by Boris Signed-off-by: Chen, Gong gong.c...@linux.intel.com Acked-by: Borislav Petkov

Re: [PATCH 7/8] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-15 Thread Naveen N. Rao
On 10/14/2013 10:42 PM, Tony Luck wrote: On Mon, Oct 14, 2013 at 3:36 AM, Borislav Petkov b...@alien8.de wrote: On Mon, Oct 14, 2013 at 12:55:00AM -0400, Chen Gong wrote: Because most of data in CPER are empty or unimportant. It is not about whether it is important or not - the question is

Re: [PATCH 7/8] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-15 Thread Naveen N. Rao
On 10/14/2013 10:42 PM, Tony Luck wrote: On Mon, Oct 14, 2013 at 3:36 AM, Borislav Petkov b...@alien8.de wrote: On Mon, Oct 14, 2013 at 12:55:00AM -0400, Chen Gong wrote: Because most of data in CPER are empty or unimportant. It is not about whether it is important or not - the question is

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: Use trace interface to elaborate all H/W error related information. Signed-off-by: Chen, Gong gong.c...@linux.intel.com --- snip +TRACE_EVENT(extlog_mem_event, + TP_PROTO(u32 etype, + char *dimm_loc, + const uuid_le

Re: [PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error on actual DIMM location. Original-author: Tony Luck

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-15 Thread Naveen N. Rao
On 10/15/2013 10:30 PM, Borislav Petkov wrote: On Tue, Oct 15, 2013 at 10:24:35PM +0530, Naveen N. Rao wrote: On 2013/10/11 02:32AM, Chen Gong wrote: Use trace interface to elaborate all H/W error related information. Signed-off-by: Chen, Gong gong.c...@linux.intel.com --- snip +TRACE_EVENT

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: To satisfy the necessary of following patches and make related definition more clear, update some definitions about CPER. No functional changes. Signed-off-by: Chen, Gong gong.c...@linux.intel.com --- drivers/acpi/apei/apei-internal.h | 12 -

Re: [PATCH 4/8] DMI: Parse memory device (type 17) in SMBIOS

2013-10-15 Thread Naveen N. Rao
/kernel/setup.c | 1 + drivers/firmware/dmi_scan.c | 60 + include/linux/dmi.h | 5 4 files changed, 67 insertions(+) Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com

Re: Extended H/W error log driver

2013-10-15 Thread Naveen N. Rao
On 2013/10/15 09:15AM, Tony Luck wrote: On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov b...@alien8.de wrote: We can even add a hint for the user like: Above errors have been corrected by the hardware and require no further action. Btw, this is valid for both dmesg and trace

Re: [PATCH 6/8] ACPI, APEI, CPER: Enhance memory reporting capability

2013-10-15 Thread Naveen N. Rao
gong.c...@linux.intel.com --- drivers/acpi/apei/cper.c | 12 1 file changed, 12 insertions(+) Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c index 680230c..2a4389f 100644 --- a/drivers/acpi/apei/cper.c

[PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-14 Thread Naveen N. Rao
HEST for corrected machine checks Here's a patch that implements this technique. If the firmware advertises support for firmware first mode in the CMC structure, we disable CMCI and polling for all the MCA banks listed in the CMC structure. - Naveen Signed-off-by: Naveen N. Rao naveen.n

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/15/2013 08:18 PM, Borislav Petkov wrote: On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote: HEST for corrected machine checks Here's a patch that implements this technique. If the firmware advertises support for firmware first mode in the CMC structure, we disable CMCI

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/16/2013 05:50 PM, Borislav Petkov wrote: On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote: +static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) +{ + int i; + struct acpi_hest_ia_corrected *cmc; + struct acpi_hest_ia_error_bank

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 2013/06/17 09:06AM, Borislav Petkov wrote: On Mon, Jun 17, 2013 at 12:30:05PM +0530, Naveen N. Rao wrote: Hmm, so if CMCI is not supported, you just disabled polling of this bank and returned here. Not good. This is on purpose. If the bank doesn't support CMCI and we were polling

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/17/2013 01:51 PM, Borislav Petkov wrote: On Mon, Jun 17, 2013 at 01:41:03PM +0530, Naveen N. Rao wrote: Yes, we used to poll since we do not get notified via MCE/CMCI. However, with firmware first set in CMC structure, the firmware is now controlling all corrected error reporting

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-18 Thread Naveen N. Rao
which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. - Naveen Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/include/asm/mce.h|3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++ arch/x86/kernel

Re: [PATCH] x86/MCE: Update MCE severity condition check

2013-06-25 Thread Naveen N. Rao
On 2013/06/20 05:16AM, Chen Gong wrote: Update some SRAR severity conditions check to make it clearer, according to latest Intel SDM Vol 3(June 2013), table 15-20. Signed-off-by: Chen Gong gong.c...@linux.intel.com --- arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +-- 1

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-25 Thread Naveen N. Rao
Tony, Boris, Can you please see if the comments in the below patch include the details you were expecting? Thanks, Naveen -- Add comments to clarify usage of the various bitfields in the MCA subsystem Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com --- arch/x86/kernel/cpu/mcheck

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-25 Thread Naveen N. Rao
Tony, Thanks - I have included your text in the patch. I wasn't sure if I should add your Signed-off-by. Kindly review and do the needful. Thanks, Naveen -- Add comments to clarify usage of the various bitfields in the MCA subsystem Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com

Re: [PATCH] x86/MCE: Update MCE severity condition check

2013-06-25 Thread Naveen N. Rao
On 06/25/2013 10:01 PM, Luck, Tony wrote: The SDM talks about non-affected logical processors, but perhaps we can call this an unaffected thread? unaffected sounds a bit more natural (but close enough to the wording in the SDM that people should see the connection). Yup - unnatural is

Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-21 Thread Naveen N. Rao
On 10/19/2013 04:56 PM, Chen Gong wrote: On Fri, Oct 18, 2013 at 05:31:21PM +0530, Naveen N. Rao wrote: Date: Fri, 18 Oct 2013 17:31:21 +0530 From: Naveen N. Rao naveen.n@linux.vnet.ibm.com To: Chen, Gong gong.c...@linux.intel.com, tony.l...@intel.com, b...@alien8.de, j...@perches.com

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-21 Thread Naveen N. Rao
On 10/20/2013 01:51 PM, Borislav Petkov wrote: On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote: Oh, yes it is. Furthermore, it reminds me where is the best place to put cper.c from I write this patch series. CPER really doesn't depend on APEI even ACPI. Maybe lib/ is an option. I can

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-22 Thread Naveen N. Rao
On 10/22/2013 12:33 AM, Luck, Tony wrote: But yes, this is possible and it would make it all even cleaner and simpler by simply not needing the reg/dereg interfaces for mce_ext_err_print but adding it to the chain. So this is on top of the 9 patch series (using the V4 that Chen Gong posted for
