[RFC PATCH 0/8] Fix perf probe issues on powerpc

2014-12-09 Thread Naveen N. Rao
patch 6. Tested on ppc64 BE and LE. - Naveen Naveen N. Rao (8): kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2 perf probe powerpc: Fix symbol fixup issues due to ELF type perf probe: Improve detection of file/function name in the probe pattern perf probe powerpc: Handle

Re: [PATCH] Changes to the ACPI/APEI/EINJ debugfs interface

2013-11-05 Thread Naveen N. Rao
files param3 and param4 to hold all these values. > > For backwards compatibility with old injection scripts we maintain the > old behaviour if flags remains set at zero (or is reset to 0). > > Signed-off-by: Tony Luck Patch Acked-by: Naveen N. Rao with a small change below. > > ---

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-22 Thread Naveen N. Rao
On 10/22/2013 12:33 AM, Luck, Tony wrote: But yes, this is possible and it would make it all even cleaner and simpler by simply not needing the reg/dereg interfaces for mce_ext_err_print but adding it to the chain. So this is on top of the 9 patch series (using the V4 that Chen Gong posted for

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-21 Thread Naveen N. Rao
On 10/20/2013 01:51 PM, Borislav Petkov wrote: On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote: Oh, yes it is. Furthermore, it reminds me where is the best place to put cper.c from I write this patch series. CPER really doesn't depend on APEI even ACPI. Maybe lib/ is an option. I can up

Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-21 Thread Naveen N. Rao
On 10/19/2013 04:56 PM, Chen Gong wrote: On Fri, Oct 18, 2013 at 05:31:21PM +0530, Naveen N. Rao wrote: Date: Fri, 18 Oct 2013 17:31:21 +0530 From: "Naveen N. Rao" To: "Chen, Gong" , tony.l...@intel.com, b...@alien8.de, j...@perches.com, m.che...@samsung.com CC: aroza...

Re: [PATCH v3 2/9] ACPI, CPER: Update cper info

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: To prepare for the following patches and make related definition more clear, update some definitions about CPER. v2 -> v1: Update some more definitions suggested by Boris Signed-off-by: Chen, Gong Acked-by: Borislav Petkov Reviewed-by: Mauro Carvalho

Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: This H/W error log driver (a.k.a eMCA driver) is implemented based on http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html After errors are captured, more valuable information can be got via this new enh

Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-18 Thread Naveen N. Rao
On 10/18/2013 01:53 PM, Chen, Gong wrote: Keep up only the most important fields for memory error reporting. The detail information will be moved to perf/trace interface. Suggested-by: Tony Luck Signed-off-by: Chen, Gong Reviewed-by: Mauro Carvalho Chehab --- drivers/acpi/apei/cper.c | 67 +

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-18 Thread Naveen N. Rao
Gong, This mail seems to have missed copying you given the header issues. Thanks, Naveen On 10/17/2013 05:51 PM, Naveen N. Rao wrote: On 10/16/2013 07:09 AM, Chen Gong wrote: On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote: Date: Tue, 15 Oct 2013 23:47:23 +0530 From: "Nav

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-18 Thread Naveen N. Rao
On 10/16/2013 04:19 PM, Borislav Petkov wrote: Btw, I don't know what's the problem but when I hit reply-to-all to your emails, mutt drops your email address from the To: and makes the CC: list become the To: list. Strange. I'm seeing the same thing. Looking at the headers, Chen Gong's email i

Re: Extended H/W error log driver

2013-10-17 Thread Naveen N. Rao
On 10/16/2013 12:53 AM, Borislav Petkov wrote: On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote: +2 ;) You're counting for 2 people, huh? That's me raising both my hands :) :-) While at it, I wonder if we're better off calling these "Hardware events"

Re: [PATCH v2 6/9] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

2013-10-17 Thread Naveen N. Rao
+-- 5 files changed, 18 insertions(+), 12 deletions(-) Acked-by: Naveen N. Rao Regards, Naveen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-17 Thread Naveen N. Rao
On 10/16/2013 07:09 AM, Chen Gong wrote: On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote: Date: Tue, 15 Oct 2013 23:47:23 +0530 From: "Naveen N. Rao" To: "Chen, Gong" Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel@vger.kernel.org, linux-a...@vger.ke

Re: [PATCH 6/8] ACPI, APEI, CPER: Enhance memory reporting capability

2013-10-15 Thread Naveen N. Rao
off-by: Chen, Gong > --- > drivers/acpi/apei/cper.c | 12 > 1 file changed, 12 insertions(+) > Acked-by: Naveen N. Rao > diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c > index 680230c..2a4389f 100644 > --- a/drivers/acpi/apei/cper.c > +++ b

Re: Extended H/W error log driver

2013-10-15 Thread Naveen N. Rao
On 2013/10/15 09:15AM, Tony Luck wrote: > On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov wrote: > > We can even add a hint for the user like: > > > > "Above errors have been corrected by the hardware and require no > > further action." > > > > Btw, this is valid for both dmesg and trace

Re: [PATCH 4/8] DMI: Parse memory device (type 17) in SMBIOS

2013-10-15 Thread Naveen N. Rao
.c | 1 + > drivers/firmware/dmi_scan.c | 60 > + > include/linux/dmi.h | 5 > 4 files changed, 67 insertions(+) Acked-by: Naveen N. Rao

Re: [PATCH 2/8] ACPI, CPER: Update cper info

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: > To satisfy the necessary of following patches and make related definition > more clear, update some definitions about CPER. No functional changes. > > Signed-off-by: Chen, Gong > --- > drivers/acpi/apei/apei-internal.h | 12 - > drivers/acpi/apei

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-15 Thread Naveen N. Rao
On 10/15/2013 10:30 PM, Borislav Petkov wrote: On Tue, Oct 15, 2013 at 10:24:35PM +0530, Naveen N. Rao wrote: On 2013/10/11 02:32AM, Chen Gong wrote: Use trace interface to elaborate all H/W error related information. Signed-off-by: Chen, Gong --- +TRACE_EVENT(extlog_mem_event

Re: [PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: > In latest UEFI spec(by now it is 2.4) memory error definition > for CPER (UEFI 2.4 Appendix N Common Platform Error Record) > adds some new fields. These fields help people to locate > memory error on actual DIMM location. > > Original-author: Tony Luck >

Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

2013-10-15 Thread Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote: > Use trace interface to elaborate all H/W error related > information. > > Signed-off-by: Chen, Gong > --- > +TRACE_EVENT(extlog_mem_event, > + TP_PROTO(u32 etype, > + char *dimm_loc, > + const uuid_le *fru_id, > +

Re: [PATCH 7/8] ACPI, APEI, CPER: Cleanup CPER memory error output format

2013-10-15 Thread Naveen N. Rao
On 10/14/2013 10:42 PM, Tony Luck wrote: On Mon, Oct 14, 2013 at 3:36 AM, Borislav Petkov wrote: On Mon, Oct 14, 2013 at 12:55:00AM -0400, Chen Gong wrote: Because most of data in CPER are empty or unimportant. It is not about whether it is important or not - the question is whether changing

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-14 Thread Naveen N. Rao
On 08/13/2013 11:28 PM, Borislav Petkov wrote: On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote: If I'm not mistaken, even for systems that have EDAC drivers, it looks to me like EDAC can't really decode to the DIMM given what is provided by the bios in the APEI report

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-14 Thread Naveen N. Rao
On 08/13/2013 11:09 PM, Luck, Tony wrote: In the meantime, like Boris suggests, I think we can have a different trace event for raw APEI reports - userspace can use it as it pleases. Once ghes_edac gets better, users can decide whether they want raw APEI reports or the EDAC-processed version and

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 06:12 PM, Borislav Petkov wrote: On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote: You're right - my trace point makes all the data provided by apei as-is to userspace. However, ghes_edac seems to squash some of this data into a string when reporting through mc_

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote: Em Tue, 13 Aug 2013 17:11:18 +0530 "Naveen N. Rao" escreveu: On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote: But, this only seems to expose the APEI data as a string and doesn't look to really make all the fields av

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote: Em Tue, 13 Aug 2013 17:06:14 +0530 "Naveen N. Rao" escreveu: On 08/12/2013 11:26 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote: Userspace still needs the EDAC sysfs, in order t

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote: But, this only seems to expose the APEI data as a string and doesn't look to really make all the fields available to user-space in a raw manner. Not sure how well this can be utilised by a user-space tool. Do you have suggestions on how we can

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 11:26 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote: Userspace still needs the EDAC sysfs, in order to identify how the memory is organized, and do the proper memory labels association. What edac_ghes does is to fill those sysfs n

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-13 Thread Naveen N. Rao
On 08/12/2013 06:23 PM, Borislav Petkov wrote: On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote: So, I looked at ghes_edac and it basically seems to boil down to trace_mc_event. But, this only seems to expose the APEI data as a string and doesn't look to really make all the f

Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-12 Thread Naveen N. Rao
On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote: Em Sat, 10 Aug 2013 20:03:22 +0200 Borislav Petkov escreveu: On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote: Em Thu, 08 Aug 2013 23:57:51 +0530 "Naveen N. Rao" escreveu: Enable memory error trace event

Re: [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally

2013-08-12 Thread Naveen N. Rao
On 08/09/2013 12:53 AM, Steven Rostedt wrote: [ attempting to try out claws-mail, hopefully this messages isn't scrambled ;-) ] Works just fine :) On Thu, 8 Aug 2013 23:57:49 +0530 "Naveen N. Rao" wrote: Since we'll be adding multiple trace events to ras.h, we need t

Re: [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error

2013-08-12 Thread Naveen N. Rao
On 08/09/2013 12:47 AM, Borislav Petkov wrote: On Thu, Aug 08, 2013 at 11:57:50PM +0530, Naveen N. Rao wrote: +TRACE_EVENT(ghes_platform_memory_event, + TP_PROTO(const struct acpi_hest_generic_status *estatus, +const struct acpi_hest_generic_data *gdata

[PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event

2013-08-08 Thread Naveen N. Rao
Enable memory error trace event in cper.c Signed-off-by: Naveen N. Rao --- drivers/acpi/apei/cper.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c index 33dc6a0..19a9c0b 100644 --- a/drivers/acpi

[PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally

2013-08-08 Thread Naveen N. Rao
Since we'll be adding multiple trace events to ras.h, we need to protect each block appropriately so that they only get included in the right places. Update PCIe AER trace event for this purpose. Signed-off-by: Naveen N. Rao --- drivers/pci/pcie/aer/aerdrv_errprint.c | 1 + include/trace/e

[PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error

2013-08-08 Thread Naveen N. Rao
Add a trace event for memory error event from generic hardware error source. We expose all members from the generic error status block, the generic error data and the cper memory error record. Signed-off-by: Naveen N. Rao --- include/trace/events/ras.h | 157

[PATCH 0/3] Add trace event for ghes memory error

2013-08-08 Thread Naveen N. Rao
This patch series adds a new trace event for memory errors reported via APEI generic hardware error source. - Naveen Naveen N. Rao (3): mce: acpi/apei: trace: Include PCIe AER trace event conditionally mce: acpi/apei: trace: Add trace event for ghes memory error mce: acpi/apei: trace

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-08-01 Thread Naveen N. Rao
On 07/31/2013 11:30 PM, Bjorn Helgaas wrote: On Wed, Jul 31, 2013 at 3:46 AM, Naveen N. Rao wrote: My key question was about why we are using a field width of 10 implying a 32-bit value, rather than a field width of 18 as suggested by the data type? This shouldn't truncate the value, b
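
For reference, the field-width question in this exchange can be illustrated with standard C `printf` formatting. With the `#` and `0` flags the width counts the `0x` prefix, so a width of 10 leaves room for 8 hex digits (a 32-bit value) and a width of 18 for 16 hex digits (a 64-bit value). This is a minimal sketch and is not taken from the patch under discussion: the helper names are hypothetical and the sample addresses are chosen only for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not from the ERST patch. */

/* Width 10 = "0x" + 8 hex digits, i.e. sized for a 32-bit value. */
static int fmt_width10(char *buf, size_t len, unsigned long long v)
{
	return snprintf(buf, len, "%#010llx", v);
}

/* Width 18 = "0x" + 16 hex digits, i.e. sized for a 64-bit value. */
static int fmt_width18(char *buf, size_t len, unsigned long long v)
{
	return snprintf(buf, len, "%#018llx", v);
}
```

A field width is a minimum, so a value wider than the field is printed in full rather than truncated; the narrower width only affects how the output lines up.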

Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.

2013-07-31 Thread Naveen N. Rao
On 07/25/2013 11:31 PM, Luck, Tony wrote: MCESEV( + PANIC, "Action required but kernel thread is not continuable", + SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR), + MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV|

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-31 Thread Naveen N. Rao
On 07/29/2013 08:52 PM, Borislav Petkov wrote: @@ -186,8 +186,8 @@ static int erst_exec_stall(struct apei_exec_context *ctx, if (ctx->value > FIRMWARE_MAX_STALL) { if (!in_nmi()) - pr_warning(FW_WARN ERST_PFX - "Too long stall t

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-31 Thread Naveen N. Rao
On 07/25/2013 11:02 PM, Bjorn Helgaas wrote: On Thu, Jul 25, 2013 at 5:23 AM, Naveen N. Rao wrote: On 07/24/2013 10:53 PM, Joe Perches wrote: On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote: On 2013/07/22 11:01PM, Borislav Petkov wrote: From: Borislav Petkov [5.525861] ERST

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-25 Thread Naveen N. Rao
On 07/24/2013 10:53 PM, Joe Perches wrote: On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote: On 2013/07/22 11:01PM, Borislav Petkov wrote: From: Borislav Petkov [5.525861] ERST: Can not request iomem region <0xc7eff000-0x c7f0> for ERST. This needs t

Re: [PATCH 0/2] machine check decode fixes

2013-07-25 Thread Naveen N. Rao
CACOD when parsing 'UC' errors arch/x86/include/asm/mce.h| 13 +++-- arch/x86/kernel/cpu/mcheck/mce-severity.c | 4 ++-- 2 files changed, 13 insertions(+), 4 deletions(-) Series Acked-by: Naveen N. Rao - Naveen

Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.

2013-07-25 Thread Naveen N. Rao
On 07/24/2013 11:46 AM, Chen Gong wrote: > On Tue, Jul 23, 2013 at 03:51:14PM -0700, Tony Luck wrote: >> Date: Tue, 23 Jul 2013 15:51:14 -0700 >> From: Tony Luck >> To: Linux Kernel Mailing List >> Cc: Borislav Petkov , Chen Gong , >> "Naveen N. Rao"

Re: [PATCH] APEI/ERST: Fix error message formatting

2013-07-24 Thread Naveen N. Rao
16llx-0x%016llx> for > ERST.\n", > (unsigned long long)erst_erange.base, > (unsigned long long)erst_erange.base + erst_erange.size); > rc = -EIO; Acked-by: Naveen N. Rao While looking at this, I noticed that we seem to be using varying field width

Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors.

2013-07-24 Thread Naveen N. Rao
On 2013/07/23 01:34PM, Tony Luck wrote: > The 0x1000 bit of the MCACOD field of machine check MCi_STATUS > registers is only defined for corrected errors (where it means > that hardware may be filtering errors see SDM section 15.9.2.1). > > For uncorrected errors it may, or may not be set - so we
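
The idea in the quoted description, ignoring the 0x1000 ('F') bit of MCACOD when matching uncorrected errors, could be sketched roughly as below. This is an illustration under stated assumptions, not the code from the patch: the macro and function names are hypothetical, and the only facts relied on are that MCACOD occupies the low 16 bits of MCi_STATUS and that the F bit is 0x1000, as the message says.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define MCACOD_MASK   0xffffULL  /* MCA error code: low 16 bits of MCi_STATUS */
#define MCACOD_F_BIT  0x1000ULL  /* the 'F' bit discussed in the thread */

/* Compare the error code in a raw status value against 'code',
 * treating the F bit as "don't care" on both sides. */
static inline bool mcacod_matches_ignoring_f(uint64_t status, uint16_t code)
{
	uint64_t mask = MCACOD_MASK & ~MCACOD_F_BIT;  /* 0xefff */

	return (status & mask) == (code & mask);
}
```

With this kind of mask, a status whose error code is 0x1134 and one whose code is 0x0134 are treated as the same error.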

Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-10 Thread Naveen N. Rao
up similar to how we handle memory failure scenarios. Signed-off-by: Naveen N. Rao --- drivers/acpi/apei/ghes.c | 38 +- include/linux/mm.h |1 + mm/memory-failure.c |5 - 3 files changed, 34 insertions(+), 10 deletions(-) diff --git

Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report

2013-07-10 Thread Naveen N. Rao
On 07/09/2013 01:56 AM, Luck, Tony wrote: I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird stuff. Or do you think we might still have to deal with a string of APEI messages? Agreed - and I don't think this patch can help with a string of APEI messages either. So yes, I thi

Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report

2013-07-03 Thread Naveen N. Rao
On 07/03/2013 08:16 PM, Borislav Petkov wrote: On Tue, Jul 02, 2013 at 06:24:00PM +0530, Naveen N. Rao wrote: I am adding another patch here to disable page offlining in case the firmware starts acting up. Thanks, Naveen -- Add a sysctl memory_failure_soft_offline to control what is done on

Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-03 Thread Naveen N. Rao
On 07/03/2013 08:14 PM, Borislav Petkov wrote: On Tue, Jul 02, 2013 at 05:02:48PM +0530, Naveen N. Rao wrote: Here is the updated patch. I also added printk_ratelimit() in line with the rest of the GHES code. Thanks, Naveen -- If the firmware indicates in GHES error data entry that the error

[PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report

2013-07-02 Thread Naveen N. Rao
immediately. If set to 0, no action is taken. Signed-off-by: Naveen N. Rao --- Documentation/sysctl/vm.txt | 12 include/linux/mm.h |1 + kernel/sysctl.c |9 + mm/memory-failure.c | 10 +++--- 4 files changed, 29 insertions(+), 3

Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-02 Thread Naveen N. Rao
interrupt context, so we queue this up similar to how we handle memory failure scenarios. Signed-off-by: Naveen N. Rao --- drivers/acpi/apei/ghes.c | 38 +- include/linux/mm.h |1 + mm/memory-failure.c |5 - 3 files changed, 34

Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-02 Thread Naveen N. Rao
On 07/02/2013 04:38 AM, Borislav Petkov wrote: On Mon, Jul 01, 2013 at 09:08:59PM +0530, Naveen N. Rao wrote: If the firmware indicates in GHES error data entry that the error threshold has exceeded for a corrected error event, then we try to soft-offline the page. This could be called in

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-07-01 Thread Naveen N. Rao
On 07/01/2013 09:08 PM, Borislav Petkov wrote: On Mon, Jul 01, 2013 at 08:37:43PM +0530, Naveen N. Rao wrote: On 06/28/2013 11:01 PM, Tony Luck wrote: + if (sec_sev == GHES_SEV_CORRECTED && + (gdata->flags & CPER_SEC_ERROR_THR

[PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification

2013-07-01 Thread Naveen N. Rao
If the firmware indicates in GHES error data entry that the error threshold has exceeded for a corrected error event, then we try to soft-offline the page. This could be called in interrupt context, so we queue this up similar to how we handle memory failure scenarios. Signed-off-by: Naveen N

[PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-07-01 Thread Naveen N. Rao
which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: Naveen N. Rao --- arch/x86/include/asm/mce.h|3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++ arch/x86/kernel/cpu/mcheck/mce.c | 28

[PATCH v3 0/3] Firmware first mode for corrected errors

2013-07-01 Thread Naveen N. Rao
--- Naveen N. Rao (3): mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC mce: acpi/apei: Add a boot option to disable ff mode for corrected errors mce, acpi/apei: Soft-offline a page on firmware GHES notification Documentation/x86/x86_64/boot

[PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-07-01 Thread Naveen N. Rao
Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5 + arch/x86/include/asm/acpi.h |2 ++ arch/x86/kernel/acpi/boot.c |5 + drivers/acpi/apei/hest.c

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-07-01 Thread Naveen N. Rao
On 06/28/2013 11:01 PM, Tony Luck wrote: + if (sec_sev == GHES_SEV_CORRECTED && + (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) && + (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) { +

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-28 Thread Naveen N. Rao
. Signed-off-by: Naveen N. Rao --- drivers/acpi/apei/ghes.c |7 ++ include/linux/mm.h |1 + mm/memory-failure.c | 53 ++ 3 files changed, 43 insertions(+), 18 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei

Re: [PATCH] x86/mce: Update MCE severity condition check

2013-06-27 Thread Naveen N. Rao
cked-by" bandwagon - speak now. Yep - looks fine to me. Acked-by: Naveen N. Rao Thanks, Naveen -Tony arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/

Re: [PATCH] x86/MCE: Update MCE severity condition check

2013-06-25 Thread Naveen N. Rao
On 06/25/2013 10:01 PM, Luck, Tony wrote: The SDM talks about "non-affected" logical processors, but perhaps we can call this an "unaffected" thread? "unaffected" sounds a bit more natural (but close enough to the wording in the SDM that people should see the connection). Yup - "unnatural" is

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-25 Thread Naveen N. Rao
Tony, Thanks - I have included your text in the patch. I wasn't sure if I should add your Signed-off-by. Kindly review and do the needful. Thanks, Naveen -- Add comments to clarify usage of the various bitfields in the MCA subsystem Signed-off-by: Naveen N. Rao Acked-by: Borislav P

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-25 Thread Naveen N. Rao
Tony, Boris, Can you please see if the comments in the below patch include the details you were expecting? Thanks, Naveen -- Add comments to clarify usage of the various bitfields in the MCA subsystem Signed-off-by: Naveen N. Rao --- arch/x86/kernel/cpu/mcheck/mce.c |5 - arch

Re: [PATCH] x86/MCE: Update MCE severity condition check

2013-06-24 Thread Naveen N. Rao
On 2013/06/20 05:16AM, Chen Gong wrote: > Update some SRAR severity conditions check to make it clearer, > according to latest Intel SDM Vol 3(June 2013), table 15-20. > > Signed-off-by: Chen Gong > --- > arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +-- > 1 file changed, 5 inser

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 12:57 PM, Borislav Petkov wrote: On Thu, Jun 20, 2013 at 10:11:27PM +, Luck, Tony wrote: - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it looks like we could consider this as an indic

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 02:06 PM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote: Yes, but I'm afraid this won't work either - mce_banks_owned is cleared during cpu offline. This is necessary since a cmci rediscover is triggered on cpu offline, so that if

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-21 Thread Naveen N. Rao
On 06/21/2013 01:04 PM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote: Exactly, but mce_poll_banks also doesn't have bits set for banks on which CMCI is enabled. Let's say we have a cpu with 2 banks (not shared), none of which work in FF mode.

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/21/2013 02:27 AM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote: This won't work across cpu offline/online, right? We will end up _not_ enabling CMCI on certain banks where we should have. Huh, don't understand. cmci_discover runs o

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 02:58 AM, Luck, Tony wrote: Ok, where is that semantics? What in a CPER record does say "this error should tell you that you need to offline the containing page and I'm telling you this exactly only once"? Error Severity 0, i.e. Recoverable? Naveen - this one is for you (or for yo

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/21/2013 12:59 AM, Borislav Petkov wrote: On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote: We need this bitfield to prevent enabling CMCI in future cmci_discover() invocations. See usage in cmci_discover() further below. So?! /* Skip banks in firmware first mode

Re: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 01:09 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote: The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-20 Thread Naveen N. Rao
On 06/20/2013 01:18 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote: Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5 + arch/x86/include/asm

Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-19 Thread Naveen N. Rao
On 06/19/2013 11:34 PM, Borislav Petkov wrote: On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote: Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5 + arch/x86/include/asm

[PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC

2013-06-19 Thread Naveen N. Rao
through a boot option. Signed-off-by: Naveen N. Rao --- arch/x86/include/asm/mce.h|3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++ arch/x86/kernel/cpu/mcheck/mce.c | 25 ++ arch/x86/kernel/cpu/mcheck/mce_intel.c| 40

[PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors

2013-06-19 Thread Naveen N. Rao
Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5 + arch/x86/include/asm/acpi.h |2 ++ arch/x86/kernel/acpi/boot.c |5 + drivers/acpi/apei/hest.c

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-18 Thread Naveen N. Rao
On 06/19/2013 03:59 AM, Tony Luck wrote: On Mon, Jun 17, 2013 at 11:43 PM, Naveen N. Rao wrote: + if (bank >= mca_cfg.banks) { + pr_info("mce_disable_bank: Invalid MCA bank %d ignored.\n", bank); Let's have a FW_BUG in that message to point a finger at

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. - Naveen Signed-off-by: Naveen N. Rao --- arch/x86/include/asm/mce.h|3 ++ arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++ arch/x86/kernel/cpu/mcheck/mce.c | 23

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/17/2013 01:51 PM, Borislav Petkov wrote: On Mon, Jun 17, 2013 at 01:41:03PM +0530, Naveen N. Rao wrote: Yes, we used to poll since we do not get notified via MCE/CMCI. However, with firmware first set in CMC structure, the firmware is now controlling all corrected error reporting for

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 2013/06/17 09:06AM, Borislav Petkov wrote: > On Mon, Jun 17, 2013 at 12:30:05PM +0530, Naveen N. Rao wrote: > > >Hmm, so if CMCI is not supported, you just disabled polling of this bank > > >and returned here. Not good. > > > > This is on purpose. If the bank

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/16/2013 05:50 PM, Borislav Petkov wrote: On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote: +static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) +{ + int i; + struct acpi_hest_ia_corrected *cmc; + struct acpi_hest_ia_error_bank

Re: [PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-17 Thread Naveen N. Rao
On 06/15/2013 08:18 PM, Borislav Petkov wrote: On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote: HEST for corrected machine checks Here's a patch that implements this technique. If the firmware advertises support for firmware first mode in the CMC structure, we disable CMC

[PATCH] Re: [Patch] MCE, APEI: Don't enable CMCI when Firmware First mode is set in

2013-06-14 Thread Naveen N. Rao
HEST for corrected machine checks Here's a patch that implements this technique. If the firmware advertises support for firmware first mode in the CMC structure, we disable CMCI and polling for all the MCA banks listed in the CMC structure. - Naveen Signed-off-by: Naveen N. Rao --- arc

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 10:58 PM, Luck, Tony wrote: BUT (squared) do you even really need to know that thresholds were set? You could look at bits {52:38} in the MCi_STATUS information for the bank to see how many corrected errors had been logged. Ah, nice. I think we should be able to use this instead o
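
Tony's suggestion of reading bits {52:38} of MCi_STATUS amounts to a shift and a mask over a 15-bit field. A minimal sketch, assuming the raw 64-bit status value has already been read; the macro and function names below are hypothetical and are not the kernel's own.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define CEC_SHIFT 38         /* lowest bit of the corrected error count */
#define CEC_MASK  0x7fffULL  /* bits 52:38 inclusive = 15 bits */

/* Extract the corrected error count from a raw MCi_STATUS value. */
static inline uint64_t mci_status_cec(uint64_t status)
{
	return (status >> CEC_SHIFT) & CEC_MASK;
}
```

Because the mask is applied after the shift, bits above 52 (for example bit 53) do not leak into the result.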

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 06:39 PM, Borislav Petkov wrote: On Wed, Oct 17, 2012 at 04:57:30PM +0530, Naveen N. Rao wrote: On 10/17/2012 04:29 PM, Borislav Petkov wrote: +static struct dev_ext_attribute dev_attr_bios_cmci_threshold = { + __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL

Re: [PATCH 2/5] x86, MCA: Convert dont_log_ce, banks and tolerant

2012-10-17 Thread Naveen N. Rao
Apart from a few nits below, patch series: Acked-by: Naveen N. Rao Regards, Naveen On 10/17/2012 04:43 PM, Borislav Petkov wrote: From: Borislav Petkov Move those MCA configuration variables into struct mca_config and adjust the places they're used accordingly. Signed-off-by: Bor

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-10-17 Thread Naveen N. Rao
On 10/17/2012 04:29 PM, Borislav Petkov wrote: +static struct dev_ext_attribute dev_attr_bios_cmci_threshold = { + __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL), + &mce_bios_cmci_threshold Ok, I just noticed this (we must've missed it during review) but why is this read-

Re: [RFC PATCH 3/3] Convert mce_disabled

2012-10-14 Thread Naveen N. Rao
On 10/12/2012 05:26 PM, Borislav Petkov wrote: On Fri, Oct 12, 2012 at 04:20:40PM +0530, Naveen N. Rao wrote: Hi Boris, Thanks for getting to this before I could! Ah ok, I thought you wasn't interested in doing this anymore :). Sorry - just got sidetracked a bit, I'm afraid :)

Re: [RFC PATCH 3/3] Convert mce_disabled

2012-10-12 Thread Naveen N. Rao
On 10/10/2012 07:50 PM, Borislav Petkov wrote: From: Borislav Petkov Not-Signed-off-by: Borislav Petkov --- arch/x86/include/asm/mce.h | 9 + arch/x86/kernel/cpu/mcheck/mce.c | 12 +--- arch/x86/lguest/boot.c | 2 +- 3 files changed, 11 insertions(+), 12

Re: [PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-09-21 Thread Naveen N. Rao
Hi Tony, Can you kindly take in this patch if there are no further comments? Thanks, Naveen On 09/12/2012 05:55 PM, Naveen N. Rao wrote: The ACPI spec doesn't provide for a way for the bios to pass down recommended thresholds to the OS on a _per-bank_ basis. This patch adds a new boot o

[PATCH v3] x86/mce: Honour bios-set CMCI threshold

2012-09-12 Thread Naveen N. Rao
-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. v3: Updated messages as per Tony's inputs. v2: Just separating out the patch. I will send a separate patch for consolidating the MCE boot flags. Signed-off-by: Naveen N. Rao --- Documen
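The fail-safe described in this snippet (fall back to a threshold of 1 when the BIOS has not initialized a bank, and warn) might look roughly like the sketch below. It assumes the threshold lives in the low 15 bits of the bank's CTL2 value; the function and macro names are hypothetical and are not the ones used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define CTL2_THRESHOLD_MASK 0x7fffULL /* assumed: threshold in bits 14:0 */
#define FALLBACK_THRESHOLD  1ULL

/*
 * Return the CTL2 value to program for one bank.
 * honour_bios:   keep a BIOS-provided threshold when one is present.
 * used_fallback: set to true when the BIOS value was requested but the
 *                bank was left at zero, so the caller can warn the user.
 */
static uint64_t apply_threshold_policy(uint64_t ctl2, bool honour_bios,
                                       bool *used_fallback)
{
    uint64_t bios_threshold = ctl2 & CTL2_THRESHOLD_MASK;

    *used_fallback = false;
    if (honour_bios && bios_threshold != 0)
        return ctl2; /* BIOS set a threshold: leave it untouched */

    if (honour_bios)
        *used_fallback = true; /* BIOS left this bank uninitialized */

    return (ctl2 & ~CTL2_THRESHOLD_MASK) | FALLBACK_THRESHOLD;
}
```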

[PATCH v2] x86/mce: Honour bios-set CMCI threshold

2012-09-10 Thread Naveen N. Rao
-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. v2: Just separating out the patch. I will send a separate patch for consolidating the MCE boot flags. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5

Re: [PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-06 Thread Naveen N. Rao
On 09/06/2012 05:58 PM, Andi Kleen wrote: The change is still under discussion. Stage one is to add the new global pathnames in addition to keeping the old per-cpu ones. Also fix all utilities (just mcelog(8) as far as we know) to prefer the new paths. But why do you even want to change it? Do

Re: [PATCH 2/3] x86/mce: Pack boolean MCE flags into a structure

2012-09-05 Thread Naveen N. Rao
On 09/06/2012 12:26 AM, Tony Luck wrote: On Wed, Sep 5, 2012 at 3:22 AM, Naveen N. Rao wrote: Many MCE flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Before this patch: size arch/x86/kernel/cpu/mcheck/mce.o text

Re: [PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-05 Thread Naveen N. Rao
On 09/06/2012 12:39 AM, Tony Luck wrote: On Wed, Sep 5, 2012 at 11:47 AM, Andi Kleen wrote: On Wed, Sep 05, 2012 at 04:02:37PM +0530, Naveen N. Rao wrote: All the current mce tunables are now available under /sys/devices/system/machinecheck. Start using this new location, but fall back to the

[PATCH] [mcelog] Start using the new sysfs tunables location

2012-09-05 Thread Naveen N. Rao
All the current mce tunables are now available under /sys/devices/system/machinecheck. Start using this new location, but fall back to the older per-cpu location so that we continue working with older kernels. Signed-off-by: Naveen N. Rao --- README |2 +- mcelog.init |5
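The "prefer the new location, fall back to the old one" approach from this patch description can be sketched as follows. This is not the mcelog code: the helper names are hypothetical, `check_interval` is used only as an example tunable, and the per-cpu path shown for CPU 0 is an assumption about the older layout.

```c
#include <stddef.h>
#include <unistd.h>

/* Return the first candidate path that is readable, or NULL if none is. */
static const char *first_readable(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (access(candidates[i], R_OK) == 0)
            return candidates[i];
    }
    return NULL;
}

/* New global location first, older per-cpu location second (assumed paths). */
static const char *mce_check_interval_path(void)
{
    static const char *const candidates[] = {
        "/sys/devices/system/machinecheck/check_interval",
        "/sys/devices/system/machinecheck/machinecheck0/check_interval",
    };

    return first_readable(candidates,
                          sizeof(candidates) / sizeof(candidates[0]));
}
```

Ordering the candidates from newest to oldest keeps the tool working on older kernels without any version check.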

[PATCH 1/3] x86/mce: Make sysfs tunables available globally across all cpus

2012-09-05 Thread Naveen N. Rao
documentation to also point to the new location so that user-space tools can pick up on the new location. We would eventually want to remove these from the per-cpu location. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/machinecheck |4 ++-- arch/x86/kernel/cpu/mcheck/mce.c | 24

[PATCH 3/3] x86/mce: Honour bios-set CMCI threshold

2012-09-05 Thread Naveen N. Rao
-safe, we initialize threshold to 1 if some banks have not been initialized by the bios and warn the user. Signed-off-by: Naveen N. Rao --- Documentation/x86/x86_64/boot-options.txt |5 arch/x86/kernel/cpu/mcheck/mce-internal.h |3 +- arch/x86/kernel/cpu/mcheck/mce.c |

[PATCH 2/3] x86/mce: Pack boolean MCE flags into a structure

2012-09-05 Thread Naveen N. Rao
Many MCE flags are boolean in nature, but are declared as integers currently. We can pack these into a bitfield to save some space. Signed-off-by: Naveen N. Rao --- arch/x86/include/asm/mce.h|2 - arch/x86/kernel/cpu/mcheck/mce-internal.h |9 +++ arch/x86/kernel/cpu
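As a rough illustration of the idea in this patch description, the sketch below compares boolean flags stored as separate integers with the same flags packed into one-bit bitfields. The struct and member names are hypothetical and do not reproduce the patch; the exact sizes depend on the compiler and ABI.

```c
#include <stddef.h>

/* Flags kept as individual integers (one int per flag). */
struct mce_flags_as_ints {
    int dont_log_ce;
    int cmci_disabled;
    int ignore_ce;
};

/* The same flags packed into single-bit fields. */
struct mce_flags_packed {
    unsigned int dont_log_ce   : 1;
    unsigned int cmci_disabled : 1;
    unsigned int ignore_ce     : 1;
};

/* Data bytes saved by packing; the value is implementation-dependent. */
static size_t mce_flags_bytes_saved(void)
{
    return sizeof(struct mce_flags_as_ints) - sizeof(struct mce_flags_packed);
}
```

This only shows the data-size side; accessing a bitfield can cost extra instructions, so the effect on code size has to be measured separately.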

[PATCH 0/3] x86:mce: Some cleanups and bios-set CMCI thresholds

2012-09-05 Thread Naveen N. Rao
op of -tip. Thanks, Naveen --- Naveen N. Rao (3): x86/mce: Make sysfs tunables available globally across all cpus x86/mce: Pack boolean MCE flags into a structure x86/mce: Honour bios-set CMCI threshold Documentation/x86/x86_64/boot-options.txt |5 + Documentation/x86/x
