patch 6.
Tested on ppc64 BE and LE.
- Naveen
Naveen N. Rao (8):
kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2
perf probe powerpc: Fix symbol fixup issues due to ELF type
perf probe: Improve detection of file/function name in the probe
pattern
perf probe powerpc: Handle
files param3 and param4 to hold all these values.
>
> For backwards compatibility with old injection scripts we maintain the
> old behaviour if flags remains set at zero (or is reset to 0).
>
> Signed-off-by: Tony Luck
Patch
Acked-by: Naveen N. Rao
with a small change below.
>
> ---
On 10/22/2013 12:33 AM, Luck, Tony wrote:
But yes, this is possible and it would make it all even cleaner
and simpler by simply not needing the reg/dereg interfaces for
mce_ext_err_print but adding it to the chain.
So this is on top of the 9 patch series (using the V4 that Chen Gong
posted for
On 10/20/2013 01:51 PM, Borislav Petkov wrote:
On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote:
Oh, yes it is. Furthermore, it reminds me of something I have wondered
since I wrote this patch series: where is the best place to put cper.c?
CPER really doesn't depend on APEI, or even ACPI. Maybe lib/ is an option. I can up
On 10/19/2013 04:56 PM, Chen Gong wrote:
On Fri, Oct 18, 2013 at 05:31:21PM +0530, Naveen N. Rao wrote:
Date: Fri, 18 Oct 2013 17:31:21 +0530
From: "Naveen N. Rao"
To: "Chen, Gong" , tony.l...@intel.com,
b...@alien8.de, j...@perches.com, m.che...@samsung.com
CC: aroza...
On 10/18/2013 01:53 PM, Chen, Gong wrote:
To prepare for the following patches and make the related
definitions clearer, update some definitions about CPER.
v2 -> v1: Update some more definitions suggested by Boris
Signed-off-by: Chen, Gong
Acked-by: Borislav Petkov
Reviewed-by: Mauro Carvalho
On 10/18/2013 01:53 PM, Chen, Gong wrote:
This H/W error log driver (a.k.a eMCA driver) is implemented based on
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
After errors are captured, more valuable information can be
got via this new enh
On 10/18/2013 01:53 PM, Chen, Gong wrote:
Keep only the most important fields for memory error
reporting. The detailed information will be moved to the perf/trace
interface.
Suggested-by: Tony Luck
Signed-off-by: Chen, Gong
Reviewed-by: Mauro Carvalho Chehab
---
drivers/acpi/apei/cper.c | 67 +
Gong,
This mail seems to have missed copying you given the header issues.
Thanks,
Naveen
On 10/17/2013 05:51 PM, Naveen N. Rao wrote:
On 10/16/2013 07:09 AM, Chen Gong wrote:
On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote:
Date: Tue, 15 Oct 2013 23:47:23 +0530
From: "Nav
On 10/16/2013 04:19 PM, Borislav Petkov wrote:
Btw, I don't know what's the problem but when I hit reply-to-all to your
emails, mutt drops your email address from the To: and makes the CC:
list become the To: list. Strange.
I'm seeing the same thing. Looking at the headers, Chen Gong's email
i
On 10/16/2013 12:53 AM, Borislav Petkov wrote:
On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote:
+2 ;)
You're counting for 2 people, huh?
That's me raising both my hands :)
:-)
While at it, I wonder if we're better off calling these "Hardware
events"
+--
5 files changed, 18 insertions(+), 12 deletions(-)
Acked-by: Naveen N. Rao
Regards,
Naveen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org
On 10/16/2013 07:09 AM, Chen Gong wrote:
On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote:
Date: Tue, 15 Oct 2013 23:47:23 +0530
From: "Naveen N. Rao"
To: "Chen, Gong"
Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel@vger.kernel.org,
linux-a...@vger.ke
off-by: Chen, Gong
> ---
> drivers/acpi/apei/cper.c | 12
> 1 file changed, 12 insertions(+)
>
Acked-by: Naveen N. Rao
> diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
> index 680230c..2a4389f 100644
> --- a/drivers/acpi/apei/cper.c
> +++ b
On 2013/10/15 09:15AM, Tony Luck wrote:
> On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov wrote:
> > We can even add a hint for the user like:
> >
> > "Above errors have been corrected by the hardware and require no
> > further action."
> >
> > Btw, this is valid for both dmesg and trace
.c | 1 +
> drivers/firmware/dmi_scan.c | 60
> +
> include/linux/dmi.h | 5
> 4 files changed, 67 insertions(+)
Acked-by: Naveen N. Rao
On 2013/10/11 02:32AM, Chen Gong wrote:
> To meet the needs of the following patches and make the related definitions
> clearer, update some definitions about CPER. No functional changes.
>
> Signed-off-by: Chen, Gong
> ---
> drivers/acpi/apei/apei-internal.h | 12 -
> drivers/acpi/apei
On 10/15/2013 10:30 PM, Borislav Petkov wrote:
On Tue, Oct 15, 2013 at 10:24:35PM +0530, Naveen N. Rao wrote:
On 2013/10/11 02:32AM, Chen Gong wrote:
Use trace interface to elaborate all H/W error related
information.
Signed-off-by: Chen, Gong
---
+TRACE_EVENT(extlog_mem_event
On 2013/10/11 02:32AM, Chen Gong wrote:
> In the latest UEFI spec (2.4 as of now), the memory error definition
> for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
> adds some new fields. These fields help people map a
> memory error to the actual DIMM location.
>
> Original-author: Tony Luck
>
On 2013/10/11 02:32AM, Chen Gong wrote:
> Use trace interface to elaborate all H/W error related
> information.
>
> Signed-off-by: Chen, Gong
> ---
> +TRACE_EVENT(extlog_mem_event,
> + TP_PROTO(u32 etype,
> + char *dimm_loc,
> + const uuid_le *fru_id,
> +
On 10/14/2013 10:42 PM, Tony Luck wrote:
On Mon, Oct 14, 2013 at 3:36 AM, Borislav Petkov wrote:
On Mon, Oct 14, 2013 at 12:55:00AM -0400, Chen Gong wrote:
Because most of data in CPER are empty or unimportant.
It is not about whether it is important or not - the question is whether
changing
On 08/13/2013 11:28 PM, Borislav Petkov wrote:
On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote:
If I'm not mistaken, even for systems that have EDAC drivers, it looks
to me like EDAC can't really decode to the DIMM given what is provided
by the bios in the APEI report
On 08/13/2013 11:09 PM, Luck, Tony wrote:
In the meantime, like Boris suggests, I think we can have a different
trace event for raw APEI reports - userspace can use it as it pleases.
Once ghes_edac gets better, users can decide whether they want raw APEI
reports or the EDAC-processed version and
On 08/13/2013 06:12 PM, Borislav Petkov wrote:
On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
You're right - my trace point makes all the data provided by apei
as-is to userspace. However, ghes_edac seems to squash some of this
data into a string when reporting through mc_
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
Em Tue, 13 Aug 2013 17:11:18 +0530
"Naveen N. Rao" escreveu:
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
But, this only seems to expose the APEI data as a string
and doesn't look to really make all the fields av
On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote:
Em Tue, 13 Aug 2013 17:06:14 +0530
"Naveen N. Rao" escreveu:
On 08/12/2013 11:26 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
Userspace still needs the EDAC sysfs, in order t
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
But, this only seems to expose the APEI data as a string
and doesn't look to really make all the fields available to user-space
in a raw manner. Not sure how well this can be utilised by a user-space
tool. Do you have suggestions on how we can
On 08/12/2013 11:26 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
Userspace still needs the EDAC sysfs, in order to identify how the
memory is organized, and do the proper memory labels association.
What edac_ghes does is to fill those sysfs n
On 08/12/2013 06:23 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote:
So, I looked at ghes_edac and it basically seems to boil down to
trace_mc_event. But, this only seems to expose the APEI data as a
string and doesn't look to really make all the f
On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote:
Em Sat, 10 Aug 2013 20:03:22 +0200
Borislav Petkov escreveu:
On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
Em Thu, 08 Aug 2013 23:57:51 +0530
"Naveen N. Rao" escreveu:
Enable memory error trace event
On 08/09/2013 12:53 AM, Steven Rostedt wrote:
[ attempting to try out claws-mail, hopefully this message isn't
scrambled ;-) ]
Works just fine :)
On Thu, 8 Aug 2013 23:57:49 +0530
"Naveen N. Rao" wrote:
Since we'll be adding multiple trace events to ras.h, we need t
On 08/09/2013 12:47 AM, Borislav Petkov wrote:
On Thu, Aug 08, 2013 at 11:57:50PM +0530, Naveen N. Rao wrote:
+TRACE_EVENT(ghes_platform_memory_event,
+ TP_PROTO(const struct acpi_hest_generic_status *estatus,
+const struct acpi_hest_generic_data *gdata
Enable memory error trace event in cper.c
Signed-off-by: Naveen N. Rao
---
drivers/acpi/apei/cper.c | 21 -
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
index 33dc6a0..19a9c0b 100644
--- a/drivers/acpi
Since we'll be adding multiple trace events to ras.h, we need to protect
each block appropriately so that they only get included in the right
places. Update PCIe AER trace event for this purpose.
Signed-off-by: Naveen N. Rao
---
drivers/pci/pcie/aer/aerdrv_errprint.c | 1 +
include/trace/e
Add a trace event for memory error event from generic hardware error
source. We expose all members from the generic error status block, the
generic error data and the cper memory error record.
Signed-off-by: Naveen N. Rao
---
include/trace/events/ras.h | 157
This patch series adds a new trace event for memory errors reported via APEI
generic hardware error source.
- Naveen
Naveen N. Rao (3):
mce: acpi/apei: trace: Include PCIe AER trace event conditionally
mce: acpi/apei: trace: Add trace event for ghes memory error
mce: acpi/apei: trace
On 07/31/2013 11:30 PM, Bjorn Helgaas wrote:
On Wed, Jul 31, 2013 at 3:46 AM, Naveen N. Rao
wrote:
My key question was about why we are using a field width of 10 implying a
32-bit value, rather than a field width of 18 as suggested by the data type?
This shouldn't truncate the value, b
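The field-width question above can be checked with a small userspace sketch. This is only an illustration, not the kernel code under discussion: `fmt_addr` is a hypothetical helper, and it assumes the `#` flag is used, so the `0x` prefix counts toward the width. A width of 10 zero-pads to 8 hex digits and a width of 18 to 16 hex digits; a value that needs more digits than the width allows is still printed in full.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format an address with either the narrow (10)
 * or the wide (18) field width. The width is a minimum, so a larger
 * value is never truncated, only left unpadded.
 */
static int fmt_addr(char *buf, size_t len, unsigned long long addr, int wide)
{
	if (wide)
		return snprintf(buf, len, "%#018llx", addr);
	return snprintf(buf, len, "%#010llx", addr);
}
```

With the narrow width, mixed 32-bit and 64-bit values would print at different lengths, which is presumably why consistent widths matter for log readability.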
On 07/25/2013 11:31 PM, Luck, Tony wrote:
MCESEV(
+ PANIC, "Action required but kernel thread is not continuable",
+ SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR,
MCI_UC_SAR|MCI_ADDR),
+ MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV,
MCG_STATUS_RIPV|
On 07/29/2013 08:52 PM, Borislav Petkov wrote:
@@ -186,8 +186,8 @@ static int erst_exec_stall(struct apei_exec_context *ctx,
if (ctx->value > FIRMWARE_MAX_STALL) {
if (!in_nmi())
- pr_warning(FW_WARN ERST_PFX
- "Too long stall t
On 07/25/2013 11:02 PM, Bjorn Helgaas wrote:
On Thu, Jul 25, 2013 at 5:23 AM, Naveen N. Rao
wrote:
On 07/24/2013 10:53 PM, Joe Perches wrote:
On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote:
On 2013/07/22 11:01PM, Borislav Petkov wrote:
From: Borislav Petkov
[5.525861] ERST
On 07/24/2013 10:53 PM, Joe Perches wrote:
On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote:
On 2013/07/22 11:01PM, Borislav Petkov wrote:
From: Borislav Petkov
[5.525861] ERST: Can not request iomem region <0xc7eff000-0x
c7f0> for ERST.
This needs t
CACOD when parsing 'UC'
errors
arch/x86/include/asm/mce.h| 13 +++--
arch/x86/kernel/cpu/mcheck/mce-severity.c | 4 ++--
2 files changed, 13 insertions(+), 4 deletions(-)
Series Acked-by: Naveen N. Rao
- Naveen
On 07/24/2013 11:46 AM, Chen Gong wrote:
> On Tue, Jul 23, 2013 at 03:51:14PM -0700, Tony Luck wrote:
>> Date: Tue, 23 Jul 2013 15:51:14 -0700
>> From: Tony Luck
>> To: Linux Kernel Mailing List
>> Cc: Borislav Petkov , Chen Gong ,
>> "Naveen N. Rao"
16llx-0x%016llx> for
> ERST.\n",
> (unsigned long long)erst_erange.base,
> (unsigned long long)erst_erange.base + erst_erange.size);
> rc = -EIO;
Acked-by: Naveen N. Rao
While looking at this, I noticed that we seem to be using varying field
width
On 2013/07/23 01:34PM, Tony Luck wrote:
> The 0x1000 bit of the MCACOD field of machine check MCi_STATUS
> registers is only defined for corrected errors (where it means
> that hardware may be filtering errors see SDM section 15.9.2.1).
>
> For uncorrected errors it may, or may not be set - so we
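A rough sketch of the idea in the description above, assuming the remedy is to ignore the 0x1000 bit whenever the error is uncorrected. The macro and function names are hypothetical and not taken from the patch; the only facts relied on are that the MCA error code sits in bits 15:0 of MCi_STATUS and that bit 61 flags an uncorrected error.

```c
#include <stdint.h>

#define STATUS_UC         (1ULL << 61) /* uncorrected-error flag in MCi_STATUS */
#define MCACOD_ALL        0xffffULL    /* MCA error code, bits 15:0 */
#define MCACOD_FILTER_BIT 0x1000ULL    /* only defined for corrected errors */

/*
 * Hypothetical helper: return the MCA error code to compare against
 * known signatures. For uncorrected errors the 0x1000 bit carries no
 * defined meaning, so it is masked off before comparison.
 */
static inline uint16_t mcacod_for_compare(uint64_t status)
{
	uint64_t code = status & MCACOD_ALL;

	if (status & STATUS_UC)
		code &= ~MCACOD_FILTER_BIT;
	return (uint16_t)code;
}
```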
up similar
to how we handle memory failure scenarios.
Signed-off-by: Naveen N. Rao
---
drivers/acpi/apei/ghes.c | 38 +-
include/linux/mm.h |1 +
mm/memory-failure.c |5 -
3 files changed, 34 insertions(+), 10 deletions(-)
diff --git
On 07/09/2013 01:56 AM, Luck, Tony wrote:
I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird
stuff. Or do you think we might still have to deal with a string of APEI
messages?
Agreed - and I don't think this patch can help with a string of APEI
messages either. So yes, I thi
On 07/03/2013 08:16 PM, Borislav Petkov wrote:
On Tue, Jul 02, 2013 at 06:24:00PM +0530, Naveen N. Rao wrote:
I am adding another patch here to disable page offlining in case the firmware
starts acting up.
Thanks,
Naveen
--
Add a sysctl memory_failure_soft_offline to control what is done on
On 07/03/2013 08:14 PM, Borislav Petkov wrote:
On Tue, Jul 02, 2013 at 05:02:48PM +0530, Naveen N. Rao wrote:
Here is the updated patch. I also added printk_ratelimit() in line with the
rest of the GHES code.
Thanks,
Naveen
--
If the firmware indicates in GHES error data entry that the error
immediately. If set to 0, no action is taken.
Signed-off-by: Naveen N. Rao
---
Documentation/sysctl/vm.txt | 12
include/linux/mm.h |1 +
kernel/sysctl.c |9 +
mm/memory-failure.c | 10 +++---
4 files changed, 29 insertions(+), 3
interrupt context, so we queue this up similar
to how we handle memory failure scenarios.
Signed-off-by: Naveen N. Rao
---
drivers/acpi/apei/ghes.c | 38 +-
include/linux/mm.h |1 +
mm/memory-failure.c |5 -
3 files changed, 34
On 07/02/2013 04:38 AM, Borislav Petkov wrote:
On Mon, Jul 01, 2013 at 09:08:59PM +0530, Naveen N. Rao wrote:
If the firmware indicates in the GHES error data entry that the error threshold
has been exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in
On 07/01/2013 09:08 PM, Borislav Petkov wrote:
On Mon, Jul 01, 2013 at 08:37:43PM +0530, Naveen N. Rao wrote:
On 06/28/2013 11:01 PM, Tony Luck wrote:
+ if (sec_sev == GHES_SEV_CORRECTED &&
+ (gdata->flags & CPER_SEC_ERROR_THR
If the firmware indicates in the GHES error data entry that the error threshold
has been exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.
Signed-off-by: Naveen N
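The condition described above can be sketched as a standalone predicate. This is an illustration only: the enum, the macro names and their values are stand-ins chosen for the example (the real definitions live in the kernel's CPER and GHES headers), and the actual patch queues the offline request rather than returning a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions for illustration; not the kernel's own. */
enum sev { SEV_NO, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };
#define SEC_ERROR_THRESHOLD_EXCEEDED 0x0008
#define MEM_VALID_PHYSICAL_ADDRESS   0x0002

/*
 * Soft-offline is only attempted when all three hold: the section is a
 * corrected error, firmware flagged the threshold as exceeded, and the
 * record carries a valid physical address to act on.
 */
static bool should_soft_offline(enum sev sec_sev, uint8_t flags,
				uint64_t validation_bits)
{
	return sec_sev == SEV_CORRECTED &&
	       (flags & SEC_ERROR_THRESHOLD_EXCEEDED) &&
	       (validation_bits & MEM_VALID_PHYSICAL_ADDRESS);
}
```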
which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.
Signed-off-by: Naveen N. Rao
---
arch/x86/include/asm/mce.h|3 ++
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++
arch/x86/kernel/cpu/mcheck/mce.c | 28
---
Naveen N. Rao (3):
mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST
CMC
mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
mce, acpi/apei: Soft-offline a page on firmware GHES notification
Documentation/x86/x86_64/boot
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5 +
arch/x86/include/asm/acpi.h |2 ++
arch/x86/kernel/acpi/boot.c |5 +
drivers/acpi/apei/hest.c
On 06/28/2013 11:01 PM, Tony Luck wrote:
+ if (sec_sev == GHES_SEV_CORRECTED &&
+ (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED)
&&
+ (mem_err->validation_bits &
CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
+
.
Signed-off-by: Naveen N. Rao
---
drivers/acpi/apei/ghes.c |7 ++
include/linux/mm.h |1 +
mm/memory-failure.c | 53 ++
3 files changed, 43 insertions(+), 18 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei
cked-by" bandwagon - speak now.
Yep - looks fine to me.
Acked-by: Naveen N. Rao
Thanks,
Naveen
-Tony
arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +--
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c
b/arch/
On 06/25/2013 10:01 PM, Luck, Tony wrote:
The SDM talks about "non-affected" logical processors, but perhaps we
can call this an "unaffected" thread?
"unaffected" sounds a bit more natural (but close enough to the wording in
the SDM that people should see the connection).
Yup - "unnatural" is
Tony,
Thanks - I have included your text in the patch. I wasn't sure if I should add
your Signed-off-by. Kindly review and do the needful.
Thanks,
Naveen
--
Add comments to clarify usage of the various bitfields in the MCA subsystem
Signed-off-by: Naveen N. Rao
Acked-by: Borislav P
Tony, Boris,
Can you please see if the comments in the below patch include the details you
were expecting?
Thanks,
Naveen
--
Add comments to clarify usage of the various bitfields in the MCA subsystem
Signed-off-by: Naveen N. Rao
---
arch/x86/kernel/cpu/mcheck/mce.c |5 -
arch
On 2013/06/20 05:16AM, Chen Gong wrote:
> Update some SRAR severity conditions check to make it clearer,
> according to latest Intel SDM Vol 3(June 2013), table 15-20.
>
> Signed-off-by: Chen Gong
> ---
> arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +--
> 1 file changed, 5 inser
On 06/21/2013 12:57 PM, Borislav Petkov wrote:
On Thu, Jun 20, 2013 at 10:11:27PM +, Luck, Tony wrote:
- Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a
flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it
looks like we could consider this as an indic
On 06/21/2013 02:06 PM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote:
Yes, but I'm afraid this won't work either - mce_banks_owned is
cleared during cpu offline. This is necessary since a cmci
rediscover is triggered on cpu offline, so that if
On 06/21/2013 01:04 PM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote:
Exactly, but mce_poll_banks also doesn't have bits set for banks on
which CMCI is enabled.
Let's say we have a cpu with 2 banks (not shared), none of which work
in FF mode.
On 06/21/2013 02:27 AM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote:
This won't work across cpu offline/online, right? We will end up
_not_ enabling CMCI on certain banks where we should have.
Huh, don't understand. cmci_discover runs o
On 06/20/2013 02:58 AM, Luck, Tony wrote:
Ok, where is that semantics? What in a CPER record does say "this error
should tell you that you need to offline the containing page and I'm
telling you this exactly only once"? Error Severity 0, i.e. Recoverable?
Naveen - this one is for you (or for yo
On 06/21/2013 12:59 AM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote:
We need this bitfield to prevent enabling CMCI in future
cmci_discover() invocations. See usage in cmci_discover() further
below.
So?!
/* Skip banks in firmware first mode
On 06/20/2013 01:09 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote:
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In
On 06/20/2013 01:18 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5 +
arch/x86/include/asm
On 06/19/2013 11:34 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5 +
arch/x86/include/asm
through a boot option.
Signed-off-by: Naveen N. Rao
---
arch/x86/include/asm/mce.h|3 ++
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++
arch/x86/kernel/cpu/mcheck/mce.c | 25 ++
arch/x86/kernel/cpu/mcheck/mce_intel.c| 40
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5 +
arch/x86/include/asm/acpi.h |2 ++
arch/x86/kernel/acpi/boot.c |5 +
drivers/acpi/apei/hest.c
On 06/19/2013 03:59 AM, Tony Luck wrote:
On Mon, Jun 17, 2013 at 11:43 PM, Naveen N. Rao
wrote:
+ if (bank >= mca_cfg.banks) {
+ pr_info("mce_disable_bank: Invalid MCA bank %d ignored.\n",
bank);
Let's have a FW_BUG in that message to point a finger at
which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.
- Naveen
Signed-off-by: Naveen N. Rao
---
arch/x86/include/asm/mce.h|3 ++
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++
arch/x86/kernel/cpu/mcheck/mce.c | 23
On 06/17/2013 01:51 PM, Borislav Petkov wrote:
On Mon, Jun 17, 2013 at 01:41:03PM +0530, Naveen N. Rao wrote:
Yes, we used to poll since we do not get notified via MCE/CMCI.
However, with firmware first set in CMC structure, the firmware is
now controlling all corrected error reporting for
On 2013/06/17 09:06AM, Borislav Petkov wrote:
> On Mon, Jun 17, 2013 at 12:30:05PM +0530, Naveen N. Rao wrote:
> > >Hmm, so if CMCI is not supported, you just disabled polling of this bank
> > >and returned here. Not good.
> >
> > This is on purpose. If the bank
On 06/16/2013 05:50 PM, Borislav Petkov wrote:
On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote:
+static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data)
+{
+ int i;
+ struct acpi_hest_ia_corrected *cmc;
+ struct acpi_hest_ia_error_bank
On 06/15/2013 08:18 PM, Borislav Petkov wrote:
On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote:
HEST for corrected machine checks
Here's a patch that implements this technique. If the firmware advertises
support for firmware first mode in the CMC structure, we disable CMC
HEST for corrected machine checks
Here's a patch that implements this technique. If the firmware advertises
support for firmware first mode in the CMC structure, we disable CMCI and
polling for all the MCA banks listed in the CMC structure.
- Naveen
Signed-off-by: Naveen N. Rao
---
arc
On 10/17/2012 10:58 PM, Luck, Tony wrote:
BUT (squared) do you even really need to know that thresholds were set? You
could look at bits {52:38} in the MCi_STATUS information for the bank to see
how many corrected errors had been logged.
Ah, nice. I think we should be able to use this instead o
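For reference, pulling bits {52:38} out of a 64-bit MCi_STATUS value could look like the sketch below. The macro and function names are hypothetical; the field is 15 bits wide (52 - 38 + 1).

```c
#include <stdint.h>

#define CEC_SHIFT 38        /* lowest bit of the corrected error count */
#define CEC_MASK  0x7fffULL /* 15-bit field: bits 52:38 */

/* Hypothetical helper: corrected error count logged in a bank's status. */
static inline unsigned int mci_status_cec(uint64_t status)
{
	return (unsigned int)((status >> CEC_SHIFT) & CEC_MASK);
}
```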
On 10/17/2012 06:39 PM, Borislav Petkov wrote:
On Wed, Oct 17, 2012 at 04:57:30PM +0530, Naveen N. Rao wrote:
On 10/17/2012 04:29 PM, Borislav Petkov wrote:
+static struct dev_ext_attribute dev_attr_bios_cmci_threshold = {
+ __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL
Apart from a few nits below, patch series:
Acked-by: Naveen N. Rao
Regards,
Naveen
On 10/17/2012 04:43 PM, Borislav Petkov wrote:
From: Borislav Petkov
Move those MCA configuration variables into struct mca_config and adjust
the places they're used accordingly.
Signed-off-by: Bor
On 10/17/2012 04:29 PM, Borislav Petkov wrote:
+static struct dev_ext_attribute dev_attr_bios_cmci_threshold = {
+ __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL),
+ &mce_bios_cmci_threshold
Ok, I just noticed this (we must've missed it during review) but why is
this read-
On 10/12/2012 05:26 PM, Borislav Petkov wrote:
On Fri, Oct 12, 2012 at 04:20:40PM +0530, Naveen N. Rao wrote:
Hi Boris, Thanks for getting to this before I could!
Ah ok, I thought you weren't interested in doing this anymore :).
Sorry - just got sidetracked a bit, I'm afraid :)
On 10/10/2012 07:50 PM, Borislav Petkov wrote:
From: Borislav Petkov
Not-Signed-off-by: Borislav Petkov
---
arch/x86/include/asm/mce.h | 9 +
arch/x86/kernel/cpu/mcheck/mce.c | 12 +---
arch/x86/lguest/boot.c | 2 +-
3 files changed, 11 insertions(+), 12
Hi Tony,
Can you kindly take in this patch if there are no further comments?
Thanks,
Naveen
On 09/12/2012 05:55 PM, Naveen N. Rao wrote:
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot o
-safe, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
v3: Updated messages as per Tony's inputs.
v2: Just separating out the patch. I will send a separate patch for
consolidating the MCE boot flags.
Signed-off-by: Naveen N. Rao
---
Documen
-safe, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
v2: Just separating out the patch. I will send a separate patch for
consolidating the MCE boot flags.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5
On 09/06/2012 05:58 PM, Andi Kleen wrote:
The change is still under discussion. Stage one is to add the new global
pathnames in addition to keeping the old per-cpu ones. Also fix all utilities
(just mcelog(8) as far as we know) to prefer the new paths.
But why do you even want to change it? Do
On 09/06/2012 12:26 AM, Tony Luck wrote:
On Wed, Sep 5, 2012 at 3:22 AM, Naveen N. Rao
wrote:
Many MCE flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
Before this patch:
size arch/x86/kernel/cpu/mcheck/mce.o
text
On 09/06/2012 12:39 AM, Tony Luck wrote:
On Wed, Sep 5, 2012 at 11:47 AM, Andi Kleen wrote:
On Wed, Sep 05, 2012 at 04:02:37PM +0530, Naveen N. Rao wrote:
All the current mce tunables are now available under
/sys/devices/system/machinecheck. Start using this new location, but fall back
to the
All the current mce tunables are now available under
/sys/devices/system/machinecheck. Start using this new location, but fall back
to the older per-cpu location so that we continue working with older kernels.
Signed-off-by: Naveen N. Rao
---
README |2 +-
mcelog.init |5
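The "prefer the new location, fall back to the old one" approach described above might be sketched as follows. This is not the mcelog change itself: `first_readable` is a hypothetical helper, and the tunable name in the usage note is an assumption made for the example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the first path in `candidates` that can
 * be opened for reading, or NULL if none can. Listing the new global
 * path first and the per-cpu path second gives the fallback behaviour.
 */
static const char *first_readable(const char *const *candidates, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		FILE *f = fopen(candidates[i], "r");

		if (f) {
			fclose(f);
			return candidates[i];
		}
	}
	return NULL;
}
```

A caller would pass something like `/sys/devices/system/machinecheck/trigger` followed by `/sys/devices/system/machinecheck/machinecheck0/trigger` (the `trigger` file name is assumed here), so the same binary keeps working on older kernels.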
documentation to also point to the new location so that user-space
tools can pick up on the new location. We would eventually want to remove
these from the per-cpu location.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/machinecheck |4 ++--
arch/x86/kernel/cpu/mcheck/mce.c | 24
-safe, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
Signed-off-by: Naveen N. Rao
---
Documentation/x86/x86_64/boot-options.txt |5
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 +-
arch/x86/kernel/cpu/mcheck/mce.c |
Many MCE flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
Signed-off-by: Naveen N. Rao
---
arch/x86/include/asm/mce.h|2 -
arch/x86/kernel/cpu/mcheck/mce-internal.h |9 +++
arch/x86/kernel/cpu
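As a rough illustration of the space saving described above, compare a set of boolean flags stored as separate integers with the same flags packed into single-bit fields. The struct and field names are made up for the example and are not the ones used in the patch.

```c
/* Each boolean flag occupies a full int. */
struct flags_as_ints {
	int dont_log_ce;
	int cmci_disabled;
	int ignore_ce;
	int ser;
};

/* The same flags, one bit each, sharing a single storage unit. */
struct flags_packed {
	unsigned int dont_log_ce   : 1,
		     cmci_disabled : 1,
		     ignore_ce     : 1,
		     ser           : 1;
};
```

The trade-off is that bitfield members cannot have their address taken, so any code that passed a pointer to one of these flags has to be adjusted.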
op of -tip.
Thanks,
Naveen
---
Naveen N. Rao (3):
x86/mce: Make sysfs tunables available globally across all cpus
x86/mce: Pack boolean MCE flags into a structure
x86/mce: Honour bios-set CMCI threshold
Documentation/x86/x86_64/boot-options.txt |5 +
Documentation/x86/x