On 10/10/2012 07:50 PM, Borislav Petkov wrote:
From: Borislav Petkov borislav.pet...@amd.com
Not-Signed-off-by: Borislav Petkov borislav.pet...@amd.com
---
arch/x86/include/asm/mce.h | 9 +
arch/x86/kernel/cpu/mcheck/mce.c | 12 +---
arch/x86/lguest/boot.c |
On 10/12/2012 05:26 PM, Borislav Petkov wrote:
On Fri, Oct 12, 2012 at 04:20:40PM +0530, Naveen N. Rao wrote:
Hi Boris, Thanks for getting to this before I could!
Ah ok, I thought you weren't interested in doing this anymore :).
Sorry - just got sidetracked a bit, I'm afraid :)
I had
On 10/17/2012 04:29 PM, Borislav Petkov wrote:
+static struct dev_ext_attribute dev_attr_bios_cmci_threshold = {
+ __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL),
+ mce_bios_cmci_threshold
Ok, I just noticed this (we must've missed it during review), but why is this
Apart from a few nits below, patch series:
Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Regards,
Naveen
On 10/17/2012 04:43 PM, Borislav Petkov wrote:
From: Borislav Petkov borislav.pet...@amd.com
Move those MCA configuration variables into struct mca_config and adjust
the places
On 10/17/2012 06:39 PM, Borislav Petkov wrote:
On Wed, Oct 17, 2012 at 04:57:30PM +0530, Naveen N. Rao wrote:
On 10/17/2012 04:29 PM, Borislav Petkov wrote:
+static struct dev_ext_attribute dev_attr_bios_cmci_threshold = {
+ __ATTR(bios_cmci_threshold, 0444, device_show_int, NULL
On 10/17/2012 10:58 PM, Luck, Tony wrote:
BUT (squared) do you even really need to know that thresholds were set? You
could look at bits {52:38} in the MCi_STATUS information for the bank to see
how many corrected errors had been logged.
Ah, nice. I think we should be able to use this instead
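For reference, here is a minimal C sketch of reading that count. It assumes the layout Tony describes: bits 52:38 of MCi_STATUS, which is a 15-bit field. The macro and function names are illustrative. Per the Intel SDM, the field is only architecturally meaningful on banks that support corrected-error counting (e.g. CMCI-capable banks), so a real check would need to confirm that first.

```c
#include <stdint.h>

/* Corrected error count: bits 52:38 of MCi_STATUS (15 bits wide). */
#define MCI_STATUS_CEC_SHIFT 38
#define MCI_STATUS_CEC_MASK  (0x7fffULL << MCI_STATUS_CEC_SHIFT)

/* Extract the corrected error count from a raw MCi_STATUS value. */
static inline unsigned int mci_status_ce_count(uint64_t status)
{
	return (unsigned int)((status & MCI_STATUS_CEC_MASK) >> MCI_STATUS_CEC_SHIFT);
}
```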
interrupts are already
disabled, instead of perf_event_disable().
Reported-by: Edjunior Barbosa Machado emach...@linux.vnet.ibm.com
Signed-off-by: K.Prasad prasad.krish...@gmail.com
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
include/linux/perf_event.h|2 ++
kernel/events
If arch_validate_hwbkpt_settings() fails, bp->ctx won't be valid and the
kernel panics. Add a check to fix this.
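The following is a minimal sketch of the kind of guard described. It uses hypothetical, heavily simplified stand-ins for the breakpoint event and its context, so it is not the actual hw_breakpoint.c code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct task { int pid; };
struct event_ctx { struct task *task; };
struct bp_event { struct event_ctx *ctx; };

/*
 * Return the pid of the task a breakpoint is bound to. If validation
 * failed earlier, ctx may never have been set up, so check it before
 * dereferencing instead of crashing.
 */
static int bp_task_pid(const struct bp_event *bp)
{
	if (!bp->ctx || !bp->ctx->task)
		return -EINVAL;
	return bp->ctx->task->pid;
}
```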
Reported-by: Edjunior Barbosa Machado emach...@linux.vnet.ibm.com
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/powerpc/kernel/hw_breakpoint.c |2 +-
1 file
On 07/18/2012 05:27 PM, Frederic Weisbecker wrote:
On Wed, Jul 18, 2012 at 04:00:46PM +0530, Naveen N. Rao wrote:
Please find v2 of the patch from Prasad, based on Peter Zijlstra's
feedback. This applies on top of v3.5-rc7. This has been tested and
found to work fine by Edjunior.
Regards
On 09/06/2012 12:39 AM, Tony Luck wrote:
On Wed, Sep 5, 2012 at 11:47 AM, Andi Kleen a...@firstfloor.org wrote:
On Wed, Sep 05, 2012 at 04:02:37PM +0530, Naveen N. Rao wrote:
All the current mce tunables are now available under
/sys/devices/system/machinecheck. Start using this new location
On 09/06/2012 12:26 AM, Tony Luck wrote:
On Wed, Sep 5, 2012 at 3:22 AM, Naveen N. Rao
naveen.n@linux.vnet.ibm.com wrote:
Many MCE flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
Before this patch:
size arch/x86
On 09/06/2012 05:58 PM, Andi Kleen wrote:
The change is still under discussion. Stage one is to add the new global
pathnames in addition to keeping the old per-cpu ones. Also fix all utilities
(just mcelog(8) as far as we know) to prefer the new paths.
But why do you even want to change it?
, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
v2: Just separating out the patch. I will send a separate patch for
consolidating the MCE boot flags.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot
Hi Tony,
Can you kindly take in this patch if there are no further comments?
Thanks,
Naveen
On 09/12/2012 05:55 PM, Naveen N. Rao wrote:
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot option
, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
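Here is a rough sketch of that fallback. It assumes that a plain array stands in for the per-bank MCi_CTL2 values, whereas real code reads and writes the MSRs per CPU. It also assumes that the threshold lives in bits 14:0, as described in the Intel SDM. The helper name and the message text are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* CMCI corrected error count threshold: bits 14:0 of MCi_CTL2. */
#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL

/*
 * Hypothetical helper: ctl2[] stands in for the MCi_CTL2 value of each bank.
 * Thresholds programmed by the BIOS are left untouched; banks the BIOS left
 * at zero get a threshold of 1. Returns the number of banks fixed up.
 */
static int fixup_bios_cmci_threshold(uint64_t *ctl2, int nbanks)
{
	int fixed = 0;

	for (int i = 0; i < nbanks; i++) {
		if ((ctl2[i] & MCI_CTL2_CMCI_THRESHOLD_MASK) == 0) {
			ctl2[i] |= 1; /* field is zero, so this sets it to 1 */
			fixed++;
		}
	}

	if (fixed)
		fprintf(stderr, "%d MCA bank(s) not initialized by BIOS, using CMCI threshold 1\n", fixed);

	return fixed;
}
```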
v3: Updated messages as per Tony's inputs.
v2: Just separating out the patch. I will send a separate patch for
consolidating the MCE boot flags.
Signed-off-by: Naveen N. Rao naveen.n
On 08/16/2012 01:46 PM, Peter Zijlstra wrote:
On Wed, 2012-08-15 at 20:42 +0200, Frederic Weisbecker wrote:
On Wed, Aug 15, 2012 at 11:07:01PM +0530, Naveen N. Rao wrote:
Hi Frederic,
Did you get a chance to take a look at this?
Regards,
Naveen
Yeah, I'm ok with the patch. Peter, are you
to be updated to use the new path. However, if we ever get
to a point where cpu0 can be offlined, these tools will need to be updated
anyway (as they mostly hardcode machinecheck0 currently)
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/kernel/cpu/mcheck/mce.c | 46
On 08/29/2012 03:43 PM, Borislav Petkov wrote:
On Wed, Aug 29, 2012 at 01:11:55PM +0530, Naveen N. Rao wrote:
All the MCE attributes currently exported via sysfs appear under
/sys/devices/system/machinecheck/machinecheckn/. Pretty much all of these
are global in nature and not specific
On 08/29/2012 04:10 PM, Borislav Petkov wrote:
On Wed, Aug 29, 2012 at 03:56:04PM +0530, Naveen N. Rao wrote:
Hmmm.. Can't we just deprecate these? ;) Perhaps we can consider
adding newer tunables in the right place.
In case you haven't noticed yet: I'm all on your side.
Yup, I know :)
I
On 08/29/2012 08:13 PM, Luck, Tony wrote:
Note: I'm not sure if it's ok to change sysfs entries and this does break
userspace tools that depend on the current path for some of these attributes.
So, they will need to be updated to use the new path. However, if we ever get
to a point where cpu0
of -tip.
Thanks,
Naveen
---
Naveen N. Rao (3):
x86/mce: Make sysfs tunables available globally across all cpus
x86/mce: Pack boolean MCE flags into a structure
x86/mce: Honour bios-set CMCI threshold
Documentation/x86/x86_64/boot-options.txt |5 +
Documentation/x86/x86_64
Many MCE flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
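To illustrate the space saving, here is a minimal sketch comparing int flags with one-bit fields. The struct names are hypothetical, and the flag names only mirror the existing mce_* variables. On common ABIs such as x86-64, the int version takes 16 bytes and the bitfield version fits in a single 4-byte unsigned int. The exact layout is implementation-defined.

```c
/* Before: every boolean flag is a full int. */
struct mce_flags_int {
	int cmci_disabled;
	int ignore_ce;
	int dont_log_ce;
	int bios_cmci_threshold;
};

/* After: the same flags packed as one-bit fields. */
struct mce_flags_packed {
	unsigned int cmci_disabled       : 1;
	unsigned int ignore_ce           : 1;
	unsigned int dont_log_ce         : 1;
	unsigned int bios_cmci_threshold : 1;
};
```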
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/include/asm/mce.h|2 -
arch/x86/kernel/cpu/mcheck/mce-internal.h
, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt |5
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 +-
arch/x86/kernel/cpu
documentation to also point to the new location so that user-space
tools can pick up on the new location. We would eventually want to remove
these from the per-cpu location.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/machinecheck |4 ++--
arch/x86
All the current mce tunables are now available under
/sys/devices/system/machinecheck. Start using this new location, but fall back
to the older per-cpu location so that we continue working with older kernels.
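Here is a minimal sketch of the lookup order a tool could use. It assumes the global directory proposed in this series and machinecheck0 as the legacy per-cpu location. The existence check is passed in as a function pointer (normally a wrapper around access(2)), so the logic can be exercised without a real sysfs. The function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

#define MCE_SYSFS_GLOBAL "/sys/devices/system/machinecheck"
#define MCE_SYSFS_PERCPU "/sys/devices/system/machinecheck/machinecheck0"

/* Default existence check; callers may substitute their own. */
static int path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}

/*
 * Fill buf with the path of MCE tunable 'name', preferring the global
 * location and falling back to the legacy per-cpu one.
 * Returns 0 if found, -1 otherwise (buf then holds the per-cpu path).
 */
static int mce_tunable_path(char *buf, size_t len, const char *name,
			    int (*exists)(const char *path))
{
	snprintf(buf, len, "%s/%s", MCE_SYSFS_GLOBAL, name);
	if (exists(buf))
		return 0;

	snprintf(buf, len, "%s/%s", MCE_SYSFS_PERCPU, name);
	return exists(buf) ? 0 : -1;
}
```

For example, mce_tunable_path(buf, sizeof(buf), "check_interval", path_exists) would resolve check_interval on both old and new kernels.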
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
README |2
Hi Frederic,
Did you get a chance to take a look at this?
Regards,
Naveen
On 08/02/2012 01:46 PM, Naveen N. Rao wrote:
Hi Frederic,
I've added a check to make sure we are targeting the current task. This
applies on top of v3.5. Kindly review.
Thanks,
Naveen
History:
v3: Added check to make
, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt |5
arch/x86/include/asm/mce.h|1 +
arch/x86/kernel/cpu
On 08/22/2012 06:16 PM, Borislav Petkov wrote:
On Wed, Aug 22, 2012 at 06:00:54PM +0530, Naveen N. Rao wrote:
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot option, which if passed, allows bios
On 08/27/2012 02:42 PM, Borislav Petkov wrote:
On Thu, Aug 23, 2012 at 05:26:09PM +0530, Naveen N. Rao wrote:
Sure - sounds like a good idea. Further, a #define could eliminate
the need to change other references, but I'm not sure that's
acceptable
#define mce_bios_cmci_threshold
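For illustration, here is a minimal sketch of the #define approach. It assumes a struct named mce_boot_flags with a one-bit bios_cmci_threshold member; the other members and the helper are hypothetical. Existing code that reads or assigns mce_bios_cmci_threshold keeps compiling. The trade-off is that a struct member now hides behind a name that looks like a plain variable, and because the member is a bitfield, its address can no longer be taken.

```c
/* Hypothetical layout of the packed boot flags. */
struct mce_boot_flags {
	unsigned int cmci_disabled       : 1;
	unsigned int ignore_ce           : 1;
	unsigned int dont_log_ce         : 1;
	unsigned int bios_cmci_threshold : 1;
};

static struct mce_boot_flags mce_boot_flags;

/* Keep the old variable name working as an alias for the struct member. */
#define mce_bios_cmci_threshold mce_boot_flags.bios_cmci_threshold

/* Existing code like this compiles unchanged. */
static int bios_threshold_in_use(void)
{
	return mce_bios_cmci_threshold;
}
```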
Many MCE boot flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/include/asm/mce.h | 11 +++-
arch/x86/kernel/cpu/mcheck/mce.c
, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
Changes:
- Use the mce_boot_flags structure.
- Expose bios_cmci_threshold via sysfs.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt
On 08/27/2012 08:18 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 04:55:12PM +0530, Naveen N. Rao wrote:
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot option, which if passed, allows bios
On 08/27/2012 08:06 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 04:55:03PM +0530, Naveen N. Rao wrote:
Many MCE boot flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save some space.
Signed-off-by: Naveen N. Rao naveen.n
On 08/27/2012 09:17 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 09:05:46PM +0530, Naveen N. Rao wrote:
+
+extern struct mce_boot_flags mce_boot_flags;
Why do we need that extern thing?
So that this is visible across mce.c and mce_intel.c?
Ok. But if you move the struct to mce
On 08/27/2012 10:04 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 09:31:11PM +0530, Naveen N. Rao wrote:
On 08/27/2012 09:17 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 09:05:46PM +0530, Naveen N. Rao wrote:
+
+extern struct mce_boot_flags mce_boot_flags;
Why do we need
On 08/27/2012 07:48 PM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 03:58:59PM +0200, Andi Kleen wrote:
On Mon, Aug 27, 2012 at 04:55:03PM +0530, Naveen N. Rao wrote:
Many MCE boot flags are boolean in nature, but are declared as integers
currently. We can pack these into a bitfield to save
On 08/28/2012 01:48 AM, Borislav Petkov wrote:
On Mon, Aug 27, 2012 at 10:44:40PM +0530, Naveen N. Rao wrote:
Looks good. In fact, I had actually added mce_ser and mce_disabled
into the bitfield, but backed off not wanting to overdo.
We could pull in all the other configuration parameters
target current task]
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
include/linux/perf_event.h|2 ++
kernel/events/core.c |2 +-
kernel/events/hw_breakpoint.c | 11 ++-
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux
On 07/19/2012 04:46 PM, Naveen N. Rao wrote:
On 07/18/2012 05:27 PM, Frederic Weisbecker wrote:
On Wed, Jul 18, 2012 at 04:00:46PM +0530, Naveen N. Rao wrote:
Please find v2 of the patch from Prasad, based on Peter Zijlstra's
feedback. This applies on top of v3.5-rc7. This has been tested
...@linux.intel.com,
Naveen N. Rao naveen.n@linux.vnet.ibm.com
Subject: Re: [PATCH] x86/mce: Pay no attention to 'F' bit in MCACOD when
parsing 'UC' errors.
Gah ... there is another bug in that unaffected thread entry. The check for
MCG_STATUS should be for RIPV=1 *and* EIPV=0.
I set MCGMASK
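Here is a minimal sketch of the mask/result matching used by the MCGMASK() entries. It assumes a rule matches when (mcgstatus & mask) == result; the struct and function names are illustrative. It shows why masking both bits matters: a rule that only tested RIPV would also match RIPV=1, EIPV=1.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/* A rule matches when the masked bits equal the expected result. */
struct mcg_rule {
	uint64_t mask;
	uint64_t result;
};

static bool mcg_rule_matches(const struct mcg_rule *r, uint64_t mcgstatus)
{
	return (mcgstatus & r->mask) == r->result;
}

/* Unaffected thread: RIPV=1 *and* EIPV=0. */
static const struct mcg_rule unaffected_thread = {
	.mask   = MCG_STATUS_RIPV | MCG_STATUS_EIPV,
	.result = MCG_STATUS_RIPV,
};

/* RIPV-only check for comparison: ignores EIPV entirely. */
static const struct mcg_rule ripv_only = {
	.mask   = MCG_STATUS_RIPV,
	.result = MCG_STATUS_RIPV,
};
```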
parsing 'UC' errors
arch/x86/include/asm/mce.h| 13 +++--
arch/x86/kernel/cpu/mcheck/mce-severity.c | 4 ++--
2 files changed, 13 insertions(+), 4 deletions(-)
Series Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
- Naveen
--
To unsubscribe from this list
On 07/24/2013 10:53 PM, Joe Perches wrote:
On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote:
On 2013/07/22 11:01PM, Borislav Petkov wrote:
From: Borislav Petkov b...@suse.de
[5.525861] ERST: Can not request iomem region 0xc7eff000-0x
c7f0 for ERST.
This needs
On 08/12/2013 06:23 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 06:11:49PM +0530, Naveen N. Rao wrote:
So, I looked at ghes_edac and it basically seems to boil down to
trace_mc_event. But, this only seems to expose the APEI data as a
string and doesn't look to really make all the fields
On 08/12/2013 11:26 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
Userspace still needs the EDAC sysfs, in order to identify how the
memory is organized, and do the proper memory labels association.
What edac_ghes does is to fill those sysfs
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
But, this only seems to expose the APEI data as a string
and doesn't look to really make all the fields available to user-space
in a raw manner. Not sure how well this can be utilised by a user-space
tool. Do you have suggestions on how we can
On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote:
Em Tue, 13 Aug 2013 17:06:14 +0530
Naveen N. Rao naveen.n@linux.vnet.ibm.com escreveu:
On 08/12/2013 11:26 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
Userspace still needs the EDAC
On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
Em Tue, 13 Aug 2013 17:11:18 +0530
Naveen N. Rao naveen.n@linux.vnet.ibm.com escreveu:
On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
But, this only seems to expose the APEI data as a string
and doesn't look to really make all
On 08/13/2013 06:12 PM, Borislav Petkov wrote:
On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
You're right - my trace point makes all the data provided by apei
as-is to userspace. However, ghes_edac seems to squash some of this
data into a string when reporting through mc_event
On 08/13/2013 11:09 PM, Luck, Tony wrote:
In the meantime, like Boris suggests, I think we can have a different
trace event for raw APEI reports - userspace can use it as it pleases.
Once ghes_edac gets better, users can decide whether they want raw APEI
reports or the EDAC-processed version
On 08/13/2013 11:28 PM, Borislav Petkov wrote:
On Tue, Aug 13, 2013 at 11:02:08PM +0530, Naveen N. Rao wrote:
If I'm not mistaken, even for systems that have EDAC drivers, it looks
to me like EDAC can't really decode to the DIMM given what is provided
by the bios in the APEI report currently
On 07/09/2013 01:56 AM, Luck, Tony wrote:
I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird
stuff. Or do you think we might still have to deal with a string of APEI
messages?
Agreed - and I don't think this patch can help with a string of APEI
messages either. So yes, I
failure scenarios.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
drivers/acpi/apei/ghes.c | 38 +-
include/linux/mm.h |1 +
mm/memory-failure.c |5 -
3 files changed, 34 insertions(+), 10 deletions(-)
diff --git
On 06/19/2013 03:59 AM, Tony Luck wrote:
On Mon, Jun 17, 2013 at 11:43 PM, Naveen N. Rao
naveen.n@linux.vnet.ibm.com wrote:
+ if (bank >= mca_cfg.banks) {
+ pr_info("mce_disable_bank: Invalid MCA bank %d ignored.\n", bank);
Let's have a FW_BUG in that message to point
through a boot option.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/include/asm/mce.h|3 ++
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++
arch/x86/kernel/cpu/mcheck/mce.c | 25 ++
arch/x86/kernel/cpu/mcheck/mce_intel.c
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt |5 +
arch/x86/include/asm/acpi.h |2 ++
arch/x86/kernel/acpi/boot.c |5
On 06/19/2013 11:34 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt
On 06/20/2013 01:18 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:42PM +0530, Naveen N. Rao wrote:
Add a boot option to disable firmware first mode for corrected errors.
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Documentation/x86/x86_64/boot-options.txt
On 06/20/2013 01:09 PM, Borislav Petkov wrote:
On Wed, Jun 19, 2013 at 11:27:17PM +0530, Naveen N. Rao wrote:
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first
On 06/21/2013 12:59 AM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 12:38:13AM +0530, Naveen N. Rao wrote:
We need this bitfield to prevent enabling CMCI in future
cmci_discover() invocations. See usage in cmci_discover() further
below.
So?!
/* Skip banks in firmware first mode
On 06/21/2013 02:27 AM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 01:44:00AM +0530, Naveen N. Rao wrote:
This won't work across cpu offline/online, right? We will end up
_not_ enabling CMCI on certain banks where we should have.
Huh, don't understand. cmci_discover runs on each CPU
On 06/20/2013 02:58 AM, Luck, Tony wrote:
Ok, where is that semantics? What in a CPER record does say this error
should tell you that you need to offline the containing page and I'm
telling you this exactly only once? Error Severity 0, i.e. Recoverable?
Naveen - this one is for you (or for
On 06/21/2013 01:04 PM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 02:52:25AM +0530, Naveen N. Rao wrote:
Exactly, but mce_poll_banks also doesn't have bits set for banks on
which CMCI is enabled.
Let's say we have a cpu with 2 banks (not shared), none of which work
in FF mode. Both
On 06/21/2013 02:06 PM, Borislav Petkov wrote:
On Fri, Jun 21, 2013 at 01:16:50PM +0530, Naveen N. Rao wrote:
Yes, but I'm afraid this won't work either - mce_banks_owned is
cleared during cpu offline. This is necessary since a cmci
rediscover is triggered on cpu offline, so that if this bank
On 06/21/2013 12:57 PM, Borislav Petkov wrote:
On Thu, Jun 20, 2013 at 10:11:27PM +, Luck, Tony wrote:
- Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a
flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it
looks like we could consider this as an
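If the flag were used that way, the check itself is a single bit test. Here is a minimal sketch, assuming the section descriptor flag values as defined for CPER in the UEFI spec (they match the CPER_SEC_* constants in include/linux/cper.h). The action to take when the flag is set is a policy question left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Section descriptor flags (UEFI CPER; same values as include/linux/cper.h). */
#define CPER_SEC_PRIMARY                  0x0001
#define CPER_SEC_CONTAINMENT_WARNING      0x0002
#define CPER_SEC_RESET                    0x0004
#define CPER_SEC_ERROR_THRESHOLD_EXCEEDED 0x0008
#define CPER_SEC_RESOURCE_NOT_ACCESSIBLE  0x0010

/* True if firmware reports that the error threshold was exceeded. */
static bool cper_sec_threshold_exceeded(uint32_t flags)
{
	return (flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) != 0;
}
```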
On 07/25/2013 11:02 PM, Bjorn Helgaas wrote:
On Thu, Jul 25, 2013 at 5:23 AM, Naveen N. Rao
naveen.n@linux.vnet.ibm.com wrote:
On 07/24/2013 10:53 PM, Joe Perches wrote:
On Wed, 2013-07-24 at 22:43 +0530, Naveen N. Rao wrote:
On 2013/07/22 11:01PM, Borislav Petkov wrote:
From
On 07/29/2013 08:52 PM, Borislav Petkov wrote:
@@ -186,8 +186,8 @@ static int erst_exec_stall(struct apei_exec_context *ctx,
if (ctx->value > FIRMWARE_MAX_STALL) {
if (!in_nmi())
- pr_warning(FW_WARN ERST_PFX
- "Too long stall
On 07/25/2013 11:31 PM, Luck, Tony wrote:
MCESEV(
+ PANIC, "Action required but kernel thread is not continuable",
+ SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR,
MCI_UC_SAR|MCI_ADDR),
+ MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV,
On 07/31/2013 11:30 PM, Bjorn Helgaas wrote:
On Wed, Jul 31, 2013 at 3:46 AM, Naveen N. Rao
naveen.n@linux.vnet.ibm.com wrote:
My key question was about why we are using a field width of 10 implying a
32-bit value, rather than a field width of 18 as suggested by the data type
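Here is a small sketch of why the width matters, assuming the format is of the form "%#0<width>llx". With the # flag, the "0x" prefix counts toward the width, so a width of 10 pads to 8 hex digits (32 bits) and a width of 18 pads to 16 digits (64 bits). A wider value is never truncated; it simply stops being zero-padded to a consistent width. For a value of 0, the # flag adds no prefix. The helper name is made up.

```c
#include <stdio.h>

/*
 * Format v as zero-padded hex with a "0x" prefix. With the # flag the
 * prefix counts toward the width: 10 -> 8 digits, 18 -> 16 digits.
 */
static int format_hex(char *buf, size_t len, int width, unsigned long long v)
{
	return snprintf(buf, len, "%#0*llx", width, v);
}
```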
scripts we maintain the
old behaviour if flags remains set at zero (or is reset to 0).
Signed-off-by: Tony Luck tony.l...@intel.com
Patch
Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
with a small change below.
---
diff --git a/Documentation/acpi/apei/einj.txt
b
On 10/16/2013 07:09 AM, Chen Gong wrote:
On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote:
Date: Tue, 15 Oct 2013 23:47:23 +0530
From: Naveen N. Rao naveen.n@linux.vnet.ibm.com
To: Chen, Gong gong.c...@linux.intel.com
Cc: tony.l...@intel.com, b...@alien8.de, linux-kernel
| 5 ++---
include/linux/cper.h | 11 +--
5 files changed, 18 insertions(+), 12 deletions(-)
Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
Regards,
Naveen
On 10/16/2013 12:53 AM, Borislav Petkov wrote:
On Wed, Oct 16, 2013 at 12:40:40AM +0530, Naveen N. Rao wrote:
+2 ;)
You're counting for 2 people, huh?
That's me raising both my hands :)
:-)
While at it, I wonder if we're better off calling these Hardware
events rather than Hardware
On 10/16/2013 04:19 PM, Borislav Petkov wrote:
Btw, I don't know what's the problem but when I hit reply-to-all to your
emails, mutt drops your email address from the To: and makes the CC:
list become the To: list. Strange.
I'm seeing the same thing. Looking at the headers, Chen Gong's email
Gong,
This mail seems to have missed copying you given the header issues.
Thanks,
Naveen
On 10/17/2013 05:51 PM, Naveen N. Rao wrote:
On 10/16/2013 07:09 AM, Chen Gong wrote:
On Tue, Oct 15, 2013 at 11:47:23PM +0530, Naveen N. Rao wrote:
Date: Tue, 15 Oct 2013 23:47:23 +0530
From: Naveen N
On 10/18/2013 01:53 PM, Chen, Gong wrote:
Keep only the most important fields for memory error
reporting. The detail information will be moved to perf/trace
interface.
Suggested-by: Tony Luck tony.l...@intel.com
Signed-off-by: Chen, Gong gong.c...@linux.intel.com
Reviewed-by: Mauro Carvalho
On 10/18/2013 01:53 PM, Chen, Gong wrote:
This H/W error log driver (a.k.a eMCA driver) is implemented based on
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
After errors are captured, more valuable information can be
obtained via this new
On 10/18/2013 01:53 PM, Chen, Gong wrote:
To prepare for the following patches and make related
definition more clear, update some definitions about CPER.
v2 - v1: Update some more definitions suggested by Boris
Signed-off-by: Chen, Gong gong.c...@linux.intel.com
Acked-by: Borislav Petkov
On 10/14/2013 10:42 PM, Tony Luck wrote:
On Mon, Oct 14, 2013 at 3:36 AM, Borislav Petkov b...@alien8.de wrote:
On Mon, Oct 14, 2013 at 12:55:00AM -0400, Chen Gong wrote:
Because most of the data in CPER is empty or unimportant.
It is not about whether it is important or not - the question is
On 2013/10/11 02:32AM, Chen Gong wrote:
Use trace interface to elaborate all H/W error related
information.
Signed-off-by: Chen, Gong gong.c...@linux.intel.com
---
snip
+TRACE_EVENT(extlog_mem_event,
+ TP_PROTO(u32 etype,
+ char *dimm_loc,
+ const uuid_le
On 2013/10/11 02:32AM, Chen Gong wrote:
In the latest UEFI spec (2.4 as of now), the memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error on actual DIMM location.
Original-author: Tony Luck
On 10/15/2013 10:30 PM, Borislav Petkov wrote:
On Tue, Oct 15, 2013 at 10:24:35PM +0530, Naveen N. Rao wrote:
On 2013/10/11 02:32AM, Chen Gong wrote:
Use trace interface to elaborate all H/W error related
information.
Signed-off-by: Chen, Gong gong.c...@linux.intel.com
---
snip
+TRACE_EVENT
On 2013/10/11 02:32AM, Chen Gong wrote:
To prepare for the following patches and make the related definitions
clearer, update some definitions about CPER. No functional changes.
Signed-off-by: Chen, Gong gong.c...@linux.intel.com
---
drivers/acpi/apei/apei-internal.h | 12 -
/kernel/setup.c | 1 +
drivers/firmware/dmi_scan.c | 60
+
include/linux/dmi.h | 5
4 files changed, 67 insertions(+)
Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
On 2013/10/15 09:15AM, Tony Luck wrote:
On Tue, Oct 15, 2013 at 2:28 AM, Borislav Petkov b...@alien8.de wrote:
We can even add a hint for the user like:
Above errors have been corrected by the hardware and require no
further action.
Btw, this is valid for both dmesg and trace
gong.c...@linux.intel.com
---
drivers/acpi/apei/cper.c | 12
1 file changed, 12 insertions(+)
Acked-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
index 680230c..2a4389f 100644
--- a/drivers/acpi/apei/cper.c
HEST for corrected machine checks
Here's a patch that implements this technique. If the firmware advertises
support for firmware first mode in the CMC structure, we disable CMCI and
polling for all the MCA banks listed in the CMC structure.
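Here is a simplified sketch of the bookkeeping discussed in this thread. It assumes single-CPU bitmasks in place of the kernel's per-cpu bank bitmaps, and it assumes that every non-FF bank supports CMCI. All names are hypothetical. The firmware-first set is recorded once at HEST parse time and is never cleared. As a result, the banks listed there stay excluded from both CMCI and polling, even after the owned set is cleared on CPU offline and rediscovery runs again.

```c
#include <stdint.h>

#define MAX_NR_BANKS 32

/* Banks listed in the HEST CMC structure with firmware first set.
 * Filled once at boot and never cleared. (Hypothetical name.) */
static uint32_t banks_ff;
/* Banks this CPU owns for CMCI; cleared on CPU offline. */
static uint32_t banks_owned;
/* Banks this CPU polls; start out polling everything. */
static uint32_t banks_poll = UINT32_MAX;

static void mark_bank_ff(int bank)
{
	if (bank >= 0 && bank < MAX_NR_BANKS)
		banks_ff |= 1U << bank;
}

/* Simplified rediscovery, assuming every non-FF bank supports CMCI. */
static void cmci_discover_sketch(int nbanks)
{
	for (int i = 0; i < nbanks && i < MAX_NR_BANKS; i++) {
		uint32_t bit = 1U << i;

		/* FF banks: firmware reports them; others: CMCI notifies us. */
		banks_poll &= ~bit;
		if (banks_ff & bit)
			continue; /* never enable CMCI on an FF bank */
		banks_owned |= bit;
	}
}

static void cpu_offline_sketch(void)
{
	banks_owned = 0;
}
```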
- Naveen
Signed-off-by: Naveen N. Rao naveen.n
On 06/15/2013 08:18 PM, Borislav Petkov wrote:
On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote:
HEST for corrected machine checks
Here's a patch that implements this technique. If the firmware advertises
support for firmware first mode in the CMC structure, we disable CMCI
On 06/16/2013 05:50 PM, Borislav Petkov wrote:
On Fri, Jun 14, 2013 at 11:47:21PM +0530, Naveen N. Rao wrote:
+static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data)
+{
+ int i;
+ struct acpi_hest_ia_corrected *cmc;
+ struct acpi_hest_ia_error_bank
On 2013/06/17 09:06AM, Borislav Petkov wrote:
On Mon, Jun 17, 2013 at 12:30:05PM +0530, Naveen N. Rao wrote:
Hmm, so if CMCI is not supported, you just disabled polling of this bank
and returned here. Not good.
This is on purpose. If the bank doesn't support CMCI and we were
polling
On 06/17/2013 01:51 PM, Borislav Petkov wrote:
On Mon, Jun 17, 2013 at 01:41:03PM +0530, Naveen N. Rao wrote:
Yes, we used to poll since we do not get notified via MCE/CMCI.
However, with firmware first set in CMC structure, the firmware is
now controlling all corrected error reporting
which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.
- Naveen
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/include/asm/mce.h|3 ++
arch/x86/kernel/cpu/mcheck/mce-internal.h |3 ++
arch/x86/kernel
On 2013/06/20 05:16AM, Chen Gong wrote:
Update some SRAR severity condition checks to make them clearer,
according to the latest Intel SDM Vol 3 (June 2013), Table 15-20.
Signed-off-by: Chen Gong gong.c...@linux.intel.com
---
arch/x86/kernel/cpu/mcheck/mce-severity.c | 15 +--
1
Tony, Boris,
Can you please see if the comments in the below patch include the details you
were expecting?
Thanks,
Naveen
--
Add comments to clarify usage of the various bitfields in the MCA subsystem
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
arch/x86/kernel/cpu/mcheck
Tony,
Thanks - I have included your text in the patch. I wasn't sure if I should add
your Signed-off-by. Kindly review and do the needful.
Thanks,
Naveen
--
Add comments to clarify usage of the various bitfields in the MCA subsystem
Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
On 06/25/2013 10:01 PM, Luck, Tony wrote:
The SDM talks about non-affected logical processors, but perhaps we
can call this an unaffected thread?
unaffected sounds a bit more natural (but close enough to the wording in
the SDM that people should see the connection).
Yup - unnatural is
On 10/19/2013 04:56 PM, Chen Gong wrote:
On Fri, Oct 18, 2013 at 05:31:21PM +0530, Naveen N. Rao wrote:
Date: Fri, 18 Oct 2013 17:31:21 +0530
From: Naveen N. Rao naveen.n@linux.vnet.ibm.com
To: Chen, Gong gong.c...@linux.intel.com, tony.l...@intel.com,
b...@alien8.de, j...@perches.com
On 10/20/2013 01:51 PM, Borislav Petkov wrote:
On Sun, Oct 20, 2013 at 03:06:15AM -0400, Chen Gong wrote:
Oh, yes it is. Furthermore, it reminds me of something I have wondered
since I started this patch series: where is the best place to put cper.c?
CPER really doesn't depend on APEI or even ACPI. Maybe lib/ is an option. I can
On 10/22/2013 12:33 AM, Luck, Tony wrote:
But yes, this is possible and it would make it all even cleaner
and simpler by simply not needing the reg/dereg interfaces for
mce_ext_err_print but adding it to the chain.
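For readers unfamiliar with the chain approach, here is a simplified, single-threaded sketch of a priority-ordered notifier chain. It is modeled loosely on the kernel's notifier semantics: higher priority runs first, and a return value with NOTIFY_STOP_MASK set ends the walk. It assumes the chain in question is the MCE decode chain. The handler names are hypothetical, and no locking or RCU is shown.

```c
#include <stddef.h>

/* Return values modeled on include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct nb {
	int (*call)(struct nb *self, unsigned long val, void *data);
	struct nb *next;
	int priority;
};

/* Insert n so higher priorities run first; equal priorities keep
 * registration order. */
static void chain_register(struct nb **head, struct nb *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call each handler in order until one sets NOTIFY_STOP_MASK. */
static int chain_call(struct nb *head, unsigned long val, void *data)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next) {
		ret = head->call(head, val, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Records call order as decimal digits, for illustration only. */
static int call_trace;

/* Hypothetical extended-log printer: reports, then lets others run. */
static int ext_err_print(struct nb *self, unsigned long val, void *data)
{
	(void)self; (void)val; (void)data;
	call_trace = call_trace * 10 + 1;
	return NOTIFY_OK;
}

/* Hypothetical default decoder: reports and ends the chain. */
static int default_decode(struct nb *self, unsigned long val, void *data)
{
	(void)self; (void)val; (void)data;
	call_trace = call_trace * 10 + 2;
	return NOTIFY_STOP;
}
```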
So this is on top of the 9 patch series (using the V4 that Chen Gong
posted for