[tip:ras/core] x86/mce/AMD: Increase size of the bank_map type

2016-07-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  955d1427a91b18f53e082bd7c19c40ce13b0a0f4
Gitweb: http://git.kernel.org/tip/955d1427a91b18f53e082bd7c19c40ce13b0a0f4
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Fri, 8 Jul 2016 11:09:38 +0200
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 8 Jul 2016 11:29:25 +0200

x86/mce/AMD: Increase size of the bank_map type

Change bank_map type from 'char' to 'int' since we now have more than eight
banks in a system.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Aravind Gopalakrishnan <aravindksg.l...@gmail.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Josh Poimboeuf <jpoim...@redhat.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1467968983-4874-2-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 10b0661..7b7f3be 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -93,7 +93,7 @@ const char * const amd_df_mcablock_names[] = {
 EXPORT_SYMBOL_GPL(amd_df_mcablock_names);
 
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
-static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */
+static DEFINE_PER_CPU(unsigned int, bank_map); /* see which banks are on */
 
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);

[tip:ras/core] x86/mce: Grade uncorrected errors for SMCA-enabled systems

2016-05-03 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  6bda529ec42e1cd4dde1c3d0a1a18000ffd3d419
Gitweb: http://git.kernel.org/tip/6bda529ec42e1cd4dde1c3d0a1a18000ffd3d419
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Sat, 30 Apr 2016 14:33:52 +0200
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 3 May 2016 08:24:15 +0200

x86/mce: Grade uncorrected errors for SMCA-enabled systems

For upcoming processors with the Scalable MCA feature, we need to check the
"succor" CPUID bit and the TCC bit in the MCx_STATUS register in order
to grade an MCE's severity.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
[ Simplified code flow, shortened comments. ]
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.l...@gmail.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459886686-13977-3-git-send-email-yazen.ghan...@amd.com
Link: http://lkml.kernel.org/r/1462019637-16474-3-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 5119766..631356c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,6 +204,33 @@ static int error_context(struct mce *m)
return IN_KERNEL;
 }
 
+static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+{
+   u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+   u32 low, high;
+
+   /*
+* We need to look at the following bits:
+* - "succor" bit (data poisoning support), and
+* - TCC bit (Task Context Corrupt)
+* in MCi_STATUS to determine error severity.
+*/
+   if (!mce_flags.succor)
+   return MCE_PANIC_SEVERITY;
+
+   if (rdmsr_safe(addr, &low, &high))
+   return MCE_PANIC_SEVERITY;
+
+   /* TCC (Task context corrupt). If set and if IN_KERNEL, panic. */
+   if ((low & MCI_CONFIG_MCAX) &&
+   (m->status & MCI_STATUS_TCC) &&
+   (err_ctx == IN_KERNEL))
+   return MCE_PANIC_SEVERITY;
+
+   /* ...otherwise invoke hwpoison handler. */
+   return MCE_AR_SEVERITY;
+}
+
 /*
  * See AMD Error Scope Hierarchy table in a newer BKDG. For example
  * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
@@ -225,6 +252,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 * to at least kill process to prolong system operation.
 */
if (mce_flags.overflow_recov) {
+   if (mce_flags.smca)
+   return mce_severity_amd_smca(m, ctx);
+
/* software can try to contain */
		if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
return MCE_PANIC_SEVERITY;


[tip:ras/core] x86/mce: Carve out writes to MCx_STATUS and MCx_CTL

2016-05-03 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  bb91f8c0176b072aeb6b84cfd7e04084025121e0
Gitweb: http://git.kernel.org/tip/bb91f8c0176b072aeb6b84cfd7e04084025121e0
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Sat, 30 Apr 2016 14:33:53 +0200
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 3 May 2016 08:24:16 +0200

x86/mce: Carve out writes to MCx_STATUS and MCx_CTL

The writes need to happen after __mcheck_cpu_init_vendor(): on Scalable
MCA processors, new MSR write handlers take effect once the feature is
detected via its CPUID bit (which happens in __mcheck_cpu_init_vendor()).

No functional change is introduced here.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.l...@gmail.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462019637-16474-4-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 43f8b49..6bffb26 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1420,7 +1420,6 @@ static void __mcheck_cpu_init_generic(void)
enum mcp_flags m_fl = 0;
mce_banks_t all_banks;
u64 cap;
-   int i;
 
if (!mca_cfg.bootlog)
m_fl = MCP_DONTLOG;
@@ -1436,6 +1435,11 @@ static void __mcheck_cpu_init_generic(void)
rdmsrl(MSR_IA32_MCG_CAP, cap);
if (cap & MCG_CTL_P)
		wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
+}
+
+static void __mcheck_cpu_init_clear_banks(void)
+{
+   int i;
 
for (i = 0; i < mca_cfg.banks; i++) {
		struct mce_bank *b = &mce_banks[i];
@@ -1717,6 +1721,7 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
+   __mcheck_cpu_init_clear_banks();
__mcheck_cpu_init_timer();
 }
 
@@ -2121,6 +2126,7 @@ static void mce_syscore_resume(void)
 {
__mcheck_cpu_init_generic();
	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
+   __mcheck_cpu_init_clear_banks();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2138,6 +2144,7 @@ static void mce_cpu_restart(void *data)
	if (!mce_available(raw_cpu_ptr(&cpu_info)))
return;
__mcheck_cpu_init_generic();
+   __mcheck_cpu_init_clear_banks();
__mcheck_cpu_init_timer();
 }
 


[tip:ras/core] x86/mce: Log MCEs after a warm reset on AMD, Fam17h and later

2016-05-03 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  10001d91aa0efc793952051f9070a569cc388ebc
Gitweb: http://git.kernel.org/tip/10001d91aa0efc793952051f9070a569cc388ebc
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Sat, 30 Apr 2016 14:33:51 +0200
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 3 May 2016 08:24:15 +0200

x86/mce: Log MCEs after a warm reset on AMD, Fam17h and later

For Fam17h, we want to report errors that persist across reboots. Error
persistence is dependent on HW and no BIOS currently fiddles with values
here. So allow reporting of errors upon boot until something goes wrong.

Logging is disabled on older families because BIOS didn't clear the MCA
banks after a cold reset.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Aravind Gopalakrishnan <aravindksg.l...@gmail.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1459886686-13977-2-git-send-email-yazen.ghan...@amd.com
Link: http://lkml.kernel.org/r/1462019637-16474-2-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 6b7039c..43f8b49 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1495,7 +1495,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 */
clear_bit(10, (unsigned long *)_banks[4].ctl);
}
-   if (c->x86 <= 17 && cfg->bootlog < 0) {
+   if (c->x86 < 17 && cfg->bootlog < 0) {
/*
 * Lots of broken BIOS around that don't clear them
 * by default and leave crap in there. Don't log:


[tip:ras/core] x86/mce/AMD: Document some functionality

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  ea2ca36b658cfc6081ee454e97593c81f646806e
Gitweb: http://git.kernel.org/tip/ea2ca36b658cfc6081ee454e97593c81f646806e
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:21 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:15 +0100

x86/mce/AMD: Document some functionality

To make it easier to understand what the threshold_block structure
holds, add comments describing its members. Also, trim the comments
around threshold_restart_bank() and update the copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
[ Shorten comments. ]
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/amd_nb.h| 26 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  7 ++-
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3c56ef1..5e828da 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -27,15 +27,23 @@ struct amd_l3_cache {
 };
 
 struct threshold_block {
-   unsigned intblock;
-   unsigned intbank;
-   unsigned intcpu;
-   u32 address;
-   u16 interrupt_enable;
-   boolinterrupt_capable;
-   u16 threshold_limit;
-   struct kobject  kobj;
-   struct list_headmiscj;
+	unsigned int	 block;			/* Number within bank */
+	unsigned int	 bank;			/* MCA bank the block belongs to */
+	unsigned int	 cpu;			/* CPU which controls MCA bank */
+	u32		 address;		/* MSR address for the block */
+	u16		 interrupt_enable;	/* Enable/Disable APIC interrupt */
+	bool		 interrupt_capable;	/* Bank can generate an interrupt. */
+
+	u16		 threshold_limit;	/*
+						 * Value upon which threshold
+						 * interrupt is generated.
+						 */
+
+	struct kobject	 kobj;			/* sysfs object */
+	struct list_head miscj;			/*
+						 * List of threshold blocks
+						 * within a bank.
+						 */
 };
 
 struct threshold_bank {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index a53eb1b..9d656fd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,5 +1,5 @@
 /*
- *  (c) 2005-2015 Advanced Micro Devices, Inc.
+ *  (c) 2005-2016 Advanced Micro Devices, Inc.
  *  Your use of this code is subject to the terms and conditions of the
  *  GNU general public license version 2. See "COPYING" or
  *  http://www.gnu.org/licenses/gpl.html
@@ -201,10 +201,7 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
return 1;
 };
 
-/*
- * Called via smp_call_function_single(), must be called with correct
- * cpu affinity.
- */
+/* Reprogram MCx_MISC MSR behind this threshold bank. */
 static void threshold_restart_bank(void *_tr)
 {
struct thresh_restart *tr = _tr;


[tip:ras/core] x86/mce/AMD: Fix logic to obtain block address

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  8dd1e17a55b0bb1206c71c7a4344c5e3037cdf65
Gitweb: http://git.kernel.org/tip/8dd1e17a55b0bb1206c71c7a4344c5e3037cdf65
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:19 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:14 +0100

x86/mce/AMD: Fix logic to obtain block address

In upcoming processors, the BLKPTR field is no longer used to indicate
the MSR number of the additional register. Instead, it simply indicates
the presence of additional MSRs.

Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC()
for newer processors and fall back to existing logic for older
processors.

[ Drop nextaddr_out label; style cleanups. ]
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/mce.h   |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 84 +++-
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 9c467fe..72f8688 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
 /* AMD Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index ee487a9..a53eb1b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -304,6 +304,51 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+unsigned int bank, unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field of the
+* first MISC register (MCx_MISC0) indicates presence of
+* additional MISC register set (MISC1-4).
+*/
+   u32 low, high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+   return addr;
+
+   if (!(low & MCI_CONFIG_MCAX))
+   return addr;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
+   (low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   }
+   return addr;
+   }
+
+   /* Fall back to method we used for older processors: */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = ++current_addr;
+   }
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -366,16 +411,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-  

[tip:ras/core] x86/mce: Clarify comments regarding deferred error

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  2cd3b5f9033f0b051842a279dac5a54271cbd3c8
Gitweb: http://git.kernel.org/tip/2cd3b5f9033f0b051842a279dac5a54271cbd3c8
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:20 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:15 +0100

x86/mce: Clarify comments regarding deferred error

Deferred errors indicate errors that hardware could not fix, but they
still do not interrupt program flow: no #MC is generated, and the UC bit
in MCx_STATUS is not set.

Correct comment.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-5-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 72f8688..cfff341 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR   (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED (1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED (1ULL<<44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
 


[tip:ras/core] x86/mce/AMD: Document some functionality

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  ea2ca36b658cfc6081ee454e97593c81f646806e
Gitweb: http://git.kernel.org/tip/ea2ca36b658cfc6081ee454e97593c81f646806e
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 7 Mar 2016 14:02:21 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 8 Mar 2016 11:48:15 +0100

x86/mce/AMD: Document some functionality

In an attempt to aid in understanding of what the threshold_block
structure holds, provide comments to describe the members here. Also,
trim comments around threshold_restart_bank() and update copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan 
[ Shorten comments. ]
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: 
http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/amd_nb.h| 26 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  7 ++-
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3c56ef1..5e828da 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -27,15 +27,23 @@ struct amd_l3_cache {
 };
 
 struct threshold_block {
-   unsigned intblock;
-   unsigned intbank;
-   unsigned intcpu;
-   u32 address;
-   u16 interrupt_enable;
-   boolinterrupt_capable;
-   u16 threshold_limit;
-   struct kobject  kobj;
-   struct list_headmiscj;
+   unsigned int block; /* Number within bank */
+   unsigned int bank;  /* MCA bank the block belongs 
to */
+   unsigned int cpu;   /* CPU which controls MCA bank 
*/
+   u32  address;   /* MSR address for the block */
+   u16  interrupt_enable;  /* Enable/Disable APIC 
interrupt */
+   bool interrupt_capable; /* Bank can generate an 
interrupt. */
+
+   u16  threshold_limit;   /*
+* Value upon which threshold
+* interrupt is generated.
+*/
+
+   struct kobject   kobj;  /* sysfs object */
+   struct list_head miscj; /*
+* List of threshold blocks
+* within a bank.
+*/
 };
 
 struct threshold_bank {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index a53eb1b..9d656fd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,5 +1,5 @@
 /*
- *  (c) 2005-2015 Advanced Micro Devices, Inc.
+ *  (c) 2005-2016 Advanced Micro Devices, Inc.
  *  Your use of this code is subject to the terms and conditions of the
  *  GNU general public license version 2. See "COPYING" or
  *  http://www.gnu.org/licenses/gpl.html
@@ -201,10 +201,7 @@ static int lvt_off_valid(struct threshold_block *b, int 
apic, u32 lo, u32 hi)
return 1;
 };
 
-/*
- * Called via smp_call_function_single(), must be called with correct
- * cpu affinity.
- */
+/* Reprogram MCx_MISC MSR behind this threshold bank. */
 static void threshold_restart_bank(void *_tr)
 {
struct thresh_restart *tr = _tr;


[tip:ras/core] x86/mce/AMD: Fix logic to obtain block address

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  8dd1e17a55b0bb1206c71c7a4344c5e3037cdf65
Gitweb: http://git.kernel.org/tip/8dd1e17a55b0bb1206c71c7a4344c5e3037cdf65
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 7 Mar 2016 14:02:19 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 8 Mar 2016 11:48:14 +0100

x86/mce/AMD: Fix logic to obtain block address

In upcoming processors, the BLKPTR field is no longer used to indicate
the MSR number of the additional register. Instead, it simply indicates
the presence of additional MSRs.

Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC()
for newer processors and fall back to existing logic for older
processors.

[ Drop nextaddr_out label; style cleanups. ]
Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/mce.h   |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 84 +++-
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 9c467fe..72f8688 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
 /* AMD Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index ee487a9..a53eb1b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -304,6 +304,51 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+unsigned int bank, unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field of the
+* first MISC register (MCx_MISC0) indicates presence of
+* additional MISC register set (MISC1-4).
+*/
+   u32 low, high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+   return addr;
+
+   if (!(low & MCI_CONFIG_MCAX))
+   return addr;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
+       (low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   }
+   return addr;
+   }
+
+   /* Fall back to method we used for older processors: */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = ++current_addr;
+   }
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -366,16 +411,9 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-   break;
-
-   address += MCG_XBLK_ADDR;
-   } else
-   ++address;
+   address = get_block_address(address, low, high, bank, block);
+   if (!address)
+   break;

[tip:ras/core] x86/mce: Clarify comments regarding deferred error

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  2cd3b5f9033f0b051842a279dac5a54271cbd3c8
Gitweb: http://git.kernel.org/tip/2cd3b5f9033f0b051842a279dac5a54271cbd3c8
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 7 Mar 2016 14:02:20 +0100
Committer:  Ingo Molnar 
CommitDate: Tue, 8 Mar 2016 11:48:15 +0100

x86/mce: Clarify comments regarding deferred error

Deferred errors are errors that hardware could not fix, but they do not
interrupt program flow. Therefore no #MC is generated and the UC bit in
MCx_STATUS is not set.

Correct comment.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1457021458-2522-5-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 72f8688..cfff341 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR   (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
 


[tip:ras/core] x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  be0aec23bf4624fd55650629fe8df20483487049
Gitweb: http://git.kernel.org/tip/be0aec23bf4624fd55650629fe8df20483487049
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:18 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:14 +0100

x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors

For Scalable MCA enabled processors, errors are listed per IP block. And
since an IP is not required to map to a particular bank, we need to use
the HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

[ Minor fixups. ]
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/mce.h   |  59 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  29 +++
 drivers/edac/mce_amd.c   | 335 ++-
 3 files changed, 420 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 80ba0a8..9c467fe 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,18 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
+
+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank.
+ *  - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ *But should not be used to determine MSR numbers.
+ *  - TCC bit is present in MCx_STATUS.
+ */
+#define MCI_CONFIG_MCAX0x1
+#define MCI_IPID_MCATYPE   0xFFFF0000
+#define MCI_IPID_HWID  0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -93,7 +105,9 @@
 
 /* AMD Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 /*
  * This structure contains all data related to the MCE log.  Also
@@ -291,4 +305,49 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
  struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerate new IP types and HWID values in AMD processors which support
+ * Scalable MCA.
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum amd_ip_types {
+   SMCA_F17H_CORE = 0, /* Core errors */
+   SMCA_DF,/* Data Fabric */
+   SMCA_UMC,   /* Unified Memory Controller */
+   SMCA_PB,/* Parameter Block */
+   SMCA_PSP,   /* Platform Security Processor */
+   SMCA_SMU,   /* System Management Unit */
+   N_AMD_IP_TYPES
+};
+
+struct amd_hwid {
+   const char *name;
+   unsigned int hwid;
+};
+
+extern struct amd_hwid amd_hwids[N_AMD_IP_TYPES];
+
+enum amd_core_mca_blocks {
+   SMCA_LS = 0,/* Load Store */
+   SMCA_IF,/* Instruction Fetch */
+   SMCA_L2_CACHE,  /* L2 cache */
+   SMCA_DE,/* Decoder unit */
+   RES,/* Reserved */
+   SMCA_EX,/* Execution unit */
+   SMCA_FP,/* Floating Point */
+   SMCA_L3_CACHE,  /* L3 cache */
+   N_CORE_MCA_BLOCKS
+};
+
+extern const char * const amd_core_mcablock_names[N_CORE_MCA_BLOCKS];
+
+enum amd_df_mca_blocks {
+   SMCA_CS = 0,/* Coherent Slave */
+   SMCA_PIE,   /* Power management, Interrupts, etc */
+   N_DF_BLOCKS
+};
+
+extern const char * const amd_df_mcablock_names[N_DF_BLOCKS];
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27b..ee487a9 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,35 @@ static const char * const th_names[] = {
"execution_unit",
 };
 
+/* Define HWID to IP type mappings for Scalable MCA */
+struct amd_hwid amd_hwids[] = {
+   [SMCA_F17H_CORE]= { "f17h_core",0xB0 },
+   [SMCA_DF]   = { "data_fabric",  0x2E },
+   [SMCA_UMC]  = { "umc",  0x96 },
+   [SMCA_PB]   = { "param_block",  0x5 },
+   [SMCA_PSP]  = { 

[tip:ras/core] x86/mce: Move MCx_CONFIG MSR definitions

2016-03-08 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  adc53f2e0ae2fcff10a4b981df14729ffb1482fc
Gitweb: http://git.kernel.org/tip/adc53f2e0ae2fcff10a4b981df14729ffb1482fc
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 7 Mar 2016 14:02:17 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Tue, 8 Mar 2016 11:48:14 +0100

x86/mce: Move MCx_CONFIG MSR definitions

Those MSRs are used only by the MCE code so move them there.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1456785179-14378-2-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/mce.h   | 4 
 arch/x86/include/asm/msr-index.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2ea4527..80ba0a8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,6 +91,10 @@
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
+/* AMD Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5523465..b05402e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -264,10 +264,6 @@
 #define MSR_IA32_MC0_CTL2  0x0280
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
-/* 'SMCA': AMD64 Scalable MCA */
-#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
-#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
-
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
 #define MSR_P6_EVNTSEL00x0186




Re: [PATCH V3 0/5] Updates to EDAC and AMD MCE driver

2016-03-03 Thread Aravind Gopalakrishnan



On 3/3/16 12:45 PM, Borislav Petkov wrote:




Applied, minor stuff corrected and pushed out to

http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=tip-ras

so that the 0day bot can chew on them a little.


Thanks!

-Aravind.




[PATCH V3 1/5] x86/mce: Move MCx_CONFIG MSR definition

2016-03-03 Thread Aravind Gopalakrishnan
Since this is contained to only MCE code, move
the MSR definition there instead of adding it to msr-index.

Per discussion here:
http://marc.info/?l=linux-kernel&m=145633699026474&w=2

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   | 4 
 arch/x86/include/asm/msr-index.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 18d2ba9..e8b09b3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,6 +91,10 @@
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
+/* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 75a5bb6..984ab75 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -269,10 +269,6 @@
 #define MSR_IA32_MC0_CTL2  0x0280
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
-/* 'SMCA': AMD64 Scalable MCA */
-#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
-#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
-
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
 #define MSR_P6_EVNTSEL00x0186
-- 
2.7.0





[PATCH V3 5/5] x86/mce/AMD: Add comments for easier understanding

2016-03-03 Thread Aravind Gopalakrishnan
To aid in understanding of what the threshold_block structure holds,
provide comments describing the members here. Also, trim the comments
around threshold_restart_bank() and update the copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/amd_nb.h| 18 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  7 ++-
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3c56ef1..bc01c0a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -27,15 +27,15 @@ struct amd_l3_cache {
 };
 
 struct threshold_block {
-   unsigned intblock;
-   unsigned intbank;
-   unsigned intcpu;
-   u32 address;
-   u16 interrupt_enable;
-   boolinterrupt_capable;
-   u16 threshold_limit;
-   struct kobject  kobj;
-   struct list_headmiscj;
+   unsigned int    block;              /* Threshold block number within bank */
+   unsigned int    bank;               /* MCA bank the block belongs to */
+   unsigned int    cpu;                /* CPU which controls the MCA bank */
+   u32             address;            /* MSR address for the block */
+   u16             interrupt_enable;   /* Enable/ Disable APIC interrupt upon threshold error */
+   bool            interrupt_capable;  /* Specifies if interrupt is possible from the block */
+   u16             threshold_limit;    /* Value upon which threshold interrupt is generated */
+   struct kobject  kobj;               /* sysfs object */
+   struct list_head miscj;             /* Add multiple threshold blocks within a bank to the list */
 };
 
 struct threshold_bank {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 879c20f..f5b4b80 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,5 +1,5 @@
 /*
- *  (c) 2005-2015 Advanced Micro Devices, Inc.
+ *  (c) 2005-2016 Advanced Micro Devices, Inc.
  *  Your use of this code is subject to the terms and conditions of the
  *  GNU general public license version 2. See "COPYING" or
  *  http://www.gnu.org/licenses/gpl.html
@@ -202,10 +202,7 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
return 1;
 };
 
-/*
- * Called via smp_call_function_single(), must be called with correct
- * cpu affinity.
- */
+/* Reprogram MCx_MISC MSR behind this threshold bank */
 static void threshold_restart_bank(void *_tr)
 {
struct thresh_restart *tr = _tr;
-- 
2.7.0





[PATCH V3 3/5] x86/mce/AMD: Fix logic to obtain block address

2016-03-03 Thread Aravind Gopalakrishnan
In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Instead, it simply indicates the presence of additional MSRs.

Fix the logic here to gather the MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors
and fall back to existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 90 
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cee098e..0681d0a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
 /* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 7d495b6..879c20f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -305,6 +305,54 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+unsigned int bank, unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field
+* of the first MISC register (MCx_MISC0) indicates
+* presence of additional MISC register set (MISC1-4)
+*/
+   u32 low, high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+  &low, &high) ||
+   !(low & MCI_CONFIG_MCAX))
+   goto nextaddr_out;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+   &low, &high) &&
+   (low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   }
+
+   goto nextaddr_out;
+   }
+
+   /* Fall back to method we used for older processors */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = ++current_addr;
+   }
+
+nextaddr_out:
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -367,16 +415,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-   break;
-
-   address += MCG_XBLK_ADDR;
-   } else
-   ++address;
+   address = get_block_address(address, low, high,
+   bank, block);
+   if (!address)
+   break;
 
if (rdmsr_safe(address, &low, &high))
break;
@@ -481,16 +523,10 @@ static void amd_threshold_interrupt(void)
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0) {
-   address = MSR_IA32_MCx_MISC(bank);
-   } else if (block == 1) {

[PATCH V3 3/5] x86/mce/AMD: Fix logic to obtain block address

2016-03-03 Thread Aravind Gopalakrishnan
In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Instead, it simply indicates the presence of additional MSRs.

Fix the logic here to gather the MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors
and fall back to the existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan 
---
 arch/x86/include/asm/mce.h   |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 90 
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cee098e..0681d0a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
 /* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 7d495b6..879c20f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -305,6 +305,54 @@ static void deferred_error_interrupt_enable(struct 
cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+unsigned int bank, unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field
+* of the first MISC register (MCx_MISC0) indicates
+* presence of additional MISC register set (MISC1-4)
+*/
+   u32 low, high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+  &low, &high) ||
+   !(low & MCI_CONFIG_MCAX))
+   goto nextaddr_out;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+   &low, &high) &&
+   (low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   }
+
+   goto nextaddr_out;
+   }
+
+   /* Fall back to method we used for older processors */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = ++current_addr;
+   }
+
+nextaddr_out:
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -367,16 +415,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-   break;
-
-   address += MCG_XBLK_ADDR;
-   } else
-   ++address;
+   address = get_block_address(address, low, high,
+   bank, block);
+   if (!address)
+   break;
 
if (rdmsr_safe(address, &low, &high))
break;
@@ -481,16 +523,10 @@ static void amd_threshold_interrupt(void)
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0) {
-   address = MSR_IA32_MCx_MISC(bank);
-   } else if (block == 1) {
-

[PATCH V3 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-03-03 Thread Aravind Gopalakrishnan
For Scalable MCA enabled processors, errors are listed
per IP block. Since an IP is not required to map to a
particular bank, we need to use the HWID and McaType
values from the MCx_IPID register to figure out which IP
a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register
to indicate that the task context is corrupt.

Add logic here to decode errors from all known IP
blocks for Fam17h Model 00-0fh and to print TCC errors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   |  59 +++
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  30 
 drivers/edac/mce_amd.c   | 334 ++-
 3 files changed, 420 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e8b09b3..cee098e 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,18 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
+
+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank.
+ *  - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ *But should not be used to determine MSR numbers.
+ *  - TCC bit is present in MCx_STATUS.
+ */
+#define MCI_CONFIG_MCAX0x1
+#define MCI_IPID_MCATYPE   0xFFFF0000
+#define MCI_IPID_HWID  0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -93,7 +105,9 @@
 
 /* 'SMCA': AMD64 Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 /*
  * This structure contains all data related to the MCE log.  Also
@@ -292,4 +306,49 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
  struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerate new IP types and HWID values in AMD processors which support
+ * Scalable MCA.
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum amd_ip_types {
+   SMCA_F17H_CORE = 0, /* Core errors */
+   SMCA_DF,/* Data Fabric */
+   SMCA_UMC,   /* Unified Memory Controller */
+   SMCA_PB,/* Parameter Block */
+   SMCA_PSP,   /* Platform Security Processor */
+   SMCA_SMU,   /* System Management Unit */
+   N_AMD_IP_TYPES
+};
+
+struct amd_hwid {
+   const char *name;
+   unsigned int hwid;
+};
+
+extern struct amd_hwid amd_hwids[N_AMD_IP_TYPES];
+
+enum amd_core_mca_blocks {
+   SMCA_LS = 0,/* Load Store */
+   SMCA_IF,/* Instruction Fetch */
+   SMCA_L2_CACHE,  /* L2 cache */
+   SMCA_DE,/* Decoder unit */
+   RES,/* Reserved */
+   SMCA_EX,/* Execution unit */
+   SMCA_FP,/* Floating Point */
+   SMCA_L3_CACHE,  /* L3 cache */
+   N_CORE_MCA_BLOCKS
+};
+
+extern const char * const amd_core_mcablock_names[N_CORE_MCA_BLOCKS];
+
+enum amd_df_mca_blocks {
+   SMCA_CS = 0,/* Coherent Slave */
+   SMCA_PIE,   /* Power management, Interrupts, etc */
+   N_DF_BLOCKS
+};
+
+extern const char * const amd_df_mcablock_names[N_DF_BLOCKS];
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27b..7d495b6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,36 @@ static const char * const th_names[] = {
"execution_unit",
 };
 
+/* Define HWID to IP type mappings for Scalable MCA */
+struct amd_hwid amd_hwids[] =
+{
+   [SMCA_F17H_CORE]= { "f17h_core",0xB0 },
+   [SMCA_DF]   = { "data_fabric",  0x2E },
+   [SMCA_UMC]  = { "umc",  0x96 },
+   [SMCA_PB]   = { "param_block",  0x5 },
+   [SMCA_PSP]  = { "psp",  0xFF },
+   [SMCA_SMU]  = { "smu",  0x1 },
+};
+EXPORT_SYMBOL_GPL(amd_hwids);
+
+const char * const amd_core_mcablock_names[] = {
+   [SMCA_LS]   = "load_store",
+   [SMCA_IF]   = "insn_fetch",
+   [SMCA_L2_CACHE] = "l2_cache",
+   [SMCA_DE]   = "decode_unit",
+   [RES]   = "",
+   [SMCA_EX]   = "execution_unit",
+   [SMCA_FP]   = "floating_point",

[PATCH V3 0/5] Updates to EDAC and AMD MCE driver

2016-03-03 Thread Aravind Gopalakrishnan
This patchset mainly provides necessary EDAC bits to decode errors
occurring on Scalable MCA enabled processors and also updates the AMD MCE
driver to program the correct MCx_MISC register address for upcoming
processors.

Patches 1, 2 and 3 are for upcoming processors.

Patches 4 and 5 are either fixing or adding comments to help in
understanding the code and do not introduce any functional changes.

Patch 1: Move MSR definition to mce.h
Patch 2: Updates to EDAC driver to decode the new error signatures
Patch 3: Fix logic to obtain correct block address
Patch 4: Fix deferred error comment
Patch 5: Add comments to amd_nb.h to describe threshold_block structure

Note 1: Introduced new patch for moving MCx_CONFIG MSR to mce.h
Note 2: The enums, amd_hwids[], and string arrays amd_core_mcablock_names[],
amd_df_mcablock_names[] are placed in arch/x86 as there are
follow-up patches which use them here.

Changes from V1: (per Boris suggestions)
  - Simplify error decoding routines
  - Move headers to mce.h
  - Rename enumerations and struct members (to be more descriptive)
  - Drop gerund usage
  - Remove comments that are spelling out the code

Changes from V2: (per Boris suggestions)
  - Incorporated all changes as suggested by Boris from here-
- http://marc.info/?l=linux-kernel=145691594921586=2
- http://marc.info/?l=linux-kernel=145691606221610=2
- http://marc.info/?l=linux-kernel=145691610421627=2
  - No functional change is introduced

Aravind Gopalakrishnan (5):
  x86/mce: Move MCx_CONFIG MSR definition
  EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Add comments for easier understanding

 arch/x86/include/asm/amd_nb.h|  18 +-
 arch/x86/include/asm/mce.h   |  69 +++-
 arch/x86/include/asm/msr-index.h |   4 -
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 127 +
 drivers/edac/mce_amd.c   | 334 ++-
 5 files changed, 501 insertions(+), 51 deletions(-)

-- 
2.7.0



[PATCH V3 4/5] x86/mce: Clarify comments regarding deferred error

2016-03-03 Thread Aravind Gopalakrishnan
The Deferred field indicates if we have a deferred error, i.e.
an error that hardware could not fix. However, a deferred error
does not interrupt program flow, so it does not generate a #MC
and the UC bit in MCx_STATUS is not set.

Fix the comment here. No functional change.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 0681d0a..b016219 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR   (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
 
-- 
2.7.0





Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 10:38 AM, Borislav Petkov wrote:

But you can take the three here, merge them again into a single patch
and do the changes ontop.

I made them into three to show you more easily what should be changed.





Ok, I'll just spin a V3 of the entire patchset with all your suggested 
changes then..


Thanks,
-Aravind.





Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 10:21 AM, Borislav Petkov wrote:

On Wed, Mar 02, 2016 at 09:52:23AM -0600, Aravind Gopalakrishnan wrote:

So, I think we should continue this approach and have something like this-
static const char * const amd_core_mcablock_names[] = {
 [SMCA_LS] = "load_store",
 [SMCA_IF] = "insn_fetch",
 [SMCA_L2_CACHE]   = "l2_cache",
 [SMCA_DE] = "decode_unit",
 [RES]   = "",
 [SMCA_EX] = "execution_unit",
 [SMCA_FP] = "floating_point",
 [SMCA_L3_CACHE]   = "l3_cache",
};

static const char * const amd_df_mcablock_names[] = {
 [SMCA_CS]  = "coherent_slave",
 [SMCA_PIE] = "pie",
};

(Split arrays again because I feel it'd be better to have arrays ordered
according to mca_type values)

Ok, care to take the patch and redo it as you suggest?


Sure. I was going to introduce these strings as part of patch to update 
sysfs code to

understand the new banks anyway. So it's already in the works:)


I really don't want to be assigning strings each time during decoding.


Ok, Will update the EDAC to use the existing string array.


Also, make sure the strings are as human readable as possible and so
that users can at least have an idea what we're saying. "load_store"
is better than "LS", "insn_fetch" is better than "IF", etc. Some
abbreviations should remain, though. "platform_security_processor" is
yucky and I guess there we can stick to "PSP". Ditto for "SMU"...


Understood. Will do as you suggest.


Making the unabbreviated lowercase for sysfs usage is fine too, of
course.




So, have you pushed the set of patches you applied somewhere? (bp.git?)
I can work on top of those and it will be easier to rebase on top of tip.git
once the patches find their way there..

Thanks,
-Aravind.



Re: [PATCH 3/3] EDAC, mce_amd: Correct error paths

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 4:54 AM, Borislav Petkov wrote:

From: Borislav Petkov <b...@suse.de>
Date: Wed, 2 Mar 2016 11:46:58 +0100
Subject: [PATCH 3/3] EDAC, mce_amd: Correct error paths

We need to unwind properly when we fail to find the proper decoding
functions. Streamline error messages to resemble the rest of this file,
while at it and do some minor stylistic changes.

Signed-off-by: Borislav Petkov <b...@suse.de>


Looks good. Thanks.

Reviewed-by: Aravind Gopalakrishnan<aravind.gopalakrish...@amd.com>


-
  
  	default:

printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
-   kfree(fam_ops);
-   fam_ops = NULL;
+   goto err_out;
}
  
  	pr_info("MCE: In-kernel MCE decoding enabled.\n");

@@ -1225,6 +1224,11 @@ static int __init mce_amd_init(void)
mce_register_decode_chain(&amd_mce_dec_nb);
  
  	return 0;

+
+err_out:
+   kfree(fam_ops);
+   fam_ops = NULL;
+   return -EINVAL;


Thanks! Sorry I missed this.

-Aravind.



Re: [PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 4:50 AM, Borislav Petkov wrote:


Ok, applied with a bunch of changes ontop.



Thanks!


  The second patch is relying on the assumption that a
hwid of 0 is invalid. Is that so?



Yes, HWID of 0 is invalid.

Thanks,
-Aravind.



Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 4:53 AM, Borislav Petkov wrote:

Merge all IP blocks into a single enum. This allows for easier block
name use later. Drop superfluous "_BLOCK" from the enum names.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>

  enum amd_ip_types {
-   SMCA_F17H_CORE_BLOCK = 0,   /* Core errors */
-   SMCA_DF_BLOCK,  /* Data Fabric */
-   SMCA_UMC_BLOCK, /* Unified Memory Controller */
-   SMCA_PB_BLOCK,  /* Parameter Block */
-   SMCA_PSP_BLOCK, /* Platform Security Processor */
-   SMCA_SMU_BLOCK, /* System Management Unit */
+   SMCA_F17H_CORE = 0, /* Core errors */
+   SMCA_LS,/* - Load Store */
+   SMCA_IF,/* - Instruction Fetch */
+   SMCA_L2_CACHE,  /* - L2 cache */
+   SMCA_DE,/* - Decoder unit */
+   RES,/* - Reserved */
+   SMCA_EX,/* - Execution unit */
+   SMCA_FP,/* - Floating Point */
+   SMCA_L3_CACHE,  /* - L3 cache */
+
+   SMCA_DF,/* Data Fabric */
+   SMCA_CS,/* - Coherent Slave */
+   SMCA_PIE,   /* - Power management, Interrupts, etc */
+
+   SMCA_UMC,   /* Unified Memory Controller */
+   SMCA_PB,/* Parameter Block */
+   SMCA_PSP,   /* Platform Security Processor */
+   SMCA_SMU,   /* System Management Unit */
N_AMD_IP_TYPES
  };
  


No, this would break the logic in EDAC.
The main reason I placed it in separate enums is because the "mca_type" 
values map to the enum.


These blocks-

+   SMCA_LS,/* - Load Store */
+   SMCA_IF,/* - Instruction Fetch */
+   SMCA_L2_CACHE,  /* - L2 cache */
+   SMCA_DE,/* - Decoder unit */
+   RES,/* - Reserved */
+   SMCA_EX,/* - Execution unit */
+   SMCA_FP,/* - Floating Point */
+   SMCA_L3_CACHE,  /* - L3 cache */


have the same hwid value (0xb0). But they differ in mca_type values. And 
in exactly that order.

(LS is mca_type 0, IF is mca_type 1 and so on..)

So, after we get mca_type value from the MSR (mca_type = (high & 
MCI_IPID_MCATYPE) >> 16),

We check for LS (=0) or IF (=1) ...
With this change, LS block is assigned 1 due to the ordering in enum.

And this logic applies to "DF" block as well.  (whose component blocks 
are "coherent slave" and "pie" which have mca_type values of 0 and 1 
respectively)
"DF" and "F17h_core" are essentially parent blocks and CS, PIE, LS, IF 
etc are children. hwid indicates the parent, mca_type indicates the child..




  
  /* Define HWID to IP type mappings for Scalable MCA */

-struct amd_hwid amd_hwid_mappings[] =
-{
-   [SMCA_F17H_CORE_BLOCK]  = { "f17h_core",  0xB0 },
-   [SMCA_DF_BLOCK] = { "data fabric",0x2E },
-   [SMCA_UMC_BLOCK]= { "UMC",0x96 },
-   [SMCA_PB_BLOCK] = { "param block",0x5 },
-   [SMCA_PSP_BLOCK]= { "PSP",0xFF },
-   [SMCA_SMU_BLOCK]= { "SMU",0x1 },
+struct amd_hwid amd_hwids[] =
+{
+   [SMCA_F17H_CORE] = { "F17h core", 0xB0 },
+   [SMCA_LS]= { "Load-Store",0x0 },
+   [SMCA_IF]= { "IFetch",0x0 },
+   [SMCA_L2_CACHE]  = { "L2 Cache",  0x0 },
+   [SMCA_DE]= { "Decoder",   0x0 },
+   [SMCA_EX]= { "Execution", 0x0 },
+   [SMCA_FP]= { "Floating Point",0x0 },
+   [SMCA_L3_CACHE]  = { "L3 Cache",  0x0 },
+   [SMCA_DF]= { "Data Fabric",   0x2E },
+   [SMCA_CS]= { "Coherent Slave",0x0 },
+   [SMCA_PIE]   = { "PwrMan/Intr",   0x0 },
+   [SMCA_UMC]   = { "UMC",   0x96 },
+   [SMCA_PB]= { "Param Block",   0x5 },
+   [SMCA_PSP]   = { "PSP",   0xFF },
+   [SMCA_SMU]   = { "SMU",   0x1 },
  };
-EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+EXPORT_SYMBOL_GPL(amd_hwids);
  


These strings are what I intend to use for the sysfs interface.
So, I am not sure if "PwrMan/Intr" would work when I need to create the 
kobj..


Also, the legacy names use snake_case-
static const char * const th_names[] = {
"load_store",
"insn_fetch",
"combined_unit",
"",
"northbridge",
"execution_unit",
};

Re: [PATCH 2/3] x86/mce/AMD, EDAC: Simplify SMCA decoding

2016-03-02 Thread Aravind Gopalakrishnan

On 3/2/2016 4:53 AM, Borislav Petkov wrote:

Merge all IP blocks into a single enum. This allows for easier block
name use later. Drop superfluous "_BLOCK" from the enum names.

Signed-off-by: Borislav Petkov 
Cc: Aravind Gopalakrishnan 

  enum amd_ip_types {
-   SMCA_F17H_CORE_BLOCK = 0,   /* Core errors */
-   SMCA_DF_BLOCK,  /* Data Fabric */
-   SMCA_UMC_BLOCK, /* Unified Memory Controller */
-   SMCA_PB_BLOCK,  /* Parameter Block */
-   SMCA_PSP_BLOCK, /* Platform Security Processor */
-   SMCA_SMU_BLOCK, /* System Management Unit */
+   SMCA_F17H_CORE = 0, /* Core errors */
+   SMCA_LS,/* - Load Store */
+   SMCA_IF,/* - Instruction Fetch */
+   SMCA_L2_CACHE,  /* - L2 cache */
+   SMCA_DE,/* - Decoder unit */
+   RES,/* - Reserved */
+   SMCA_EX,/* - Execution unit */
+   SMCA_FP,/* - Floating Point */
+   SMCA_L3_CACHE,  /* - L3 cache */
+
+   SMCA_DF,/* Data Fabric */
+   SMCA_CS,/* - Coherent Slave */
+   SMCA_PIE,   /* - Power management, Interrupts, etc */
+
+   SMCA_UMC,   /* Unified Memory Controller */
+   SMCA_PB,/* Parameter Block */
+   SMCA_PSP,   /* Platform Security Processor */
+   SMCA_SMU,   /* System Management Unit */
N_AMD_IP_TYPES
  };
  


No, this would break the logic in EDAC.
The main reason I placed it in separate enums is because the "mca_type" 
values map to the enum.


These blocks-

+   SMCA_LS,/* - Load Store */
+   SMCA_IF,/* - Instruction Fetch */
+   SMCA_L2_CACHE,  /* - L2 cache */
+   SMCA_DE,/* - Decoder unit */
+   RES,/* - Reserved */
+   SMCA_EX,/* - Execution unit */
+   SMCA_FP,/* - Floating Point */
+   SMCA_L3_CACHE,  /* - L3 cache */


have the same hwid value (0xb0). But they differ in mca_type values. And 
in exactly that order.

(LS is mca_type 0, IF is mca_type 1 and so on..)

So, after we get mca_type value from the MSR (mca_type = (high & 
MCI_IPID_MCATYPE) >> 16),

We check for LS (=0) or IF (=1) ...
With this change, LS block is assigned 1 due to the ordering in enum.

And this logic applies to "DF" block as well.  (whose component blocks 
are "coherent slave" and "pie" which have mca_type values of 0 and 1 
respectively)
"DF" and "F17h_core" are essentially parent blocks and CS, PIE, LS, IF 
etc are children. hwid indicates the parent, mca_type indicates the child..




  
  /* Define HWID to IP type mappings for Scalable MCA */

-struct amd_hwid amd_hwid_mappings[] =
-{
-   [SMCA_F17H_CORE_BLOCK]  = { "f17h_core",  0xB0 },
-   [SMCA_DF_BLOCK] = { "data fabric",0x2E },
-   [SMCA_UMC_BLOCK]= { "UMC",0x96 },
-   [SMCA_PB_BLOCK] = { "param block",0x5 },
-   [SMCA_PSP_BLOCK]= { "PSP",0xFF },
-   [SMCA_SMU_BLOCK]= { "SMU",0x1 },
+struct amd_hwid amd_hwids[] =
+{
+   [SMCA_F17H_CORE] = { "F17h core", 0xB0 },
+   [SMCA_LS]= { "Load-Store",0x0 },
+   [SMCA_IF]= { "IFetch",0x0 },
+   [SMCA_L2_CACHE]  = { "L2 Cache",  0x0 },
+   [SMCA_DE]= { "Decoder",   0x0 },
+   [SMCA_EX]= { "Execution", 0x0 },
+   [SMCA_FP]= { "Floating Point",0x0 },
+   [SMCA_L3_CACHE]  = { "L3 Cache",  0x0 },
+   [SMCA_DF]= { "Data Fabric",   0x2E },
+   [SMCA_CS]= { "Coherent Slave",0x0 },
+   [SMCA_PIE]   = { "PwrMan/Intr",   0x0 },
+   [SMCA_UMC]   = { "UMC",   0x96 },
+   [SMCA_PB]= { "Param Block",   0x5 },
+   [SMCA_PSP]   = { "PSP",   0xFF },
+   [SMCA_SMU]   = { "SMU",   0x1 },
  };
-EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+EXPORT_SYMBOL_GPL(amd_hwids);
  


These strings are what I intend to use for the sysfs interface.
So, I am not sure if "PwrMan/Intr" would work when I need to create the 
kobj..


Also, the legacy names use snake_case-
static const char * const th_names[] = {
"load_store",
"insn_fetch",
"combined_unit",
"",
"northbridge",
"execution_unit",
};

So, I think we should continue this approach.

[PATCH V2 4/5] x86/mce: Clarify comments regarding deferred error

2016-02-29 Thread Aravind Gopalakrishnan
The Deferred field indicates if we have a Deferred error.
Deferred errors are errors that the hardware could not fix,
but they do not interrupt program flow: no #MC is generated
and the UC bit in MCx_STATUS is not set.

Fix the comment here. No functional change.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 69f8bda..3b45e36 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR   (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
 
-- 
2.7.0




[PATCH V2 2/5] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-02-29 Thread Aravind Gopalakrishnan
For Scalable MCA enabled processors, errors are listed
per IP block. Since an IP is not required to map to a
particular bank, we need to use the HWID and McaType
values from the MCx_IPID register to figure out which IP
a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register
to indicate Task context is corrupt.

Add logic here to decode errors from all known IP
blocks for Fam17h Model 00-0fh and to print TCC errors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   |  53 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  11 ++
 drivers/edac/mce_amd.c   | 342 ++-
 3 files changed, 405 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e8b09b3..e83bbd6 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,18 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
+
+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank.
+ *  - MCx_MISC0[BlkPtr] field indicates presence of extended MISC registers,
+ *But should not be used to determine MSR numbers.
+ *  - TCC bit is present in MCx_STATUS.
+ */
+#define MCI_CONFIG_MCAX0x1
+#define MCI_IPID_MCATYPE   0x
+#define MCI_IPID_HWID  0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -93,7 +105,9 @@
 
 /* 'SMCA': AMD64 Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 /*
  * This structure contains all data related to the MCE log.  Also
@@ -292,4 +306,43 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
  struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerate new IP types and HWID values
+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum amd_ip_types {
+   SMCA_F17H_CORE_BLOCK = 0,   /* Core errors */
+   SMCA_DF_BLOCK,  /* Data Fabric */
+   SMCA_UMC_BLOCK, /* Unified Memory Controller */
+   SMCA_PB_BLOCK,  /* Parameter Block */
+   SMCA_PSP_BLOCK, /* Platform Security Processor */
+   SMCA_SMU_BLOCK, /* System Management Unit */
+   N_AMD_IP_TYPES
+};
+
+struct amd_hwid {
+   const char *amd_ipname;
+   unsigned int amd_hwid_value;
+};
+
+extern struct amd_hwid amd_hwid_mappings[N_AMD_IP_TYPES];
+
+enum amd_core_mca_blocks {
+   SMCA_LS_BLOCK = 0,  /* Load Store */
+   SMCA_IF_BLOCK,  /* Instruction Fetch */
+   SMCA_L2_CACHE_BLOCK,/* L2 cache */
+   SMCA_DE_BLOCK,  /* Decoder unit */
+   RES,/* Reserved */
+   SMCA_EX_BLOCK,  /* Execution unit */
+   SMCA_FP_BLOCK,  /* Floating Point */
+   SMCA_L3_CACHE_BLOCK /* L3 cache */
+};
+
+enum amd_df_mca_blocks {
+   SMCA_CS_BLOCK = 0,  /* Coherent Slave */
+   SMCA_PIE_BLOCK  /* Power management, Interrupts, etc */
+};
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27b..13f15cb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,17 @@ static const char * const th_names[] = {
"execution_unit",
 };
 
+/* Define HWID to IP type mappings for Scalable MCA */
+struct amd_hwid amd_hwid_mappings[] = {
+   [SMCA_F17H_CORE_BLOCK]  = { "f17h_core", 0xB0 },
+   [SMCA_DF_BLOCK] = { "df", 0x2E },
+   [SMCA_UMC_BLOCK]= { "umc", 0x96 },
+   [SMCA_PB_BLOCK] = { "pb", 0x5 },
+   [SMCA_PSP_BLOCK]= { "psp", 0xFF },
+   [SMCA_SMU_BLOCK]= { "smu", 0x1 },
+};
+EXPORT_SYMBOL_GPL(amd_hwid_mappings);
+
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
static DEFINE_PER_CPU(unsigned char, bank_map);	/* see which banks are on */
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index e3a945c..409448e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -147,6 +147,136 @@ static const char * const mc6_mce_desc[] = {
"Status Register File",
 };
 
+/* Scalable MCA error strings */
+
+static const char * const f17h_ls_mce_desc[] = {
+   "Load queu

[PATCH V2 3/5] x86/mce/AMD: Fix logic to obtain block address

2016-02-29 Thread Aravind Gopalakrishnan
In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Instead, it simply indicates the presence of additional MSRs.

Fix the logic here to gather the MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors,
falling back to the existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 90 
 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e83bbd6..69f8bda 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -104,10 +104,14 @@
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
 /* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 /*
  * This structure contains all data related to the MCE log.  Also
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 13f15cb..a155eaa 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -286,6 +286,54 @@ static void deferred_error_interrupt_enable(struct 
cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
+unsigned int bank, unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field
+* of the first MISC register (MCx_MISC0) indicates
+* presence of additional MISC register set (MISC1-4)
+*/
+   u32 low, high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+  , ) ||
+   !(low & MCI_CONFIG_MCAX))
+   goto nextaddr_out;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+   , ) &&
+   (low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   }
+
+   goto nextaddr_out;
+   }
+
+   /* Fall back to method we used for older processors */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = ++current_addr;
+   }
+
+nextaddr_out:
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -348,16 +396,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-   break;
-
-   address += MCG_XBLK_ADDR;
-   } else
-   ++address;
+   address = get_block_address(address, low, high,
+   bank, block);
+   if (!address)
+   break;
 
if (rdmsr_safe(address, , ))
break;
@@ -462,16 +504,10 @@ static void amd_threshold_interrupt(void)
if (!(per_cpu(bank_map, cpu) & (1 << bank)))
continue;
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0) {
-   address = MSR_IA32_MCx_MISC(bank);
-   } else if (block == 1) {


[PATCH V2 5/5] x86/mce/AMD: Add comments for easier understanding

2016-02-29 Thread Aravind Gopalakrishnan
In an attempt to aid understanding of what the threshold_block
structure holds, add comments describing the members.
Also, trim the comments around threshold_restart_bank()
and update the copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/amd_nb.h| 18 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  7 ++-
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 3c56ef1..bc01c0a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -27,15 +27,15 @@ struct amd_l3_cache {
 };
 
 struct threshold_block {
-   unsigned intblock;
-   unsigned intbank;
-   unsigned intcpu;
-   u32 address;
-   u16 interrupt_enable;
-   boolinterrupt_capable;
-   u16 threshold_limit;
-   struct kobject  kobj;
-   struct list_headmiscj;
+   unsigned int	block;			/* Threshold block number within bank */
+   unsigned int	bank;			/* MCA bank the block belongs to */
+   unsigned int	cpu;			/* CPU which controls the MCA bank */
+   u32			address;		/* MSR address for the block */
+   u16			interrupt_enable;	/* Enable/Disable APIC interrupt upon threshold error */
+   bool		interrupt_capable;	/* Specifies if interrupt is possible from the block */
+   u16			threshold_limit;	/* Value upon which threshold interrupt is generated */
+   struct kobject	kobj;			/* sysfs object */
+   struct list_head	miscj;			/* Add multiple threshold blocks within a bank to the list */
 };
 
 struct threshold_bank {
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index a155eaa..ebb63ec 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -1,5 +1,5 @@
 /*
- *  (c) 2005-2015 Advanced Micro Devices, Inc.
+ *  (c) 2005-2016 Advanced Micro Devices, Inc.
  *  Your use of this code is subject to the terms and conditions of the
  *  GNU general public license version 2. See "COPYING" or
  *  http://www.gnu.org/licenses/gpl.html
@@ -183,10 +183,7 @@ static int lvt_off_valid(struct threshold_block *b, int 
apic, u32 lo, u32 hi)
return 1;
 };
 
-/*
- * Called via smp_call_function_single(), must be called with correct
- * cpu affinity.
- */
+/* Reprogram MCx_MISC MSR behind this threshold bank */
 static void threshold_restart_bank(void *_tr)
 {
struct thresh_restart *tr = _tr;
-- 
2.7.0




[PATCH V2 0/5] Updates to EDAC and AMD MCE driver

2016-02-29 Thread Aravind Gopalakrishnan
This patchset mainly provides the necessary EDAC bits to decode errors
occurring on Scalable MCA enabled processors and also updates the AMD MCE
driver to program the correct MCx_MISC register address for upcoming
processors.

Patches 1, 2 and 3 are meant for the upcoming processors.

Patches 4 and 5 are either fixing or adding comments to help in
understanding the code and do not introduce any functional changes.

Patch 1: Move MSR definition to mce.h
Patch 2: Updates to EDAC driver to decode the new error signatures
Patch 3: Fix logic to obtain correct block address
Patch 4: Fix deferred error comment
Patch 5: Add comments to amd_nb.h to describe threshold_block structure

Tested V2 patches for regressions on Fam15h, Fam10h systems
and found none.

Note 1: Introduced a new patch for moving the MCx_CONFIG MSR to mce.h
Note 2: The enums and the amd_hwid_mappings[] array are placed in arch/x86
as there are follow-up patches which need the struct there

Changes from V1: (per Boris suggestions)
  - Simplify error decoding routines
  - Move headers to mce.h
  - Rename enumerations and struct members (to be more descriptive)
  - Drop gerund usage
  - Remove comments that are spelling out the code

Aravind Gopalakrishnan (5):
  x86/mce: Move MCx_CONFIG MSR definition
  EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Add comments for easier understanding

 arch/x86/include/asm/amd_nb.h|  18 +-
 arch/x86/include/asm/mce.h   |  63 ++-
 arch/x86/include/asm/msr-index.h |   4 -
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 108 +++
 drivers/edac/mce_amd.c   | 342 ++-
 5 files changed, 486 insertions(+), 49 deletions(-)

-- 
2.7.0




[PATCH V2 1/5] x86/mce: Move MCx_CONFIG MSR definition

2016-02-29 Thread Aravind Gopalakrishnan
Since this is contained to MCE code only, move
the MSR definition there instead of adding it to msr-index.

Per discussion here:
http://marc.info/?l=linux-kernel=145633699026474=2

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   | 4 
 arch/x86/include/asm/msr-index.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 18d2ba9..e8b09b3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,6 +91,10 @@
 #define MCE_LOG_LEN 32
 #define MCE_LOG_SIGNATURE  "MACHINECHECK"
 
+/* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 /*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 75a5bb6..984ab75 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -269,10 +269,6 @@
 #define MSR_IA32_MC0_CTL2  0x0280
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
-/* 'SMCA': AMD64 Scalable MCA */
-#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
-#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
-
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
 #define MSR_P6_EVNTSEL00x0186
-- 
2.7.0




Re: [PATCH 4/4] x86/mce/AMD: Add comments for easier understanding

2016-02-26 Thread Aravind Gopalakrishnan

On 2/26/2016 11:44 AM, Borislav Petkov wrote:


threshold_restart_bank() reprograms the MISC MSR after sanity-checking
the fields supplied for that MSR. store_threshold_limit() sets the error
count, store_interrupt_enable() enables/disables the interrupt and both
call threshold_restart_bank() to do that.

But this is basically spelling the code now - I don't think we need to
comment in that detail.


Ok, Have dropped this for V2.


/*
  * Called via smp_call_function_single(), must be called with correct
  * cpu affinity.
  */

is also useless.


Will remove these as well.


"This function provides user with capabilities to re-program the
'threshold_limit' and 'interrupt_enable' sysfs attributes"

No sorry, I don't want to be explaining every line. Just say: "Reprogram
the MISC MSR behind this threshold bank."



Ok, Will do that.

Btw, included comments around struct threshold_block to describe the members.
Do let me know if this seems OK-

 struct threshold_block {
-	unsigned int	block;
-	unsigned int	bank;
-	unsigned int	cpu;
-	u32		address;
-	u16		interrupt_enable;
-	bool		interrupt_capable;
-	u16		threshold_limit;
-	struct kobject	kobj;
-	struct list_head	miscj;
+	unsigned int	block;			/* Threshold block number within bank */
+	unsigned int	bank;			/* MCA bank the block belongs to */
+	unsigned int	cpu;			/* CPU which controls the MCA bank */
+	u32		address;		/* MSR address for the block */
+	u16		interrupt_enable;	/* Enable/Disable APIC interrupt upon threshold error */
+	bool		interrupt_capable;	/* Specifies if interrupt is possible from the block */
+	u16		threshold_limit;	/* Value upon which threshold interrupt is generated */
+
+	struct kobject	kobj;			/* sysfs object */
+	struct list_head	miscj;		/* Add multiple threshold blocks within a bank to the list */
 };

Thanks,
-Aravind.



Re: [PATCH 4/4] x86/mce/AMD: Add comments for easier understanding

2016-02-24 Thread Aravind Gopalakrishnan

On 2/23/2016 6:35 AM, Borislav Petkov wrote:

On Tue, Feb 16, 2016 at 03:45:11PM -0600, Aravind Gopalakrishnan wrote:
  
  /*

+ * Set the error_count and interrupt_enable sysfs attributes here.
+ * This function gets called during the init phase and when someone
+ * makes changes to either of the sysfs attributes.
+ * During init phase, we also program Interrupt type as 'APIC' and
+ * verify if LVT offset obtained from MCx_MISC is valid.
   * Called via smp_call_function_single(), must be called with correct
   * cpu affinity.
   */

I don't think that's what threshold_restart_bank() does...


Hmm, we call this from mce_threshold_block_init() with set_lvt_off = 1 
to write LVT offset value to MCi_MISC.

And we call this from store_interrupt_enable() to program APIC INT TYPE-
if (tr->b->interrupt_enable)
hi |= INT_TYPE_APIC;

and from store_threshold_limit() to re-set the "error count"-
hi = (hi & ~MASK_ERR_COUNT_HI) |
(new_count & THRESHOLD_MAX);

So I thought it fit the description as to "what" it does..


Also, that comment is too much - it shouldn't explain "what" but "why".


How about-

"This function provides user with capabilities to re-program the 
'threshold_limit' and 'interrupt_enable' sysfs attributes"




@@ -262,6 +267,11 @@ static int setup_APIC_deferred_error(int reserved, int new)
return reserved;
  }
  
+/*

+ * Obtain LVT offset from MSR_CU_DEF_ERR and call
+ * setup_APIC_deferred_error() to program relevant APIC register.
+ * Also, register a deferred error interrupt handler
+ */

No, that's basically spelling what the code does.


Ok, I'll remove this.


  static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
  {
u32 low = 0, high = 0;
@@ -338,6 +348,14 @@ nextaddr_out:
return addr;
  }
  
+/*

+ * struct threshold_block descriptor tracks useful info regarding the
+ * banks' MISC register. Among other things, it tracks whether interrupt
+ * is possible for the given bank, the threshold limit and the sysfs object
+ * that outputs these info.

That should be in form of comments explaining what the members of struct
threshold_block are, where that struct is defined.


Ok, I'll remove comments here and add it to arch/x86/include/asm/amd_nb.h


Initializing the struct here, programming
+ * LVT offset for threshold interrupts and registering a interrupt handler
+ * if we haven't already done so

Also spelling the code.


Will remove this

Thanks,
-Aravind.



Re: [PATCH 3/4] x86/mce: Clarify comments regarding deferred error

2016-02-24 Thread Aravind Gopalakrishnan

On 2/24/2016 5:37 AM, Borislav Petkov wrote:

On Tue, Feb 23, 2016 at 05:02:40PM -0600, Aravind Gopalakrishnan wrote:

On 2/23/16 6:11 AM, Borislav Petkov wrote:

On Tue, Feb 16, 2016 at 03:45:10PM -0600, Aravind Gopalakrishnan wrote:

  /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare a deferred error */

/* uncorrected error, deferred exception */

sounds better to me.

Hmm. Well, Deferred error is a separate class of error by itself.
It's neither Corrected in HW nor is it Uncorrected like a MCE.

Let's consult the BKDG:

"Deferred: deferred error.


So it is an uncorrected error for which the raising of the error
exception was deferred until consumption.


Yep. Okay, I'll fix as you suggested.


If you feel "Uncorrected error, deferred error exception" won't be
confusing, that's OK with me.

Why would it be confusing? It is describing exactly what a deferred
error is, albeit a bit too laconic but people can find the longer
description.



That's precisely it-
I thought I wasn't descriptive enough. But yeah, I guess I can include a 
reference to BKDG as well if anyone wants a detailed description.


Thanks,
-Aravind.




Re: [PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address

2016-02-24 Thread Aravind Gopalakrishnan

On 2/24/2016 5:33 AM, Borislav Petkov wrote:

On Tue, Feb 23, 2016 at 04:56:38PM -0600, Aravind Gopalakrishnan wrote:

I think MSR_AMD64_SMCA_MC0_MISC0 would be required in mce.c as well.
So might be better to retain it here.

Actually, I'm thinking, these all are - even if used in multiple files
- all MCE-specific and therefore used in MCE/RAS-specific code. So they
all should go into mce.h. Everything RAS includes that header so they're
perfectly fine there...


Hmm. We introduced MSR_AMD64_SMCA_MCx_CONFIG in this patch-
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=e6c8f1873be8a14c7e44202df1f7e6ea61bf3352

Should I change that as well and move it to mce.h?


(It comes up to 81 chars, but will ignore checkpatch in this case..)

The 80-cols rule is not a hard one. Here's some food for thought:

https://lkml.kernel.org/r/20160219095132.ga9...@gmail.com



Got it.

Thanks,
-Aravind.



Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-02-24 Thread Aravind Gopalakrishnan

On 2/24/2016 5:28 AM, Borislav Petkov wrote:

On Tue, Feb 23, 2016 at 04:50:37PM -0600, Aravind Gopalakrishnan wrote:

Sorry about that. Looks like this pair is not defined in spelling.txt. So,
might be worth adding there as well?

Oh geez, we have a spelling.txt! I think we can declare the kernel as
done and go do something else with our lives...


Haha:)


It's the block for programming FUSE registers.

Oh, that's what it is.

So maybe "fuses block" or "fuses" or ... just the capitalized "FUSE" is
kinda misleading.


Asked about this internally.
Looks like it might be renamed to "Parameter block". So, I'll use that.


How about "Unable to gather IP block that threw the error. Therefore cannot
decode errors further.\n"

Or simply "Invalid IP block specified, error information is unreliable."
and still continue decoding. It might still be recognizable from the
signature, methinks.


Hmm. We might be able to decode other bits of MCi_STATUS. Not the XEC 
which is what we do in this function.

So better to return early if we can't figure out which IP block to indict.


If for some reason the CPUID bit is not set, then we should not assume the
processor supports the features right?

Is that even remotely possible? If yes, then we should keep the warning,
otherwise it is useless.



Yes, it is possible.

Thanks,
-Aravind.



Re: [PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-02-23 Thread Aravind Gopalakrishnan



On 2/23/16 6:37 AM, Borislav Petkov wrote:

On Tue, Feb 16, 2016 at 03:45:08PM -0600, Aravind Gopalakrishnan wrote:

  /* AMD-specific bits */
  #define MCI_STATUS_DEFERRED   (1ULL<<44)  /* declare an uncorrected error */
  #define MCI_STATUS_POISON (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */

\n



Ack.


+/*
+ * McaX field if set indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank
+ *  - BlkPtr field indicates prescence of extended MISC registers.

^

Btw, that's MCi_MISC[BlkPtr] ?


MCi_MISC0[BlkPtr] specifically. Will update the comments about this.


Also, please integrate a spell checker into your patch creation
workflow.


Sorry about that. Looks like this pair is not defined in spelling.txt. 
So, might be worth adding there as well?



+ *But should not be used to determine MSR numbers
+ *  - TCC bit is present in MCx_STATUS

All sentences end with a "."


Will fix.




+/*
+ * Enumerating new IP types and HWID values

Please stop using gerund, i.e., -ing, in comments and commit messages.

"Enumerate new IP ..." is just fine.


Ack.




+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum ip_types {

AMD-specific so "amd_ip_types"


Ok, will fix.




+   F17H_CORE = 0,  /* Core errors */
+   DF, /* Data Fabric */
+   UMC,/* Unified Memory Controller */
+   FUSE,   /* FUSE subsystem */

What's FUSE subsystem?


It's the block for programming FUSE registers.



In any case, this could use a different name in order not to confuse
with Linux's filesystem in userspace.


Ok, will ask internally as well as to what name suits here.


+
+struct hwid {

amd_hwid and so on. All below should have the "amd_" prefix so that it
is obvious.


Will fix.




+   const char *ipname;
+   unsigned int hwid_value;
+};
+
+extern struct hwid hwid_mappings[N_IP_TYPES];
+
+enum core_mcatypes {
+   LS = 0, /* Load Store */
+   IF, /* Instruction Fetch */
+   L2_CACHE,   /* L2 cache */
+   DE, /* Decoder unit */
+   RES,/* Reserved */
+   EX, /* Execution unit */
+   FP, /* Floating Point */
+   L3_CACHE/* L3 cache */
+};
+
+enum df_mcatypes {
+   CS = 0, /* Coherent Slave */
+   PIE /* Power management, Interrupts, etc */
+};
+#endif

So all those are defined here but we have a header for exactly that
drivers/edac/mce_amd.h. And then you define and export hwid_mappings in
arch/x86/kernel/cpu/mcheck/mce_amd.c to not use it there.

Why isn't all this in drivers/edac/mce_amd.[ch] ?

And if it is there, then you obviously don't need the "amd_" prefix.


I have a patch that uses these enums here. But I didn't send it out 
along with this patchset as I was testing the patch.

So yes, I need it here and in the EDAC driver.




+
  #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5523465..93bccbc 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -266,7 +266,9 @@
  
  /* 'SMCA': AMD64 Scalable MCA */

  #define MSR_AMD64_SMCA_MC0_CONFIG 0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID0xc0002005
  #define MSR_AMD64_SMCA_MCx_CONFIG(x)  (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))

Are those MSRs used in multiple files? If not, -> mce.h.


Yes, I'll need them in arch/x86/.../mce_amd.c as well.
A later patch will be using it there.

  
  
+/* Defining HWID to IP type mappings for Scalable MCA */

" Define ..."


Ack




+   case L3_CACHE:
+   if (xec > (ARRAY_SIZE(f17h_l3_mce_desc) - 1))
+   goto wrong_f17hcore_error;
+
+   pr_cont("%s.\n", f17h_l3_mce_desc[xec]);
+   break;
+
+   default:
+   goto wrong_f17hcore_error;

That's a lot of repeated code. You can assign the desc array to a temp
variable depending on mca_type and do the if and pr_cont only once using
that temp variable.


Ok, will simplify.




+
+   case PIE:
+   if (xec > (ARRAY_SIZE(f17h_pie_mce_desc) - 1))
+   goto wrong_df_error;
+
+   pr_cont("%s.\n", f17h_pie_mce_desc[xec]);
+   break;

Ditto.



Will fix.


+
+/* Decode errors according to Scalable MCA specification */
+static void decode_smca_errors(struct mce *m)
+{
+   u32 low, high;
+   u32 addr = MSR_AMD64_SMCA_MCx_IPID(m->bank);
+   unsigned int hwid, mca_type, i;
+   u8 xec = XEC(m->status, xec_mask);
+
+   if (rdmsr_safe(addr, &low, &high)) {
+   pr_emerg(&qu

Re: [PATCH 3/4] x86/mce: Clarify comments regarding deferred error

2016-02-23 Thread Aravind Gopalakrishnan



On 2/23/16 6:11 AM, Borislav Petkov wrote:

On Tue, Feb 16, 2016 at 03:45:10PM -0600, Aravind Gopalakrishnan wrote:

  /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare a deferred error */

/* uncorrected error, deferred exception */

sounds better to me.


Hmm. Well, Deferred error is a separate class of error by itself.
It's neither Corrected in HW nor is it Uncorrected like a MCE.

If you feel "Uncorrected error, deferred error exception" won't be 
confusing, that's OK with me.





For the future, such cleanups/fixes should always go first in the patch
set.



Ok, I'll retain the order this time for V2 patchset as well.
But noted for future.

Thanks,
-Aravind.




Re: [PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address

2016-02-23 Thread Aravind Gopalakrishnan



On 2/23/16 6:39 AM, Borislav Petkov wrote:

On Tue, Feb 16, 2016 at 03:45:09PM -0600, Aravind Gopalakrishnan wrote:
  
  /* 'SMCA': AMD64 Scalable MCA */

+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
  #define MSR_AMD64_SMCA_MC0_CONFIG 0xc0002004
  #define MSR_AMD64_SMCA_MC0_IPID   0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
  #define MSR_AMD64_SMCA_MCx_CONFIG(x)  (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
  #define MSR_AMD64_SMCA_MCx_IPID(x)(MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))

Are those MSRs going to be used in multiple files? If not, they should
all go to mce.h.


I think MSR_AMD64_SMCA_MC0_MISC0 would be required in mce.c as well.
So might be better to retain it here.

MSR_AMD64_SMCA_MC0_MISC1 might be required only in mce_amd.c, So, I'll 
move it to mce.h




  
  
+static u32 get_block_address(u32 current_addr,

+u32 low,
+u32 high,
+unsigned int bank,
+unsigned int block)

Use arg formatting like the rest of functions in the file please.


Will fix.


+   u32 smca_low, smca_high;

s/smca_//


Will fix.




+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+		   &smca_low, &smca_high) ||
+   !(smca_low & MCI_CONFIG_MCAX))
+   goto nextaddr_out;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+		    &smca_low, &smca_high) &&
+   (smca_low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank,
+   block - 1);

unnecessary line break.



Will fix it like so-
addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);

(It comes up to 81 chars, but will ignore checkpatch in this case..)

Thanks,
-Aravind.



Re: [PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address

2016-02-18 Thread Aravind Gopalakrishnan

On 2/16/2016 3:45 PM, Aravind Gopalakrishnan wrote:

In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Instead, it simply indicates the presence of additional MSRs.

Fixing the logic here to gather MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors
and we fall back to existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---


Caught couple of issues-



+   /* Fall back to method we used for older processors */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = current_addr++;
+   }
+



This needs to be addr = ++current_addr;



+   address = get_block_address(address, high, low,
+   bank, block);


The 'high' and 'low' variables need to be swapped.
Missed this during a rebase to latest tip, apologies.



+   address = get_block_address(address, high, low,
+   bank, block);


and here..


+   address = get_block_address(address, high, low, bank, ++block);
+


and here..


+   if (!address)
+   return 0;
  



Apologies, these didn't show up in initial local testing.

Fixed these on a local branch and it seems to work fine.
I'll send it out as a V2 (I'll wait for further comments/reviews before
I do that).


Thanks,
-Aravind.


[PATCH 2/4] x86/mce/AMD: Fix logic to obtain block address

2016-02-16 Thread Aravind Gopalakrishnan
In upcoming processors, the BLKPTR field is no longer used
to indicate the MSR number of the additional register.
Instead, it simply indicates the presence of additional MSRs.

Fix the logic here to gather the MSR address from
MSR_AMD64_SMCA_MCx_MISC() for newer processors
and fall back to the existing logic for older processors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/msr-index.h |  4 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 94 +---
 2 files changed, 69 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 93bccbc..ca49e928e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -265,10 +265,14 @@
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
 /* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_MISC0   0xc0002003
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
 #define MSR_AMD64_SMCA_MC0_IPID0xc0002005
+#define MSR_AMD64_SMCA_MC0_MISC1   0xc000200a
+#define MSR_AMD64_SMCA_MCx_MISC(x) (MSR_AMD64_SMCA_MC0_MISC0 + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_MISCy(x, y) ((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
 
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 8169103..4bdc836 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -286,6 +286,58 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
+static u32 get_block_address(u32 current_addr,
+u32 low,
+u32 high,
+unsigned int bank,
+unsigned int block)
+{
+   u32 addr = 0, offset = 0;
+
+   if (mce_flags.smca) {
+   if (!block) {
+   addr = MSR_AMD64_SMCA_MCx_MISC(bank);
+   } else {
+   /*
+* For SMCA enabled processors, BLKPTR field
+* of the first MISC register (MCx_MISC0) indicates
+* presence of additional MISC register set (MISC1-4)
+*/
+   u32 smca_low, smca_high;
+
+   if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank),
+  &smca_low, &smca_high) ||
+   !(smca_low & MCI_CONFIG_MCAX))
+   goto nextaddr_out;
+
+   if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank),
+   &smca_low, &smca_high) &&
+   (smca_low & MASK_BLKPTR_LO))
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank,
+   block - 1);
+   }
+
+   goto nextaddr_out;
+   }
+
+   /* Fall back to method we used for older processors */
+   switch (block) {
+   case 0:
+   addr = MSR_IA32_MCx_MISC(bank);
+   break;
+   case 1:
+   offset = ((low & MASK_BLKPTR_LO) >> 21);
+   if (offset)
+   addr = MCG_XBLK_ADDR + offset;
+   break;
+   default:
+   addr = current_addr++;
+   }
+
+nextaddr_out:
+   return addr;
+}
+
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -348,16 +400,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
for (bank = 0; bank < mca_cfg.banks; ++bank) {
for (block = 0; block < NR_BLOCKS; ++block) {
-   if (block == 0)
-   address = MSR_IA32_MCx_MISC(bank);
-   else if (block == 1) {
-   address = (low & MASK_BLKPTR_LO) >> 21;
-   if (!address)
-   break;
-
-   address += MCG_XBLK_ADDR;
-   } else
-   ++address;
+   address = get_block_address(address, high, low,
+   bank, block);
+   if (!address)
+   break;
 
if (rdmsr_safe(address, , ))
break;
@@ -462,16 +508,10 @@ static void amd_threshold_interrupt(void)
if (!(per_cpu(bank_map, cpu) & (1 << bank)))

[PATCH 4/4] x86/mce/AMD: Add comments for easier understanding

2016-02-16 Thread Aravind Gopalakrishnan
To help folks not very familiar with the code understand what it
is doing, add helper comments around some of the more important
functions in the driver to describe them.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 4bdc836..d2b6001 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -184,6 +184,11 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
 };
 
 /*
+ * Set the error_count and interrupt_enable sysfs attributes here.
+ * This function gets called during the init phase and when someone
+ * makes changes to either of the sysfs attributes.
+ * During init phase, we also program Interrupt type as 'APIC' and
+ * verify if LVT offset obtained from MCx_MISC is valid.
  * Called via smp_call_function_single(), must be called with correct
  * cpu affinity.
  */
@@ -262,6 +267,11 @@ static int setup_APIC_deferred_error(int reserved, int new)
return reserved;
 }
 
+/*
+ * Obtain LVT offset from MSR_CU_DEF_ERR and call
+ * setup_APIC_deferred_error() to program relevant APIC register.
+ * Also, register a deferred error interrupt handler
+ */
 static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 {
u32 low = 0, high = 0;
@@ -338,6 +348,14 @@ nextaddr_out:
return addr;
 }
 
+/*
+ * The struct threshold_block descriptor tracks useful info regarding the
+ * banks' MISC register. Among other things, it tracks whether an interrupt
+ * is possible for the given bank, the threshold limit and the sysfs object
+ * that outputs this info. Initialize the struct here, program the
+ * LVT offset for threshold interrupts and register an interrupt handler
+ * if we haven't already done so.
+ */
 static int
 prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
@@ -673,6 +691,9 @@ static struct kobj_type threshold_ktype = {
.default_attrs  = default_attrs,
 };
 
+/*
+ * Initializing sysfs entries for each block within the MCA bank
+ */
 static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 unsigned int block, u32 address)
 {
-- 
2.7.0



[PATCH 1/4] EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors

2016-02-16 Thread Aravind Gopalakrishnan
For Scalable MCA enabled processors, errors are listed
per IP block. And since an IP is not required to
map to a particular bank, we need to use the HWID and McaType
values from the MCx_IPID register to figure out which IP
a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register
to indicate that the task context is corrupt.

Add logic here to decode errors from all known IP
blocks for Fam17h Model 00-0fh and to print TCC errors.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h   |  50 ++
 arch/x86/include/asm/msr-index.h |   2 +
 arch/x86/kernel/cpu/mcheck/mce_amd.c |  11 ++
 drivers/edac/mce_amd.c   | 327 ++-
 4 files changed, 389 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2ea4527..2ec67ac 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -42,6 +42,17 @@
 /* AMD-specific bits */
 #define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
+#define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
+/*
+ * The McaX field, if set, indicates a given bank supports MCA extensions:
+ *  - Deferred error interrupt type is specifiable by bank
+ *  - BlkPtr field indicates presence of extended MISC registers,
+ *but should not be used to determine MSR numbers
+ *  - TCC bit is present in MCx_STATUS
+ */
+#define MCI_CONFIG_MCAX0x1
+#define MCI_IPID_MCATYPE   0x
+#define MCI_IPID_HWID  0xFFF
 
 /*
  * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
@@ -287,4 +298,43 @@ struct cper_sec_mem_err;
 extern void apei_mce_report_mem_error(int corrected,
  struct cper_sec_mem_err *mem_err);
 
+/*
+ * Enumerating new IP types and HWID values
+ * in ScalableMCA enabled AMD processors
+ */
+#ifdef CONFIG_X86_MCE_AMD
+enum ip_types {
+   F17H_CORE = 0,  /* Core errors */
+   DF, /* Data Fabric */
+   UMC,/* Unified Memory Controller */
+   FUSE,   /* FUSE subsystem */
+   PSP,/* Platform Security Processor */
+   SMU,/* System Management Unit */
+   N_IP_TYPES
+};
+
+struct hwid {
+   const char *ipname;
+   unsigned int hwid_value;
+};
+
+extern struct hwid hwid_mappings[N_IP_TYPES];
+
+enum core_mcatypes {
+   LS = 0, /* Load Store */
+   IF, /* Instruction Fetch */
+   L2_CACHE,   /* L2 cache */
+   DE, /* Decoder unit */
+   RES,/* Reserved */
+   EX, /* Execution unit */
+   FP, /* Floating Point */
+   L3_CACHE/* L3 cache */
+};
+
+enum df_mcatypes {
+   CS = 0, /* Coherent Slave */
+   PIE /* Power management, Interrupts, etc */
+};
+#endif
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5523465..93bccbc 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -266,7 +266,9 @@
 
 /* 'SMCA': AMD64 Scalable MCA */
 #define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MC0_IPID0xc0002005
 #define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_IPID(x) (MSR_AMD64_SMCA_MC0_IPID + 0x10*(x))
 
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 88de27b..8169103 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -71,6 +71,17 @@ static const char * const th_names[] = {
"execution_unit",
 };
 
+/* Defining HWID to IP type mappings for Scalable MCA */
+struct hwid hwid_mappings[] = {
+   [F17H_CORE] = { "f17h_core", 0xB0 },
+   [DF]= { "df", 0x2E },
+   [UMC]   = { "umc", 0x96 },
+   [FUSE]  = { "fuse", 0x5 },
+   [PSP]   = { "psp", 0xFF },
+   [SMU]   = { "smu", 0x1 },
+};
+EXPORT_SYMBOL_GPL(hwid_mappings);
+
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned char, bank_map);/* see which banks are 
on */
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index e3a945c..6e6b327 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -147,6 +147,136 @@ static const char * const mc6_mce_desc[] = {
"Status Register File",
 };
 
+/* Scalable MCA error strings */
+
+static const char * const f17h_ls_mce_desc[] = {
+   "Load queu

[PATCH 0/4] Updates to EDAC and AMD MCE driver

2016-02-16 Thread Aravind Gopalakrishnan
This patchset mainly provides the necessary EDAC bits to decode errors
occurring on Scalable MCA enabled processors and also updates the AMD MCE
driver to get the correct MCx_MISC register address for upcoming processors.
Patches 1 and 2 are meant for the upcoming processors.

Patches 3 and 4 are either fixing or adding comments to help in
understanding the code and do not introduce any functional changes.

Patch 1: Updates to EDAC driver to decode the new error signatures
Patch 2: Fix logic to get correct block address
Patch 3: Fix deferred error comment
Patch 4: Add comments to mce_amd.c to describe functionality

Tested the patches for regressions on Fam15h, Fam10h systems
and found none.

Aravind Gopalakrishnan (4):
  EDAC, MCE, AMD: Enable error decoding of Scalable MCA errors
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Add comments for easier understanding

 arch/x86/include/asm/mce.h   |  52 +-
 arch/x86/include/asm/msr-index.h |   6 +
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 126 ++
 drivers/edac/mce_amd.c   | 327 ++-
 4 files changed, 480 insertions(+), 31 deletions(-)

-- 
2.7.0




[PATCH 3/4] x86/mce: Clarify comments regarding deferred error

2016-02-16 Thread Aravind Gopalakrishnan
The Deferred field indicates whether we have a deferred error.
Deferred errors are errors that hardware could not fix, but which
do not cause any interruption to program flow. So no #MC is
generated and the UC bit in MCx_STATUS is not set.

Fix the comment here. No functional change.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
---
 arch/x86/include/asm/mce.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2ec67ac..476da8b 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -40,7 +40,7 @@
 #define MCI_STATUS_AR   (1ULL<<55)  /* Action required */
 
 /* AMD-specific bits */
-#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare an uncorrected error */
+#define MCI_STATUS_DEFERRED(1ULL<<44)  /* declare a deferred error */
 #define MCI_STATUS_POISON  (1ULL<<43)  /* access poisonous data */
 #define MCI_STATUS_TCC (1ULL<<55)  /* Task context corrupt */
 /*
-- 
2.7.0




[tip:ras/core] x86/mce/AMD: Set MCAX Enable bit

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  e6c8f1873be8a14c7e44202df1f7e6ea61bf3352
Gitweb: http://git.kernel.org/tip/e6c8f1873be8a14c7e44202df1f7e6ea61bf3352
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 25 Jan 2016 20:41:53 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 1 Feb 2016 10:53:59 +0100

x86/mce/AMD: Set MCAX Enable bit

It is required for the OS to acknowledge that it is using the
MCAX register set and its associated fields by setting the
'McaXEnable' bit in each bank's MCi_CONFIG register. If it is
not set, then all UC errors will cause a system panic.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1453750913-4781-9-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/msr-index.h |  4 
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b05402e..5523465 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -264,6 +264,10 @@
 #define MSR_IA32_MC0_CTL2  0x0280
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
+/* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 #define MSR_P6_PERFCTR00x00c1
 #define MSR_P6_PERFCTR10x00c2
 #define MSR_P6_EVNTSEL00x0186
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f2860a1..88de27b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -54,6 +54,14 @@
 /* Threshold LVT offset is at MSR0xC410[15:12] */
 #define SMCA_THR_LVT_OFF   0xF000
 
+/*
+ * OS is required to set the MCAX bit to acknowledge that it is now using the
+ * new MSR ranges and new registers under each bank. It also means that the OS
+ * will configure deferred errors in the new MCx_CONFIG register. If the bit is
+ * not set, uncorrectable errors will cause a system panic.
+ */
+#define SMCA_MCAX_EN_OFF   0x1
+
 static const char * const th_names[] = {
"load_store",
"insn_fetch",
@@ -292,6 +300,12 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 
if (mce_flags.smca) {
u32 smca_low, smca_high;
+   u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(bank);
+
+   if (!rdmsr_safe(smca_addr, &smca_low, &smca_high)) {
+   smca_high |= SMCA_MCAX_EN_OFF;
+   wrmsr(smca_addr, smca_low, smca_high);
+   }
 
/* Gather LVT offset for thresholding: */
if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high))


[tip:ras/core] x86/mce/AMD: Fix LVT offset configuration for thresholding

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  f57a1f3c14b9182f1fea667f5a38a1094699db7c
Gitweb: http://git.kernel.org/tip/f57a1f3c14b9182f1fea667f5a38a1094699db7c
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 25 Jan 2016 20:41:51 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 1 Feb 2016 10:53:57 +0100

x86/mce/AMD: Fix LVT offset configuration for thresholding

For processor families with the Scalable MCA feature, the LVT
offset for threshold interrupts is configured only in MSR
0xC410 and not in each per bank MISC register as was done in
earlier families.

Obtain the LVT offset from the correct MSR for those families.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1453750913-4781-7-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 5982227..a77a452 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -49,6 +49,11 @@
 #define DEF_LVT_OFF 0x2
 #define DEF_INT_TYPE_APIC  0x2
 
+/* Scalable MCA: */
+
+/* Threshold LVT offset is at MSR0xC0000410[15:12] */
+#define SMCA_THR_LVT_OFF   0xF000
+
 static const char * const th_names[] = {
"load_store",
"insn_fetch",
@@ -142,6 +147,14 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
}
 
if (apic != msr) {
+   /*
+* On SMCA CPUs, LVT offset is programmed at a different MSR, and
+* the BIOS provides the value. The original field where LVT offset
+* was set is reserved. Return early here:
+*/
+   if (mce_flags.smca)
+   return 0;
+
pr_err(FW_BUG "cpu %d, invalid threshold interrupt offset %d "
   "for bank %d, block %d (MSR%08X=0x%x%08x)\n",
   b->cpu, apic, b->bank, b->block, b->address, hi, lo);
@@ -300,7 +313,19 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
goto init;
 
b.interrupt_enable = 1;
-   new = (high & MASK_LVTOFF_HI) >> 20;
+
+   if (mce_flags.smca) {
+   u32 smca_low, smca_high;
+
+   /* Gather LVT offset for thresholding: */
+   if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high))
+   break;
+
+   new = (smca_low & SMCA_THR_LVT_OFF) >> 12;
+   } else {
+   new = (high & MASK_LVTOFF_HI) >> 20;
+   }
+
offset  = setup_APIC_mce_threshold(offset, new);
 
if ((offset == new) &&


[tip:ras/core] x86/mce/AMD: Reduce number of blocks scanned per bank

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  60f116fca162d9488f783f5014779463243ab7a2
Gitweb: http://git.kernel.org/tip/60f116fca162d9488f783f5014779463243ab7a2
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 25 Jan 2016 20:41:50 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 1 Feb 2016 10:53:57 +0100

x86/mce/AMD: Reduce number of blocks scanned per bank

From Fam17h onwards, the number of extended MCx_MISC register blocks is
reduced to 4. It is an architectural change from what we had on
earlier processors.

Although theoretically the total number of extended MCx_MISC
registers was 8 in earlier processor families, in practice we
only had to use the extra registers for MC4. And only 2 of those
were used. So this change does not affect older processors.
Tested on Fam10h and Fam15h systems.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1453750913-4781-6-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 3068ce2..5982227 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-#define NR_BLOCKS 9
+#define NR_BLOCKS 5
 #define THRESHOLD_MAX 0xFFF
 #define INT_TYPE_APIC 0x00020000
 #define MASK_VALID_HI 0x80000000


[tip:ras/core] x86/mce/AMD: Do not perform shared bank check for future processors

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  284b965c146f482b4a411133f62288d52b7e3a72
Gitweb: http://git.kernel.org/tip/284b965c146f482b4a411133f62288d52b7e3a72
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 25 Jan 2016 20:41:49 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 1 Feb 2016 10:53:56 +0100

x86/mce/AMD: Do not perform shared bank check for future processors

Fam17h and above should not require a check to see if a bank is
shared or not. For shared banks, there will always be only one
core that has visibility over the MSRs and only that particular
core will be allowed to write to the MSRs.

Fix the code to return early if we have Scalable MCA support. No
change in functionality for earlier processors.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Fengguang Wu 
[ Massaged the changelog text, fixed kbuild test robot build warning. ]
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1453750913-4781-5-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index e99b150..3068ce2 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -84,6 +84,13 @@ struct thresh_restart {
 
 static inline bool is_shared_bank(int bank)
 {
+   /*
+* Scalable MCA provides for only one core to have access to the MSRs of
+* a shared bank.
+*/
+   if (mce_flags.smca)
+   return false;
+
/* Bank 4 is for northbridge reporting and is thus shared */
return (bank == 4);
 }


[tip:ras/core] x86/mce: Fix order of AMD MCE init function call

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  bfbe0eeb769e2aff2cb1fc6845c4e4b7eac40bb3
Gitweb: http://git.kernel.org/tip/bfbe0eeb769e2aff2cb1fc6845c4e4b7eac40bb3
Author: Aravind Gopalakrishnan 
AuthorDate: Mon, 25 Jan 2016 20:41:48 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 1 Feb 2016 10:53:55 +0100

x86/mce: Fix order of AMD MCE init function call

In mce_amd_feature_init() we take decisions based on mce_flags
being set or not. So the feature detection using CPUID should
naturally be ordered before we call mce_amd_feature_init().

Fix that here.

Signed-off-by: Aravind Gopalakrishnan 
Signed-off-by: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: linux-edac 
Link: http://lkml.kernel.org/r/1453750913-4781-4-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a006f4c..b718080 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1617,10 +1617,10 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
case X86_VENDOR_AMD: {
 u32 ebx = cpuid_ebx(0x80000007);
 
-   mce_amd_feature_init(c);
mce_flags.overflow_recov = !!(ebx & BIT(0));
mce_flags.succor = !!(ebx & BIT(1));
mce_flags.smca   = !!(ebx & BIT(3));
+   mce_amd_feature_init(c);
 
break;
}


[tip:ras/core] x86/mce/AMD: Set MCAX Enable bit

2016-02-01 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  e6c8f1873be8a14c7e44202df1f7e6ea61bf3352
Gitweb: http://git.kernel.org/tip/e6c8f1873be8a14c7e44202df1f7e6ea61bf3352
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Mon, 25 Jan 2016 20:41:53 +0100
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Mon, 1 Feb 2016 10:53:59 +0100

x86/mce/AMD: Set MCAX Enable bit

It is required for the OS to acknowledge that it is using the
MCAX register set and its associated fields by setting the
'McaXEnable' bit in each bank's MCi_CONFIG register. If it is
not set, then all UC errors will cause a system panic.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: linux-edac <linux-e...@vger.kernel.org>
Link: http://lkml.kernel.org/r/1453750913-4781-9-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/include/asm/msr-index.h |  4 
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b05402e..5523465 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -264,6 +264,10 @@
 #define MSR_IA32_MC0_CTL2  0x00000280
 #define MSR_IA32_MCx_CTL2(x)   (MSR_IA32_MC0_CTL2 + (x))
 
+/* 'SMCA': AMD64 Scalable MCA */
+#define MSR_AMD64_SMCA_MC0_CONFIG  0xc0002004
+#define MSR_AMD64_SMCA_MCx_CONFIG(x)   (MSR_AMD64_SMCA_MC0_CONFIG + 0x10*(x))
+
 #define MSR_P6_PERFCTR0 0x000000c1
 #define MSR_P6_PERFCTR1 0x000000c2
 #define MSR_P6_EVNTSEL0 0x00000186
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f2860a1..88de27b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -54,6 +54,14 @@
 /* Threshold LVT offset is at MSR0xC0000410[15:12] */
 #define SMCA_THR_LVT_OFF   0xF000
 
+/*
+ * OS is required to set the MCAX bit to acknowledge that it is now using the
+ * new MSR ranges and new registers under each bank. It also means that the OS
+ * will configure deferred errors in the new MCx_CONFIG register. If the bit is
+ * not set, uncorrectable errors will cause a system panic.
+ */
+#define SMCA_MCAX_EN_OFF   0x1
+
 static const char * const th_names[] = {
"load_store",
"insn_fetch",
@@ -292,6 +300,12 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 
if (mce_flags.smca) {
u32 smca_low, smca_high;
+   u32 smca_addr = MSR_AMD64_SMCA_MCx_CONFIG(bank);
+
+   if (!rdmsr_safe(smca_addr, &smca_low, &smca_high)) {
+   smca_high |= SMCA_MCAX_EN_OFF;
+   wrmsr(smca_addr, smca_low, smca_high);
+   }
 
/* Gather LVT offset for thresholding: */
 if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high))


Re: [patch] amd64_edac: shift wrapping issue in f1x_get_norm_dct_addr()

2016-01-21 Thread Aravind Gopalakrishnan

On 1/21/2016 6:32 AM, Borislav Petkov wrote:

On Wed, Jan 20, 2016 at 12:54:51PM +0300, Dan Carpenter wrote:

+   u64 dct_sel_base_off = (u64)(pvt->dct_sel_hi & 0xFFFFFC00) << 16;



@Aravind: do you have a box with

setpci -s 18.2 0x114.l

bits [31:16] not 0?





Nope. I don't see it set on my Fam10h and Fam15h Model 00-0fh systems.

But yes, nice catch!

-Aravind.



[tip:x86/urgent] x86/AMD: Fix last level cache topology for AMD Fam17h systems

2015-11-07 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  3849e91f571dcb48cf2c8143480c59137d44d6bc
Gitweb: http://git.kernel.org/tip/3849e91f571dcb48cf2c8143480c59137d44d6bc
Author: Aravind Gopalakrishnan 
AuthorDate: Wed, 4 Nov 2015 12:49:42 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 7 Nov 2015 10:37:51 +0100

x86/AMD: Fix last level cache topology for AMD Fam17h systems

On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.

Signed-off-by: Aravind Gopalakrishnan 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Huang Rui 
Cc: Ingo Molnar 
Cc: Jacob Shin 
Link: 
http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/amd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4a70fc6..a8816b3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -352,6 +352,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
unsigned bits;
int cpu = smp_processor_id();
+   unsigned int socket_id, core_complex_id;
 
bits = c->x86_coreid_bits;
/* Low order bits define the core id (index of core in socket) */
@@ -361,6 +362,18 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
amd_get_topology(c);
+
+   /*
+* Fix percpu cpu_llc_id here as LLC topology is different
+* for Fam17h systems.
+*/
+if (c->x86 != 0x17 || !cpuid_edx(0x80000006))
+   return;
+
+   socket_id   = (c->apicid >> bits) - 1;
+   core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3;
+
+   per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
 #endif
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/urgent] x86/AMD: Fix last level cache topology for AMD Fam17h systems

2015-11-04 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  e5e84a26ef2909964d964224b805236293fb4c63
Gitweb: http://git.kernel.org/tip/e5e84a26ef2909964d964224b805236293fb4c63
Author: Aravind Gopalakrishnan 
AuthorDate: Wed, 4 Nov 2015 12:49:42 +0100
Committer:  Thomas Gleixner 
CommitDate: Wed, 4 Nov 2015 12:52:06 +0100

x86/AMD: Fix last level cache topology for AMD Fam17h systems

On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.

Signed-off-by: Aravind Gopalakrishnan 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Huang Rui 
Cc: Ingo Molnar 
Cc: Jacob Shin 
Link: 
http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Borislav Petkov 
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/amd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4a70fc6..a8816b3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -352,6 +352,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
unsigned bits;
int cpu = smp_processor_id();
+   unsigned int socket_id, core_complex_id;
 
bits = c->x86_coreid_bits;
/* Low order bits define the core id (index of core in socket) */
@@ -361,6 +362,18 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
amd_get_topology(c);
+
+   /*
+* Fix percpu cpu_llc_id here as LLC topology is different
+* for Fam17h systems.
+*/
+if (c->x86 != 0x17 || !cpuid_edx(0x80000006))
+   return;
+
+   socket_id   = (c->apicid >> bits) - 1;
+   core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3;
+
+   per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
 #endif
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/urgent] x86/AMD: Fix last level cache topology for AMD Fam17h systems

2015-11-04 Thread tip-bot for Aravind Gopalakrishnan
Commit-ID:  e5e84a26ef2909964d964224b805236293fb4c63
Gitweb: http://git.kernel.org/tip/e5e84a26ef2909964d964224b805236293fb4c63
Author: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
AuthorDate: Wed, 4 Nov 2015 12:49:42 +0100
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Wed, 4 Nov 2015 12:52:06 +0100

x86/AMD: Fix last level cache topology for AMD Fam17h systems

On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrish...@amd.com>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Huang Rui <ray.hu...@amd.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Jacob Shin <jacob.w.s...@gmail.com>
Link: 
http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-aravind.gopalakrish...@amd.com
Signed-off-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/kernel/cpu/amd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4a70fc6..a8816b3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -352,6 +352,7 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
unsigned bits;
int cpu = smp_processor_id();
+   unsigned int socket_id, core_complex_id;
 
bits = c->x86_coreid_bits;
/* Low order bits define the core id (index of core in socket) */
@@ -361,6 +362,18 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
amd_get_topology(c);
+
+   /*
+* Fix percpu cpu_llc_id here as LLC topology is different
+* for Fam17h systems.
+*/
+if (c->x86 != 0x17 || !cpuid_edx(0x8006))
+   return;
+
+   socket_id   = (c->apicid >> bits) - 1;
+   core_complex_id = (c->apicid & ((1 << bits) - 1)) >> 3;
+
+   per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
 #endif
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] x86/AMD: Fix LLC topology for AMD Fam17h systems

2015-11-03 Thread Aravind Gopalakrishnan
On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same
value as the Node ID (as we have been doing currently).

We should rather look at the ApicID bits of the core to provide
us the last level cache ID info. Doing that here.

Signed-off-by: Aravind Gopalakrishnan 
---
Changes in V2:
 - Move LLC calculation logic to amd_detect_cmp() and change patch
   header as a result. (This in turn fixes the issue found by
   kbuild bot on the V1 patch)
   
 arch/x86/kernel/cpu/amd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4a70fc6..dab371e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -352,6 +352,8 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
unsigned bits;
int cpu = smp_processor_id();
+   unsigned int apicid = c->apicid;
+   unsigned int socket_id, core_complex_id;
 
bits = c->x86_coreid_bits;
/* Low order bits define the core id (index of core in socket) */
@@ -361,6 +363,17 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c)
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
amd_get_topology(c);
+
+   /*
+* Fix percpu cpu_llc_id here as LLC topology is different
+* for Fam17h systems.
+*/
+if (c->x86 != 0x17 || !cpuid_edx(0x80000006))
+   return;
+
+   socket_id = (apicid >> bits) - 1;
+   core_complex_id = (apicid & ((1 << bits) - 1)) >> 3;
+   per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
 #endif
 }
 
-- 
2.6.1



Re: [PATCH] x86/intel_cacheinfo: Fix LLC topology for AMD Fam17h systems

2015-11-03 Thread Aravind Gopalakrishnan

On 11/3/2015 1:52 PM, Borislav Petkov wrote:

On Tue, Nov 03, 2015 at 01:41:53PM -0600, Aravind Gopalakrishnan wrote:

cpu_llc_id references should be wrapped under #ifdef CONFIG_SMP.

Did that and kernel build worked with the attached config.

Will send a V2 with the fix.

Why aren't you doing all that figuring out what the llc_id is in
amd_detect_cmp() which is already CONFIG_SMP ifdeffed?

Which is where that code belongs anyway...



Since we needed to modify last level cache IDs, I thought 
init_amd_cacheinfo() might be a logical place to put it.


But you are right, makes sense to move it to amd_detect_cmp().
I'll do that in V2.

Thanks,
-Aravind.

