From: Tony Luck <[email protected]>

[ Upstream commit f7cf2a22a2896d3b3595b71d7936b6d7a3316b00 ]

Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset.  There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).

This resulted in a bogus value for the top of low memory:

  EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523eac86
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
   MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 
offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

   DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523e1086
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
   DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 
0, shiftup: 0
   DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 
0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 
0x00000000), index 0, base ch: 0, ch mask: 0x01
   DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), 
way: 1
   DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 
0xffffffff, RIR interleave 0, index 0
   DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 
channel_mask:1 rank:0
   MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 
slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM 
err_code:0001:0090 socket:0 channel_mask:1 rank:0)

Signed-off-by: Tony Luck <[email protected]>
Acked-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
 drivers/edac/sb_edac.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 3cdaac8..3f910bf 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -135,6 +135,7 @@ static inline int sad_pkg(const struct interleave_pkg 
*table, u32 reg,
 
 #define TOLM           0x80
 #define        TOHM            0x84
+#define HASWELL_TOLM   0xd0
 #define HASWELL_TOHM_0 0xd4
 #define HASWELL_TOHM_1 0xd8
 
@@ -706,8 +707,8 @@ static u64 haswell_get_tolm(struct sbridge_pvt *pvt)
 {
        u32 reg;
 
-       pci_read_config_dword(pvt->info.pci_vtd, TOLM, &reg);
-       return (GET_BITFIELD(reg, 26, 31) << 26) | 0x1ffffff;
+       pci_read_config_dword(pvt->info.pci_vtd, HASWELL_TOLM, &reg);
+       return (GET_BITFIELD(reg, 26, 31) << 26) | 0x3ffffff;
 }
 
 static u64 haswell_get_tohm(struct sbridge_pvt *pvt)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to