Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Borislav Petkov
On Thu, May 17, 2018 at 09:29:23PM +0200, Johannes Hirte wrote:
> Works as expected on my Carrizo.

Thanks, I'll add your Tested-by.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Johannes Hirte
On 2018 May 17, Borislav Petkov wrote:
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPIs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, because
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
> 
> Yeah, and we should simply cache all those MSR values as I suggested then.
> 
> The patch at the end should fix your issue.
> 

Works as expected on my Carrizo.

-- 
Regards,
  Johannes



RE: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, May 17, 2018 9:44 AM
> To: Ghannam, Yazen 
> Cc: Johannes Hirte ; linux-
> e...@vger.kernel.org; linux-kernel@vger.kernel.org; tony.l...@intel.com;
> x...@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
> 
> On Thu, May 17, 2018 at 01:04:19PM +, Ghannam, Yazen wrote:
...
> 
> I check PFEH is enabled how?
> 

If MISC0 is RAZ then you can assume PFEH is enabled. There should be a BIOS
option to disable it.

BTW, I just tried your patch with PFEH disabled and it seems to work fine.

...
> > Since we're caching the values during init, we can drop all the
> > *_on_cpu() calls. What do you think?
> 
> Well, if they're all the same on all CPUs, sure. That's your call.
> 

Let's drop them. We won't need them since we're caching the values during
init. And the init code is run on the target CPU.

We can just make smca_bank_addrs[][] into per_cpu when we need to support
different values on different CPUs.

Thanks,
Yazen




Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Borislav Petkov
On Thu, May 17, 2018 at 01:04:19PM +, Ghannam, Yazen wrote:
> Yes, you're right. I thought using the existing data structures would
> work, but it seems I messed up the implementation.

Not only that - your idea wouldn't fly because the per-CPU structures you
were using get torn down when the CPU goes offline, so you can't use
them on resume because they're not there yet.

> Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
> enabled on your system? That would cause MISC0 to be RAZ so we won't
> get the MISC1 address. I'll try it myself also and let you know.

I check PFEH is enabled how?

> I think this is good for now. We'll probably need to change it in the
> future, but maybe we can clean up all the thresholding blocks code and
> make it simpler when we do change it.

Ok.

> This hunk could go above the !block check. Though maybe the macro is
> lighter than the array lookup. It'll work either way, but I'm just
> thinking out loud.

Yeah, it doesn't matter in that case.

> Since we're caching the values during init, we can drop all the
> *_on_cpu() calls. What do you think?

Well, if they're all the same on all CPUs, sure. That's your call.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


RE: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov 
> Sent: Thursday, May 17, 2018 6:41 AM
> To: Johannes Hirte 
> Cc: Ghannam, Yazen ; linux-
> e...@vger.kernel.org; linux-kernel@vger.kernel.org; tony.l...@intel.com;
> x...@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
> 
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPIs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, because
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
> 
> Yeah, and we should simply cache all those MSR values as I suggested then.
> 

Yes, you're right. I thought using the existing data structures would work,
but it seems I messed up the implementation.

> The patch at the end should fix your issue.
> 
> Yazen, so I'm working on the assumption that all addresses are the same
> on every core - I dumped them and it looks like this:
> 
> 128 bank: 00, block: 0 : 0xc0002003
> 128 bank: 00, block: 1 : 0x0
> 128 bank: 01, block: 0 : 0xc0002013
> 128 bank: 01, block: 1 : 0x0
> 128 bank: 02, block: 0 : 0xc0002023
> 128 bank: 02, block: 1 : 0x0
> 128 bank: 03, block: 0 : 0xc0002033
> 128 bank: 03, block: 1 : 0x0
> 128 bank: 04, block: 0 : 0x0
> 128 bank: 05, block: 0 : 0xc0002053
> 128 bank: 05, block: 1 : 0x0
> 128 bank: 06, block: 0 : 0xc0002063
> 128 bank: 06, block: 1 : 0x0
> 128 bank: 07, block: 0 : 0xc0002073
> 128 bank: 07, block: 1 : 0x0
> 128 bank: 08, block: 0 : 0xc0002083
> 128 bank: 08, block: 1 : 0x0
> 128 bank: 09, block: 0 : 0xc0002093
> 128 bank: 09, block: 1 : 0x0
> 128 bank: 10, block: 0 : 0xc00020a3
> 128 bank: 10, block: 1 : 0x0
> 128 bank: 11, block: 0 : 0xc00020b3
> 128 bank: 11, block: 1 : 0x0
> 128 bank: 12, block: 0 : 0xc00020c3
> 128 bank: 12, block: 1 : 0x0
> 128 bank: 13, block: 0 : 0xc00020d3
> 128 bank: 13, block: 1 : 0x0
> 128 bank: 14, block: 0 : 0xc00020e3
> 128 bank: 14, block: 1 : 0x0
> 128 bank: 15, block: 0 : 0xc00020f3
> 128 bank: 15, block: 1 : 0x0
> 128 bank: 16, block: 0 : 0xc0002103
> 128 bank: 16, block: 1 : 0x0

Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
enabled on your system? That would cause MISC0 to be RAZ so we won't
get the MISC1 address. I'll try it myself also and let you know. 

> 128 bank: 17, block: 0 : 0xc0002113
> 128 bank: 17, block: 1 : 0x0
> 128 bank: 18, block: 0 : 0xc0002123
> 128 bank: 18, block: 1 : 0x0
> 128 bank: 19, block: 0 : 0xc0002133
> 128 bank: 19, block: 1 : 0x0
> 128 bank: 20, block: 0 : 0xc0002143
> 128 bank: 20, block: 1 : 0x0
> 128 bank: 21, block: 0 : 0xc0002153
> 128 bank: 21, block: 1 : 0x0
> 128 bank: 22, block: 0 : 0xc0002163
> 128 bank: 22, block: 1 : 0x0
> 
> so you have 128 times the same address on a 128-core Zen box.
> 
> If that is correct, then we can use this nice simplification and cache
> them globally and not per-CPU. Seems to work here. Scream if something's
> amiss.
> 

I think this is good for now. We'll probably need to change it in the future, but
maybe we can clean up all the thresholding blocks code and make it simpler
when we do change it.

> Thx.
> 
> ---
> From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00 2001
> From: Borislav Petkov 
> Date: Thu, 17 May 2018 10:46:26 +0200
> Subject: [PATCH] array caching
> 
> Not-Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++--
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f7666eef4a87..c8e038800591 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
>   [SMCA_SMU]  = { "smu",  "System Management Unit" },
>  };
> 
> +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
> +{
> + [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
> +};
> +
>  const char *smca_get_name(enum smca_bank_types t)
>  {
>   if (t >= N_SMCA_BANK_TYPES)
> @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int
> cpu, unsigned int bank,
>   if (!block)
>   return MSR_AMD64_SMCA_MCx_MISC(bank);
> 
> + /* Check our cache first: */
> 

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-17 Thread Borislav Petkov
On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> Maybe I'm missing something, but those RDMSR IPIs don't happen on
> pre-SMCA systems, right? So the caching should be avoided here, because
> the whole lookup looks more expensive to me than the simple switch-block
> in get_block_address.

Yeah, and we should simply cache all those MSR values as I suggested then.

The patch at the end should fix your issue.

Yazen, so I'm working on the assumption that all addresses are the same
on every core - I dumped them and it looks like this:

128 bank: 00, block: 0 : 0xc0002003
128 bank: 00, block: 1 : 0x0
128 bank: 01, block: 0 : 0xc0002013
128 bank: 01, block: 1 : 0x0
128 bank: 02, block: 0 : 0xc0002023
128 bank: 02, block: 1 : 0x0
128 bank: 03, block: 0 : 0xc0002033
128 bank: 03, block: 1 : 0x0
128 bank: 04, block: 0 : 0x0
128 bank: 05, block: 0 : 0xc0002053
128 bank: 05, block: 1 : 0x0
128 bank: 06, block: 0 : 0xc0002063
128 bank: 06, block: 1 : 0x0
128 bank: 07, block: 0 : 0xc0002073
128 bank: 07, block: 1 : 0x0
128 bank: 08, block: 0 : 0xc0002083
128 bank: 08, block: 1 : 0x0
128 bank: 09, block: 0 : 0xc0002093
128 bank: 09, block: 1 : 0x0
128 bank: 10, block: 0 : 0xc00020a3
128 bank: 10, block: 1 : 0x0
128 bank: 11, block: 0 : 0xc00020b3
128 bank: 11, block: 1 : 0x0
128 bank: 12, block: 0 : 0xc00020c3
128 bank: 12, block: 1 : 0x0
128 bank: 13, block: 0 : 0xc00020d3
128 bank: 13, block: 1 : 0x0
128 bank: 14, block: 0 : 0xc00020e3
128 bank: 14, block: 1 : 0x0
128 bank: 15, block: 0 : 0xc00020f3
128 bank: 15, block: 1 : 0x0
128 bank: 16, block: 0 : 0xc0002103
128 bank: 16, block: 1 : 0x0
128 bank: 17, block: 0 : 0xc0002113
128 bank: 17, block: 1 : 0x0
128 bank: 18, block: 0 : 0xc0002123
128 bank: 18, block: 1 : 0x0
128 bank: 19, block: 0 : 0xc0002133
128 bank: 19, block: 1 : 0x0
128 bank: 20, block: 0 : 0xc0002143
128 bank: 20, block: 1 : 0x0
128 bank: 21, block: 0 : 0xc0002153
128 bank: 21, block: 1 : 0x0
128 bank: 22, block: 0 : 0xc0002163
128 bank: 22, block: 1 : 0x0

so you have 128 times the same address on a 128-core Zen box.

If that is correct, then we can use this nice simplification and cache
them globally and not per-CPU. Seems to work here. Scream if something's
amiss.

Thx.

---
From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00 2001
From: Borislav Petkov 
Date: Thu, 17 May 2018 10:46:26 +0200
Subject: [PATCH] array caching

Not-Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++--
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..c8e038800591 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
[SMCA_SMU]  = { "smu",  "System Management Unit" },
 };
 
+static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
+{
+   [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
+};
+
 const char *smca_get_name(enum smca_bank_types t)
 {
if (t >= N_SMCA_BANK_TYPES)
@@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
 
+   /* Check our cache first: */
+   if (smca_bank_addrs[bank][block] != -1)
+   return smca_bank_addrs[bank][block];
+
/*
 * For SMCA enabled processors, BLKPTR field of the first MISC register
 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
 */
	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-   return addr;
+   goto out;
 
if (!(low & MCI_CONFIG_MCAX))
-   return addr;
+   goto out;
 
	if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
(low & MASK_BLKPTR_LO))
-   return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+   addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 
+out:
+   smca_bank_addrs[bank][block] = addr;
return addr;
 }
 
@@ -468,18 +479,6 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi
if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
return addr;
 
-   /* Get address from already initialized block. */
-   if (per_cpu(threshold_banks, cpu)) {
-		struct threshold_bank *bankp = per_cpu(threshold_banks, cpu)[bank];
-
-   if (bankp && bankp->blocks) {
-   struct threshold_block *blockp = &bankp->blocks[block];
-
-   if (blockp)
-   return blockp

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-16 Thread Johannes Hirte
On 2018 May 17, Borislav Petkov wrote:
> On Tue, May 15, 2018 at 11:39:54AM +0200, Johannes Hirte wrote:
> > The out-of-bound access happens in get_block_address:
> > 
> > if (bankp && bankp->blocks) {
> > struct threshold_block *blockp = &bankp->blocks[block];
> > 
> > with block=1. This doesn't exist. I can't even find any array here.
> > There is a linked list, created in allocate_threshold_blocks. On my
> > system I get 17 lists with one element each.
> 
> Yes, what a mess this is. ;-\
> 
> There's no such thing as a ->blocks[block] array. We simply assign the
> threshold_block to it in allocate_threshold_blocks:
> 
>   per_cpu(threshold_banks, cpu)[bank]->blocks = b;
> 
> And I can't say the design of this thing is really friendly but it is
> still no excuse that I missed that during review. Grrr.
> 
> So, Yazen, what really needs to happen here is to iterate the
> bank->blocks->miscj list to find the block you're looking for and return
> its address, the opposite to this here:
> 
> if (per_cpu(threshold_banks, cpu)[bank]->blocks) {
> list_add(&b->miscj,
>  &per_cpu(threshold_banks, cpu)[bank]->blocks->miscj);
> } else {
> per_cpu(threshold_banks, cpu)[bank]->blocks = b;
> }
> 
> and don't forget to look at ->blocks itself.
> 
> And then you need to make sure that searching for block addresses still
> works when resuming from suspend so that you can avoid the RDMSR IPIs.
> 

Maybe I'm missing something, but those RDMSR IPIs don't happen on
pre-SMCA systems, right? So the caching should be avoided here, because
the whole lookup looks more expensive to me than the simple switch-block
in get_block_address.

-- 
Regards,
  Johannes



Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-16 Thread Borislav Petkov
On Tue, May 15, 2018 at 11:39:54AM +0200, Johannes Hirte wrote:
> The out-of-bound access happens in get_block_address:
> 
>   if (bankp && bankp->blocks) {
>   struct threshold_block *blockp = &bankp->blocks[block];
> 
with block=1. This doesn't exist. I can't even find any array here.
> There is a linked list, created in allocate_threshold_blocks. On my
> system I get 17 lists with one element each.

Yes, what a mess this is. ;-\

There's no such thing as a ->blocks[block] array. We simply assign the
threshold_block to it in allocate_threshold_blocks:

per_cpu(threshold_banks, cpu)[bank]->blocks = b;

And I can't say the design of this thing is really friendly but it is
still no excuse that I missed that during review. Grrr.

So, Yazen, what really needs to happen here is to iterate the
bank->blocks->miscj list to find the block you're looking for and return
its address, the opposite to this here:

if (per_cpu(threshold_banks, cpu)[bank]->blocks) {
list_add(&b->miscj,
 &per_cpu(threshold_banks, cpu)[bank]->blocks->miscj);
} else {
per_cpu(threshold_banks, cpu)[bank]->blocks = b;
}

and don't forget to look at ->blocks itself.

And then you need to make sure that searching for block addresses still
works when resuming from suspend so that you can avoid the RDMSR IPIs.

Ok?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-05-15 Thread Johannes Hirte
On 2018 Apr 17, Ghannam, Yazen wrote:
> > -Original Message-
> > From: linux-edac-ow...@vger.kernel.org On Behalf Of Johannes Hirte
> > Sent: Monday, April 16, 2018 7:56 AM
> > To: Ghannam, Yazen 
> > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de;
> > tony.l...@intel.com; x...@kernel.org
> > Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> > block
> > 
> > On 2018 Apr 14, Johannes Hirte wrote:
> > > On 2018 Feb 01, Yazen Ghannam wrote:
> > > > From: Yazen Ghannam 
> > > >
> > > > The block address is saved after the block is initialized when
> > > > threshold_init_device() is called.
> > > >
> > > > Use the saved block address, if available, rather than trying to
> > > > rediscover it.
> > > >
> > > > We can avoid some *on_cpu() calls in the init path that will cause a
> > > > call trace when resuming from suspend.
> > > >
> > > > Cc:  # 4.14.x
> > > > Signed-off-by: Yazen Ghannam 
> > > > ---
> > > >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
> > > >  1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > index bf53b4549a17..8c4f8f30c779 100644
> > > > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > > @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu,
> > u32 current_addr, u32 low, u32 hi
> > > >  {
> > > > u32 addr = 0, offset = 0;
> > > >
> > > > +   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> > > > +   return addr;
> > > > +
> > > > +   /* Get address from already initialized block. */
> > > > +   if (per_cpu(threshold_banks, cpu)) {
> > > > +   struct threshold_bank *bankp = per_cpu(threshold_banks,
> > cpu)[bank];
> > > > +
> > > > +   if (bankp && bankp->blocks) {
> > > > +   struct threshold_block *blockp = &bankp->blocks[block];
> > > > +
> > > > +   if (blockp)
> > > > +   return blockp->address;
> > > > +   }
> > > > +   }
> > > > +
> > > > if (mce_flags.smca) {
> > > > if (smca_get_bank_type(bank) == SMCA_RESERVED)
> > > > return addr;
> > > > --
> > > > 2.14.1
> > >
> > > I have a KASAN: slab-out-of-bounds, and git bisect points me to this
> > > change:
> > >
> > > Apr 13 00:40:32 probook kernel:
> > 
> > ==
> > > Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in
> > get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by
> > task swapper/0/1
> > > Apr 13 00:40:32 probook kernel:
> > > Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not
> > tainted 4.16.0-10757-g4ca8ba4ccff9 #532
> > > Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645
> > G2/80FE, BIOS N77 Ver. 01.12 12/19/2017
> > > Apr 13 00:40:32 probook kernel: Call Trace:
> > > Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
> > > Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
> > > Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
> > > Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
> > > Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> > > Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> > > Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
> > > Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
> > > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
> > > 

RE: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-04-17 Thread Ghannam, Yazen
> -Original Message-
> From: linux-edac-ow...@vger.kernel.org On Behalf Of Johannes Hirte
> Sent: Monday, April 16, 2018 7:56 AM
> To: Ghannam, Yazen 
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de;
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
> 
> On 2018 Apr 14, Johannes Hirte wrote:
> > On 2018 Feb 01, Yazen Ghannam wrote:
> > > From: Yazen Ghannam 
> > >
> > > The block address is saved after the block is initialized when
> > > threshold_init_device() is called.
> > >
> > > Use the saved block address, if available, rather than trying to
> > > rediscover it.
> > >
> > > We can avoid some *on_cpu() calls in the init path that will cause a
> > > call trace when resuming from suspend.
> > >
> > > Cc:  # 4.14.x
> > > Signed-off-by: Yazen Ghannam 
> > > ---
> > >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
> > >  1 file changed, 15 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > index bf53b4549a17..8c4f8f30c779 100644
> > > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > > @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu,
> u32 current_addr, u32 low, u32 hi
> > >  {
> > >   u32 addr = 0, offset = 0;
> > >
> > > + if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> > > + return addr;
> > > +
> > > + /* Get address from already initialized block. */
> > > + if (per_cpu(threshold_banks, cpu)) {
> > > + struct threshold_bank *bankp = per_cpu(threshold_banks,
> cpu)[bank];
> > > +
> > > + if (bankp && bankp->blocks) {
> > > + struct threshold_block *blockp = &bankp->blocks[block];
> > > +
> > > + if (blockp)
> > > + return blockp->address;
> > > + }
> > > + }
> > > +
> > >   if (mce_flags.smca) {
> > >   if (smca_get_bank_type(bank) == SMCA_RESERVED)
> > >   return addr;
> > > --
> > > 2.14.1
> >
> > I have a KASAN: slab-out-of-bounds, and git bisect points me to this
> > change:
> >
> > Apr 13 00:40:32 probook kernel:
> 
> ==
> > Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in
> get_block_address.isra.3+0x1e9/0x520
> > Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by
> task swapper/0/1
> > Apr 13 00:40:32 probook kernel:
> > Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not
> tainted 4.16.0-10757-g4ca8ba4ccff9 #532
> > Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645
> G2/80FE, BIOS N77 Ver. 01.12 12/19/2017
> > Apr 13 00:40:32 probook kernel: Call Trace:
> > Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
> > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
> > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> > Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
> > Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
> > Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
> > Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
> > Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> > Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> > Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
> > Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
> > Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
> > Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> > Apr 13 00:40:32 probook kernel:
> mce_threshold_create_device+0x35b/0x990
> > Apr 13 00:40:32 probook kernel:  ? init_special_inode+0x1d0/0x230
> > Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
> > Apr 13 00:40:32 probook kernel:  ?
> mcheck_vendor_init_severity+0x43/0x43
> > Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
> > Apr 13 00:40:32 probook kernel:  ?
> trace_event_raw_event_initcall_finish+0x190/0x190
> > Ap

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-04-16 Thread Johannes Hirte
On 2018 Apr 14, Johannes Hirte wrote:
> On 2018 Feb 01, Yazen Ghannam wrote:
> > From: Yazen Ghannam 
> > 
> > The block address is saved after the block is initialized when
> > threshold_init_device() is called.
> > 
> > Use the saved block address, if available, rather than trying to
> > rediscover it.
> > 
> > We can avoid some *on_cpu() calls in the init path that will cause a
> > call trace when resuming from suspend.
> > 
> > Cc:  # 4.14.x
> > Signed-off-by: Yazen Ghannam 
> > ---
> >  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
> > b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > index bf53b4549a17..8c4f8f30c779 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu, u32 
> > current_addr, u32 low, u32 hi
> >  {
> > u32 addr = 0, offset = 0;
> >  
> > +   if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> > +   return addr;
> > +
> > +   /* Get address from already initialized block. */
> > +   if (per_cpu(threshold_banks, cpu)) {
> > +   struct threshold_bank *bankp = per_cpu(threshold_banks, 
> > cpu)[bank];
> > +
> > +   if (bankp && bankp->blocks) {
> > +   struct threshold_block *blockp = &bankp->blocks[block];
> > +
> > +   if (blockp)
> > +   return blockp->address;
> > +   }
> > +   }
> > +
> > if (mce_flags.smca) {
> > if (smca_get_bank_type(bank) == SMCA_RESERVED)
> > return addr;
> > -- 
> > 2.14.1
> 
> I have a KASAN: slab-out-of-bounds, and git bisect points me to this
> change:
> 
> Apr 13 00:40:32 probook kernel: 
> ==
> Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in 
> get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by 
> task swapper/0/1
> Apr 13 00:40:32 probook kernel: 
> Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> 4.16.0-10757-g4ca8ba4ccff9 #532
> Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, 
> BIOS N77 Ver. 01.12 12/19/2017
> Apr 13 00:40:32 probook kernel: Call Trace:
> Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
> Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
> Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
> Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
> Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
> Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
> Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
> Apr 13 00:40:32 probook kernel:  ? init_special_inode+0x1d0/0x230
> Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
> Apr 13 00:40:32 probook kernel:  ? mcheck_vendor_init_severity+0x43/0x43
> Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
> Apr 13 00:40:32 probook kernel:  ? 
> trace_event_raw_event_initcall_finish+0x190/0x190
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0xb/0x40
> Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
> Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
> Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
> Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
> Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
> Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
> Apr 13 00:40:32 probook kernel:
> Apr 13 00:40:32 probook kernel: Allocated by task 1:
> Apr 13 00:40:32 probook kernel:  kasan_kmalloc+0xa0/0xd0
> Apr 13 00:40:32 probook kernel:  kmem_cache_alloc_trace+0xf3/0x1f0
> Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x1bc/0xc60
> Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
> Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
> Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
> Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
> Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
> Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
> Apr 13 00:40:32 probook kernel: 
> Apr 13 00:40:32 probook kernel: Freed by task 0:
> Apr 13 00:40:32 probook ker

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-04-13 Thread Johannes Hirte
On 2018 Feb 01, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> The block address is saved after the block is initialized when
> threshold_init_device() is called.
> 
> Use the saved block address, if available, rather than trying to
> rediscover it.
> 
> We can avoid some *on_cpu() calls in the init path that will cause a
> call trace when resuming from suspend.
> 
> Cc:  # 4.14.x
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c 
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index bf53b4549a17..8c4f8f30c779 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -436,6 +436,21 @@ static u32 get_block_address(unsigned int cpu, u32 
> current_addr, u32 low, u32 hi
>  {
>   u32 addr = 0, offset = 0;
>  
> + if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
> + return addr;
> +
> + /* Get address from already initialized block. */
> + if (per_cpu(threshold_banks, cpu)) {
> + struct threshold_bank *bankp = per_cpu(threshold_banks, 
> cpu)[bank];
> +
> + if (bankp && bankp->blocks) {
> + struct threshold_block *blockp = &bankp->blocks[block];
> +
> + if (blockp)
> + return blockp->address;
> + }
> + }
> +
>   if (mce_flags.smca) {
>   if (smca_get_bank_type(bank) == SMCA_RESERVED)
>   return addr;
> -- 
> 2.14.1

I have a KASAN: slab-out-of-bounds, and git bisect points me to this
change:

Apr 13 00:40:32 probook kernel: 
==
Apr 13 00:40:32 probook kernel: BUG: KASAN: slab-out-of-bounds in 
get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel: Read of size 4 at addr 8803f165ddf4 by task 
swapper/0/1
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.16.0-10757-g4ca8ba4ccff9 #532
Apr 13 00:40:32 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS 
N77 Ver. 01.12 12/19/2017
Apr 13 00:40:32 probook kernel: Call Trace:
Apr 13 00:40:32 probook kernel:  dump_stack+0x5b/0x8b
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  print_address_description+0x65/0x270
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  kasan_report+0x232/0x350
Apr 13 00:40:32 probook kernel:  get_block_address.isra.3+0x1e9/0x520
Apr 13 00:40:32 probook kernel:  ? kobject_init_and_add+0xde/0x130
Apr 13 00:40:32 probook kernel:  ? get_name+0x390/0x390
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x12c/0xc60
Apr 13 00:40:32 probook kernel:  ? kobject_add_internal+0x800/0x800
Apr 13 00:40:32 probook kernel:  ? get_block_address.isra.3+0x520/0x520
Apr 13 00:40:32 probook kernel:  ? kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
Apr 13 00:40:32 probook kernel:  ? init_special_inode+0x1d0/0x230
Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
Apr 13 00:40:32 probook kernel:  ? mcheck_vendor_init_severity+0x43/0x43
Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
Apr 13 00:40:32 probook kernel:  ? 
trace_event_raw_event_initcall_finish+0x190/0x190
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0xb/0x40
Apr 13 00:40:32 probook kernel:  ? kasan_unpoison_shadow+0x30/0x40
Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
Apr 13 00:40:32 probook kernel:  ? rest_init+0xf0/0xf0
Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
Apr 13 00:40:32 probook kernel:
Apr 13 00:40:32 probook kernel: Allocated by task 1:
Apr 13 00:40:32 probook kernel:  kasan_kmalloc+0xa0/0xd0
Apr 13 00:40:32 probook kernel:  kmem_cache_alloc_trace+0xf3/0x1f0
Apr 13 00:40:32 probook kernel:  allocate_threshold_blocks+0x1bc/0xc60
Apr 13 00:40:32 probook kernel:  mce_threshold_create_device+0x35b/0x990
Apr 13 00:40:32 probook kernel:  threshold_init_device+0x98/0xa7
Apr 13 00:40:32 probook kernel:  do_one_initcall+0x76/0x30c
Apr 13 00:40:32 probook kernel:  kernel_init_freeable+0x3d6/0x471
Apr 13 00:40:32 probook kernel:  kernel_init+0xa/0x120
Apr 13 00:40:32 probook kernel:  ret_from_fork+0x22/0x40
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: Freed by task 0:
Apr 13 00:40:32 probook kernel: (stack is not available)
Apr 13 00:40:32 probook kernel: 
Apr 13 00:40:32 probook kernel: The buggy address belongs to the object at 
8803f165dd80
 which belongs to the cache kmalloc-128 of size 128
Apr 13 00:40:3

Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block

2018-02-08 Thread Borislav Petkov
On Thu, Feb 01, 2018 at 12:48:13PM -0600, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> The block address is saved after the block is initialized when
> threshold_init_device() is called.
> 
> Use the saved block address, if available, rather than trying to
> rediscover it.
> 
> We can avoid some *on_cpu() calls in the init path that will cause a
> call trace when resuming from suspend.

Ditto.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.