Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-11 Thread Havard Skinnemoen
On Fri, Jul 11, 2014 at 1:22 PM, Borislav Petkov wrote: > On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote: >> > * max number of CMCIs per second a system can sustain fine, i.e. the 100 >> > above >> >> What's the definition of "fine"

Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

2014-07-11 Thread Havard Skinnemoen
On Fri, Jul 11, 2014 at 12:52 PM, Borislav Petkov wrote: > On Fri, Jul 11, 2014 at 12:06:40PM -0700, Tony Luck wrote: >> > + if (atomic_add_unless(&mce_banks[i].poll_reader, 1, 1)) { >> > + m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i)); >> >> Same as yesterday.

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-11 Thread Havard Skinnemoen
On Fri, Jul 11, 2014 at 1:36 PM, Borislav Petkov wrote: > On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote: >> > Basically the scheme becomes the following: >> > >> > * We switch to polling if we detect a second CMCI under an interval X >> >

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-11 Thread Havard Skinnemoen
On Fri, Jul 11, 2014 at 1:10 PM, Borislav Petkov wrote: > I'm going to reply with multiple mails so that we can keep the things > separate and not let replies grow out of proportion. > > On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote: >> So a short burst

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-11 Thread Havard Skinnemoen
On Fri, Jul 11, 2014 at 8:35 AM, Borislav Petkov wrote: > So, with roughly few hundred CMCIs per second, we can be generous and > say we can handle 100 CMCIs per second just fine. Which would mean, if > the CMCI handler takes 10ms, with 100 CMCIs per second, we spend the > whole time handling CMCI

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 11:55 AM, Tony Luck wrote: > On Thu, Jul 10, 2014 at 10:51 AM, Havard Skinnemoen > wrote: >> What's the typical interrupt rate during a storm? We should make it >> significantly less frequent than that, otherwise there's no point >> swit

Re: [PATCH 2/6] x86-mce: Modify CMCI storm exit to reenable instead of rediscover banks.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 8:51 AM, Borislav Petkov wrote: > > On Wed, Jul 09, 2014 at 02:34:39PM -0700, Havard Skinnemoen wrote: > > On Wed, Jul 9, 2014 at 1:20 PM, Luck, Tony wrote: > > >> The CMCI storm handler previously called cmci_reenable() when exiting a > >

Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 9:41 AM, Borislav Petkov wrote: > On Wed, Jul 09, 2014 at 10:09:24AM -0700, Havard Skinnemoen wrote: >> @@ -617,14 +620,28 @@ void machine_check_poll(enum mcp_flags flags, >> mce_banks_t *b) >> >> this_cp

Re: [PATCH 3/6] x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 9:24 AM, Borislav Petkov wrote: > We don't need all that atomicity special fun if we register the reboot > notifier on the BSP, say from mcheck_init() which is done even pre-SMP. > > If that's too early, we can add an initcall or whatever... OK, will see if I can find a be

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 4:42 AM, Borislav Petkov wrote: > + linux-edac. > > On Wed, Jul 09, 2014 at 02:24:31PM -0700, Havard Skinnemoen wrote: >> > > The CMCI poll interval was updated to pick the minimum interval between >> > > the original 30 seconds and the che

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-10 Thread Havard Skinnemoen
On Thu, Jul 10, 2014 at 2:01 AM, Chen, Gong wrote: > On Wed, Jul 09, 2014 at 02:24:31PM -0700, Havard Skinnemoen wrote: >> On Wed, Jul 9, 2014 at 12:17 PM, Borislav Petkov wrote: >> > Why min 3 polls? How do you come up with exactly that frequency? >> >> The idea

Re: [PATCH 6/6] x86-mce: ensure the MCP timer is not already set in the mce_timer_fn.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 2:04 PM, Luck, Tony wrote: > + /* Ensure a CMCI interrupt can't preempt this. */ > + local_irq_save(flags); > if (mce_available(__this_cpu_ptr(&cpu_info))) { > machine_check_poll(MCP_TIMESTAMP, > &__get_cpu_

Re: [PATCH 5/6] x86-mce: check if no_way_out applies before deciding not to clear MCE banks.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 2:00 PM, Luck, Tony wrote: > + if (!(no_way_out && cfg->tolerant < 3)) > mce_clear_state(toclear); > > Style - I think this is easier to grok: > > if (!no_way_out || cfg->tolerant >=3) > mce_clear_state(toclear); > > but not too

Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 1:47 PM, Luck, Tony wrote: > if (!(flags & MCP_UC) && > - (m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC))) > + (m.status & (mca_cfg.ser ? MCI_STATUS_S : > MCI_STATUS_UC))) { > + spin_unlock_

Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 1:35 PM, Andi Kleen wrote: > Havard Skinnemoen writes: > >> machine_check_poll() was modified to use spin_lock_irqsave independently >> per bank when a valid MCE is found to prevent duplicated MCE reports by >> the CMCI and polling methods. In the

Re: [PATCH 3/6] x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 1:36 PM, Luck, Tony wrote: > + if (!xchg(&reboot_notifier_registered, true)) > + register_reboot_notifier(&cmci_reboot_notifier); > > This is super-safe ... but isn't the xchg() overkill? I thought we serialized > bringup > of other cpus. Could be. Ther

Re: [PATCH 2/6] x86-mce: Modify CMCI storm exit to reenable instead of rediscover banks.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 1:20 PM, Luck, Tony wrote: >> The CMCI storm handler previously called cmci_reenable() when exiting a >> CMCI storm. However, when entering a CMCI storm the bank ownership was >> not relinquished by the affected CPUs. The CMCIs were only disabled via >> cmci_storm_disable_ba

Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-09 Thread Havard Skinnemoen
On Wed, Jul 9, 2014 at 12:17 PM, Borislav Petkov wrote: > > On Wed, Jul 09, 2014 at 10:09:21AM -0700, Havard Skinnemoen wrote: > > From: Ewout van Bekkum > > > > The CMCI poll interval was updated to pick the minimum interval between > > the original 30 seconds and

[PATCH 2/6] x86-mce: Modify CMCI storm exit to reenable instead of rediscover banks.

2014-07-09 Thread Havard Skinnemoen
instead call a new function, cmci_storm_enable_banks(), to reenable CMCI on the already owned banks instead of rediscovering CMCI banks (which were still owned but disabled). Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel/cpu/mcheck/mce_intel.c | 50

[PATCH 6/6] x86-mce: ensure the MCP timer is not already set in the mce_timer_fn.

2014-07-09 Thread Havard Skinnemoen
condition was resolved by disabling interrupts during the mce_timer_fn() function and by verifying the timer isn't already set before starting the timer. Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel/cpu/mcheck/mce.c | 11 +-- 1 file chang

[PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

2014-07-09 Thread Havard Skinnemoen
leaving the CMCI_STORM_ACTIVE state. Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel/cpu/mcheck/mce-internal.h | 1 + arch/x86/kernel/cpu/mcheck/mce.c | 5 + arch/x86/kernel/cpu/mcheck/mce_intel.c| 15 +++ 3 files changed, 17

[PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.

2014-07-09 Thread Havard Skinnemoen
. The status is reread after the lock is acquired in case the MCE was already handled by a different thread. A unique spinlock is used per bank number, so contention should be mostly limited to non-shared banks. Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel

[PATCH 5/6] x86-mce: check if no_way_out applies before deciding not to clear MCE banks.

2014-07-09 Thread Havard Skinnemoen
. The sanity check was updated to check if the system has no_way_out and that no_way_out is relevant (tolerant level is less than 3). Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel/cpu/mcheck/mce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

[PATCH 0/6] x86 mce fixes

2014-07-09 Thread Havard Skinnemoen
hat we got wrong. Ewout did all the leg work in getting this implemented and tested, while I've been providing advice and reviews. Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen Ewout van Bekkum (6): x86-mce: Modify CMCI poll interval to adjust for small check_interval

[PATCH 3/6] x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot.

2014-07-09 Thread Havard Skinnemoen
claimed. Signed-off-by: Ewout van Bekkum Signed-off-by: Havard Skinnemoen --- arch/x86/kernel/cpu/mcheck/mce_intel.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c index d015daf

Re: [PATCH 04/10] net/macb: Fix a race in macb_start_xmit()

2012-09-06 Thread Havard Skinnemoen
On Wed, Sep 5, 2012 at 11:30 PM, David Miller wrote: > From: Nicolas Ferre > Date: Wed, 5 Sep 2012 10:19:11 +0200 > >> From: Havard Skinnemoen >> >> Fix a race in macb_start_xmit() where we unconditionally set the TSTART bit. >> If an underrun just happened (we

Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x

2012-08-21 Thread Havard Skinnemoen
On Tue, Aug 21, 2012 at 1:31 AM, Arnd Bergmann wrote: > On Tuesday 21 August 2012, Viresh Kumar wrote: >> > Is AVR32 a big-endian system? Probably big-endian, that's why values are >> > > getting >> > > swapped. And dw_dmac is the standard one, can call it little endian for >> > the >> > > time be