On Fri, Jul 11, 2014 at 1:22 PM, Borislav Petkov wrote:
> On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote:
>> > * max number of CMCIs per second a system can sustain fine, i.e. the 100
>> > above
>>
>> What's the definition of "fine"
On Fri, Jul 11, 2014 at 12:52 PM, Borislav Petkov wrote:
> On Fri, Jul 11, 2014 at 12:06:40PM -0700, Tony Luck wrote:
>> > + if (atomic_add_unless(&mce_banks[i].poll_reader, 1, 1)) {
>> > + m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
>>
>> Same as yesterday.
On Fri, Jul 11, 2014 at 1:36 PM, Borislav Petkov wrote:
> On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote:
>> > Basically the scheme becomes the following:
>> >
>> > * We switch to polling if we detect a second CMCI under an interval X
>> >
On Fri, Jul 11, 2014 at 1:10 PM, Borislav Petkov wrote:
> I'm going to reply with multiple mails so that we can keep the things
> separate and not let replies grow out of proportion.
>
> On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote:
>> So a short burst
On Fri, Jul 11, 2014 at 8:35 AM, Borislav Petkov wrote:
> So, with roughly few hundred CMCIs per second, we can be generous and
> say we can handle 100 CMCIs per second just fine. Which would mean, if
> the CMCI handler takes 10ms, with 100 CMCIs per second, we spend the
> whole time handling CMCI
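The arithmetic in the quoted estimate can be checked with a small sketch. The helper name handler_load_percent() is hypothetical and not part of the patch series; the 100 CMCIs per second and 10 ms figures come from the mail above.

```c
/*
 * Hypothetical helper, not kernel code: percentage of each second spent
 * in the CMCI handler, given an interrupt rate and a per-interrupt cost.
 */
static unsigned int handler_load_percent(unsigned int cmcis_per_sec,
					 unsigned int handler_ms)
{
	/* Milliseconds of handler time consumed in every 1000 ms window. */
	unsigned int busy_ms = cmcis_per_sec * handler_ms;

	/* 1000 ms corresponds to 100 percent, so divide by 10. */
	return busy_ms / 10;
}
```

With 100 CMCIs per second and a 10 ms handler this yields 100, matching the "whole time" remark in the quoted text.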
On Thu, Jul 10, 2014 at 11:55 AM, Tony Luck wrote:
> On Thu, Jul 10, 2014 at 10:51 AM, Havard Skinnemoen wrote:
>> What's the typical interrupt rate during a storm? We should make it
>> significantly less frequent than that, otherwise there's no point
>> swit
On Thu, Jul 10, 2014 at 8:51 AM, Borislav Petkov wrote:
>
> On Wed, Jul 09, 2014 at 02:34:39PM -0700, Havard Skinnemoen wrote:
> > On Wed, Jul 9, 2014 at 1:20 PM, Luck, Tony wrote:
> > >> The CMCI storm handler previously called cmci_reenable() when exiting a
> >
On Thu, Jul 10, 2014 at 9:41 AM, Borislav Petkov wrote:
> On Wed, Jul 09, 2014 at 10:09:24AM -0700, Havard Skinnemoen wrote:
>> @@ -617,14 +620,28 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>>
>> this_cp
On Thu, Jul 10, 2014 at 9:24 AM, Borislav Petkov wrote:
> We don't need all that atomicity special fun if we register the reboot
> notifier on the BSP, say from mcheck_init() which is done even pre-SMP.
>
> If that's too early, we can add an initcall or whatever...
OK, will see if I can find a be
On Thu, Jul 10, 2014 at 4:42 AM, Borislav Petkov wrote:
> + linux-edac.
>
> On Wed, Jul 09, 2014 at 02:24:31PM -0700, Havard Skinnemoen wrote:
>> > > The CMCI poll interval was updated to pick the minimum interval between
>> > > the original 30 seconds and the che
On Thu, Jul 10, 2014 at 2:01 AM, Chen, Gong wrote:
> On Wed, Jul 09, 2014 at 02:24:31PM -0700, Havard Skinnemoen wrote:
>> On Wed, Jul 9, 2014 at 12:17 PM, Borislav Petkov wrote:
>> > Why min 3 polls? How do you come up with exactly that frequency?
>>
>> The idea
On Wed, Jul 9, 2014 at 2:04 PM, Luck, Tony wrote:
> + /* Ensure a CMCI interrupt can't preempt this. */
> + local_irq_save(flags);
> if (mce_available(__this_cpu_ptr(&cpu_info))) {
> machine_check_poll(MCP_TIMESTAMP,
> &__get_cpu_
On Wed, Jul 9, 2014 at 2:00 PM, Luck, Tony wrote:
> + if (!(no_way_out && cfg->tolerant < 3))
> mce_clear_state(toclear);
>
> Style - I think this is easier to grok:
>
> if (!no_way_out || cfg->tolerant >= 3)
> mce_clear_state(toclear);
>
> but not too
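The two spellings of the condition discussed above are equivalent by De Morgan's law. A minimal userspace sketch, assuming tolerant only takes the values 0 through 3; the function names are hypothetical and exist only for this check:

```c
#include <stdbool.h>

/* Condition as written in the quoted patch hunk. */
static bool clear_state_patch(bool no_way_out, int tolerant)
{
	return !(no_way_out && tolerant < 3);
}

/* The spelling suggested in the review. */
static bool clear_state_suggested(bool no_way_out, int tolerant)
{
	return !no_way_out || tolerant >= 3;
}

/* Returns true if both forms agree for every input in the assumed range. */
static bool forms_agree(void)
{
	for (int nwo = 0; nwo <= 1; nwo++)
		for (int tolerant = 0; tolerant <= 3; tolerant++)
			if (clear_state_patch(nwo, tolerant) !=
			    clear_state_suggested(nwo, tolerant))
				return false;
	return true;
}
```

Both forms skip mce_clear_state() only when no_way_out is set and tolerant is below 3.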
On Wed, Jul 9, 2014 at 1:47 PM, Luck, Tony wrote:
> if (!(flags & MCP_UC) &&
> - (m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC)))
> + (m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC))) {
> + spin_unlock_
On Wed, Jul 9, 2014 at 1:35 PM, Andi Kleen wrote:
> Havard Skinnemoen writes:
>
>> machine_check_poll() was modified to use spin_lock_irqsave independently
>> per bank when a valid MCE is found to prevent duplicated MCE reports by
>> the CMCI and polling methods. In the
On Wed, Jul 9, 2014 at 1:36 PM, Luck, Tony wrote:
> + if (!xchg(&reboot_notifier_registered, true))
> + register_reboot_notifier(&cmci_reboot_notifier);
>
> This is super-safe ... but isn't the xchg() overkill? I thought we serialized
> bringup of other cpus.
Could be. Ther
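For readers unfamiliar with the idiom in the quoted hunk, here is a userspace sketch of "register exactly once" using C11 atomic_exchange() in place of the kernel's xchg(). The stub and the call counter are hypothetical stand-ins, not the real register_reboot_notifier().

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool notifier_registered;	/* false until the first caller wins */
static int register_calls;		/* how many times the stub actually ran */

/* Hypothetical stand-in for register_reboot_notifier(). */
static void register_notifier_stub(void)
{
	register_calls++;
}

static void register_once(void)
{
	/*
	 * atomic_exchange() stores true and returns the previous value,
	 * so only the first caller observes false and registers.
	 */
	if (!atomic_exchange(&notifier_registered, true))
		register_notifier_stub();
}
```

If CPU bringup is in fact serialized, as the review suggests, a plain flag or a one-time registration from an init path would serve equally well.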
On Wed, Jul 9, 2014 at 1:20 PM, Luck, Tony wrote:
>> The CMCI storm handler previously called cmci_reenable() when exiting a
>> CMCI storm. However, when entering a CMCI storm the bank ownership was
>> not relinquished by the affected CPUs. The CMCIs were only disabled via
>> cmci_storm_disable_ba
On Wed, Jul 9, 2014 at 12:17 PM, Borislav Petkov wrote:
>
> On Wed, Jul 09, 2014 at 10:09:21AM -0700, Havard Skinnemoen wrote:
> > From: Ewout van Bekkum
> >
> > The CMCI poll interval was updated to pick the minimum interval between
> > the original 30 seconds and
instead call a
new function, cmci_storm_enable_banks(), to reenable CMCI on the already
owned banks instead of rediscovering CMCI banks (which were still owned
but disabled).
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 50
condition
was resolved by disabling interrupts during the mce_timer_fn() function
and by verifying the timer isn't already set before starting the timer.
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel/cpu/mcheck/mce.c | 11 +--
1 file chang
leaving
the CMCI_STORM_ACTIVE state.
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel/cpu/mcheck/mce-internal.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 5 +
arch/x86/kernel/cpu/mcheck/mce_intel.c | 15 +++
3 files changed, 17
. The status is
reread after the lock is acquired in case the MCE was already handled by
a different thread. A unique spinlock is used per bank number, so
contention should be mostly limited to non-shared banks.
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel
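The locking scheme in this commit message (one lock per bank, status reread once the lock is held) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch itself: NUM_BANKS, STATUS_VALID, struct bank and poll_bank() are hypothetical names, and a C11 atomic_flag spin loop stands in for spin_lock_irqsave().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS	4		/* hypothetical bank count */
#define STATUS_VALID	(1ULL << 63)	/* assumed "valid" bit for this sketch */

struct bank {
	atomic_flag lock;		/* one lock per bank number */
	_Atomic uint64_t status;	/* stand-in for the bank status register */
};

static struct bank banks[NUM_BANKS];

static void banks_init(void)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		atomic_flag_clear(&banks[i].lock);
		atomic_store(&banks[i].status, 0);
	}
}

/* Returns true if this caller was the one to report the event in bank i. */
static bool poll_bank(int i)
{
	bool reported = false;

	/* Cheap check without the lock: nothing to do if no valid event. */
	if (!(atomic_load(&banks[i].status) & STATUS_VALID))
		return false;

	while (atomic_flag_test_and_set(&banks[i].lock))
		;	/* spin until the per-bank lock is free */

	/* Reread under the lock: another thread may have handled it already. */
	if (atomic_load(&banks[i].status) & STATUS_VALID) {
		atomic_store(&banks[i].status, 0);	/* "log" and clear */
		reported = true;
	}

	atomic_flag_clear(&banks[i].lock);
	return reported;
}
```

Because the second check happens with the lock held, two pollers racing on the same bank cannot both report the same event.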
. The
sanity check was updated to check if the system has no_way_out and that
no_way_out is relevant (tolerant level is less than 3).
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff
hat we got wrong.
Ewout did all the leg work in getting this implemented and tested, while I've
been providing advice and reviews.
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
Ewout van Bekkum (6):
x86-mce: Modify CMCI poll interval to adjust for small check_interval
claimed.
Signed-off-by: Ewout van Bekkum
Signed-off-by: Havard Skinnemoen
---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 30 ++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index d015daf
On Wed, Sep 5, 2012 at 11:30 PM, David Miller wrote:
> From: Nicolas Ferre
> Date: Wed, 5 Sep 2012 10:19:11 +0200
>
>> From: Havard Skinnemoen
>>
>> Fix a race in macb_start_xmit() where we unconditionally set the TSTART bit.
>> If an underrun just happened (we
On Tue, Aug 21, 2012 at 1:31 AM, Arnd Bergmann wrote:
> On Tuesday 21 August 2012, Viresh Kumar wrote:
>> > Is AVR32 a big-endian system? Probably big-endian, that's why values are
>> > > getting
>> > > swapped. And dw_dmac is the standard one, can call it little endian for
>> > the
>> > > time be