Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Thomas Gleixner
On Tue, 2 Sep 2014, Linus Torvalds wrote:

> On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
>  wrote:
> >
> > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for 
> > this subject with full description.
> > There is also a short fix patch for kernel/hrtimer.c file.
> > Even if this bug occurs rary, however it resolves system hard lockup option.
> 
> The patch is whitespace-damaged, but with a small oneliner like this
> that doesn't much matter (the timer files moved to kernel/time/ during
> this merge window, so the patch wouldn't apply as-is anyway).
> 
> It needs a sign-off (see Documentation/SubmittingPatches), but even
> more importantly it needs to go to the right people for
> double-checking.
> 
> But the patch is more broken than whitespace and even lack of
> sign-off. It cannot even have compiled. I'm assuming "timer_state" was
> intended to be "timer->state". Also, every caller but one already has
> "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in
> "newstate", so I suspect if this patch is the real fix (which I'll
> leave for Thomas to comment more on), afaik the actual problem can
> only happen through migrate_hrtimer_list() which uconditionally sets
> the whole state to HRTIMER_STATE_MIGRATE.
> 
> Thomas? Leaving damaged patch quoted below.

Right. It's been fixed long ago and the migrate path cannot suffer
from this problem because at this point a callback running on the dead
cpu would cause the 

BUG_ON(hrtimer_callback_running(timer));

a few lines above the remove_hrtimer() call to trigger and send the
machine into lala land.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Itzcak Pechtalt
Indeed,  my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in 
this version function remove_hrtimer  doesn't preserve "HRTIMER_STATE_CALLBACK" 
flag and triggers hard lockup.
So this bug was already fixed in version 2.6.35 to set the flag by 
remove_hrtimer function.

Seems that  migrate_hrtimer_list case isn't a problem because it is called when 
old CPU is already dead.

I'll fix the bug report to resolved.

Thanks

Itzcak Pechtalt



-Original Message-
From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds
Sent: Tuesday, September 02, 2014 7:09 PM
To: Itzcak Pechtalt; Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Subject: Re: Race condition in HR timers that cause double insertion and hard 
lockup -- all latest versions

On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt  
wrote:
>
> I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
> subject with full description.
> There is also a short fix patch for kernel/hrtimer.c file.
> Even if this bug occurs rary, however it resolves system hard lockup option.

The patch is whitespace-damaged, but with a small oneliner like this that 
doesn't much matter (the timer files moved to kernel/time/ during this merge 
window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even more 
importantly it needs to go to the right people for double-checking.

But the patch is more broken than whitespace and even lack of sign-off. It 
cannot even have compiled. I'm assuming "timer_state" was intended to be 
"timer->state". Also, every caller but one already has "HRTIMER_STATE_CALLBACK" 
set unconditionally or to the old state in "newstate", so I suspect if this 
patch is the real fix (which I'll leave for Thomas to comment more on), afaik 
the actual problem can only happen through migrate_hrtimer_list() which 
uconditionally sets the whole state to HRTIMER_STATE_MIGRATE.

Thomas? Leaving damaged patch quoted below.

   Linus

> I suspect that it was targeted by mistake to not active list 
> (timers_realtime-cl...@kernel-bugs.osdl.org).
> Following is the fix patch based on kernel 3.16.1 (just simple):
> diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
> +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
> @@ -941,7 +941,7 @@
> if (!timerqueue_getnext(>active))
> base->cpu_base->active_bases &= ~(1 << base->index);
> out:
> - timer->state = newstate;
> + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
> }
>
> /*
>
> Is there a chance for this patch fix to insert into next kernel release?
>
> Thanks
>
> Itzcak Pechtalt
>
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Linus Torvalds
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
 wrote:
>
> I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
> subject with full description.
> There is also a short fix patch for kernel/hrtimer.c file.
> Even if this bug occurs rary, however it resolves system hard lockup option.

The patch is whitespace-damaged, but with a small oneliner like this
that doesn't much matter (the timer files moved to kernel/time/ during
this merge window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even
more importantly it needs to go to the right people for
double-checking.

But the patch is more broken than whitespace and even lack of
sign-off. It cannot even have compiled. I'm assuming "timer_state" was
intended to be "timer->state". Also, every caller but one already has
"HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in
"newstate", so I suspect if this patch is the real fix (which I'll
leave for Thomas to comment more on), afaik the actual problem can
only happen through migrate_hrtimer_list() which uconditionally sets
the whole state to HRTIMER_STATE_MIGRATE.

Thomas? Leaving damaged patch quoted below.

   Linus

> I suspect that it was targeted by mistake to not active list 
> (timers_realtime-cl...@kernel-bugs.osdl.org).
> Following is the fix patch based on kernel 3.16.1 (just simple):
> diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
> +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
> @@ -941,7 +941,7 @@
> if (!timerqueue_getnext(>active))
> base->cpu_base->active_bases &= ~(1 << base->index);
> out:
> - timer->state = newstate;
> + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
> }
>
> /*
>
> Is there a chance for this patch fix to insert into next kernel release?
>
> Thanks
>
> Itzcak Pechtalt
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Itzcak Pechtalt
Hi,

I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
subject with full description.
There is also a short fix patch for kernel/hrtimer.c file.
Even if this bug occurs rary, however it resolves system hard lockup option.

I suspect that it was targeted by mistake to not active list 
(timers_realtime-cl...@kernel-bugs.osdl.org).
Following is the fix patch based on kernel 3.16.1 (just simple):
diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
--- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
+++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
@@ -941,7 +941,7 @@
if (!timerqueue_getnext(>active))
base->cpu_base->active_bases &= ~(1 << base->index);
out:
- timer->state = newstate;
+ timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
}

/*

Is there a chance for this patch fix to insert into next kernel release?

Thanks

Itzcak Pechtalt

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Itzcak Pechtalt
Hi,

I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
subject with full description.
There is also a short fix patch for kernel/hrtimer.c file.
Even if this bug occurs rary, however it resolves system hard lockup option.

I suspect that it was targeted by mistake to not active list 
(timers_realtime-cl...@kernel-bugs.osdl.org).
Following is the fix patch based on kernel 3.16.1 (just simple):
diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
--- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
+++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
@@ -941,7 +941,7 @@
if (!timerqueue_getnext(base-active))
base-cpu_base-active_bases = ~(1  base-index);
out:
- timer-state = newstate;
+ timer-state = (newstate | (timer_state  HRTIMER_STATE_CALLBACK));
}

/*

Is there a chance for this patch fix to insert into next kernel release?

Thanks

Itzcak Pechtalt

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Linus Torvalds
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
itz...@flashnetworks.com wrote:

 I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
 subject with full description.
 There is also a short fix patch for kernel/hrtimer.c file.
 Even if this bug occurs rary, however it resolves system hard lockup option.

The patch is whitespace-damaged, but with a small oneliner like this
that doesn't much matter (the timer files moved to kernel/time/ during
this merge window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even
more importantly it needs to go to the right people for
double-checking.

But the patch is more broken than whitespace and even lack of
sign-off. It cannot even have compiled. I'm assuming timer_state was
intended to be timer-state. Also, every caller but one already has
HRTIMER_STATE_CALLBACK set unconditionally or to the old state in
newstate, so I suspect if this patch is the real fix (which I'll
leave for Thomas to comment more on), afaik the actual problem can
only happen through migrate_hrtimer_list() which uconditionally sets
the whole state to HRTIMER_STATE_MIGRATE.

Thomas? Leaving damaged patch quoted below.

   Linus

 I suspect that it was targeted by mistake to not active list 
 (timers_realtime-cl...@kernel-bugs.osdl.org).
 Following is the fix patch based on kernel 3.16.1 (just simple):
 diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
 --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
 @@ -941,7 +941,7 @@
 if (!timerqueue_getnext(base-active))
 base-cpu_base-active_bases = ~(1  base-index);
 out:
 - timer-state = newstate;
 + timer-state = (newstate | (timer_state  HRTIMER_STATE_CALLBACK));
 }

 /*

 Is there a chance for this patch fix to insert into next kernel release?

 Thanks

 Itzcak Pechtalt

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Itzcak Pechtalt
Indeed,  my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in 
this version function remove_hrtimer  doesn't preserve HRTIMER_STATE_CALLBACK 
flag and triggers hard lockup.
So this bug was already fixed in version 2.6.35 to set the flag by 
remove_hrtimer function.

Seems that  migrate_hrtimer_list case isn't a problem because it is called when 
old CPU is already dead.

I'll fix the bug report to resolved.

Thanks

Itzcak Pechtalt



-Original Message-
From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds
Sent: Tuesday, September 02, 2014 7:09 PM
To: Itzcak Pechtalt; Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Subject: Re: Race condition in HR timers that cause double insertion and hard 
lockup -- all latest versions

On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt itz...@flashnetworks.com 
wrote:

 I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for this 
 subject with full description.
 There is also a short fix patch for kernel/hrtimer.c file.
 Even if this bug occurs rary, however it resolves system hard lockup option.

The patch is whitespace-damaged, but with a small oneliner like this that 
doesn't much matter (the timer files moved to kernel/time/ during this merge 
window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even more 
importantly it needs to go to the right people for double-checking.

But the patch is more broken than whitespace and even lack of sign-off. It 
cannot even have compiled. I'm assuming timer_state was intended to be 
timer-state. Also, every caller but one already has HRTIMER_STATE_CALLBACK 
set unconditionally or to the old state in newstate, so I suspect if this 
patch is the real fix (which I'll leave for Thomas to comment more on), afaik 
the actual problem can only happen through migrate_hrtimer_list() which 
uconditionally sets the whole state to HRTIMER_STATE_MIGRATE.

Thomas? Leaving damaged patch quoted below.

   Linus

 I suspect that it was targeted by mistake to not active list 
 (timers_realtime-cl...@kernel-bugs.osdl.org).
 Following is the fix patch based on kernel 3.16.1 (just simple):
 diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
 --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
 @@ -941,7 +941,7 @@
 if (!timerqueue_getnext(base-active))
 base-cpu_base-active_bases = ~(1  base-index);
 out:
 - timer-state = newstate;
 + timer-state = (newstate | (timer_state  HRTIMER_STATE_CALLBACK));
 }

 /*

 Is there a chance for this patch fix to insert into next kernel release?

 Thanks

 Itzcak Pechtalt

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

2014-09-02 Thread Thomas Gleixner
On Tue, 2 Sep 2014, Linus Torvalds wrote:

 On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
 itz...@flashnetworks.com wrote:
 
  I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601  for 
  this subject with full description.
  There is also a short fix patch for kernel/hrtimer.c file.
  Even if this bug occurs rary, however it resolves system hard lockup option.
 
 The patch is whitespace-damaged, but with a small oneliner like this
 that doesn't much matter (the timer files moved to kernel/time/ during
 this merge window, so the patch wouldn't apply as-is anyway).
 
 It needs a sign-off (see Documentation/SubmittingPatches), but even
 more importantly it needs to go to the right people for
 double-checking.
 
 But the patch is more broken than whitespace and even lack of
 sign-off. It cannot even have compiled. I'm assuming timer_state was
 intended to be timer-state. Also, every caller but one already has
 HRTIMER_STATE_CALLBACK set unconditionally or to the old state in
 newstate, so I suspect if this patch is the real fix (which I'll
 leave for Thomas to comment more on), afaik the actual problem can
 only happen through migrate_hrtimer_list() which uconditionally sets
 the whole state to HRTIMER_STATE_MIGRATE.
 
 Thomas? Leaving damaged patch quoted below.

Right. It's been fixed long ago and the migrate path cannot suffer
from this problem because at this point a callback running on the dead
cpu would cause the 

BUG_ON(hrtimer_callback_running(timer));

a few lines above the remove_hrtimer() call to trigger and send the
machine into lala land.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/