Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
On Tue, 2 Sep 2014, Linus Torvalds wrote: > On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt > wrote: > > > > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for > > this subject with full description. > > There is also a short fix patch for kernel/hrtimer.c file. > > Even if this bug occurs rary, however it resolves system hard lockup option. > > The patch is whitespace-damaged, but with a small oneliner like this > that doesn't much matter (the timer files moved to kernel/time/ during > this merge window, so the patch wouldn't apply as-is anyway). > > It needs a sign-off (see Documentation/SubmittingPatches), but even > more importantly it needs to go to the right people for > double-checking. > > But the patch is more broken than whitespace and even lack of > sign-off. It cannot even have compiled. I'm assuming "timer_state" was > intended to be "timer->state". Also, every caller but one already has > "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in > "newstate", so I suspect if this patch is the real fix (which I'll > leave for Thomas to comment more on), afaik the actual problem can > only happen through migrate_hrtimer_list() which uconditionally sets > the whole state to HRTIMER_STATE_MIGRATE. > > Thomas? Leaving damaged patch quoted below. Right. It's been fixed long ago and the migrate path cannot suffer from this problem because at this point a callback running on the dead cpu would cause the BUG_ON(hrtimer_callback_running(timer)); a few lines above the remove_hrtimer() call to trigger and send the machine into lala land. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
Indeed, my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in this version function remove_hrtimer doesn't preserve "HRTIMER_STATE_CALLBACK" flag and triggers hard lockup. So this bug was already fixed in version 2.6.35 to set the flag by remove_hrtimer function. Seems that migrate_hrtimer_list case isn't a problem because it is called when old CPU is already dead. I'll fix the bug report to resolved. Thanks Itzcak Pechtalt -Original Message- From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds Sent: Tuesday, September 02, 2014 7:09 PM To: Itzcak Pechtalt; Thomas Gleixner Cc: linux-kernel@vger.kernel.org Subject: Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt wrote: > > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this > subject with full description. > There is also a short fix patch for kernel/hrtimer.c file. > Even if this bug occurs rary, however it resolves system hard lockup option. The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway). It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking. But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming "timer_state" was intended to be "timer->state". Also, every caller but one already has "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in "newstate", so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE. Thomas? Leaving damaged patch quoted below. Linus > I suspect that it was targeted by mistake to not active list > (timers_realtime-cl...@kernel-bugs.osdl.org). > Following is the fix patch based on kernel 3.16.1 (just simple): > diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c > --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 > +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 > @@ -941,7 +941,7 @@ > if (!timerqueue_getnext(>active)) > base->cpu_base->active_bases &= ~(1 << base->index); > out: > - timer->state = newstate; > + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK)); > } > > /* > > Is there a chance for this patch fix to insert into next kernel release? > > Thanks > > Itzcak Pechtalt > N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt wrote: > > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this > subject with full description. > There is also a short fix patch for kernel/hrtimer.c file. > Even if this bug occurs rary, however it resolves system hard lockup option. The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway). It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking. But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming "timer_state" was intended to be "timer->state". Also, every caller but one already has "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in "newstate", so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE. Thomas? Leaving damaged patch quoted below. Linus > I suspect that it was targeted by mistake to not active list > (timers_realtime-cl...@kernel-bugs.osdl.org). > Following is the fix patch based on kernel 3.16.1 (just simple): > diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c > --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 > +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 > @@ -941,7 +941,7 @@ > if (!timerqueue_getnext(>active)) > base->cpu_base->active_bases &= ~(1 << base->index); > out: > - timer->state = newstate; > + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK)); > } > > /* > > Is there a chance for this patch fix to insert into next kernel release? > > Thanks > > Itzcak Pechtalt > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
Hi, I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description. There is also a short fix patch for kernel/hrtimer.c file. Even if this bug occurs rary, however it resolves system hard lockup option. I suspect that it was targeted by mistake to not active list (timers_realtime-cl...@kernel-bugs.osdl.org). Following is the fix patch based on kernel 3.16.1 (just simple): diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 @@ -941,7 +941,7 @@ if (!timerqueue_getnext(>active)) base->cpu_base->active_bases &= ~(1 << base->index); out: - timer->state = newstate; + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK)); } /* Is there a chance for this patch fix to insert into next kernel release? Thanks Itzcak Pechtalt -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
Hi, I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description. There is also a short fix patch for kernel/hrtimer.c file. Even if this bug occurs rary, however it resolves system hard lockup option. I suspect that it was targeted by mistake to not active list (timers_realtime-cl...@kernel-bugs.osdl.org). Following is the fix patch based on kernel 3.16.1 (just simple): diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 @@ -941,7 +941,7 @@ if (!timerqueue_getnext(base-active)) base-cpu_base-active_bases = ~(1 base-index); out: - timer-state = newstate; + timer-state = (newstate | (timer_state HRTIMER_STATE_CALLBACK)); } /* Is there a chance for this patch fix to insert into next kernel release? Thanks Itzcak Pechtalt -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt itz...@flashnetworks.com wrote: I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description. There is also a short fix patch for kernel/hrtimer.c file. Even if this bug occurs rary, however it resolves system hard lockup option. The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway). It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking. But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming timer_state was intended to be timer-state. Also, every caller but one already has HRTIMER_STATE_CALLBACK set unconditionally or to the old state in newstate, so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE. Thomas? Leaving damaged patch quoted below. Linus I suspect that it was targeted by mistake to not active list (timers_realtime-cl...@kernel-bugs.osdl.org). Following is the fix patch based on kernel 3.16.1 (just simple): diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 @@ -941,7 +941,7 @@ if (!timerqueue_getnext(base-active)) base-cpu_base-active_bases = ~(1 base-index); out: - timer-state = newstate; + timer-state = (newstate | (timer_state HRTIMER_STATE_CALLBACK)); } /* Is there a chance for this patch fix to insert into next kernel release? Thanks Itzcak Pechtalt -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
Indeed, my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in this version function remove_hrtimer doesn't preserve HRTIMER_STATE_CALLBACK flag and triggers hard lockup. So this bug was already fixed in version 2.6.35 to set the flag by remove_hrtimer function. Seems that migrate_hrtimer_list case isn't a problem because it is called when old CPU is already dead. I'll fix the bug report to resolved. Thanks Itzcak Pechtalt -Original Message- From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds Sent: Tuesday, September 02, 2014 7:09 PM To: Itzcak Pechtalt; Thomas Gleixner Cc: linux-kernel@vger.kernel.org Subject: Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt itz...@flashnetworks.com wrote: I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description. There is also a short fix patch for kernel/hrtimer.c file. Even if this bug occurs rary, however it resolves system hard lockup option. The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway). It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking. But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming timer_state was intended to be timer-state. Also, every caller but one already has HRTIMER_STATE_CALLBACK set unconditionally or to the old state in newstate, so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE. Thomas? Leaving damaged patch quoted below. Linus I suspect that it was targeted by mistake to not active list (timers_realtime-cl...@kernel-bugs.osdl.org). Following is the fix patch based on kernel 3.16.1 (just simple): diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300 +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300 @@ -941,7 +941,7 @@ if (!timerqueue_getnext(base-active)) base-cpu_base-active_bases = ~(1 base-index); out: - timer-state = newstate; + timer-state = (newstate | (timer_state HRTIMER_STATE_CALLBACK)); } /* Is there a chance for this patch fix to insert into next kernel release? Thanks Itzcak Pechtalt N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions
On Tue, 2 Sep 2014, Linus Torvalds wrote: On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt itz...@flashnetworks.com wrote: I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description. There is also a short fix patch for kernel/hrtimer.c file. Even if this bug occurs rary, however it resolves system hard lockup option. The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway). It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking. But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming timer_state was intended to be timer-state. Also, every caller but one already has HRTIMER_STATE_CALLBACK set unconditionally or to the old state in newstate, so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE. Thomas? Leaving damaged patch quoted below. Right. It's been fixed long ago and the migrate path cannot suffer from this problem because at this point a callback running on the dead cpu would cause the BUG_ON(hrtimer_callback_running(timer)); a few lines above the remove_hrtimer() call to trigger and send the machine into lala land. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/