Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Tue, Nov 20, 2007 at 03:15:38PM -0800, David Miller wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
> Date: Tue, 20 Nov 2007 22:49:27 +0100
>
> >
> > * Greg KH <[EMAIL PROTECTED]> wrote:
> >
> > > On Tue, Nov 20, 2007 at 09:39:19PM +0100, Ingo Molnar wrote:
> > > >
> > > > * Greg KH <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > > > > > apply the original patch to v2.6.22. (we should not have applied
> > > > > > your patch that started the mess to begin with - but that's another
> > > > > > matter.)
> > > > >
> > > > > Well, I can easily back that one out, if that is easier than adding 2
> > > > > more patches to try to fix up the mess here.
> > > > >
> > > > > Let me know if you feel that would be best.
> > > >
> > > > i'd leave it alone - doing that we have in essence the softlockup
> > > > detector turned off. Reverting to the older version might trigger false
> > > > positives that need the new stuff.
> > >
> > > Ok, I'll see if the current round of patches fix up everyone's
> > > complaints :)
> >
> > so just to reiterate, to make sure we have the same plans: lets leave
> > v2.6.22 and earlier kernels alone - and lets strive for the latest
> > patches and code for v2.6.23 (and v2.6.24, evidently).
>
> I've validated that those patches make 2.6.23 behave on my
> Niagara box.

Great, thanks for testing and letting us know!

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* David Miller <[EMAIL PROTECTED]> wrote:

> > so just to reiterate, to make sure we have the same plans: lets
> > leave v2.6.22 and earlier kernels alone - and lets strive for the
> > latest patches and code for v2.6.23 (and v2.6.24, evidently).
>
> I've validated that those patches make 2.6.23 behave on my Niagara
> box.

Great and thanks for testing it!

Arjan noticed the shortness of the 1 sec sleep too, my suggestion would
be to increase the sleep period to ~50 seconds and the detection
threshold to 60 seconds - that should be large enough - instead of
complicating the tick code even more.

	Ingo
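To make the two numbers in the suggestion above concrete, here is a minimal userspace sketch of how a wake-up period and a detection threshold could relate. It is not the kernel's softlockup code: the names `wd_state`, `wd_should_wake` and `wd_is_lockup` and the two constants are hypothetical, and only the 50 and 60 second values come from the message.

```c
#include <stdbool.h>

/* Hypothetical tunables; only the 50 s / 60 s values come from the thread. */
enum {
	WD_WAKE_PERIOD_SEC   = 50,	/* how long the watchdog thread may sleep */
	WD_LOCKUP_THRESH_SEC = 60	/* how stale the timestamp may get before a report */
};

/* Hypothetical per-CPU state: when the watchdog thread last ran. */
struct wd_state {
	unsigned long touch_ts;		/* in seconds */
};

/* True once the watchdog thread is due to be woken to refresh touch_ts. */
static bool wd_should_wake(const struct wd_state *s, unsigned long now)
{
	return now > s->touch_ts + WD_WAKE_PERIOD_SEC;
}

/* True once the timestamp is stale enough to report a soft lockup. */
static bool wd_is_lockup(const struct wd_state *s, unsigned long now)
{
	return now > s->touch_ts + WD_LOCKUP_THRESH_SEC;
}
```

Under these assumptions the thread has a 10 second window after its wake-up is due in which to run and refresh the timestamp before a lockup would be reported.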
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Tue, 20 Nov 2007 22:49:27 +0100

>
> * Greg KH <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Nov 20, 2007 at 09:39:19PM +0100, Ingo Molnar wrote:
> > >
> > > * Greg KH <[EMAIL PROTECTED]> wrote:
> > >
> > > > > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > > > > apply the original patch to v2.6.22. (we should not have applied
> > > > > your patch that started the mess to begin with - but that's another
> > > > > matter.)
> > > >
> > > > Well, I can easily back that one out, if that is easier than adding 2
> > > > more patches to try to fix up the mess here.
> > > >
> > > > Let me know if you feel that would be best.
> > >
> > > i'd leave it alone - doing that we have in essence the softlockup
> > > detector turned off. Reverting to the older version might trigger false
> > > positives that need the new stuff.
> >
> > Ok, I'll see if the current round of patches fix up everyone's
> > complaints :)
>
> so just to reiterate, to make sure we have the same plans: lets leave
> v2.6.22 and earlier kernels alone - and lets strive for the latest
> patches and code for v2.6.23 (and v2.6.24, evidently).

I've validated that those patches make 2.6.23 behave on my
Niagara box.

Thanks.
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Tue, Nov 20, 2007 at 10:49:27PM +0100, Ingo Molnar wrote:
>
> * Greg KH <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Nov 20, 2007 at 09:39:19PM +0100, Ingo Molnar wrote:
> > >
> > > * Greg KH <[EMAIL PROTECTED]> wrote:
> > >
> > > > > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > > > > apply the original patch to v2.6.22. (we should not have applied
> > > > > your patch that started the mess to begin with - but that's another
> > > > > matter.)
> > > >
> > > > Well, I can easily back that one out, if that is easier than adding 2
> > > > more patches to try to fix up the mess here.
> > > >
> > > > Let me know if you feel that would be best.
> > >
> > > i'd leave it alone - doing that we have in essence the softlockup
> > > detector turned off. Reverting to the older version might trigger false
> > > positives that need the new stuff.
> >
> > Ok, I'll see if the current round of patches fix up everyone's
> > complaints :)
>
> so just to reiterate, to make sure we have the same plans: lets leave
> v2.6.22 and earlier kernels alone - and lets strive for the latest
> patches and code for v2.6.23 (and v2.6.24, evidently).

Yes, that sounds fine to me.

thanks,

greg k-h
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Greg KH <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 20, 2007 at 09:39:19PM +0100, Ingo Molnar wrote:
> >
> > * Greg KH <[EMAIL PROTECTED]> wrote:
> >
> > > > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > > > apply the original patch to v2.6.22. (we should not have applied
> > > > your patch that started the mess to begin with - but that's another
> > > > matter.)
> > >
> > > Well, I can easily back that one out, if that is easier than adding 2
> > > more patches to try to fix up the mess here.
> > >
> > > Let me know if you feel that would be best.
> >
> > i'd leave it alone - doing that we have in essence the softlockup
> > detector turned off. Reverting to the older version might trigger false
> > positives that need the new stuff.
>
> Ok, I'll see if the current round of patches fix up everyone's
> complaints :)

so just to reiterate, to make sure we have the same plans: lets leave
v2.6.22 and earlier kernels alone - and lets strive for the latest
patches and code for v2.6.23 (and v2.6.24, evidently).

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Tue, Nov 20, 2007 at 09:39:19PM +0100, Ingo Molnar wrote:
>
> * Greg KH <[EMAIL PROTECTED]> wrote:
>
> > > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > > apply the original patch to v2.6.22. (we should not have applied
> > > your patch that started the mess to begin with - but that's another
> > > matter.)
> >
> > Well, I can easily back that one out, if that is easier than adding 2
> > more patches to try to fix up the mess here.
> >
> > Let me know if you feel that would be best.
>
> i'd leave it alone - doing that we have in essence the softlockup
> detector turned off. Reverting to the older version might trigger false
> positives that need the new stuff.

Ok, I'll see if the current round of patches fix up everyone's
complaints :)

thanks for sending these,

greg k-h
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Greg KH <[EMAIL PROTECTED]> wrote:

> > but we only have cpu_clock() from v2.6.23 onwards - so we should not
> > apply the original patch to v2.6.22. (we should not have applied
> > your patch that started the mess to begin with - but that's another
> > matter.)
>
> Well, I can easily back that one out, if that is easier than adding 2
> more patches to try to fix up the mess here.
>
> Let me know if you feel that would be best.

i'd leave it alone - doing that we have in essence the softlockup
detector turned off. Reverting to the older version might trigger false
positives that need the new stuff.

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Tue, Nov 20, 2007 at 07:05:25AM +0100, Ingo Molnar wrote:
>
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
> > Greg KH wrote:
> > > Can you try applying the patch below to see if that solves the problem
> > > for you?
> > >
> >
> > I don't think this patch will help; it only has cosmetic changes in
> > addition to the original message printing fix. I think it also needs
> > change a3b13c23f186ecb57204580cc1f2dbe9c284953a:
> >
> > diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c
>
> yes, it does need the cpu_clock() changes as i mentioned.
>
>  commit a3b13c23f186ecb57204580cc1f2dbe9c284953a
>  Author: Ingo Molnar <[EMAIL PROTECTED]>
>  Date:   Tue Oct 16 23:26:06 2007 -0700
>
>     softlockup: use cpu_clock() instead of sched_clock()
>
>     sched_clock() is not a reliable time-source, use cpu_clock() instead.
>
> but we only have cpu_clock() from v2.6.23 onwards - so we should not
> apply the original patch to v2.6.22. (we should not have applied your
> patch that started the mess to begin with - but that's another matter.)

Well, I can easily back that one out, if that is easier than adding 2
more patches to try to fix up the mess here.

Let me know if you feel that would be best.

thanks,

greg k-h
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Tue, Nov 20, 2007 at 07:08:08AM +0100, Ingo Molnar wrote:
>
> * Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
> > On 11/17/2007 07:55 PM, Ingo Molnar wrote:
> > > * Greg KH <[EMAIL PROTECTED]> wrote:
> > >
> > >> Great, thanks for tracking this down.
> > >>
> > >> Ingo, this corresponds to changeset
> > >> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> > >> incorrect? Should this patch in the -stable tree be reverted?
> > >
> > > hm, there are no such problems in .24 and the cpu_clock() and other
> > > fixes i did were not picked up. Find the missing fixes below. They
> > > should work just fine in .23 as it has the cpu_clock() functionality
> > > too.
> > >
> > > [ NOTE: the most robust thing is to make the .23 version match the .24
> > >   version of kernel/softlockup.c, so i included two other harmless
> > >   changes in this diff as well. ]
> > >
> > > 	Ingo
> > >
> > > --->
> > > commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
> > > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > > Date:   Tue Oct 16 23:26:08 2007 -0700
> > >
> > > commit 43581a10075492445f65234384210492ff333eba
> > > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > > Date:   Tue Oct 16 23:26:08 2007 -0700
> >
> > Those are just cosmetic / cleanup changes.
> >
> > Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??
>
> yes:
>
> > > [...] the cpu_clock() and other fixes i did were not picked up.
>
> i just forgot to attach the cpu_clock() changes - they are in a3b13c23.

Ok, I've now added that patch too :)

Hopefully this is all straightened out now, I'll go cut a -rc for the
next stable so people can test...

thanks,

greg k-h
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 11/17/2007 07:55 PM, Ingo Molnar wrote:
> > * Greg KH <[EMAIL PROTECTED]> wrote:
> >
> >> Great, thanks for tracking this down.
> >>
> >> Ingo, this corresponds to changeset
> >> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> >> incorrect? Should this patch in the -stable tree be reverted?
> >
> > hm, there are no such problems in .24 and the cpu_clock() and other
> > fixes i did were not picked up. Find the missing fixes below. They
> > should work just fine in .23 as it has the cpu_clock() functionality
> > too.
> >
> > [ NOTE: the most robust thing is to make the .23 version match the .24
> >   version of kernel/softlockup.c, so i included two other harmless
> >   changes in this diff as well. ]
> >
> > 	Ingo
> >
> > --->
> > commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
> > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > Date:   Tue Oct 16 23:26:08 2007 -0700
> >
> > commit 43581a10075492445f65234384210492ff333eba
> > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > Date:   Tue Oct 16 23:26:08 2007 -0700
>
> Those are just cosmetic / cleanup changes.
>
> Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??

yes:

> > [...] the cpu_clock() and other fixes i did were not picked up.

i just forgot to attach the cpu_clock() changes - they are in a3b13c23.

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Greg KH wrote:
> > Can you try applying the patch below to see if that solves the problem
> > for you?
> >
>
> I don't think this patch will help; it only has cosmetic changes in
> addition to the original message printing fix. I think it also needs
> change a3b13c23f186ecb57204580cc1f2dbe9c284953a:
>
> diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c

yes, it does need the cpu_clock() changes as i mentioned.

 commit a3b13c23f186ecb57204580cc1f2dbe9c284953a
 Author: Ingo Molnar <[EMAIL PROTECTED]>
 Date:   Tue Oct 16 23:26:06 2007 -0700

    softlockup: use cpu_clock() instead of sched_clock()

    sched_clock() is not a reliable time-source, use cpu_clock() instead.

but we only have cpu_clock() from v2.6.23 onwards - so we should not
apply the original patch to v2.6.22. (we should not have applied your
patch that started the mess to begin with - but that's another matter.)

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
Greg KH wrote:
> Can you try applying the patch below to see if that solves the problem
> for you?
>

I don't think this patch will help; it only has cosmetic changes in
addition to the original message printing fix. I think it also needs
change a3b13c23f186ecb57204580cc1f2dbe9c284953a:

diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c
--- a/kernel/softlockup.c	Tue Oct 09 21:00:40 2007 +
+++ b/kernel/softlockup.c	Wed Oct 17 08:42:46 2007 -0700
@@ -40,14 +40,16 @@ static struct notifier_block panic_block
  * resolution, and we don't need to waste time with a big divide when
  * 2^30ns == 1.074s.
  */
-static unsigned long get_timestamp(void)
+static unsigned long get_timestamp(int this_cpu)
 {
-	return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
+	return cpu_clock(this_cpu) >> 30;  /* 2^30 ~= 10^9 */
 }
 
 void touch_softlockup_watchdog(void)
 {
-	__raw_get_cpu_var(touch_timestamp) = get_timestamp();
+	int this_cpu = raw_smp_processor_id();
+
+	__raw_get_cpu_var(touch_timestamp) = get_timestamp(this_cpu);
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -91,7 +93,7 @@ void softlockup_tick(void)
 		return;
 	}
 
-	now = get_timestamp();
+	now = get_timestamp(this_cpu);
 
 	/* Wake up the high-prio watchdog task every second: */
 	if (now > (touch_timestamp + 1))

	J
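The `>> 30` in get_timestamp() above trades an exact divide by 10^9 for a shift, so each timestamp unit is 2^30 ns (about 1.074 s) rather than exactly one second. A small userspace sketch of that arithmetic follows; the helper name `ns_to_watchdog_units` is hypothetical and the function only mirrors the shift, not the kernel's clock source.

```c
#include <stdint.h>

/*
 * Hypothetical userspace helper mirroring the ">> 30" in get_timestamp():
 * one returned unit is 2^30 ns (~1.074 s), not exactly one second.
 */
static unsigned long ns_to_watchdog_units(uint64_t ns)
{
	return (unsigned long)(ns >> 30);	/* 2^30 ~= 10^9 */
}
```

For example, 10 s of nanoseconds (10000000000) maps to 9 units, and a comparison against "10 units" corresponds to roughly 10.74 s of real time, so thresholds expressed this way run about 7% longer than their nominal value in seconds.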
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On 11/17/2007 07:55 PM, Ingo Molnar wrote:
> * Greg KH <[EMAIL PROTECTED]> wrote:
>
>> Great, thanks for tracking this down.
>>
>> Ingo, this corresponds to changeset
>> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
>> incorrect? Should this patch in the -stable tree be reverted?
>
> hm, there are no such problems in .24 and the cpu_clock() and other
> fixes i did were not picked up. Find the missing fixes below. They
> should work just fine in .23 as it has the cpu_clock() functionality
> too.
>
> [ NOTE: the most robust thing is to make the .23 version match the .24
>   version of kernel/softlockup.c, so i included two other harmless
>   changes in this diff as well. ]
>
> 	Ingo
>
> --->
> commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
> Author: Ingo Molnar <[EMAIL PROTECTED]>
> Date:   Tue Oct 16 23:26:08 2007 -0700
>
> commit 43581a10075492445f65234384210492ff333eba
> Author: Ingo Molnar <[EMAIL PROTECTED]>
> Date:   Tue Oct 16 23:26:08 2007 -0700

Those are just cosmetic / cleanup changes.

Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Sat, Nov 17, 2007 at 04:34:56PM -0800, Jeremy Fitzhardinge wrote:
> Greg KH wrote:
> > Great, thanks for tracking this down.
> >
> > Ingo, this corresponds to changeset
> > a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> > incorrect? Should this patch in the -stable tree be reverted?
> >
>
> Hm, I've never observed a problem with this in mainline.
>
> Ah. The significant difference between 2.6.23 and -git is that the
> former used sched_clock as the softlockup timebase, versus cpu_clock in
> git. If sched_clock() is tsc-based, and the tsc isn't stable when using
> cpufreq, then the softlockup will get confused and fire spuriously.
> Ingo's fix to reporting exposed the fact that softlockup is terminally
> broken in that kernel.
>
> I think the best course for now is to revert it, since softlockup is
> hardly a critical feature. The proper fixes would either be to backport
> cpu_clock() to 2.6.23, or make it go back to using ticks.

Can you try applying the patch below to see if that solves the problem
for you?

thanks,

greg k-h

-
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Sun, 18 Nov 2007 01:55:38 +0100
Subject: softlockup watchdog fixes and cleanups
To: Greg KH <[EMAIL PROTECTED]>
Cc: David <[EMAIL PROTECTED]>, Jeremy Fitzhardinge <[EMAIL PROTECTED]>, [EMAIL PROTECTED], Javier Kohen <[EMAIL PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org, [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>
Content-Disposition: inline

From: Ingo Molnar <[EMAIL PROTECTED]>

This is a merge of commits a5f2ce3c6024a5bb895647b6bd88ecae5001020a and
43581a10075492445f65234384210492ff333eba in mainline to fix a warning in
the 2.6.23.3 kernel release.

softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up.

No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
  the printk messages. )

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

  BUG: soft lockup detected on CPU#1!
   [] show_trace_log_lvl+0x19/0x2e
   [] show_trace+0x12/0x14
   [] dump_stack+0x14/0x16
   [] softlockup_tick+0xbe/0xd0
   [] run_local_timers+0x12/0x14
   [] update_process_times+0x3e/0x63
   [] tick_sched_timer+0x7c/0xc0
   [] hrtimer_interrupt+0x135/0x1ba
   [] smp_apic_timer_interrupt+0x6e/0x80
   [] apic_timer_interrupt+0x33/0x38
   [] syscall_call+0x7/0xb
   ===

The new format is:

  BUG: soft lockup detected on CPU#1! [prctl:2363]

  Pid: 2363, comm:prctl
  EIP: 0060:[] CPU: 1
  EIP is at sys_prctl+0x24/0x18c
   EFLAGS: 0213 Not tainted (2.6.22-cfs-v20 #26)
  EAX: 0001 EBX: 03e7 ECX: 0001 EDX: f6df
  ESI: 03e7 EDI: 03e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
  CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 06d0
   [] show_trace_log_lvl+0x19/0x2e
   [] show_trace+0x12/0x14
   [] show_regs+0x1ab/0x1b3
   [] softlockup_tick+0xef/0x108
   [] run_local_timers+0x12/0x14
   [] update_process_times+0x3e/0x63
   [] tick_sched_timer+0x7c/0xc0
   [] hrtimer_interrupt+0x135/0x1ba
   [] smp_apic_timer_interrupt+0x6e/0x80
   [] apic_timer_interrupt+0x33/0x38
   [] syscall_call+0x7/0xb
   ===

Note that in the old format we only knew that some system call locked
up, we didn't know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed too first, just in case we don't manage to print
a useful backtrace.

[EMAIL PROTECTED]: fix warning]
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 kernel/softlockup.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -15,13 +15,16 @@
 #include
 #include
 
+#include
+
 static DEFINE_SPINLOCK(print_lock);
 
 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On 11/17/2007 07:55 PM, Ingo Molnar wrote:
> * Greg KH <[EMAIL PROTECTED]> wrote:
>
> > Great, thanks for tracking this down.
> >
> > Ingo, this corrisponds to changeset
> > a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> > incorrect? Should this patch in the -stable tree be reverted?
>
> hm, there are no such problems in .24 and the cpu_clock() and other
> fixes i did were not picked up. Find the missing fixes below. They
> should work just fine in .23 as it has the cpu_clock() functionality
> too.
>
> [ NOTE: the most robust thing is to make the .23 version match the .24
>   version of kernel/softlockup.c, so i included two other harmless
>   changes in this diff as well. ]
>
> 	Ingo
>
> --->
> commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
> Author: Ingo Molnar <[EMAIL PROTECTED]>
> Date:   Tue Oct 16 23:26:08 2007 -0700
>
> commit 43581a10075492445f65234384210492ff333eba
> Author: Ingo Molnar <[EMAIL PROTECTED]>
> Date:   Tue Oct 16 23:26:08 2007 -0700

Those are just cosmetic / cleanup changes.

Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
Greg KH wrote:
> Can you try applying the patch below to see if that solves the problem
> for you?

I don't think this patch will help; it only has cosmetic changes in
addition to the original message printing fix. I think it also needs
change a3b13c23f186ecb57204580cc1f2dbe9c284953a:

diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c
--- a/kernel/softlockup.c	Tue Oct 09 21:00:40 2007 +
+++ b/kernel/softlockup.c	Wed Oct 17 08:42:46 2007 -0700
@@ -40,14 +40,16 @@ static struct notifier_block panic_block
  * resolution, and we don't need to waste time with a big divide when
  * 2^30ns == 1.074s.
  */
-static unsigned long get_timestamp(void)
+static unsigned long get_timestamp(int this_cpu)
 {
-	return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
+	return cpu_clock(this_cpu) >> 30;  /* 2^30 ~= 10^9 */
 }

 void touch_softlockup_watchdog(void)
 {
-	__raw_get_cpu_var(touch_timestamp) = get_timestamp();
+	int this_cpu = raw_smp_processor_id();
+
+	__raw_get_cpu_var(touch_timestamp) = get_timestamp(this_cpu);
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);

@@ -91,7 +93,7 @@ void softlockup_tick(void)
 		return;
 	}

-	now = get_timestamp();
+	now = get_timestamp(this_cpu);

 	/* Wake up the high-prio watchdog task every second: */
 	if (now > (touch_timestamp + 1))

    J
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Greg KH wrote:
> > Can you try applying the patch below to see if that solves the
> > problem for you?
>
> I don't think this patch will help; it only has cosmetic changes in
> addition to the original message printing fix. I think it also needs
> change a3b13c23f186ecb57204580cc1f2dbe9c284953a:
>
> diff -r 79f0ea1e0e70 -r 06f060ab58aa kernel/softlockup.c

yes, it does need the cpu_clock() changes as i mentioned.

  commit a3b13c23f186ecb57204580cc1f2dbe9c284953a
  Author: Ingo Molnar <[EMAIL PROTECTED]>
  Date:   Tue Oct 16 23:26:06 2007 -0700

      softlockup: use cpu_clock() instead of sched_clock()

      sched_clock() is not a reliable time-source, use cpu_clock() instead.

but we only have cpu_clock() from v2.6.23 onwards - so we should not
apply the original patch to v2.6.22. (we should not have applied your
patch that started the mess to begin with - but that's another matter.)

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 11/17/2007 07:55 PM, Ingo Molnar wrote:
> > * Greg KH <[EMAIL PROTECTED]> wrote:
> >
> > > Great, thanks for tracking this down.
> > >
> > > Ingo, this corrisponds to changeset
> > > a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> > > incorrect? Should this patch in the -stable tree be reverted?
> >
> > hm, there are no such problems in .24 and the cpu_clock() and other
> > fixes i did were not picked up. Find the missing fixes below. They
> > should work just fine in .23 as it has the cpu_clock() functionality
> > too.
> >
> > [ NOTE: the most robust thing is to make the .23 version match the .24
> >   version of kernel/softlockup.c, so i included two other harmless
> >   changes in this diff as well. ]
> >
> > 	Ingo
> >
> > --->
> > commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
> > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > Date:   Tue Oct 16 23:26:08 2007 -0700
> >
> > commit 43581a10075492445f65234384210492ff333eba
> > Author: Ingo Molnar <[EMAIL PROTECTED]>
> > Date:   Tue Oct 16 23:26:08 2007 -0700
>
> Those are just cosmetic / cleanup changes.
>
> Don't you need commit a3b13c23f186ecb57204580cc1f2dbe9c284953a ??

yes:

> > [...] the cpu_clock() and other fixes i did were not picked up.

i just forgot to attach the cpu_clock() changes - they are in a3b13c23.

	Ingo
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
* Greg KH <[EMAIL PROTECTED]> wrote:

> Great, thanks for tracking this down.
>
> Ingo, this corrisponds to changeset
> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> incorrect? Should this patch in the -stable tree be reverted?

hm, there are no such problems in .24 and the cpu_clock() and other
fixes i did were not picked up. Find the missing fixes below. They
should work just fine in .23 as it has the cpu_clock() functionality
too.

[ NOTE: the most robust thing is to make the .23 version match the .24
  version of kernel/softlockup.c, so i included two other harmless
  changes in this diff as well. ]

	Ingo

--->
commit a5f2ce3c6024a5bb895647b6bd88ecae5001020a
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Tue Oct 16 23:26:08 2007 -0700

    softlockup watchdog: style cleanups

    kernel/softirq.c grew a few style uncleanlinesses in the past few
    months, clean that up. No functional changes:

       text    data     bss     dec     hex filename
       1126      76       4    1206     4b6 softlockup.o.before
       1129      76       4    1209     4b9 softlockup.o.after

    ( the 3 bytes .text increase is due to the "<1>" appended to one of
      the printk messages. )

    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

commit 43581a10075492445f65234384210492ff333eba
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Tue Oct 16 23:26:08 2007 -0700

    softlockup: improve debug output

    Improve the debuggability of kernel lockups by enhancing the debug
    output of the softlockup detector: print the task that causes the
    lockup and try to print a more intelligent backtrace.

    The old format was:

      BUG: soft lockup detected on CPU#1!
       [<c0105e4a>] show_trace_log_lvl+0x19/0x2e
       [<c0105f43>] show_trace+0x12/0x14
       [<c0105f59>] dump_stack+0x14/0x16
       [<c015f6bc>] softlockup_tick+0xbe/0xd0
       [<c013457d>] run_local_timers+0x12/0x14
       [<c01346b8>] update_process_times+0x3e/0x63
       [<c0145fb8>] tick_sched_timer+0x7c/0xc0
       [<c0140a75>] hrtimer_interrupt+0x135/0x1ba
       [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
       [<c0105aa3>] apic_timer_interrupt+0x33/0x38
       [<c0104f8a>] syscall_call+0x7/0xb
       ===

    The new format is:

      BUG: soft lockup detected on CPU#1! [prctl:2363]

      Pid: 2363, comm:prctl
      EIP: 0060:[<c013915f>] CPU: 1
      EIP is at sys_prctl+0x24/0x18c
       EFLAGS: 0213 Not tainted (2.6.22-cfs-v20 #26)
      EAX: 0001 EBX: 03e7 ECX: 0001 EDX: f6df
      ESI: 03e7 EDI: 03e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
      CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 06d0
       [<c0105e4a>] show_trace_log_lvl+0x19/0x2e
       [<c0105f43>] show_trace+0x12/0x14
       [<c01040be>] show_regs+0x1ab/0x1b3
       [<c015f807>] softlockup_tick+0xef/0x108
       [<c013457d>] run_local_timers+0x12/0x14
       [<c01346b8>] update_process_times+0x3e/0x63
       [<c0145fcc>] tick_sched_timer+0x7c/0xc0
       [<c0140a89>] hrtimer_interrupt+0x135/0x1ba
       [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
       [<c0105aa3>] apic_timer_interrupt+0x33/0x38
       [<c0104f8a>] syscall_call+0x7/0xb
       ===

    Note that in the old format we only knew that some system call
    locked up, we didnt know _which_. With the new format we know that
    it's at a specific place in sys_prctl(). [which was where i created
    an artificial kernel lockup to test the new format.]

    This is also useful if the lockup happens in user-space - the
    user-space EIP (and other registers) will be printed too. (such a
    lockup would either suggest that the task was running at
    SCHED_FIFO:99 and looping for more than 10 seconds, or that the
    softlockup detector has a false-positive.)

    The task name is printed too first, just in case we dont manage to
    print a useful backtrace.

    [EMAIL PROTECTED]: fix warning]
    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index e423b3a..11df812 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -15,13 +15,16 @@
 #include <linux/notifier.h>
 #include <linux/module.h>

+#include <asm/irq_regs.h>
+
 static DEFINE_SPINLOCK(print_lock);

 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);

-static int did_panic = 0;
+static int did_panic;
+int softlockup_thresh = 10;

 static int
 softlock_panic(struct notifier_block *this, unsigned long event, void *ptr)
@@ -72,6 +75,7 @@ void softlockup_tick(void)
 	int this_cpu = smp_processor_id();
 	unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
 	unsigned long print_timestamp;
+	struct pt_regs *regs = get_irq_regs();
 	unsigned
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
Greg KH wrote:
> Great, thanks for tracking this down.
>
> Ingo, this corrisponds to changeset
> a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
> incorrect? Should this patch in the -stable tree be reverted?
>

Hm, I've never observed a problem with this in mainline.

Ah. The significant difference between 2.6.23 and -git is that the
former used sched_clock as the softlockup timebase, versus cpu_clock in
git. If sched_clock() is tsc-based, and the tsc isn't stable when using
cpufreq, then the softlockup with get confused and fire spuriously.
Ingo's fix to reporting exposed the fact that softlockup is terminally
broken in that kernel.

I think the best course for now is to revert it, since softlockup is
hardly a critical feature. The proper fixes would either be to backport
cpu_clock() to 2.6.23, or make it go back to using ticks.

    J
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Sat, Nov 17, 2007 at 08:05:33PM +, David wrote:
> Greg KH wrote:
> > On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
> >
> >> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
> >> these in the logs:
> >>
> >
> > Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
> > this down?
> >
> This is the culprit, reverting fixes the issue.
>
> Cheers
> David
>
> --- a/kernel/softlockup.c
> +++ b/kernel/softlockup.c
> @@ -80,10 +80,11 @@ void softlockup_tick(void)
>  	print_timestamp = per_cpu(print_timestamp, this_cpu);
>
>  	/* report at most once a second */
> -	if (print_timestamp < (touch_timestamp + 1) ||
> -		did_panic ||
> -			!per_cpu(watchdog_task, this_cpu))
> +	if ((print_timestamp >= touch_timestamp &&
> +			print_timestamp < (touch_timestamp + 1)) ||
> +			did_panic || !per_cpu(watchdog_task, this_cpu)) {
>  		return;
> +	}
>
>  	/* do not print during early bootup: */
>  	if (unlikely(system_state != SYSTEM_RUNNING)) {
>

Great, thanks for tracking this down.

Ingo, this corrisponds to changeset
a115d5caca1a2905ba7a32b408a6042b20179aaa in mainline. Is that patch
incorrect? Should this patch in the -stable tree be reverted?

thanks,

greg k-h
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
Greg KH wrote:
> On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
>
>> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
>> these in the logs:
>>
>
> Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
> this down?
>
This is the culprit, reverting fixes the issue.

Cheers
David

--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -80,10 +80,11 @@ void softlockup_tick(void)
 	print_timestamp = per_cpu(print_timestamp, this_cpu);

 	/* report at most once a second */
-	if (print_timestamp < (touch_timestamp + 1) ||
-		did_panic ||
-			!per_cpu(watchdog_task, this_cpu))
+	if ((print_timestamp >= touch_timestamp &&
+			print_timestamp < (touch_timestamp + 1)) ||
+			did_panic || !per_cpu(watchdog_task, this_cpu)) {
 		return;
+	}

 	/* do not print during early bootup: */
 	if (unlikely(system_state != SYSTEM_RUNNING)) {
Re: [stable] Soft lockups since stable kernel upgrade to 2.6.23.8
On Sat, Nov 17, 2007 at 07:21:35PM +0100, Javier Kohen wrote:
> I upgraded today from 2.6.23 to 2.6.23.8 and started seeing a lot of
> these in the logs:

Can you see if the problem showed up in 2.6.23.2 or .3 to help narrow
this down?

thanks,

greg k-h