Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On Sun, Feb 17, 2008 at 02:03:33AM +0200, Denys Fedoryshchenko wrote:
> Server is fully redundant now, so i apply patches (but i apply both,
> probably it will make system more reliable somehow) and i enable
> required debug options in kernel. So i will try to catch this bug few
> more times, probably if it will generate more detailed info over
> netconsole it will be useful.

I guess you mean the patches mentioned in the BUG/ spinlock lockup;
they could be useful, but we are not sure this is the same problem.
Anyway, if there are really stack overflows then we don't need any bug
report after this: with stack data corrupted they would show some false
problems. We need to find which code overflows and why. If you want to
debug this, then try to make this more reproducible e.g. with
CONFIG_4KSTACKS; anyway you should always turn on these options with
such problems:
CONFIG_DEBUG_STACKOVERFLOW
CONFIG_DEBUG_STACK_USAGE.

> Is there any project to dump console messages/kernel dump to disk?
> For ...

I don't know, but there is probably something better: a project by
Intel to save this in some cpu memory (or something...). But again: we
don't need corrupted messages after stack overflow, and, if we don't
let for this, maybe these netconsole messages would be properly printed
and quite enough...

> I notice some code in MTD(CONFIG_MTD_OOPS), but i am not sure it is
> correct and will work if i will setup MTD emulation for block device.

I'm not sure what do you mean by MTD emulation: it should be used with
MTD devices only, I presume?

Regards,
Jarek P.

PS: BTW, for HTB with actions I recommend my sch_htb: htb_requeue fix,
available in 2.6.25-rc.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
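As a small aid for the advice in the message above, here is a minimal sketch that checks whether a kernel .config text enables the two debug options Jarek names. The function name `missing_options` and the sample excerpt are hypothetical and only for illustration; the option names are the ones from the message.

```python
import re

# Options named in the message above.
WANTED = ("CONFIG_DEBUG_STACKOVERFLOW", "CONFIG_DEBUG_STACK_USAGE")

def missing_options(config_text, wanted=WANTED):
    """Return the options from `wanted` that are not set to 'y' in config_text."""
    # A built-in option appears in .config as a line of the form CONFIG_FOO=y;
    # a disabled one appears as a comment ("# CONFIG_FOO is not set").
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in wanted if opt not in enabled]

# Hypothetical .config excerpt, not taken from the reporter's system.
sample = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
"""

print(missing_options(sample))
```

In practice the text would be read from the build tree's .config file; the sketch takes a string so it can be tried without a kernel tree.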
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
Thanks, i will try it. You think lockdep can be buggy?

On Sat, 16 Feb 2008 09:00:36 +0100, Jarek Poplawski wrote
> Denys Fedoryshchenko wrote, On 02/13/2008 09:13 AM:
>> It is very difficult to reproduce, happened after running about 1month.
>> No changes done in classes at time of crash.
>> Kernel 2.6.24 vanilla
>
> Hi,
>
> I could be wrong, but IMHO this looks like stack was overridden here,
> so my proposal is to try this:
> CONFIG_DEBUG_STACKOVERFLOW=y
>
> But, if you're not very interested in reproducing this, you could
> also try to turn off some other debugging, especially lockdep.
>
> Regards,
> Jarek P.
>
>> Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP
>> Feb 10 15:53:22 SHAPER on CPU1, eip c01f0e5d, registers:
>> Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
>> Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
>> Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
>> Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
>> Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
>> Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
>> Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
>> Feb 10 15:53:22 SHAPER
>> Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER c01f0ef4
>> Feb 10 15:53:22 SHAPER c1ff5f80
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER f76494a8
>> Feb 10 15:53:22 SHAPER c1ff5f78
>> Feb 10 15:53:22 SHAPER
>> Feb 10 15:53:22 SHAPER [ 8271.779493]
>> Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
>> Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
>> Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
>> Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
>> Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
>> Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
>> Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
>> Feb 10 15:53:22 SHAPER
>> Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER f76494a4
>> Feb 10 15:53:22 SHAPER f76494a4

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On Sat, Feb 16, 2008 at 12:25:31PM +0200, Denys Fedoryshchenko wrote:
> Thanks, i will try it. You think lockdep can be buggy?

Just like every code... But the main reason is it has quite meaningful
overhead, so could be right in production only after lockups happen.
But if it doesn't report anything anyway...

Your report shows there are quite long paths of calls during softirqs
with some actions (ipt + mirred here?) and qdiscs, so if I'm not wrong
with this stack problem, this would need some optimization. And, of
course, there could be some additional bugs involved around too:
otherwise it seems this should happen more often. But I don't expect
you would try to debug this on your servers, so I hope, it simply will
be found BTW some day...

Regards,
Jarek P.
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
Denys Fedoryshchenko wrote, On 02/13/2008 09:13 AM:
> It is very difficult to reproduce, happened after running about 1month.
> No changes done in classes at time of crash.
> Kernel 2.6.24 vanilla

Hi,

I could be wrong, but IMHO this looks like stack was overridden here,
so my proposal is to try this:
CONFIG_DEBUG_STACKOVERFLOW=y

But, if you're not very interested in reproducing this, you could
also try to turn off some other debugging, especially lockdep.

Regards,
Jarek P.

...
> Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP
> Feb 10 15:53:22 SHAPER on CPU1, eip c01f0e5d, registers:
...
> Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
> Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
> Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
> Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
> Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
> Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
> Feb 10 15:53:22 SHAPER
> Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER c01f0ef4
> Feb 10 15:53:22 SHAPER c1ff5f80
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER f76494a8
> Feb 10 15:53:22 SHAPER c1ff5f78
> Feb 10 15:53:22 SHAPER
> Feb 10 15:53:22 SHAPER [ 8271.779493]
> Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
> Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
> Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
> Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
> Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
> Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
> Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
> Feb 10 15:53:22 SHAPER
> Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER f76494a4
> Feb 10 15:53:22 SHAPER f76494a4
...
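For readers following the stack hypothesis, the quoted register dump allows a rough reading of how deep the stack was at the moment the NMI watchdog fired. The sketch below is only an illustration under stated assumptions: it assumes 8 KiB kernel stacks (CONFIG_4KSTACKS not set), with thread_info at the lowest address of the stack area and the stack growing downward from the top. The function name `stack_usage` is hypothetical; the two addresses are copied from the dump.

```python
# Values copied from the register dump quoted above.
ESP = 0xf7c29c70          # "ESP: f7c29c70"
THREAD_INFO = 0xf7c28000  # "ti=f7c28000"

# Assumption: 8 KiB stacks, thread_info at the bottom of the stack area.
THREAD_SIZE = 8192

def stack_usage(esp, ti, thread_size=THREAD_SIZE):
    """Return (bytes_used, bytes_free) for a downward-growing stack.

    bytes_free is the distance from esp down to ti, so it still
    includes the space occupied by the thread_info structure itself.
    """
    top = ti + thread_size
    if not ti <= esp <= top:
        raise ValueError("esp lies outside the assumed stack area")
    return top - esp, esp - ti

used, free = stack_usage(ESP, THREAD_INFO)
print(f"used={used} bytes, free={free} bytes")
```

Under these assumptions the figures describe only the instant of the dump. They say nothing about how deep earlier call paths went, so they neither confirm nor rule out an overflow that happened before the lockup was detected.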
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On 13-02-2008 09:13, Denys Fedoryshchenko wrote:
> It is very difficult to reproduce, happened after running about 1month.
> No changes done in classes at time of crash.
> Kernel 2.6.24 vanilla
> I will try to attach also .config

Hi Denys,

This report looks very interesting. I don't know how others, but I plan
to study it more soon (on the weekend?), then maybe more questions.
Of course some exemplary tc rules should be helpful.

Thanks,
Jarek P.
HTB(?) softlockup, vanilla 2.6.24
It is very difficult to reproduce, happened after running about 1month.
No changes done in classes at that time

Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP
Feb 10 15:53:22 SHAPER on CPU1, eip c01f0e5d, registers:
Feb 10 15:53:22 SHAPER [ 8271.778952] Modules linked in:
Feb 10 15:53:22 SHAPER netconsole
Feb 10 15:53:22 SHAPER configfs
Feb 10 15:53:22 SHAPER softdog
Feb 10 15:53:22 SHAPER nf_nat_pptp
Feb 10 15:53:22 SHAPER nf_conntrack_pptp
Feb 10 15:53:22 SHAPER nf_conntrack_proto_gre
Feb 10 15:53:22 SHAPER nf_nat_proto_gre
Feb 10 15:53:22 SHAPER xt_tcpudp
Feb 10 15:53:22 SHAPER ipt_TTL
Feb 10 15:53:22 SHAPER ipt_ttl
Feb 10 15:53:22 SHAPER xt_NOTRACK
Feb 10 15:53:22 SHAPER iptable_raw
Feb 10 15:53:22 SHAPER iptable_mangle
Feb 10 15:53:22 SHAPER ifb
Feb 10 15:53:22 SHAPER e1000e
Feb 10 15:53:22 SHAPER em_nbyte
Feb 10 15:53:22 SHAPER cls_tcindex
Feb 10 15:53:22 SHAPER act_gact
Feb 10 15:53:22 SHAPER cls_rsvp
Feb 10 15:53:22 SHAPER sch_htb
Feb 10 15:53:22 SHAPER cls_fw
Feb 10 15:53:22 SHAPER act_mirred
Feb 10 15:53:22 SHAPER em_u32
Feb 10 15:53:22 SHAPER sch_red
Feb 10 15:53:22 SHAPER sch_sfq
Feb 10 15:53:22 SHAPER sch_tbf
Feb 10 15:53:22 SHAPER sch_teql
Feb 10 15:53:22 SHAPER cls_basic
Feb 10 15:53:22 SHAPER act_police
Feb 10 15:53:22 SHAPER sch_gred
Feb 10 15:53:22 SHAPER act_pedit
Feb 10 15:53:22 SHAPER sch_hfsc
Feb 10 15:53:22 SHAPER cls_rsvp6
Feb 10 15:53:22 SHAPER sch_ingress
Feb 10 15:53:22 SHAPER em_meta
Feb 10 15:53:22 SHAPER em_text
Feb 10 15:53:22 SHAPER act_ipt
Feb 10 15:53:22 SHAPER sch_dsmark
Feb 10 15:53:22 SHAPER sch_prio
Feb 10 15:53:22 SHAPER sch_netem
Feb 10 15:53:22 SHAPER act_simple
Feb 10 15:53:22 SHAPER cls_u32
Feb 10 15:53:22 SHAPER em_cmp
Feb 10 15:53:22 SHAPER sch_cbq
Feb 10 15:53:22 SHAPER cls_route
Feb 10 15:53:22 SHAPER xt_TCPMSS
Feb 10 15:53:22 SHAPER iptable_nat
Feb 10 15:53:22 SHAPER nf_conntrack_ipv4
Feb 10 15:53:22 SHAPER ipt_LOG
Feb 10 15:53:22 SHAPER ipt_MASQUERADE
Feb 10 15:53:22 SHAPER ipt_REDIRECT
Feb 10 15:53:22 SHAPER nf_nat
Feb 10 15:53:22 SHAPER nf_conntrack
Feb 10 15:53:22 SHAPER nfnetlink
Feb 10 15:53:22 SHAPER iptable_filter
Feb 10 15:53:22 SHAPER ip_tables
Feb 10 15:53:22 SHAPER x_tables
Feb 10 15:53:22 SHAPER 8021q
Feb 10 15:53:22 SHAPER tun
Feb 10 15:53:22 SHAPER tulip
Feb 10 15:53:22 SHAPER r8169
Feb 10 15:53:22 SHAPER sky2
Feb 10 15:53:22 SHAPER via_velocity
Feb 10 15:53:22 SHAPER via_rhine
Feb 10 15:53:22 SHAPER sis900
Feb 10 15:53:22 SHAPER ne2k_pci
Feb 10 15:53:22 SHAPER 8390
Feb 10 15:53:22 SHAPER skge
Feb 10 15:53:22 SHAPER tg3
Feb 10 15:53:22 SHAPER 8139too
Feb 10 15:53:22 SHAPER e1000
Feb 10 15:53:22 SHAPER e100
Feb 10 15:53:22 SHAPER usb_storage
Feb 10 15:53:22 SHAPER mtdblock
Feb 10 15:53:22 SHAPER mtd_blkdevs
Feb 10 15:53:22 SHAPER usbhid
Feb 10 15:53:22 SHAPER uhci_hcd
Feb 10 15:53:22 SHAPER ehci_hcd
Feb 10 15:53:22 SHAPER ohci_hcd
Feb 10 15:53:22 SHAPER usbcore
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER [ 8271.779291]
Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER c01f0ef4
Feb 10 15:53:22 SHAPER c1ff5f80
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a8
Feb 10 15:53:22 SHAPER c1ff5f78
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER [ 8271.779493]
Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1
Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50
Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80
Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70
Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000)
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER [ 8271.779446] Stack:
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER c01f0ef4
Feb 10 15:53:22 SHAPER c1ff5f80
Feb 10 15:53:22 SHAPER f76494a4
Feb 10 15:53:22 SHAPER f76494a8
Feb 10 15:53:22 SHAPER c1ff5f78
Feb 10 15:53:22 SHAPER
Feb 10 15:53:22 SHAPER [