Re: 2.6.24-rc1 fails with lockup and BUG:
On Wed, 2007-10-24 at 12:44 -0400, Joseph Fannin wrote:
> On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
>
> Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
> ago (which DaveM has queued).
>
> Since netlink is involved in the traces you sent, this might do something
> for the other too.

This *does* fix the "NetworkManager needs to be restarted at boot" part of
the problem, but it leaves the worst part as is (the failed suspend to RAM
and the bugs that follow).

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!

--
This communication contains confidential information. It is for the
exclusive use of the intended addressee. If you are not the intended
addressee, please note that any form of distribution, copying or use of
this communication or the information in it is strictly prohibited by law.
If you have received this communication in error, please immediately notify
the sender by reply e-mail and destroy this message. Thank you for your
cooperation.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc1 fails with lockup and BUG:
On Wed, 2007-10-24 at 18:11 +0200, Peter Zijlstra wrote:
> On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> >
> > hm, this lockdep warning caused lockdep to turn itself off - hence we
> > won't get to the really interesting warnings. We'll try to come up with a
> > solution for this.
>
> Does this help?

I tried this, but although I still get the D-state processes, I cannot see
any debug trace now. Results are at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_3/

Can I try anything more? This is quite a show-stopper for me... and I would
like to try other options before bisecting 11 MB of patches...

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!
Re: 2.6.24-rc1 fails with lockup and BUG:
On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
>
> Hi,
>
> 2.6.24-rc1 fails for me. I have the sensation it is network-related, but
> I am not sure, so I send this message just to the list.
> This same failure was present in git-5734-gd85714d, I sent
> a message to the list but it seems it never arrived. I hope this will
> pass through. My system is a toshiba satellite A305-S5077, dual core pentium.
>
> The symptoms are quite strange. At boot, NetworkManager fails to activate
> my eth0 (r8169). Just stopping/restarting NM will make it work.

Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
ago (which DaveM has queued).

Since netlink is involved in the traces you sent, this might do something
for the other too.

The patch I received follows:

> Revert to original netlink behavior. Do not reply with ACK if the
> netlink dump has been successfully started.

> libnl has been broken by the cd40b7d3983c708aabe3d3008ec64ffce56d33b0
> The following command reproduces the problem:
>/nl-route-get 192.168.1.1

> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 98e313e..44a8b41 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1565,7 +1565,10 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	netlink_dump(sk);
 	sock_put(sk);
-	return 0;
+
+	/* We successfully started a dump, by returning -EINTR we
+	 * signal not to send ACK even if it was requested */
+	return -EINTR;
 }
 
 void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err)
@@ -1619,17 +1622,21 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 
 		/* Only requests are handled by the kernel */
 		if (!(nlh->nlmsg_flags & NLM_F_REQUEST))
-			goto skip;
+			goto ack;
 
 		/* Skip control messages */
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
-			goto skip;
+			goto ack;
 
 		err = cb(skb, nlh);
-skip:
+		if (err == -EINTR)
+			goto skip;
+
+ack:
 		if (nlh->nlmsg_flags & NLM_F_ACK || err)
 			netlink_ack(skb, nlh, err);
 
+skip:
 		msglen = NLMSG_ALIGN(nlh->nlmsg_len);
 		if (msglen > skb->len)
 			msglen = skb->len;

--
Joseph Fannin
[EMAIL PROTECTED]
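A minimal user-space sketch of the ACK decision that the patch above changes in netlink_rcv_skb(). This is not kernel code: the function names and the local flag constants are made up for illustration, and the sketch assumes that err starts at 0 for each message, which the diff context does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for NLM_F_REQUEST, NLM_F_ACK and NLMSG_MIN_TYPE. */
enum { F_REQUEST = 0x01, F_ACK = 0x04, MIN_TYPE = 0x10 };

/* Before the patch: every path falls through to the ACK check, so a
 * dump request that also asked for an ACK gets one (the callback
 * returned 0 once the dump had started). cb_ret is the value the
 * callback would return if it were called. */
static bool acks_before_patch(int flags, int type, int cb_ret)
{
	int err = 0;	/* assumption: reset for each message */

	assert(type >= 0);
	/* Non-requests and control messages skip the callback. */
	if ((flags & F_REQUEST) && type >= MIN_TYPE)
		err = cb_ret;
	return (flags & F_ACK) || err;
}

/* After the patch: a callback returning -EINTR means "dump started",
 * and the ACK is suppressed even if it was requested. */
static bool acks_after_patch(int flags, int type, int cb_ret)
{
	int err = 0;	/* assumption: reset for each message */

	assert(type >= 0);
	if ((flags & F_REQUEST) && type >= MIN_TYPE) {
		err = cb_ret;
		if (err == -EINTR)
			return false;	/* the "goto skip" path */
	}
	return (flags & F_ACK) || err;	/* the "ack:" label */
}
```

Calling acks_before_patch(F_REQUEST | F_ACK, MIN_TYPE, 0) models the old behavior for a started dump, and acks_after_patch(F_REQUEST | F_ACK, MIN_TYPE, -EINTR) models the new one. Real errors other than -EINTR are still acknowledged in both versions.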
Re: 2.6.24-rc1 fails with lockup and BUG:
On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> * Romano Giannetti <[EMAIL PROTECTED]> wrote:
>
> > Done. The results are at:
> >
> > http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
> >
> > in the syslog-after-failed-suspend.txt file. After the failed suspend
> > (at line 15766) there were a bunch of things in D-state. I have
> > left the file intact.
> >
> > At line 17646 there is:
> >
> > WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
>
> hm, this lockdep warning caused lockdep to turn itself off - hence we
> won't get to the really interesting warnings. We'll try to come up with a
> solution for this.

Does this help?

---
Subject: lockdep: invalid irq usage

this function can be called from hardirq context.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
Index: linux-2.6-2/kernel/sched_debug.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_debug.c
+++ linux-2.6-2/kernel/sched_debug.c
@@ -80,6 +80,7 @@ print_task(struct seq_file *m, struct rq
 static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 {
 	struct task_struct *g, *p;
+	unsigned long flags;
 
 	SEQ_printf(m,
 	"\nrunnable tasks:\n"
@@ -88,7 +89,7 @@ static void print_rq(struct seq_file *m,
 	"--"
 	"\n");
 
-	read_lock_irq(&tasklist_lock);
+	read_lock_irqsave(&tasklist_lock, flags);
 
 	do_each_thread(g, p) {
 		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
@@ -97,7 +98,7 @@ static void print_rq(struct seq_file *m,
 		print_task(m, rq, p);
 	} while_each_thread(g, p);
 
-	read_unlock_irq(&tasklist_lock);
+	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
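The difference between the two locking variants in the patch above can be shown with a toy user-space model of a single CPU's interrupt-enable flag. This is only a sketch: the model_* helpers and the two state_after_* functions are hypothetical names, and no real locking takes place. The point is that the _irq unlock re-enables interrupts unconditionally, while the _irqsave/_irqrestore pair puts back whatever state the caller had.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the interrupt-enable flag of one CPU. */
static bool irqs_enabled = true;

/* Models read_lock_irq(): disables interrupts, remembers nothing. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Models read_unlock_irq(): enables interrupts unconditionally. */
static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Models read_lock_irqsave(): saves the current state, then disables. */
static void model_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

/* Models read_unlock_irqrestore(): restores the saved state. */
static void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

/* Interrupt state after the critical section when the caller entered
 * with interrupts in state "entry", using the _irq variants. */
static bool state_after_irq_variant(bool entry)
{
	irqs_enabled = entry;
	model_lock_irq();
	assert(!irqs_enabled);	/* inside the critical section */
	model_unlock_irq();
	return irqs_enabled;
}

/* Same, using the _irqsave/_irqrestore variants. */
static bool state_after_irqsave_variant(bool entry)
{
	bool flags;

	irqs_enabled = entry;
	model_lock_irqsave(&flags);
	assert(!irqs_enabled);	/* inside the critical section */
	model_unlock_irqrestore(flags);
	return irqs_enabled;
}
```

With entry set to false (the hardirq-context case the patch description mentions), state_after_irq_variant() comes back with interrupts enabled, which is the kind of state change lockdep complains about, while state_after_irqsave_variant() leaves them disabled.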
Re: 2.6.24-rc1 fails with lockup and BUG:
* Romano Giannetti <[EMAIL PROTECTED]> wrote:

> Done. The results are at:
>
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
>
> in the syslog-after-failed-suspend.txt file. After the failed suspend
> (at line 15766) there were a bunch of things in D-state. I have
> left the file intact.
>
> At line 17646 there is:
>
> WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

hm, this lockdep warning caused lockdep to turn itself off - hence we
won't get to the really interesting warnings. We'll try to come up with a
solution for this.

	Ingo
Re: 2.6.24-rc1 fails with lockup and BUG:
On Wed, 2007-10-24 at 16:27 +0200, Ingo Molnar wrote:
>
> CONFIG_PROVE_LOCKING=y
> CONFIG_DEBUG_LIST=y
> CONFIG_FRAME_POINTER=y
> CONFIG_DEBUG_SLAB=y
>
> and please post the resulting dmesg output - does lockdep notice any
> lockup reason? (your backtrace suggests some mutex stuff so it might as
> well detect it)
>

Done. The results are at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/

in the syslog-after-failed-suspend.txt file. After the failed suspend
(at line 15766) there were a bunch of things in D-state. I have
left the file intact.

At line 17646 there is:

WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

I waited a bit and then, in an already-open root shell, ran

s2ram -f -p -m

(line 17811), after which a lot more things happened, and I am somewhat
lost.

I hope this is useful to you.

Romano

--
Sorry for the disclaimer --- ¡I cannot stop it!
Re: 2.6.24-rc1 fails with lockup and BUG:
* Romano Giannetti <[EMAIL PROTECTED]> wrote:

> 2.6.24-rc1 fails for me. I have the sensation it is network-related,
> but I am not sure, so I send this message just to the list. This same
> failure was present in git-5734-gd85714d, I sent a message to the list
> but it seems it never arrived. I hope this will pass through. My
> system is a toshiba satellite A305-S5077, dual core pentium.

could you turn on these in your .config:

CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LIST=y
CONFIG_FRAME_POINTER=y
CONFIG_DEBUG_SLAB=y

and please post the resulting dmesg output - does lockdep notice any
lockup reason? (your backtrace suggests some mutex stuff so it might as
well detect it)

	Ingo
2.6.24-rc1 fails with lockup and BUG:
Hi,

2.6.24-rc1 fails for me. I have the feeling it is network-related, but I am
not sure, so I am sending this message just to the list. The same failure
was present in git-5734-gd85714d; I sent a message to the list about it, but
it seems it never arrived. I hope this one gets through. My system is a
Toshiba Satellite A305-S5077, dual-core Pentium.

The symptoms are quite strange. At boot, NetworkManager fails to activate
my eth0 (r8169). Just stopping and restarting NM makes it work. Then, after
one, two or at most three suspend-to-RAM and resume cycles that work,
everything goes awry. Note that I do not know whether s2ram is the cause or
simply a way to trigger the bug sooner.

The suspend-to-RAM fails with the message:

"gnome-power-manager: (romano) DBUS timed out, but recovering"

and a number of processes go into D state (their sysrq-t traces are a few
lines down). From then on I cannot create new windows or run sudo (sudo
anything goes into D limbo), and I cannot even do a clean shutdown. When I
try, the system loops forever saying:

BUG: soft lockup - CPU#0 stuck for 11s! [ifconfig: 7481]

and sysrq-b is the only option.

Complete dmesg, config, etc. at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_1/

Thanks,
Romano

PS sorry for the disclaimer, I cannot stop it (¡!)

nmbd          D ca9cbea4 0 5464 1
  c256eaa0 0086 0002 ca9cbea4 ca9cbe9c c256ebdc c17fba80 c250e900
  c0426080 c0426080 c22f67d0 c01773a3 0010 c0426080 00013bab 00ff
  c03bc514 c03bc51c c03bc518
Call Trace:
 [cache_alloc_refill+115/1280] cache_alloc_refill+0x73/0x500
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [sock_ioctl+0/560] sock_ioctl+0x0/0x230
 [dev_ioctl+200/1312] dev_ioctl+0xc8/0x520
 [sock_init_data+108/384] sock_init_data+0x6c/0x180
 [inet_create+413/832] inet_create+0x19d/0x340
 [inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
 [d_alloc+265/400] d_alloc+0x109/0x190
 [d_instantiate+59/80] d_instantiate+0x3b/0x50
 [udp_ioctl+0/160] udp_ioctl+0x0/0xa0
 [inet_ioctl+58/192] inet_ioctl+0x3a/0xc0
 [sock_ioctl+207/560] sock_ioctl+0xcf/0x230
 [sock_ioctl+0/560] sock_ioctl+0x0/0x230
 [do_ioctl+43/144] do_ioctl+0x2b/0x90
 [sys_socket+41/80] sys_socket+0x29/0x50
 [vfs_ioctl+92/656] vfs_ioctl+0x5c/0x290
 [sys_ioctl+61/112] sys_ioctl+0x3d/0x70
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 ===
x-session-man D c2c49d28 0 5774 5246
  c1c4d000 00200082 0002 c2c49d28 c2c49d20 c1c4d13c c1803a80 c2c25c80
  c0426080 c0426080 cab12550 c011d3c2 c03a13a0 c0426080 00016269 00ff
  c03bc514 c03bc51c c03bc518
Call Trace:
 [enqueue_task+18/48] enqueue_task+0x12/0x30
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
 [netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
 [netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
 [__wake_up_sync+65/128] __wake_up_sync+0x41/0x80
 [sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
 [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
 [__wake_up+62/96] __wake_up+0x3e/0x60
 [netlink_insert+197/320] netlink_insert+0xc5/0x140
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [sys_sendto+307/384] sys_sendto+0x133/0x180
 [move_addr_to_user+95/112] move_addr_to_user+0x5f/0x70
 [sys_getsockname+205/208] sys_getsockname+0xcd/0xd0
 [__netlink_create+97/176] __netlink_create+0x61/0xb0
 [inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
 [d_alloc+265/400] d_alloc+0x109/0x190
 [d_instantiate+59/80] d_instantiate+0x3b/0x50
 [sock_attach_fd+128/192] sock_attach_fd+0x80/0xc0
 [sys_socketcall+408/640] sys_socketcall+0x198/0x280
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 [clip_device_event+16/160] clip_device_event+0x10/0xa0
 ===
ip            D cb573d28 0 7487 7486
  c1dc7aa0 0082 0002 cb573d28 cb573d20 c1dc7bdc c1803a80 c27ed740
  c0426080 c0426080 001280d2 c218eeb0 c218ef54 c0426080 00010bb9 00ff
  c03bc514 c03bc51c c03bc518
Call Trace:
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
 [netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
 [netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
 [do_lookup+101/400] do_lookup+0x65/0x190
 [sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
 [<f88fc042>] __ext3_journal_dirty_metadata+0x22/0x60 [ext3]
 [<f88de999>] journal_get_write_access+0x29/0x40 [jbd]
 [autoremove_wake_function+0/80]
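Tasks stuck in D state, like the three traced above, can also be spotted from user space by reading the state field of /proc/<pid>/stat. The following is a minimal sketch of the parsing step only; stat_line_state and is_uninterruptible are hypothetical helper names, and walking /proc and reading each file is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the state letter from one line of /proc/<pid>/stat, or '\0'
 * if the line is malformed. The line has the form
 * "pid (comm) state ...". Because comm may itself contain spaces or
 * ')', the state is located relative to the LAST ')' in the line. */
static char stat_line_state(const char *line)
{
	const char *close;

	assert(line != NULL);
	close = strrchr(line, ')');
	if (close == NULL || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/* Nonzero if the task is in uninterruptible sleep (shown as "D"). */
static int is_uninterruptible(const char *line)
{
	return stat_line_state(line) == 'D';
}
```

A caller would read each /proc/<pid>/stat file into a buffer and pass it to is_uninterruptible(); a line such as "5464 (nmbd) D 1 ..." would be reported as stuck.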