Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-26 Thread Romano Giannetti

On Wed, 2007-10-24 at 12:44 -0400, Joseph Fannin wrote:
> On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
> >
> >
> Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
> ago (which DaveM has queued).
> 
> Since netlink is involved in the traces you sent, this might do something
> for the other too.

This *does* fix the "network manager needs to be restarted at boot" part
of the problem, but leaves the worst one untouched (the failed suspend to
RAM and the bugs that follow it).

Romano 

-- 
Sorry for the disclaimer --- ¡I cannot stop it!



--
This communication contains confidential information. It is for the exclusive 
use of the intended addressee. If you are not the intended addressee, please 
note that any form of distribution, copying or use of this communication or the 
information in it is strictly prohibited by law. If you have received this 
communication in error, please immediately notify the sender by reply e-mail 
and destroy this message. Thank you for your cooperation. 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-25 Thread Romano Giannetti

On Wed, 2007-10-24 at 18:11 +0200, Peter Zijlstra wrote:
> On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> > 
> > hm, this lockdep warning caused lockdep to turn itself off, hence we
> > won't get to the really interesting warnings. We'll try to come up with a
> > solution for this.
> 
> Does this help?

I tried this, but although I still get the D-state processes, I no longer
see any debug trace. Results are at:

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_3/

Is there anything else I can try? This is quite a show-stopper for me,
and I would rather not resort to bisecting 11 MB of patches...
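For scale: `git bisect` roughly halves the set of candidate commits with every kernel tested, so N candidates cost about ceil(log2 N) build-and-test cycles. Below is a minimal C sketch of that arithmetic. `bisect_steps()` is a hypothetical helper. The commit count used in the example afterwards is an illustrative assumption, not the real size of the 2.6.24-rc1 merge window.

```c
#include <assert.h>

/*
 * Hypothetical helper: worst-case number of kernels to build and test
 * when git bisect has n candidate commits left. Each test is assumed to
 * leave at most ceil(n / 2) candidates, as when the midpoint is tested.
 */
unsigned bisect_steps(unsigned long n)
{
	unsigned steps = 0;

	assert(n >= 1);
	while (n > 1) {
		n = (n + 1) / 2;	/* ceil(n / 2) candidates remain */
		steps++;
	}
	return steps;
}
```

For example, an assumed 7000 candidate commits gives `bisect_steps(7000) == 13`. That means roughly thirteen kernels to build, boot, and put through a few suspend/resume cycles each.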

Romano 




Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Joseph Fannin
On Wed, Oct 24, 2007 at 03:25:44PM +0200, Romano Giannetti wrote:
>
> Hi,
>
> 2.6.24-rc1 fails for me. I have the feeling it is network-related, but
> I am not sure, so I am sending this message just to the list.
> The same failure was present in git-5734-gd85714d; I sent a message
> to the list about it, but it seems it never arrived. I hope this one gets
> through. My system is a Toshiba Satellite A305-S5077 with a dual-core Pentium.
>
> The symptoms are quite strange. At boot, NetworkManager fails to activate
> my eth0 (r8169). Just stopping and restarting NM makes it work.


Denis V. Lunev wrote a patch for the NetworkManager thing a day or two
ago (which DaveM has queued).

Since netlink is involved in the traces you sent, this might do something
for the other too.

The patch I received follows:


> Revert to the original netlink behavior. Do not reply with an ACK if the
> netlink dump has been successfully started.

> libnl has been broken by commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0.
> The following command reproduces the problem:
>/nl-route-get 192.168.1.1

> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>



diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 98e313e..44a8b41 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1565,7 +1565,10 @@ int netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	netlink_dump(sk);
 	sock_put(sk);
-	return 0;
+
+	/* We successfully started a dump, by returning -EINTR we
+	 * signal not to send ACK even if it was requested */
+	return -EINTR;
 }
 
 void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err)
@@ -1619,17 +1622,21 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 
 		/* Only requests are handled by the kernel */
 		if (!(nlh->nlmsg_flags & NLM_F_REQUEST))
-			goto skip;
+			goto ack;
 
 		/* Skip control messages */
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
-			goto skip;
+			goto ack;
 
 		err = cb(skb, nlh);
-skip:
+		if (err == -EINTR)
+			goto skip;
+
+ack:
 		if (nlh->nlmsg_flags & NLM_F_ACK || err)
 			netlink_ack(skb, nlh, err);
 
+skip:
 		msglen = NLMSG_ALIGN(nlh->nlmsg_len);
 		if (msglen > skb->len)
 			msglen = skb->len;
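The patch turns the return value of `netlink_dump_start()` into a signal. `-EINTR` now means "a dump was started, do not send an ACK". Every other result keeps the old ACK rules: ACK when `NLM_F_ACK` is set or when there is an error.

Below is a minimal userspace sketch of that decision. It assumes the flag and type constants from the userspace `<linux/netlink.h>` header. `netlink_should_ack()` and its `handler_ret` parameter are hypothetical stand-ins for the inline logic in `netlink_rcv_skb()` and for the value returned by `cb(skb, nlh)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <linux/netlink.h>	/* NLM_F_REQUEST, NLM_F_ACK, NLMSG_MIN_TYPE */

/*
 * Hypothetical userspace model of the per-message ACK decision made by
 * netlink_rcv_skb() once the patch above is applied; this is not kernel code.
 * handler_ret stands in for the value returned by cb(skb, nlh).
 */
bool netlink_should_ack(unsigned short flags, unsigned short type,
			int handler_ret)
{
	int err = 0;

	if (!(flags & NLM_F_REQUEST))
		goto ack;	/* not a request: handler is not called */
	if (type < NLMSG_MIN_TYPE)
		goto ack;	/* control message: handler is not called */

	assert(handler_ret <= 0);	/* handlers return 0 or -errno */
	err = handler_ret;
	if (err == -EINTR)
		return false;	/* dump started: no ACK, even if NLM_F_ACK is set */
ack:
	return (flags & NLM_F_ACK) || err;
}
```

A dump request that also sets `NLM_F_ACK` therefore gets no ACK once the dump has started. A failing request is still answered with its error even without `NLM_F_ACK`.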





--
Joseph Fannin
[EMAIL PROTECTED]


Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Peter Zijlstra
On Wed, 2007-10-24 at 17:55 +0200, Ingo Molnar wrote:
> * Romano Giannetti <[EMAIL PROTECTED]> wrote:
> 
> > Done. The results are at:
> > 
> > http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
> > 
> > in the syslog-after-failed-suspend.txt file. After the failed suspend
> > (at line 15766) there were a bunch of things in D state. I have
> > left the file intact.
> > 
> > At line 17646 there is:
> > 
> > WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
> 
> hm, this lockdep warning caused lockdep to turn itself off, hence we
> won't get to the really interesting warnings. We'll try to come up with a
> solution for this.

Does this help?

---
Subject: lockdep: invalid irq usage

This function can be called from hardirq context.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---

Index: linux-2.6-2/kernel/sched_debug.c
===================================================================
--- linux-2.6-2.orig/kernel/sched_debug.c
+++ linux-2.6-2/kernel/sched_debug.c
@@ -80,6 +80,7 @@ print_task(struct seq_file *m, struct rq
 static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 {
 	struct task_struct *g, *p;
+	unsigned long flags;
 
 	SEQ_printf(m,
 	"\nrunnable tasks:\n"
@@ -88,7 +89,7 @@ static void print_rq(struct seq_file *m,
 	"--"
 	"\n");
 
-	read_lock_irq(&tasklist_lock);
+	read_lock_irqsave(&tasklist_lock, flags);
 
 	do_each_thread(g, p) {
 		if (!p->se.on_rq || task_cpu(p) != rq_cpu)
@@ -97,7 +98,7 @@ static void print_rq(struct seq_file *m,
 		print_task(m, rq, p);
 	} while_each_thread(g, p);
 
-	read_unlock_irq(&tasklist_lock);
+	read_unlock_irqrestore(&tasklist_lock, flags);
 }
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
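Peter's one-line justification rests on the difference between the two lock variants. `read_unlock_irq()` re-enables interrupts unconditionally, whereas `read_unlock_irqrestore()` restores whatever state `read_lock_irqsave()` saved. When called from hardirq context, where interrupts are off, the `_irq` variant would turn them back on behind the caller's back. That appears to be what the `trace_hardirqs_on()` warning at kernel/lockdep.c:2033 was flagging.

Below is a toy model of just the interrupt flag. All names are hypothetical, and no real locking is done.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model of the local "interrupts enabled" flag.
 * The real read_lock_irq()/read_lock_irqsave() also take tasklist_lock;
 * only their effect on the interrupt flag is modelled here.
 */
static bool irqs_enabled = true;

static void model_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_unlock_irq(void)
{
	assert(!irqs_enabled);
	irqs_enabled = true;		/* unconditionally re-enables */
}

static void model_lock_irqsave(unsigned long *flags)
{
	*flags = irqs_enabled;		/* remember the caller's state */
	irqs_enabled = false;
}

static void model_unlock_irqrestore(unsigned long flags)
{
	assert(!irqs_enabled);
	irqs_enabled = flags != 0;	/* put back whatever the caller had */
}

/*
 * Run one critical section from a context that already has interrupts
 * disabled (as a hardirq handler does) and report whether they are still
 * disabled afterwards.
 */
bool irqs_still_off_after(bool use_irqsave)
{
	unsigned long flags;

	irqs_enabled = false;
	if (use_irqsave) {
		model_lock_irqsave(&flags);
		model_unlock_irqrestore(flags);
	} else {
		model_lock_irq();
		model_unlock_irq();
	}
	return !irqs_enabled;
}
```

`irqs_still_off_after(true)` returns true, and `irqs_still_off_after(false)` returns false. Only the save/restore pair leaves a caller that entered with interrupts disabled in the state it expects.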





Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Ingo Molnar

* Romano Giannetti <[EMAIL PROTECTED]> wrote:

> Done. The results are at:
> 
> http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/
> 
> in the syslog-after-failed-suspend.txt file. After the failed suspend
> (at line 15766) there were a bunch of things in D state. I have
> left the file intact.
> 
> At line 17646 there is:
> 
> WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

hm, this lockdep warning caused lockdep to turn itself off, hence we
won't get to the really interesting warnings. We'll try to come up with a
solution for this.
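Ingo's remark reflects how lockdep reports problems: it is one-shot. The first warning clears a global switch (`debug_locks` in the kernel), so nothing further is checked. The `trace_hardirqs_on()` warning therefore used up the only report. Any later lock problem, such as one that might explain the D-state tasks, could no longer be flagged.

Below is a toy model of that behaviour. `report_lock_problem()` and `checking_enabled` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model of lockdep's one-shot behaviour: a single global
 * switch (debug_locks in the kernel) is cleared by the first report, and
 * every later problem is silently ignored.
 */
static int checking_enabled = 1;

/* Returns 1 if this problem was reported, 0 if checking was already off. */
int report_lock_problem(const char *what)
{
	assert(what != NULL);
	if (!checking_enabled)
		return 0;		/* already turned itself off */
	checking_enabled = 0;		/* the first warning disables all checks */
	fprintf(stderr, "model: %s; further lock checks disabled\n", what);
	return 1;
}
```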

Ingo


Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Romano Giannetti

On Wed, 2007-10-24 at 16:27 +0200, Ingo Molnar wrote:

> 
>   CONFIG_PROVE_LOCKING=y
>   CONFIG_DEBUG_LIST=y
>   CONFIG_FRAME_POINTER=y
>   CONFIG_DEBUG_SLAB=y
> 
> and please post the resulting dmesg output: does lockdep notice any
> reason for the lockup? (your backtrace suggests some mutex stuff, so it
> might well detect it)
> 

Done. The results are at: 

http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_2/

in the syslog-after-failed-suspend.txt file. After the failed suspend
(at line 15766) there were a bunch of things in D state. I have left
the file intact.

At line 17646 there is:

WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()

I waited a bit and then, from an already-open root shell, ran
s2ram -f -p -m  (line 17811)

and then a lot more things happened, and I am somewhat lost.

I hope this is useful to you.

Romano 



Re: 2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Ingo Molnar

* Romano Giannetti <[EMAIL PROTECTED]> wrote:

> 2.6.24-rc1 fails for me. I have the feeling it is network-related,
> but I am not sure, so I am sending this message just to the list. The
> same failure was present in git-5734-gd85714d; I sent a message to the
> list about it, but it seems it never arrived. I hope this one gets
> through. My system is a Toshiba Satellite A305-S5077 with a dual-core
> Pentium.

could you turn these on in your .config:

  CONFIG_PROVE_LOCKING=y
  CONFIG_DEBUG_LIST=y
  CONFIG_FRAME_POINTER=y
  CONFIG_DEBUG_SLAB=y

and please post the resulting dmesg output: does lockdep notice any
reason for the lockup? (your backtrace suggests some mutex stuff, so it
might well detect it)
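Before rebuilding, it may be worth confirming that all four options actually survived into the generated `.config`. `make oldconfig` drops an option whose dependencies are not met; for example, `CONFIG_PROVE_LOCKING` requires `CONFIG_DEBUG_KERNEL`.

Below is a minimal sketch. It assumes the usual `.config` line formats `CONFIG_FOO=y` and `# CONFIG_FOO is not set`. `config_enabled()` is a hypothetical helper, not part of the kernel build system.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if the text of a .config file contains the
 * exact line "<opt>=y". Disabled options appear as "# <opt> is not set".
 */
bool config_enabled(const char *config, const char *opt)
{
	const char *line = config;
	size_t len;

	assert(config != NULL && opt != NULL);
	len = strlen(opt);
	while (line != NULL && *line != '\0') {
		/* match "<opt>=y" followed by end of line or end of text,
		 * so CONFIG_DEBUG_SLAB does not match CONFIG_DEBUG_SLAB_LEAK=y */
		if (strncmp(line, opt, len) == 0 &&
		    strncmp(line + len, "=y", 2) == 0 &&
		    (line[len + 2] == '\n' || line[len + 2] == '\0'))
			return true;
		line = strchr(line, '\n');
		if (line != NULL)
			line++;		/* skip to the start of the next line */
	}
	return false;
}
```

To use it, read the file into memory and call `config_enabled()` once per option. A plain `grep` of the file does the same job from the shell.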

Ingo


2.6.24-rc1 fails with lockup and BUG:

2007-10-24 Thread Romano Giannetti

Hi,

2.6.24-rc1 fails for me. I have the feeling it is network-related, but
I am not sure, so I am sending this message just to the list.
The same failure was present in git-5734-gd85714d; I sent a message
to the list about it, but it seems it never arrived. I hope this one gets
through. My system is a Toshiba Satellite A305-S5077 with a dual-core Pentium.

The symptoms are quite strange. At boot, NetworkManager fails to activate
my eth0 (r8169). Just stopping and restarting NM makes it work.

Then, after one, two, or at most three suspend-to-RAM/resume cycles that
work, everything goes awry. Note that I do not know whether s2ram is the
cause or simply a way to trigger the bug sooner.

The suspend to RAM fails with the message:

"gnome-power-manager: (romano) DBUS timed out, but recovering"

and a number of processes go into D state (their sysrq-t traces are a
few lines down). After that I cannot create new windows, cannot use sudo
(any sudo command goes into D-state limbo), and cannot even shut down
cleanly. If I try, the system loops forever saying:

BUG: soft lockup - CPU#0 stuck for 11s! [ifconfig: 7481]

and sysrq-b is the only option. 
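The "D" in these reports means uninterruptible sleep. The kernel exposes the same letter as the third field of `/proc/<pid>/stat`, so stuck tasks can also be listed without sysrq-t.

Below is a minimal sketch of pulling that field out of one stat line. `proc_stat_state()` is a hypothetical helper. Walking `/proc` to read the files is left out to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: extract the one-letter scheduler state from a line
 * of /proc/<pid>/stat, whose format begins "pid (comm) state ...".
 * The command name may itself contain spaces or ')', so search for the
 * last ')' rather than splitting on spaces. Returns '?' on malformed input.
 */
char proc_stat_state(const char *stat_line)
{
	const char *paren;

	assert(stat_line != NULL);
	paren = strrchr(stat_line, ')');
	if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
		return '?';
	return paren[2];
}
```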

Complete dmesg, config, etc. are at:
http://www.dea.icai.upcomillas.es/romano/linux/info/2624rc1_1/

Thanks, 

Romano 

PS sorry for the disclaimer, I cannot stop it (¡!) 

nmbd  D ca9cbea4 0  5464  1
   c256eaa0 0086 0002 ca9cbea4 ca9cbe9c  c256ebdc c17fba80 
   c250e900 c0426080 c0426080 c22f67d0 c01773a3 0010 c0426080 00013bab 
    00ff    c03bc514 c03bc51c c03bc518 
Call Trace:
 [cache_alloc_refill+115/1280] cache_alloc_refill+0x73/0x500
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [sock_ioctl+0/560] sock_ioctl+0x0/0x230
 [dev_ioctl+200/1312] dev_ioctl+0xc8/0x520
 [sock_init_data+108/384] sock_init_data+0x6c/0x180
 [inet_create+413/832] inet_create+0x19d/0x340
 [inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
 [d_alloc+265/400] d_alloc+0x109/0x190
 [d_instantiate+59/80] d_instantiate+0x3b/0x50
 [udp_ioctl+0/160] udp_ioctl+0x0/0xa0
 [inet_ioctl+58/192] inet_ioctl+0x3a/0xc0
 [sock_ioctl+207/560] sock_ioctl+0xcf/0x230
 [sock_ioctl+0/560] sock_ioctl+0x0/0x230
 [do_ioctl+43/144] do_ioctl+0x2b/0x90
 [sys_socket+41/80] sys_socket+0x29/0x50
 [vfs_ioctl+92/656] vfs_ioctl+0x5c/0x290
 [sys_ioctl+61/112] sys_ioctl+0x3d/0x70
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 ===

x-session-man D c2c49d28 0  5774   5246
   c1c4d000 00200082 0002 c2c49d28 c2c49d20  c1c4d13c c1803a80 
   c2c25c80 c0426080 c0426080 cab12550 c011d3c2 c03a13a0 c0426080 00016269 
    00ff    c03bc514 c03bc51c c03bc518 
Call Trace:
 [enqueue_task+18/48] enqueue_task+0x12/0x30
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
 [netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
 [netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
 [__wake_up_sync+65/128] __wake_up_sync+0x41/0x80
 [sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
 [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
 [__wake_up+62/96] __wake_up+0x3e/0x60
 [netlink_insert+197/320] netlink_insert+0xc5/0x140
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [sys_sendto+307/384] sys_sendto+0x133/0x180
 [move_addr_to_user+95/112] move_addr_to_user+0x5f/0x70
 [sys_getsockname+205/208] sys_getsockname+0xcd/0xd0
 [__netlink_create+97/176] __netlink_create+0x61/0xb0
 [inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
 [d_alloc+265/400] d_alloc+0x109/0x190
 [d_instantiate+59/80] d_instantiate+0x3b/0x50
 [sock_attach_fd+128/192] sock_attach_fd+0x80/0xc0
 [sys_socketcall+408/640] sys_socketcall+0x198/0x280
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 [clip_device_event+16/160] clip_device_event+0x10/0xa0
 ===

ip    D cb573d28 0  7487   7486
   c1dc7aa0 0082 0002 cb573d28 cb573d20  c1dc7bdc c1803a80 
   c27ed740 c0426080 c0426080 001280d2 c218eeb0 c218ef54 c0426080 00010bb9 
    00ff    c03bc514 c03bc51c c03bc518 
Call Trace:
 [__mutex_lock_slowpath+85/144] __mutex_lock_slowpath+0x55/0x90
 [mutex_lock+20/32] mutex_lock+0x14/0x20
 [rtnetlink_rcv+8/32] rtnetlink_rcv+0x8/0x20
 [netlink_unicast+502/544] netlink_unicast+0x1f6/0x220
 [copy_from_user+46/112] copy_from_user+0x2e/0x70
 [memcpy_fromiovec+56/80] memcpy_fromiovec+0x38/0x50
 [netlink_sendmsg+488/720] netlink_sendmsg+0x1e8/0x2d0
 [do_lookup+101/400] do_lookup+0x65/0x190
 [sock_sendmsg+206/256] sock_sendmsg+0xce/0x100
 [f88fc042] __ext3_journal_dirty_metadata+0x22/0x60 [ext3]
 [f88de999] journal_get_write_access+0x29/0x40 [jbd]
 [autoremove_wake_function+0/80] 
