Re: [BUG] Hang (deadlock?) inside iscsi_if_rx

2012-01-25 Thread Sasha Levin
On Tue, Jan 24, 2012 at 11:29 AM, Mike Christie micha...@cs.wisc.edu wrote:
 On 01/24/2012 08:30 AM, Sasha Levin wrote:
 Hi all,

 I got the following hang when running trinity under KVM tool:

 [ 2520.824250] INFO: task trinity:31986 blocked for more than 120 seconds.
 [ 2520.825513] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 2520.826983] trinity         D   5504 31986   2767 
 0x0004
 [ 2520.828416]  88000a21ba98 0086 88000a3f8000 
 88000a3f8000
 [ 2520.829892]  001d4340 88000a21bfd8 88000a21a000 
 001d4340
 [ 2520.831380]  001d4340 001d4340 88000a21bfd8 
 001d4340
 [ 2520.832838] Call Trace:
 [ 2520.837089]  [8265297a] schedule+0x3a/0x50
 [ 2520.837983]  [82651049] __mutex_lock_common+0x209/0x5b0
 [ 2520.839181]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.840232]  [81053943] ? sched_clock+0x13/0x20
 [ 2520.841242]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.842271]  [82651430] mutex_lock_nested+0x40/0x50
 [ 2520.843353]  [81af64f3] iscsi_if_rx+0x23/0xa00
 [ 2520.844461]  [810dfffd] ? sub_preempt_count+0x9d/0xd0
 [ 2520.845576]  [82654130] ? _raw_read_unlock+0x30/0x60
 [ 2520.846705]  [8221632e] netlink_unicast+0x1ae/0x1f0
 [ 2520.847790]  [82216ae7] netlink_sendmsg+0x227/0x350
 [ 2520.848884]  [821a490d] ? sock_update_netprioidx+0xdd/0x1b0
 [ 2520.850105]  [821a4882] ? sock_update_netprioidx+0x52/0x1b0
 [ 2520.851282]  [8219e376] sock_aio_write+0x166/0x180
 [ 2520.852317]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.853342]  [811d0b1a] do_sync_write+0xda/0x120
 [ 2520.854347]  [81109c42] ? lock_release_holdtime+0xb2/0x160
 [ 2520.855533]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.856569]  [817914e7] ? security_file_permission+0x27/0xb0
 [ 2520.857756]  [811d0ccc] vfs_write+0x16c/0x180
 [ 2520.858713]  [811d0dcf] sys_write+0x4f/0xa0
 [ 2520.859680]  [826553f9] system_call_fastpath+0x16/0x1b
 [ 2520.873039] 1 lock held by trinity/31986:
 [ 2520.873801]  #0:  (rx_queue_mutex){+.+...}, at: [81af64f3] 
 iscsi_if_rx+0x23/0xa00
 [ 2520.875587] Kernel panic - not syncing: hung_task: blocked tasks

 From looking at the logs, there were two instances of the fuzzer blocked on 
 a write() to a netlink socket. There are no iSCSI targets anywhere on the 
 network.


 What does the trinity test do? Does it write random junk to netlink
 sockets to test the kernel handlers or were you actually stressing the
 creation of targets. If the latter what driver were you trying to create
 targets with (iscsi_tcp, bnx2i, etc)?

It basically just writes junk into sockets. In this case it was just a
large chunk of it written into the netlink socket.

I'm not directly trying to stress iscsi, though it might have happened
somehow during the test accidentally (but thats not probable).

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



[BUG] Hang (deadlock?) inside iscsi_if_rx

2012-01-25 Thread Sasha Levin
Hi all,

I got the following hang when running trinity under KVM tool:

[ 2520.824250] INFO: task trinity:31986 blocked for more than 120 seconds.
[ 2520.825513] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this 
message.
[ 2520.826983] trinity D   5504 31986   2767 0x0004
[ 2520.828416]  88000a21ba98 0086 88000a3f8000 
88000a3f8000
[ 2520.829892]  001d4340 88000a21bfd8 88000a21a000 
001d4340
[ 2520.831380]  001d4340 001d4340 88000a21bfd8 
001d4340
[ 2520.832838] Call Trace:
[ 2520.837089]  [8265297a] schedule+0x3a/0x50
[ 2520.837983]  [82651049] __mutex_lock_common+0x209/0x5b0
[ 2520.839181]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
[ 2520.840232]  [81053943] ? sched_clock+0x13/0x20
[ 2520.841242]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
[ 2520.842271]  [82651430] mutex_lock_nested+0x40/0x50
[ 2520.843353]  [81af64f3] iscsi_if_rx+0x23/0xa00
[ 2520.844461]  [810dfffd] ? sub_preempt_count+0x9d/0xd0
[ 2520.845576]  [82654130] ? _raw_read_unlock+0x30/0x60
[ 2520.846705]  [8221632e] netlink_unicast+0x1ae/0x1f0
[ 2520.847790]  [82216ae7] netlink_sendmsg+0x227/0x350
[ 2520.848884]  [821a490d] ? sock_update_netprioidx+0xdd/0x1b0
[ 2520.850105]  [821a4882] ? sock_update_netprioidx+0x52/0x1b0
[ 2520.851282]  [8219e376] sock_aio_write+0x166/0x180
[ 2520.852317]  [810dfe51] ? get_parent_ip+0x11/0x50
[ 2520.853342]  [811d0b1a] do_sync_write+0xda/0x120
[ 2520.854347]  [81109c42] ? lock_release_holdtime+0xb2/0x160
[ 2520.855533]  [810dfe51] ? get_parent_ip+0x11/0x50
[ 2520.856569]  [817914e7] ? security_file_permission+0x27/0xb0
[ 2520.857756]  [811d0ccc] vfs_write+0x16c/0x180
[ 2520.858713]  [811d0dcf] sys_write+0x4f/0xa0
[ 2520.859680]  [826553f9] system_call_fastpath+0x16/0x1b
[ 2520.873039] 1 lock held by trinity/31986:
[ 2520.873801]  #0:  (rx_queue_mutex){+.+...}, at: [81af64f3] 
iscsi_if_rx+0x23/0xa00
[ 2520.875587] Kernel panic - not syncing: hung_task: blocked tasks

From looking at the logs, there were two instances of the fuzzer blocked on a 
write() to a netlink socket. There are no iSCSI targets anywhere on the 
network.

Thanks.

-- 

Sasha.


-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: [BUG] Hang (deadlock?) inside iscsi_if_rx

2012-01-25 Thread Mike Christie
On 01/24/2012 10:51 AM, Sasha Levin wrote:
 On Tue, Jan 24, 2012 at 11:29 AM, Mike Christie micha...@cs.wisc.edu wrote:
 On 01/24/2012 08:30 AM, Sasha Levin wrote:
 Hi all,

 I got the following hang when running trinity under KVM tool:

 [ 2520.824250] INFO: task trinity:31986 blocked for more than 120 seconds.
 [ 2520.825513] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 2520.826983] trinity D   5504 31986   2767 
 0x0004
 [ 2520.828416]  88000a21ba98 0086 88000a3f8000 
 88000a3f8000
 [ 2520.829892]  001d4340 88000a21bfd8 88000a21a000 
 001d4340
 [ 2520.831380]  001d4340 001d4340 88000a21bfd8 
 001d4340
 [ 2520.832838] Call Trace:
 [ 2520.837089]  [8265297a] schedule+0x3a/0x50
 [ 2520.837983]  [82651049] __mutex_lock_common+0x209/0x5b0
 [ 2520.839181]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.840232]  [81053943] ? sched_clock+0x13/0x20
 [ 2520.841242]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.842271]  [82651430] mutex_lock_nested+0x40/0x50
 [ 2520.843353]  [81af64f3] iscsi_if_rx+0x23/0xa00
 [ 2520.844461]  [810dfffd] ? sub_preempt_count+0x9d/0xd0
 [ 2520.845576]  [82654130] ? _raw_read_unlock+0x30/0x60
 [ 2520.846705]  [8221632e] netlink_unicast+0x1ae/0x1f0
 [ 2520.847790]  [82216ae7] netlink_sendmsg+0x227/0x350
 [ 2520.848884]  [821a490d] ? sock_update_netprioidx+0xdd/0x1b0
 [ 2520.850105]  [821a4882] ? sock_update_netprioidx+0x52/0x1b0
 [ 2520.851282]  [8219e376] sock_aio_write+0x166/0x180
 [ 2520.852317]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.853342]  [811d0b1a] do_sync_write+0xda/0x120
 [ 2520.854347]  [81109c42] ? lock_release_holdtime+0xb2/0x160
 [ 2520.855533]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.856569]  [817914e7] ? security_file_permission+0x27/0xb0
 [ 2520.857756]  [811d0ccc] vfs_write+0x16c/0x180
 [ 2520.858713]  [811d0dcf] sys_write+0x4f/0xa0
 [ 2520.859680]  [826553f9] system_call_fastpath+0x16/0x1b
 [ 2520.873039] 1 lock held by trinity/31986:
 [ 2520.873801]  #0:  (rx_queue_mutex){+.+...}, at: [81af64f3] 
 iscsi_if_rx+0x23/0xa00
 [ 2520.875587] Kernel panic - not syncing: hung_task: blocked tasks

 From looking at the logs, there were two instances of the fuzzer blocked on 
 a write() to a netlink socket. There are no iSCSI targets anywhere on the 
 network.


 What does the trinity test do? Does it write random junk to netlink
 sockets to test the kernel handlers or were you actually stressing the
 creation of targets. If the latter what driver were you trying to create
 targets with (iscsi_tcp, bnx2i, etc)?
 
 It basically just writes junk into sockets. In this case it was just a
 large chunk of it written into the netlink socket.
 

Could you tell me what arguments you used?

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: [BUG] Hang (deadlock?) inside iscsi_if_rx

2012-01-25 Thread Mike Christie
On 01/25/2012 08:17 PM, Mike Christie wrote:
 On 01/24/2012 10:51 AM, Sasha Levin wrote:
 On Tue, Jan 24, 2012 at 11:29 AM, Mike Christie micha...@cs.wisc.edu wrote:
 On 01/24/2012 08:30 AM, Sasha Levin wrote:
 Hi all,

 I got the following hang when running trinity under KVM tool:

 [ 2520.824250] INFO: task trinity:31986 blocked for more than 120 seconds.
 [ 2520.825513] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 2520.826983] trinity D   5504 31986   2767 
 0x0004
 [ 2520.828416]  88000a21ba98 0086 88000a3f8000 
 88000a3f8000
 [ 2520.829892]  001d4340 88000a21bfd8 88000a21a000 
 001d4340
 [ 2520.831380]  001d4340 001d4340 88000a21bfd8 
 001d4340
 [ 2520.832838] Call Trace:
 [ 2520.837089]  [8265297a] schedule+0x3a/0x50
 [ 2520.837983]  [82651049] __mutex_lock_common+0x209/0x5b0
 [ 2520.839181]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.840232]  [81053943] ? sched_clock+0x13/0x20
 [ 2520.841242]  [81af64f3] ? iscsi_if_rx+0x23/0xa00
 [ 2520.842271]  [82651430] mutex_lock_nested+0x40/0x50
 [ 2520.843353]  [81af64f3] iscsi_if_rx+0x23/0xa00
 [ 2520.844461]  [810dfffd] ? sub_preempt_count+0x9d/0xd0
 [ 2520.845576]  [82654130] ? _raw_read_unlock+0x30/0x60
 [ 2520.846705]  [8221632e] netlink_unicast+0x1ae/0x1f0
 [ 2520.847790]  [82216ae7] netlink_sendmsg+0x227/0x350
 [ 2520.848884]  [821a490d] ? sock_update_netprioidx+0xdd/0x1b0
 [ 2520.850105]  [821a4882] ? sock_update_netprioidx+0x52/0x1b0
 [ 2520.851282]  [8219e376] sock_aio_write+0x166/0x180
 [ 2520.852317]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.853342]  [811d0b1a] do_sync_write+0xda/0x120
 [ 2520.854347]  [81109c42] ? lock_release_holdtime+0xb2/0x160
 [ 2520.855533]  [810dfe51] ? get_parent_ip+0x11/0x50
 [ 2520.856569]  [817914e7] ? security_file_permission+0x27/0xb0
 [ 2520.857756]  [811d0ccc] vfs_write+0x16c/0x180
 [ 2520.858713]  [811d0dcf] sys_write+0x4f/0xa0
 [ 2520.859680]  [826553f9] system_call_fastpath+0x16/0x1b
 [ 2520.873039] 1 lock held by trinity/31986:
 [ 2520.873801]  #0:  (rx_queue_mutex){+.+...}, at: [81af64f3] 
 iscsi_if_rx+0x23/0xa00
 [ 2520.875587] Kernel panic - not syncing: hung_task: blocked tasks

 From looking at the logs, there were two instances of the fuzzer blocked 
 on a write() to a netlink socket. There are no iSCSI targets anywhere on 
 the network.


 What does the trinity test do? Does it write random junk to netlink
 sockets to test the kernel handlers or were you actually stressing the
 creation of targets. If the latter what driver were you trying to create
 targets with (iscsi_tcp, bnx2i, etc)?

 It basically just writes junk into sockets. In this case it was just a
 large chunk of it written into the netlink socket.

 
 Could you tell me what arguments you used?

And what kernel was this with?

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: [PATCH] iscsi: don't hang in endless loop if no targets present

2012-01-25 Thread Mike Christie
On 01/25/2012 09:16 PM, Sasha Levin wrote:
 iscsi_if_send_reply() may return -ESRCH if there were no targets to send
 data to. Currently we're ignoring this value and looping in attempt to do it
 over and over, which will usually lead in a hung task like this one:
 
 [ 4920.817298] INFO: task trinity:9074 blocked for more than 120 seconds.
 [ 4920.818527] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 4920.819982] trinity D   5504  9074   2756 
 0x0004
 [ 4920.825374]  880003961a98 0086 881aa000 
 881aa000
 [ 4920.826791]  001d4340 880003961fd8 88000396 
 001d4340
 [ 4920.828241]  001d4340 001d4340 880003961fd8 
 001d4340
 [ 4920.833231]
 [ 4920.833519] Call Trace:
 [ 4920.834010]  [826363fa] schedule+0x3a/0x50
 [ 4920.834953]  [82634ac9] __mutex_lock_common+0x209/0x5b0
 [ 4920.836226]  [81af805d] ? iscsi_if_rx+0x2d/0x990
 [ 4920.837281]  [81053943] ? sched_clock+0x13/0x20
 [ 4920.838305]  [81af805d] ? iscsi_if_rx+0x2d/0x990
 [ 4920.839336]  [82634eb0] mutex_lock_nested+0x40/0x50
 [ 4920.840423]  [81af805d] iscsi_if_rx+0x2d/0x990
 [ 4920.841434]  [810dffed] ? sub_preempt_count+0x9d/0xd0
 [ 4920.842548]  [82637bb0] ? _raw_read_unlock+0x30/0x60
 [ 4920.843666]  [821f71de] netlink_unicast+0x1ae/0x1f0
 [ 4920.844751]  [821f7997] netlink_sendmsg+0x227/0x350
 [ 4920.845850]  [821857bd] ? sock_update_netprioidx+0xdd/0x1b0
 [ 4920.847060]  [82185732] ? sock_update_netprioidx+0x52/0x1b0
 [ 4920.848276]  [8217f226] sock_aio_write+0x166/0x180
 [ 4920.849348]  [810dfe41] ? get_parent_ip+0x11/0x50
 [ 4920.850428]  [811d0d9a] do_sync_write+0xda/0x120
 [ 4920.851465]  [810dffed] ? sub_preempt_count+0x9d/0xd0
 [ 4920.852579]  [810dfe41] ? get_parent_ip+0x11/0x50
 [ 4920.853608]  [81791887] ? security_file_permission+0x27/0xb0
 [ 4920.854821]  [811d0f4c] vfs_write+0x16c/0x180
 [ 4920.855781]  [811d104f] sys_write+0x4f/0xa0
 [ 4920.856798]  [82638e79] system_call_fastpath+0x16/0x1b
 [ 4920.877487] 1 lock held by trinity/9074:
 [ 4920.878239]  #0:  (rx_queue_mutex){+.+...}, at: [81af805d] 
 iscsi_if_rx+0x2d/0x990
 [ 4920.880005] Kernel panic - not syncing: hung_task: blocked tasks
 
 Signed-off-by: Sasha Levin levinsasha...@gmail.com
 ---
  drivers/scsi/scsi_transport_iscsi.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/scsi/scsi_transport_iscsi.c 
 b/drivers/scsi/scsi_transport_iscsi.c
 index cfd4914..c26707a 100644
 --- a/drivers/scsi/scsi_transport_iscsi.c
 +++ b/drivers/scsi/scsi_transport_iscsi.c
 @@ -2110,7 +2110,7 @@ iscsi_if_rx(struct sk_buff *skb)
   break;
   err = iscsi_if_send_reply(group, nlh-nlmsg_seq,
   nlh-nlmsg_type, 0, 0, ev, sizeof(*ev));
 - } while (err  0  err != -ECONNREFUSED);
 + } while (err  0  err != -ECONNREFUSED  err != -ESRCH);
   skb_pull(skb, rlen);
   }
   mutex_unlock(rx_queue_mutex);


Looks ok. Thanks for debugging and the patch.

Acked-by: Mike Christie micha...@cs.wisc.edu

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.