On 04/13/2014 07:29 AM, Cale, Yonatan wrote:

> # cat waiting-tasks.txt
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> iscsid          D 0000000000000000     0  2842      1 0x00000000
>  ffff880137f83918 0000000000000086 ffff88010e3be2d0 ffff880137f82010
>  0000000000004000 ffff88013b2b8c40 ffff880137f83fd8 ffff880137f83fd8
>  0000000000000000 ffff88013b2b8c40 ffffffff81a0b020 ffff88013b2b8ed0
> Call Trace:
>  [<ffffffff81273a97>] ? kobject_put+0x27/0x60
>  [<ffffffff812dc447>] ? put_device+0x17/0x20
>  [<ffffffff8103418f>] ? complete+0x4f/0x60
>  [<ffffffff815ed07f>] schedule+0x3f/0x60
>  [<ffffffff815eda02>] __mutex_lock_slowpath+0x102/0x180
>  [<ffffffff815edf0b>] mutex_lock+0x2b/0x50
>  [<ffffffffa0011a97>] __iscsi_unbind_session+0x67/0x160 [scsi_transport_iscsi]
>  [<ffffffffa0011ca1>] iscsi_remove_session+0x111/0x1f0 [scsi_transport_iscsi]
>  [<ffffffffa0011d96>] iscsi_destroy_session+0x16/0x60 [scsi_transport_iscsi]
>  [<ffffffffa002573d>] iscsi_session_teardown+0x9d/0xd0 [libiscsi]
>  [<ffffffffa0032300>] iscsi_sw_tcp_session_destroy+0x50/0x70 [iscsi_tcp]
>  [<ffffffffa0012c9d>] iscsi_if_rx+0x7dd/0xaa0 [scsi_transport_iscsi]
>  [<ffffffff814f50ee>] netlink_unicast+0x2ae/0x2c0
>  [<ffffffff814d11dc>] ? memcpy_fromiovec+0x7c/0xa0
>  [<ffffffff814f5aae>] netlink_sendmsg+0x33e/0x380
>  [<ffffffff814c55f8>] sock_sendmsg+0xe8/0x120
>  [<ffffffff811078bf>] ? do_lookup+0xcf/0x360
>  [<ffffffff8111ad6f>] ? mntput+0x1f/0x40
>  [<ffffffff81107012>] ? path_put+0x22/0x30
>  [<ffffffff814c512b>] ? move_addr_to_kernel+0x6b/0x70
>  [<ffffffff814d13a1>] ? verify_iovec+0x51/0x100
>  [<ffffffff814c683f>] __sys_sendmsg+0x3df/0x3f0
>  [<ffffffff810f8419>] ? kmem_cache_free+0xe9/0xf0
>  [<ffffffff81100dac>] ? cp_new_stat+0xfc/0x120
>  [<ffffffff814c6a79>] sys_sendmsg+0x49/0x80
>  [<ffffffff815ef86b>] system_call_fastpath+0x16/0x1b

Ok, that was more interesting than I expected. If you are not running an
iscsiadm logout command, then it looks like the target returned an error
code indicating that the target is not coming back. iscsid handled this by
trying to remove the target, and we are stuck in there. This and the scan
command should be fast at this point because it looks like the
replacement/recovery timeout has expired:

 session1: session recovery timed out after 120 secs
 session2: session recovery timed out after 120 secs

At this point, all IO is just fast failed, so we are not waiting on IO,
and any time we need that mutex it should be quick.
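
If you want to double check the session state from the initiator side,
something like the sketch below should do. It assumes the usual sysfs
layout, /sys/class/iscsi_session/sessionN with "state" and "recovery_tmo"
attribute files. The helper name and the root parameter are made up for
illustration, and I have not run this against your kernel.

```python
from pathlib import Path


def read_iscsi_sessions(root="/sys/class/iscsi_session"):
    """Return {session_name: {"state": ..., "recovery_tmo": ...}}.

    Assumes each sessionN directory exposes 'state' and 'recovery_tmo'
    attribute files. Attributes that are missing or unreadable are
    reported as None instead of raising.
    """
    sessions = {}
    for sess_dir in sorted(Path(root).glob("session*")):
        attrs = {}
        for name in ("state", "recovery_tmo"):
            try:
                attrs[name] = (sess_dir / name).read_text().strip()
            except OSError:
                attrs[name] = None
        sessions[sess_dir.name] = attrs
    return sessions
```

Calling read_iscsi_sessions() with no arguments and printing the result
shows, for each session, whether it is still logged in and what the
recovery timeout is set to.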

1. Can you send me a tcpdump trace? I am guessing that you do not want
the session to be deleted during this time, so we need the trace to see
what error code is leading to this.

2. I am wondering if the hang is because the device was offlined when
this happened. I did not try that case. I will set up that kernel and try
it here.
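
To see whether any SCSI devices were offlined on your side, a rough
sketch along these lines might help. It assumes each entry under
/sys/class/scsi_device has a device/state file that reads "offline" for
an offlined device. The function name and the root parameter are
hypothetical.

```python
from pathlib import Path


def offline_scsi_devices(root="/sys/class/scsi_device"):
    """Return the H:C:T:L names of SCSI devices whose state is 'offline'.

    Assumes each entry under root has a 'device/state' attribute file.
    Entries without a readable state file are skipped.
    """
    offline = []
    for dev in sorted(Path(root).glob("*")):
        try:
            state = (dev / "device" / "state").read_text().strip()
        except OSError:
            continue
        if state == "offline":
            offline.append(dev.name)
    return offline
```

An empty list means no device reported "offline" at the time of the
check, which would make the offlined-device theory less likely.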
