Hello,
on SLES10 SP1 on x86_64 (open-iscsi-2.0.707-0.32) I'm seeing a problem during
login using "iscsiadm -m node -L automatic". After a number of successful
logins (22 in this run), login suddenly fails:
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f60.50001fe1500c1f68]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f60.50001fe1500c1f69]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f60.50001fe1500c1f6c]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f60.50001fe1500c1f6d]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f20.50001fe1500c1f28]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f20.50001fe1500c1f29]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f20.50001fe1500c1f2c]
Login session [172.20.77.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.1.50001fe1500c1f20.50001fe1500c1f2d]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f60.50001fe1500c1f68]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f60.50001fe1500c1f69]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f60.50001fe1500c1f6c]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f60.50001fe1500c1f6d]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f60.50001fe1500c1f68]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f60.50001fe1500c1f69]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f60.50001fe1500c1f6c]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f60.50001fe1500c1f6d]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f20.50001fe1500c1f28]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f20.50001fe1500c1f29]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f20.50001fe1500c1f2c]
Login session [172.20.77.1:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis1.1.50001fe1500c1f20.50001fe1500c1f2d]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f20.50001fe1500c1f28]
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f20.50001fe1500c1f29]
iscsiadm: Could not login session (err 5).
iscsiadm: initiator reported error (5 - encountered iSCSI login failure)
Login session [172.20.76.2:3260 iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f20.50001fe1500c1f2c]
Then the SSH session hangs, but the machine is still alive. Syslog says:
Mar 2 10:36:27 testhost kernel: Unable to handle kernel NULL pointer dereference at 0000000000000232 RIP:
Mar 2 10:36:27 testhost kernel: <ffffffff802ba089>{inet_sendmsg+23}
Mar 2 10:36:27 testhost kernel: PGD 0
Mar 2 10:36:27 testhost kernel: Oops: 0000 [1] SMP
Mar 2 10:36:27 testhost kernel: last sysfs file: /class/iscsi_connection/connection22:0/exp_statsn
Mar 2 10:36:27 testhost kernel: CPU 3
Mar 2 10:36:27 testhost kernel: Modules linked in: crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi nfs lockd nfs_acl sunrpc ip6t_REJECT xt_pkttype ipt_REJECT ipt_TCPMSS xt_tcpudp ipt_LOG xt_limit xt_state iptable_mangle iptable_nat ip_nat ip6table_mangle ip_conntrack nfnetlink ip6table_filter ip6_tables xt_physdev iptable_filter ip_tables x_tables bridge netbk netloop xenblk blkbk blktap xenbus_be ipmi_devintf ipv6 ipmi_si ipmi_msghandler af_packet button battery ac sr_mod loop usb_storage usbhid hw_random ide_cd cdrom i2c_amd8111 i2c_amd756 i2c_core ohci_hcd mptctl shpchp usbcore pci_hotplug e1000 8250 serial_core reiserfs dm_snapshot dm_mod fan thermal processor sg mptsas mptscsih mptbase scsi_transport_sas amd74xx sd_mod scsi_mod ide_disk ide_core
Mar 2 10:36:27 testhost kernel: Pid: 25485, comm: scsi_wq_25 Not tainted 2.6.16.54-0.2.11-xen #1
Mar 2 10:36:27 testhost kernel: RIP: e030:[<ffffffff802ba089>] <ffffffff802ba089>{inet_sendmsg+23}
Mar 2 10:36:27 testhost kernel: RSP: e02b:ffff880011e0db78 EFLAGS: 00010296
Mar 2 10:36:27 testhost kernel: RAX: ffffffff802f1c40 RBX: 0000000000000000 RCX: 0000000000000200
Mar 2 10:36:27 testhost kernel: RDX: ffff880011e0dd58 RSI: ffff8800080988c0 RDI: ffff880011e0dba8
Mar 2 10:36:27 testhost kernel: RBP: 0000000000000200 R08: 0000000000000200 R09: 0000000000008000
Mar 2 10:36:27 testhost kernel: R10: 00000000dbb545c6 R11: 0000000000000001 R12: ffff880011e0dd58
Mar 2 10:36:27 testhost kernel: R13: ffff880011e0dba8 R14: ffff88000bdc52c0 R15: 0000000000000200
Mar 2 10:36:27 testhost kernel: FS: 00002b77ef71e6d0(0000) GS:ffffffff803a2180(0000) knlGS:0000000000000000
Mar 2 10:36:27 testhost kernel: CS: e033 DS: 0000 ES: 0000
Mar 2 10:36:27 testhost kernel: Process scsi_wq_25 (pid: 25485, threadinfo ffff880011e0c000, task ffff8800147e17c0)
Mar 2 10:36:27 testhost kernel: Stack: 0000000000000030 ffff8800080988c0 0000000000000200 ffff880011e0dd58
Mar 2 10:36:27 testhost kernel: 0000000000000000 ffffffff8026e1da 0000000000000018 ffff880011e0e000
Mar 2 10:36:27 testhost kernel: 0000000000000000 ffffffff00000001
Mar 2 10:36:27 testhost kernel: Call Trace: <ffffffff8026e1da>{sock_sendmsg+249} <ffffffff802d39dd>{__kprobes_text_start+845}
Mar 2 10:36:27 testhost kernel: <ffffffff8014195d>{autoremove_wake_function+0} <ffffffff8015cd68>{__alloc_pages+101}
Mar 2 10:36:27 testhost kernel: <ffffffff8026fac1>{kernel_sendmsg+53} <ffffffff802708cb>{sock_no_sendpage+130}
Mar 2 10:36:27 testhost kernel: <ffffffff8010df38>{monotonic_clock+53} <ffffffff883a4217>{:iscsi_tcp:iscsi_tcp_mtask_xmit+502}
Mar 2 10:36:27 testhost kernel: <ffffffff8839ba18>{:libiscsi:iscsi_xmitworker+0} <ffffffff8839b5e7>{:libiscsi:iscsi_xmit_mtask+84}
Mar 2 10:36:27 testhost kernel: <ffffffff8839bb4e>{:libiscsi:iscsi_xmitworker+310} <ffffffff8013dbc1>{run_workqueue+148}
Mar 2 10:36:27 testhost kernel: <ffffffff8013e34e>{worker_thread+0} <ffffffff80141582>{keventd_create_kthread+0}
Mar 2 10:36:27 testhost kernel: <ffffffff8013e43e>{worker_thread+240} <ffffffff801255ac>{default_wake_function+0}
Mar 2 10:36:27 testhost kernel: <ffffffff80141582>{keventd_create_kthread+0} <ffffffff80141582>{keventd_create_kthread+0}
Mar 2 10:36:27 testhost kernel: <ffffffff80141826>{kthread+212} <ffffffff8010bab6>{child_rip+8}
Mar 2 10:36:27 testhost kernel: <ffffffff80141582>{keventd_create_kthread+0} <ffffffff80141752>{kthread+0}
Mar 2 10:36:27 testhost kernel: <ffffffff8010baae>{child_rip+0}
Mar 2 10:36:27 testhost kernel:
Mar 2 10:36:27 testhost kernel: Code: 66 83 bb 32 02 00 00 00 75 0c 48 89 df e8 ad f5 ff ff 85 c0
Mar 2 10:36:27 testhost kernel: RIP <ffffffff802ba089>{inet_sendmsg+23} RSP <ffff880011e0db78>
Mar 2 10:36:27 testhost kernel: CR2: 0000000000000232
Mar 2 10:39:29 testhost kernel: <3>iscsi: can not unicast skb (-11)
Mar 2 10:39:29 testhost kernel: iscsi: can not broadcast skb (-3)
Mar 2 10:39:29 testhost kernel: connection12:0: iscsi: detected conn error (1011)
Mar 2 10:39:29 testhost kernel: iscsi: can not unicast skb (-11)
Mar 2 10:39:29 testhost kernel: iscsi: can not broadcast skb (-3)
Mar 2 10:39:29 testhost kernel: connection11:0: iscsi: detected conn error (1011)
Mar 2 10:39:30 testhost kernel: iscsi: can not unicast skb (-11)
Mar 2 10:39:30 testhost kernel: iscsi: can not broadcast skb (-3)
Mar 2 10:39:30 testhost kernel: connection13:0: iscsi: detected conn error (1011)
Mar 2 10:39:30 testhost kernel: iscsi: can not unicast skb (-11)
Mar 2 10:39:30 testhost kernel: iscsi: can not broadcast skb (-3)
Mar 2 10:39:30 testhost kernel: connection14:0: iscsi: detected conn error (1011)
Mar 2 10:39:31 testhost kernel: iscsi: can not unicast skb (-11)
Mar 2 10:39:31 testhost kernel: iscsi: can not broadcast skb (-3)
Mar 2 10:39:31 testhost kernel: connection15:0: iscsi: detected conn error (1011)
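As a rough cross-check of the oops, the faulting address can be compared with
the first instruction in the "Code:" line. The sketch below assumes that the
"Code:" bytes start at the faulting RIP (the dump does not mark the position)
and only decodes the one instruction form seen here; the helper name
decode_group1_imm8 is made up for illustration. If the assumption holds, the
bytes read as a 16-bit compare against offset 0x232 from RBX, and with RBX
being zero that gives exactly the CR2 value reported.

```python
# Values copied from the oops output above.
CODE = bytes.fromhex("66 83 bb 32 02 00 00 00")  # first 8 bytes of "Code:"
RBX = 0x0000000000000000
CR2 = 0x0000000000000232

# Register numbering for ModRM.rm when no REX prefix is present.
REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_group1_imm8(code):
    """Decode '66 83 /7 disp32 imm8', i.e. cmpw $imm8, disp32(%reg).

    Returns (base_register_name, displacement, immediate).
    Hypothetical helper: handles only this single encoding.
    """
    if code[0] != 0x66 or code[1] != 0x83:
        raise ValueError("expected operand-size prefix 66 and opcode 83")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 7:
        raise ValueError("opcode extension is not /7 (cmp)")
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32 form without SIB is handled")
    disp = int.from_bytes(code[3:7], "little", signed=True)
    imm = code[7]
    return REG_NAMES[rm], disp, imm


base, disp, imm = decode_group1_imm8(CODE)
fault_addr = RBX + disp  # base register is rbx for these bytes
print(f"cmpw ${imm:#x}, {disp:#x}(%{base})")
print(f"computed fault address {fault_addr:#x}, CR2 {CR2:#x}")
```

So the access appears to go through a NULL pointer held in RBX rather than
through a corrupted address, but that is only my reading of the register dump.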
As the same procedure has worked many times before, I suspect a race condition.
The hung "iscsiadm -m node -L automatic" process is blocked in:
# strace -p 25230
Process 25230 attached - interrupt to quit
recvfrom(5,
# lsof -p 25230
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
iscsiadm 25230 root cwd DIR 253,10 920 125 /root
iscsiadm 25230 root rtd DIR 253,10 512 2 /
iscsiadm 25230 root txt REG 253,10 135464 54599 /sbin/iscsiadm
iscsiadm 25230 root mem REG 0,0 0 [heap] (stat: No such file or directory)
iscsiadm 25230 root mem REG 253,10 133423 9973 /lib64/ld-2.4.so
iscsiadm 25230 root mem REG 253,10 1505121 9980 /lib64/libc-2.4.so
iscsiadm 25230 root 0u CHR 136,0 2 /dev/pts/0
iscsiadm 25230 root 1u CHR 136,0 2 /dev/pts/0
iscsiadm 25230 root 2u CHR 136,0 2 /dev/pts/0
iscsiadm 25230 root 3r DIR 253,10 3120 54718 /etc/iscsi/nodes
iscsiadm 25230 root 4r DIR 253,10 80 54783 /etc/iscsi/nodes/iqn.1986-03.com.hp:fcgw.mpx100:rkdvmis2.0.50001fe1500c1f20.50001fe1500c1f2c
iscsiadm 25230 root 5u unix 0xffff88000cb02680 763005 socket
The kernel in use is "kernel-xen-2.6.16.54-0.2.11" (not the very latest, but it
has been stable for months):
# uptime
10:52am up 100 days 17:30, 2 users, load average: 1.18, 1.23, 0.94
Regards,
Ulrich