[Think this may have bounced from the Google Group first time around. It
certainly hasn't hit the archives or been returned to me.]

I'm seeing a fairly rare bug with drives I'm accessing with open-iscsi. The
iscsiadm binary is called from within our management system, from a script
which does

  iscsiadm -m discovery -t sendtargets -p $HOST
  iscsiadm -m node -T $TARGET -o update -n node.session.auth.authmethod -v CHAP
  iscsiadm -m node -T $TARGET -o update -n node.session.auth.username -v $USER
  iscsiadm -m node -T $TARGET -o update -n node.session.auth.password -v $PASS
  iscsiadm -m node -T $TARGET -l

and then waits for udev to settle and finds the new /dev/sdX device node with
something like

  udevadm settle
  local SESSION TARGET SESSION BLOCK KERNEL
  # shopt nullglob set, so this is correct in the 'no match' case...
  for SESSION in /sys/class/scsi_host/host*/device/session*; do
    for TARGET in $SESSION/iscsi_session/*/targetname; do
      [ "`< "$TARGET"`" = "$1" ] || continue
      for BLOCK in "$SESSION"/target*/*/block/*; do
        KERNEL=${BLOCK##*/}
        echo /dev/$KERNEL
        return 0
      done
    done
  done
  return 1

(The code then goes on set up a device mapper table based on this node.)

First problem is that iscsiadm -m node -l seems to be able to return without
the node having been created, or even having begun to propagate to udev, so the
udevadm settle returns (i.e. empty queue) even though the /dev/sdX hasn't yet
been created, presumably because the uevent hasn't yet gone out to udev even
though iscsiadm has exited.

What's the canonical race-free way to wait for the iscsi login to properly
finish and the device node to be available? At the moment, I have a hack
looking like sleep 0.1; udevadm settle, which is ugly and unreliable: it
will clearly fail on a sufficiently loaded box.

However, with this hack in place, the vast majority of iscsi logins work
okay. Nonetheless, I see occasional serious kernel problems, like this oops:

  10:05:03.019 kernel: scsi23 : iSCSI Initiator over TCP/IP
  10:05:04.015 kernel: scsi 23:0:0:0: RAID              IET      Controller     
  0001 PQ: 0 ANSI: 5
  10:05:04.015 kernel: scsi 23:0:0:0: Attached scsi generic sg14 type 12
  10:05:04.017 kernel: scsi 23:0:0:1: Direct-Access     IET      VIRTUAL-DISK   
  0001 PQ: 0 ANSI: 5
  10:05:04.017 kernel: sd 23:0:0:1: Attached scsi generic sg15 type 0
  10:05:04.018 kernel: sd 23:0:0:1: [sdk] 27262976 512-byte hardware sectors: 
(13.9 GB/13.0 GiB)
  10:05:04.018 kernel: sd 23:0:0:1: [sdk] Write Protect is off
  10:05:04.019 kernel: sd 23:0:0:1: [sdk] Write cache: disabled, read cache: 
enabled, doesn't support DPO or FUA
  10:05:04.248 kernel: sdk: unknown partition table
  10:05:04.248 kernel: BUG: unable to handle kernel NULL pointer dereference at 
0000000000000010
  10:05:04.248 kernel: IP: [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
  10:05:04.248 kernel: PGD 82ad47067 PUD 82b591067 PMD 0 
  10:05:04.248 kernel: Oops: 0000 [#1] PREEMPT SMP 
  10:05:04.248 kernel: last sysfs file: 
/sys/devices/platform/host14/session9/iscsi_session/session9/ifacename
  10:05:04.248 kernel: CPU 6 
  10:05:04.248 kernel: Modules linked in:
  10:05:04.248 kernel: Pid: 29251, comm: async/0 Not tainted 
2.6.30.4-elastic-lon-p #1 X7DBN
  10:05:04.248 kernel: RIP: 0010:[<ffffffff803f0d77>]  [<ffffffff803f0d77>] 
disk_part_iter_next+0x74/0xfd
  10:05:04.248 kernel: RSP: 0018:ffff8802645b9dd0  EFLAGS: 00010246
  10:05:04.248 kernel: RAX: ffff880548d43c00 RBX: ffff8802645b9e00 RCX: 
0000000000000000
  10:05:04.248 kernel: RDX: 0000000000000000 RSI: ffff880548d43c00 RDI: 
0000000000000000
  10:05:04.248 kernel: RBP: ffff8802645b9df0 R08: 0000000000000000 R09: 
ffff88082cdaf000
  10:05:04.248 kernel: R10: ffff8802645b9ce0 R11: ffff880825d06fe8 R12: 
ffff8802645b9e00
  10:05:04.248 kernel: R13: ffff88082c91f1c0 R14: 0000000000000000 R15: 
ffff880548d45920
  10:05:04.248 kernel: FS:  0000000000000000(0000) GS:ffff8800281a4000(0000) 
knlGS:0000000000000000
  10:05:04.248 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  10:05:04.248 kernel: CR2: 0000000000000010 CR3: 000000082ad48000 CR4: 
00000000000426e0
  10:05:04.248 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
  10:05:04.248 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
  10:05:04.248 kernel: Process async/0 (pid: 29251, threadinfo 
ffff8802645b8000, task ffff88082af363e0)
  10:05:04.248 kernel: Stack:
  10:05:04.248 kernel: 0000000000000000 ffff8802645b9e00 ffff8802645b9e00 
ffff88082c91f1c0
  10:05:04.248 kernel: ffff8802645b9e40 ffffffff80306feb ffff880548d43c00 
0000000000000000
  10:05:04.248 kernel: 0000000000000001 ffff880548d45920 ffff8802645b9e40 
ffff880548d43c00
  10:05:04.248 kernel: Call Trace:
  10:05:04.248 kernel: [<ffffffff80306feb>] register_disk+0x122/0x13a
  10:05:04.248 kernel: [<ffffffff803f0b0f>] add_disk+0xaa/0x106
  10:05:04.248 kernel: [<ffffffff80493609>] sd_probe_async+0x198/0x25b
  10:05:04.248 kernel: [<ffffffff802de5a3>] ? free_fs_struct+0x2d/0x31
  10:05:04.248 kernel: [<ffffffff80270482>] async_thread+0x10c/0x20d
  10:05:04.249 kernel: [<ffffffff802545ff>] ? default_wake_function+0x0/0xf
  10:05:04.249 kernel: [<ffffffff80270376>] ? async_thread+0x0/0x20d
  10:05:04.249 kernel: [<ffffffff8026ad89>] kthread+0x55/0x80
  10:05:04.249 kernel: [<ffffffff8022be6a>] child_rip+0xa/0x20
  10:05:04.249 kernel: [<ffffffff8026ad34>] ? kthread+0x0/0x80
  10:05:04.249 kernel: [<ffffffff8022be60>] ? child_rip+0x0/0x20
  10:05:04.249 kernel: Code: c8 ff 80 e1 0c b9 00 00 00 00 0f 44 c1 41 83 cd ff 
48 8d 7a 20 48 be ff ff ff ff 08 00 00 00 48 b9 00 00 00 00 08 00 00 00 eb 50 
<8b> 42 10 41 bd 01 00 00 00 eb db 4c 63 c2 4e 8d 04 c7 4d 8b 20 
  10:05:04.249 kernel: RIP  [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
  10:05:04.249 kernel: RSP <ffff8802645b9dd0>
  10:05:04.249 kernel: CR2: 0000000000000010
  10:05:04.249 kernel: ---[ end trace c2e45cd8c17e96d1 ]---
  10:05:04.249 kernel: note: async/0[29251] exited with preempt_count 1
[...]
  10:05:42.880 kernel: scsi24 : iSCSI Initiator over TCP/IP
  10:05:43.888 kernel: scsi 24:0:0:0: RAID              IET      Controller     
  0001 PQ: 0 ANSI: 5
  10:05:43.888 kernel: scsi 24:0:0:0: Attached scsi generic sg14 type 12
  10:05:43.888 kernel: scsi 24:0:0:1: Direct-Access     IET      VIRTUAL-DISK   
  0001 PQ: 0 ANSI: 5
  10:05:43.897 kernel: ------------[ cut here ]------------
  10:05:43.899 kernel: kernel BUG at kernel/exit.c:1014!
  10:05:43.899 kernel: invalid opcode: 0000 [#2] PREEMPT SMP 
  10:05:43.899 kernel: last sysfs file: 
/sys/devices/platform/host24/scsi_host/host24/scan
  10:05:43.899 kernel: CPU 7 
  10:05:43.899 kernel: Modules linked in:
  10:05:43.899 kernel: Pid: 29567, comm: iscsid Tainted: G      D    
2.6.30.4-elastic-lon-p #1 X7DBN
  10:05:43.899 kernel: RIP: 0010:[<ffffffff8025b338>]  [<ffffffff8025b338>] 
do_exit+0x686/0x695
  10:05:43.899 kernel: RSP: 0018:ffff8801da10f840  EFLAGS: 00010006
  10:05:43.899 kernel: RAX: ffff8802645b9ed0 RBX: 0000000000000001 RCX: 
0000000000000000
  10:05:43.899 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
ffff8802645b9ed0
  10:05:43.899 kernel: RBP: ffff8801da10f888 R08: 0000000000000000 R09: 
ffff88082cd3f108
  10:05:43.899 kernel: R10: 000000000000046c R11: 000000000000011b R12: 
0000000000000003
  10:05:43.899 kernel: R13: ffff8806d24a9920 R14: 0000000080664dec R15: 
0000000000000000
  10:05:43.899 kernel: FS:  00007f3b0eefa6f0(0000) GS:ffff8800281be000(0000) 
knlGS:0000000000000000
  10:05:43.899 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  10:05:43.899 kernel: CR2: fffff88004d30000 CR3: 0000000275c0e000 CR4: 
00000000000426e0
  10:05:43.899 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
  10:05:43.899 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
  10:05:43.899 kernel: Process iscsid (pid: 29567, threadinfo ffff8801da10e000, 
task ffff8802811fd820)
  10:05:43.899 kernel: Stack:
  10:05:43.899 kernel: ffffffff8024b6d1 0000000000000000 ffffffff80835c38 
ffff8801da10f888
  10:05:43.899 kernel: ffffffff80835c30 0000000000000282 ffff88020696c200 
000000000000001f
  10:05:43.899 kernel: ffffffff80835c80 ffff8801da10f8c8 ffffffff8024c06c 
00000000000080d0
  10:05:43.899 kernel: Call Trace:
  10:05:43.899 kernel: [<ffffffff8024b6d1>] ? __wake_up_common+0x49/0x7f
  10:05:43.899 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
  10:05:43.899 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
  10:05:43.899 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
  10:05:43.899 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
  10:05:43.899 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
  10:05:43.899 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
  10:05:43.899 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
  10:05:43.899 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
  10:05:43.899 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
  10:05:43.899 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
  10:05:43.899 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
  10:05:43.899 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
  10:05:43.899 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
  10:05:43.899 kernel: [<ffffffff80482f72>] scsi_probe_and_add_lun+0x910/0xa32
  10:05:43.899 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
  10:05:43.899 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
  10:05:43.899 kernel: [<ffffffff80293369>] ? get_page_from_freelist+0x4ad/0x67a
  10:05:43.899 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
  10:05:43.899 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
  10:05:43.899 kernel: [<ffffffff80487b6e>] ? iscsi_user_scan_session+0x0/0xe4
  10:05:43.899 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
  10:05:43.899 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
  10:05:43.899 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
  10:05:43.899 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
  10:05:43.899 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
  10:05:43.899 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
  10:05:43.899 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
  10:05:43.899 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
  10:05:43.899 kernel: Code: 8b bb 98 05 00 00 48 85 ff 74 05 e8 64 c4 06 00 65 
48 8b 04 25 e8 b4 00 00 ff 80 44 e0 ff ff 48 c7 03 40 00 00 00 e8 9c 9a 40 00 
<0f> 0b eb fe 41 bc fe ff ff ff e9 17 ff ff ff 55 48 89 e5 41 55 
  10:05:43.899 kernel: RIP  [<ffffffff8025b338>] do_exit+0x686/0x695
  10:05:43.899 kernel: RSP <ffff8801da10f840>
  10:05:43.899 kernel: ---[ end trace c2e45cd8c17e96d2 ]---
  10:05:43.899 kernel: note: iscsid[29567] exited with preempt_count 1
  10:05:43.899 kernel: BUG: scheduling while atomic: iscsid/29567/0x10000002
  10:05:43.899 kernel: Modules linked in:
  10:05:43.899 kernel: Pid: 29567, comm: iscsid Tainted: G      D    
2.6.30.4-elastic-lon-p #1
  10:05:43.899 kernel: Call Trace:
  10:05:43.899 kernel: [<ffffffff80250e32>] __schedule_bug+0x57/0x5c
  10:05:43.899 kernel: [<ffffffff806645b9>] __schedule+0xc1/0x814
  10:05:43.899 kernel: [<ffffffff803ff025>] ? number+0x12f/0x225
  10:05:43.899 kernel: [<ffffffff80292cb0>] ? __pagevec_free+0x29/0x3c
  10:05:43.899 kernel: [<ffffffff80664dec>] schedule+0x18/0x3b
  10:05:43.899 kernel: [<ffffffff80250f33>] __cond_resched+0x1c/0x45
  10:05:43.899 kernel: [<ffffffff80664ec2>] _cond_resched+0x30/0x3b
  10:05:43.899 kernel: [<ffffffff802a17b2>] unmap_vmas+0x6cd/0x886
  10:05:43.899 kernel: [<ffffffff802a5db3>] exit_mmap+0xd4/0x184
  10:05:43.899 kernel: [<ffffffff80255c6b>] mmput+0x2b/0xb4
  10:05:43.899 kernel: [<ffffffff80259524>] exit_mm+0xff/0x10a
  10:05:43.899 kernel: [<ffffffff8025ae2c>] do_exit+0x17a/0x695
  10:05:43.899 kernel: [<ffffffff80419d84>] ? vgacon_set_cursor_size+0xfd/0x109
  10:05:43.899 kernel: [<ffffffff8022ea59>] oops_end+0x89/0x8e
  10:05:43.899 kernel: [<ffffffff8022ec1c>] die+0x55/0x5e
  10:05:43.899 kernel: [<ffffffff8022c6a0>] do_trap+0x115/0x124
  10:05:43.899 kernel: [<ffffffff8022ca20>] do_invalid_op+0x91/0x9a
  10:05:43.899 kernel: [<ffffffff8025b338>] ? do_exit+0x686/0x695
  10:05:43.899 kernel: [<ffffffff8024ecfd>] ? dequeue_task_fair+0x68/0x71
  10:05:43.899 kernel: [<ffffffff80250e89>] ? finish_task_switch+0x52/0xe0
  10:05:43.899 kernel: [<ffffffff8022bc05>] invalid_op+0x15/0x20
  10:05:43.899 kernel: [<ffffffff8025b338>] ? do_exit+0x686/0x695
  10:05:43.899 kernel: [<ffffffff8024b6d1>] ? __wake_up_common+0x49/0x7f
  10:05:43.899 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
  10:05:43.899 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
  10:05:43.899 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
  10:05:43.899 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
  10:05:43.899 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
  10:05:43.899 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
  10:05:43.899 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
  10:05:43.899 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
  10:05:43.899 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
  10:05:43.899 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
  10:05:43.899 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
  10:05:43.899 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
  10:05:43.899 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
  10:05:43.900 kernel: [<ffffffff80482f72>] scsi_probe_and_add_lun+0x910/0xa32
  10:05:43.900 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
  10:05:43.900 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
  10:05:43.900 kernel: [<ffffffff80293369>] ? get_page_from_freelist+0x4ad/0x67a
  10:05:43.900 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
  10:05:43.900 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
  10:05:43.900 kernel: [<ffffffff80487b6e>] ? iscsi_user_scan_session+0x0/0xe4
  10:05:43.900 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
  10:05:43.900 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
  10:05:43.900 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
  10:05:43.900 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
  10:05:43.901 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
  10:05:43.901 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
  10:05:43.901 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
  10:05:43.901 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b

This clearly shouldn't happen whatever I do from userspace and maybe its
completely unconnected, but I'm wondering if my walk of sysfs might be
provoking a bug that would otherwise been hidden, racing with the creation of
the sysfs files themselves?

I'm running stock 2.6.30.4 on x86-64 with the upstream kernel iscsi_tcp
driver compiled in, and open-iscsi 2.0-871 with just the following patch

diff --git a/usr/iscsid.c b/usr/iscsid.c
--- a/usr/iscsid.c
+++ b/usr/iscsid.c
@@ -387,10 +387,10 @@
                exit(1);
        }
 
-       if (iscsi_sysfs_check_class_version()) {
+       /* if (iscsi_sysfs_check_class_version()) {
                log_close(log_pid);
                exit(1);
-       }
+       } */
 
        umask(0177);
 
to disable a broken check for the iscsi_tcp driver version, which only works
correctly if the driver is compiled as a module, erroneously failing when
iscsi_tcp is compiled into the kernel. Kernel config is at

  http://cdw.me.uk/tmp/linux-2.6.30.4-config.text

I've spent a little time trying to reproduce this within a kvm virtual machine
with the same kernel and userspace, but have been unable to do so. (The box
with the kernel backtraces was a fairly fast dual quad-core Xeon E5420 box, so
perhaps the timings are too different within a slow vm.)

Any pointers gratefully received!

Cheers,

Chris.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to