On 08/06/2009 02:32 PM, Chris Webb wrote:
> I'm seeing a fairly rare bug with drives I'm accessing with open-iscsi. The
> iscsiadm binary is called from within our management system, from a script
> which does
>
>    iscsiadm -m discovery -t sendtargets -p $HOST
>    iscsiadm -m node -T $TARGET -o update -n node.session.auth.authmethod -v 
> CHAP
>    iscsiadm -m node -T $TARGET -o update -n node.session.auth.username -v 
> $USER
>    iscsiadm -m node -T $TARGET -o update -n node.session.auth.password -v 
> $PASS
>    iscsiadm -m node -T $TARGET -l
>
> and then waits for udev to settle and finds the new /dev/sdX device node with
> something like
>
>    udevadm settle
>    local SESSION TARGET SESSION BLOCK KERNEL
>    # shopt nullglob set, so this is correct in the 'no match' case...
>    for SESSION in /sys/class/scsi_host/host*/device/session*; do
>      for TARGET in $SESSION/iscsi_session/*/targetname; do
>        [ "`<  "$TARGET"`" = "$1" ] || continue
>        for BLOCK in "$SESSION"/target*/*/block/*; do
>          KERNEL=${BLOCK##*/}
>          echo /dev/$KERNEL
>          return 0
>        done
>      done
>    done
>    return 1
>
> (The code then goes on set up a device mapper table based on this node.)
>
> First problem is that iscsiadm -m node -l seems to be able to return without
> the node having been created, or even having begun to propagate to udev, so 
> the
> udevadm settle returns (i.e. empty queue) even though the /dev/sdX hasn't yet
> been created, presumably because the uevent hasn't yet gone out to udev even
> though iscsiadm has exited.
>
> What's the canonical race-free way to wait for the iscsi login to properly
> finish and the device node to be available? At the moment, I have a hack

I am not sure about that. I will just add some details about what 
iscsiadm is doing.

After it logs into the target it does

echo - - -  > /sys/class/scsi_host/hostX/scan

This has the scsi layer do commands like inquirys and report luns to 
find your disks. The scsi layer will then create the kernel structs for 
the scsi devices and when that happens it will fire hot plug events that 
udev waits for. When the kernel setup is done the echo command above 
will return.

So from iscsiadm's perspective it has logged in and detected the disks. 
It has no idea udev is running or not. Maybe in some cases like when run 
from the initrd/initramfs udev might not be running like normal.

If you add a udev rule, that checks for iscsi devices and does whatever 
you were trying to do will that help?


> looking like sleep 0.1; udevadm settle, which is ugly and unreliable: it
> will clearly fail on a sufficiently loaded box.
>
> However, with this hack in place, the vast majority of iscsi logins work
> okay. Nonetheless, I see occasional serious kernel problems, like this oops:
>


For the problems below, I am not sure what is up. The iscsi code has no 
control over it. I think the first problem is fixed though. I think 
there were some bugs in the kernel async scanning code (can you turn 
that off in the kernel configuration) that were fixed recently in 
2.6.31-rc5.


>    10:05:03.019 kernel: scsi23 : iSCSI Initiator over TCP/IP
>    10:05:04.015 kernel: scsi 23:0:0:0: RAID              IET      Controller  
>      0001 PQ: 0 ANSI: 5
>    10:05:04.015 kernel: scsi 23:0:0:0: Attached scsi generic sg14 type 12
>    10:05:04.017 kernel: scsi 23:0:0:1: Direct-Access     IET      
> VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
>    10:05:04.017 kernel: sd 23:0:0:1: Attached scsi generic sg15 type 0
>    10:05:04.018 kernel: sd 23:0:0:1: [sdk] 27262976 512-byte hardware 
> sectors: (13.9 GB/13.0 GiB)
>    10:05:04.018 kernel: sd 23:0:0:1: [sdk] Write Protect is off
>    10:05:04.019 kernel: sd 23:0:0:1: [sdk] Write cache: disabled, read cache: 
> enabled, doesn't support DPO or FUA
>    10:05:04.248 kernel: sdk: unknown partition table
>    10:05:04.248 kernel: BUG: unable to handle kernel NULL pointer dereference 
> at 0000000000000010
>    10:05:04.248 kernel: IP: [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
>    10:05:04.248 kernel: PGD 82ad47067 PUD 82b591067 PMD 0
>    10:05:04.248 kernel: Oops: 0000 [#1] PREEMPT SMP
>    10:05:04.248 kernel: last sysfs file: 
> /sys/devices/platform/host14/session9/iscsi_session/session9/ifacename
>    10:05:04.248 kernel: CPU 6
>    10:05:04.248 kernel: Modules linked in:
>    10:05:04.248 kernel: Pid: 29251, comm: async/0 Not tainted 
> 2.6.30.4-elastic-lon-p #1 X7DBN
>    10:05:04.248 kernel: RIP: 0010:[<ffffffff803f0d77>]  [<ffffffff803f0d77>] 
> disk_part_iter_next+0x74/0xfd
>    10:05:04.248 kernel: RSP: 0018:ffff8802645b9dd0  EFLAGS: 00010246
>    10:05:04.248 kernel: RAX: ffff880548d43c00 RBX: ffff8802645b9e00 RCX: 
> 0000000000000000
>    10:05:04.248 kernel: RDX: 0000000000000000 RSI: ffff880548d43c00 RDI: 
> 0000000000000000
>    10:05:04.248 kernel: RBP: ffff8802645b9df0 R08: 0000000000000000 R09: 
> ffff88082cdaf000
>    10:05:04.248 kernel: R10: ffff8802645b9ce0 R11: ffff880825d06fe8 R12: 
> ffff8802645b9e00
>    10:05:04.248 kernel: R13: ffff88082c91f1c0 R14: 0000000000000000 R15: 
> ffff880548d45920
>    10:05:04.248 kernel: FS:  0000000000000000(0000) GS:ffff8800281a4000(0000) 
> knlGS:0000000000000000
>    10:05:04.248 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>    10:05:04.248 kernel: CR2: 0000000000000010 CR3: 000000082ad48000 CR4: 
> 00000000000426e0
>    10:05:04.248 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>    10:05:04.248 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
>    10:05:04.248 kernel: Process async/0 (pid: 29251, threadinfo 
> ffff8802645b8000, task ffff88082af363e0)
>    10:05:04.248 kernel: Stack:
>    10:05:04.248 kernel: 0000000000000000 ffff8802645b9e00 ffff8802645b9e00 
> ffff88082c91f1c0
>    10:05:04.248 kernel: ffff8802645b9e40 ffffffff80306feb ffff880548d43c00 
> 0000000000000000
>    10:05:04.248 kernel: 0000000000000001 ffff880548d45920 ffff8802645b9e40 
> ffff880548d43c00
>    10:05:04.248 kernel: Call Trace:
>    10:05:04.248 kernel: [<ffffffff80306feb>] register_disk+0x122/0x13a
>    10:05:04.248 kernel: [<ffffffff803f0b0f>] add_disk+0xaa/0x106
>    10:05:04.248 kernel: [<ffffffff80493609>] sd_probe_async+0x198/0x25b
>    10:05:04.248 kernel: [<ffffffff802de5a3>] ? free_fs_struct+0x2d/0x31
>    10:05:04.248 kernel: [<ffffffff80270482>] async_thread+0x10c/0x20d
>    10:05:04.249 kernel: [<ffffffff802545ff>] ? default_wake_function+0x0/0xf
>    10:05:04.249 kernel: [<ffffffff80270376>] ? async_thread+0x0/0x20d
>    10:05:04.249 kernel: [<ffffffff8026ad89>] kthread+0x55/0x80
>    10:05:04.249 kernel: [<ffffffff8022be6a>] child_rip+0xa/0x20
>    10:05:04.249 kernel: [<ffffffff8026ad34>] ? kthread+0x0/0x80
>    10:05:04.249 kernel: [<ffffffff8022be60>] ? child_rip+0x0/0x20
>    10:05:04.249 kernel: Code: c8 ff 80 e1 0c b9 00 00 00 00 0f 44 c1 41 83 cd 
> ff 48 8d 7a 20 48 be ff ff ff ff 08 00 00 00 48 b9 00 00 00 00 08 00 00 00 eb 
> 50<8b>  42 10 41 bd 01 00 00 00 eb db 4c 63 c2 4e 8d 04 c7 4d 8b 20
>    10:05:04.249 kernel: RIP  [<ffffffff803f0d77>] 
> disk_part_iter_next+0x74/0xfd
>    10:05:04.249 kernel: RSP<ffff8802645b9dd0>
>    10:05:04.249 kernel: CR2: 0000000000000010
>    10:05:04.249 kernel: ---[ end trace c2e45cd8c17e96d1 ]---
>    10:05:04.249 kernel: note: async/0[29251] exited with preempt_count 1
> [...]
>    10:05:42.880 kernel: scsi24 : iSCSI Initiator over TCP/IP
>    10:05:43.888 kernel: scsi 24:0:0:0: RAID              IET      Controller  
>      0001 PQ: 0 ANSI: 5
>    10:05:43.888 kernel: scsi 24:0:0:0: Attached scsi generic sg14 type 12
>    10:05:43.888 kernel: scsi 24:0:0:1: Direct-Access     IET      
> VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
>    10:05:43.897 kernel: ------------[ cut here ]------------
>    10:05:43.899 kernel: kernel BUG at kernel/exit.c:1014!
>    10:05:43.899 kernel: invalid opcode: 0000 [#2] PREEMPT SMP
>    10:05:43.899 kernel: last sysfs file: 
> /sys/devices/platform/host24/scsi_host/host24/scan
>    10:05:43.899 kernel: CPU 7
>    10:05:43.899 kernel: Modules linked in:
>    10:05:43.899 kernel: Pid: 29567, comm: iscsid Tainted: G      D    
> 2.6.30.4-elastic-lon-p #1 X7DBN
>    10:05:43.899 kernel: RIP: 0010:[<ffffffff8025b338>]  [<ffffffff8025b338>] 
> do_exit+0x686/0x695
>    10:05:43.899 kernel: RSP: 0018:ffff8801da10f840  EFLAGS: 00010006
>    10:05:43.899 kernel: RAX: ffff8802645b9ed0 RBX: 0000000000000001 RCX: 
> 0000000000000000
>    10:05:43.899 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> ffff8802645b9ed0
>    10:05:43.899 kernel: RBP: ffff8801da10f888 R08: 0000000000000000 R09: 
> ffff88082cd3f108
>    10:05:43.899 kernel: R10: 000000000000046c R11: 000000000000011b R12: 
> 0000000000000003
>    10:05:43.899 kernel: R13: ffff8806d24a9920 R14: 0000000080664dec R15: 
> 0000000000000000
>    10:05:43.899 kernel: FS:  00007f3b0eefa6f0(0000) GS:ffff8800281be000(0000) 
> knlGS:0000000000000000
>    10:05:43.899 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>    10:05:43.899 kernel: CR2: fffff88004d30000 CR3: 0000000275c0e000 CR4: 
> 00000000000426e0
>    10:05:43.899 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>    10:05:43.899 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
>    10:05:43.899 kernel: Process iscsid (pid: 29567, threadinfo 
> ffff8801da10e000, task ffff8802811fd820)
>    10:05:43.899 kernel: Stack:
>    10:05:43.899 kernel: ffffffff8024b6d1 0000000000000000 ffffffff80835c38 
> ffff8801da10f888
>    10:05:43.899 kernel: ffffffff80835c30 0000000000000282 ffff88020696c200 
> 000000000000001f
>    10:05:43.899 kernel: ffffffff80835c80 ffff8801da10f8c8 ffffffff8024c06c 
> 00000000000080d0
>    10:05:43.899 kernel: Call Trace:
>    10:05:43.899 kernel: [<ffffffff8024b6d1>] ? __wake_up_common+0x49/0x7f
>    10:05:43.899 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
>    10:05:43.899 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
>    10:05:43.899 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
>    10:05:43.899 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
>    10:05:43.899 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
>    10:05:43.899 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
>    10:05:43.899 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
>    10:05:43.899 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
>    10:05:43.899 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
>    10:05:43.899 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
>    10:05:43.899 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
>    10:05:43.899 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
>    10:05:43.899 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
>    10:05:43.899 kernel: [<ffffffff80482f72>] 
> scsi_probe_and_add_lun+0x910/0xa32
>    10:05:43.899 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
>    10:05:43.899 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
>    10:05:43.899 kernel: [<ffffffff80293369>] ? 
> get_page_from_freelist+0x4ad/0x67a
>    10:05:43.899 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
>    10:05:43.899 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
>    10:05:43.899 kernel: [<ffffffff80487b6e>] ? 
> iscsi_user_scan_session+0x0/0xe4
>    10:05:43.899 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
>    10:05:43.899 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
>    10:05:43.899 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
>    10:05:43.899 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
>    10:05:43.899 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
>    10:05:43.899 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
>    10:05:43.899 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
>    10:05:43.899 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
>    10:05:43.899 kernel: Code: 8b bb 98 05 00 00 48 85 ff 74 05 e8 64 c4 06 00 
> 65 48 8b 04 25 e8 b4 00 00 ff 80 44 e0 ff ff 48 c7 03 40 00 00 00 e8 9c 9a 40 
> 00<0f>  0b eb fe 41 bc fe ff ff ff e9 17 ff ff ff 55 48 89 e5 41 55
>    10:05:43.899 kernel: RIP  [<ffffffff8025b338>] do_exit+0x686/0x695
>    10:05:43.899 kernel: RSP<ffff8801da10f840>
>    10:05:43.899 kernel: ---[ end trace c2e45cd8c17e96d2 ]---
>    10:05:43.899 kernel: note: iscsid[29567] exited with preempt_count 1
>    10:05:43.899 kernel: BUG: scheduling while atomic: iscsid/29567/0x10000002
>    10:05:43.899 kernel: Modules linked in:
>    10:05:43.899 kernel: Pid: 29567, comm: iscsid Tainted: G      D    
> 2.6.30.4-elastic-lon-p #1
>    10:05:43.899 kernel: Call Trace:
>    10:05:43.899 kernel: [<ffffffff80250e32>] __schedule_bug+0x57/0x5c
>    10:05:43.899 kernel: [<ffffffff806645b9>] __schedule+0xc1/0x814
>    10:05:43.899 kernel: [<ffffffff803ff025>] ? number+0x12f/0x225
>    10:05:43.899 kernel: [<ffffffff80292cb0>] ? __pagevec_free+0x29/0x3c
>    10:05:43.899 kernel: [<ffffffff80664dec>] schedule+0x18/0x3b
>    10:05:43.899 kernel: [<ffffffff80250f33>] __cond_resched+0x1c/0x45
>    10:05:43.899 kernel: [<ffffffff80664ec2>] _cond_resched+0x30/0x3b
>    10:05:43.899 kernel: [<ffffffff802a17b2>] unmap_vmas+0x6cd/0x886
>    10:05:43.899 kernel: [<ffffffff802a5db3>] exit_mmap+0xd4/0x184
>    10:05:43.899 kernel: [<ffffffff80255c6b>] mmput+0x2b/0xb4
>    10:05:43.899 kernel: [<ffffffff80259524>] exit_mm+0xff/0x10a
>    10:05:43.899 kernel: [<ffffffff8025ae2c>] do_exit+0x17a/0x695
>    10:05:43.899 kernel: [<ffffffff80419d84>] ? 
> vgacon_set_cursor_size+0xfd/0x109
>    10:05:43.899 kernel: [<ffffffff8022ea59>] oops_end+0x89/0x8e
>    10:05:43.899 kernel: [<ffffffff8022ec1c>] die+0x55/0x5e
>    10:05:43.899 kernel: [<ffffffff8022c6a0>] do_trap+0x115/0x124
>    10:05:43.899 kernel: [<ffffffff8022ca20>] do_invalid_op+0x91/0x9a
>    10:05:43.899 kernel: [<ffffffff8025b338>] ? do_exit+0x686/0x695
>    10:05:43.899 kernel: [<ffffffff8024ecfd>] ? dequeue_task_fair+0x68/0x71
>    10:05:43.899 kernel: [<ffffffff80250e89>] ? finish_task_switch+0x52/0xe0
>    10:05:43.899 kernel: [<ffffffff8022bc05>] invalid_op+0x15/0x20
>    10:05:43.899 kernel: [<ffffffff8025b338>] ? do_exit+0x686/0x695
>    10:05:43.899 kernel: [<ffffffff8024b6d1>] ? __wake_up_common+0x49/0x7f
>    10:05:43.899 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
>    10:05:43.899 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
>    10:05:43.899 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
>    10:05:43.899 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
>    10:05:43.899 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
>    10:05:43.899 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
>    10:05:43.899 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
>    10:05:43.899 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
>    10:05:43.899 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
>    10:05:43.899 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
>    10:05:43.899 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
>    10:05:43.899 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
>    10:05:43.899 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
>    10:05:43.900 kernel: [<ffffffff80482f72>] 
> scsi_probe_and_add_lun+0x910/0xa32
>    10:05:43.900 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
>    10:05:43.900 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
>    10:05:43.900 kernel: [<ffffffff80293369>] ? 
> get_page_from_freelist+0x4ad/0x67a
>    10:05:43.900 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
>    10:05:43.900 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
>    10:05:43.900 kernel: [<ffffffff80487b6e>] ? 
> iscsi_user_scan_session+0x0/0xe4
>    10:05:43.900 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
>    10:05:43.900 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
>    10:05:43.900 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
>    10:05:43.900 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
>    10:05:43.901 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
>    10:05:43.901 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
>    10:05:43.901 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
>    10:05:43.901 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
>
> This clearly shouldn't happen whatever I do from userspace and maybe its
> completely unconnected, but I'm wondering if my walk of sysfs might be
> provoking a bug that would otherwise been hidden, racing with the creation of
> the sysfs files themselves?
>
> I'm running stock 2.6.30.4 with the upstream kernel iscsi_tcp driver
> compiled in, and open-iscsi 2.0-871 with just the following patch
>
> diff --git a/usr/iscsid.c b/usr/iscsid.c
> --- a/usr/iscsid.c
> +++ b/usr/iscsid.c
> @@ -387,10 +387,10 @@
>               exit(1);
>       }
>
> -     if (iscsi_sysfs_check_class_version()) {
> +     /* if (iscsi_sysfs_check_class_version()) {
>               log_close(log_pid);
>               exit(1);
> -     }
> +     } */
>
>       umask(0177);
>
> to disable a broken check for the iscsi_tcp driver version, which only works
> correctly if the driver is compiled as a module, erroneously failing when
> iscsi_tcp is compiled into the kernel. Kernel config is at
>
>    http://cdw.me.uk/tmp/linux-2.6.30.4-config.text
>
> I've spent a little time trying to reproduce this within a kvm virtual machine
> with the same kernel and userspace, but have been unable to do so. (The box
> with the kernel backtraces was a fairly fast dual quad-core Xeon E5420 box, so
> perhaps the timings are too different within a slow vm.)
>
> Any pointers gratefully received!
>
> Cheers,
>
> Chris.
>
> >


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to