Re: Login race, and (perhaps unrelated) kernel oops

2009-08-20 Thread Chris Webb

Mike Christie  writes:

> Could you repost this to the linux-scsi list? Change the title to oops 
> during scsi scanning disk setup.
> 
> I am not sure what is causing this. It does not look like iscsi is the 
> problem. We are just kicking off the scan. The oops is in the disk 
> setup, and I am not familiar with that code. The linux-scsi guys can 
> help more than I can.

Sure, I'll do that now. Many thanks for the pointer and advice.

Best wishes,

Chris.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Login race, and (perhaps unrelated) kernel oops

2009-08-20 Thread Mike Christie

On 08/20/2009 12:33 PM, Chris Webb wrote:
> Mike Christie  writes:
>
>> I was thinking you might be hitting a problem in the async code the
>> scsi_scan.c/sd.c code use like here:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=kernel/async.c;h=27235f5de198997180d4a293adc932d48c6101c7;hb=HEAD
>
> Hi. Sorry to follow up again, I've now seen the same oops with async scsi
> scan turned off at compile time in the kernel:
>

Could you repost this to the linux-scsi list? Change the title to oops 
during scsi scanning disk setup.

I am not sure what is causing this. It does not look like iscsi is the 
problem. We are just kicking off the scan. The oops is in the disk 
setup, and I am not familiar with that code. The linux-scsi guys can 
help more than I can.


>p0# uname -a
>Linux f7ea8425-d45b-490f-a738-d181d0df6963.host.elastichosts.com 
> 2.6.30.4-elastic-lon-p #2 SMP PREEMPT Thu Aug 20 14:30:50 BST 2009 x86_64 
> Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
>p0# zgrep SCAN_ASYNC /proc/config.gz
># CONFIG_SCSI_SCAN_ASYNC is not set
>
>p0# cat /var/log/kern/2009-08-20
>[...]
>15:27:10.485 kernel: scsi9 : iSCSI Initiator over TCP/IP
>15:27:11.493 kernel: scsi 9:0:0:0: RAID  IET  Controller   
> 0001 PQ: 0 ANSI: 5
>15:27:11.493 kernel: scsi 9:0:0:0: Attached scsi generic sg6 type 12
>15:27:11.495 kernel: scsi 9:0:0:1: Direct-Access IET  VIRTUAL-DISK 
> 0001 PQ: 0 ANSI: 5
>15:27:11.495 kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
>15:27:11.495 kernel: sd 9:0:0:1: [sdg] 4194304 512-byte hardware sectors: 
> (2.14 GB/2.00 GiB)
>15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write Protect is off
>15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write cache: disabled, read cache: 
> enabled, doesn't support DPO or FUA
>15:27:13.012 kernel: sdg:<6>scsi 9:0:0:1: [sdg] Unhandled error code
>15:27:13.012 kernel: scsi 9:0:0:1: [sdg] Result: hostbyte=0x07 
> driverbyte=0x00
>15:27:13.012 kernel: end_request: I/O error, dev sdg, sector 0
>15:27:13.012 kernel: Buffer I/O error on device sdg, logical block 0
>15:27:13.012 kernel: ldm_validate_partition_table(): Disk read failed.
>15:27:13.012 kernel: unable to read partition table
>15:27:13.014 kernel: BUG: unable to handle kernel NULL pointer dereference 
> at 0010
>15:27:13.014 kernel: IP: [] disk_part_iter_next+0x74/0xfd
>15:27:13.014 kernel: PGD 82ad0b067 PUD 82cd7e067 PMD 0
>15:27:13.014 kernel: Oops:  [#1] PREEMPT SMP
>15:27:13.014 kernel: last sysfs file: 
> /sys/devices/platform/host9/session4/iscsi_session/session4/ifacename
>15:27:13.014 kernel: CPU 5
>15:27:13.014 kernel: Modules linked in:
>15:27:13.014 kernel: Pid: 13999, comm: async/0 Not tainted 
> 2.6.30.4-elastic-lon-p #2 X7DBN
>15:27:13.014 kernel: RIP: 0010:[]  [] 
> disk_part_iter_next+0x74/0xfd
>15:27:13.014 kernel: RSP: 0018:88066afa3dd0  EFLAGS: 00010246
>15:27:13.014 kernel: RAX: 88082b58a000 RBX: 88066afa3e00 RCX: 
> 
>15:27:13.014 kernel: RDX:  RSI: 88082b58a000 RDI: 
> 
>15:27:13.014 kernel: RBP: 88066afa3df0 R08: 88066afa2000 R09: 
> 8806a204f000
>15:27:13.014 kernel: R10: 00fb12c7d274 R11: 8806c2bf0628 R12: 
> 88066afa3e00
>15:27:13.014 kernel: R13: 88082c829a00 R14:  R15: 
> 8806bc50c920
>15:27:13.014 kernel: FS:  () GS:88002818a000() 
> knlGS:
>15:27:13.014 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
>15:27:13.014 kernel: CR2: 0010 CR3: 00082ade3000 CR4: 
> 000426e0
>15:27:13.014 kernel: DR0:  DR1:  DR2: 
> 
>15:27:13.014 kernel: DR3:  DR6: 0ff0 DR7: 
> 0400
>15:27:13.014 kernel: Process async/0 (pid: 13999, threadinfo 
> 88066afa2000, task 8806c2bf05e0)
>15:27:13.014 kernel: Stack:
>15:27:13.014 kernel:  88066afa3e00 88066afa3e00 
> 88082c829a00
>15:27:13.014 kernel: 88066afa3e40 80306feb 88082b58a000 
> 
>15:27:13.014 kernel: 0001 8806bc50c920 88066afa3e40 
> 88082b58a000
>15:27:13.014 kernel: Call Trace:
>15:27:13.014 kernel: [] register_disk+0x122/0x13a
>15:27:13.014 kernel: [] add_disk+0xaa/0x106
>15:27:13.014 kernel: [] sd_probe_async+0x198/0x25b
>15:27:13.014 kernel: [] async_thread+0x10c/0x20d
>15:27:13.014 kernel: [] ? default_wake_function+0x0/0xf
>15:27:13.014 kernel: [] ? async_thread+0x0/0x20d
>15:27:13.014 kernel: [] kthread+0x55/0x80
>15:27:13.014 kernel: [] child_rip+0xa/0x20
>15:27:13.014 kernel: [] ? kthread+0x0/0x80
>15:27:13.014 kernel: [] ? child_rip+0x0/0x20
>15:27:13.014 kernel: Code: c8 ff 80 e1

Re: Login race, and (perhaps unrelated) kernel oops

2009-08-20 Thread Chris Webb

Mike Christie  writes:

> I was thinking you might be hitting a problem in the async code the 
> scsi_scan.c/sd.c code use like here:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=kernel/async.c;h=27235f5de198997180d4a293adc932d48c6101c7;hb=HEAD

Hi. Sorry to follow up again, I've now seen the same oops with async scsi
scan turned off at compile time in the kernel:

  p0# uname -a
  Linux f7ea8425-d45b-490f-a738-d181d0df6963.host.elastichosts.com 
2.6.30.4-elastic-lon-p #2 SMP PREEMPT Thu Aug 20 14:30:50 BST 2009 x86_64 
Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
  p0# zgrep SCAN_ASYNC /proc/config.gz
  # CONFIG_SCSI_SCAN_ASYNC is not set

  p0# cat /var/log/kern/2009-08-20
  [...]
  15:27:10.485 kernel: scsi9 : iSCSI Initiator over TCP/IP
  15:27:11.493 kernel: scsi 9:0:0:0: RAID  IET  Controller  
 0001 PQ: 0 ANSI: 5
  15:27:11.493 kernel: scsi 9:0:0:0: Attached scsi generic sg6 type 12
  15:27:11.495 kernel: scsi 9:0:0:1: Direct-Access IET  VIRTUAL-DISK
 0001 PQ: 0 ANSI: 5
  15:27:11.495 kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
  15:27:11.495 kernel: sd 9:0:0:1: [sdg] 4194304 512-byte hardware sectors: 
(2.14 GB/2.00 GiB)
  15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write Protect is off
  15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write cache: disabled, read cache: 
enabled, doesn't support DPO or FUA
  15:27:13.012 kernel: sdg:<6>scsi 9:0:0:1: [sdg] Unhandled error code
  15:27:13.012 kernel: scsi 9:0:0:1: [sdg] Result: hostbyte=0x07 driverbyte=0x00
  15:27:13.012 kernel: end_request: I/O error, dev sdg, sector 0
  15:27:13.012 kernel: Buffer I/O error on device sdg, logical block 0
  15:27:13.012 kernel: ldm_validate_partition_table(): Disk read failed.
  15:27:13.012 kernel: unable to read partition table
  15:27:13.014 kernel: BUG: unable to handle kernel NULL pointer dereference at 
0010
  15:27:13.014 kernel: IP: [] disk_part_iter_next+0x74/0xfd
  15:27:13.014 kernel: PGD 82ad0b067 PUD 82cd7e067 PMD 0 
  15:27:13.014 kernel: Oops:  [#1] PREEMPT SMP 
  15:27:13.014 kernel: last sysfs file: 
/sys/devices/platform/host9/session4/iscsi_session/session4/ifacename
  15:27:13.014 kernel: CPU 5 
  15:27:13.014 kernel: Modules linked in:
  15:27:13.014 kernel: Pid: 13999, comm: async/0 Not tainted 
2.6.30.4-elastic-lon-p #2 X7DBN
  15:27:13.014 kernel: RIP: 0010:[]  [] 
disk_part_iter_next+0x74/0xfd
  15:27:13.014 kernel: RSP: 0018:88066afa3dd0  EFLAGS: 00010246
  15:27:13.014 kernel: RAX: 88082b58a000 RBX: 88066afa3e00 RCX: 

  15:27:13.014 kernel: RDX:  RSI: 88082b58a000 RDI: 

  15:27:13.014 kernel: RBP: 88066afa3df0 R08: 88066afa2000 R09: 
8806a204f000
  15:27:13.014 kernel: R10: 00fb12c7d274 R11: 8806c2bf0628 R12: 
88066afa3e00
  15:27:13.014 kernel: R13: 88082c829a00 R14:  R15: 
8806bc50c920
  15:27:13.014 kernel: FS:  () GS:88002818a000() 
knlGS:
  15:27:13.014 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
  15:27:13.014 kernel: CR2: 0010 CR3: 00082ade3000 CR4: 
000426e0
  15:27:13.014 kernel: DR0:  DR1:  DR2: 

  15:27:13.014 kernel: DR3:  DR6: 0ff0 DR7: 
0400
  15:27:13.014 kernel: Process async/0 (pid: 13999, threadinfo 
88066afa2000, task 8806c2bf05e0)
  15:27:13.014 kernel: Stack:
  15:27:13.014 kernel:  88066afa3e00 88066afa3e00 
88082c829a00
  15:27:13.014 kernel: 88066afa3e40 80306feb 88082b58a000 

  15:27:13.014 kernel: 0001 8806bc50c920 88066afa3e40 
88082b58a000
  15:27:13.014 kernel: Call Trace:
  15:27:13.014 kernel: [] register_disk+0x122/0x13a
  15:27:13.014 kernel: [] add_disk+0xaa/0x106
  15:27:13.014 kernel: [] sd_probe_async+0x198/0x25b
  15:27:13.014 kernel: [] async_thread+0x10c/0x20d
  15:27:13.014 kernel: [] ? default_wake_function+0x0/0xf
  15:27:13.014 kernel: [] ? async_thread+0x0/0x20d
  15:27:13.014 kernel: [] kthread+0x55/0x80
  15:27:13.014 kernel: [] child_rip+0xa/0x20
  15:27:13.014 kernel: [] ? kthread+0x0/0x80
  15:27:13.014 kernel: [] ? child_rip+0x0/0x20
  15:27:13.014 kernel: Code: c8 ff 80 e1 0c b9 00 00 00 00 0f 44 c1 41 83 cd ff 
48 8d 7a 20 48 be ff ff ff ff 08 00 00 00 48 b9 00 00 00 00 08 00 00 00 eb 50 
<8b> 42 10 41 bd 01 00 00 00 eb db 4c 63 c2 4e 8d 04 c7 4d 8b 20 
  15:27:13.015 kernel: RIP  [] disk_part_iter_next+0x74/0xfd
  15:27:13.015 kernel: RSP 
  15:27:13.015 kernel: CR2: 0010
  15:27:13.015 kernel: ---[ end trace 6104b56ef5590e25 ]---
  15:27:13.015 kernel: note: async/0[13999] exited with preempt_count 1
  15:27:13.064 kernel: device-mapper: ioctl: device doesn't appear to be in the 
dev hash table.
  15:27:13.115 kernel: device-mapper: ioctl: device doesn't app

Re: Login race, and (perhaps unrelated) kernel oops

2009-08-10 Thread Chris Webb

Mike Christie  writes:

> The dm target is not dm-multipath is it?

No, I'm just using linear targets to add a layer of indirection that
occasionally gets flipped from one device to another during live storage
migration.

> I was thinking you might be hitting a problem in the async code the 
> scsi_scan.c/sd.c code use like here:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=kernel/async.c;h=27235f5de198997180d4a293adc932d48c6101c7;hb=HEAD

Ah, right: although this isn't in 2.6.30.4, it does claim to be a fix for
d5a877e8dd409d8c702986d06485c374b705d340:

  
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d5a877e8dd409d8c702986d06485c374b705d340

which isn't in 2.6.30.4 either. Of course, there may well be other problems
with the older async code I'm running in the stable series. It certainly
looks like it was reworked not too long before 2.6.30 was released.

Cheers,

Chris.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Login race, and (perhaps unrelated) kernel oops

2009-08-10 Thread Mike Christie

Chris Webb wrote:
> Hi Mike. Many thanks for getting back to me in such detail.
> 
> Mike Christie  writes:
> 
>> On 08/06/2009 02:32 PM, Chris Webb wrote:
> [...]
>>> What's the canonical race-free way to wait for the iscsi login to properly
>>> finish and the device node to be available? At the moment, I have a hack
>> I am not sure about that. I will just add some details about what 
>> iscsiadm is doing.
>>
>> After it logs into the target it does
>>
>> echo - - -  > /sys/class/scsi_host/hostX/scan
>>
>> This has the scsi layer do commands like inquirys and report luns to 
>> find your disks. The scsi layer will then create the kernel structs for 
>> the scsi devices and when that happens it will fire hot plug events that 
>> udev waits for. When the kernel setup is done the echo command above 
>> will return.
>>
>> So from iscsiadm's perspective it has logged in and detected the disks. 
>> It has no idea udev is running or not. Maybe in some cases like when run 
>> from the initrd/initramfs udev might not be running like normal.
>>
>> If you add a udev rule, that checks for iscsi devices and does whatever 
>> you were trying to do will that help?
> 
> To be honest, I don't really need udev to have completed, but I'm using it
> finishing as an indicator that I can safely scan /sys/ for the new iscsi
> device and start using it in a device mapper target. From what you describe,


The dm target is not dm-multipath is it? I believe dm-multipath's 
userspace component runs off of the hot plug events. When I discvoery 
iscsi or FC or SAS disks, it looks like multipthd gets run or handles a 
hot plug event, and then it will assemble dm-multipath devices similar 
to how you are trying to do it.



> iscsiadm exiting ought to be completely sufficient for this, and a udevadm
> settle on its own should be enough to guarantee the /dev/sdX node.
> 
> What I'm doing is fairly heavyweight (building device mapper tables and then
> starting qemu-kvm virtual machines) and potentially depends on a number of
> different iscsi logins, so I can't reasonably do it in a udev rule. However,
> I could have a custom udev rule to produce a symlink by target name for the
> device node, and do something like
> 
>   udevadm settle
>   until [ -f /dev/my/symlink ]; do sleep 0.1; done
> 
> and read the major/minor from (the target of) the symlink rather than
> searching sysfs myself; maybe that would be a bit tidier.
> 
>> For the problems below, I am not sure what is up. The iscsi code has no
>> control over it. I think the first problem is fixed though. I think there
>> were some bugs in the kernel async scanning code (can you turn that off in
>> the kernel configuration) that were fixed recently in 2.6.31-rc5.
> 
> Sure, I can turn off the async bus scanning. It's not something we really
> make use of other than the convenience of slightly faster boot times. Don't
> suppose there's an easy way to disable this without rebooting the machine?
> 
> Out of curiousity, and wondering about a back-port, I've been looking for
> the bug-fix commit you mention, but haven't turned it up yet. I found Edward
> Goggin's patch initialising max_target_blocked in scsi_alloc_target() but
> that's already in the 2.6.30.4 stable tree that I'm using. I don't see any
> more recent changes in scsi_scan.c apart from the increased INQUIRY timeout.
> Were the patches elsewere in the kernel?
> 


I was thinking you might be hitting a problem in the async code the 
scsi_scan.c/sd.c code use like here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=kernel/async.c;h=27235f5de198997180d4a293adc932d48c6101c7;hb=HEAD

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Login race, and (perhaps unrelated) kernel oops

2009-08-10 Thread Chris Webb

Hi Mike. Many thanks for getting back to me in such detail.

Mike Christie  writes:

> On 08/06/2009 02:32 PM, Chris Webb wrote:
[...]
> > What's the canonical race-free way to wait for the iscsi login to properly
> > finish and the device node to be available? At the moment, I have a hack
> 
> I am not sure about that. I will just add some details about what 
> iscsiadm is doing.
>
> After it logs into the target it does
> 
> echo - - -  > /sys/class/scsi_host/hostX/scan
> 
> This has the scsi layer do commands like inquirys and report luns to 
> find your disks. The scsi layer will then create the kernel structs for 
> the scsi devices and when that happens it will fire hot plug events that 
> udev waits for. When the kernel setup is done the echo command above 
> will return.
> 
> So from iscsiadm's perspective it has logged in and detected the disks. 
> It has no idea udev is running or not. Maybe in some cases like when run 
> from the initrd/initramfs udev might not be running like normal.
> 
> If you add a udev rule, that checks for iscsi devices and does whatever 
> you were trying to do will that help?

To be honest, I don't really need udev to have completed, but I'm using it
finishing as an indicator that I can safely scan /sys/ for the new iscsi
device and start using it in a device mapper target. From what you describe,
iscsiadm exiting ought to be completely sufficient for this, and a udevadm
settle on its own should be enough to guarantee the /dev/sdX node.

What I'm doing is fairly heavyweight (building device mapper tables and then
starting qemu-kvm virtual machines) and potentially depends on a number of
different iscsi logins, so I can't reasonably do it in a udev rule. However,
I could have a custom udev rule to produce a symlink by target name for the
device node, and do something like

  udevadm settle
  until [ -f /dev/my/symlink ]; do sleep 0.1; done

and read the major/minor from (the target of) the symlink rather than
searching sysfs myself; maybe that would be a bit tidier.

> For the problems below, I am not sure what is up. The iscsi code has no
> control over it. I think the first problem is fixed though. I think there
> were some bugs in the kernel async scanning code (can you turn that off in
> the kernel configuration) that were fixed recently in 2.6.31-rc5.

Sure, I can turn off the async bus scanning. It's not something we really
make use of other than the convenience of slightly faster boot times. Don't
suppose there's an easy way to disable this without rebooting the machine?

Out of curiousity, and wondering about a back-port, I've been looking for
the bug-fix commit you mention, but haven't turned it up yet. I found Edward
Goggin's patch initialising max_target_blocked in scsi_alloc_target() but
that's already in the 2.6.30.4 stable tree that I'm using. I don't see any
more recent changes in scsi_scan.c apart from the increased INQUIRY timeout.
Were the patches elsewere in the kernel?

Cheers,

Chris.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Login race, and (perhaps unrelated) kernel oops

2009-08-07 Thread Mike Christie

On 08/06/2009 02:32 PM, Chris Webb wrote:
> I'm seeing a fairly rare bug with drives I'm accessing with open-iscsi. The
> iscsiadm binary is called from within our management system, from a script
> which does
>
>iscsiadm -m discovery -t sendtargets -p $HOST
>iscsiadm -m node -T $TARGET -o update -n node.session.auth.authmethod -v 
> CHAP
>iscsiadm -m node -T $TARGET -o update -n node.session.auth.username -v 
> $USER
>iscsiadm -m node -T $TARGET -o update -n node.session.auth.password -v 
> $PASS
>iscsiadm -m node -T $TARGET -l
>
> and then waits for udev to settle and finds the new /dev/sdX device node with
> something like
>
>udevadm settle
>local SESSION TARGET SESSION BLOCK KERNEL
># shopt nullglob set, so this is correct in the 'no match' case...
>for SESSION in /sys/class/scsi_host/host*/device/session*; do
>  for TARGET in $SESSION/iscsi_session/*/targetname; do
>[ "`<  "$TARGET"`" = "$1" ] || continue
>for BLOCK in "$SESSION"/target*/*/block/*; do
>  KERNEL=${BLOCK##*/}
>  echo /dev/$KERNEL
>  return 0
>done
>  done
>done
>return 1
>
> (The code then goes on set up a device mapper table based on this node.)
>
> First problem is that iscsiadm -m node -l seems to be able to return without
> the node having been created, or even having begun to propagate to udev, so 
> the
> udevadm settle returns (i.e. empty queue) even though the /dev/sdX hasn't yet
> been created, presumably because the uevent hasn't yet gone out to udev even
> though iscsiadm has exited.
>
> What's the canonical race-free way to wait for the iscsi login to properly
> finish and the device node to be available? At the moment, I have a hack

I am not sure about that. I will just add some details about what 
iscsiadm is doing.

After it logs into the target it does

echo - - -  > /sys/class/scsi_host/hostX/scan

This has the scsi layer do commands like inquirys and report luns to 
find your disks. The scsi layer will then create the kernel structs for 
the scsi devices and when that happens it will fire hot plug events that 
udev waits for. When the kernel setup is done the echo command above 
will return.

So from iscsiadm's perspective it has logged in and detected the disks. 
It has no idea udev is running or not. Maybe in some cases like when run 
from the initrd/initramfs udev might not be running like normal.

If you add a udev rule, that checks for iscsi devices and does whatever 
you were trying to do will that help?


> looking like sleep 0.1; udevadm settle, which is ugly and unreliable: it
> will clearly fail on a sufficiently loaded box.
>
> However, with this hack in place, the vast majority of iscsi logins work
> okay. Nonetheless, I see occasional serious kernel problems, like this oops:
>


For the problems below, I am not sure what is up. The iscsi code has no 
control over it. I think the first problem is fixed though. I think 
there were some bugs in the kernel async scanning code (can you turn 
that off in the kernel configuration) that were fixed recently in 
2.6.31-rc5.


>10:05:03.019 kernel: scsi23 : iSCSI Initiator over TCP/IP
>10:05:04.015 kernel: scsi 23:0:0:0: RAID  IET  Controller  
>  0001 PQ: 0 ANSI: 5
>10:05:04.015 kernel: scsi 23:0:0:0: Attached scsi generic sg14 type 12
>10:05:04.017 kernel: scsi 23:0:0:1: Direct-Access IET  
> VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
>10:05:04.017 kernel: sd 23:0:0:1: Attached scsi generic sg15 type 0
>10:05:04.018 kernel: sd 23:0:0:1: [sdk] 27262976 512-byte hardware 
> sectors: (13.9 GB/13.0 GiB)
>10:05:04.018 kernel: sd 23:0:0:1: [sdk] Write Protect is off
>10:05:04.019 kernel: sd 23:0:0:1: [sdk] Write cache: disabled, read cache: 
> enabled, doesn't support DPO or FUA
>10:05:04.248 kernel: sdk: unknown partition table
>10:05:04.248 kernel: BUG: unable to handle kernel NULL pointer dereference 
> at 0010
>10:05:04.248 kernel: IP: [] disk_part_iter_next+0x74/0xfd
>10:05:04.248 kernel: PGD 82ad47067 PUD 82b591067 PMD 0
>10:05:04.248 kernel: Oops:  [#1] PREEMPT SMP
>10:05:04.248 kernel: last sysfs file: 
> /sys/devices/platform/host14/session9/iscsi_session/session9/ifacename
>10:05:04.248 kernel: CPU 6
>10:05:04.248 kernel: Modules linked in:
>10:05:04.248 kernel: Pid: 29251, comm: async/0 Not tainted 
> 2.6.30.4-elastic-lon-p #1 X7DBN
>10:05:04.248 kernel: RIP: 0010:[]  [] 
> disk_part_iter_next+0x74/0xfd
>10:05:04.248 kernel: RSP: 0018:8802645b9dd0  EFLAGS: 00010246
>10:05:04.248 kernel: RAX: 880548d43c00 RBX: 8802645b9e00 RCX: 
> 
>10:05:04.248 kernel: RDX:  RSI: 880548d43c00 RDI: 
> 
>10:05:04.248 kernel: RBP: 8802645b9df0 R08:  R09: 
> 88082cdaf000
>10:05:04.248 kernel: R10: 8802645b9ce0 R11: 880825d06fe8 R12: 
> 8802645b9e00
>10

Re: Login race, and (perhaps unrelated) kernel oops

2009-08-07 Thread Chris Webb

Chris Webb  writes:

> I'm running stock 2.6.30.4 with the upstream kernel iscsi_tcp driver
> compiled in

...architecture x86-64 is probably also worth mentioning!

Cheers,

Chris.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---