Re: Login race, and (perhaps unrelated) kernel oops

Mike Christie Thu, 20 Aug 2009 10:48:56 -0700

On 08/20/2009 12:33 PM, Chris Webb wrote:
> Mike Christie<[email protected]>  writes:
>
>> I was thinking you might be hitting a problem in the async code the
>> scsi_scan.c/sd.c code use like here:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=kernel/async.c;h=27235f5de198997180d4a293adc932d48c6101c7;hb=HEAD
>
> Hi. Sorry to follow up again, I've now seen the same oops with async scsi
> scan turned off at compile time in the kernel:
>


Could you repost this to the linux-scsi list? Change the title to oops 
during scsi scanning disk setup.

I am not sure what is causing this. It does not look like iscsi is the 
problem. We are just kicking off the scan. The oops is in the disk 
setup, and I am not familiar with that code. The linux-scsi guys can 
help more than I can.


>    p0# uname -a
>    Linux f7ea8425-d45b-490f-a738-d181d0df6963.host.elastichosts.com 
> 2.6.30.4-elastic-lon-p #2 SMP PREEMPT Thu Aug 20 14:30:50 BST 2009 x86_64 
> Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
>    p0# zgrep SCAN_ASYNC /proc/config.gz
>    # CONFIG_SCSI_SCAN_ASYNC is not set
>
>    p0# cat /var/log/kern/2009-08-20
>    [...]
>    15:27:10.485 kernel: scsi9 : iSCSI Initiator over TCP/IP
>    15:27:11.493 kernel: scsi 9:0:0:0: RAID              IET      Controller   
>     0001 PQ: 0 ANSI: 5
>    15:27:11.493 kernel: scsi 9:0:0:0: Attached scsi generic sg6 type 12
>    15:27:11.495 kernel: scsi 9:0:0:1: Direct-Access     IET      VIRTUAL-DISK 
>     0001 PQ: 0 ANSI: 5
>    15:27:11.495 kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
>    15:27:11.495 kernel: sd 9:0:0:1: [sdg] 4194304 512-byte hardware sectors: 
> (2.14 GB/2.00 GiB)
>    15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write Protect is off
>    15:27:11.495 kernel: sd 9:0:0:1: [sdg] Write cache: disabled, read cache: 
> enabled, doesn't support DPO or FUA
>    15:27:13.012 kernel: sdg:<6>scsi 9:0:0:1: [sdg] Unhandled error code
>    15:27:13.012 kernel: scsi 9:0:0:1: [sdg] Result: hostbyte=0x07 
> driverbyte=0x00
>    15:27:13.012 kernel: end_request: I/O error, dev sdg, sector 0
>    15:27:13.012 kernel: Buffer I/O error on device sdg, logical block 0
>    15:27:13.012 kernel: ldm_validate_partition_table(): Disk read failed.
>    15:27:13.012 kernel: unable to read partition table
>    15:27:13.014 kernel: BUG: unable to handle kernel NULL pointer dereference 
> at 0000000000000010
>    15:27:13.014 kernel: IP: [<ffffffff803f0d77>] disk_part_iter_next+0x74/0xfd
>    15:27:13.014 kernel: PGD 82ad0b067 PUD 82cd7e067 PMD 0
>    15:27:13.014 kernel: Oops: 0000 [#1] PREEMPT SMP
>    15:27:13.014 kernel: last sysfs file: 
> /sys/devices/platform/host9/session4/iscsi_session/session4/ifacename
>    15:27:13.014 kernel: CPU 5
>    15:27:13.014 kernel: Modules linked in:
>    15:27:13.014 kernel: Pid: 13999, comm: async/0 Not tainted 
> 2.6.30.4-elastic-lon-p #2 X7DBN
>    15:27:13.014 kernel: RIP: 0010:[<ffffffff803f0d77>]  [<ffffffff803f0d77>] 
> disk_part_iter_next+0x74/0xfd
>    15:27:13.014 kernel: RSP: 0018:ffff88066afa3dd0  EFLAGS: 00010246
>    15:27:13.014 kernel: RAX: ffff88082b58a000 RBX: ffff88066afa3e00 RCX: 
> 0000000000000000
>    15:27:13.014 kernel: RDX: 0000000000000000 RSI: ffff88082b58a000 RDI: 
> 0000000000000000
>    15:27:13.014 kernel: RBP: ffff88066afa3df0 R08: ffff88066afa2000 R09: 
> ffff8806a204f000
>    15:27:13.014 kernel: R10: 000000fb12c7d274 R11: ffff8806c2bf0628 R12: 
> ffff88066afa3e00
>    15:27:13.014 kernel: R13: ffff88082c829a00 R14: 0000000000000000 R15: 
> ffff8806bc50c920
>    15:27:13.014 kernel: FS:  0000000000000000(0000) GS:ffff88002818a000(0000) 
> knlGS:0000000000000000
>    15:27:13.014 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>    15:27:13.014 kernel: CR2: 0000000000000010 CR3: 000000082ade3000 CR4: 
> 00000000000426e0
>    15:27:13.014 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>    15:27:13.014 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
>    15:27:13.014 kernel: Process async/0 (pid: 13999, threadinfo 
> ffff88066afa2000, task ffff8806c2bf05e0)
>    15:27:13.014 kernel: Stack:
>    15:27:13.014 kernel: 0000000000000000 ffff88066afa3e00 ffff88066afa3e00 
> ffff88082c829a00
>    15:27:13.014 kernel: ffff88066afa3e40 ffffffff80306feb ffff88082b58a000 
> 0000000000000000
>    15:27:13.014 kernel: 0000000000000001 ffff8806bc50c920 ffff88066afa3e40 
> ffff88082b58a000
>    15:27:13.014 kernel: Call Trace:
>    15:27:13.014 kernel: [<ffffffff80306feb>] register_disk+0x122/0x13a
>    15:27:13.014 kernel: [<ffffffff803f0b0f>] add_disk+0xaa/0x106
>    15:27:13.014 kernel: [<ffffffff80493609>] sd_probe_async+0x198/0x25b
>    15:27:13.014 kernel: [<ffffffff80270482>] async_thread+0x10c/0x20d
>    15:27:13.014 kernel: [<ffffffff802545ff>] ? default_wake_function+0x0/0xf
>    15:27:13.014 kernel: [<ffffffff80270376>] ? async_thread+0x0/0x20d
>    15:27:13.014 kernel: [<ffffffff8026ad89>] kthread+0x55/0x80
>    15:27:13.014 kernel: [<ffffffff8022be6a>] child_rip+0xa/0x20
>    15:27:13.014 kernel: [<ffffffff8026ad34>] ? kthread+0x0/0x80
>    15:27:13.014 kernel: [<ffffffff8022be60>] ? child_rip+0x0/0x20
>    15:27:13.014 kernel: Code: c8 ff 80 e1 0c b9 00 00 00 00 0f 44 c1 41 83 cd 
> ff 48 8d 7a 20 48 be ff ff ff ff 08 00 00 00 48 b9 00 00 00 00 08 00 00 00 eb 
> 50<8b>  42 10 41 bd 01 00 00 00 eb db 4c 63 c2 4e 8d 04 c7 4d 8b 20
>    15:27:13.015 kernel: RIP  [<ffffffff803f0d77>] 
> disk_part_iter_next+0x74/0xfd
>    15:27:13.015 kernel: RSP<ffff88066afa3dd0>
>    15:27:13.015 kernel: CR2: 0000000000000010
>    15:27:13.015 kernel: ---[ end trace 6104b56ef5590e25 ]---
>    15:27:13.015 kernel: note: async/0[13999] exited with preempt_count 1
>    15:27:13.064 kernel: device-mapper: ioctl: device doesn't appear to be in 
> the dev hash table.
>    15:27:13.115 kernel: device-mapper: ioctl: device doesn't appear to be in 
> the dev hash table.
>    15:34:52.618 kernel: device tap1 entered promiscuous mode
>    15:34:52.621 kernel: br0: port 8(tap1) entering forwarding state
>    15:39:12.560 kernel: scsi10 : iSCSI Initiator over TCP/IP
>    15:39:13.568 kernel: scsi 10:0:0:0: RAID              IET      Controller  
>      0001 PQ: 0 ANSI: 5
>    15:39:13.568 kernel: scsi 10:0:0:0: Attached scsi generic sg6 type 12
>    15:39:13.569 kernel: scsi 10:0:0:1: Direct-Access     IET      
> VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
>    15:39:13.572 kernel: general protection fault: 0000 [#2] PREEMPT SMP
>    15:39:13.572 kernel: last sysfs file: 
> /sys/devices/platform/host10/scsi_host/host10/scan
>    15:39:13.572 kernel: CPU 5
>    15:39:13.572 kernel: Modules linked in:
>    15:39:13.572 kernel: Pid: 16876, comm: iscsid Tainted: G      D    
> 2.6.30.4-elastic-lon-p #2 X7DBN
>    15:39:13.572 kernel: RIP: 0010:[<ffffffff8024b6ce>]  [<ffffffff8024b6ce>] 
> __wake_up_common+0x46/0x7f
>    15:39:13.572 kernel: RSP: 0018:ffff8802580bd848  EFLAGS: 00010006
>    15:39:13.572 kernel: RAX: ffff88066afa3ed0 RBX: 0000000000000001 RCX: 
> 0000000000000000
>    15:39:13.572 kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> ffff88066afa3ed0
>    15:39:13.572 kernel: RBP: ffff8802580bd888 R08: 0000000000000000 R09: 
> ffff88082b9c4d88
>    15:39:13.572 kernel: R10: 0000000000000abc R11: 00000000000002af R12: 
> 0000000000000003
>    15:39:13.572 kernel: R13: ea164bc9f170a707 R14: 000000003f771c34 R15: 
> 0000000000000000
>    15:39:13.572 kernel: FS:  00007f99209626f0(0000) GS:ffff88002818a000(0000) 
> knlGS:0000000000000000
>    15:39:13.572 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    15:39:13.572 kernel: CR2: 0000000002aced78 CR3: 00000005c6a7d000 CR4: 
> 00000000000426e0
>    15:39:13.572 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>    15:39:13.572 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
>    15:39:13.572 kernel: Process iscsid (pid: 16876, threadinfo 
> ffff8802580bc000, task ffff8806c2bf1780)
>    15:39:13.572 kernel: Stack:
>    15:39:13.572 kernel: 0000000000000000 ffffffff80835c38 ffff8802580bd888 
> ffffffff80835c30
>    15:39:13.572 kernel: 0000000000000282 ffff88082b5d7000 0000000000000011 
> ffffffff80835c80
>    15:39:13.572 kernel: ffff8802580bd8c8 ffffffff8024c06c 00000000000080d0 
> 0000000000000000
>    15:39:13.572 kernel: Call Trace:
>    15:39:13.572 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
>    15:39:13.572 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
>    15:39:13.572 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
>    15:39:13.572 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
>    15:39:13.572 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
>    15:39:13.572 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
>    15:39:13.572 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
>    15:39:13.572 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
>    15:39:13.572 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
>    15:39:13.572 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
>    15:39:13.572 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
>    15:39:13.572 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
>    15:39:13.572 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
>    15:39:13.572 kernel: [<ffffffff80482f72>] 
> scsi_probe_and_add_lun+0x910/0xa32
>    15:39:13.572 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
>    15:39:13.572 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
>    15:39:13.572 kernel: [<ffffffff80293369>] ? 
> get_page_from_freelist+0x4ad/0x67a
>    15:39:13.572 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
>    15:39:13.572 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
>    15:39:13.572 kernel: [<ffffffff80487b6e>] ? 
> iscsi_user_scan_session+0x0/0xe4
>    15:39:13.572 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
>    15:39:13.572 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
>    15:39:13.572 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
>    15:39:13.573 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
>    15:39:13.573 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
>    15:39:13.573 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
>    15:39:13.573 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
>    15:39:13.573 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
>    15:39:13.573 kernel: Code: 47 08 48 83 c7 08 48 83 e8 18 4c 8b 68 18 48 89 
> 7d c8 49 83 ed 18 eb 33 44 8b 30 4c 89 c1 4c 89 45 c0 44 89 fa 44 89 e6 48 89 
> c7<ff>  50 10 4c 8b 45 c0 85 c0 74 0a 41 80 e6 01 74 04 ff cb 74 15
>    15:39:13.573 kernel: RIP  [<ffffffff8024b6ce>] __wake_up_common+0x46/0x7f
>    15:39:13.573 kernel: RSP<ffff8802580bd848>
>    15:39:13.573 kernel: ---[ end trace 6104b56ef5590e26 ]---
>    15:39:13.573 kernel: note: iscsid[16876] exited with preempt_count 1
>    15:39:13.573 kernel: BUG: scheduling while atomic: iscsid/16876/0x10000002
>    15:39:13.573 kernel: Modules linked in:
>    15:39:13.573 kernel: Pid: 16876, comm: iscsid Tainted: G      D    
> 2.6.30.4-elastic-lon-p #2
>    15:39:13.573 kernel: Call Trace:
>    15:39:13.573 kernel: [<ffffffff80250e32>] __schedule_bug+0x57/0x5c
>    15:39:13.573 kernel: [<ffffffff806645b9>] __schedule+0xc1/0x814
>    15:39:13.573 kernel: [<ffffffff803ff025>] ? number+0x12f/0x225
>    15:39:13.573 kernel: [<ffffffff8026ef79>] ? 
> __atomic_notifier_call_chain+0x20/0x4e
>    15:39:13.573 kernel: [<ffffffff80664dec>] schedule+0x18/0x3b
>    15:39:13.573 kernel: [<ffffffff80250f33>] __cond_resched+0x1c/0x45
>    15:39:13.573 kernel: [<ffffffff80664ec2>] _cond_resched+0x30/0x3b
>    15:39:13.573 kernel: [<ffffffff802a17b2>] unmap_vmas+0x6cd/0x886
>    15:39:13.573 kernel: [<ffffffff802a5db3>] exit_mmap+0xd4/0x184
>    15:39:13.573 kernel: [<ffffffff80255c6b>] mmput+0x2b/0xb4
>    15:39:13.573 kernel: [<ffffffff80259524>] exit_mm+0xff/0x10a
>    15:39:13.574 kernel: [<ffffffff8025ae2c>] do_exit+0x17a/0x695
>    15:39:13.574 kernel: [<ffffffff8022ea59>] oops_end+0x89/0x8e
>    15:39:13.574 kernel: [<ffffffff8022ec1c>] die+0x55/0x5e
>    15:39:13.574 kernel: [<ffffffff8022ccaa>] do_general_protection+0x123/0x12b
>    15:39:13.574 kernel: [<ffffffff806670cf>] general_protection+0x1f/0x30
>    15:39:13.574 kernel: [<ffffffff8024b6ce>] ? __wake_up_common+0x46/0x7f
>    15:39:13.574 kernel: [<ffffffff8024c06c>] __wake_up+0x34/0x48
>    15:39:13.574 kernel: [<ffffffff80493471>] ? sd_probe_async+0x0/0x25b
>    15:39:13.574 kernel: [<ffffffff802708cb>] __async_schedule+0x17e/0x190
>    15:39:13.574 kernel: [<ffffffff802708f8>] async_schedule+0x10/0x14
>    15:39:13.574 kernel: [<ffffffff80493d4a>] sd_probe+0x1bd/0x213
>    15:39:13.574 kernel: [<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
>    15:39:13.575 kernel: [<ffffffff80473b54>] __device_attach+0x35/0x3a
>    15:39:13.575 kernel: [<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
>    15:39:13.575 kernel: [<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
>    15:39:13.575 kernel: [<ffffffff80473be1>] device_attach+0x5e/0x75
>    15:39:13.575 kernel: [<ffffffff80472e3c>] bus_attach_device+0x26/0x58
>    15:39:13.575 kernel: [<ffffffff80471a5d>] device_add+0x3ff/0x562
>    15:39:13.575 kernel: [<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
>    15:39:13.575 kernel: [<ffffffff80482f72>] 
> scsi_probe_and_add_lun+0x910/0xa32
>    15:39:13.575 kernel: [<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
>    15:39:13.575 kernel: [<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
>    15:39:13.575 kernel: [<ffffffff80293369>] ? 
> get_page_from_freelist+0x4ad/0x67a
>    15:39:13.575 kernel: [<ffffffff80483dce>] scsi_scan_target+0x97/0xae
>    15:39:13.575 kernel: [<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
>    15:39:13.575 kernel: [<ffffffff80487b6e>] ? 
> iscsi_user_scan_session+0x0/0xe4
>    15:39:13.575 kernel: [<ffffffff80470f95>] device_for_each_child+0x35/0x6c
>    15:39:13.575 kernel: [<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
>    15:39:13.575 kernel: [<ffffffff8048471c>] store_scan+0x9b/0xc6
>    15:39:13.575 kernel: [<ffffffff80470765>] dev_attr_store+0x1b/0x1d
>    15:39:13.575 kernel: [<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
>    15:39:13.575 kernel: [<ffffffff802c1711>] vfs_write+0xad/0x129
>    15:39:13.575 kernel: [<ffffffff802c1846>] sys_write+0x45/0x6c
>    15:39:13.575 kernel: [<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
>    15:39:13.643 kernel: device-mapper: ioctl: device doesn't appear to be in 
> the dev hash table.
>    [...]
>
> iscsid crashed following this, and the machine locked up hard (no longer
> even responding to ping) shortly afterwards.
>
> I'm not quite clear how I can be in sd_probe_async when asynchronous bus
> scanning is compiled out completely, but perhaps I'm misunderstanding how
> this function is used.
>
> The common factor between this incident and the last couple of crashes we've
> seen is that the iSCSI target was under a bit of load in each case (RAID
> resyncing here) and logins were therefore a bit slower than they'd normally
> be. This might be evidence for some sort of race?
>
> Best wishes,
>
> Chris.
>
> >


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Re: Login race, and (perhaps unrelated) kernel oops

Reply via email to