Qinghua(Kevin) Ye wrote:
> Hi All,
>
> I encountered another kernel oops in the open-iscsi code. Not sure if it is
> fixed in the new code, but I would like to have some idea about it. Thanks.
>
> My setup:
> Ubuntu 8.04 with kernel 2.6.24-24-generic.
> Open-iscsi 2.0-870.3
>
>
> The kernel oops happens after my iscsi target node crashed.
> Here is the kernel message.
> Dec 7 10:08:21 qye-serv1 kernel: [1459378.575584] connection3903:0:
> detected conn error (1011)
> Dec 7 10:08:21 qye-serv1 kernel: [1459378.826718] sd 18028:0:0:16: timing
> out command, waited 180s
> Dec 7 10:08:21 qye-serv1 kernel: [1459378.826827] sd 18028:0:0:16: [sdd]
> Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
> Dec 7 10:08:21 qye-serv1 kernel: [1459378.826840] end_request: I/O error,
> dev sdd, sector 4505344
> Dec 7 10:08:21 qye-serv1 kernel: [1459378.826897] Buffer I/O error on
> device sdd, logical block 563168
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.629142] session3903: session
> recovery timed out after 120 secs
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.629618] BUG: unable to handle
> kernel paging request at virtual address fcb8d006
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.629815] printing eip: e0a49ff7
> *pde = 00000000
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.630090] Oops: 0000 [#1] SMP
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.630290] Modules linked in:
> iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt nls_iso8859_1 nls_cp437
> vfat fat nfsd auth_rpcgss exportfs crc32c libcrc32c vmmemctl
> cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats
> freq_table cpufreq_powersave sbs video output sbshc dock battery nfs lockd
> nfs_acl sunrpc iptable_filter ip_tables x_tables vmhgfs lp loop ipv6
> intel_agp i2c_piix4 serio_raw container ac button agpgart i2c_core shpchp
> pci_hotplug parport_pc parport evdev psmouse pcspkr ext3 jbd mbcache ide_cd
> cdrom sg sd_mod floppy pcnet32 mptspi mptscsih mptbase mii pata_acpi
> ata_generic scsi_transport_spi ata_piix libata scsi_mod ide_generic ide_core
> raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath
> linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon
> tileblit font bitblit softcursor fuse vmxnet
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.631597]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.631735] Pid: 32010, comm:
> iscsi_eh Not tainted (2.6.24-24-generic #1)
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.631837] EIP: 0060:[<e0a49ff7>]
> EFLAGS: 00010097 CPU: 0
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.632168] EIP is at
> iscsi_queuecommand+0x47/0x260 [libiscsi]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.632273] EAX: e09776fa EBX:
> d8384500 ECX: e09775e0 EDX: e0979560
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.632370] ESI: d8384500 EDI:
> c38d9400 EBP: c38d9400 ESP: ce921eb8
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.632467] DS: 007b ES: 007b FS:
> 00d8 GS: 0000 SS: 0068
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.632583] Process iscsi_eh (pid:
> 32010, ti=ce920000 task=c61aa000 task.ti=ce920000)
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.633296] Stack: e0979560 00000000
> 00000000 d8384500 00000287 c38d9400 00000031 e0979da7
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.633508] dc501600 dc501600
> c38d9400 d8926000 d81dc810 e098018a 00000036 0016452a
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.633660] 00099996 d8926028
> d8926148 d89260b0 d8384500 d81dc810 d81dc810 00000000
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.633816] Call Trace:
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.634045] [<e0979560>]
> scsi_done+0x0/0x20 [scsi_mod]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.708347] [<e0979da7>]
> scsi_dispatch_cmd+0x147/0x280 [scsi_mod]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.708502] [<e098018a>]
> scsi_request_fn+0x1ea/0x380 [scsi_mod]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.749813] [<e097e760>]
> device_unblock+0x0/0x10 [scsi_mod]
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.749930] [<c020bda2>]
> blk_start_queue+0x32/0x90
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.832531] [<c02803ce>]
> get_device+0xe/0x20
> Dec 7 10:10:21 qye-serv1 kernel: [1459498.879473] [<e097913e>]
> scsi_device_get+0x1e/0x50 [scsi_mod]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.879592] [<e097e735>]
> scsi_internal_device_unblock+0x35/0x60 [scsi_mod]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.879714] [<e0979f52>]
> starget_for_each_device+0x72/0x80 [scsi_mod]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.879966] [<e097e080>]
> target_unblock+0x0/0x20 [scsi_mod]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.880094] [<e097e09b>]
> target_unblock+0x1b/0x20 [scsi_mod]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.880201] [<c02805c2>]
> device_for_each_child+0x22/0x40
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.880290] [<e0969710>]
> session_recovery_timedout+0x0/0xc0 [scsi_transport_iscsi]
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.880439] [<c013ce6f>]
> run_workqueue+0xbf/0x160
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933133] [<c013d910>]
> worker_thread+0x0/0xe0
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933218] [<c013d994>]
> worker_thread+0x84/0xe0
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933297] [<c0140c20>]
> autoremove_wake_function+0x0/0x40
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933407] [<c013d910>]
> worker_thread+0x0/0xe0
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933501] [<c0140962>]
> kthread+0x42/0x70
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.933574] [<c0140920>]
> kthread+0x0/0x70
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.934156] [<c0105667>]
> kernel_thread_helper+0x7/0x10
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.934669] =======================
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.934759] Code: 00 00 00 00 89 90
> f0 00 00 00 c7 80 20 01 00 00 00 00 00 00 c7 80 f4 00 00 00 00 00 00 00 8b
> 00 8b 28 b8 01 00 00 00 8b 01 2c 86 <02> 8b 06 8b 80 24 01 00 00 8b 58 74 81
> eb a4 00 00 00 8b bb a0
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.935195] EIP: [<e0a49ff7>]
> iscsi_queuecommand+0x47/0x260 [libiscsi] SS:ESP 0068:ce921eb8
> Dec 7 10:10:22 qye-serv1 kernel: [1459498.954695] ---[ end trace
> 56831ec2af4ad03c ]---
>
Are you logging out or doing any iscsiadm operation while this is going on?

It looks like we detected a connection problem (probably the target crashing). There was some IO running at the time, so it got queued up. The initiator tried to log into the target for replacement/recovery_timeout seconds (120 in your trace) but could not, so the replacement_timeout/recovery_timeout fired and the initiator started failing IO. Then you got the oops above.

I have not seen this before. Is this one reproducible?
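
As a rough check of the timeline described above, the gap between the "detected conn error" line and the "session recovery timed out" line can be computed from the printk timestamps. This is only a sketch: the two log lines are copied from the quoted trace with the mail wrapping removed, and the helper names (`printk_seconds`, `elapsed_between`) are made up for illustration, not part of any open-iscsi tool.

```python
import re

# Two lines copied from the quoted kernel log (mail line wrapping removed).
CONN_ERROR_LINE = (
    "Dec 7 10:08:21 qye-serv1 kernel: [1459378.575584] "
    "connection3903:0: detected conn error (1011)"
)
RECOVERY_TIMEOUT_LINE = (
    "Dec 7 10:10:21 qye-serv1 kernel: [1459498.629142] "
    "session3903: session recovery timed out after 120 secs"
)

# printk timestamp: seconds since boot, inside square brackets.
TIMESTAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def printk_seconds(line):
    """Return the printk timestamp of a kernel log line as a float."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        raise ValueError("no printk timestamp in line: %r" % line)
    return float(match.group(1))


def elapsed_between(first_line, second_line):
    """Seconds elapsed between two kernel log lines (hypothetical helper)."""
    return printk_seconds(second_line) - printk_seconds(first_line)


elapsed = elapsed_between(CONN_ERROR_LINE, RECOVERY_TIMEOUT_LINE)
print("%.3f s between conn error and recovery timeout" % elapsed)
```

The result is about 120 seconds, which matches the "120 secs" reported in the trace, so the oops follows almost immediately after the recovery timer fires (the BUG line is stamped less than a millisecond later).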