Hi,
we have recently run into a problem where running mkfs.ext4 on a freshly
presented LUN sends the CPU spiralling out of control and makes all the KVM
guests on the same host freeze.
On checking /var/log/messages when it happens I see:
------------------------------------------/ snip
/--------------------------------------------------------
Feb 4 01:29:09 kvm01 kernel: connection1:0: ping timeout of 3 secs expired,
last rx 4295304767, last ping 4295307767, now 4295310767
Feb 4 01:29:09 kvm01 kernel: connection1:0: detected conn error (1011)
Feb 4 01:29:10 kvm01 iscsid: Kernel reported iSCSI connection 1:0 error (1011)
state (3)
Feb 4 01:30:06 kvm01 kernel: BUG: soft lockup - CPU#5 stuck for 61s!
[kblockd/5:280]
Feb 4 01:30:06 kvm01 kernel: Modules linked in: ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables tun dm_round_robin autofs4 bridge stp ib_iser
rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi sha256_generic aes_x86_64 aes_generic cbc
dm_crypt dm_multipath scsi_dh sbs sbshc battery acpi_memhotplug ac parport_pc
lp parport joydev e1000e sr_mod cdrom sg button rtc_cmos rtc_core serio_raw
shpchp i2c_nforce2 forcedeth pcspkr rtc_lib i2c_core dm_snapshot dm_zero
dm_mirror dm_region_hash dm_log dm_mod usb_storage sata_nv pata_acpi
ata_generic libata sd_mod scsi_mod raid1 uhci_hcd ohci_hcd ehci_hcd [last
unloaded: freq_table]
Feb 4 01:30:06 kvm01 kernel: CPU 5:
Feb 4 01:30:06 kvm01 kernel: Modules linked in: ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables tun dm_round_robin autofs4 bridge stp ib_iser
rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi sha256_generic aes_x86_64 aes_generic cbc
dm_crypt dm_multipath scsi_dh sbs sbshc battery acpi_memhotplug ac parport_pc
lp parport joydev e1000e sr_mod cdrom sg button rtc_cmos rtc_core serio_raw
shpchp i2c_nforce2 forcedeth pcspkr rtc_lib i2c_core dm_snapshot dm_zero
dm_mirror dm_region_hash dm_log dm_mod usb_storage sata_nv pata_acpi
ata_generic libata sd_mod scsi_mod raid1 uhci_hcd ohci_hcd ehci_hcd [last
unloaded: freq_table]
Feb 4 01:30:06 kvm01 kernel: Pid: 280, comm: kblockd/5 Not tainted 2.6.29.1 #1
H8DMU
Feb 4 01:30:06 kvm01 kernel: RIP: 0010:[<ffffffffa0031db9>]
[<ffffffffa0031db9>] scsi_request_fn+0x3c0/0x491 [scsi_mod]
Feb 4 01:30:06 kvm01 kernel: RSP: 0018:ffff88021e443e60 EFLAGS: 00000202
Feb 4 01:30:06 kvm01 kernel: RAX: ffff88021ccc1800 RBX: ffff88021ccc1800 RCX:
0000000000000000
Feb 4 01:30:06 kvm01 kernel: RDX: 0000000000005b5a RSI: 0000000000000282 RDI:
ffff88021b5874d8
Feb 4 01:30:06 kvm01 kernel: RBP: ffffffff80224c6e R08: ffff88041e478cc8 R09:
0000000000001058
Feb 4 01:30:06 kvm01 kernel: R10: ffff88040e070d88 R11: 0000000000001058 R12:
ffff88040e1b1480
Feb 4 01:30:06 kvm01 kernel: R13: ffff88021b587290 R14: 0000000000001058 R15:
0000000000000286
Feb 4 01:30:06 kvm01 kernel: FS: 0000000040658940(0000)
GS:ffff88041eb2b140(0000) knlGS:0000000000000000
Feb 4 01:30:06 kvm01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 4 01:30:06 kvm01 kernel: CR2: 0000000000859840 CR3: 00000004024a7000 CR4:
00000000000006e0
Feb 4 01:30:06 kvm01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
Feb 4 01:30:06 kvm01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
Feb 4 01:30:06 kvm01 kernel: Call Trace:
Feb 4 01:30:06 kvm01 kernel: [<ffffffff803fd9b9>] ? blk_unplug_work+0x0/0x41
Feb 4 01:30:06 kvm01 kernel: [<ffffffff803ff8eb>] ?
generic_unplug_device+0x21/0x38
Feb 4 01:30:06 kvm01 kernel: [<ffffffff8026280b>] ? run_workqueue+0x91/0x12b
Feb 4 01:30:06 kvm01 kernel: [<ffffffff80263132>] ? worker_thread+0x93/0x9e
Feb 4 01:30:06 kvm01 kernel: [<ffffffff80265b4e>] ?
autoremove_wake_function+0x0/0x2e
Feb 4 01:30:06 kvm01 kernel: [<ffffffff8026309f>] ? worker_thread+0x0/0x9e
Feb 4 01:30:06 kvm01 kernel: [<ffffffff80265a1e>] ? kthread+0x47/0x75
Feb 4 01:30:06 kvm01 kernel: [<ffffffff8022519a>] ? child_rip+0xa/0x20
Feb 4 01:30:06 kvm01 kernel: [<ffffffff802659d7>] ? kthread+0x0/0x75
Feb 4 01:30:06 kvm01 kernel: [<ffffffff80225190>] ? child_rip+0x0/0x20
Feb 4 01:30:52 kvm01 iscsid: connection1:0 is operational after recovery (1
attempts)
Feb 4 01:30:57 kvm01 kernel: connection1:0: ping timeout of 3 secs expired,
last rx 4295413014, last ping 4295416014, now 4295419014
Feb 4 01:30:57 kvm01 kernel: connection1:0: detected conn error (1011)
Feb 4 01:30:58 kvm01 iscsid: Kernel reported iSCSI connection 1:0 error (1011)
state (3)
Feb 4 01:31:23 kvm01 kernel: connection1:0: detected conn error (1019)
Feb 4 01:31:23 kvm01 iscsid: Kernel reported iSCSI connection 1:0 error (1019)
state (1)
Feb 4 01:31:27 kvm01 iscsid: connection1:0 is operational after recovery (1
attempts)
------------------------------------------/ snip
/--------------------------------------------------------
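As a quick sanity check on the two "ping timeout" lines above, here is a small Python sketch that pulls out the counters and prints the gaps between them. The helper name ping_gaps and the regex are my own, not anything from open-iscsi, and I am assuming the "last rx", "last ping" and "now" values are jiffies. The log lines are rejoined onto one line each.

```python
import re

# The two "ping timeout" lines from the log above, unwrapped.
LOG = """\
Feb 4 01:29:09 kvm01 kernel: connection1:0: ping timeout of 3 secs expired, last rx 4295304767, last ping 4295307767, now 4295310767
Feb 4 01:30:57 kvm01 kernel: connection1:0: ping timeout of 3 secs expired, last rx 4295413014, last ping 4295416014, now 4295419014
"""

# Hypothetical pattern written for this log format only.
PATTERN = re.compile(
    r"ping timeout of (\d+) secs expired, "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)


def ping_gaps(text):
    """Return (timeout_secs, ping_minus_rx, now_minus_ping) per matching line."""
    gaps = []
    for m in PATTERN.finditer(text):
        timeout, last_rx, last_ping, now = map(int, m.groups())
        gaps.append((timeout, last_ping - last_rx, now - last_ping))
    return gaps


for timeout, rx_to_ping, ping_to_now in ping_gaps(LOG):
    print(timeout, rx_to_ping, ping_to_now)
```

In both events the gap from last rx to last ping and from last ping to now is 3000 ticks, which would line up with the 3 second ping timeout if HZ is 1000 on this kernel (an assumption on my part).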
This is with the following kernel and initiator tools:
Linux kvm01.xxxxxxxx.xxx 2.6.29.1 #1 SMP Sat Apr 11 20:03:55 EDT 2009 x86_64
x86_64 x86_64 GNU/Linux
iscsi-initiator-utils-6.2.0.871-0.10.el5
Prior to the mkfs the host was running fine with four other guests on multiple
iSCSI LUNs.
Having checked the network switch, all looks pretty good there. I would be
grateful for some advice on how to track down the issue.
--
Thanks, Phil