Mike,

Thanks for the speedy reply!

> The hung task warnings are saying that some IO has taken longer than the
> hung task timeout value which looks like it is 2 minutes for you.
>
> Are you doing any type of port down/up type of test?

Nope, just the following:

- Power on blade
- be2iscsi BIOS logs in to target(s)
- Grub loads linux (+initramfs)
- initramfs runs iscsistart -b

> Is there any line before this?

Yes, but it's just the usual bootup bits. I'll include the whole output at
the bottom of this message.

What I already understood from the console output is that the iscsi layer
is failing up to multipath, and then when both paths are dead, multipath
passes the failure on upwards, which eventually results in the kernel dying
due to losing its filesystems. One path always fails, followed by the
other. Network conditions are good: the HBAs are still (icmp) ping-able, as
is the target.

> So with the default setting of the replacement_timeout (120 secs) you
> should be seeing a message:
> session recovery timed out after X secs
> before you see hung task message below.

Yep, I do see those. I figured they were from the iscsi layer.

Is it worth validating the configuration first *without* using multipath?
It's fairly trivial to disable. The only reason I've not tried this yet is
that the same problem happened when using a non-multipath target (single
target on a linux box using ietd).

I'm starting to think that the be2iscsi driver or the actual ServerEngines
HBA is somehow unhappy. I've updated the HBAs to the latest firmware. These
problems don't happen with the iscsi_tcp module. I'd really like to stick
with be2iscsi though, as the offload cards have the *huge* advantage of
decoupling the iscsi and networking stacks.

> Is this easy to replicate? There is just too much going wrong here.
> If it happens again, can you do

It happens every time the machine is booted, after about 5-20 minutes.

>
> cat /sys/block/sdX/device/state

# cat /sys/block/sd*/device/state
running
running
running

At the moment (booted about 5 minutes ago):

# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
version 2.0-872
Target: iqn.2003-10.com.lefthandnetworks:thm-san:25:thm-vmutil01-root
    Current Portal: 10.20.128.100:3260,1
    Persistent Portal: 10.20.128.100:3260,1
        **********
        Interface:
        **********
        Iface Name: be2iscsi.d4:85:64:56:90:c9
        Iface Transport: be2iscsi
        Iface Initiatorname: iqn.2011-05.com.travelfusion.dc.thm-vmutil01
        Iface IPaddress: <empty>
        Iface HWaddress: d4:85:64:56:90:c9
        Iface Netdev: <empty>
        SID: 1
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 65536
        MaxXmitDataSegmentLength: 65536
        FirstBurstLength: 8192
        MaxBurstLength: 262144
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 0  State: running
        scsi0 Channel 00 Id 0 Lun: 0
            Attached scsi disk sda  State: running
        **********
        Interface:
        **********
        Iface Name: be2iscsi.d4:85:64:56:90:cd
        Iface Transport: be2iscsi
        Iface Initiatorname: iqn.2011-05.com.travelfusion.dc.thm-vmutil01
        Iface IPaddress: <empty>
        Iface HWaddress: d4:85:64:56:90:cd
        Iface Netdev: <empty>
        SID: 2
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 65536
        MaxXmitDataSegmentLength: 65536
        FirstBurstLength: 8192
        MaxBurstLength: 262144
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 1  State: running
        scsi1 Channel 00 Id 0 Lun: 0
            Attached scsi disk sdb  State: running
Target: iqn.2003-10.com.lefthandnetworks:thm-san:27:thm-vmutil01-nfs
    Current Portal: 10.20.128.100:3260,1
    Persistent Portal: 10.20.128.100:3260,1
        **********
        Interface:
        **********
        Iface Name: be2iscsi.d4:85:64:56:90:c9
        Iface Transport: be2iscsi
        Iface Initiatorname: iqn.2011-05.com.travelfusion.dc.thm-vmutil01
        Iface IPaddress: <empty>
        Iface HWaddress: d4:85:64:56:90:c9
        Iface Netdev: <empty>
        SID: 3
        iSCSI Connection State: LOGGED IN
        iSCSI Session State: LOGGED_IN
        Internal iscsid Session State: NO CHANGE
        ************************
        Negotiated iSCSI params:
        ************************
        HeaderDigest: None
        DataDigest: None
        MaxRecvDataSegmentLength: 65536
        MaxXmitDataSegmentLength: 65536
        FirstBurstLength: 8192
        MaxBurstLength: 262144
        ImmediateData: Yes
        InitialR2T: Yes
        MaxOutstandingR2T: 1
        ************************
        Attached SCSI devices:
        ************************
        Host Number: 0  State: running
        scsi0 Channel 00 Id 1 Lun: 0
            Attached scsi disk sdc  State: running

> Would you also be able to run a patch that will add some extra debugging
> to the driver and iscsi layer?

Yes please!

> I will try to contact HP and get access to a box like this. Jay is
> leaving on vacation so I do not think he will be able to help for a
> couple days.
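
Since the failure only shows up 5-20 minutes after boot, a timestamped log of the device states might help line up the moment each path drops against the "session recovery timed out" messages. Here is a minimal sketch, assuming a POSIX shell and the usual sysfs layout. The function names log_states and watch_states, and the optional base-directory argument, are made up for this example and are not part of open-iscsi:

```shell
#!/bin/sh
# Print one line per sd* disk: "<date> <time> <disk> <state>".
# $1 is the sysfs block directory; it defaults to /sys/block and is only
# a parameter so the function can be pointed at a test directory.
log_states() {
    base="${1:-/sys/block}"
    for f in "$base"/sd*/device/state; do
        [ -r "$f" ] || continue       # glob matched nothing, or file unreadable
        disk="${f#"$base"/}"          # drop the leading base directory
        disk="${disk%%/*}"            # keep only the disk name, e.g. sda
        printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$disk" "$(cat "$f")"
    done
}

# Poll until interrupted.
# $1 is the interval in seconds (default 5), $2 the sysfs block directory.
watch_states() {
    while :; do
        log_states "${2:-/sys/block}"
        sleep "${1:-5}"
    done
}
```

Something like `watch_states 5` started right after boot would print one line per disk every 5 seconds. The output is best sent somewhere that does not live on the iSCSI disks (the console or a tmpfs, for example), since the root filesystem is what goes away.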