I'm running the following setup: Linux 2.6.27 x86 (SLES-based) with Xen 3.4 as dom0, hosting a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI, via open-iscsi 2.0.870-26.6.1 and a Dell EqualLogic array.
Whenever the EqualLogic rebalances the LUNs between the controllers/ports, it requests the initiator to log out and log in again to the new port/IP. If the guests are idle, the following messages show up in the logs:

Aug 3 17:55:08 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:09 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

However, if one of the RHEL guests is busy performing I/O, we end up with a few failed requests as well:

Aug 3 17:55:26 goncalog140 kernel: connection1:0: dropping R2T itt 55 in recovery.
Aug 3 17:55:26 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751
Aug 3 17:55:27 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

As a side effect, the guest filesystem goes read-only.

Googling around, I found the following thread on this list, which covers the same error I'm seeing in the logs:

http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gst&q=conn+error#8e95febb6cf79f64

I've also compiled the iscsi_tcp/libiscsi drivers with the patch from Mike Christie taken from that thread, which can be found at the link below:

http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2&view=1

Is this a known issue? Is there anything else I could do from a troubleshooting perspective?
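To compare reproductions, the log lines above can be grouped into recovery windows: everything from the first "detected conn error" until the matching "is operational after recovery". The following is only a minimal Python sketch under assumptions: it expects the syslog layout shown above (month, day, time, host, message), it uses a placeholder year because the timestamps carry none, and the names `classify` and `recovery_windows` are illustrative and not part of open-iscsi.

```python
import re
from datetime import datetime

# Assumed layout: "Aug 3 17:55:26 <host> <rest of message>"
STAMP_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")


def classify(message):
    """Map a log message to one of the three event kinds seen in the logs."""
    if "detected conn error" in message:
        return "conn_error"
    if "is operational after recovery" in message:
        return "recovered"
    if "end_request: I/O error" in message:
        return "io_error"
    return None  # anything else is ignored


def recovery_windows(lines):
    """Group events into windows that end at each 'operational after recovery'."""
    windows = []
    current = None
    for line in lines:
        match = STAMP_RE.match(line.strip())
        if not match:
            continue
        kind = classify(match.group(2))
        if kind is None:
            continue
        # Placeholder year (2000): syslog stamps have no year, and only the
        # difference between two stamps is used here.
        when = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
        if current is None:
            current = {"start": when, "end": None,
                       "conn_errors": 0, "io_errors": 0}
        if kind == "conn_error":
            current["conn_errors"] += 1
        elif kind == "io_error":
            current["io_errors"] += 1
        else:  # recovered: close the window
            current["end"] = when
            windows.append(current)
            current = None
    if current is not None:  # log ended before recovery was reported
        windows.append(current)
    return windows


if __name__ == "__main__":
    sample = [
        "Aug 3 17:55:26 goncalog140 kernel: connection1:0: detected conn error (1011)",
        "Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399",
        "Aug 3 17:55:27 goncalog140 kernel: connection1:0: detected conn error (1011)",
        "Aug 3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)",
    ]
    for window in recovery_windows(sample):
        seconds = (window["end"] - window["start"]).total_seconds()
        print(window["conn_errors"], window["io_errors"], seconds)
```

A window with a non-zero `io_errors` count is one where requests were failed up to the block layer during the relogin, which is the case that leaves the guest filesystem read-only.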
I've uploaded the following files, in case someone would like to take a look.

Tcpdumps collected a couple of days ago during another reproduction/analysis of the same bug (apologies, I didn't get around to collecting new tcpdumps during today's reproduction):

0tcpdump0947.pcap  162K  09:47 (GMT+1), nothing occurred.
1tcpdump0952.pcap  4.8M  09:52 (GMT+2), problem occurred.

Logs from today's reproduction of the issue with the patched drivers, for additional backtracing:

vm-boot.txt  2.7K  After VM creation.
vm-lun-rebalance-no-effect.txt  3.1K  VM is idling, FS does not become read-only.
vm-lun-rebalance-fs-readonly.txt  3.3K  VM is dd'ing /dev/zero to the iSCSI-based disk, FS becomes read-only.
guest-dmesg.txt  14K  RHEL 5.3 with 2.6.18-194.8.1.el5xen (RHEL 5.5 kernel).

All these files can be found at the following link:

http://promisc.org/iscsi/

Any help would be greatly appreciated!

Cheers,
-Goncalo.