Mike Christie wrote:
> ccing Hannes from suse, because this looks like a SLES only bug.
>
> Hey Hannes,
>
> The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
> running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
> is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.
>
>
> On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
>> I've copied both the messages file from the host goncalog140 and the
>> patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
>> files in the link below:
>>
>> http://promisc.org/iscsi/
>>
>
> It looks like this chunk from libiscsi.c:iscsi_queuecommand:
>
>         case ISCSI_STATE_FAILED:
>                 reason = FAILURE_SESSION_FAILED;
>                 sc->result = DID_TRANSPORT_DISRUPTED << 16;
>                 break;
>
> is causing IO errors.
>
> You want to use something like DID_IMM_RETRY because it can be a long
> time between the time the kernel marks the state as ISCSI_STATE_FAILED
> until we start recovery and properly get all the device queues blocked,
> so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during
this time, ie no failover will occur and I/O will stall. Using
DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately.
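
To make the trade-off above concrete, here is a minimal user-space sketch of
the retry accounting Mike describes. It is not kernel code: struct sim_cmd,
sim_should_retry() and sim_survives_window() are hypothetical names invented
for this illustration, and the model is deliberately simplified (it assumes
DID_IMM_RETRY is requeued without charging the retry budget, while
DID_TRANSPORT_DISRUPTED is charged against it). Only the two host-byte values
are taken from the kernel's include/scsi/scsi.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-byte codes as defined in the kernel's include/scsi/scsi.h. */
#define DID_IMM_RETRY           0x0c
#define DID_TRANSPORT_DISRUPTED 0x0e

/* Hypothetical stand-in for the retry bookkeeping kept per SCSI command. */
struct sim_cmd {
    int retries;  /* retries charged so far */
    int allowed;  /* retry budget for this command */
};

/*
 * Simplified model (an assumption, not the real midlayer logic):
 *   DID_IMM_RETRY           -> retried, budget untouched
 *   DID_TRANSPORT_DISRUPTED -> retried only while budget remains
 * Returns true if the command is retried, false if it fails upward.
 */
static bool sim_should_retry(struct sim_cmd *cmd, int result)
{
    int host_byte = (result >> 16) & 0xff;

    if (host_byte == DID_IMM_RETRY)
        return true;
    if (host_byte == DID_TRANSPORT_DISRUPTED)
        return ++cmd->retries <= cmd->allowed;
    return false;
}

/*
 * Complete one command `attempts` times with the given host byte, as would
 * happen while the session is in ISCSI_STATE_FAILED but the device queues
 * are not yet blocked. Returns true if the command is still being retried
 * at the end of that window, false if it was failed with an I/O error.
 */
static bool sim_survives_window(int host_byte, int allowed, int attempts)
{
    struct sim_cmd cmd = { .retries = 0, .allowed = allowed };

    assert(attempts >= 0);
    for (int i = 0; i < attempts; i++) {
        if (!sim_should_retry(&cmd, host_byte << 16))
            return false;
    }
    return true;
}
```

In this model a command with a budget of 5 survives five completions with
DID_TRANSPORT_DISRUPTED but is failed on the sixth, whereas with
DID_IMM_RETRY it is retried for as long as the window lasts. That is the
I/O error on one side and the stalled failover on the other.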

Sadly, I got additional bug reports about this, so I think I'll have to
revert it.

I have put some test kernels at

http://beta.suse.com/private/hare/sles11/iscsi

Can you test with them and check if this issue is solved?

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
[email protected]                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
