Hi Hannes,

Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27
tree of SLES 11. We add a few extra patches specific to Xen, dom0 integration
and some backports from upstream. To the best of my knowledge these additions
don't touch the iSCSI layer, so from the iSCSI drivers' point of view I believe
they are as pristine as the ones in the SuSE kernel. That's why we need the
patch, as the binaries will probably mismatch the gcc version and/or the
versioning that we use. I do definitely appreciate your 'forward thinking'
with regard to the issue, though!


-----Original Message-----
From: Hannes Reinecke [mailto:h...@suse.de] 
Sent: 30 August 2010 15:12
To: Goncalo Gomes
Cc: Mike Christie; open-iscsi@googlegroups.com; Shantanu Mehendale
Subject: Re: detected conn error (1011)

Goncalo Gomes wrote:
> Hi,
> On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: 
>> Mike Christie wrote:
>>> ccing Hannes from suse, because this looks like a SLES only bug.
>>> Hey Hannes,
>>> The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
>>> running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
>>> is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.
>>> On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
>>>> I've copied both the messages file from the host goncalog140 and the
>>>> patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
>>>> files in the link below:
>>>> http://promisc.org/iscsi/
>>> It looks like this chunk from libiscsi.c:iscsi_queuecommand:
>>>         case ISCSI_STATE_FAILED:
>>>             reason = FAILURE_SESSION_FAILED;
>>>             sc->result = DID_TRANSPORT_DISRUPTED << 16;
>>>             break;
>>> is causing IO errors.
>>> You want to use something like DID_IMM_RETRY because it can be a long
>>> time between the time the kernel marks the state as ISCSI_STATE_FAILED
>>> until we start recovery and properly get all the device queues blocked,
>>> so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
>> Yeah, I noticed.
>> But the problem is that multipathing will stall during this time,
>> i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
>> will circumvent this and we can fail over immediately.
>> Sadly I got additional bug reports about this so I think I'll have
>> to revert it.
> I applied and tested the changes Mike Christie suggests. After the LUN
> is rebalanced within the array I no longer see the IO errors and it
> appears the setup is now resilient to the equallogic LUN failover
> process.
> I'm attaching the log from the dmesg merely for sanity check purposes,
> if anyone cares to take a look?
>> I have put some test kernels at
>> http://beta.suse.com/private/hare/sles11/iscsi
> Do the test kernels in the url above contain the change of
> DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY or is there more to it than
> simply changing the result code? If the latter, would you be able to
> upload the source rpms or a unified patch containing the changes you
> are staging? I'm looking for a more palatable way to test them, given I
> have no SLES box lying around, but will install one if need be.
Got me confused. How would you test the patch if not on a SLES box?
Presumably you would have to install the new kernel on the instance
you are planning to run the test on. Which for any sane setup would
have to be a SLES box. In which case you can just use the provided
kernel directly and save yourself the compilation step.

Am I missing something?


Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
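
For anyone following the result-code discussion quoted above, here is a
minimal userspace sketch (not kernel code) of why the choice matters while
the session sits in ISCSI_STATE_FAILED and the device queues are not yet
blocked. It assumes a simplified rule: a command completed with the
DID_TRANSPORT_DISRUPTED-style code consumes one retry from a fixed budget,
while the DID_IMM_RETRY-style code is retried without consuming any. All
names (HB_IMM_RETRY, HB_TRANSPORT_DISRUPTED, decide, survives_failed_window)
and the budget values are hypothetical and only illustrate the behaviour
Mike describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two result codes under discussion.
 * These are NOT the kernel's definitions or numeric values. */
enum host_byte { HB_IMM_RETRY, HB_TRANSPORT_DISRUPTED };

enum disposition { RETRY, FAIL_IO };

struct cmd {
    int retries;  /* retries consumed so far */
    int allowed;  /* retry budget for this command */
};

/* Simplified model (an assumption, see the text above):
 * - HB_IMM_RETRY: always retry, budget untouched.
 * - HB_TRANSPORT_DISRUPTED: retry only while budget remains. */
static enum disposition decide(struct cmd *c, enum host_byte hb)
{
    assert(c->allowed >= 0);
    if (hb == HB_IMM_RETRY)
        return RETRY;
    if (++c->retries <= c->allowed)
        return RETRY;
    return FAIL_IO;
}

/* Replays `attempts` failed submissions of one command, as would happen
 * while the session is marked failed but recovery has not yet blocked
 * the queues. Returns true if the command was never failed upward. */
static bool survives_failed_window(enum host_byte hb, int allowed, int attempts)
{
    struct cmd c = { .retries = 0, .allowed = allowed };

    for (int i = 0; i < attempts; i++) {
        if (decide(&c, hb) == FAIL_IO)
            return false;
    }
    return true;
}
```

In this model, survives_failed_window(HB_TRANSPORT_DISRUPTED, 5, 1000)
reports an I/O error once the budget of 5 is used up, whereas the same call
with HB_IMM_RETRY keeps retrying for the whole window. The trade-off Hannes
raises still applies: retrying indefinitely also delays multipath failover.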
