Matthew Kent wrote:
> On Mon, 2009-04-13 at 17:28 -0500, Mike Christie wrote:
>> Matthew Kent wrote:
>>> On Mon, 2009-04-13 at 15:44 -0500, Mike Christie wrote:
>>>> Matthew Kent wrote:
>>>>> Can anyone suggest a timeout I might be hitting or a setting I'm
>>>>> missing?
>>>>>
>>>>> The run down:
>>>>>
>>>>> - EqualLogic target
>>>>> - CentOS 5.2 client
>>>> You will want to upgrade that to 5.3 when you can. The iscsi code in 
>>>> there fixes a bug where the initiator dropped the session when it should 
>>>> not.
>>>>
>>> Will do, probably Wednesday night and we'll see if this goes away. I'll
>>> be sure to follow up for the archives.
>>>
>>>>> - xfs > lvm > iscsi
>>>>>
>>>>> During a period of high load the EqualLogic decides to load balance:
>>>>>
>>>>>  INFO  4/13/09  12:08:29 AM  eql3    iSCSI session to target
>>>>> '20.20.20.31:3260,
>>>>> iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
>>>>> initiator '20.20.20.92:51274, iqn.1994-05.com.redhat:a62ba20db72' was
>>>>> closed.   Load balancing request was received on the array.  
>>>> So is this what you get in the EQL log when it decides to load balance 
>>>> the initiator and send us to a different portal?
>>>>
>>> Yes, a straight copy from event log in the java web interface.
>>>
>>>>>  INFO  4/13/09  12:08:31 AM  eql3    iSCSI login to target
>>>>> '20.20.20.32:3260,
>>>>> iqn.2001-05.com.equallogic:0-8a0906-b7f6d3801-2b2000d0f5347d9a-foo' from
>>>>> initiator '20.20.20.92:44805, iqn.1994-05.com.redhat:a62ba20db72'
>>>>> successful, using standard frame length.  
>>>>>
>>>>> on the client I see:
>>>>>
>>>>> Apr 13 00:08:29 moo kernel: [4576850.161324] sd 5:0:0:0: SCSI error:
>>>>> return code = 0x00020000
>>>>>
>>>>> Apr 13 00:08:29 moo kernel: [4576850.161330] end_request: I/O error, dev
>>>>> sdc, sector 113287552
>>>>>
>>>>> Apr 13 00:08:32 moo kernel: [4576852.470879] I/O error in filesystem
>>>>> ("dm-10") meta-data dev dm-10 block 0x6c0a000
>>>> Are you using dm-multipath over iscsi? Does this load balancing issue 
>>>> affect all the paths at the same time? What is your multipath 
>>>> no_path_retry value? You might want to set it higher to prevent the FS 
>>>> from getting IO errors when all paths are affected at once.
>>>>
>>> Not using multipath on this one.
>>>
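For the archives, since you are not running multipath here: below is the 
kind of multipath.conf change I had in mind. The values are only 
illustrative, not a recommendation for the EQL.

    # Queue IO instead of failing it when all paths are down. With an
    # integer value, IO is retried for that many path-checker cycles
    # (no_path_retry x polling_interval seconds) before errors are
    # passed up to the filesystem.
    defaults {
            polling_interval  5
            no_path_retry     12
    }
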
>> Do you have xfs on sdc or is there something like LVM or RAID on top of sdc?
>>
>> That is really strange then. 0x00020000 is DID_BUS_BUSY. The iscsi 
>> initiator layer returns this when the target does its load balancing, 
>> to ask the scsi layer to retry the IO. If dm-multipath were used, the 
>> IO would be failed up to the multipath layer right away. Without 
>> dm-multipath we get 5 retries, so you should not see the error if there 
>> was only the one rebalancing at the time. If there was a bunch of 
>> rebalancing within a couple of minutes, then it makes sense.
>>
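If you ever want to decode those return codes yourself, the host byte is 
bits 16-23 of the result word, and DID_BUS_BUSY is 0x02 in the kernel's 
include/scsi/scsi.h. A quick standalone sketch, using the value from your 
log:

    #include <stdio.h>

    #define DID_BUS_BUSY 0x02  /* from include/scsi/scsi.h */

    int main(void)
    {
            unsigned int result = 0x00020000;  /* value from the kernel log */
            unsigned int host = (result >> 16) & 0xff;  /* host byte */

            printf("host byte = 0x%02x%s\n", host,
                   host == DID_BUS_BUSY ? " (DID_BUS_BUSY)" : "");
            return 0;
    }
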
> 
> Yeah xfs on top of lvm, no multipath.
> 
> Logs only show the one load balancing request around that time.
> 
> Funny thing is this system, load balancing and all, has been running
> error-free for months, but the last couple of days it has flared up right
> around the time of some log rotation and heavy I/O.
> 
> We'll see what happens after the CentOS 5.3 upgrade. We'll also be

Upgrading to 5.3 will probably not help this issue. It would only help 
if you were seeing lots of iscsi ping/nop timeout messages on the 
initiator. I just suggested it to prevent those errors, which people have 
hit a lot in 5.2.
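
The timeouts in question are the NOP-Out settings in 
/etc/iscsi/iscsid.conf. The values below are just for illustration:

    # How often the initiator sends an iSCSI NOP-Out ping to the target,
    # and how long it waits for the NOP-In reply before failing the
    # connection (both in seconds).
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 5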

> upgrading the firmware on all the EqualLogics to the latest version.

