Pasi Kärkkäinen wrote:
> On Mon, Dec 08, 2008 at 05:48:22AM -0800, Evan Broder wrote:
>   
>> A group I work with is currently using a Dell Equallogic RAID with
>> four servers on a dedicated storage network. We've been regularly
>> experiencing connection errors:
>>
>> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
>> connection1:0: iscsi: detected conn error (1011)
>> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
>> connection 1:0 error (1011) state (3)
>> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
>> with target iqn.2001-05.com.equallogic:
>> 0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1
>> Dec  8 00:50:39 aperture-science kernel: [1010624.597349] iscsi: host
>> reset succeeded
>> Dec  8 00:50:40 aperture-science iscsid: connection1:0 is operational
>> after recovery (1 attempts)
>>
>> These errors always occur about 40 seconds after a multiple of 5
>> minutes after the hour. Other than that, we've seen no pattern in when
>> they occur, or their cause. We've tried running some scripts designed
>> to heavily utilize storage devices, but that doesn't seem to trigger
>> it. These errors occur on all four of our servers, but not at the same
>> time.
>>
>> The RAID is being used as a physical volume for an LVM volume
>> group. The servers are being used for hosting Xen virtual machines,
>> which use LVs on the RAID as their disk images. Shortly before the
>> connection errors are logged, all disk I/O from the virtual machines
>> hangs completely, causing the VMs to basically become non-responsive
>> to non-trivial interaction.
>>
>> Our four servers are running the stock Ubuntu Hardy 2.6.24 kernel. I
>> don't see an explicit version number in any of the iSCSI kernel
>> source, but drivers/scsi/scsi_transport_iscsi.c contains:
>>
>>     #define ISCSI_TRANSPORT_VERSION "2.0-724"
>>
>> The userspace utilities are 2.0.865.
>>
>> Does anyone know how to stop these errors? Is there more diagnostic
>> information we could provide? We're way out of our league in terms of
>> debugging this.
>>
>>     
>
> Hmm.. wondering if those are related to automatic connection loadbalancing
> on Equallogic arrays. 
>
> Maybe check with iscsiadm if the connected interface (on the EQL array)
> changes after those errors:
>
> iscsiadm -m session -P3
>
> And check the 'Current Portal' and 'Persistent Portal' values.. 
>
> Persistent Portal should be your EQL group IP address, and Current Portal
> should be whatever interface you're happening to use atm.. 
>
> -- Pasi

We currently only have a single interface enabled, although we have
plans to add more in the future.

Looking at the RAID configuration, we have the "Group IP address" set to
10.5.128.128, and the member NIC's IP address set to 10.5.128.129. The
persistent portal is 10.5.128.128 on all four servers, and the current
portal is 10.5.128.129 on all four. Neither value changes after one of
the connection errors.
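
In case it helps anyone repeat the check, here is a rough sketch of how the
portal comparison could be scripted instead of read by eye. It assumes the
"Current Portal: ip:port,tpgt" and "Persistent Portal: ip:port,tpgt" line
layout printed by iscsiadm -m session -P3; the SAMPLE text and the
parse_portals name are made up for illustration and are not part of
open-iscsi.

```python
# Sketch only: pull the portal addresses out of `iscsiadm -m session -P3`
# style text. SAMPLE is an illustrative excerpt built from the values
# reported in this thread, not a verbatim capture.
SAMPLE = (
    "Target: iqn.2001-05.com.equallogic:"
    "0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1\n"
    "\tCurrent Portal: 10.5.128.129:3260,1\n"
    "\tPersistent Portal: 10.5.128.128:3260,1\n"
)


def parse_portals(text):
    """Return one dict per session with its current and persistent portal IPs."""
    sessions = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue
        # Portal values look like "ip:port,tpgt"; keep only the address part.
        address = value.rsplit(":", 1)[0]
        if key == "Current Portal":
            sessions.append({"current": address})
        elif key == "Persistent Portal" and sessions:
            sessions[-1]["persistent"] = address
    return sessions


if __name__ == "__main__":
    for session in parse_portals(SAMPLE):
        print(session["current"], session["persistent"])
```

Running this before and after a connection error and comparing the results
would show whether the array moved the session to a different interface. On
a live host the same function could be fed the real command output, for
example captured with subprocess.run.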

- Evan
