Re: Frequent Connection Errors with Dell Equallogic RAID

2009-01-05 Thread Mike Christie

Mike Christie wrote:
> Evan Broder wrote:
>> Evan Broder wrote:
>>> Just got another round of failures with the new debugging patch in.
>>> Traffic dump at
>>> ,
>>> syslog (including the kernel log) at
>>> 
>>>
>>> - Evan
>>>   
>> I hate to press the issue, but has anyone had a chance to look at these?
>>
> 
> I have been out since the 17th and will be back on the 5th. I will check 
> them out when I am back. Sorry for the delay.
> 

I am looking at the updated logs and pcap now. I should have something 
to report later today.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-31 Thread Mike Christie

Evan Broder wrote:
> Evan Broder wrote:
>> Just got another round of failures with the new debugging patch in.
>> Traffic dump at
>> ,
>> syslog (including the kernel log) at
>> 
>>
>> - Evan
>>   
> 
> I hate to press the issue, but has anyone had a chance to look at these?
> 

I have been out since the 17th and will be back on the 5th. I will check 
them out when I am back. Sorry for the delay.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-30 Thread Evan Broder

Evan Broder wrote:
> Just got another round of failures with the new debugging patch in.
> Traffic dump at
> ,
> syslog (including the kernel log) at
> 
>
> - Evan
>   

I hate to press the issue, but has anyone had a chance to look at these?

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-20 Thread Evan Broder

Evan Broder wrote:
 Could you also run with this patch? In the open-iscsi source do

 patch -p1 -i path-to-patch/remove-undef-tcp.patch

 then rebuild with

 make DEBUG_TCP=1 DEBUG_SCSI=1
 make DEBUG_TCP=1 DEBUG_SCSI=1 install
 
> I'll hopefully have a log with this information in the next few days.
>
> - Evan
>   

Just got another round of failures with the new debugging patch in.
Traffic dump at
,
syslog (including the kernel log) at


- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-18 Thread Evan Broder

Mike Christie wrote:
> I think wiresharp/ethereal will spit out the data in the packets by
> default and I am not sure of a way to ask it to drop the data and just
> send the packet info.

Ok - sorry for taking a while in getting this collected. I have a
tcpdump from one of the failures at
.
The failure should occur about halfway through the log. The
corresponding syslog is at

>>> Could you also run with this patch? In the open-iscsi source do
>>>
>>> patch -p1 -i path-to-patch/remove-undef-tcp.patch
>>>
>>> then rebuild with
>>>
>>> make DEBUG_TCP=1 DEBUG_SCSI=1
>>> make DEBUG_TCP=1 DEBUG_SCSI=1 install
I'll hopefully have a log with this information in the next few days.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-12 Thread Mike Christie
Evan Broder wrote:
> 
> Incidentally, to what extent is user data exposed in the dump? I've
> asked some friends to let me run their VMs on citadel-station to try and
> increase the probability of an incident occurring, but they presumably
> wouldn't appreciate me sending out logs if data from their VM's disk
> images can be recovered.

I think wiresharp/ethereal will spit out the data in the packets by 
default and I am not sure of a way to ask it to drop the data and just 
send the packet info.

> 
>> Could you also run with this patch? In the open-iscsi source do
>>
>> patch -p1 -i path-to-patch/remove-undef-tcp.patch
>>
>> then rebuild with
>>
>> make DEBUG_TCP=1 DEBUG_SCSI=1
>> make DEBUG_TCP=1 DEBUG_SCSI=1 install
> 
> Huh? Where do I find that patch?
> 

Did you see it attached in the mail? I am attaching it again.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

diff --git a/kernel/iscsi_tcp.c b/kernel/iscsi_tcp.c
index d074146..69286d0 100644
--- a/kernel/iscsi_tcp.c
+++ b/kernel/iscsi_tcp.c
@@ -48,7 +48,6 @@ MODULE_AUTHOR("Dmitry Yusupov , "
  "Alex Aizman ");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-#undef DEBUG_TCP
 #define DEBUG_ASSERT
 
 #ifdef DEBUG_TCP


Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-12 Thread Evan Broder

Mike Christie wrote:
> Are you doing iscsi root?

iscsid is running as root - is that what you mean?

> How do you know they become unresponsive? Do you have a ssh session
> and at this time you cannot contact the vm or do you have a console on
> the vm and at that time you cannot run any commands?

ssh sessions hang, SMTP connections hang. It's all network-based
traffic; I haven't tried interacting with the console.

Note that the host itself doesn't hang when one of these failures occur,
just the VMs using iSCSI for disk.

> The iscsi logs show we did not get the command response. So either the
> target droppped a packet or the network dropped something or the
> initiator dropped it in the network processing.
>
> Could you run this again but get a ethereal/wireshark trace at the
> same time? That way we can see if the network layer is getting the packet.

I'm trying to configure something to capture network traffic, but right
now I'm having a hard time getting a functional tshark. Right now, it
looks like it's capturing accurate data, but borking the TCP/UDP
checksums. The next time I have a failure, I'll forward the dump along
to see if it looks useful.

Incidentally, to what extent is user data exposed in the dump? I've
asked some friends to let me run their VMs on citadel-station to try and
increase the probability of an incident occurring, but they presumably
wouldn't appreciate me sending out logs if data from their VM's disk
images can be recovered.

> Could you also run with this patch? In the open-iscsi source do
>
> patch -p1 -i path-to-patch/remove-undef-tcp.patch
>
> then rebuild with
>
> make DEBUG_TCP=1 DEBUG_SCSI=1
> make DEBUG_TCP=1 DEBUG_SCSI=1 install

Huh? Where do I find that patch?

> One other question. Is the iscsi initiator running in the vm or in the
> host? Is this xen or something else like vmware? Could you take the
> ethereal/wireshark trace from wherever iscsi is running so we can see
> what its network layer is getting? If it is running in a guest and you
> can get a ethereal/wireshark trace from both the guest and host that
> would be nice too.

The iscsi initiator is running on the Xen host. We then use clustering
LVM to get LVs on the target that we use as the disks for the VMs.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-12 Thread Mike Christie
Evan Broder wrote:
> Evan Broder wrote:
>> Ok - I'm running a few of my VMs, and setup some simple monitoring to
>> let me know if an error gets logged. I'll get back to the list if/when I
>> find anything.
>>
>> - Evan
> 
> I'm no longer seeing the "detected conn error", but I am still seeing
> the symptoms of the problems: on a multiple of 5 minutes past the hour,
> VMs become non-responsive for 30-60 seconds. The logs with all of the

Are you doing iscsi root?

How do you know they become unresponsive? Do you have a ssh session and 
at this time you cannot contact the vm or do you have a console on the 
vm and at that time you cannot run any commands?

> debugging options turned on are a bit...overwhelming, but I've posted
> the logs from an incident today. The incident happened at 14:05; I've
> included the logs from 14:04-14:06 at
> http://web.mit.edu/broder/Public/iscsi/citadel-station-2008-12-12-14-05.bz2
> 


It looks from Dec 12 14:05:09 citadel-station kernel: [107789.172478] 
iscsi: done [sc 88012d96ca40 res 0 itt 0xf]

to

Dec 12 14:05:38 citadel-station kernel: [107938.402859] iscsi: aborting 
[sc 8801287c91c0 itt 0x7c]

The only IO we see is the target sending us a nop (iscsi ping) and us 
responding. At 14:05:38 a scsi command times out and so the scsi layer 
fires up its error handling which sends an abort that fails then sends a 
lu reset which unjams us and the initiator then resends the commands it 
thought was lost. Note that if the lu reset failed, we would have 
eventually seen the conn error message you saw before so we probably are 
hitting the right code paths.

The initiator thinks these two commands:
Dec 12 14:05:38 citadel-station kernel: [107938.405876] iscsi: failing 
in progress sc 8801287c91c0 itt 0x7c
Dec 12 14:05:38 citadel-station kernel: [107938.405881] iscsi: failing 
in progress sc 8801274b6440 itt 0x5c

did not complete. But I am betting the target thinks it completed it 
since the target seems fine and is still pinging us.


The iscsi logs show we did not get the command response. So either the 
target droppped a packet or the network dropped something or the 
initiator dropped it in the network processing.

Could you run this again but get a ethereal/wireshark trace at the same 
time? That way we can see if the network layer is getting the packet.


Could you also run with this patch? In the open-iscsi source do

patch -p1 -i path-to-patch/remove-undef-tcp.patch

then rebuild with

make DEBUG_TCP=1 DEBUG_SCSI=1
make DEBUG_TCP=1 DEBUG_SCSI=1 install


One other question. Is the iscsi initiator running in the vm or in the 
host? Is this xen or something else like vmware? Could you take the 
ethereal/wireshark trace from wherever iscsi is running so we can see 
what its network layer is getting? If it is running in a guest and you 
can get a ethereal/wireshark trace from both the guest and host that 
would be nice too.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---

diff --git a/kernel/iscsi_tcp.c b/kernel/iscsi_tcp.c
index d074146..69286d0 100644
--- a/kernel/iscsi_tcp.c
+++ b/kernel/iscsi_tcp.c
@@ -48,7 +48,6 @@ MODULE_AUTHOR("Dmitry Yusupov , "
  "Alex Aizman ");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-#undef DEBUG_TCP
 #define DEBUG_ASSERT
 
 #ifdef DEBUG_TCP


Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-12 Thread Evan Broder

Evan Broder wrote:
> Ok - I'm running a few of my VMs, and setup some simple monitoring to
> let me know if an error gets logged. I'll get back to the list if/when I
> find anything.
>
> - Evan

I'm no longer seeing the "detected conn error", but I am still seeing
the symptoms of the problems: on a multiple of 5 minutes past the hour,
VMs become non-responsive for 30-60 seconds. The logs with all of the
debugging options turned on are a bit...overwhelming, but I've posted
the logs from an incident today. The incident happened at 14:05; I've
included the logs from 14:04-14:06 at
http://web.mit.edu/broder/Public/iscsi/citadel-station-2008-12-12-14-05.bz2

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-11 Thread Evan Broder

Mike Christie wrote:
> Not with just the above info. I t could be happening for a variety of 
> reasons.
>
> It looks like the target is either disconnecting us (this is why I was 
> looking for a TCP_CLOSE in the debug output) because we goofed on some 
> iscsi protocol issue but normally I would have thought the target would 
> log some message.
>
> Or for some reason the network is just getting disconnected or hitting 
> some problem and it is throwing an error when we try to send a packet. 
> For this with the newer code in that tarball we would throw a new error 
> code and so the "iscsi: detected conn error" output would have a 
> different value than 1011.

Ok - I'm running a few of my VMs, and setup some simple monitoring to
let me know if an error gets logged. I'll get back to the list if/when I
find anything.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-11 Thread Mike Christie

Evan Broder wrote:
>> If you can play around on the box, it would be helpful to run the
>> open-iscsi tarball release. Build it with
>>
>> make DEBUG_TCP=1 make DEBUG_SCSI=1
>>
>> then do
>>
>> make DEBUG_TCP=1 make DEBUG_SCSI=1 install
>>
>> This will overwrite the existing kernel modules and tools with the
>> open-iscsi ones, so do not this if you cannot mess around with the box
>> and reinstall the ubuntu kernel and tools.
>>
>> This will dump out a lot of info in /var/log/messages, but we are just
>> looking for the part before you see these:
>>
>> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
>> connection1:0: iscsi: detected conn error (1011)
>>
>> You might see something about a TCP_CLOSE.
> 
> Ok - I've moved all of the virtual machines off of one of our servers,
> and it's currently running the latest tarball release of both the
> kernelspace and userspace pieces with both debug options enabled.
> 
> Our biggest problem when we were trying to diagnose this with Equallogic
> was that we didn't have a reliable way of generating the errors. They
> don't seem to happen when there's no iSCSI traffic, but they also didn't
> occur when we ran a program that generates rapid, random reads and writes.
> 
> I'd rather not put production VMs on this server while we're testing. Do
> you have any theories on how we could trigger these connection errors,
> if they're still occurring?
> 

Not with just the above info. I t could be happening for a variety of 
reasons.

It looks like the target is either disconnecting us (this is why I was 
looking for a TCP_CLOSE in the debug output) because we goofed on some 
iscsi protocol issue but normally I would have thought the target would 
log some message.

Or for some reason the network is just getting disconnected or hitting 
some problem and it is throwing an error when we try to send a packet. 
For this with the newer code in that tarball we would throw a new error 
code and so the "iscsi: detected conn error" output would have a 
different value than 1011.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-11 Thread Evan Broder

Mike Christie wrote:
> What firmware do you have, and did yo just start seeing these with a
> target firmware upgrade or a iscsi/ubuntu upgrade?

I can't find it, but I believe we are running version 4.0.1 of the
Equallogic firmware, but we were originally running 3.3.1 and saw
problems with that version as well.

We originally saw these problems on Debian Etch, and they persisted
across our upgrade to Ubuntu Hardy.

> Have you contacted the Equallogic guys? If not I will ping them if
> they do not reply. They might be busy or have not seen the thread here.

We do have a support contract with them and were working on the issue
before. We had thought the problem went away when we moved from Etch to
Hardy. We're planning to get in touch with them again soon, but I'm not
the point of contact on that, so I've asked one of my project's
co-maintainers to do that.

> If you can play around on the box, it would be helpful to run the
> open-iscsi tarball release. Build it with
>
> make DEBUG_TCP=1 make DEBUG_SCSI=1
>
> then do
>
> make DEBUG_TCP=1 make DEBUG_SCSI=1 install
>
> This will overwrite the existing kernel modules and tools with the
> open-iscsi ones, so do not this if you cannot mess around with the box
> and reinstall the ubuntu kernel and tools.
>
> This will dump out a lot of info in /var/log/messages, but we are just
> looking for the part before you see these:
>
> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> connection1:0: iscsi: detected conn error (1011)
>
> You might see something about a TCP_CLOSE.

Ok - I've moved all of the virtual machines off of one of our servers,
and it's currently running the latest tarball release of both the
kernelspace and userspace pieces with both debug options enabled.

Our biggest problem when we were trying to diagnose this with Equallogic
was that we didn't have a reliable way of generating the errors. They
don't seem to happen when there's no iSCSI traffic, but they also didn't
occur when we ran a program that generates rapid, random reads and writes.

I'd rather not put production VMs on this server while we're testing. Do
you have any theories on how we could trigger these connection errors,
if they're still occurring?

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread Paul Koning

> "Evan" == Evan Broder <[EMAIL PROTECTED]> writes:

 Evan> Looking at the RAID configuration, we have the "Group IP
 Evan> address" set to 10.5.128.128, and the member NIC's IP address
 Evan> set to 10.5.128.129. The persistent portal os 10.5.128.128 on
 Evan> all four servers, and the current portal is 10.5.128.129 on all
 Evan> servers. It doesn't change after one of the connection errors.

That suggests the load balancer is not involved here.

You might check the event logs on the array to see if there are any
messages there around the same time.  

 paul


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread Evan Broder

zhaoyuqiang wrote:
> Hi,
> Someone solved the issue like this with running opensuse10.3. you can
> try download source code from
> http://www.open-iscsi.org/bits/open-iscsi-2.0-865.13.tar.gz
>  and
> complie for Ubuntu, it's like scsi_transport_iscsi 2.0-724 has
> some uncompatible with Equallogic in iscsi nohup
> response??  The vision 2.0-865.13.tar.gz which I find the
> scsi_transport_iscsi version is 2.0-865.  
> Ubuntu

Donald Williams wrote:
> Hello,
>
>  I would strongly suggest using the code version Mike mentioned.  I
use ubuntu 8.04/8.10 with that code without issues w/EQL arrays.
>
> Running the older transport kernel module has caused NOOP errors.  The
initiator sends out NOOPs w/different SN numbers than what the array is
expecting.   Upgrading the kernel module has resolved this at quite a
few customers.


I'm working now on moving the VMs off of one of my servers so I can do
some testing. I'll hopefully be able to get back by the end of the week
or so.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread Mike Christie

Ulrich Windl wrote:
> On 8 Dec 2008 at 17:52, Pasi Kärkkäinen wrote:
> 
>> iscsiadm -m session -P3
>>
> 
> Hi,
> 
> being curious: My version if iscsiadm (version 2.0-754) doesn't know about 
> the "-
> P3". What ist it expected to do?
> 

In older versions you might be able to do

iscsiadm -m session -i

This was used to dump all the iscsi info we had for debugging.

In later versions we stablized the interface and you can pass it how 
much info you want (a number from 1 to 3), and what session you want 
info on or if no session is given it will print them all.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread zhaoyuqiang
Hi,
Someone solved the issue like this with running opensuse10.3. you can try
download source code from http://www.open-iscsi.org/bits/
open-iscsi-2.0-865.13.tar.gz
and
complie for Ubuntu, it's like scsi_transport_iscsi 2.0-724 has
some uncompatible with Equallogic in iscsi nohup response??  The vision
2.0-865.13.tar.gz which I find the scsi_transport_iscsi version is 2.0-865.

Ubuntu
2008/12/8 Evan Broder <[EMAIL PROTECTED]>

>
> A group I work with is currently using a Dell Equallogic RAID with
> four servers on a dedicated storage network. We've been regularly
> experiencing connection errors:
>
> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> connection1:0: iscsi: detected conn error (1011)
> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
> connection 1:0 error (1011) state (3)
> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
> with target iqn.2001-05.com.equallogic:
> 0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1
> Dec  8 00:50:39 aperture-science kernel: [1010624.597349] iscsi: host
> reset succeeded
> Dec  8 00:50:40 aperture-science iscsid: connection1:0 is operational
> after recovery (1 attempts)
>
> These errors always occur about 40 seconds after a multiple of 5
> minutes after the hour. Other than that, we've seen no pattern in when
> they occur, or their cause. We've tried running some scripts designed
> to heavily utilize storage devices, but that doesn't seem to trigger
> it. These errors occur on all four of our servers, but not at the same
> time.
>
> The RAID is being used as a physical volume for an LVM volume
> group. The servers are being used for hosting Xen virtual machines,
> which use LVs on the RAID as their disk images. Shortly before the
> connection errors are logged, all disk I/O from the virtual machines
> hangs completely, causing the VMs to basically become non-responsive
> to non-trivial interaction.
>
> Our four servers are running the stock Ubuntu Hardy 2.6.24 kernel. I
> don't see an explicit version number in any of the iSCSI kernel
> source, but drivers/scsi/scsi_transport_iscsi.c contains:
> > > #define ISCSI_TRANSPORT_VERSION "2.0-724"
>
> The userspace utilities are 2.0.865.
>
> Does anyone know how to stop these errors? Is there more diagnostic
> information we could provide? We're way out of our league in terms of
> debugging this.
>
> - Evan
>
>
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread Pasi Kärkkäinen

On Wed, Dec 10, 2008 at 12:45:04PM +0100, Ulrich Windl wrote:
> 
> On 8 Dec 2008 at 17:52, Pasi Kärkkäinen wrote:
> 
> > iscsiadm -m session -P3
> > 
> 
> Hi,
> 
> being curious: My version if iscsiadm (version 2.0-754) doesn't know about 
> the "-
> P3". What ist it expected to do?
> 

It just prints more information.. details.

-- Pasi

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-10 Thread Ulrich Windl

On 8 Dec 2008 at 17:52, Pasi Kärkkäinen wrote:

> iscsiadm -m session -P3
> 

Hi,

being curious: My version if iscsiadm (version 2.0-754) doesn't know about the 
"-
P3". What ist it expected to do?

Regards,
Ulrich


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Donald Williams
Hello,
 I would strongly suggest using the code version Mike mentioned.  I use
ubuntu 8.04/8.10 with that code without issues w/EQL arrays.

Running the older transport kernel module has caused NOOP errors.  The
initiator sends out NOOPs w/different SN numbers than what the array is
expecting.   Upgrading the kernel module has resolved this at quite a few
customers.

 If this doesn't resolve the issue,  please open a case with EQL.
 877.887.7337.   They can review the logs and compare it to when the errors
occur.

 Don



On Mon, Dec 8, 2008 at 9:07 PM, Mike Christie <[EMAIL PROTECTED]> wrote:

>
> Mike Christie wrote:
> >
> > If you can play around on the box, it would be helpful to run the
> > open-iscsi tarball release. Build it with
>
> Oh yeah that is here:
> http://www.open-iscsi.org/bits/open-iscsi-2.0-870.1.tar.gz
>
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Mike Christie wrote:
> 
> If you can play around on the box, it would be helpful to run the 
> open-iscsi tarball release. Build it with

Oh yeah that is here:
http://www.open-iscsi.org/bits/open-iscsi-2.0-870.1.tar.gz

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Evan Broder wrote:
> Mike Christie wrote:
>> Is the only log error that bit about:
>>
>> iqn.2001-05.com.equallogic:0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1'
>>
>> from initiator '10.5.128.16:53712, iqn
>> 1993-08.org.debian:01:bce099ff4d44' was closed.
> Yeah, that's the only thing that ever shows up. It's immediately
> followed by a successful reconnection.
> 

What firmware do you have, and did yo just start seeing these with a 
target firmware upgrade or a iscsi/ubuntu upgrade?


Have you contacted the Equallogic guys? If not I will ping them if they 
do not reply. They might be busy or have not seen the thread here.


If you can play around on the box, it would be helpful to run the 
open-iscsi tarball release. Build it with

make DEBUG_TCP=1 make DEBUG_SCSI=1

then do

make DEBUG_TCP=1 make DEBUG_SCSI=1 install

This will overwrite the existing kernel modules and tools with the 
open-iscsi ones, so do not this if you cannot mess around with the box 
and reinstall the ubuntu kernel and tools.

This will dump out a lot of info in /var/log/messages, but we are just 
looking for the part before you see these:

Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
connection1:0: iscsi: detected conn error (1011)

You might see something about a TCP_CLOSE.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Mike Christie wrote:
> Is the only log error that bit about:
>
> iqn.2001-05.com.equallogic:0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1'
>
> from initiator '10.5.128.16:53712, iqn
> 1993-08.org.debian:01:bce099ff4d44' was closed.
Yeah, that's the only thing that ever shows up. It's immediately
followed by a successful reconnection.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Evan Broder wrote:
> Pasi Kärkkäinen wrote:
>> On Mon, Dec 08, 2008 at 02:57:44PM -0500, Evan Broder wrote:
>>   
>>> Yeah. Our storage network is completely isolated, so we thought about
>>> trying to disable CHAP, but we couldn't find an option to turn it off in
>>> the Equallogic config.
>>>
>>> 
>> In your volume properties, add access with just the IP address. Remove any
>> username access entries.
>>
>> I'm using EQL volumes without CHAP.
>>
>> -- Pasi
>>   
> 
> Hmm, ok. So it's possible to turn off CHAP, but that doesn't really seem
> like it's the issue here; from my layman's reading of the logs, the

Yeah, like I said in the other thread, we should be fine with it on.

> authentication failures come off as a side-effect of mis-matched state
> between the initiator and the target, does it not?
> 

This should not really happen. It has not in the past with these 
targets. When the initiator starts a login/reconnect, the target will 
figure out that we are trying to relogin and it will do the right thing 
(this is part of the the iscsi spec). We should not get that error.

Is the only log error that bit about:

iqn.2001-05.com.equallogic:0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1'
from initiator '10.5.128.16:53712, iqn
1993-08.org.debian:01:bce099ff4d44' was closed.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Pasi Kärkkäinen wrote:
> On Mon, Dec 08, 2008 at 02:57:44PM -0500, Evan Broder wrote:
>   
>> Yeah. Our storage network is completely isolated, so we thought about
>> trying to disable CHAP, but we couldn't find an option to turn it off in
>> the Equallogic config.
>>
>> 
>
> In your volume properties, add access with just the IP address. Remove any
> username access entries.
>
> I'm using EQL volumes without CHAP.
>
> -- Pasi
>   

Hmm, ok. So it's possible to turn off CHAP, but that doesn't really seem
like it's the issue here; from my layman's reading of the logs, the
authentication failures come off as a side-effect of mis-matched state
between the initiator and the target, does it not?

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Paul Koning wrote:
> You might check the event logs on the array to see if there are any
> messages there around the same time.  
>
>paul
>
>   
There isn't anything. There's a message for the disconnect, and a
message for the reconnect.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Pasi Kärkkäinen

On Mon, Dec 08, 2008 at 02:57:44PM -0500, Evan Broder wrote:
> 
> Mike Christie wrote:
> > Evan Broder wrote:
> >> A group I work with is currently using a Dell Equallogic RAID with
> >> four servers on a dedicated storage network. We've been regularly
> >> experiencing connection errors:
> >>
> >> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> >> connection1:0: iscsi: detected conn error (1011)
> >> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
> >> connection 1:0 error (1011) state (3)
> >> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
> >> with target iqn.2001-05.com.equallogic:
> >
> > Are you guys using CHAP at this time?
> 
> Yeah. Our storage network is completely isolated, so we thought about
> trying to disable CHAP, but we couldn't find an option to turn it off in
> the Equallogic config.
> 

In your volume properties, add access with just the IP address. Remove any
username access entries.

I'm using EQL volumes without CHAP.

-- Pasi

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Evan Broder wrote:
> Mike Christie wrote:
>> Evan Broder wrote:
>>> A group I work with is currently using a Dell Equallogic RAID with
>>> four servers on a dedicated storage network. We've been regularly
>>> experiencing connection errors:
>>>
>>> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
>>> connection1:0: iscsi: detected conn error (1011)
>>> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
>>> connection 1:0 error (1011) state (3)
>>> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
>>> with target iqn.2001-05.com.equallogic:
>> Are you guys using CHAP at this time?
> 
> Yeah. Our storage network is completely isolated, so we thought about
> trying to disable CHAP, but we couldn't find an option to turn it off in
> the Equallogic config.
> 

It is ok that it is on. I was just asking because it is strange that 
when we try to log back in, we get a authentication error the first tim 
but it looks like we retry again and it works ok, or we misread the 
status and think we are in the middle of a redirect and drop the error 
and log back in ok.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Mike Christie wrote:
> Evan Broder wrote:
>> A group I work with is currently using a Dell Equallogic RAID with
>> four servers on a dedicated storage network. We've been regularly
>> experiencing connection errors:
>>
>> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
>> connection1:0: iscsi: detected conn error (1011)
>> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
>> connection 1:0 error (1011) state (3)
>> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
>> with target iqn.2001-05.com.equallogic:
>
> Are you guys using CHAP at this time?

Yeah. Our storage network is completely isolated, so we thought about
trying to disable CHAP, but we couldn't find an option to turn it off in
the Equallogic config.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Evan Broder wrote:
> A group I work with is currently using a Dell Equallogic RAID with
> four servers on a dedicated storage network. We've been regularly
> experiencing connection errors:
> 
> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> connection1:0: iscsi: detected conn error (1011)
> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
> connection 1:0 error (1011) state (3)
> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
> with target iqn.2001-05.com.equallogic:

Are you guys using CHAP at this time?


> 0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1
> Dec  8 00:50:39 aperture-science kernel: [1010624.597349] iscsi: host
> reset succeeded
> Dec  8 00:50:40 aperture-science iscsid: connection1:0 is operational
> after recovery (1 attempts)
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Pasi Kärkkäinen wrote:
> On Mon, Dec 08, 2008 at 05:48:22AM -0800, Evan Broder wrote:
>   
>> A group I work with is currently using a Dell Equallogic RAID with
>> four servers on a dedicated storage network. We've been regularly
>> experiencing connection errors:
>>
>> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
>> connection1:0: iscsi: detected conn error (1011)
>> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
>> connection 1:0 error (1011) state (3)
>> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
>> with target iqn.2001-05.com.equallogic:
>> 0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1
>> Dec  8 00:50:39 aperture-science kernel: [1010624.597349] iscsi: host
>> reset succeeded
>> Dec  8 00:50:40 aperture-science iscsid: connection1:0 is operational
>> after recovery (1 attempts)
>>
>> These errors always occur about 40 seconds after a multiple of 5
>> minutes after the hour. Other than that, we've seen no pattern in when
>> they occur, or their cause. We've tried running some scripts designed
>> to heavily utilize storage devices, but that doesn't seem to trigger
>> it. These errors occur on all four of our servers, but not at the same
>> time.
>>
>> The RAID is being used as a physical volume for an LVM volume
>> group. The servers are being used for hosting Xen virtual machines,
>> which use LVs on the RAID as their disk images. Shortly before the
>> connection errors are logged, all disk I/O from the virtual machines
>> hangs completely, causing the VMs to basically become non-responsive
>> to non-trivial interaction.
>>
>> Our four servers are running the stock Ubuntu Hardy 2.6.24 kernel. I
>> don't see an explicit version number in any of the iSCSI kernel
>> source, but drivers/scsi/scsi_transport_iscsi.c contains:
>> 
 #define ISCSI_TRANSPORT_VERSION "2.0-724"
 
>> The userspace utilities are 2.0.865.
>>
>> Does anyone know how to stop these errors? Is there more diagnostic
>> information we could provide? We're way out of our league in terms of
>> debugging this.
>>
>> 
>
> Hmm.. wondering if those are related to automatic connection loadbalancing
> on Equallogic arrays. 
>
> Maybe check with iscsiadm if the connected interface (on the EQL array)
> changes after those errors:
>
> iscsiadm -m session -P3
>
> And check the 'Current Portal' and 'Persistent Portal' values.. 
>
> Persistent Portal should be your EQL group IP address, and Current Portal
> should be whatever interface you're happening to use atm.. 
>
> -- Pasi

We currently only have a single interface enabled, although we have
plans to add more in the future.

Looking at the RAID configuration, we have the "Group IP address" set to
10.5.128.128, and the member NIC's IP address set to 10.5.128.129. The
persistent portal os 10.5.128.128 on all four servers, and the current
portal is 10.5.128.129 on all servers. It doesn't change after one of
the connection errors.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Evan Broder

Mike Christie wrote:
> Evan Broder wrote:
>
 #define ISCSI_TRANSPORT_VERSION "2.0-724"
>>
>> The userspace utilities are 2.0.865.
>>
>
> Are these the Ubuntu tools or did you get them from open-iscsi.org? If
> they are ubuntu ones is there more version info on the ubuntu package?
> What is the iscsi package named in ubuntu btw?

We are using both the kernel modules and userspace utilities from
Ubuntu. The kernel modules are included with the stock kernels; the
userspace is in the open-iscsi package.

The userspace is patched some from the upstream sources - see
http://archive.ubuntu.com/ubuntu/pool/main/o/open-iscsi/open-iscsi_2.0.865-1ubuntu3.diff.gz
- I'm not entirely clear on what the patches do, though

> Is there any other error messages in the logs other than this:
>
>  Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> connection1:0: iscsi: detected conn error (1011)
> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
> connection 1:0 error (1011) state (3)
>
>
> Is there something about a ping or nop timing out or a logout? If not
> then look in the target logs for problems around that time.

Every time we see one of these connection failures, we get the series of
5 errors I included in the first e-mail. Nothing about nops, pings, or
logouts. The target logs just say

> iSCSI session to target '10.5.128.129:3260,
iqn.2001-05.com.equallogic:0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1'
from initiator '10.5.128.16:53712, iqn
1993-08.org.debian:01:bce099ff4d44' was closed.
> iSCSI initiator connection failure.
> Connection was closed by peer.

- Evan

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Mike Christie

Evan Broder wrote:

>>> #define ISCSI_TRANSPORT_VERSION "2.0-724"
> 
> The userspace utilities are 2.0.865.
> 

Are these the Ubuntu tools or did you get them from open-iscsi.org? If 
they are ubuntu ones is there more version info on the ubuntu package? 
What is the iscsi package named in ubuntu btw?


Is there any other error messages in the logs other than this:

  Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
connection1:0: iscsi: detected conn error (1011)
Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
connection 1:0 error (1011) state (3)


Is there something about a ping or nop timing out or a logout? If not 
then look in the target logs for problems around that time.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: Frequent Connection Errors with Dell Equallogic RAID

2008-12-08 Thread Pasi Kärkkäinen

On Mon, Dec 08, 2008 at 05:48:22AM -0800, Evan Broder wrote:
> 
> A group I work with is currently using a Dell Equallogic RAID with
> four servers on a dedicated storage network. We've been regularly
> experiencing connection errors:
> 
> Dec  8 00:50:36 aperture-science kernel: [1010621.595904]
> connection1:0: iscsi: detected conn error (1011)
> Dec  8 00:50:37 aperture-science iscsid: Kernel reported iSCSI
> connection 1:0 error (1011) state (3)
> Dec  8 00:50:39 aperture-science iscsid: Login authentication failed
> with target iqn.2001-05.com.equallogic:
> 0-8a0906-2b6e7d402-891497db5ca48925-xvm-volume-1
> Dec  8 00:50:39 aperture-science kernel: [1010624.597349] iscsi: host
> reset succeeded
> Dec  8 00:50:40 aperture-science iscsid: connection1:0 is operational
> after recovery (1 attempts)
> 
> These errors always occur about 40 seconds after a multiple of 5
> minutes after the hour. Other than that, we've seen no pattern in when
> they occur, or their cause. We've tried running some scripts designed
> to heavily utilize storage devices, but that doesn't seem to trigger
> it. These errors occur on all four of our servers, but not at the same
> time.
> 
> The RAID is being used as a physical volume for an LVM volume
> group. The servers are being used for hosting Xen virtual machines,
> which use LVs on the RAID as their disk images. Shortly before the
> connection errors are logged, all disk I/O from the virtual machines
> hangs completely, causing the VMs to basically become non-responsive
> to non-trivial interaction.
> 
> Our four servers are running the stock Ubuntu Hardy 2.6.24 kernel. I
> don't see an explicit version number in any of the iSCSI kernel
> source, but drivers/scsi/scsi_transport_iscsi.c contains:
> > > #define ISCSI_TRANSPORT_VERSION "2.0-724"
> 
> The userspace utilities are 2.0.865.
> 
> Does anyone know how to stop these errors? Is there more diagnostic
> information we could provide? We're way out of our league in terms of
> debugging this.
> 

Hmm.. wondering if those are related to automatic connection loadbalancing
on Equallogic arrays. 

Maybe check with iscsiadm if the connected interface (on the EQL array)
changes after those errors:

iscsiadm -m session -P3

And check the 'Current Portal' and 'Persistent Portal' values.. 

Persistent Portal should be your EQL group IP address, and Current Portal
should be whatever interface you're happening to use atm.. 

-- Pasi

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---