Mike,

I have some more details on this.  It seems that a simple
`ping -Ieth2 -i1 192.168.0.19` (192.168.0.19 is our EqualLogic group IP)
is enough to reset iSCSI sessions.

eth0 = 1 active nic in the bond (public network)
eth2 = iface eth2 (192.168.0.151/16)
eth3 = iface eth3 (192.168.0.161/16)

I slammed the public network for that system from 3 external systems at
roughly 101MB/s (very nicely slammed for gigabit :) ) with netcat
streams to /dev/null.

I had 8 netcat connections going through the public network for about 25
minutes without a single hiccup (as expected).

For the iSCSI side I had previously done performance testing with dt as
well as dd with bs=1M and was slamming the EqualLogic storage getting
around 60MB/s reads on average with (2) systems each having OCFS2 shared
storage and (2) iSCSI sessions each.  Writes were between 30MB/s and
155MB/s depending on which EqualLogic array was being hit (SATA vs
SAS15k respectively).  This seemed to work well with a read and a write
going on simultaneously for about 2 hours.
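
For reference, a read test of the kind described above can be sketched
roughly as follows.  This is only a sketch under assumptions: the function
name `seq_read_test` and the default block count are made up for
illustration, and these are not the exact commands behind the numbers
above; only `dd` with `bs=1M` comes from that testing.  It only reads, so
nothing on the volume is modified.

```shell
#!/bin/sh
# Sequential read test: stream COUNT blocks of 1 MiB from a device or file
# to /dev/null. dd reports elapsed time and throughput on stderr.
# seq_read_test and its default COUNT are illustrative, not from the thread.
seq_read_test() {
    src="$1"
    count="${2:-1024}"   # number of 1 MiB blocks to read (assumed default)
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    dd if="$src" of=/dev/null bs=1M count="$count"
}
```

It would be called with the multipath device for the volume and a block
count, e.g. `seq_read_test /dev/mapper/<volume> 4096`, where the device
path is a placeholder to fill in for the system under test.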

As soon as I introduce pings:
[r...@oim6102501 ~]# ping -Ieth2 192.168.0.19& ping -I eth3 192.168.0.19
[r...@oim6102504 ~]# ping -Ieth2 192.168.0.19& ping -I eth3 192.168.0.19

I see the following session failures, according to the EqualLogic:
INFO  7/10/09  11:02:02 AM
   SATA001  iSCSI session to target '192.168.0.30:3260, iqn.2001-05.com.equallogic:0-8a0906-82f16c402-fe30000b33e4a3bc-ovm-1-lun0'
   from initiator '192.168.0.161:45531, iqn.1994-05.com.redhat:c79dbacd466' was closed.
   iSCSI initiator connection failure.   Reset received on the connection.


Or according to /var/log/messages on my OVM Server:

Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, last rx 16848993, last ping 16851493, now 16852743
Jul 10 11:02:12 oim6102501 kernel:  connection1:0: iscsi: detected conn error (1011)
Jul 10 11:02:12 oim6102501 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jul 10 11:02:27 oim6102501 kernel: iscsi: cmd 0x28 is not queued (8)
Jul 10 11:02:27 oim6102501 kernel:  session1: iscsi: session recovery timed out after 15 secs
Jul 10 11:02:27 oim6102501 kernel: sd 5:0:0:0: SCSI error: return code = 0x00010000
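
The jiffies values in the "ping timeout" line can be turned into intervals
with a little arithmetic.  The sketch below is a hypothetical helper
(`ping_timeout_deltas` is a made-up name) that reads syslog lines on stdin
and prints how far back the last receive and the last ping were when the
timeout fired.  Converting jiffies to seconds needs the kernel's HZ value,
which I'm not certain of for this kernel, so it is passed as an argument;
the 250 default is an assumption.  It expects each log entry on a single
line, as in /var/log/messages itself.

```shell
#!/bin/sh
# Print jiffies deltas for iSCSI "ping timeout" kernel log lines read on stdin.
# $1 = HZ (jiffies per second). The default of 250 is an ASSUMPTION; check
# the kernel config of the system that produced the log.
ping_timeout_deltas() {
    hz="${1:-250}"
    awk -v hz="$hz" '
        /ping timeout of [0-9]+ secs expired/ {
            gsub(/,/, "")              # strip commas so the numbers parse cleanly
            rx = ping = now = ""
            for (i = 2; i < NF; i++) {
                if ($(i-1) == "last" && $i == "rx")   rx   = $(i+1)
                if ($(i-1) == "last" && $i == "ping") ping = $(i+1)
                if ($i == "now")                      now  = $(i+1)
            }
            if (rx == "" || ping == "" || now == "") next
            printf "since_rx=%d jiffies (%.1fs) since_ping=%d jiffies (%.1fs)\n",
                   now - rx, (now - rx) / hz, now - ping, (now - ping) / hz
        }'
}
```

Run as `ping_timeout_deltas 250 < /var/log/messages`.  For the entry above
the deltas are 3750 jiffies since the last rx and 1250 jiffies since the
last ping; if HZ really is 250, that is 15.0s and 5.0s respectively.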

As soon as I do `killall ping`, within 1 minute the session will
reconnect and dm-multipath will be happy again.

So I'm wondering two things here:

1) I looked at the changelog between the rpms.  I've included it below
(actually Tom from Oracle did, but I'm just relaying this) and don't see
any specific bug that talks about the "pdus with cmd sequences out of
order."  I did a Google search and found a bunch of changelog info here
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29 but
couldn't find anything specific to pdus with cmd sequences out of order.
Would you mind pointing me to a publicly available bug repo where I can
dig further on this?  Or, if you happen to know the bug number, I can
search on that as well.

2) Do you (or anyone else on the list) see a reason why a simple ping
would be disconnecting my session?

Sorry for the ignorance if you don't think I've dug deep enough into the
RFC... I'm still digging, but figured I'd toss this out to this list.

Thanks for any and all help, I appreciate it.


# rpm -qp --changelog iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1.i386.rpm
* Mon Jun 01 2009 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.18.1
- 501737 Fix handling of ipv6 addresses when login redirect is used.

* Wed Dec 17 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.18
- 476752 Must install list.h (install as iscsi_list.h) because
fw_context.h is bringing it in. Also revert ibft name changes because
anaconda is using them instead of the fw_context.h functions.

* Tue Dec 02 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.17
- 472562 (additional fixup to patch) only use logout time2wait for
relogins when response code is 2 or 3.

* Sun Nov 30 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.16
- 432819 create node records for each ibft portal and log into all of
them

* Sat Nov 22 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.15
- 472562 always retry relogins.

* Wed Nov 19 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.14
- 432819 increase CHAP string sizes.

* Wed Nov 05 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.12
- 469162 /var/lib/iscsi was not listed as owned by this package.

* Thu Sep 18 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.11
- 253834 fix iscsid init script shutdown

* Tue Sep 16 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.10
- 461294 Port login retry fixes from upstream.

* Tue Aug 26 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.9
- Related to 445721 - Install fw helpers as lib so first stage of
installer can easily use it. This is a temp lib, and is not a stable
interface. For now, just for install, we are adding this library.

* Wed Aug 06 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.8
- Related to 445721 - when using intel nics some network values are not
set, so iscsiadm would fail instead of just printing what we got.

* Mon Apr 28 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.7
- 444379 Increase login retries for boot.

* Tue Mar 25 2008 Mike Christie <mchris...@redhat.com> - 6.2.0.868-0.6
- 438092 Print netdev.



-----Original Message-----
From: Hoot, Joseph 
Sent: Thursday, July 09, 2009 11:43 AM
To: 'open-iscsi@googlegroups.com'
Subject: RE: iscsiadm -m iface + routing

Thanks Mike,

We are definitely hitting some similar issues again, so I will be sure
to forward this off to Oracle and Dell.

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com]
On Behalf Of Mike Christie
Sent: Thursday, July 09, 2009 11:36 AM
To: open-iscsi@googlegroups.com
Subject: Re: iscsiadm -m iface + routing


On 07/06/2009 05:27 AM, hootjr29 wrote:
> Hi all,
>
> I'm currently attempting to implement a Dell EqualLogic iSCSI solution
> connected through m1000e switches to Dell m610 blades with (2) iSCSI
> dedicated nics in each blade running Oracle VM Server v2.1.5 (which, I
> believe, is based off of RHEL5.1).
>
> [r...@oim6102501 log]# rpm -qa | grep iscsi
> iscsi-initiator-utils-6.2.0.868-0.7.el5
> [r...@oim6102501 log]# uname -a
> Linux oim6102501 2.6.18-8.1.15.3.1.el5xen #1 SMP Tue May 12 19:21:30
> EDT 2009 i686 i686 i386 GNU/Linux
>

I think there was a bug in RHEL 5.1 and 5.0 where the initiator was
sending pdus with cmd sequences out of order and the EQL box would drop
the session. So if you are not seeing ping/nop messages, then you might be
hitting that problem. It is fixed in RHEL 5.3. Is there an Oracle release
based on that you can try? Or can you just use RHEL kernels? If so then
you might want to try this one:
http://people.redhat.com/dzickus/el5/157.el5/



