Mike,

I have some more details on this. It seems that a simple `ping -Ieth2 -i1 192.168.0.19` (192.168.0.19 is our group IP on the EqualLogic) is enough to "reset sessions."
Interfaces:

eth0 = 1 active nic in the bond (public network)
eth2 = iface eth2 (192.168.0.151/16)
eth3 = iface eth3 (192.168.0.161/16)

I slammed the public network for that system from 3 external systems at roughly 101MB/s (very nicely slammed for gigabit :) ) with netcats to /dev/null. I had 8 netcat connections going through the public network for about 25 minutes without a single hiccup (as expected).

For the iSCSI side I had previously done performance testing with dt as well as dd with bs=1M and was slamming the EqualLogic storage, getting around 60MB/s reads on average with (2) systems, each having OCFS2 shared storage and (2) iSCSI sessions. Writes were between 30MB/s and 155MB/s depending on which EqualLogic array was being hit (SATA vs SAS15k respectively). This seemed to work well with a read and a write going on simultaneously for about 2 hours.

As soon as I introduce pings:

[r...@oim6102501 ~]# ping -Ieth2 192.168.0.19& ping -I eth3 192.168.0.19
[r...@oim6102504 ~]# ping -Ieth2 192.168.0.19& ping -I eth3 192.168.0.19

sessions start failing. According to the EqualLogic:

INFO 7/10/09 11:02:02 AM SATA001 iSCSI session to target '192.168.0.30:3260, iqn.2001-05.com.equallogic:0-8a0906-82f16c402-fe30000b33e4a3bc-ovm-1-lun 0' from initiator '192.168.0.161:45531, iqn.1994-05.com.redhat:c79dbacd466' was closed. iSCSI initiator connection failure. Reset received on the connection.

Or according to /var/log/messages on my OVM Server:

Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, last rx 16848993, last ping 16851493, now 16852743
Jul 10 11:02:12 oim6102501 kernel: connection1:0: iscsi: detected conn error (1011)
Jul 10 11:02:12 oim6102501 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jul 10 11:02:27 oim6102501 kernel: iscsi: cmd 0x28 is not queued (8)
Jul 10 11:02:27 oim6102501 kernel: session1: iscsi: session recovery timed out after 15 secs
Jul 10 11:02:27 oim6102501 kernel: sd 5:0:0:0: SCSI error: return code = 0x00010000

As soon as I do `killall ping`, the session reconnects within 1 minute and dm-multipath is happy again.

So I'm wondering two things here:

1) I looked at the changelog between rpms. I've included it below (actually Tom from Oracle did, but I'm just relaying this) and I don't see any specific bug that talks about the "pdus with cmd sequences out of order." I did a Google search and found a bunch of changelog info here:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29

but couldn't find anything specific about the pdus with cmd sequences out of order. Would you mind pointing me to a publicly available bug repo where I can dig further on this? Or if you happen to know the bug number, I can do searches on that as well.

2) Do you (or anyone else on the list) see a reason why a simple ping would be disconnecting my session?

Sorry for the ignorance if you don't think I've dug deep enough into the RFC... I'm still digging, but figured I'd toss this out to this list.

Thanks for any and all help, I appreciate it.

# rpm -qp --changelog iscsi-initiator-utils-22.214.171.1248-0.18.el5_3.1.i386.rpm
* Mon Jun 01 2009 Mike Christie <mchris...@redhat.com> - 126.96.36.1998-0.18.1
- 501737 Fix handling of ipv6 addresses when login redirect is used.
* Wed Dec 17 2008 Mike Christie <mchris...@redhat.com> - 188.8.131.528-0.18
- 476752 Must install list.h (install as iscsi_list.h) because fw_context.h is bringing it in. Also revert ibft name changes because anaconda is using them instead of the fw_context.h functions.
* Tue Dec 02 2008 Mike Christie <mchris...@redhat.com> - 184.108.40.2068-0.17
- 472562 (additional fixup to patch) only use logout time2wait for relogins when response code is 2 or 3.
* Sun Nov 30 2008 Mike Christie <mchris...@redhat.com> - 220.127.116.118-0.16
- 432819 create node records for each ibft portal and log into all of them
* Sat Nov 22 2008 Mike Christie <mchris...@redhat.com> - 18.104.22.1688-0.15
- 472562 always retry relogins.
* Wed Nov 19 2008 Mike Christie <mchris...@redhat.com> - 22.214.171.1248-0.14
- 432819 increase CHAP string sizes.
* Wed Nov 05 2008 Mike Christie <mchris...@redhat.com> - 126.96.36.1998-0.12
- 469162 /var/lib/iscsi was not listed as owned by this package.
* Thu Sep 18 2008 Mike Christie <mchris...@redhat.com> - 188.8.131.528-0.11
- 253834 fix iscsid init script shutdown
* Tue Sep 16 2008 Mike Christie <mchris...@redhat.com> - 184.108.40.2068-0.10
- 461294 Port login retry fixes from upstream.
* Tue Aug 26 2008 Mike Christie <mchris...@redhat.com> - 220.127.116.118-0.9
- Related to 445721 - Install fw helpers as lib so first stage of installer can easily use it. This is a temp lib, and is not a stable interface. For now, just for install, we are adding this library.
* Wed Aug 06 2008 Mike Christie <mchris...@redhat.com> - 18.104.22.1688-0.8
- Related to 445721 - when using intel nics some network values are not set, so iscsiadm would fail instead of just printing what we got.
* Mon Apr 28 2008 Mike Christie <mchris...@redhat.com> - 22.214.171.1248-0.7
- 444379 Increase login retries for boot.
* Tue Mar 25 2008 Mike Christie <mchris...@redhat.com> - 126.96.36.1998-0.6
- 438092 Print netdev.

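A side note on the "ping timeout" line from /var/log/messages above: as far as I can tell, the "last rx", "last ping" and "now" values are kernel jiffies counters, so the gaps between them can be converted to seconds once the kernel's tick rate is known. The Python sketch below is only an illustration. The function name `ping_timeout_deltas` and the tick rate `ASSUMED_HZ = 250` are my own assumptions (check CONFIG_HZ for the running kernel before trusting the seconds), and it ignores jiffies wraparound.

```python
import re

# Assumption: tick rate (CONFIG_HZ) of the running kernel. 250 is a guess,
# not something stated in the logs; verify it against the kernel config.
ASSUMED_HZ = 250

# Matches the kernel message format seen in /var/log/messages above.
PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (?P<timeout>\d+) secs expired, "
    r"last rx (?P<last_rx>\d+), last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)


def ping_timeout_deltas(line, hz=ASSUMED_HZ):
    """Return the gaps between the counters in a 'ping timeout' log line.

    Returns None if the line is not a 'ping timeout' message.
    Counter wraparound is not handled.
    """
    match = PING_TIMEOUT_RE.search(line)
    if match is None:
        return None
    last_rx = int(match.group("last_rx"))
    last_ping = int(match.group("last_ping"))
    now = int(match.group("now"))
    return {
        "timeout_secs": int(match.group("timeout")),
        "jiffies_since_last_rx": now - last_rx,
        "jiffies_since_last_ping": now - last_ping,
        # Seconds are only as good as the assumed tick rate.
        "secs_since_last_rx": (now - last_rx) / float(hz),
        "secs_since_last_ping": (now - last_ping) / float(hz),
    }


line = ("Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, "
        "last rx 16848993, last ping 16851493, now 16852743")
print(ping_timeout_deltas(line))
```

For the line quoted above, the gaps are 3750 ticks since the last receive and 1250 ticks since the last ping. Under the assumed tick rate of 250 that would be 15 and 5 seconds; with a different HZ the seconds scale accordingly.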
-----Original Message-----
From: Hoot, Joseph
Sent: Thursday, July 09, 2009 11:43 AM
To: 'firstname.lastname@example.org'
Subject: RE: iscsiadm -m iface + routing

Thanks Mike,

We are definitely hitting some similar issues again, so I will be sure to forward this off to Oracle and Dell.

-----Original Message-----
From: email@example.com [mailto:open-is...@googlegroups.com] On Behalf Of Mike Christie
Sent: Thursday, July 09, 2009 11:36 AM
To: firstname.lastname@example.org
Subject: Re: iscsiadm -m iface + routing

On 07/06/2009 05:27 AM, hootjr29 wrote:
> Hi all,
>
> I'm currently attempting to implement a Dell EqualLogic iSCSI solution
> connected through m1000e switches to Dell m610 blades with (2) iSCSI
> dedicated nics in each blade running Oracle VM Server v2.1.5 (which, I
> believe, is based off of RHEL5.1).
>
> [r...@oim6102501 log]# rpm -qa | grep iscsi
> iscsi-initiator-utils-188.8.131.528-0.7.el5
> [r...@oim6102501 log]# uname -a
> Linux oim6102501 2.6.18-184.108.40.206.1.el5xen #1 SMP Tue May 12 19:21:30
> EDT 2009 i686 i686 i386 GNU/Linux
>

I think there was a bug in RHEL 5.1 and 5.0 where the initiator was sending pdus with cmd sequences out of order and the EQL box would drop the session. So if you are not seeing ping/nop messages, then you might be hitting that problem.

It is fixed in RHEL 5.3. Is there an Oracle release based on that you can try? Or can you just use RHEL kernels? If so then you might want to try this one:

http://people.redhat.com/dzickus/el5/157.el5/