On 07/10/2009 01:06 PM, Hoot, Joseph wrote:
> Mike,
> I have some more details on this.  It seems that a simple `ping -Ieth2
> -i1`<-- our group IP to the EqualLogic is able to "reset
> sessions."
> eth0 = 1 active nic in the bond (public network)
> eth2 = iface eth2 (
> eth3 = iface eth3 (
> I slammed the public network for that system from 3 external systems at
> roughly 101MB/s <-- very nicely slammed for gigabit :) with netcat streams to
> /dev/null.
> I had 8 netcat connections going through public network for about 25
> minutes without a single hiccup (as expected).
> For the iSCSI side I had previously done performance testing with dt as
> well as dd with bs=1M and was slamming the EqualLogic storage getting
> around 60MB/s reads on average with (2) systems each having OCFS2 shared
> storage and (2) iSCSI sessions each.  Writes were between 30MB/s and
> 155MB/s depending on which EqualLogic array was being hit (SATA vs
> SAS15k respectively).  This seemed to work well with a read and a write
> going on simultaneously for about 2 hours.
> As soon as I introduce pings:
> [r...@oim6102501 ~]# ping -Ieth2  ping -I eth3
> [r...@oim6102504 ~]# ping -Ieth2  ping -I eth3
> I receive the following sessions failing, according to the EqualLogic
> INFO  7/10/09  11:02:02 AM
>     SATA001  iSCSI session to target ',
> iqn.2001-05.com.equallogic:0-8a0906-82f16c402-fe30000b33e4a3bc-ovm-1-lun
> 0'
>     from initiator ',
> iqn.1994-05.com.redhat:c79dbacd466' was closed.
>     iSCSI initiator connection failure.   Reset received on the
> connection.
> Or according to /var/log/messages on my OVM Server:
> Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, last
> rx 16848993, last ping 16851493, now 16852743

The target is getting these errors because the initiator's iscsi pings
(nops) are not completing within the noop timeout values I described in
my last mail, so the initiator drops the connection.
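
If it helps to read that kernel message: the last rx, last ping, and now fields are jiffies, so their differences only become seconds once you know the kernel's HZ, which is not shown in this thread. A minimal Python sketch for decoding them follows; the helper names are hypothetical (not part of open-iscsi), and the regex assumes the exact message format quoted above.

```python
import re
from typing import NamedTuple

# Matches the message format quoted above (an assumption; other kernels word it differently).
PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (?P<secs>\d+) secs expired, "
    r"last rx (?P<last_rx>\d+), last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)


class PingTimeout(NamedTuple):
    timeout_secs: int  # configured ping (noop_out) timeout, in seconds
    last_rx: int       # jiffies when data was last received on the connection
    last_ping: int     # jiffies when the last nop was sent
    now: int           # jiffies when the timeout was reported


def parse_ping_timeout(line: str) -> PingTimeout:
    """Extract the four numeric fields from a ping timeout log line."""
    m = PING_TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not a ping timeout message")
    return PingTimeout(int(m["secs"]), int(m["last_rx"]),
                       int(m["last_ping"]), int(m["now"]))


def elapsed_seconds(ev: PingTimeout, hz: int) -> dict:
    """Convert jiffies deltas to seconds; hz (the kernel's CONFIG_HZ) must be supplied."""
    return {
        "since_last_rx": (ev.now - ev.last_rx) / hz,
        "since_last_ping": (ev.now - ev.last_ping) / hz,
    }


if __name__ == "__main__":
    # The two wrapped log lines quoted above, joined back into one message.
    msg = ("Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, "
           "last rx 16848993, last ping 16851493, now 16852743")
    ev = parse_ping_timeout(msg)
    print(ev.now - ev.last_rx, ev.now - ev.last_ping)  # 3750 1250
```

Whether those 3750 and 1250 jiffy gaps are longer or shorter than the 10 second timeout depends entirely on the HZ value of that kernel.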

I have no idea why a network ping would cause the iscsi ping to fail.
Maybe it is causing something to go wrong in the network routing, but I
have never seen this before.

If you were slamming the network while running the iscsi traffic, that
could cause the iscsi pings to take longer than noop_out_timeout
seconds: the nop can get stuck behind a long scsi/iscsi command, and the
non-iscsi network test slows down the iscsi traffic. However, just doing
the ping commands above should not cause a problem.
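
To make that timing concrete, here is a simplified Python model of the initiator-side nop logic: after noop_out_interval seconds with nothing received, a nop is sent, and if its reply does not arrive within noop_out_timeout seconds the connection is failed. This is an illustration only, not the libiscsi code; the class and method names are made up, and it ignores how the real timer is scheduled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NopTimer:
    """Simplified, hypothetical model of iscsi nop pinging (times in seconds)."""
    noop_out_interval: float          # idle time before a nop is sent
    noop_out_timeout: float           # how long to wait for the nop reply
    last_rx: float = 0.0              # time anything was last received
    ping_sent_at: Optional[float] = None  # time of the outstanding nop, if any

    def on_data(self, t: float) -> None:
        # Other received PDUs count as activity but do not answer the outstanding nop.
        self.last_rx = t

    def on_nop_in(self, t: float) -> None:
        # The nop reply clears the outstanding ping.
        self.last_rx = t
        self.ping_sent_at = None

    def tick(self, t: float) -> str:
        if self.noop_out_interval == 0:
            return "disabled"  # an interval of 0 turns pinging off in this model
        if self.ping_sent_at is not None:
            if t - self.ping_sent_at >= self.noop_out_timeout:
                return "conn_error"  # would show up as a ping timeout / conn error
            return "waiting"
        if t - self.last_rx >= self.noop_out_interval:
            self.ping_sent_at = t
            return "send_nop"
        return "idle"


if __name__ == "__main__":
    # The nop is sent at t=5 and its reply never makes it back in time
    # (e.g. stuck behind a large command), even though other data arrives.
    timer = NopTimer(noop_out_interval=5, noop_out_timeout=10)
    for t in (3, 5, 10, 15):
        print(t, timer.tick(t))
```

In this model a busy but working link can still hit conn_error: only the nop reply clears the outstanding ping, so a reply delayed past noop_out_timeout fails the connection.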

If you turn off nops completely by setting those two noop values to zero:
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

you should not see the ping timeout errors. But do you then see a "Host 
reset succeeded" message?
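
A minimal Python sketch for applying that change to iscsid.conf text follows. The function name is hypothetical, and it only rewrites a string, so back up the real file before writing the result over it. As far as I know, iscsid.conf values are copied into the node records at discovery time, so nodes you have already discovered would also need updating, e.g. with `iscsiadm -m node -T <target> -p <portal> -o update -n node.conn[0].timeo.noop_out_interval -v 0` (and the same for noop_out_timeout).

```python
import re

# The two settings quoted above; setting both to 0 turns nops off.
NOOP_KEYS = (
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def disable_nops(conf_text: str) -> str:
    """Return conf_text with both noop settings set to 0.

    Existing lines for these keys (commented out or not) are rewritten
    in place; keys that do not appear at all are appended.
    """
    lines = conf_text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        for key in NOOP_KEYS:
            # Match "key = value", optionally commented out with a leading '#'.
            if re.match(r"\s*#?\s*" + re.escape(key) + r"\s*=", line):
                lines[i] = f"{key} = 0"
                seen.add(key)
    for key in NOOP_KEYS:
        if key not in seen:
            lines.append(f"{key} = 0")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        "node.conn[0].timeo.noop_out_interval = 5\n"
        "node.conn[0].timeo.noop_out_timeout = 5\n"
    )
    print(disable_nops(sample), end="")
```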

One other thing. It looks like the iscsi code you are using is based on
5.2. There was a bug in that code where we would think a ping had timed
out when it had not. I do not think you are hitting this, but if you
could make sure that you are running something based on Red Hat's 5.1,
that would rule it out.

> Jul 10 11:02:12 oim6102501 kernel:  connection1:0: iscsi: detected conn
> error (1011)
> Jul 10 11:02:12 oim6102501 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Jul 10 11:02:27 oim6102501 kernel: iscsi: cmd 0x28 is not queued (8)
> Jul 10 11:02:27 oim6102501 kernel:  session1: iscsi: session recovery
> timed out after 15 secs
> Jul 10 11:02:27 oim6102501 kernel: sd 5:0:0:0: SCSI error: return code =
> 0x00010000
> As soon as I do `killall ping`, within 1 minute the session will
> reconnect and dm-multipath will be happy again.
> So I'm wondering two things here:
> 1) I looked at the changelog between rpms.  I've included them below
> (actually Tom from Oracle did, but I'm just relaying this) and don't see
> any specific bug that talks about the "pdus with cmd sequences out of
> order."  I did a google search and found a bunch of changelog info here
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29 but
> couldn't find the specific pdus with cmd sequences.  Would you mind
> pointing me to a publicly available bug repo where I can dig further on
> this?  Or if you happen to know the bug number, I can do searches on
> that as well.

I do not have a Red Hat bugzilla number for it. Here is the upstream commit:


I do not think you are hitting this problem, though. If you were, you
would not see that iscsi ping timeout message.
