On 07/10/2009 01:06 PM, Hoot, Joseph wrote:
> Mike,
>
> I have some more details on this.  It seems that a simple `ping -Ieth2
> -i1 192.168.0.19`<-- our group IP to the EqualLogic is able to "reset
> sessions."
>
> eth0 = 1 active nic in the bond (public network)
> eth2 = iface eth2 (192.168.0.151/16)
> eth3 = iface eth3 (192.168.0.161/16)
>
> I slammed the public network for that system from 3 external systems at
> roughly 101MB/s<-- very nicely slammed for gigabit :) with netcat's to
> /dev/null.
>
> I had 8 netcat connections going through public network for about 25
> minutes without a single hiccup (as expected).
>
> For the iSCSI side I had previously done performance testing with dt as
> well as dd with bs=1M and was slamming the EqualLogic storage getting
> around 60MB/s reads on average with (2) systems each having OCFS2 shared
> storage and (2) iSCSI sessions each.  Writes were between 30MB/s and
> 155MB/s depending on which EqualLogic array was being hit (SATA vs
> SAS15k respectively).  This seemed to work well with a read and a write
> going on simultaneously for about 2 hours.
>
> As soon as I introduce pings:
> [r...@oim6102501 ~]# ping -Ieth2 192.168.0.19&  ping -I eth3 192.168.0.19
> [r...@oim6102504 ~]# ping -Ieth2 192.168.0.19&  ping -I eth3 192.168.0.19
>
> I receive the following sessions failing, according to the EqualLogic
> INFO  7/10/09  11:02:02 AM
>     SATA001  iSCSI session to target '192.168.0.30:3260,
> iqn.2001-05.com.equallogic:0-8a0906-82f16c402-fe30000b33e4a3bc-ovm-1-lun
> 0'
>     from initiator '192.168.0.161:45531,
> iqn.1994-05.com.redhat:c79dbacd466' was closed.
>     iSCSI initiator connection failure.   Reset received on the
> connection.
>
>
> Or according to /var/log/messages on my OVM Server:
>
> Jul 10 11:02:12 oim6102501 kernel: ping timeout of 10 secs expired, last
> rx 16848993, last ping 16851493, now 16852743


The target is getting the errors because the initiator's iscsi pings 
(nops) are not completing within those noop values I described in the 
last mail.
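
To make the numbers in that kernel message easier to read, here is a small sketch that pulls the three counters out of the line quoted above and converts the gaps to seconds. It assumes the counters are jiffies and that the kernel tick rate (HZ) is either 250 or 1000; I do not know which HZ that OVM kernel was built with, and the function name is just something made up for this example.

```python
import re

# Kernel message quoted above, joined onto one line.
LOG = ("ping timeout of 10 secs expired, last rx 16848993, "
       "last ping 16851493, now 16852743")

PATTERN = re.compile(
    r"ping timeout of (?P<timeout>\d+) secs expired, "
    r"last rx (?P<last_rx>\d+), last ping (?P<last_ping>\d+), "
    r"now (?P<now>\d+)"
)


def ping_timeout_deltas(line, hz):
    """Return the gaps since the last receive and the last ping, in seconds.

    hz is the assumed kernel tick rate used to convert jiffies to seconds.
    """
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a ping timeout message")
    now = int(m["now"])
    return {
        "timeout_secs": int(m["timeout"]),
        "since_last_rx_secs": (now - int(m["last_rx"])) / hz,
        "since_last_ping_secs": (now - int(m["last_ping"])) / hz,
    }


# HZ is an assumption; try the two common values.
for hz in (250, 1000):
    print(hz, ping_timeout_deltas(LOG, hz))
```

The raw gaps are 3750 jiffies since the last receive and 1250 jiffies since the last ping, so how many seconds that is depends entirely on which HZ is right.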

I have no idea why a network ping would cause the iscsi ping to fail.
Maybe it is causing something to go wrong in the network routing, but I
have never seen this before.

If you were slamming the network while running the iscsi traffic, then
the iscsi pings could take longer than noop_out_timeout seconds: the nop
can get stuck behind a long scsi/iscsi command, and the non-iscsi
network test slows down the iscsi traffic. However, just doing the ping
commands above should not cause a problem.

If you turn off nops completely by setting those two noop values to zero:
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

you should not see the ping timeout errors. But do you then see a "Host 
reset succeeded" message?
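
If the node record already exists, the two values can be changed in the record with iscsiadm's update operation rather than by editing iscsid.conf, and as far as I know the change is picked up at the next login of that session. A minimal sketch that only builds and prints the commands, it does not run them. The target name below is a placeholder (use the one from your own node records), the portal is the one from the EqualLogic log above, and the helper name is made up for this example.

```python
import shlex

# The two settings from this mail, both set to zero to turn nops off.
NOOP_SETTINGS = {
    "node.conn[0].timeo.noop_out_interval": 0,
    "node.conn[0].timeo.noop_out_timeout": 0,
}


def noop_update_commands(target, portal, settings=NOOP_SETTINGS):
    """Build one `iscsiadm -m node ... -o update` argv list per setting."""
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal,
         "-o", "update", "-n", name, "-v", str(value)]
        for name, value in settings.items()
    ]


# Placeholder target name; substitute the real IQN before using the output.
for argv in noop_update_commands("iqn.2001-05.com.example:target0",
                                 "192.168.0.30:3260"):
    print(shlex.join(argv))
```

Printing instead of executing keeps this safe to run on any box; paste the output into a root shell on the initiator once the target name is filled in.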


One other question. It looks like the iscsi code you are using is based
on 5.2. There was a bug in there where we would think a ping timed out
when it did not. I do not think you are hitting this, but if you could
make sure that you are running something based on Red Hat's 5.1, it
could rule that out.


> Jul 10 11:02:12 oim6102501 kernel:  connection1:0: iscsi: detected conn
> error (1011)
> Jul 10 11:02:12 oim6102501 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Jul 10 11:02:27 oim6102501 kernel: iscsi: cmd 0x28 is not queued (8)
> Jul 10 11:02:27 oim6102501 kernel:  session1: iscsi: session recovery
> timed out after 15 secs
> Jul 10 11:02:27 oim6102501 kernel: sd 5:0:0:0: SCSI error: return code =
> 0x00010000
>
> As soon as I do `killall ping`, within 1 minute the session will
> reconnect and dm-multipath will be happy again.
>
> So I'm wondering two things here:
>
> 1) I looked at the changelog between rpms.  I've included them below
> (actually Tom from Oracle did, but I'm just relaying this) and don't see
> any specific bug that talks about the "pdus with cmd sequences out of
> order."  I did a google search and found a bunch of changelog info here
> http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29 but
> couldn't find the specific pdus with cmd sequences.  Would you mind
> pointing me to a publicly available bug repo where I can dig further on
> this?  Or if you happen to know the bug number I can do searches on
> that as well.
>

I do not have a Red Hat bugzilla for it. Here is the upstream commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=77a23c21aaa723f6b0ffc4a701be8c8e5a32346d

I do not think you are hitting this problem though. If you were, you
would not see that iscsi ping timeout message.

