On 11/30/2010 11:05 AM, hootjr29 wrote:
Hi all,

I am getting iscsid ping timeouts on some (not all) of my
connections, and it appears to happen when the EqualLogic system is
busier.

Example:
=======
Nov 29 01:03:47 oim6102505 kernel:  connection90:0: ping timeout of 10 secs expired, recv timeout 5, last rx 198077764, last ping 198079014, now 198081514
Nov 29 01:03:47 oim6102505 kernel:  connection90:0: detected conn error (1011)
Nov 29 01:03:47 oim6102505 multipathd: sdam: readsector0 checker reports path is down
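
To read the counters in the ping timeout message, here is a minimal Python sketch that parses the first log line (with the mail-wrapped pieces rejoined) and converts the tick counters to seconds. The helper names `parse_ping_timeout` and `summarize` are hypothetical, and the tick rate is not stated anywhere in the log: it is inferred on the assumption that the timer fired exactly "ping timeout" seconds after the last ping.

```python
import re

# Kernel log line from the report above, with the mail-wrapped pieces rejoined.
LOG = ("Nov 29 01:03:47 oim6102505 kernel:  connection90:0: ping timeout of 10 "
       "secs expired, recv timeout 5, last rx 198077764, last ping 198079014, "
       "now 198081514")

PATTERN = re.compile(
    r"ping timeout of (?P<ping_tmo>\d+) secs expired, "
    r"recv timeout (?P<recv_tmo>\d+), "
    r"last rx (?P<last_rx>\d+), "
    r"last ping (?P<last_ping>\d+), "
    r"now (?P<now>\d+)"
)


def parse_ping_timeout(line):
    """Extract the numeric fields of a 'ping timeout' message as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a ping timeout message")
    return {name: int(value) for name, value in match.groupdict().items()}


def summarize(fields):
    """Convert the tick counters to seconds.

    Assumption: the timer fired exactly ping_tmo seconds after the last
    ping, so (now - last_ping) / ping_tmo gives the ticks per second.
    """
    ticks_per_sec = (fields["now"] - fields["last_ping"]) / fields["ping_tmo"]
    return {
        "ticks_per_sec": ticks_per_sec,
        "idle_before_ping_secs":
            (fields["last_ping"] - fields["last_rx"]) / ticks_per_sec,
        "secs_since_last_rx":
            (fields["now"] - fields["last_rx"]) / ticks_per_sec,
    }


fields = parse_ping_timeout(LOG)
summary = summarize(fields)
print(summary)
```

Under that assumption the numbers are self-consistent: the connection was idle for the 5 second recv timeout before the ping went out, and nothing was received for the following 10 seconds.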

GIVENS:
=======
[r...@servernamehere ~]# iscsiadm -m host -P 1
Host Number: 10
         State: running
         Transport: tcp
         Initiatorname:<empty>
         IPaddress: 192.168.9.9
         HWaddress: 00:10:18:3B:e5:23
         Netdev:<empty>
[r...@servernamehere ~]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.871-0.7.el5
[r...@servernamehere ~]# uname -a
Linux servernamehere 2.6.18-128.2.1.4.27.el5xen #1 SMP Sat Jul 24 02:16:40 EDT 2010 i686 i686 i386 GNU/Linux


I've run into issues in the past where this was related to the nop-out
code.  Mike Christie provided patches that appear to have
resolved it in the open-iscsi 871 code.  I worked with Oracle support
(this is an Oracle VM 2.2.1 environment), and they were able to update
their yum repos to reflect this open-iscsi update.



Could you send me the libiscsi.c and iscsi_tcp.c files in the kernel you are using or could you point me to the kernel source?



Now (a year or so later), I'm starting to see more connection timeout
messages.  After digging into this, it looks like we may be hitting
an EqualLogic problem where it sends pings in a different way than
is expected by the nop-out standard/code?

I found this thread which may be related:

   
http://groups.google.com/group/open-iscsi/browse_thread/thread/a220595ec4f5f1d2/e90fc5d983a6186c?lnk=gst&q=bnx2i#e90fc5d983a6186c


I think those issues were related to, and the fault of, the bnx2i offload driver. There were several bugs in that code related to nops/pings. They should not affect you.


QUESTIONS:
===========
1) I guess what I'm wondering (and I've asked Oracle support to dig
further into this as well, btw) is whether anyone knows if bnx2 is
affected by the same type of bugs as bnx2i with regard to the nop-out code?

No. If you are using bnx2 + iscsi_tcp then bnx2i does not come into play.


2) If I disable nop-outs, this will likely remove these errors.  But
will it negatively affect my connections?  Even if the EQLX is 100%
busy doing stuff, will the scsi and dm-multipath code just handle that
outside of the iSCSI code?  In other words, I guess I don't know what
question I'm really asking here, but I'm just nervous about disabling
nop-outs :/

If you disable initiator nops and there is a valid problem, then it will take longer to fail a path in cases where the network layer does not give us an error and we would otherwise have detected the problem from the nop timing out.

The scsi layer and dm-multipath will eventually figure things out. It will just take longer. The scsi layer's per command timeout (/sys/block/sdX/device/timeout) will eventually expire. This will start the scsi error handler which tries to send aborts and resets. If the path is really bad those will fail since we cannot reach the target. The iscsi layer will then try to relogin for node.session.timeo.replacement_timeout seconds. When that fails, the iscsi layer will tell the scsi layer that we have failed and the scsi layer will then notify the multipath layer which will retry the IO on another path.
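
As a rough illustration of why this takes longer, the sequence above can be added up and compared with the nop-based detection path. This is only a sketch: the function and field names are hypothetical, the timer values are illustrative placeholders rather than values taken from your system, and the time spent in the scsi error handler (aborts and resets) is not known here, so it is left as an explicit allowance.

```python
from dataclasses import dataclass


@dataclass
class PathTimers:
    scsi_cmd_timeout: int     # /sys/block/sdX/device/timeout, in seconds
    replacement_timeout: int  # node.session.timeo.replacement_timeout, in seconds
    eh_allowance: int = 0     # hypothetical allowance for abort/reset attempts


def failover_without_nops(timers):
    """Command timeout, then error handling, then the relogin window."""
    return (timers.scsi_cmd_timeout
            + timers.eh_allowance
            + timers.replacement_timeout)


def failover_with_nops(noop_interval, noop_timeout, replacement_timeout):
    """Idle interval before the ping, ping timeout, then the relogin window."""
    return noop_interval + noop_timeout + replacement_timeout


# Illustrative values only; read the real ones from sysfs and the node records.
timers = PathTimers(scsi_cmd_timeout=60, replacement_timeout=120)
print("without nops:", failover_without_nops(timers), "secs (plus error handling)")
print("with nops:   ", failover_with_nops(5, 10, 120), "secs")
```

The 5 and 10 passed to `failover_with_nops` match the "recv timeout 5" and "ping timeout of 10 secs" in the log above. In both cases the replacement_timeout dominates, so that is the value to look at if failover takes too long.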



Any help/advice is appreciated :)



Another EQL customer contacted EQL/Dell support, and they had them try these settings:


1) sysctl.conf

net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
net.ipv4.netfilter.ip_conntrack_tcp_be_liberal=1


2) iscsid.conf

node.session.cmds_max = 1024
node.session.queue_depth = 128
