I'm running RHEL 5 (kernel 2.6.18) in a VM under VMware ESX 4.0.0, build
261974. We have several other VMs that don't have any problems, but they
are all running some flavor of Windows. This is our only Linux box using a
targeted iSCSI volume.

# yum info device-mapper-multipath
Loaded plugins: rhnplugin, security
Installed Packages
Name       : device-mapper-multipath
Arch       : x86_64
Version    : 0.4.7
Release    : 34.el5_5.6
Size       : 6.9 M
Repo       : installed

# yum info iscsi-initiator-utils
Loaded plugins: rhnplugin, security
Installed Packages
Name       : iscsi-initiator-utils
Arch       : x86_64
Version    : 6.2.0.871
Release    : 0.20.el5_5
Size       : 1.9 M
Repo       : installed

We are using multipath connections, and one of the paths frequently
gets ping timeouts (usually every 5 to 15 minutes) followed by a
connection error, which causes a problematic pause in I/O while
multipathing recovers.

The server is our campus web server and runs Apache, MySQL, and
Joomla.

I've been a sysadmin for other OSes for many years but am new to Linux
server administration and any help would be greatly appreciated.

Below are more details. If there is anything else I can provide that
would be helpful please let me know:


# dmesg
device-mapper: multipath: Failing path 8:32.
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355241960, last ping 4355246960, now 4355251960
 connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355898393, last ping 4355903393, now 4355908393
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4357043430, last ping 4357048430, now 4357053430
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4358334347, last ping 4358339347, now 4358344347
 connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4358911119, last ping 4358916119, now 4358921119
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4360295386, last ping 4360300386, now 4360305386
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4360717303, last ping 4360722303, now 4360727303
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4361937494, last ping 4361942494, now 4361947494
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4363713740, last ping 4363718740, now 4363723740
 connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4364203932, last ping 4364208932, now 4364213932
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365150602, last ping 4365155602, now 4365160602
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365512045, last ping 4365517045, now 4365522045
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365977352, last ping 4365982352, now 4365987352
 connection2:0: detected conn error (1011)
 connection2:0: detected conn error (1019)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4366877780, last ping 4366882780, now 4366887780
 connection2:0: detected conn error (1011)
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4367263655, last ping 4367268655, now 4367273655
 connection2:0: detected conn error (1011)
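
As a rough aid for reading the dmesg output above, the following Python sketch pulls the "last rx / last ping / now" counters out of the ping timeout messages and prints how far apart consecutive timeouts are. It is only an illustration, not part of the original setup. It assumes the counters advance 1000 times per second, which is inferred from the 5000-tick gap between "last rx" and "last ping" for a 5 second interval; the function names are made up for this example.

```python
import re

# Matches the ping-timeout message as it appears in the dmesg output above.
PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired, recv timeout (\d+), "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)

# ASSUMPTION: the kernel counters tick 1000 times per second (inferred from
# the log itself: 5000 ticks between "last rx" and "last ping" at 5 seconds).
TICKS_PER_SECOND = 1000


def parse_timeouts(text):
    """Return a list of (last_rx, last_ping, now) tuples found in text."""
    # Mail clients may wrap long log lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    return [tuple(int(g) for g in m.groups()[2:]) for m in PING_RE.finditer(flat)]


def gaps_in_minutes(events):
    """Minutes between consecutive timeouts, based on the 'now' counter."""
    nows = [e[2] for e in events]
    return [(b - a) / TICKS_PER_SECOND / 60 for a, b in zip(nows, nows[1:])]


# First three timeout messages copied from the dmesg output above.
sample = """
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last
rx 4355241960, last ping 4355246960, now 4355251960
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last
rx 4355898393, last ping 4355903393, now 4355908393
 connection2:0: ping timeout of 5 secs expired, recv timeout 5, last
rx 4357043430, last ping 4357048430, now 4357053430
"""

events = parse_timeouts(sample)
gaps = gaps_in_minutes(events)
for gap in gaps:
    print(f"{gap:.1f} minutes since previous timeout")
```

Feeding the full dmesg text to parse_timeouts() instead of the three-line sample would give the spacing for every timeout in the log.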

# tail /var/log/messages

Nov 12 10:40:32 techcms kernel:  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4368811299, last ping 4368816299, now 4368821299
Nov 12 10:40:32 techcms kernel:  connection2:0: detected conn error (1011)
Nov 12 10:40:33 techcms multipathd: sdc: readsector0 checker reports path is down
Nov 12 10:40:33 techcms multipathd: checker failed path 8:32 in map san-techcms
Nov 12 10:40:33 techcms multipathd: san-techcms: remaining active paths: 1
Nov 12 10:40:33 techcms multipathd: dm-2: add map (uevent)
Nov 12 10:40:33 techcms kernel: device-mapper: multipath: Failing path 8:32.
Nov 12 10:40:33 techcms multipathd: dm-2: devmap already registered
Nov 12 10:40:33 techcms iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 12 10:40:55 techcms multipathd: sdc: readsector0 checker reports path is up
Nov 12 10:40:55 techcms multipathd: 8:32: reinstated
Nov 12 10:40:55 techcms multipathd: san-techcms: remaining active paths: 2
Nov 12 10:40:55 techcms multipathd: dm-2: add map (uevent)
Nov 12 10:40:55 techcms multipathd: dm-2: devmap already registered
Nov 12 10:40:55 techcms iscsid: connection2:0 is operational after recovery (2 attempts)
Nov 12 10:43:25 techcms kernel:  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4368984192, last ping 4368989192, now 4368994192
Nov 12 10:43:25 techcms kernel:  connection2:0: detected conn error (1011)
Nov 12 10:43:26 techcms iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 12 10:43:29 techcms iscsid: connection2:0 is operational after recovery (1 attempts)
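
To put a number on how long a path stays failed, the sketch below pairs each multipathd "checker failed path" line with the next "reinstated" line for the same path and reports the seconds in between. It is a minimal illustration in Python and makes two assumptions: every log entry sits on a single line, and all entries fall in the same year (syslog timestamps carry no year, so an arbitrary one is supplied). The function names are hypothetical.

```python
import re
from datetime import datetime

# Patterns for the two multipathd messages seen in /var/log/messages above.
FAILED_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ multipathd: checker failed path (\S+)"
)
REINSTATED_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ multipathd: (\S+): reinstated"
)


def parse_stamp(stamp):
    # ASSUMPTION: all entries are from the same year; 2000 is an arbitrary
    # placeholder because syslog timestamps do not include the year.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def outages(lines):
    """Return (path, seconds_down) for each failed/reinstated pair."""
    down = {}
    result = []
    for line in lines:
        m = FAILED_RE.match(line)
        if m:
            down[m.group(2)] = parse_stamp(m.group(1))
            continue
        m = REINSTATED_RE.match(line)
        if m and m.group(2) in down:
            start = down.pop(m.group(2))
            seconds = (parse_stamp(m.group(1)) - start).total_seconds()
            result.append((m.group(2), seconds))
    return result


# Two lines copied from the log excerpt above.
sample_lines = [
    "Nov 12 10:40:33 techcms multipathd: checker failed path 8:32 in map san-techcms",
    "Nov 12 10:40:55 techcms multipathd: 8:32: reinstated",
]

for path, seconds in outages(sample_lines):
    print(f"path {path} was down for {seconds:.0f} seconds")
```

For the excerpt above this reports a single outage of 22 seconds on path 8:32.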

# iscsiadm -m session -P3

iSCSI Transport Class version 2.0-871
version 2.0-871
Target: iqn.2001-05.com.equallogic:0-8a0906-d97510304-d8300001c3a4bcf6-techcms
        Current Portal: 192.168.2.55:3260,1
        Persistent Portal: 192.168.2.22:3260,1
                **********
                Interface:
                **********
                Iface Name: eth1
                Iface Transport: tcp
                Iface Initiatorname: iqn.2010-04.edu.tntech:techcms.tntech.edu
                Iface IPaddress: 192.168.3.232
                Iface HWaddress: 00:50:56:81:4C:8E
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 1  State: running
                scsi1 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdb          State: running
        Current Portal: 192.168.2.48:3260,1
        Persistent Portal: 192.168.2.22:3260,1
                **********
                Interface:
                **********
                Iface Name: eth2
                Iface Transport: tcp
                Iface Initiatorname: iqn.2010-04.edu.tntech:techcms.tntech.edu
                Iface IPaddress: 192.168.3.233
                Iface HWaddress: 00:50:56:81:02:19
                Iface Netdev: <empty>
                SID: 2
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 2  State: running
                scsi2 Channel 00 Id 0 Lun: 0
                        Attached scsi disk sdc          State: running
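
The negotiated parameters above can be compared against what the initiator is configured to request. The Python sketch below does that for the three size settings; the "requested" numbers are copied from the iscsid.conf listing in this message, and the helper names are made up for illustration. A lower negotiated value is probably just normal negotiation with the target (an assumption, since the values offered by the target are not shown), so this is meant as a quick sanity check rather than a diagnosis.

```python
def parse_negotiated(text):
    """Collect numeric 'Name: value' pairs from iscsiadm -P3 style output."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and value.strip().isdigit():
            params[key] = int(value)
    return params


def differences(requested, negotiated):
    """Return {name: (requested, negotiated)} where the two values differ."""
    return {
        name: (want, negotiated[name])
        for name, want in requested.items()
        if name in negotiated and negotiated[name] != want
    }


# Negotiated section copied from the session output above.
negotiated_text = """
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 262144
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
"""

# Values taken from the iscsid.conf shown later in this message.
requested = {
    "FirstBurstLength": 262144,
    "MaxBurstLength": 16776192,
    "MaxRecvDataSegmentLength": 262144,
}

diffs = differences(requested, parse_negotiated(negotiated_text))
for name, (want, got) in diffs.items():
    print(f"{name}: requested {want}, negotiated {got}")
```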


Here's a brief dstat session that might give you an idea of typical
data rates, though of course they can vary quite a bit.

# dstat -cndmD total,hda,sda -N eth1,eth2
----total-cpu-usage---- --net/eth1- --net/eth2- -dsk/total- --dsk/hda-- --dsk/sda-- ------memory-usage-----
usr sys idl wai hiq siq| recv  send: recv  send| read  writ: read  writ: read  writ| used  buff  cach  free
 17   1  81   1   0   0|   0     0 :   0     0 | 180k  211k: 5.4B     0 :  32k   18k|2769M  242M 9135M 3665M
 24   1  74   0   0   0|9424B  495k:8284B  489k|   0   972k:   0     0 :   0     0 |2769M  242M 9135M 3664M
 23   3  74   0   0   1| 604B    0 : 604B    0 |   0     0 :   0     0 :   0     0 |2771M  242M 9135M 3662M
  2   0  98   0   0   0|   0     0 :   0     0 |   0     0 :   0     0 :   0     0 |2771M  242M 9135M 3662M
  1   0  99   0   0   0| 604B    0 : 604B    0 |   0   512k:   0     0 :   0   256k|2771M  242M 9135M 3662M
  5   0  94   0   0   0| 906B    0 : 906B    0 |   0     0 :   0     0 :   0     0 |2773M  242M 9135M 3660M
 15   1  84   0   0   0| 114B  180B: 806B  246B|   0     0 :   0     0 :   0     0 |2820M  242M 9135M 3613M
  0   0  99   0   0   0|8162B  359k:7130B  332k|   0   680k:   0     0 :   0     0 |2820M  242M 9135M 3613M
 11   0  88   0   0   0|   0     0 :   0     0 |   0     0 :   0     0 :   0     0 |2821M  242M 9135M 3612M
 24   2  75   0   0   0| 626B  180B:   0     0 |   0    56k:   0     0 :   0    28k|2821M  242M 9135M 3612M
 24   2  74   0   0   0| 906B    0 : 906B    0 |   0     0 :   0     0 :   0     0 |2822M  242M 9135M 3611M
 13   0  86   0   0   0| 664B    0 :  25k  312B|  24k    0 :   0     0 :   0     0 |2823M  242M 9135M 3610M
 24   1  74   0   0   1|  22k  820k:  26k  816k|   0  1596k:   0     0 :   0     0 |2823M  242M 9135M 3610M
 20   3  76   0   0   1| 302B    0 : 302B    0 |   0     0 :   0     0 :   0     0 |2871M  242M 9135M 3562M
 10   1  90   0   0   0|   0     0 :   0     0 |   0    40k:   0     0 :   0    20k|2871M  242M 9135M 3562M
 19   2  79   0   0   0| 604B    0 : 604B    0 |   0     0 :   0     0 :   0     0 |2872M  242M 9135M 3561M
 12   1  87   0   0   0|   0     0 :   0     0 |   0     0 :   0     0 :   0     0 |2872M  242M 9135M 3561M
  1   0  98   0   0   0|6324B  405k:8370B  415k|   0   808k:   0     0 :   0     0 |2872M  242M 9135M 3561M
 19   1  79   0   0   0|  92B    0 :  92B    0 |   0     0 :   0     0 :   0     0 |2874M  242M 9135M 3559M
 10   1  90   0   0   0| 788B    0 : 788B    0 |   0     0 :   0     0 :   0     0 |2874M  242M 9135M 3558M
  1   0  99   0   0   0|   0     0 :   0     0 |   0    40k:   0     0 :   0    20k|2874M  242M 9135M 3558M
  0   0 100   0   0   0| 604B    0 : 604B    0 |   0     0 :   0     0 :   0     0 |2874M  242M 9135M 3558M
  0   0  99   0   0   0|5244B  227k:4308B  227k|   0  1024k:   0     0 :   0   288k|2874M  242M 9135M 3559M

/etc/iscsi/iscsid.conf -- I modified the FastAbort setting at the end
in an effort to resolve these problems but it has not helped.

#
# Open-iSCSI default configuration.
# Could be located at /etc/iscsi/iscsid.conf or ~/.iscsid.conf
#
# Note: To set any of these values for a specific node/session run
# the iscsiadm --mode node --op command for the value. See the README
# and man page for iscsiadm for details on the --op command.
#

################
# iSNS settings
################
# Address of iSNS server
#isns.address = 192.168.0.1
#isns.port = 3205

#############################
# NIC/HBA and driver settings
#############################
# open-iscsi can create a session and bind it to a NIC/HBA.
# To set this up see the example iface config file.

#*****************
# Startup settings
#*****************

# To request that the iscsi initd scripts startup a session set to "automatic".
# node.startup = automatic
#
# To manually startup the session set to "manual". The default is automatic.
node.startup = automatic

# *************
# CHAP Settings
# *************

# To enable CHAP authentication set node.session.auth.authmethod
# to CHAP. The default is None.
#node.session.auth.authmethod = CHAP

# To set a CHAP username and password for initiator
# authentication by the target(s), uncomment the following lines:
#node.session.auth.username = username
#node.session.auth.password = password

# To set a CHAP username and password for target(s)
# authentication by the initiator, uncomment the following lines:
#node.session.auth.username_in = username_in
#node.session.auth.password_in = password_in

# To enable CHAP authentication for a discovery session to the target
# set discovery.sendtargets.auth.authmethod to CHAP. The default is None.
#discovery.sendtargets.auth.authmethod = CHAP

# To set a discovery session CHAP username and password for the initiator
# authentication by the target(s), uncomment the following lines:
#discovery.sendtargets.auth.username = username
#discovery.sendtargets.auth.password = password

# To set a discovery session CHAP username and password for target(s)
# authentication by the initiator, uncomment the following lines:
#discovery.sendtargets.auth.username_in = username_in
#discovery.sendtargets.auth.password_in = password_in

# ********
# Timeouts
# ********
#
# See the iSCSI README's Advanced Configuration section for tips
# on setting timeouts when using multipath or doing root over iSCSI.
#
# To specify the length of time to wait for session re-establishment
# before failing SCSI commands back to the application when running
# the Linux SCSI Layer error handler, edit the line.
# The value is in seconds and the default is 120 seconds.
node.session.timeo.replacement_timeout = 120

# To specify the time to wait for login to complete, edit the line.
# The value is in seconds and the default is 15 seconds.
node.conn[0].timeo.login_timeout = 15

# To specify the time to wait for logout to complete, edit the line.
# The value is in seconds and the default is 15 seconds.
node.conn[0].timeo.logout_timeout = 15

# Time interval to wait for on connection before sending a ping.
node.conn[0].timeo.noop_out_interval = 5

# To specify the time to wait for a Nop-out response before failing
# the connection, edit this line. Failing the connection will
# cause IO to be failed back to the SCSI layer. If using dm-multipath
# this will cause the IO to be failed to the multipath layer.
node.conn[0].timeo.noop_out_timeout = 5

# To specify the time to wait for abort response before
# failing the operation and trying a logical unit reset edit the line.
# The value is in seconds and the default is 15 seconds.
node.session.err_timeo.abort_timeout = 15

# To specify the time to wait for a logical unit response
# before failing the operation and trying session re-establishment
# edit the line.
# The value is in seconds and the default is 30 seconds.
node.session.err_timeo.lu_reset_timeout = 20

#******
# Retry
#******

# To specify the number of times iscsid should retry a login
# if the login attempt fails due to the node.conn[0].timeo.login_timeout
# expiring modify the following line. Note that if the login fails
# quickly (before node.conn[0].timeo.login_timeout fires) because the network
# layer or the target returns an error, iscsid may retry the login more than
# node.session.initial_login_retry_max times.
#
# This retry count along with node.conn[0].timeo.login_timeout
# determines the maximum amount of time iscsid will try to
# establish the initial login. node.session.initial_login_retry_max is
# multiplied by the node.conn[0].timeo.login_timeout to determine the
# maximum amount.
#
# The default node.session.initial_login_retry_max is 8 and
# node.conn[0].timeo.login_timeout is 15 so we have:
#
# node.conn[0].timeo.login_timeout * node.session.initial_login_retry_max =
#                                                               120 seconds
#
# Valid values are any integer value. This only
# affects the initial login. Setting it to a high value can slow
# down the iscsi service startup. Setting it to a low value can
# cause a session to not get logged into, if there are disruptions
# during startup or if the network is not ready at that time.
node.session.initial_login_retry_max = 8

################################
# session and device queue depth
################################

# To control how many commands the session will queue set
# node.session.cmds_max to an integer between 2 and 2048 that is also
# a power of 2. The default is 128.
node.session.cmds_max = 128

# To control the device's queue depth set node.session.queue_depth
# to a value between 1 and 1024. The default is 32.
node.session.queue_depth = 32

#***************
# iSCSI settings
#***************

# To enable R2T flow control (i.e., the initiator must wait for an R2T
# command before sending any data), uncomment the following line:
#
#node.session.iscsi.InitialR2T = Yes
#
# To disable R2T flow control (i.e., the initiator has an implied
# initial R2T of "FirstBurstLength" at offset 0), uncomment the following line:
#
# The default is No.
node.session.iscsi.InitialR2T = No

#
# To disable immediate data (i.e., the initiator does not send
# unsolicited data with the iSCSI command PDU), uncomment the following line:
#
#node.session.iscsi.ImmediateData = No
#
# To enable immediate data (i.e., the initiator sends unsolicited data
# with the iSCSI command packet), uncomment the following line:
#
# The default is Yes
node.session.iscsi.ImmediateData = Yes

# To specify the maximum number of unsolicited data bytes the initiator
# can send in an iSCSI PDU to a target, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 262144
node.session.iscsi.FirstBurstLength = 262144

# To specify the maximum SCSI payload that the initiator will negotiate
# with the target for, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 16776192
node.session.iscsi.MaxBurstLength = 16776192

# To specify the maximum number of data bytes the initiator can receive
# in an iSCSI PDU from a target, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 262144
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144

# To specify the maximum number of data bytes the initiator can receive
# in an iSCSI PDU from a target during a discovery session, edit the
# following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 32768
#
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768

# To allow the targets to control the setting of the digest checking,
# with the initiator requesting a preference of enabling the checking, uncomment
# the following lines (Data digests are not supported and on ppc/ppc64
# both header and data digests are not supported.):
#node.conn[0].iscsi.HeaderDigest = CRC32C,None
#
# To allow the targets to control the setting of the digest checking,
# with the initiator requesting a preference of disabling the checking,
# uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = None,CRC32C
#
# To enable CRC32C digest checking for the header and/or data part of
# iSCSI PDUs, uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = CRC32C
#
# To disable digest checking for the header and/or data part of
# iSCSI PDUs, uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = None
#
# The default is to never use DataDigests or HeaderDigests.
#
node.conn[0].iscsi.HeaderDigest = None

#************
# Workarounds
#************

# Some targets like IET prefer after an initiator has sent a task
# management function like an ABORT TASK or LOGICAL UNIT RESET, that
# it does not respond to PDUs like R2Ts. To enable this behavior uncomment
# the following line (The default behavior is Yes):
# node.session.iscsi.FastAbort = Yes

# Some targets like EqualLogic prefer that after an initiator has sent
# a task management function like an ABORT TASK or LOGICAL UNIT RESET, that
# it continue to respond to R2Ts. To enable this uncomment this line
node.session.iscsi.FastAbort = No
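
The timeout settings in this file combine into two windows that are worth having as plain numbers. The Python sketch below reads "key = value" lines and derives them: the ping window (noop_out_interval plus noop_out_timeout, which matches the 10000-tick gap between "last rx" and "now" in the kernel messages) and the initial login window described in the Retry comments (login_timeout times initial_login_retry_max). It is only an illustration; the parser is a simplification written for this example, not the one iscsid uses.

```python
def parse_iscsid_conf(text):
    """Parse 'key = value' lines, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Active timeout settings copied from the iscsid.conf above.
conf_text = """
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.initial_login_retry_max = 8
"""

settings = parse_iscsid_conf(conf_text)

# Seconds of silence before the connection is failed: one idle interval
# before the ping is sent, plus the wait for its reply.
noop_window = (
    int(settings["node.conn[0].timeo.noop_out_interval"])
    + int(settings["node.conn[0].timeo.noop_out_timeout"])
)

# Maximum time spent on the initial login, per the Retry comments above.
login_window = (
    int(settings["node.conn[0].timeo.login_timeout"])
    * int(settings["node.session.initial_login_retry_max"])
)

print(f"connection failed after about {noop_window} s without a reply")
print(f"initial login retried for up to {login_window} s")
```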


--------------------

/etc/multipath.conf

# This is a basic configuration file with some examples, for device mapper
# multipath.
# For a complete list of the default configuration values, see
# /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults
# For a list of configuration options with descriptions, see
# /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.annotated


# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
#blacklist {
#        devnode "*"
#}

## By default, devices with vendor = "IBM" and product = "S/390.*" are
## blacklisted. To enable multipathing on these devices, uncomment the
## following lines.
#blacklist_exceptions {
#       device {
#               vendor  "IBM"
#               product "S/390.*"
#       }
#}

## Use user friendly names, instead of using WWIDs as names.
defaults {
        user_friendly_names yes
}
##
## Here is an example of how to configure some standard options.
##
#
#defaults {
#       udev_dir                /dev
#       polling_interval        10
#       selector                "round-robin 0"
#       path_grouping_policy    multibus
#       getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#       prio_callout            /bin/true
#       path_checker            readsector0
#       rr_min_io               100
#       max_fds                 8192
#       rr_weight               priorities
#       failback                immediate
#       no_path_retry           fail
#       user_friendly_names     yes
#}
##
## The wwid line in the following blacklist section is shown as an example
## of how to blacklist devices by wwid.  The 2 devnode lines are the
## compiled in default blacklist. If you want to blacklist entire types
## of devices, such as all scsi devices, you should use a devnode line.
## However, if you want to blacklist specific devices, you should use
## a wwid line.  Since there is no guarantee that a specific device will
## not change names on reboot (from /dev/sda to /dev/sdb for example)
## devnode lines are not recommended for blacklisting specific devices.
##
#blacklist {
#       wwid 26353900f02796769
#       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
#       devnode "^hd[a-z]"
        devnode sda
#}
blacklist {
        devnode "^sd[a]$"
}

multipaths {
        multipath {
                wwid                    36090a048301075d9f6bca4c3010030d8
                alias                   san-techcms
                path_grouping_policy    multibus
                path_checker            readsector0
                path_selector           "round-robin 0"
                failback                manual
                rr_weight               priorities
                no_path_retry           5
                rr_min_io               10
        }
#       multipath {
#               wwid                    1DEC_____321816758474
#               alias                   red
#       }
}
#devices {
#       device {
#               vendor                  "COMPAQ  "
#               product                 "HSV110 (C)COMPAQ"
#               path_grouping_policy    multibus
#               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#               path_checker            readsector0
#               path_selector           "round-robin 0"
#               hardware_handler        "0"
#               failback                15
#               rr_weight               priorities
#               no_path_retry           queue
#       }
#       device {
#               vendor                  "COMPAQ  "
#               product                 "MSA1000         "
#               path_grouping_policy    multibus
#       }
#}


Michael W. Wheeler, OpenVMS, Windows, Macintosh
Systems Support, Tennessee Technological University

