I'm running RHEL5 (kernel 2.6.18) in a VM under VMware ESX 4.0.0,
build 261974. We have several other VMs that have no problems, but
they all run some flavor of Windows. This is our only Linux box, and
it uses a targeted iSCSI volume.
# yum info device-mapper-multipath
Loaded plugins: rhnplugin, security
Installed Packages
Name : device-mapper-multipath
Arch : x86_64
Version : 0.4.7
Release : 34.el5_5.6
Size : 6.9 M
Repo : installed
# yum info iscsi-initiator-utils
Loaded plugins: rhnplugin, security
Installed Packages
Name : iscsi-initiator-utils
Arch : x86_64
Version : 6.2.0.871
Release : 0.20.el5_5
Size : 1.9 M
Repo : installed
We are using multipath connections, and one of the paths frequently
gets ping timeouts (usually every 5 to 15 minutes), followed by a
connection error that causes a problematic pause in I/O while
multipathing recovers.
The server is our campus web server and runs Apache, MySQL, and
Joomla.
I've been a sysadmin for other OSes for many years but am new to Linux
server administration, so any help would be greatly appreciated.
Below are more details. If there is anything else I can provide that
would be helpful, please let me know:
# dmesg
device-mapper: multipath: Failing path 8:32.
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355241960, last ping 4355246960, now 4355251960
connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355898393, last ping 4355903393, now 4355908393
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4357043430, last ping 4357048430, now 4357053430
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4358334347, last ping 4358339347, now 4358344347
connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4358911119, last ping 4358916119, now 4358921119
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4360295386, last ping 4360300386, now 4360305386
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4360717303, last ping 4360722303, now 4360727303
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4361937494, last ping 4361942494, now 4361947494
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4363713740, last ping 4363718740, now 4363723740
connection2:0: detected conn error (1011)
device-mapper: multipath: Failing path 8:32.
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4364203932, last ping 4364208932, now 4364213932
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365150602, last ping 4365155602, now 4365160602
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365512045, last ping 4365517045, now 4365522045
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4365977352, last ping 4365982352, now 4365987352
connection2:0: detected conn error (1011)
connection2:0: detected conn error (1019)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4366877780, last ping 4366882780, now 4366887780
connection2:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4367263655, last ping 4367268655, now 4367273655
connection2:0: detected conn error (1011)
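To put numbers on "every 5 to 15 minutes", the jiffy counters in these messages can be compared directly. The Python sketch below assumes HZ=1000, which is inferred from the 5-second ping and recv timeouts each spanning 5000 ticks. It extracts the counters and reports how long each failure took to detect and how far apart successive failures were. The sample string holds only the first three records; the regular expression uses \s+ so it also tolerates mail-wrapped records.

```python
import re

# First three "ping timeout" records from the dmesg output above.
DMESG = """\
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355241960, last ping 4355246960, now 4355251960
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4355898393, last ping 4355903393, now 4355908393
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4357043430, last ping 4357048430, now 4357053430
"""

# Assumption: HZ=1000, inferred from 5-second timeouts spanning 5000 ticks.
HZ = 1000

PATTERN = re.compile(r"last\s+rx\s+(\d+),\s+last\s+ping\s+(\d+),\s+now\s+(\d+)")


def parse_timeouts(text):
    """Return (last_rx, last_ping, now) jiffy triples for each ping timeout."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(text)]


def summarize(events, hz=HZ):
    """Seconds from last rx to failure, and seconds between successive failures."""
    detect = [(now - rx) / hz for rx, _ping, now in events]
    gaps = [(b[2] - a[2]) / hz for a, b in zip(events, events[1:])]
    return detect, gaps


if __name__ == "__main__":
    detect, gaps = summarize(parse_timeouts(DMESG))
    print("detection latency (s):", detect)
    print("gap between failures (min):", [round(g / 60, 1) for g in gaps])
```

Each failure is detected 10 s after the last received data, which appears to be the 5 s recv timeout plus the 5 s ping timeout named in the message itself.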
# tail /var/log/messages
Nov 12 10:40:32 techcms kernel: connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4368811299, last ping 4368816299, now 4368821299
Nov 12 10:40:32 techcms kernel: connection2:0: detected conn error (1011)
Nov 12 10:40:33 techcms multipathd: sdc: readsector0 checker reports path is down
Nov 12 10:40:33 techcms multipathd: checker failed path 8:32 in map san-techcms
Nov 12 10:40:33 techcms multipathd: san-techcms: remaining active paths: 1
Nov 12 10:40:33 techcms multipathd: dm-2: add map (uevent)
Nov 12 10:40:33 techcms kernel: device-mapper: multipath: Failing path 8:32.
Nov 12 10:40:33 techcms multipathd: dm-2: devmap already registered
Nov 12 10:40:33 techcms iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 12 10:40:55 techcms multipathd: sdc: readsector0 checker reports path is up
Nov 12 10:40:55 techcms multipathd: 8:32: reinstated
Nov 12 10:40:55 techcms multipathd: san-techcms: remaining active paths: 2
Nov 12 10:40:55 techcms multipathd: dm-2: add map (uevent)
Nov 12 10:40:55 techcms multipathd: dm-2: devmap already registered
Nov 12 10:40:55 techcms iscsid: connection2:0 is operational after recovery (2 attempts)
Nov 12 10:43:25 techcms kernel: connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4368984192, last ping 4368989192, now 4368994192
Nov 12 10:43:25 techcms kernel: connection2:0: detected conn error (1011)
Nov 12 10:43:26 techcms iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 12 10:43:29 techcms iscsid: connection2:0 is operational after recovery (1 attempts)
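The syslog excerpt also shows how long a path stayed out of service: sdc (8:32) was failed by the checker at 10:40:33 and reinstated at 10:40:55. A small Python sketch that pairs "checker failed path" and "reinstated" messages to measure such windows follows. The year is an assumption, because classic syslog timestamps omit it; it only matters for the date arithmetic.

```python
from datetime import datetime

# Two multipathd records from the /var/log/messages excerpt above.
LOG = """\
Nov 12 10:40:33 techcms multipathd: checker failed path 8:32 in map san-techcms
Nov 12 10:40:55 techcms multipathd: 8:32: reinstated
"""

STAMP_LEN = len("Nov 12 10:40:33")  # width of a classic syslog timestamp


def parse_stamp(line, year):
    # Assumption: syslog omits the year, so the caller supplies one.
    return datetime.strptime(f"{year} {line[:STAMP_LEN]}", "%Y %b %d %H:%M:%S")


def path_outages(text, year=2010):
    """Return (device, seconds_down) for each failed path that was reinstated."""
    failed_at = {}
    outages = []
    for line in text.splitlines():
        if not line.strip():
            continue
        when = parse_stamp(line, year)
        msg = line[STAMP_LEN:]
        if "checker failed path" in msg:
            dev = msg.split("checker failed path", 1)[1].split()[0]
            failed_at[dev] = when
        elif msg.rstrip().endswith(": reinstated"):
            dev = msg.split()[-2].rstrip(":")
            if dev in failed_at:
                down = when - failed_at.pop(dev)
                outages.append((dev, down.total_seconds()))
    return outages


if __name__ == "__main__":
    for dev, secs in path_outages(LOG):
        print(f"path {dev} was down for {secs:.0f} s")
```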
# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-871
version 2.0-871
Target: iqn.2001-05.com.equallogic:0-8a0906-d97510304-d8300001c3a4bcf6-techcms
Current Portal: 192.168.2.55:3260,1
Persistent Portal: 192.168.2.22:3260,1
**********
Interface:
**********
Iface Name: eth1
Iface Transport: tcp
Iface Initiatorname: iqn.2010-04.edu.tntech:techcms.tntech.edu
Iface IPaddress: 192.168.3.232
Iface HWaddress: 00:50:56:81:4C:8E
Iface Netdev: <empty>
SID: 1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: No
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 1 State: running
scsi1 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State: running
Current Portal: 192.168.2.48:3260,1
Persistent Portal: 192.168.2.22:3260,1
**********
Interface:
**********
Iface Name: eth2
Iface Transport: tcp
Iface Initiatorname: iqn.2010-04.edu.tntech:techcms.tntech.edu
Iface IPaddress: 192.168.3.233
Iface HWaddress: 00:50:56:81:02:19
Iface Netdev: <empty>
SID: 2
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 262144
MaxXmitDataSegmentLength: 65536
FirstBurstLength: 65536
MaxBurstLength: 262144
ImmediateData: Yes
InitialR2T: No
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 2 State: running
scsi2 Channel 00 Id 0 Lun: 0
Attached scsi disk sdc State: running
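The negotiated values above differ from those configured in the iscsid.conf quoted later in this message (for example, FirstBurstLength is configured as 262144 but negotiated as 65536). That is expected: RFC 3720 settles FirstBurstLength and MaxBurstLength as the minimum of the two offers, InitialR2T as an OR, and ImmediateData as an AND, while MaxRecvDataSegmentLength is declared by each side. The Python sketch below applies those rules. The target's offers are hypothetical values chosen to reproduce the output above, since the array's actual offers are not shown.

```python
# Initiator offers, taken from the iscsid.conf quoted later in this message.
INITIATOR = {
    "FirstBurstLength": 262144,
    "MaxBurstLength": 16776192,
    "MaxRecvDataSegmentLength": 262144,
    "InitialR2T": "No",
    "ImmediateData": "Yes",
}

# Hypothetical target offers, chosen only to reproduce the session output above.
TARGET = {
    "FirstBurstLength": 65536,
    "MaxBurstLength": 262144,
    "MaxRecvDataSegmentLength": 65536,
    "InitialR2T": "No",
    "ImmediateData": "Yes",
}


def negotiate(ini, tgt):
    """Apply RFC 3720 result functions for a handful of login keys."""
    return {
        # Burst lengths: the lower of the two offers wins.
        "FirstBurstLength": min(ini["FirstBurstLength"], tgt["FirstBurstLength"]),
        "MaxBurstLength": min(ini["MaxBurstLength"], tgt["MaxBurstLength"]),
        # InitialR2T is an OR: R2T is required if either side asks for it.
        "InitialR2T": "Yes" if "Yes" in (ini["InitialR2T"], tgt["InitialR2T"]) else "No",
        # ImmediateData is an AND: both sides must allow it.
        "ImmediateData": "Yes" if ini["ImmediateData"] == tgt["ImmediateData"] == "Yes" else "No",
        # Declarative: each side announces what it can receive, so the
        # initiator's send limit is the target's declared value.
        "MaxRecvDataSegmentLength": ini["MaxRecvDataSegmentLength"],
        "MaxXmitDataSegmentLength": tgt["MaxRecvDataSegmentLength"],
    }


if __name__ == "__main__":
    for key, value in negotiate(INITIATOR, TARGET).items():
        print(f"{key}: {value}")
```

Under these rules, raising FirstBurstLength or MaxBurstLength in iscsid.conf cannot push the session past what the target offers.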
Here's a brief dstat session that might give you an idea of typical
data rates, although they can vary quite a bit.
# dstat -cndmD total,hda,sda -N eth1,eth2
----total-cpu-usage---- --net/eth1- --net/eth2- -dsk/total- --dsk/hda-- --dsk/sda-- ------memory-usage-----
usr sys idl wai hiq siq| recv  send: recv  send| read  writ: read  writ: read  writ| used  buff  cach  free
 17   1  81   1   0   0|    0     0:    0     0| 180k  211k: 5.4B     0:  32k   18k|2769M  242M 9135M 3665M
 24   1  74   0   0   0|9424B  495k:8284B  489k|    0  972k:    0     0:    0     0|2769M  242M 9135M 3664M
 23   3  74   0   0   1| 604B     0: 604B     0|    0     0:    0     0:    0     0|2771M  242M 9135M 3662M
  2   0  98   0   0   0|    0     0:    0     0|    0     0:    0     0:    0     0|2771M  242M 9135M 3662M
  1   0  99   0   0   0| 604B     0: 604B     0|    0  512k:    0     0:    0  256k|2771M  242M 9135M 3662M
  5   0  94   0   0   0| 906B     0: 906B     0|    0     0:    0     0:    0     0|2773M  242M 9135M 3660M
 15   1  84   0   0   0| 114B  180B: 806B  246B|    0     0:    0     0:    0     0|2820M  242M 9135M 3613M
  0   0  99   0   0   0|8162B  359k:7130B  332k|    0  680k:    0     0:    0     0|2820M  242M 9135M 3613M
 11   0  88   0   0   0|    0     0:    0     0|    0     0:    0     0:    0     0|2821M  242M 9135M 3612M
 24   2  75   0   0   0| 626B  180B:    0     0|    0   56k:    0     0:    0   28k|2821M  242M 9135M 3612M
 24   2  74   0   0   0| 906B     0: 906B     0|    0     0:    0     0:    0     0|2822M  242M 9135M 3611M
 13   0  86   0   0   0| 664B     0:  25k  312B|  24k     0:    0     0:    0     0|2823M  242M 9135M 3610M
 24   1  74   0   0   1|  22k  820k:  26k  816k|    0 1596k:    0     0:    0     0|2823M  242M 9135M 3610M
 20   3  76   0   0   1| 302B     0: 302B     0|    0     0:    0     0:    0     0|2871M  242M 9135M 3562M
 10   1  90   0   0   0|    0     0:    0     0|    0   40k:    0     0:    0   20k|2871M  242M 9135M 3562M
 19   2  79   0   0   0| 604B     0: 604B     0|    0     0:    0     0:    0     0|2872M  242M 9135M 3561M
 12   1  87   0   0   0|    0     0:    0     0|    0     0:    0     0:    0     0|2872M  242M 9135M 3561M
  1   0  98   0   0   0|6324B  405k:8370B  415k|    0  808k:    0     0:    0     0|2872M  242M 9135M 3561M
 19   1  79   0   0   0|  92B     0:  92B     0|    0     0:    0     0:    0     0|2874M  242M 9135M 3559M
 10   1  90   0   0   0| 788B     0: 788B     0|    0     0:    0     0:    0     0|2874M  242M 9135M 3558M
  1   0  99   0   0   0|    0     0:    0     0|    0   40k:    0     0:    0   20k|2874M  242M 9135M 3558M
  0   0 100   0   0   0| 604B     0: 604B     0|    0     0:    0     0:    0     0|2874M  242M 9135M 3558M
  0   0  99   0   0   0|5244B  227k:4308B  227k|    0 1024k:    0     0:    0  288k|2874M  242M 9135M 3559M
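In these samples the two iSCSI NICs carry almost the same amount of outgoing traffic, which is what the multibus, round-robin policy configured in the multipath.conf quoted later in this message should produce. The Python sketch below totals the eth1 and eth2 send columns. It assumes dstat's k and M suffixes are powers of 1024, and only the samples with a non-zero send value are copied in.

```python
# Assumption: dstat's k/M/G suffixes are powers of 1024.
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# (eth1 send, eth2 send) for the samples above with a non-zero send value.
SENDS = [
    ("495k", "489k"),
    ("180B", "246B"),
    ("359k", "332k"),
    ("180B", "0"),
    ("0", "312B"),
    ("820k", "816k"),
    ("405k", "415k"),
    ("227k", "227k"),
]


def to_bytes(value):
    """Convert a dstat-style figure such as '495k' or '180B' to bytes."""
    value = value.strip()
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(float(value))


def totals(pairs):
    """Sum the eth1 and eth2 columns separately."""
    eth1 = sum(to_bytes(a) for a, _ in pairs)
    eth2 = sum(to_bytes(b) for _, b in pairs)
    return eth1, eth2


if __name__ == "__main__":
    eth1, eth2 = totals(SENDS)
    print(f"eth1 sent {eth1} B, eth2 sent {eth2} B, ratio {eth1 / eth2:.3f}")
```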
/etc/iscsi/iscsid.conf -- I modified the FastAbort setting at the end
in an effort to resolve these problems, but it has not helped.
#
# Open-iSCSI default configuration.
# Could be located at /etc/iscsi/iscsid.conf or ~/.iscsid.conf
#
# Note: To set any of these values for a specific node/session run
# the iscsiadm --mode node --op command for the value. See the README
# and man page for iscsiadm for details on the --op command.
#
################
# iSNS settings
################
# Address of iSNS server
#isns.address = 192.168.0.1
#isns.port = 3205
#############################
# NIC/HBA and driver settings
#############################
# open-iscsi can create a session and bind it to a NIC/HBA.
# To set this up see the example iface config file.
#*****************
# Startup settings
#*****************
# To request that the iscsi initd scripts startup a session set to "automatic".
# node.startup = automatic
#
# To manually startup the session set to "manual". The default is automatic.
node.startup = automatic
# *************
# CHAP Settings
# *************
# To enable CHAP authentication set node.session.auth.authmethod
# to CHAP. The default is None.
#node.session.auth.authmethod = CHAP
# To set a CHAP username and password for initiator
# authentication by the target(s), uncomment the following lines:
#node.session.auth.username = username
#node.session.auth.password = password
# To set a CHAP username and password for target(s)
# authentication by the initiator, uncomment the following lines:
#node.session.auth.username_in = username_in
#node.session.auth.password_in = password_in
# To enable CHAP authentication for a discovery session to the target
# set discovery.sendtargets.auth.authmethod to CHAP. The default is None.
#discovery.sendtargets.auth.authmethod = CHAP
# To set a discovery session CHAP username and password for the initiator
# authentication by the target(s), uncomment the following lines:
#discovery.sendtargets.auth.username = username
#discovery.sendtargets.auth.password = password
# To set a discovery session CHAP username and password for target(s)
# authentication by the initiator, uncomment the following lines:
#discovery.sendtargets.auth.username_in = username_in
#discovery.sendtargets.auth.password_in = password_in
# ********
# Timeouts
# ********
#
# See the iSCSI README's Advanced Configuration section for tips
# on setting timeouts when using multipath or doing root over iSCSI.
#
# To specify the length of time to wait for session re-establishment
# before failing SCSI commands back to the application when running
# the Linux SCSI Layer error handler, edit the line.
# The value is in seconds and the default is 120 seconds.
node.session.timeo.replacement_timeout = 120
# To specify the time to wait for login to complete, edit the line.
# The value is in seconds and the default is 15 seconds.
node.conn[0].timeo.login_timeout = 15
# To specify the time to wait for logout to complete, edit the line.
# The value is in seconds and the default is 15 seconds.
node.conn[0].timeo.logout_timeout = 15
# Time interval to wait on a connection before sending a ping.
node.conn[0].timeo.noop_out_interval = 5
# To specify the time to wait for a Nop-out response before failing
# the connection, edit this line. Failing the connection will
# cause IO to be failed back to the SCSI layer. If using dm-multipath
# this will cause the IO to be failed to the multipath layer.
node.conn[0].timeo.noop_out_timeout = 5
# To specify the time to wait for abort response before
# failing the operation and trying a logical unit reset edit the line.
# The value is in seconds and the default is 15 seconds.
node.session.err_timeo.abort_timeout = 15
# To specify the time to wait for a logical unit response
# before failing the operation and trying session re-establishment
# edit the line.
# The value is in seconds and the default is 30 seconds.
node.session.err_timeo.lu_reset_timeout = 20
#******
# Retry
#******
# To specify the number of times iscsid should retry a login
# if the login attempt fails due to the node.conn[0].timeo.login_timeout
# expiring modify the following line. Note that if the login fails
# quickly (before node.conn[0].timeo.login_timeout fires) because the network
# layer or the target returns an error, iscsid may retry the login more than
# node.session.initial_login_retry_max times.
#
# This retry count along with node.conn[0].timeo.login_timeout
# determines the maximum amount of time iscsid will try to
# establish the initial login. node.session.initial_login_retry_max is
# multiplied by the node.conn[0].timeo.login_timeout to determine the
# maximum amount.
#
# The default node.session.initial_login_retry_max is 8 and
# node.conn[0].timeo.login_timeout is 15 so we have:
#
# node.conn[0].timeo.login_timeout * node.session.initial_login_retry_max =
# 120 seconds
#
# Valid values are any integer value. This only
# affects the initial login. Setting it to a high value can slow
# down the iscsi service startup. Setting it to a low value can
# cause a session to not get logged into, if there are disruptions
# during startup or if the network is not ready at that time.
node.session.initial_login_retry_max = 8
################################
# session and device queue depth
################################
# To control how many commands the session will queue set
# node.session.cmds_max to an integer between 2 and 2048 that is also
# a power of 2. The default is 128.
node.session.cmds_max = 128
# To control the device's queue depth set node.session.queue_depth
# to a value between 1 and 1024. The default is 32.
node.session.queue_depth = 32
#***************
# iSCSI settings
#***************
# To enable R2T flow control (i.e., the initiator must wait for an R2T
# command before sending any data), uncomment the following line:
#
#node.session.iscsi.InitialR2T = Yes
#
# To disable R2T flow control (i.e., the initiator has an implied
# initial R2T of "FirstBurstLength" at offset 0), uncomment the following line:
#
# The default is No.
node.session.iscsi.InitialR2T = No
#
# To disable immediate data (i.e., the initiator does not send
# unsolicited data with the iSCSI command PDU), uncomment the following line:
#
#node.session.iscsi.ImmediateData = No
#
# To enable immediate data (i.e., the initiator sends unsolicited data
# with the iSCSI command packet), uncomment the following line:
#
# The default is Yes
node.session.iscsi.ImmediateData = Yes
# To specify the maximum number of unsolicited data bytes the initiator
# can send in an iSCSI PDU to a target, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 262144
node.session.iscsi.FirstBurstLength = 262144
# To specify the maximum SCSI payload that the initiator will negotiate
# with the target for, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 16776192
node.session.iscsi.MaxBurstLength = 16776192
# To specify the maximum number of data bytes the initiator can receive
# in an iSCSI PDU from a target, edit the following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 262144
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
# To specify the maximum number of data bytes the initiator can receive
# in an iSCSI PDU from a target during a discovery session, edit the
# following line.
#
# The value is the number of bytes in the range of 512 to (2^24-1) and
# the default is 32768
#
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
# To allow the targets to control the setting of the digest checking,
# with the initiator requesting a preference of enabling the checking, uncomment
# the following lines (Data digests are not supported and on ppc/ppc64
# both header and data digests are not supported.):
#node.conn[0].iscsi.HeaderDigest = CRC32C,None
#
# To allow the targets to control the setting of the digest checking,
# with the initiator requesting a preference of disabling the checking,
# uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = None,CRC32C
#
# To enable CRC32C digest checking for the header and/or data part of
# iSCSI PDUs, uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = CRC32C
#
# To disable digest checking for the header and/or data part of
# iSCSI PDUs, uncomment the following lines:
#node.conn[0].iscsi.HeaderDigest = None
#
# The default is to never use DataDigests or HeaderDigests.
#
node.conn[0].iscsi.HeaderDigest = None
#************
# Workarounds
#************
# Some targets like IET prefer after an initiator has sent a task
# management function like an ABORT TASK or LOGICAL UNIT RESET, that
# it does not respond to PDUs like R2Ts. To enable this behavior uncomment
# the following line (The default behavior is Yes):
# node.session.iscsi.FastAbort = Yes
# Some targets like EqualLogic prefer that after an initiator has sent
# a task management function like an ABORT TASK or LOGICAL UNIT RESET, that
# it continue to respond to R2Ts. To enable this uncomment this line
node.session.iscsi.FastAbort = No
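With the timeout values in this file, the dmesg numbers line up: each failure is logged 5 s after the last received data (when the NOP-Out ping is sent) plus 5 s spent waiting for the reply. The Python sketch below reads the uncommented key = value settings and computes that window. It also flags non-comment lines that have no "=", which is how a mail-wrapped comment would appear. It is not iscsid's own parser, and how iscsid itself treats such lines is not shown here.

```python
# A few settings from the iscsid.conf above.
CONF = """\
# Timeouts
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.timeo.replacement_timeout = 120
node.session.iscsi.FastAbort = No
"""


def effective_settings(text):
    """Return ({key: value}, suspect_lines) for an iscsid.conf-style file.

    Blank and comment lines are skipped; any other line without '=' is
    collected as suspect (e.g. a comment that lost its '#' when wrapped).
    """
    settings, suspect = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            suspect.append(line)
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings, suspect


def detection_window(settings):
    """Seconds from the last received PDU until the connection is failed:
    wait noop_out_interval before pinging, then noop_out_timeout for a reply."""
    return (int(settings["node.conn[0].timeo.noop_out_interval"])
            + int(settings["node.conn[0].timeo.noop_out_timeout"]))


if __name__ == "__main__":
    settings, suspect = effective_settings(CONF)
    print("detection window:", detection_window(settings), "s")
    print("replacement_timeout:", settings["node.session.timeo.replacement_timeout"], "s")
    print("suspect lines:", suspect)
```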
--------------------
/etc/multipath.conf
# This is a basic configuration file with some examples, for device mapper
# multipath.
# For a complete list of the default configuration values, see
# /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults
# For a list of configuration options with descriptions, see
# /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.annotated
# Blacklist all devices by default. Remove this to enable multipathing
# on the default devices.
#blacklist {
# devnode "*"
#}
## By default, devices with vendor = "IBM" and product = "S/390.*" are
## blacklisted. To enable multipathing on these devices, uncomment the
## following lines.
#blacklist_exceptions {
# device {
# vendor "IBM"
# product "S/390.*"
# }
#}
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
## Here is an example of how to configure some standard options.
##
#
#defaults {
# udev_dir /dev
# polling_interval 10
# selector "round-robin 0"
# path_grouping_policy multibus
# getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
# prio_callout /bin/true
# path_checker readsector0
# rr_min_io 100
# max_fds 8192
# rr_weight priorities
# failback immediate
# no_path_retry fail
# user_friendly_names yes
#}
##
## The wwid line in the following blacklist section is shown as an example
## of how to blacklist devices by wwid. The 2 devnode lines are the
## compiled in default blacklist. If you want to blacklist entire types
## of devices, such as all scsi devices, you should use a devnode line.
## However, if you want to blacklist specific devices, you should use
## a wwid line. Since there is no guarantee that a specific device will
## not change names on reboot (from /dev/sda to /dev/sdb for example)
## devnode lines are not recommended for blacklisting specific devices.
##
#blacklist {
# wwid 26353900f02796769
# devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
# devnode "^hd[a-z]"
devnode sda
#}
blacklist {
devnode "^sd[a]$"
}
multipaths {
multipath {
wwid 36090a048301075d9f6bca4c3010030d8
alias san-techcms
path_grouping_policy multibus
path_checker readsector0
path_selector "round-robin 0"
failback manual
rr_weight priorities
no_path_retry 5
rr_min_io 10
}
# multipath {
# wwid 1DEC_____321816758474
# alias red
# }
}
#devices {
# device {
# vendor "COMPAQ "
# product "HSV110 (C)COMPAQ"
# path_grouping_policy multibus
# getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
# path_checker readsector0
# path_selector "round-robin 0"
# hardware_handler "0"
# failback 15
# rr_weight priorities
# no_path_retry queue
# }
# device {
# vendor "COMPAQ "
# product "MSA1000 "
# path_grouping_policy multibus
# }
#}
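The san-techcms entry spreads I/O across both paths (multibus, round-robin 0, rr_min_io 10) and, through no_path_retry 5, keeps queueing I/O for a limited time if every path is lost. Below is a simplified Python model of both behaviors; it is not multipathd's code. The polling_interval of 5 s is a hypothetical value, since this file does not set it, and the queueing estimate assumes no_path_retry counts checker intervals.

```python
from itertools import islice


def round_robin(paths, rr_min_io):
    """Simplified model of the "round-robin 0" selector over one multibus
    path group: send rr_min_io I/Os down each active path, then move on."""
    while True:
        for path in paths:
            for _ in range(rr_min_io):
                yield path


def queueing_seconds(no_path_retry, polling_interval):
    """Approximate time I/O stays queued after the last path fails."""
    return no_path_retry * polling_interval


if __name__ == "__main__":
    # Device names from the iscsiadm output above (sdb via eth1, sdc via eth2).
    print(list(islice(round_robin(["sdb", "sdc"], rr_min_io=10), 25)))
    # While sdc (8:32) is failed, only sdb remains in the rotation.
    print(list(islice(round_robin(["sdb"], rr_min_io=10), 5)))
    # Hypothetical polling_interval of 5 s; this multipath.conf does not set it.
    print(queueing_seconds(no_path_retry=5, polling_interval=5), "s")
```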
Michael W. Wheeler, OpenVMS, Windows, Macintosh
Systems Support, Tennessee Technological University