Dear open-iscsi gurus...

I'm working with an IBM DS3300 storage system accessible via open-
iscsi technology to many different clients. To give you the proper
context before describing my problem, I'll introduce my setup:


*** The target setup ***

- IBM DS3300 storage system as ISCSI target with 12 SATA disks.
- 2 ISCSI controllers taking care of 2 network interfaces each (with
Jumbo frames enabled).
- Each controller owns a Raid 10 with one logical unit / partition
each


*** The initiators setup (with network interfaces with Jumbo frames
enabled) ***

- OS:
[r...@core26]# cat /etc/redhat-release
Scientific Linux SL release 5.2 (Boron)

- Kernel:
[r...@core26]# uname -a
Linux core26.ncg.ingrid.pt 2.6.18-92.1.22.el5xen #1 SMP Tue Dec 16
07:06:23 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

- Open ISCSI version:
[r...@core26]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.868-0.7.el5

- Timeouts configuration for iscsi ( /etc/iscsi/iscsi.conf)
node.session.timeo.replacement_timeout = 60
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 10
node.conn[0].timeo.noop_out_timeout = 15

- Multipath version:
[r...@core26 iozone3_318]# rpm -qa | grep multipath
device-mapper-multipath-0.4.7-17.el5

- RDAC plugin for the MD-multipath for path discover (as intructed by
IBM docs). The defaults section of the /etc/multipath.conf file is:

defaults {
        polling_interval        30
        rr_min_io               100
        failback                immediate
        no_path_retry           5
        user_friendly_names     yes
        selector                "round-robin 0"
        features                "1 queue_if_no_path"
        device {
                vendor                  "IBM"
                product                 "1726-3xx  FAStT"
                path_checker            rdac
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/
%n"
                hardware_handler        "1 rdac"
        }
}

*** The problem ***

- I'm running 6 iozone clients (a famous I/O benchmark) from 6
different initiators. From these 6 initiators, 3 will do some I/O in
one partition and the other 3 will do some I/O in the other partition,
everything done simultaneously.

- Each iozone starts 96 threads of 1 GB each. In total, I'm trying to
measure the sequential and random read/write performance of the ISCSI
target with files adding 288 GB per partition.

- Some hours after starting the test, I see in /var/log/messages a lot
of messages like:

Mar 31 23:19:30 core26 kernel: attempt to access beyond end of device
Mar 31 23:19:30 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Mar 31 23:19:30 core26 kernel: Buffer I/O error on device dm-10,
logical block 538976288
Mar 31 23:19:30 core26 kernel: lost page write due to I/O error on
dm-10
Mar 31 23:19:30 core26 kernel: attempt to access beyond end of device
Mar 31 23:19:30 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Mar 31 23:19:30 core26 kernel: Buffer I/O error on device dm-10,
logical block 538976288
Mar 31 23:19:30 core26 kernel: lost page write due to I/O error on
dm-10

- Using the "dmsetup ls" one can see that device dm-10 relates to one
of the DS3000 Raid10 partitions (called iscsi06-apoio2p1):
[r...@core26 ~]# dmsetup ls
iscsi06-apoio2p1        (253, 10)

- Afterwards, I see some new messages in /var/log/messages like

Apr  1 11:44:13 core26 kernel: ping timeout of 15 secs expired, last
rx 4310042910, last ping 4310045420, now 4310049160
Apr  1 11:44:13 core26 kernel:  connection6:0: iscsi: detected conn
error (1011)
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43701 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458970 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 10 [RAIDarray.mpp]iscsi06:0 Failover
command issued
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43701 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458968 pdev H6:C0:T0:L 1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43702 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458957 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43703 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458955 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43706 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458916 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43706 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458915 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43709 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458836 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:0:1
Controller IO time expired. Delta 43709 secs
Apr  1 11:44:13 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:0:1 Failed
controller to 0. retry. vcmnd SN 458835 pdev H6:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:13 core26 kernel: ping timeout of 15 secs expired, last
rx 4310043020, last ping 4310045590, now 4310049270
Apr  1 11:44:13 core26 kernel:  connection8:0: iscsi: detected conn
error (1011)
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458969 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458967 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458966 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43703 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458956 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43707 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458920 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 kernel: 122 [RAIDarray.mpp]iscsi06:1:1:1
Controller IO time expired. Delta 43707 secs
Apr  1 11:44:14 core26 kernel: 497 [RAIDarray.mpp]iscsi06:1:1:1 Failed
controller to 0. retry. vcmnd SN 458919 pdev H8:C0:T0:L1
0x00/0x00/0x00 0x00020000 mpp_status:2
Apr  1 11:44:14 core26 iscsid: Kernel reported iSCSI connection 8:0
error (1011) state (3)
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:1:1 Array
IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:1:1 IO
FAILURE. vcmnd SN 458968 pdev H7:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458970 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:1:1 Array
IO time expired. Delta 43704 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:1:1 IO
FAILURE. vcmnd SN 458955 pdev H7:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43703 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458957 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43707 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458916 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:1:1 Array
IO time expired. Delta 43707 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:1:1 IO
FAILURE. vcmnd SN 458915 pdev H7:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43710 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458836 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:1:1 Array
IO time expired. Delta 43710 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:1:1 IO
FAILURE. vcmnd SN 458835 pdev H7:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: end_request: I/O error, dev sdc, sector
290
Apr  1 11:44:14 core26 kernel: device-mapper: multipath: Failing path
8:32.
Apr  1 11:44:14 core26 kernel: end_request: I/O error, dev sdc, sector
290
Apr  1 11:44:14 core26 last message repeated 6 times
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458969 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:14 core26 kernel: end_request: I/O error, dev sdc, sector
290
Apr  1 11:44:14 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:1:1 Array
IO time expired. Delta 43702 secs
Apr  1 11:44:14 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:1:1 IO
FAILURE. vcmnd SN 458967 pdev H7:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1
Apr  1 11:44:15 core26 kernel: end_request: I/O error, dev sdc, sector
290
Apr  1 11:44:15 core26 kernel: 128 [RAIDarray.mpp]iscsi06:0:0:1 Array
IO time expired. Delta 43702 secs
Apr  1 11:44:15 core26 kernel: 492 [RAIDarray.mpp]iscsi06:0:0:1 IO
FAILURE. vcmnd SN 458966 pdev H5:C0:T0:L1 0x05/0x94/0x01 0x08000002
mpp_status:1

And some more...

Apr  1 11:46:04 core26 iscsid: connect failed (113)
Apr  1 11:46:05 core26 kernel: attempt to access beyond end of device
Apr  1 11:46:05 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Apr  1 11:46:05 core26 kernel: attempt to access beyond end of device
Apr  1 11:46:06 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Apr  1 11:46:06 core26 kernel: attempt to access beyond end of device
Apr  1 11:46:06 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Apr  1 11:46:06 core26 kernel: attempt to access beyond end of device
Apr  1 11:46:06 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Apr  1 11:46:07 core26 iscsid: connect failed (113)
Apr  1 11:46:08 core26 kernel: attempt to access beyond end of device
Apr  1 11:46:09 core26 kernel: dm-10: rw=1, want=4311810312,
limit=976562467
Apr  1 11:46:10 core26 iscsid: connect failed (113)
Apr  1 11:46:10 core26 kernel: 167 [RAIDarray.mpp]iscsi06:0 Controller
failover retried (0x5/0x91/0x36)
Apr  1 11:46:10 core26 kernel: 10 [RAIDarray.mpp]iscsi06:0 Failover
command issued
Apr  1 11:46:10 core26 kernel: 167 [RAIDarray.mpp]iscsi06:0 Controller
failover retried (0x5/0x91/0x36)
Apr  1 11:46:10 core26 kernel: 10 [RAIDarray.mpp]iscsi06:0 Failover
command issued
Apr  1 11:46:10 core26 kernel: 167 [RAIDarray.mpp]iscsi06:0 Controller
failover retried (0x5/0x91/0x36)
Apr  1 11:46:10 core26 kernel: 10 [RAIDarray.mpp]iscsi06:0 Failover
command issued
Apr  1 11:46:10 core26 kernel:  session6: iscsi: session recovery
timed out after 60 secs
Apr  1 11:46:10 core26 kernel: iscsi: cmd 0x5a is not queued (8)
Apr  1 11:46:10 core26 kernel: iscsi: cmd 0x12 is not queued (8)
Apr  1 11:46:10 core26 kernel: iscsi: cmd 0x5a is not queued (8)
Apr  1 11:46:10 core26 kernel: iscsi: cmd 0x2a is not queued (8)

And this goes on and on (I can provide the full logs if needed).

- The final result is that the partition end up in read-only mode.

In conclusion, it seems that at a given time there are attempts to
access beyond end of device. I don't know who is the guilty guy, if
the kernel itself if the iscsi framework. Then the ISCSI controller
start to fail also with IO expired messages.

Do you have suggestion of what can I be doing wrong?

Thanks in Advance
Cheers
Goncalo Borges


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to