Hello,

I've simulated a remote network failure by adding a blackhole route for one of
the remote iSCSI ports, watching the kernel messages of SLES 10 SP1
(open-iscsi-2.0.707-0.32). I tried to follow the configuration guidelines in
README for using multipath. The host sees three target LUNs, each reachable via
16 paths like this:

two NICs * two iSCSI boxes * two FC adapters * two storage controllers
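
As a quick check of that count, the four two-way choices can be enumerated. The component names below (nic0, box0, ...) are placeholders made up for illustration, not the real device names:

```python
from itertools import product

# Placeholder names; the setup only tells us there are two of each.
nics = ["nic0", "nic1"]
iscsi_boxes = ["box0", "box1"]
fc_adapters = ["fc0", "fc1"]
controllers = ["ctrlA", "ctrlB"]

# Every combination of one element from each layer is one path to a LUN.
paths = list(product(nics, iscsi_boxes, fc_adapters, controllers))
print(len(paths))  # 2 * 2 * 2 * 2 = 16 paths per LUN
```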

I feel that some error messages would benefit from being throttled or more 
specific:

It all starts with this:
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
kernel:  connection224:0: iscsi: detected conn error (1011)
kernel:  connection225:0: iscsi: detected conn error (1011)
kernel:  connection226:0: iscsi: detected conn error (1011)
kernel:  connection227:0: iscsi: detected conn error (1011)
[...more and repeating about every second...]
iscsid: detected iSCSI connection 224:0 error (1011) state (3)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
[...]
multipathd: sdaa: tur checker reports path is down
multipathd: checker failed path 65:160 in map EVA-2_iSCSI-1
multipathd: EVA-2_iSCSI-1: remaining active paths: 15
iscsid: detected iSCSI connection 222:0 error (1011) state (3)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
[ Here the connection attempts appeared within two seconds, possibly for
different targets/LUNs, but the message doesn't tell which. There are at least
one hundred of these messages within a very short time ]
kernel:  session223: iscsi: session recovery timed out after 120 secs
multipathd: sdab: tur checker reports path is down
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: device-mapper: dm-multipath: Failing path 65:176.
multipathd: checker failed path 65:176 in map EVA-2_iSCSI-1
[ In the iscsi message above, a few details about the "owner" of the command
being queued would make much sense IMHO ]
[...]
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
iscsid: cannot make a connection to 172.20.76.1:3260 (101)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
multipathd: sdaa: tur checker reports path is down
multipathd: sdab: tur checker reports path is down
multipathd: sdac: tur checker reports path is down
multipathd: sdad: tur checker reports path is down
multipathd: sdae: tur checker reports path is down
[...]
[ Here the problem mentioned before becomes obvious. I'm wondering why the
syslog daemon does not say "last message repeated # times"; the iscsi messages
all arrived in the same second ]
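
To illustrate the kind of coalescing I expected here, a minimal sketch in Python follows. The function name `coalesce` is hypothetical, and this only shows the expected behaviour; it makes no claim about how syslog-ng works internally:

```python
from itertools import groupby

def coalesce(lines):
    """Collapse runs of identical consecutive lines into one line plus a count."""
    out = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        out.append(line)
        if count > 1:
            out.append(f"last message repeated {count - 1} times")
    return out

# Twelve identical kernel messages followed by one multipathd message,
# as in the excerpt above.
log = ["kernel: iscsi: cmd 0x0 is not queued (8)"] * 12 + [
    "multipathd: sdaa: tur checker reports path is down",
]
for line in coalesce(log):
    print(line)
```

With this input the twelve kernel lines shrink to the message itself plus a single "repeated 11 times" line.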

[ The other thing is that multipathd REPEATEDLY reports the paths being down,
i.e. it reports the STATE, not the TRANSITION of the state: ]
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
kernel: iscsi: cmd 0x0 is not queued (8)
multipathd: sdaa: tur checker reports path is down
multipathd: sdab: tur checker reports path is down
multipathd: sdac: tur checker reports path is down
multipathd: sdad: tur checker reports path is down
multipathd: sdae: tur checker reports path is down
[...]
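
The state-versus-transition distinction above can be sketched as follows. This is an illustration in Python under my own assumptions (the function `log_transitions` and the sample data are hypothetical), not multipathd's actual checker loop:

```python
def log_transitions(samples):
    """Yield a message only when a path's checker state changes."""
    last_state = {}  # path name -> last reported state
    for path, state in samples:
        if last_state.get(path) != state:
            last_state[path] = state
            yield f"{path}: tur checker reports path is {state}"

samples = [
    ("sdaa", "down"), ("sdab", "down"),
    ("sdaa", "down"), ("sdab", "down"),  # repeated polls, state unchanged
    ("sdaa", "up"), ("sdab", "down"),
]
for msg in log_transitions(samples):
    print(msg)
```

Six polls produce three messages here: the first "down" for each path and the "up" for sdaa. The unchanged polls stay silent.
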
[ The other amazing thing is that the path checker still reports a "down" three 
seconds after iSCSI detected a successful connection: ]
iscsid: connection222:0 is operational after recovery (83 attempts)
iscsid: connection223:0 is operational after recovery (84 attempts)
iscsid: connection227:0 is operational after recovery (84 attempts)
iscsid: connection229:0 is operational after recovery (84 attempts)
iscsid: connection224:0 is operational after recovery (84 attempts)
iscsid: connection225:0 is operational after recovery (84 attempts)
iscsid: connection226:0 is operational after recovery (84 attempts)
iscsid: connection228:0 is operational after recovery (84 attempts)
multipathd: sdav: tur checker reports path is down
multipathd: sdaw: tur checker reports path is down
[...more like these...for another 8 seconds; finally an UP transition]
multipathd: sdaw: tur checker reports path is down
multipathd: sdax: tur checker reports path is down
multipathd: sdaa: tur checker reports path is up
multipathd: 65:160: reinstated
multipathd: EVA-2_iSCSI-1: remaining active paths: 13
multipathd: sdab: tur checker reports path is up
multipathd: 65:176: reinstated
multipathd: EVA-2_iSCSI-1: remaining active paths: 14
multipathd: sdac: tur checker reports path is up
multipathd: 65:192: reinstated
[...]
It seems syslog could not log all of those messages:
syslog-ng[10113]: STATS: dropped 927

I know that this is not all iSCSI, but can something be done about the problem,
like an exponential backoff strategy for emitting error messages?
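
To make the suggestion concrete, here is a minimal sketch of what I mean, written in Python for brevity. The class `BackoffLogger` is hypothetical and backs off by occurrence count (1st, 2nd, 4th, 8th, ...) rather than by time; a real implementation in iscsid or the kernel would look different:

```python
class BackoffLogger:
    """Emit a repeated message on its 1st, 2nd, 4th, 8th, ... occurrence."""

    def __init__(self, emit=print):
        self.emit = emit
        self.seen = {}  # message -> number of occurrences so far

    def log(self, msg):
        n = self.seen.get(msg, 0) + 1
        self.seen[msg] = n
        # n & (n - 1) == 0 holds exactly when n is a power of two.
        if n & (n - 1) == 0:
            # Occurrences skipped since the previous emission (at n // 2).
            suppressed = n - n // 2 - 1
            suffix = f" ({suppressed} similar suppressed)" if suppressed else ""
            self.emit(msg + suffix)

    def reset(self, msg):
        """Forget a message once the condition clears, so a new failure logs at once."""
        self.seen.pop(msg, None)

emitted = []
logger = BackoffLogger(emit=emitted.append)
for _ in range(100):
    logger.log("iscsid: cannot make a connection to 172.20.76.1:3260 (101)")
print(len(emitted))  # lines actually written for 100 identical errors
```

With this scheme, one hundred identical connection errors produce seven log lines, and each later line says how many were suppressed since the previous one.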

If I configure something like 10 LUNs with 16 paths each, a failure of a
network connection for a few seconds will flood the syslog with thousands of
entries.

Regards,
Ulrich Windl

