Re: simulated network failure & error messages

2009-02-27 Thread Mike Christie

Ulrich Windl wrote:
> [...]
>>> [ the other thing is that multipathd REPEATEDLY reports the paths being 
>>> down, i.e. 
>>> it reports the STATE, not the TRANSITION of the state.: ]
>> Do you mean it repeatedly reports this for each path or that you get a 
>> message for each path that is on the same session/connection?
> 
> Does that extract from the log answer your question? ;-)
> Feb 26 13:24:30 rkdvmso1 multipathd: sdaa: tur checker reports path is down
>

> Feb 26 13:26:44 rkdvmso1 multipathd: sdaa: tur checker reports path is down

:) Ok, I get it. You should report that to the dm-devel list, 
dm-de...@redhat.com, so they fix that up. Also report the path down 
messages 3 seconds after the connection up ones. I thought it might be a 
race with the logging or something but 3 seconds is a long time so maybe 
it is bug (multipathd getting backed up or something).

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: simulated network failure & error messages

2009-02-27 Thread Ulrich Windl

On 26 Feb 2009 at 9:51, Mike Christie wrote:

> 
> Ulrich Windl wrote:
> > Hello,
> > 
> > I've simulated a remote network failure by adding a blackhole route for one 
> > of the 
> > remote iSCSI ports, watching the kernel messages of SLES 10 SP1 (open-iscsi-
> > 2.0.707-0.32). I tried to follow the configuration guidelines in README for 
> > using 
> > multipath. The host sees three target LUNs, each reachable via 16 paths 
> > like this:
[...]
> > It all starts with this:
> > iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> > kernel:  connection224:0: iscsi: detected conn error (1011)
> > kernel:  connection225:0: iscsi: detected conn error (1011)
> > kernel:  connection226:0: iscsi: detected conn error (1011)
> > kernel:  connection227:0: iscsi: detected conn error (1011)
> > [...more and repeating about every second...]
> 
> The detected conn error messages are not repeating are they? It should 
> only fire once when we detect a problem.

Yes, there is only one "conn error" per connection.

> 
> > iscsid: detected iSCSI connection 224:0 error (1011) state (3)
> > iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> 
> These ones do spit out over and over. I agree we should have some back 
> off code for that. Will fix it in a second.

Very good!

[...]
> > [ the other thing is that multipathd REPEATEDLY reports the paths being 
> > down, i.e. 
> > it reports the STATE, not the TRANSITION of the state.: ]
> 
> Do you mean it repeatedly reports this for each path or that you get a 
> message for each path that is on the same session/connection?

Does that extract from the log answer your question? ;-)
Feb 26 13:24:30 rkdvmso1 multipathd: sdaa: tur checker reports path is down
Feb 26 13:26:28 rkdvmso1 multipathd: sdab: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdac: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdad: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdae: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdaf: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdag: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdah: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdau: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdav: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdaw: tur checker reports path is down
Feb 26 13:26:29 rkdvmso1 multipathd: sdax: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdaa: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdab: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdac: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdad: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdae: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdaf: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdag: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdah: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdau: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdav: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdaw: tur checker reports path is down
Feb 26 13:26:34 rkdvmso1 multipathd: sdax: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdaa: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdab: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdac: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdad: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdae: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdaf: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdag: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdah: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdau: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdav: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdaw: tur checker reports path is down
Feb 26 13:26:39 rkdvmso1 multipathd: sdax: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdaa: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdab: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdac: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdad: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdae: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdaf: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdag: tur checker reports path is down
Feb 26 13:26:44 rkdvmso1 multipathd: sdah: tur checker reports path is down
Fe

Re: simulated network failure & error messages

2009-02-26 Thread Mike Christie

Ulrich Windl wrote:
> Hello,
> 
> I've simulated a remote network failure by adding a blackhole route for one 
> of the 
> remote iSCSI ports, watching the kernel messages of SLES 10 SP1 (open-iscsi-
> 2.0.707-0.32). I tried to follow the configuration guidelines in README for 
> using 
> multipath. The host sees three target LUNs, each reachable via 16 paths like 
> this:
> 
> two NICs * two iSCSI boxes * two FC adapters * two storage controllers
> 
> I feel that some error messages would benefit from being throttled or more 
> specific:
> 
> It all starts with this:
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> kernel:  connection224:0: iscsi: detected conn error (1011)
> kernel:  connection225:0: iscsi: detected conn error (1011)
> kernel:  connection226:0: iscsi: detected conn error (1011)
> kernel:  connection227:0: iscsi: detected conn error (1011)
> [...more and repeating about every second...]

The detected conn error messages are not repeating are they? It should 
only fire once when we detect a problem.

> iscsid: detected iSCSI connection 224:0 error (1011) state (3)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)

These ones do spit out over and over. I agree we should have some back 
off code for that. Will fix it in a second.


> [...]
> multipathd: sdaa: tur checker reports path is down
> multipathd: checker failed path 65:160 in map EVA-2_iSCSI-1
> multipathd: EVA-2_iSCSI-1: remaining active paths: 15
> iscsid: detected iSCSI connection 222:0 error (1011) state (3)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> [ here the attempts for a connection appeared within two seconds, poosibly 
> for 
> different targets/LUNs, but the message doesn't tell which. There are at 
> least one 
> hundred of these messages within a very short time ]

This should be fixed soon. I just fixed up the kernel messages to add 
some prefix so you can match this up. I am going to get to the userspace 
messsages next.


> kernel:  session223: iscsi: session recovery timed out after 120 secs
> multipathd: sdab: tur checker reports path is down
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: device-mapper: dm-multipath: Failing path 65:176.
> multipathd: checker failed path 65:176 in map EVA-2_iSCSI-1
> [ In the message from iscsi above a few details about the "owner" of the 
> command 
> being queued would make much sense IMHO ]

In the upstream code it should be fixed, so that if it is prtinted you 
know where it is coming from.

However, upstream does not normally turn this on. It is SUSE specific, 
so if you do not like it you have to make a suse bugzilla and ask to 
turn it off.


> [...]
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> iscsid: cannot make a connection to 172.20.76.1:3260 (101)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> kernel: iscsi: cmd 0x0 is not queued (8)
> multipathd: sdaa: tur checker reports path is down
> multipathd: sdab: tur checker reports path is down
> multipathd: sdac: tur checker reports path is down
> multipathd: sdad: tur checker reports path is down
> multipathd: sdae: tur checker reports path is down
> [...]
> [ Here the problem mentioned before becomes obvious. I'm wondering why the 
> syslog 
> daemon does not say "last message repeated # times"; the iscsi messages all 
> arrived in the same second ] 
> 
> [ the other thing is that multipathd REPEATEDLY reports the paths being down, 
> i.e. 
> it reports the STATE, not the TRANSITION of the state.: ]

Do you mean it repeatedly reports this for each path or that you get a 
message for each path that is on the same session/connection?

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---