On 2010-04-13 22:03, Mike Christie wrote:
On 04/13/2010 03:23 AM, Christian Iversen wrote:
Hi iSCSI guys

I've set up iSCSI storage on our servers, using IETD and OpenISCSI.

It works and performs great, but I am a little unsure of how to adjust
the timeout values properly.

On our storage servers, we use heartbeat to achieve HA failover, which
works nicely. However, the client machines only try for a fixed amount
of time before giving up, so if the failover for some reason does not
happen relatively quickly, everything grinds to a halt in a really bad
way.

I would like to set up open-iscsi to keep trying, preferably at low
intervals, and not give up contacting the server.

There are quite a few different timeouts, and I have been unable to find
any sort of reference documentation for this. Maybe someone here can
help?


Did you read the README? I tried to document the timeouts that are asked
about most frequently on the list.

Thank you! I've been looking for that kind of document for a while. Things are somewhat clearer now :)

What I'd like is the following:

- Never give up trying (or at least try for a month :)

The iSCSI initiator almost always tries to reconnect to the target. If
a login succeeds and the session later fails, it will keep trying to log
back in until the user runs an iscsiadm command to log out.

If you mean you want it to hold onto IO and not fail it, then you want
the replacement_timeout/recovery_timeout. There should be info in the
README and iscsid.conf about this. If it is not clear let me know.

There's info about replacement_timeout, but no recovery_timeout. Maybe only the former is a valid name?

If in the iscsid.conf you see this for
node.session.timeo.replacement_timeout then this is what I think you are
asking for (that is if you are saying you do not want IO failed) and you
want to set the value to 0.
# - If the value is 0, IO will be failed immediately.
# - If the value is less than 0, IO will remain queued until the session
# is logged back in, or until the user runs the logout command.
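
Read literally, the two comment lines above say that a negative value is what keeps IO queued, while 0 fails it immediately. A minimal iscsid.conf sketch of the "never fail IO" case, assuming -1 as the negative value (the comment only says "less than 0", so the exact number is an assumption):

```
# Assumption: any value below 0 behaves the same; -1 is used for illustration.
# IO stays queued until the session is logged back in or the user logs out.
node.session.timeo.replacement_timeout = -1
```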

I'm a little unsure about the semantics of "failed IO". What I want is for the iSCSI client to see all IO as either working or hanging indefinitely if the server cannot be contacted.

If there is a low-level error, I'd like iSCSI to detect it quickly and reconnect right away (this will happen when there's a failover). Will the following settings work for this purpose?

node.conn[0].timeo.noop_out_interval = 2
node.conn[0].timeo.noop_out_timeout = 2
node.session.timeo.replacement_timeout = 86400

Per my understanding: this will ping the server every 2 seconds and wait 2 seconds for a reply. If a connection problem is discovered, the client will try for 24 hours (86400 seconds) to reestablish a connection before giving up and returning IO errors to higher layers.
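
To make the arithmetic behind that reading explicit, here is a small Python sketch. The worst-case detection formula (interval plus timeout) is an assumption about how the two noop settings combine, not something taken from the README, and the variable names simply mirror the config keys:

```python
# Rough timing estimate for the settings above (an approximation,
# not derived from the open-iscsi source).
noop_out_interval = 2        # seconds between pings sent to the target
noop_out_timeout = 2         # seconds to wait for each ping reply
replacement_timeout = 86400  # seconds IO stays queued after a failure

# Assumed worst case: the link dies right after a successful ping, so the
# next ping goes out one interval later and then has to time out.
worst_case_detection = noop_out_interval + noop_out_timeout

# Convert the queueing window from seconds to hours.
hours_queued = replacement_timeout / 3600

print(f"failure detected within about {worst_case_detection} s")
print(f"IO held for up to {hours_queued:.0f} h before errors are returned")
```

Under these assumptions, a dead connection would be noticed within roughly 4 seconds, and IO would be held for up to 24 hours before errors reach higher layers.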

Is this correct? From your description it seems like replacement_timeout = 0 would cause immediate IO errors in case of connection problems? Or did I misunderstand?

--
Med venlig hilsen / Best regards
Christian Iversen

Sikkerhed.org ApS
Fuglebakkevej 88                       E-mail:  [email protected]
1. sal                                 Web:     www.sikkerhed.org
DK-2000 Frederiksberg                  Direkte: [email protected]
