On 04/14/2010 07:02 AM, Christian Iversen wrote:
On 2010-04-13 22:03, Mike Christie wrote:
On 04/13/2010 03:23 AM, Christian Iversen wrote:
Hi iSCSI guys
I've set up iSCSI storage on our servers, using IETD and OpenISCSI.
It works and performs great, but I am a little unsure of how to adjust
the timeout values properly.
On our storage servers, we use heartbeat to achieve HA failover, which
works nicely. However, the client machines only try for a fixed amount
of time before giving up, so if the failover for some reason does not
happen relatively quickly, everything grinds to a halt in a really bad
way.
I would like to set up open-iscsi to keep trying, preferably at short
intervals, and never give up contacting the server.
There are quite a few different timeouts, and I have been unable to find
any sort of reference documentation for this. Maybe someone here can
help?
Did you read the README? I tried to document the timeouts that are asked
about most frequently on the list.
Thank you! I've been looking for that kind of document for a while.
Things are somewhat clearer now :)
What I'd like is the following:
- Never give up trying (or at least try for a month :)
The iscsi initiator almost always tries to reconnect to the target. If
it gets a successful login and the session later fails, it will try to
log back in until the user runs an iscsiadm command to log out.
If you mean you want it to hold onto IO and not fail it, then you want
the replacement_timeout/recovery_timeout. There should be info in the
README and iscsid.conf about this. If it is not clear let me know.
There's info about replacement_timeout, but no recovery_timeout. Maybe
only the former is a valid name?
replacement_timeout is the name of the setting in iscsid.conf, but for
some dumb reason I named it recovery_timeout in the kernel.
If in the iscsid.conf you see this for
node.session.timeo.replacement_timeout then this is what I think you are
asking for (that is if you are saying you do not want IO failed) and you
want to set the value to 0.
# - If the value is 0, IO will be failed immediately.
# - If the value is less than 0, IO will remain queued until the session
# is logged back in, or until the user runs the logout command.
I'm a little unsure about the semantics for "failed io". What I want is
the iscsi client to see all IO as working, or hanging indefinitely if
the server cannot be contacted.
Then set the replacement_timeout to -1.
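For reference, the three cases for node.session.timeo.replacement_timeout discussed above (negative, zero, positive) can be sketched as a small lookup. This is only an illustration of the semantics as described in this thread and in the quoted iscsid.conf comments. The function name is hypothetical and is not part of open-iscsi, and the behaviour for 0 applies to newer versions only.

```python
def describe_replacement_timeout(seconds: int) -> str:
    """Summarize how queued IO is treated for a given replacement_timeout.

    Hypothetical helper for illustration; it only restates the semantics
    described in this thread, it does not talk to iscsid.
    """
    if seconds < 0:
        # Negative: IO stays queued until relogin or an explicit logout.
        return "queue IO until the session is logged back in or the user logs out"
    if seconds == 0:
        # Zero (newer versions): IO is failed right away.
        return "fail IO immediately"
    # Positive: IO is held for that many seconds, then failed upward.
    return f"queue IO for up to {seconds} seconds, then fail it"


for value in (-1, 0, 86400):
    print(f"replacement_timeout = {value}: {describe_replacement_timeout(value)}")
```

With -1 the client sees IO hang rather than fail, which is the behaviour asked for above.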
If there is a low-level error, I'd like iscsi to detect this quickly and
reconnect right away. (this will happen when there's a failover). Will
the following settings work for this purpose:
node.conn[0].timeo.noop_out_interval = 2
node.conn[0].timeo.noop_out_timeout = 2
node.session.timeo.replacement_timeout = 86400
Yes.
Per my understanding: This will ping the server every 2 seconds, and
wait 2 seconds for a reply. If a connection problem is discovered, the
client will try for 24 hours (86400 seconds) to reestablish a connection
before giving up and returning IO errors to higher layers.
Is this correct?
Yes.
From your description it seems like replacement_timeout = 0 would cause
immediate IO errors in case of connection problems? Or did I
misunderstand?
Yeah, on newer versions 0 causes the IO to be failed immediately. I
wrote that wrong before.
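To put numbers on the settings discussed in this thread, here is a rough sketch of the resulting timeline. It assumes that the worst-case time to notice a dead connection is about noop_out_interval + noop_out_timeout, and that the replacement_timeout clock starts once the problem is detected. Both are simplifications, and the function name is hypothetical.

```python
def recovery_timeline(noop_out_interval: int,
                      noop_out_timeout: int,
                      replacement_timeout: int) -> dict:
    """Estimate detection time and how long IO is held before failing.

    Illustrative arithmetic only; real timing depends on the kernel
    and iscsid versions in use.
    """
    # Assumption: a ping goes out every interval, then we wait timeout for a reply.
    detection = noop_out_interval + noop_out_timeout
    if replacement_timeout < 0:
        held = None  # IO is queued indefinitely
    else:
        held = replacement_timeout  # 0 means IO is failed immediately
    return {
        "worst_case_detection_s": detection,
        "io_held_s": held,
        "io_held_hours": None if held is None else held / 3600,
    }


timeline = recovery_timeline(noop_out_interval=2,
                             noop_out_timeout=2,
                             replacement_timeout=86400)
print(timeline)
```

For the values above this gives roughly 4 seconds to detect the failure and 24 hours of queued IO before errors reach the upper layers.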