Very enlightening, thanks.
If there is a constant stream of traffic over the iscsi session and
there is a network failure, then the scsi command timer should fire
and the scsi eh should run, right?
And if the eh fails, the disk will then go offline (according to
/sys/block/<disk>/device/state)?

I think this leads to using dm-multipath even when there is only a
single path, since dm-multipath will constantly test the link.
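
For basic monitoring in the meantime, here is a rough sketch of polling
both state files mentioned in this thread. It only reads sysfs and makes
no iscsi calls. The function names and the sysfs_root parameter are my
own, and the exact state strings (e.g. "offline", "running", or a failed
session state) depend on the kernel, so treat them as assumptions.

```python
import glob
import os


def read_state(path):
    """Return the stripped contents of a sysfs state file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def disk_state(disk, sysfs_root="/sys"):
    """Read <sysfs_root>/block/<disk>/device/state (e.g. "running" or "offline")."""
    return read_state(os.path.join(sysfs_root, "block", disk, "device", "state"))


def session_states(sysfs_root="/sys"):
    """Map each session directory name to the contents of its state file.

    Looks under <sysfs_root>/class/iscsi_session/session*/state.
    """
    pattern = os.path.join(sysfs_root, "class", "iscsi_session", "session*", "state")
    return {
        os.path.basename(os.path.dirname(path)): read_state(path)
        for path in sorted(glob.glob(pattern))
    }
```

Calling disk_state("sdb") and session_states() every few seconds from a
cron job or a small loop would show whether the session has failed even
when the disk itself has not gone offline ("sdb" is just a placeholder
disk name).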

On Sep 11, 7:53 am, Mike Christie wrote:
> On 09/10/2009 05:23 PM, Chiradeep Vittal wrote:
> > Thanks. I'll take a look at the netlink interface. Not using multipath
> > for now, but will do so later.
> > For basic monitoring of storage network problems, here's what I am
> > thinking:
> > 1. If there is a network failure, eventually
> > cat /sys/block/<disk>/device/state should show "offline"?
> > 2. How long will this take? I know that this is a function of
> > replacement_timeout, noop_interval, noop_timeout and scsi timeout, but
> > the relationship is not clear.
> > Let us say:
> > a=session.timeo.noop_out_interval=5
> > b=session.timeo.noop_out_timeout=5
> > c=session.timeo.replacement_timeout=120
> > d=`cat /sys/block/<disk>/device/timeout`=60
> > The disk should go offline in a maximum of a+b+c+d=190s after a
> > network failure?
> It is not really that easy, because if the nop times out the iscsi layer
> will drop the session and the disk state will not change to offline. The
> disk state will only change if the scsi command timer fires and the scsi
> eh runs and fails. In this case the disk state will go to offline.
> For the nop timeout case and the scsi eh failing case, the iscsi session
> state will go to failed, so you could check that instead. That value is in
> /sys/class/iscsi_session/session%SID/state
> > If the network comes back up, how soon will the disk state go to
> > 'running' ?
> When the iscsi session is dropped due to a nop timeout or the scsi eh
> failing, the initiator will basically poll the network every couple of
> seconds by trying to reconnect the tcp connection. So it depends on
> the type of failure. If the initiator is trying to reconnect the tcp
> connection when the network comes up, then we could reconnect right
> away. If the network layer cannot figure things out, the reconnect
> could time out and then the next try would work. Or if the network had
> given us an error right away when we tried the reconnect, then we would
> be successful on the next reconnect attempt.