On Tuesday, 21 April 2020 at 20:44:22 UTC+2, The Lee-Man wrote:
>
>
> Because of the design of iSCSI, there is no way for the initiator to know 
> the server has gone away. The only time an initiator might figure this out 
> is when it tries to communicate with the target.
>
> This assumes we are not using some sort of directory service, like iSNS, 
> which can send asynchronous notifications. But even then, the iSNS server 
> would have to somehow know that the target went down. If the target 
> crashed, that might be difficult to ascertain.
>
> So in the absence of some asynchronous notification, the initiator only 
> knows the target is not responding if it tries to talk to that target.
>
> Normally iscsid defaults to sending periodic NO-OPs to the target every 5 
> seconds. So if the target goes away, the initiator usually notices, even if 
> no regular I/O is occurring.
>

True.
 

>
> But this is where the error recovery gets tricky, because iscsi tries to 
> handle "lossy" connections. What if the server will be right back? Maybe 
> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps 
> trying to reconnect. As a matter of fact, if you stop iscsid and restart 
> it, it sees the failed connection and retries it -- forever, by default. I 
> actually added a configuration parameter called reopen_max, that can limit 
> the number of retries. But there was pushback on changing the default value 
> from 0, which is "retry forever".
>
> So what exactly do you think the system should do when a connection "goes 
> away"? How long does it have to be gone to be considered gone for good? If 
> the target comes back "later" should it get the same disc name? Should we 
> retry, and if so how much before we give up? I'm interested in your views, 
> since it seems like a non-trivial problem to me.
>

Well, for short disconnections the retry approach is surely the better 
one. But I naively assumed that a longer disconnection, one exceeding the 
node.session.timeo.replacement_timeout parameter, would tear down the 
device with a corresponding udev event. Udev should have no problem 
assigning the device a sensible persistent name, right?
 

>
> So you're saying as soon as a bad connection is detected (perhaps by a 
> NOOP), the device should go away? 
>

I would say that the device should go away not at the first failed NOOP, 
but when the replacement_timeout (or another sensible timeout) expires.
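
To make the timing concrete, here is a minimal sketch of the worst-case 
timeline I have in mind. It assumes the stock iscsid.conf values 
(node.conn[0].timeo.noop_out_interval = 5, 
node.conn[0].timeo.noop_out_timeout = 5, 
node.session.timeo.replacement_timeout = 120) and a simplified model in 
which the target dies right after answering a NOOP. The function name is 
only illustrative, not part of open-iscsi.

```python
def worst_case_seconds(noop_out_interval=5, noop_out_timeout=5,
                       replacement_timeout=120):
    """Return (detect, give_up) in seconds after the target disappears.

    Simplified model (an assumption, not taken from the iscsid source):
    the target dies just after replying to a NOOP, so the next NOOP is
    sent after noop_out_interval and is declared lost after
    noop_out_timeout. From that point the session has
    replacement_timeout seconds to be re-established.
    """
    detect = noop_out_interval + noop_out_timeout
    give_up = detect + replacement_timeout
    return detect, give_up


if __name__ == "__main__":
    detect, give_up = worst_case_seconds()
    print(f"connection failure noticed after at most {detect} s")
    print(f"replacement_timeout expires after at most {give_up} s")
```

With the defaults this gives 10 s to notice the failure and 130 s until 
replacement_timeout expires, which is the point where I expected the 
device to be removed.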

This opens the door to another question: from the iscsid.conf 
<https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99> 
and README 
<https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files, I 
(wrongly?) understand that replacement_timeout comes into play only when the 
SCSI EH is running, while in other cases different timeouts, such as 
node.session.err_timeo.lu_reset_timeout and 
node.session.err_timeo.tgt_reset_timeout, should affect the (dis)connection. 
However, in all my tests I only saw replacement_timeout being honored, 
yet I never caught a single running instance of the SCSI EH via the 
suggested command iscsiadm -m session -P 3.
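
For reference, this is roughly how I looked for the SCSI EH in that 
output. It is only a sketch: it assumes the report contains lines of the 
form "Host Number: X State: Recovery" while the error handler runs, as the 
README describes, and the exact layout may differ between versions. The 
helper name hosts_in_recovery is my own.

```python
import re

# Assumed line shape, per the README: "Host Number: 3  State: Recovery"
HOST_STATE = re.compile(r"Host Number:\s*(\d+)\s+State:\s*(\S+)")


def hosts_in_recovery(session_report):
    """Return the SCSI host numbers whose state is reported as Recovery.

    session_report is the text printed by `iscsiadm -m session -P 3`,
    captured by the caller.
    """
    return [
        int(number)
        for number, state in HOST_STATE.findall(session_report)
        if state.lower() == "recovery"
    ]


if __name__ == "__main__":
    import subprocess

    report = subprocess.run(
        ["iscsiadm", "-m", "session", "-P", "3"],
        capture_output=True, text=True, check=False,
    ).stdout
    print(hosts_in_recovery(report))
```

Running this in a loop while the target is unreachable is how I tried, 
without success, to catch the error handler at work.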

What am I missing?
Thanks.
