On Tuesday, April 21, 2020 at 22:30:44 UTC+2, Gionatan Danti wrote:
>
>
> On Tuesday, April 21, 2020 at 20:44:22 UTC+2, The Lee-Man wrote:
>>
>>
>> Because of the design of iSCSI, there is no way for the initiator to know 
>> the server has gone away. The only time an initiator might figure this out 
>> is when it tries to communicate with the target.
>>
>> This assumes we are not using some sort of directory service, like iSNS, 
>> which can send asynchronous notifications. But even then, the iSNS server 
>> would have to somehow know that the target went down. If the target 
>> crashed, that might be difficult to ascertain.
>>
>> So in the absence of some asynchronous notification, the initiator only 
>> knows the target is not responding if it tries to talk to that target.
>>
>> Normally iscsid defaults to sending periodic NO-OPs to the target every 5 
>> seconds. So if the target goes away, the initiator usually notices, even if 
>> no regular I/O is occurring.
>>
>
> True.
>  
>
>>
>> But this is where the error recovery gets tricky, because iscsi tries to 
>> handle "lossy" connections. What if the server will be right back? Maybe 
>> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps 
>> trying to reconnect. As a matter of fact, if you stop iscsid and restart 
>> it, it sees the failed connection and retries it -- forever, by default. I 
>> actually added a configuration parameter called reopen_max, that can limit 
>> the number of retries. But there was pushback on changing the default value 
>> from 0, which is "retry forever".
>>
>> So what exactly do you think the system should do when a connection "goes 
>> away"? How long does it have to be gone to be considered gone for good? If 
>> the target comes back "later" should it get the same disk name? Should we 
>> retry, and if so how much before we give up? I'm interested in your views, 
>> since it seems like a non-trivial problem to me.
>>
>
> Well, for short disconnections the re-try approach is surely the better 
> one. But I naively assumed that a longer disconnection, as described by the 
> node.session.timeo.replacement_timeout parameter, would tear down the 
> device with a corresponding udev event. Udev should have no problem 
> assigning the device a sensible persistent name, right?
>  
>
>>
>> So you're saying as soon as a bad connection is detected (perhaps by a 
>> NOOP), the device should go away? 
>>
>
> I would say that the device should go away not at the first failed NOOP, 
> but when the replacement_timeout (or another sensible timeout) expires.
>
> This opens the door to another question: from the iscsid.conf 
> <https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99> 
> and README 
> <https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files 
> I (wrongly?) understand that replacement_timeout comes into play only when 
> the SCSI EH is running, while in other cases different timeouts, such as 
> node.session.err_timeo.lu_reset_timeout and 
> node.session.err_timeo.tgt_reset_timeout, should affect the 
> (dis)connection. However, in all my tests I only saw replacement_timeout 
> being honored, and yet I never caught a single running instance of SCSI EH 
> via the suggested command iscsiadm -m session -P 3
>
> What am I missing?
> Thanks.
>

Hi all, any thoughts regarding the point above?
Thanks. 
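
For reference, here is a minimal sketch of the iscsid.conf settings this 
thread is about. The names replacement_timeout, lu_reset_timeout and 
tgt_reset_timeout are the ones quoted above, and reopen_max = 0 ("retry 
forever") is as described by Lee. The noop_out_* keys, abort_timeout, the 
full node.session.reopen_max spelling and all the other values are my 
assumptions about the stock file, so please check them against your own copy.

```
# NOP-Out pings: interval between pings, and how long to wait for a reply
# before the connection is considered failed (both in seconds).
# Key names and values are assumed stock defaults.
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5

# Seconds to wait for the session to be re-established before outstanding
# commands are failed. Value is an assumed stock default.
node.session.timeo.replacement_timeout = 120

# SCSI error-handler timeouts, in seconds. Values are assumed stock defaults.
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30

# Maximum number of reconnect attempts; 0 means "retry forever".
# Full key spelling is an assumption.
node.session.reopen_max = 0
```

The resulting per-session state can then be inspected with 
iscsiadm -m session -P 3, as mentioned above.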
