On Tuesday, 21 April 2020 at 22:30:44 UTC+2, Gionatan Danti wrote:
>
> On Tuesday, 21 April 2020 at 20:44:22 UTC+2, The Lee-Man wrote:
>>
>> Because of the design of iSCSI, there is no way for the initiator to know
>> the server has gone away. The only time an initiator might figure this out
>> is when it tries to communicate with the target.
>>
>> This assumes we are not using some sort of directory service, like iSNS,
>> which can send asynchronous notifications. But even then, the iSNS server
>> would have to somehow know that the target went down. If the target
>> crashed, that might be difficult to ascertain.
>>
>> So in the absence of some asynchronous notification, the initiator only
>> knows the target is not responding if it tries to talk to that target.
>>
>> Normally iscsid defaults to sending periodic NO-OPs to the target every 5
>> seconds. So if the target goes away, the initiator usually notices, even if
>> no regular I/O is occurring.
>>
>
> True.
>
>>
>> But this is where the error recovery gets tricky, because iscsi tries to
>> handle "lossy" connections. What if the server will be right back? Maybe
>> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps
>> trying to reconnect. As a matter of fact, if you stop iscsid and restart
>> it, it sees the failed connection and retries it -- forever, by default. I
>> actually added a configuration parameter called reopen_max, that can limit
>> the number of retries. But there was pushback on changing the default value
>> from 0, which is "retry forever".
>>
>> So what exactly do you think the system should do when a connection "goes
>> away"? How long does it have to be gone to be considered gone for good? If
>> the target comes back "later" should it get the same disc name? Should we
>> retry, and if so how much before we give up? I'm interested in your views,
>> since it seems like a non-trivial problem to me.
>>
>
> Well, for short disconnections the retry approach is surely the better
> one. But I naively assumed that a longer disconnection, as described by the
> node.session.timeo.replacement_timeout parameter, would tear down the
> device with a corresponding udev event. Udev should have no problem
> assigning the device a sensible persistent name, right?
>
>>
>> So you're saying as soon as a bad connection is detected (perhaps by a
>> NOOP), the device should go away?
>>
>
> I would say that the device should go away not at the first failing NOOP,
> but when the replacement_timeout (or another sensible timeout) expires.
>
> This opens the door to another question: from the iscsid.conf
> <https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99>
> and README
> <https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files
> I (wrongly?) understand that replacement_timeout comes into play only when
> the SCSI EH is running, while in the other cases different timeouts, such as
> node.session.err_timeo.lu_reset_timeout and
> node.session.err_timeo.tgt_reset_timeout, should affect the
> (dis)connection. However, in all my tests I only saw replacement_timeout
> being honored, and I never caught a single running instance of the SCSI EH
> via the proposed command iscsiadm -m session -P 3
>
> What am I missing?
> Thanks.
>
Hi all, any thoughts regarding the point above?
Thanks.
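
To make the timeline in question concrete, here is a minimal Python sketch. It parses iscsid.conf-style "key = value" lines and adds up a rough upper bound on how long a vanished target can go unnoticed before queued I/O is failed. Everything beyond node.session.timeo.replacement_timeout is an assumption for illustration: the two node.conn[0].timeo.noop_out_* key names, the fallback values, and above all the simple additive model (NOOP interval + NOOP timeout + replacement timeout). It is not a description of how iscsid or the kernel actually schedule these timers.

```python
def parse_iscsid_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


def seconds_until_io_fails(settings):
    """Rough upper bound, in seconds, under the additive model described above.

    The key names for the NOOP settings and all fallback values are
    assumptions made for this sketch, not values taken from the thread.
    """
    interval = int(settings.get("node.conn[0].timeo.noop_out_interval", 5))
    timeout = int(settings.get("node.conn[0].timeo.noop_out_timeout", 5))
    replacement = int(
        settings.get("node.session.timeo.replacement_timeout", 120)
    )
    # Worst case: the target dies right after a NOOP was answered, so one
    # full interval passes before the next NOOP is sent, that NOOP then
    # times out, and only then does the replacement timer start counting.
    return interval + timeout + replacement


# Hypothetical excerpt, not copied from a real configuration file.
SAMPLE = """\
# timeouts relevant to a vanished target
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.timeo.replacement_timeout = 120
"""

if __name__ == "__main__":
    print(seconds_until_io_fails(parse_iscsid_conf(SAMPLE)))
```

If the model holds, lowering replacement_timeout is the only one of these knobs that meaningfully shortens the window, which matches the observation that it was the only timeout seen being honored.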
