>>> The Lee-Man <[email protected]> wrote on 21.04.2020 at 20:44 in message
<618_1587494664_5E9F3F08_618_445_1_7f583720-8a84-4872-8d1a-5cd284295c22@googlegroups.com>:
> On Tuesday, April 21, 2020 at 12:31:24 AM UTC-7, Gionatan Danti wrote:
>>
>> [reposting, as the previous one seems to be lost]
>>
>> Hi all,
>> I have a question regarding udev events when using iscsi disks.
>>
>> By using "udevadm monitor" I can see that events are generated when I 
>> login and logout from an iscsi portal/resource, creating/destroying the 
>> relative links under /dev/
>>
>> However, I cannot see anything when the remote machine simply 
>> dies/reboots/disconnects: while "dmesg" shows the iscsi timeout expiring, I 
>> don't see anything about a removed disk (and the links under /dev/ remain 
>> unaltered, indeed). At the same time, when the remote machine and disk 
>> become available again, no reconnection events happen.
>>
> 
> Because of the design of iSCSI, there is no way for the initiator to know 
> the server has gone away. The only time an initiator might figure this out 
> is when it tries to communicate with the target.

My knowledge of the SCSI stack is quite poor, but I think the last revisions of 
parallel SCSI (like Ultra 320 (or was it 160?)) had a concept of "domain 
validation". AFAIK the latter meant measuring the quality of the wires and 
adjusting the transfer speed accordingly.
While SCSI basically assumes "the bus" won't go away magically, a future iSCSI 
standard might contain regular "bus checks" to trigger recovery actions if the 
"bus" (network transport connection) seems to be gone.

> 
> This assumes we are not using some sort of directory service, like iSNS, 
> which can send asynchronous notifications. But even then, the iSNS server 
> would have to somehow know that the target went down. If the target 
> crashed, that might be difficult to ascertain.

To be picky: If the target went down (like a classical failing SCSI disk), it 
could issue some attention message, but when the transport went down, no such 
message can be received. So I think there's a difference between "target down" 
(device not present, device fails to respond) and "bus down" (no communication 
possible any more). In the second case no assumptions can be made about the 
health of the target device.

> 
> So in the absence of some asynchronous notification, the initiator only 
> knows the target is not responding if it tries to talk to that target.
> 
> Normally iscsid defaults to sending periodic NO-OPs to the target every 5 
> seconds. So if the target goes away, the initiator usually notices, even if 
> no regular I/O is occurring.

So the target went away, or the bus went down?
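
To put a rough number on how quickly the NO-OP mechanism quoted above can 
notice a dead path, here is a minimal sketch. It assumes a simplified model 
(my assumption, not open-iscsi code): a ping is sent every "interval" seconds 
and the connection is declared failed once a ping has gone unanswered for 
"timeout" seconds. The function name is made up for illustration. Note that 
the result says nothing about whether the target or the "bus" is gone; the 
initiator only learns that nobody answered.

```python
from typing import Tuple


def noop_detection_window(interval: float, timeout: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds from path loss to detection.

    Simplified model: pings are sent every `interval` seconds, and an
    unanswered ping is treated as a failure after `timeout` seconds.
    """
    if interval <= 0 or timeout <= 0:
        raise ValueError("interval and timeout must be positive")
    # Best case: the path dies just before the next ping is sent,
    # so only the reply timeout has to elapse.
    best_case = timeout
    # Worst case: the path dies just after a ping was answered, so a full
    # interval passes before the next ping, plus the reply timeout.
    worst_case = interval + timeout
    return best_case, worst_case


if __name__ == "__main__":
    print(noop_detection_window(5, 5))
```

With the 5-second interval mentioned above and an assumed 5-second reply 
timeout, this model gives a detection window of roughly 5 to 10 seconds.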

> 
> But this is where the error recovery gets tricky, because iscsi tries to 
> handle "lossy" connections. What if the server will be right back? Maybe 
> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps 
> trying to reconnect. As a matter of fact, if you stop iscsid and restart 
> it, it sees the failed connection and retries it -- forever, by default. I 
> actually added a configuration parameter called reopen_max, that can limit 
> the number of retries. But there was pushback on changing the default value 
> from 0, which is "retry forever".
> 
> So what exactly do you think the system should do when a connection "goes 
> away"? How long does it have to be gone to be considered gone for good? If 
> the target comes back "later" should it get the same disc name? Should we 
> retry, and if so how much before we give up? I'm interested in your views, 
> since it seems like a non-trivial problem to me.

IMHO a "bus down" is a critical event affecting _all_ devices on that bus, not 
just a single target. Well, it might be some extra noise if those other targets 
have no I/O outstanding, but it's better to know that the bus is down before 
initiating a transfer rather than concluding seconds later that the target 
seems unreachable for some reasons unknown.
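
As a rough illustration of "knowing before initiating a transfer": on Linux 
the iSCSI transport class exposes a per-session state file in sysfs, so a 
tool could check it before issuing I/O. The sketch below assumes the layout 
/sys/class/iscsi_session/session*/state and that a healthy session reports 
"LOGGED_IN"; please verify both on your kernel. The function names are mine.

```python
from pathlib import Path
from typing import Dict, List


def read_session_states(sysfs_root: str = "/sys/class/iscsi_session") -> Dict[str, str]:
    """Map each session directory name to the content of its 'state' file."""
    states: Dict[str, str] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No sessions, or the iSCSI transport class is not loaded.
        return states
    for session_dir in sorted(root.glob("session*")):
        try:
            states[session_dir.name] = (session_dir / "state").read_text().strip()
        except OSError:
            # The session may vanish while we are scanning.
            states[session_dir.name] = "UNKNOWN"
    return states


def sessions_not_logged_in(states: Dict[str, str]) -> List[str]:
    """Return the names of sessions whose state is anything but LOGGED_IN."""
    return [name for name, state in sorted(states.items()) if state != "LOGGED_IN"]


if __name__ == "__main__":
    for name in sessions_not_logged_in(read_session_states()):
        print(f"{name}: transport not usable right now")
```

This only reports what the initiator already believes; it does not detect a 
failure any earlier than the NO-OP or I/O timeouts do.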

> 
>>
>> I can read here that, years ago, a patch was in progress to give better 
>> integration with udev when a device disconnects/reconnects. Did the patch 
>> get merged? Or does the behavior I described above remain the expected one? 
>> Can it be changed?
>>
> 
> So you're saying as soon as a bad connection is detected (perhaps by a 
> NOOP), the device should go away? 

Maybe the state should be similar to a device being in power-save mode: It's 
not accessible right now, but should be woken up ASAP. See my earlier comparison 
to NFS hard-mounts...
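
To make the comparison concrete, here is a minimal sketch of the retry policy 
discussed above, where a limit of 0 means "retry forever" (much like a hard 
mount) and any positive value bounds the number of attempts. Only the meaning 
of 0 is taken from the quoted description of reopen_max; the function names, 
the fixed delay, and the try_login callback are hypothetical and not the 
iscsid implementation.

```python
import time
from typing import Callable, Optional


def should_retry(attempts_so_far: int, reopen_max: int = 0) -> bool:
    """Decide whether another reconnect attempt is allowed.

    reopen_max == 0 means retry forever; a positive value is a hard limit.
    """
    if reopen_max < 0:
        raise ValueError("reopen_max must be >= 0")
    if reopen_max == 0:
        return True
    return attempts_so_far < reopen_max


def reconnect(
    try_login: Callable[[], bool],
    reopen_max: int = 0,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call try_login until it succeeds or the retry limit is reached.

    Returns the number of attempts used on success, or None after giving up.
    """
    attempts = 0
    while should_retry(attempts, reopen_max):
        attempts += 1
        if try_login():
            return attempts
        sleep(delay)  # hypothetical fixed pause between attempts
    return None
```

With reopen_max=0 the device stays "asleep" until the target answers again; 
with a positive limit the caller gets None back and can decide to remove the 
device.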

Regards,
Ulrich

> 
>>
>> Thanks.
>>
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/open-iscsi/7f583720-8a84-4872-8d1a-5cd284295c22%40googlegroups.com.



