Re: udev events for iscsi

2020-05-30 Thread Gionatan Danti

On Tuesday, April 28, 2020 at 23:15:31 UTC+2, Gionatan Danti wrote:

> Well, for short disconnections the re-try approach is surely the better
> one. But I naively assumed that a longer disconnection, as described by the
> node.session.timeo.replacement_timeout parameter, would tear down the
> device with a corresponding udev event. Udev should have no problem
> assigning the device a sensible persistent name, right?
>
> This opens the door to another question: from the iscsid.conf
> <https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99>
> and README
> <https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files
> I (wrongly?) understand that replacement_timeout comes into play only when
> the SCSI EH is running, while in the other cases different timeouts, such as
> node.session.err_timeo.lu_reset_timeout and
> node.session.err_timeo.tgt_reset_timeout, should affect the
> (dis)connection. However, in all my tests I only saw replacement_timeout
> being honored, yet I did not catch a single running instance of SCSI EH
> via the proposed command iscsiadm -m session -P 3.



Hi all, and sorry for the bump, but I would really like to understand the
two points above (especially the one regarding the various timeout values).
Can someone shed some light?
Thanks.
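To see which timeout values the kernel actually holds for a live session, one option is to read them from sysfs instead of iscsid.conf. This is a minimal sketch, assuming the iSCSI transport class exposes per-session attributes named state, recovery_tmo, lu_reset_tmo, tgt_reset_tmo and abort_tmo under /sys/class/iscsi_session/session*/ (as far as I know, recovery_tmo is where replacement_timeout ends up); check those names on your own kernel. The function name read_iscsi_sessions is hypothetical.

```python
import glob
import os

# Per-session attribute names ASSUMED to exist under
# /sys/class/iscsi_session/session*/ (verify on your kernel).
SESSION_ATTRS = ("state", "recovery_tmo", "lu_reset_tmo", "tgt_reset_tmo", "abort_tmo")


def read_iscsi_sessions(sysfs_root="/sys"):
    """Return {session_name: {attr: value}} for every iSCSI session found."""
    sessions = {}
    pattern = os.path.join(sysfs_root, "class", "iscsi_session", "session*")
    for path in sorted(glob.glob(pattern)):
        values = {}
        for attr in SESSION_ATTRS:
            try:
                with open(os.path.join(path, attr)) as f:
                    values[attr] = f.read().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel: skip it.
                continue
        sessions[os.path.basename(path)] = values
    return sessions


if __name__ == "__main__":
    for name, attrs in read_iscsi_sessions().items():
        print(name, attrs)
```

The sysfs_root parameter exists only so the function can be pointed at a fake directory tree for testing.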

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/open-iscsi/8ff6f813-44d1-4388-ba0a-85d5e54933e7%40googlegroups.com.


Re: udev events for iscsi

2020-04-28 Thread Gionatan Danti


On Tuesday, April 21, 2020 at 22:30:44 UTC+2, Gionatan Danti wrote:
>
>
> On Tuesday, April 21, 2020 at 20:44:22 UTC+2, The Lee-Man wrote:
>>
>>
>> Because of the design of iSCSI, there is no way for the initiator to know 
>> the server has gone away. The only time an initiator might figure this out 
>> is when it tries to communicate with the target.
>>
>> This assumes we are not using some sort of directory service, like iSNS, 
>> which can send asynchronous notifications. But even then, the iSNS server 
>> would have to somehow know that the target went down. If the target 
>> crashed, that might be difficult to ascertain.
>>
>> So in the absence of some asynchronous notification, the initiator only 
>> knows the target is not responding if it tries to talk to that target.
>>
>> Normally iscsid defaults to sending periodic NO-OPs to the target every 5 
>> seconds. So if the target goes away, the initiator usually notices, even if 
>> no regular I/O is occurring.
>>
>
> True.
>  
>
>>
>> But this is where the error recovery gets tricky, because iscsi tries to 
>> handle "lossy" connections. What if the server will be right back? Maybe 
>> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps 
>> trying to reconnect. As a matter of fact, if you stop iscsid and restart 
>> it, it sees the failed connection and retries it -- forever, by default. I 
>> actually added a configuration parameter called reopen_max, that can limit 
>> the number of retries. But there was pushback on changing the default value 
>> from 0, which is "retry forever".
>>
>> So what exactly do you think the system should do when a connection "goes 
>> away"? How long does it have to be gone to be considered gone for good? If 
>> the target comes back "later" should it get the same disc name? Should we 
>> retry, and if so how much before we give up? I'm interested in your views, 
>> since it seems like a non-trivial problem to me.
>>
>
> Well, for short disconnections the re-try approach is surely the better 
> one. But I naively assumed that a longer disconnection, as described by the 
> node.session.timeo.replacement_timeout parameter, would tear down the 
> device with a corresponding udev event. Udev should have no problem 
> assigning the device a sensible persistent name, right?
>  
>
>>
>> So you're saying as soon as a bad connection is detected (perhaps by a 
>> NOOP), the device should go away? 
>>
>
> I would say that the device should go away not at the first failing NOOP,
> but when the replacement_timeout (or another sensible timeout) expires.
>
> This opens the door to another question: from the iscsid.conf
> <https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99>
> and README
> <https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files
> I (wrongly?) understand that replacement_timeout comes into play only when
> the SCSI EH is running, while in the other cases different timeouts, such as
> node.session.err_timeo.lu_reset_timeout and
> node.session.err_timeo.tgt_reset_timeout, should affect the
> (dis)connection. However, in all my tests I only saw replacement_timeout
> being honored, yet I did not catch a single running instance of SCSI EH
> via the proposed command iscsiadm -m session -P 3.
>
> What am I missing?
> Thanks.
>

Hi all, any thoughts regarding the point above?
Thanks. 


Re: udev events for iscsi

2020-04-21 Thread Gionatan Danti

On Tuesday, April 21, 2020 at 20:44:22 UTC+2, The Lee-Man wrote:
>
>
> Because of the design of iSCSI, there is no way for the initiator to know 
> the server has gone away. The only time an initiator might figure this out 
> is when it tries to communicate with the target.
>
> This assumes we are not using some sort of directory service, like iSNS, 
> which can send asynchronous notifications. But even then, the iSNS server 
> would have to somehow know that the target went down. If the target 
> crashed, that might be difficult to ascertain.
>
> So in the absence of some asynchronous notification, the initiator only 
> knows the target is not responding if it tries to talk to that target.
>
> Normally iscsid defaults to sending periodic NO-OPs to the target every 5 
> seconds. So if the target goes away, the initiator usually notices, even if 
> no regular I/O is occurring.
>

True.
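The NO-OP timing described above can be turned into a rough detection window. A minimal sketch, assuming the relevant knobs are node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout from iscsid.conf (both 5 seconds by default, matching the "every 5 seconds" above), that no regular I/O is flowing, and ignoring network round-trip time. The function name is hypothetical.

```python
def noop_detection_window(noop_out_interval=5, noop_out_timeout=5):
    """Return (best_case, worst_case) seconds between a silent target failure
    and the initiator noticing it, in a simplified NOP-Out model."""
    if noop_out_interval <= 0 or noop_out_timeout <= 0:
        raise ValueError("this model assumes both values are positive")
    # Best case: the target dies just as a NOP-Out is sent, so only the
    # response timeout has to elapse before the failure is declared.
    best = noop_out_timeout
    # Worst case: the target dies right after a successful NOP-Out, so the
    # initiator waits a full interval for the next one, then the timeout.
    worst = noop_out_interval + noop_out_timeout
    return best, worst
```

With the defaults, noop_detection_window() returns (5, 10): under this model a silent target is noticed somewhere between 5 and 10 seconds after it disappears.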
 

>
> But this is where the error recovery gets tricky, because iscsi tries to 
> handle "lossy" connections. What if the server will be right back? Maybe 
> it's rebooting? Maybe the cable will be plugged back in? So iscsi keeps 
> trying to reconnect. As a matter of fact, if you stop iscsid and restart 
> it, it sees the failed connection and retries it -- forever, by default. I 
> actually added a configuration parameter called reopen_max, that can limit 
> the number of retries. But there was pushback on changing the default value 
> from 0, which is "retry forever".
>
> So what exactly do you think the system should do when a connection "goes 
> away"? How long does it have to be gone to be considered gone for good? If 
> the target comes back "later" should it get the same disc name? Should we 
> retry, and if so how much before we give up? I'm interested in your views, 
> since it seems like a non-trivial problem to me.
>
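The retry behavior described above can be modelled as a small loop. This is a sketch of the semantics as stated in this message (reopen_max = 0 meaning "retry forever", a positive value capping the attempts), not iscsid's actual code: the helper name reconnect and the try_login callback are hypothetical, the real daemon waits between attempts, and whether it counts the first attempt against reopen_max is not something this sketch claims to reproduce.

```python
import itertools


def reconnect(try_login, reopen_max=0):
    """Call try_login() until it returns True or the retry budget runs out.

    Returns the number of attempts made, or None if the budget was exhausted.
    """
    if reopen_max < 0:
        raise ValueError("reopen_max must be >= 0")
    # 0 -> unbounded attempt counter ("retry forever"); N > 0 -> attempts 1..N.
    attempts = itertools.count(1) if reopen_max == 0 else range(1, reopen_max + 1)
    for attempt in attempts:
        if try_login():
            return attempt
        # A real initiator would sleep or back off here before retrying.
    return None
```

The design point is visible in the first branch: with the default of 0 the loop only ends when the target comes back, which is why the device never "goes away" on its own.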

Well, for short disconnections the re-try approach is surely the better 
one. But I naively assumed that a longer disconnection, as described by the 
node.session.timeo.replacement_timeout parameter, would tear down the 
device with a corresponding udev event. Udev should have no problem 
assigning the device a sensible persistent name, right?
 

>
> So you're saying as soon as a bad connection is detected (perhaps by a 
> NOOP), the device should go away? 
>

I would say that the device should go away not at the first failing NOOP,
but when the replacement_timeout (or another sensible timeout) expires.
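Since the device is not torn down when replacement_timeout expires, one possible workaround is a userspace watchdog that polls the session state and, once a session has been down for longer than a chosen grace period, logs out explicitly (for example with iscsiadm -m node -T <target> -p <portal> -u), so that the logout produces the udev remove events already seen on a manual logout. The policy function below is hypothetical and not something open-iscsi provides; it assumes state strings such as "LOGGED_IN" and "FAILED" (the values the sysfs session state attribute is believed to report) and treats anything other than "LOGGED_IN" as down.

```python
def logout_deadline(samples, grace_seconds):
    """Given (timestamp, state) samples in time order, return the timestamp at
    which the session has been continuously not "LOGGED_IN" for at least
    grace_seconds, or None if that never happens."""
    if grace_seconds < 0:
        raise ValueError("grace_seconds must be >= 0")
    down_since = None
    for ts, state in samples:
        if state == "LOGGED_IN":
            down_since = None  # session recovered: reset the outage timer
        elif down_since is None:
            down_since = ts  # first sample of a new outage
        if down_since is not None and ts - down_since >= grace_seconds:
            return ts
    return None
```

Choosing grace_seconds at or above replacement_timeout keeps the current behavior for short outages and only tears the device down once queued I/O would have been failed anyway.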

This opens the door to another question: from the iscsid.conf
<https://github.com/open-iscsi/open-iscsi/blob/master/etc/iscsid.conf#L99>
and README
<https://github.com/open-iscsi/open-iscsi/blob/master/README#L1476> files I
(wrongly?) understand that replacement_timeout comes into play only when the
SCSI EH is running, while in the other cases different timeouts, such as
node.session.err_timeo.lu_reset_timeout and
node.session.err_timeo.tgt_reset_timeout, should affect the (dis)connection.
However, in all my tests I only saw replacement_timeout being honored, yet
I did not catch a single running instance of SCSI EH via the proposed
command iscsiadm -m session -P 3.
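To check which of these timeouts are actually configured, a minimal parser for the key = value format used by iscsid.conf is enough. This is a sketch and the function names are hypothetical. Keep in mind that, as far as I know, iscsid.conf values are copied into node records at discovery time, so an existing record (printed by iscsiadm -m node -T <target> -p <portal>) may hold different values; assuming that output uses the same key = value format, the same parser applies to it.

```python
def parse_iscsid_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments.
    Later assignments of the same key override earlier ones."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def timeout_settings(settings):
    """Keep only timeout-related keys, e.g. node.session.timeo.* and
    node.session.err_timeo.* (anything whose name contains 'timeo')."""
    return {k: v for k, v in settings.items() if "timeo" in k}
```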

What am I missing?
Thanks.


udev events for iscsi

2020-04-21 Thread gionatan . danti
Hi all,
I have a question regarding udev events when using iscsi disks.

By using "udevadm monitor" I can see that events are generated when I log in
to and log out from an iSCSI portal/resource, creating/destroying the
corresponding links under /dev/.

However, I cannot see anything when the remote machine simply
dies/reboots/disconnects: while "dmesg" shows the iSCSI timeout expiring, I
don't see anything about a removed disk (and, indeed, the links under /dev/
remain unaltered). Similarly, when the remote machine and disk become
available again, no reconnection events happen.
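A minimal sketch for filtering the relevant lines out of udevadm monitor, assuming the usual event line shape `UDEV  [timestamp] action devpath (subsystem)`; the regex and the function name block_events are my own, not part of udev. Piping `udevadm monitor --udev --subsystem-match=block` into a script that iterates over `block_events(sys.stdin)` should show add/remove lines on login/logout; in the scenario described above, no remove line would be expected when the target merely disappears.

```python
import re

# ASSUMED shape of a udevadm monitor event line, e.g.
# "UDEV  [1234.567890] remove   /devices/.../block/sdb (block)"
EVENT_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<ts>[\d.]+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)\s*$"
)


def block_events(lines, actions=("add", "remove")):
    """Yield (action, devpath) for UDEV block-subsystem events in `actions`."""
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # header text, property dumps, blank lines
        if m["source"] == "UDEV" and m["subsystem"] == "block" and m["action"] in actions:
            yield m["action"], m["devpath"]
```

Only UDEV lines are kept because those are emitted after rule processing, i.e. when the /dev/ links have actually been created or removed.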

I read a rather old thread here
where it was stated that a patch to better integrate iSCSI with udev events
was in progress. Has anything changed since then? Is the behavior I
observed (and described above) expected?

Thanks.


udev events for iscsi

2020-04-21 Thread Gionatan Danti
[reposting, as the previous one seems to be lost]

Hi all,
I have a question regarding udev events when using iscsi disks.

By using "udevadm monitor" I can see that events are generated when I log in
to and log out from an iSCSI portal/resource, creating/destroying the
corresponding links under /dev/.

However, I cannot see anything when the remote machine simply
dies/reboots/disconnects: while "dmesg" shows the iSCSI timeout expiring, I
don't see anything about a removed disk (and, indeed, the links under /dev/
remain unaltered). Similarly, when the remote machine and disk become
available again, no reconnection events happen.

I read here that, years ago, a patch was in progress to better integrate
with udev when a device disconnects/reconnects. Did the patch get merged?
Or is the behavior I described above still expected? Can it be changed?

Thanks.
