On Saturday, August 8, 2020 at 12:55:22 AM UTC+3 The Lee-Man wrote:

> On Monday, July 27, 2020 at 10:38:05 AM UTC-7, Amit Bawer wrote:
>>
>> Thank you for your answers,
>>
>> The motivation behind the original question is to reduce the total 
>> waiting time across multiple iSCSI connection logins
>> in case some of the portals are down.
>>
>> We have a limitation on our RHEV system where all logins to the listed 
>> iSCSI targets must finish within 180 seconds in total.
>> In our current implementation we serialize the iscsiadm node logins one 
>> after the other, each for a specific target and portal. In this scheme, 
>> each login waits 120 seconds when a portal is down (default 15-second 
>> login timeout * 8 login retries), so if we have 2 or more connections 
>> down we spend at least 240 seconds, which exceeds our 180-second limit, 
>> and the entire operation is considered failed (RHEV-wise).
>>
>
> Of course these times are tunable, as the README distributed with 
> open-iscsi suggests. But each setting has a trade-off. For example, if you 
> shorten the timeout, you may miss connecting to a target that is just 
> temporarily unreachable. 
>
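For reference, these can be shortened per node with --op=update before 
logging in; a minimal sketch, using the same run() helper as in the test 
script below (the values 5 and 1 are illustrative only):

def shorten_login_timeouts(target, portal):
    # Lower the per-attempt login timeout (default 15 seconds).
    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=update",
        "--name", "node.conn[0].timeo.login_timeout",
        "--value", "5"])

    # Lower the number of login retries (default 8).
    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=update",
        "--name", "node.session.initial_login_retry_max",
        "--value", "1"])
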
>>
>> Testing [1] different login schemes is summarized in the following table 
>> (logins to 2 targets with 2 portals each).
>> It seems that logging in to all nodes after creating them, as suggested 
>> in the previous answer here, is comparable in total time to logging in 
>> to specific nodes concurrently (i.e. running iscsiadm -m node -T target 
>> -p portal -I interface -l in parallel for each target-portal pair), 
>> both when all portals are online and when one portal is down:
>>
>> Login scheme               Online Portals   Active Sessions   Total Login Time (seconds)
>> -----------------------------------------------------------------------------------------
>> All at once                2/2              4                 2.1
>> All at once                1/2              2                 120.2
>> Serial target-portal       2/2              4                 8.5
>> Serial target-portal       1/2              2                 243.5
>> Concurrent target-portal   2/2              4                 2.1
>> Concurrent target-portal   1/2              2                 120.1
>>
>
> So it looks like "All at once" is as fast as concurrent? I must be missing 
> something. Maybe I'm misunderstanding what "all at once" means? 
>

To illustrate from the test discussed above, "all at once" means calling 
login_all() once, after calling new_node(...) for each listed target and 
portal, as shown below:
...
    for target, portal in connections:
        new_node(target, portal)

    if args.concurrency:
        login_threads(connections, args.concurrency)
    else:
        login_all()
...

def new_node(target, portal):
    logging.info("Adding node for target %s portal %s", target, portal)

    # Create the node record in the iscsi node database.
    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=new"])

    # Mark the node for manual startup so it is not logged in
    # automatically at boot and is picked up by --loginall=manual.
    run([
        "iscsiadm",
        "--mode", "node",
        "--targetname", target,
        "--interface", "default",
        "--portal", portal,
        "--op=update",
        "--name", "node.startup",
        "--value", "manual"])

def login_all():
    logging.info("Login to all nodes")
    try:
        run(["iscsiadm", "--mode", "node", "--loginall=manual"])
    except Error as e:
        # Exit code 8 (ISCSI_ERR_TRANS_TIMEOUT) is the expected timeout
        # error when some portals are disconnected.
        if e.rc != 8:
            raise
        logging.error("Some logins failed: %s", e)
 

>
>> Using concurrent target-portal logins seems preferable from our 
>> perspective, as it lets us connect only to the specified targets and 
>> portals, without the risk of intermixing with other potential iSCSI 
>> targets.
>>
>
> Okay, maybe that explains it. You don't trust the "all" option? You are, 
> after all, in charge of the node database. But of course that's your 
> choice. 
>
It's more about safety, I guess, since the connection flow may run on a 
machine that has other iSCSI connections set up outside of, or alongside, 
this flow.

>
>> The node creation part is kept serial in all tests here, since we have 
>> seen that it may result in iscsi DB issues if run in parallel.
>> But running only the node logins in parallel doesn't seem to cause 
>> issues, for at least 1000 tries in our tests.
>>
>
> In general the heavy lifting here is done by the kernel, which has proper 
> multi-thread locking. And I believe iscsiadm has a single lock to the 
> kernel communication socket, so that doesn't get messed up. So I wouldn't 
> go as far as guaranteeing that this will work, but I agree it certainly 
> seems to reliably work. 
>
>>
>> The question to be asked here: is this advisable by open-iscsi?
>> I know I have already been answered that iscsiadm is racy, but does 
>> that apply to node logins as well?
>>
>
> I guess I answered that. I wouldn't advise against it, but I also wouldn't 
> call it best practice in general. 
>
>>
>> The other option is to use one login-all call without parallelism, but 
>> that would have other implications on our system to consider.
>>
>
> Such as? 
>
As mentioned above, the risk of intermixing with other iSCSI targets 
configured on the machine, unless there is a way to specify a list of 
targets and portals for a single login (all) command.

>
>> Your answers would be helpful once again.
>>
>> Thanks,
>> - Amit
>>
>>
> You might be interested in a new feature I'm considering adding to 
> iscsiadm to do asynchronous logins. In other words, when asked to log in 
> to one or more targets, iscsiadm would send the login request to the 
> targets, then return success immediately. It is then up to the end-user 
> (you in this case) to poll for when the target actually shows up.
>
This sounds very interesting, but it will probably be available to us only 
in later RHEL releases, if it is chosen to be delivered downstream.
At present it seems we can only use the login-all way, or logins in 
dedicated threads per target-portal.

>
> This would mean that your system boot could occur much more quickly, 
> especially when using, for example, multipathing on top of two paths to a 
> target where one path is not up. The problem is that this adds a layer of 
> functionality needed in the client (again, you in this case), since the 
> client has to poll for success, handle timeouts, etc. Also, this is just 
> test code, so you could try it at your own risk. :)
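For what it's worth, the polling on our side could look roughly like the 
sketch below; hypothetical code, assuming that checking the output of 
"iscsiadm -m session" for each target IQN is enough to tell when a target 
has shown up:

import subprocess
import time

def wait_for_sessions(targets, timeout=180, interval=1):
    # Poll the active session list until every target IQN appears or
    # the deadline passes. "iscsiadm -m session" exits non-zero while
    # there are no sessions yet, so the exit code is not checked here.
    deadline = time.monotonic() + timeout
    pending = set(targets)
    while pending:
        out = subprocess.run(
            ["iscsiadm", "--mode", "session"],
            capture_output=True,
            text=True).stdout
        pending = {t for t in pending if t not in out}
        if pending:
            if time.monotonic() >= deadline:
                raise RuntimeError(
                    "Timed out waiting for targets: %s" % sorted(pending))
            time.sleep(interval)
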
>
> If interested, let me know, and I'll point you at a repo:branch 
>
