Hi Cathy:

You should really reply to the list, so others can comment, as well.

My comments are (well) below ...

> On Dec 20, 2017, at 3:04 PM, Cathy Zhou <[email protected]> wrote:
> 
> Lee-man,
> 
> Thanks for your comments. See inline:
> 
> On 12/20/2017 1:47 PM, The Lee-Man wrote:
>> 
>> 
>> On Friday, December 8, 2017 at 4:30:36 PM UTC-8, The Lee-Man wrote:
>> 
>>    On Thursday, December 7, 2017 at 11:49:53 AM UTC-8,
>>    [email protected] <mailto:[email protected]> wrote:
>> 
>>        From: Cathy Zhou <[email protected]>
>> 
>>        Co-existence of iscsid and iscsistart can lead to unexpected
>>        iSCSI boot
>>        behavior:
>> 
>>        We ran into iSCSI netboot failure when multiple interfaces are
>>        configured
>>        with dhcp. The problem is that iscsistart is not designed to
>>        be working
>>        together with iscsid. When an interface gets the dhcp offer
>>        successfully,
>>        the iscsiroot script is run which starts the iscsistart
>>        service to
>>        establish the iSCSI session. With the existence of iscsid, the
>>        iscsistart
>>        service's attempt to setup its own mgmt ipc fails. Instead,
>>        the request
>>        to login to the iscsi target is handled by the mgmt ipc of
>>        iscsid. After
>>        iscsistart finishes its login attempt, it eventually sends a
>>        stop_event_loop request to stop the mgmt process. As the
>>        result, it
>>        terminates iscsid.
>> 
>>        I don't think this behavior is expected. Further, we believe
>>        the restart
>>        of the iscsid service in iscsiroot.sh for multiple interfaces
>>        could also
>>        interfere with each other and race. That could also cause
>>        unexpected
>>        behavior.
>> 
>>        Specifically, in our multiple interface iSCSI boot case, below
>>        is what
>>        happened:
>> 
>>        a. eth0 successfully got the dhcp offer, the iscsiroot script
>>        was run and
>>        eventually ran "system-run" to start the "oneshot" iscsistart
>>        service.
>> 
>>        b. Before step a succeeded, the iscsiroot script was also run
>>        for eth1.
>>        It checks the status of the first iscsistart service instance,
>>        which was
>>        still "activating". So the iscsistart service was restarted
>>        and that
>>        killed the first instance. Note that in this case, the
>>        stop_event_loop
>>        request has not been sent, and iscsid was not terminated. As
>>        the result,
>>        the second iscsistart instance also failed because of the
>>        "existing
>>        session" error, as the half-baked session already existed in
>>        iscsid.
>> 
>>        Based on iscsistart(8), iscsistart's primary use is to start
>>        session
>>        used for iSCSI root boot, and is meant to work indpependently
>>        from iscsid.
>>        Making the two work together would need coordination between each
>>        other, which would be complicate and unnecessary.
>> 
>> 
>>    No, I believe iscsistart was mean to run *instead* of iscsid, not
>>    independently.
>> 
> That was what I meant: iscsistart is not to run together with iscsid.
>> 
>> 
>>        Therefore, to fix the issue, we'd either choose to remove the
>>        use of
>>        iscsid and solely use iscsistart or to remove iscsistart and
>>        use iscsid
>>        and iscsiadm to manage the iSCSI sessions.
>> 
>>        Our fix chooses the former. We tested the change in our setup.
>> 
>>        Signed-off-by: Cathy Zhou <[email protected]>
>>        ---
>>         modules.d/95iscsi/cleanup-iscsi.sh   |  2 +-
>>         modules.d/95iscsi/iscsiroot.sh       | 25
>>        +------------------------
>>         modules.d/95iscsi/module-setup.sh    | 30
>>        ------------------------------
>>         modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
>>         4 files changed, 2 insertions(+), 65 deletions(-)
>> 
>>        diff --git a/modules.d/95iscsi/cleanup-iscsi.sh
>>        b/modules.d/95iscsi/cleanup-iscsi.sh
>>        index bfc8aefc..e97d65ac 100755
>>        --- a/modules.d/95iscsi/cleanup-iscsi.sh
>>        +++ b/modules.d/95iscsi/cleanup-iscsi.sh
>>        @@ -1,4 +1,4 @@
>>         #!/bin/sh
>> 
>>        -[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] &&
>>        killproc iscsiuio
>>        +[ -e /sys/module/bnx2i ] && killproc iscsiuio
>> 
>> 
>>    Why are you removing the DRACUT_SYSTEMD checks?
>> 
> Because before my change, iscsiuio is started in the form of service as a 
> dependency of the iscsid service. My change removed the iscsid service, so 
> iscsiuio needs to be started explicitly for iSCSI offload for bnx2i driver.
>> 
>> 
>>        diff --git a/modules.d/95iscsi/iscsiroot.sh
>>        b/modules.d/95iscsi/iscsiroot.sh
>>        index aefd263d..c7f1c474 100755
>>        --- a/modules.d/95iscsi/iscsiroot.sh
>>        +++ b/modules.d/95iscsi/iscsiroot.sh
>>        @@ -36,7 +36,7 @@ iroot=${iroot#:}
>>         # figured out a way how to check whether this is built-in or not
>>         modprobe crc32c 2>/dev/null
>> 
>>        -if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] &&
>>        ! [ -e /tmp/iscsiuio-started ]; then
>>        +[ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
>>                 iscsiuio
>>                 > /tmp/iscsiuio-started
>>         fi
>>        @@ -117,11 +117,6 @@ handle_netroot()
>>                    mkdir -p /etc/iscsi
>>                    ln -fs /run/initiatorname.iscsi
>>        /etc/iscsi/initiatorname.iscsi
>>                    > /tmp/iscsi_set_initiator
>>        -           if [ -n "$DRACUT_SYSTEMD" ]; then
>>        -               systemctl try-restart iscsid
>>        -               # FIXME: iscsid is not yet ready, when the
>>        service is :-/
>>        -               sleep 1
>>        -           fi
>>             fi
>> 
>>             if [ -z "$iscsi_initiator" ]; then
>>        @@ -138,11 +133,6 @@ handle_netroot()
>>                 mkdir -p /etc/iscsi
>>                 ln -fs /run/initiatorname.iscsi
>>        /etc/iscsi/initiatorname.iscsi
>>                 > /tmp/iscsi_set_initiator
>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>        -            systemctl try-restart iscsid
>>        -            # FIXME: iscsid is not yet ready, when the
>>        service is :-/
>>        -            sleep 1
>>        -        fi
>>             fi
>> 
>> 
>>        @@ -163,11 +153,6 @@ handle_netroot()
>>             if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>>                 mkdir -p /etc/iscsi
>>                 ln -fs /run/initiatorname.iscsi
>>        /etc/iscsi/initiatorname.iscsi
>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>        -            systemctl try-restart iscsid
>>        -            # FIXME: iscsid is not yet ready, when the
>>        service is :-/
>>        -            sleep 1
>>        -        fi
>>             fi
>>         # FIXME $iscsi_protocol??
>> 
>>        @@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] &&
>>        getargbool 1 rd.iscsi.waitnet; then
>>             all_ifaces_setup || exit 0
>>         fi
>> 
>>        -if [ "$netif" = "timeout" ] && all_ifaces_setup; then
>>        -    # s.th <http://s.th>. went wrong and the timeout script hits
>>        -    # restart
>>        -    systemctl restart iscsid
>>        -    # damn iscsid is not ready after unit says it's ready
>>        -    sleep 2
>>        -fi
>>        -
>>         if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
>>             if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
>>                 handle_firmware
>>        diff --git a/modules.d/95iscsi/module-setup.sh
>>        b/modules.d/95iscsi/module-setup.sh
>>        index 04937b5b..1d185a84 100755
>>        --- a/modules.d/95iscsi/module-setup.sh
>>        +++ b/modules.d/95iscsi/module-setup.sh
>>        @@ -199,36 +199,6 @@ install() {
>>             inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
>>             if ! dracut_module_included "systemd"; then
>>                 inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
>>        -    else
>>        -        inst_multiple -o \
>>        -                      $systemdsystemunitdir/iscsi.service \
>>        -                      $systemdsystemunitdir/iscsid.service \
>>        -                      $systemdsystemunitdir/iscsid.socket \
>>        -                      $systemdsystemunitdir/iscsiuio.service \
>>        -                      $systemdsystemunitdir/iscsiuio.socket \
>>        -                      iscsiadm iscsid
>>        -
>>        -        mkdir -p
>>        "${initdir}/$systemdsystemunitdir/sockets.target.wants"
>>        -        for i in \
>>        -                iscsiuio.socket \
>>        -            ; do
>>        -            ln_r "$systemdsystemunitdir/${i}"
>>        "$systemdsystemunitdir/sockets.target.wants/${i}"
>>        -        done
>>        -
>>        -        mkdir -p
>>        "${initdir}/$systemdsystemunitdir/basic.target.wants"
>>        -        for i in \
>>        -                iscsid.service \
>>        -            ; do
>>        -            ln_r "$systemdsystemunitdir/${i}"
>>        "$systemdsystemunitdir/basic.target.wants/${i}"
>>        -        done
>>        -
>>        -        # Make sure iscsid is started after dracut-cmdline
>>        and ready for the initqueue
>>        -        mkdir -p
>>        "${initdir}/$systemdsystemunitdir/iscsid.service.d"
>>        -        (
>>        -            echo "[Unit]"
>>        -            echo "After=dracut-cmdline.service"
>>        -            echo "Before=dracut-initqueue.service"
>>        -        ) >
>>        "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
>> 
>> 
>>    I'm not sure why you've removed all of this.
>> 
> Because the above is added for the iscsid service, which is no longer needed.
>> 
>> 
>>             fi
>>             inst_dir /var/lib/iscsi
>>             dracut_need_initqueue
>>        diff --git a/modules.d/95iscsi/parse-iscsiroot.sh
>>        b/modules.d/95iscsi/parse-iscsiroot.sh
>>        index 43b2e088..c8c66ccf 100755
>>        --- a/modules.d/95iscsi/parse-iscsiroot.sh
>>        +++ b/modules.d/95iscsi/parse-iscsiroot.sh
>>        @@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d
>>        iscsi_initiator=) && [ -n "$arg" ] && ! [
>>             if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>>                 mkdir -p /etc/iscsi
>>                 ln -fs /run/initiatorname.iscsi
>>        /etc/iscsi/initiatorname.iscsi
>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>        -            systemctl try-restart iscsid
>>        -            # FIXME: iscsid is not yet ready, when the
>>        service is :-/
>>        -            sleep 1
>>        -        fi
>>             fi
>>         fi
>> 
>>        @@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f
>>        /sys/firmware/ibft/initiator/initiator-name ]
>>                 mkdir -p /etc/iscsi
>>                 ln -fs /run/initiatorname.iscsi
>>        /etc/iscsi/initiatorname.iscsi
>>                 > /tmp/iscsi_set_initiator
>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>        -            systemctl try-restart iscsid
>>        -            # FIXME: iscsid is not yet ready, when the
>>        service is :-/
>>        -            sleep 1
>>        -        fi
>>             fi
>>         fi
>> 
>>        --         2.11.1
>> 
>>    So in general you've stopped starting or restarting the iscsid
>>    service?
>> 
> Yes. Actually we do not even start the iscsid service from the beginning. The 
> whole fix is to remove iscsid but only run iscsistart.

Okay, I’m glad you made this clear, because this is the problem with your 
approach.

The iscsistart program is not sufficient for supporting an iSCSI initiator. 
Namely, it does not manage sessions, so it can’t do any error handling (i.e. 
log out, log back in).

The whole reason the iscsistart program was created was that, at iSCSI boot 
time, we needed a lighter-weight program than the iscsiadm/iscsid pair to 
connect to a single target.

>> 
>> 
>>    Have you tested this with iscsiuio/brcm? I have no way currently
>>    to test to make sure things all still work.
>> 
> We will, and also planning to add more test cases to get more iscsiboot 
> coverage. But because this is a significant change, we'd like to get some 
> comments to say this is the right approach.
>> 
>> 
>>    Note: dracut is *not* my specialty, but I have looked at this
>>    stuff in the SUSE distribution, and it seems like a bit of a house
>>    of cards. :(
>> 
> 
> I agree. dracut is not my specialty either.
> 
> Thank you very much!
> - Cathy
> 

Bottom line, I need to better understand why you are using iscsitart in the 
first place, if you are not booting from an iSCSI disc. And if you are using 
iscsistart, why are you running iscsid before iscsistart is done?

-- 
Lee Duncan

"Ambition is the last refuge of failure." -- Oscar Wilde


-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to