Antw: Re: RFC: iscsid shutdown hangs with system when service manually killed

Ulrich Windl Mon, 21 Nov 2016 23:44:55 -0800

>>> Lee Duncan <[email protected]> schrieb am 21.11.2016 um 18:41 in
Nachricht <[email protected]>:


>> On Nov 21, 2016, at 6:14 AM, Hannes Reinecke <[email protected]> wrote:
>> 
>> On 11/19/2016 08:46 PM, The Lee-Man wrote:
>>> In this wonderful new world of systemd, I have an issue with stopping
>>> the iscsid service when the daemon has died or been killed.
>>> 
>>> My setup:
>>> * I have an iscsid.socket unit file, which is enabled and started
>>> * I have an iscsid.service unit file, which controls the iscsid daemon.
>>> This is disabled and not started
>>> 
>>> Normally, if I run a command like "iscsiadm -m discovery -t st -p
>>> SOME-TARGET", systemd notices that iscsiadm is trying to talk to iscsid
>>> through the socket, and it starts up iscsid. This is the cool part
>>> (IMHO) of systemd socket activation.
>>> 
>>> When I want to stop iscsid, I can just tell systemctl to do it via
>>> "systemctl stop iscsid", and it runs the "ExecStop=" command in the
>>> service unit file, which is "iscsiadm -k 0 2" before terminating the
>>> daemon process(es).
>>> 
>>> [NOTE: the "2" here in this command actually does nothing and is
>>> ignored, but I copied this from someplace else long ago, and the "2" was
>>> present there.]
>>> 
>>> It is of importance, in this case, that the ExecStop command actually
>>> sends an IPC message to the daemon (iscsid) requesting it to cleanly
>>> shut itself down. Herein lies the rub.
>>> 
>>> All of this works great until the daemon happens not to be running. You
>>> can simulate this with "kill -TERM $(pidof iscsid)" when the daemon is
>>> running. Now you are in a situation where systemd started the service
>>> and knows it is now not running, so it seems to send the ExecStop
>>> command to cleanly shut it down. This command hangs! It seems to be
>>> stuck in an infinite loop trying to send the shutdown command to the
>>> daemon, waiting for it to timeout, then trying again. The daemon starts
>>> up, sees an IPC error, and exits.
>>> 
>>> While this clearly seems like a systemd issue, I have found a workaround
>>> that looks clean. Well, as clean as the shutdown command is, anyway.
>> 
>> No, that's not the case.
>> This is a plain issue with iscsiadm; it's waiting in 'recv' to receive
>> an response packet from iscsid which of course never will come.
>> The reason why this becomes an issue now is that systemd holds the
>> socket for us, so the socket is available; hence 'recv()' will not
>> return with ENOTCONN or somesuch, indicating that the socket isn't
present.
>> The socket _is_ present, it's just not sending any responses back.
>> 
>> Attached is a patch for open-iscsi to use 'poll()' before recv(), and
>> terminating it if we didn't get a response in time.
>> 
>> This doesn't do anything from the mentioned socket activation issue for
>> iscsid, but at least iscsiadm doesn't hang anymore.
>> 
>> Cheers,
>> 
>> Hannes
> 
> One issue with the patch. It looks like you might need a “break” added
> after the “log_error()” when poll() fails, or it could be an infinite
loop:
> 
> +     while (len) {
> +             struct pollfd pfd;
> +
> +             pfd.fd = fd;
> +             pfd.events = POLLIN;
> +             err = poll(&pfd, 1, timeout);
> +             if (!err) {
> +                     if (poll_wait)
> +                             continue;
> +                     return ISCSI_ERR_ISCSID_NOTCONN;
> +             } else if (err < 0) {
> +                     if (errno == EINTR)
> +                             continue;
> +                     log_error("got poll error (%d/%d), daemon died?",
> +                               err, errno);
> +                     return ISCSI_ERR_ISCSID_COMM_ERR;
> +             } else if (!pfd.revents & POLLIN)
> +                     continue;
> +             err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
> +             if (err < 0) {
> +                     log_error("got read error (%d/%d), daemon died?",
> +                               err, errno);
>                        break;                                               

>          <<===== Add a break here?
> +             } else {
> +                     len -= err;
> +                     iscsi_err = rsp->err;
> +             }
> +     }
> 
> I can make that adjustment, if you agree it’s needed, when applying the 
> patch.

I also think that a look with three "continue" and one "break" is a bit
complicated. Maybe the fix was too "quick and dirty". Instead of the "break" a
"len = 0" would also do. The recv() seems to be done only when err >= 0 and
(pfd.events & POLLIN) != 0.
I think some more "if .. else" could avoid those "continue"s

Ulrich



> -- 
> The Lee-Man
> Prepare to be unprepared. -- Me
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.



-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Antw: Re: RFC: iscsid shutdown hangs with system when service manually killed

Reply via email to