Re: RFC: iscsid shutdown hangs with system when service manually killed

Lee Duncan Mon, 21 Nov 2016 09:42:17 -0800

> On Nov 21, 2016, at 6:14 AM, Hannes Reinecke <[email protected]> wrote:
> 
> On 11/19/2016 08:46 PM, The Lee-Man wrote:
>> In this wonderful new world of systemd, I have an issue with stopping
>> the iscsid service when the daemon has died or been killed.
>> 
>> My setup:
>> * I have an iscsid.socket unit file, which is enabled and started
>> * I have an iscsid.service unit file, which controls the iscsid daemon.
>> This is disabled and not started
>> 
>> Normally, if I run a command like "iscsiadm -m discovery -t st -p
>> SOME-TARGET", systemd notices that iscsiadm is trying to talk to iscsid
>> through the socket, and it starts up iscsid. This is the cool part
>> (IMHO) of systemd socket activation.
>> 
>> When I want to stop iscsid, I can just tell systemctl to do it via
>> "systemctl stop iscsid", and it runs the "ExecStop=" command in the
>> service unit file, which is "iscsiadm -k 0 2" before terminating the
>> daemon process(es).
>> 
>> [NOTE: the "2" here in this command actually does nothing and is
>> ignored, but I copied this from someplace else long ago, and the "2" was
>> present there.]
>> 
>> It is of importance, in this case, that the ExecStop command actually
>> sends an IPC message to the daemon (iscsid) requesting it to cleanly
>> shut itself down. Herein lies the rub.
>> 
>> All of this works great until the daemon happens not to be running. You
>> can simulate this with "kill -TERM $(pidof iscsid)" when the daemon is
>> running. Now you are in a situation where systemd started the service
>> and knows it is now not running, so it seems to send the ExecStop
>> command to cleanly shut it down. This command hangs! It seems to be
>> stuck in an infinite loop trying to send the shutdown command to the
>> daemon, waiting for it to timeout, then trying again. The daemon starts
>> up, sees an IPC error, and exits.
>> 
>> While this clearly seems like a systemd issue, I have found a workaround
>> that looks clean. Well, as clean as the shutdown command is, anyway.
> 
> No, that's not the case.
> This is a plain issue with iscsiadm; it's waiting in 'recv' to receive
> an response packet from iscsid which of course never will come.
> The reason why this becomes an issue now is that systemd holds the
> socket for us, so the socket is available; hence 'recv()' will not
> return with ENOTCONN or somesuch, indicating that the socket isn't present.
> The socket _is_ present, it's just not sending any responses back.
> 
> Attached is a patch for open-iscsi to use 'poll()' before recv(), and
> terminating it if we didn't get a response in time.
> 
> This doesn't do anything from the mentioned socket activation issue for
> iscsid, but at least iscsiadm doesn't hang anymore.
> 
> Cheers,
> 
> Hannes


One issue with the patch. It looks like you might need a “break” added
after the “log_error()” when poll() fails, or it could be an infinite loop:

+       while (len) {
+               struct pollfd pfd;
+
+               pfd.fd = fd;
+               pfd.events = POLLIN;
+               err = poll(&pfd, 1, timeout);
+               if (!err) {
+                       if (poll_wait)
+                               continue;
+                       return ISCSI_ERR_ISCSID_NOTCONN;
+               } else if (err < 0) {
+                       if (errno == EINTR)
+                               continue;
+                       log_error("got poll error (%d/%d), daemon died?",
+                                 err, errno);
+                       return ISCSI_ERR_ISCSID_COMM_ERR;
+               } else if (!pfd.revents & POLLIN)
+                       continue;
+               err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
+               if (err < 0) {
+                       log_error("got read error (%d/%d), daemon died?",
+                                 err, errno);
                       break;                                                   
      <<===== Add a break here?
+               } else {
+                       len -= err;
+                       iscsi_err = rsp->err;
+               }
+       }

I can make that adjustment, if you agree it’s needed, when applying the patch.
-- 
The Lee-Man
Prepare to be unprepared. -- Me

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Re: RFC: iscsid shutdown hangs with system when service manually killed

Reply via email to