>>> Lee Duncan <[email protected]> schrieb am 21.11.2016 um 18:41 in Nachricht <[email protected]>:
>> On Nov 21, 2016, at 6:14 AM, Hannes Reinecke <[email protected]> wrote: >> >> On 11/19/2016 08:46 PM, The Lee-Man wrote: >>> In this wonderful new world of systemd, I have an issue with stopping >>> the iscsid service when the daemon has died or been killed. >>> >>> My setup: >>> * I have an iscsid.socket unit file, which is enabled and started >>> * I have an iscsid.service unit file, which controls the iscsid daemon. >>> This is disabled and not started >>> >>> Normally, if I run a command like "iscsiadm -m discovery -t st -p >>> SOME-TARGET", systemd notices that iscsiadm is trying to talk to iscsid >>> through the socket, and it starts up iscsid. This is the cool part >>> (IMHO) of systemd socket activation. >>> >>> When I want to stop iscsid, I can just tell systemctl to do it via >>> "systemctl stop iscsid", and it runs the "ExecStop=" command in the >>> service unit file, which is "iscsiadm -k 0 2" before terminating the >>> daemon process(es). >>> >>> [NOTE: the "2" here in this command actually does nothing and is >>> ignored, but I copied this from someplace else long ago, and the "2" was >>> present there.] >>> >>> It is of importance, in this case, that the ExecStop command actually >>> sends an IPC message to the daemon (iscsid) requesting it to cleanly >>> shut itself down. Herein lies the rub. >>> >>> All of this works great until the daemon happens not to be running. You >>> can simulate this with "kill -TERM $(pidof iscsid)" when the daemon is >>> running. Now you are in a situation where systemd started the service >>> and knows it is now not running, so it seems to send the ExecStop >>> command to cleanly shut it down. This command hangs! It seems to be >>> stuck in an infinite loop trying to send the shutdown command to the >>> daemon, waiting for it to timeout, then trying again. The daemon starts >>> up, sees an IPC error, and exits. >>> >>> While this clearly seems like a systemd issue, I have found a workaround >>> that looks clean. Well, as clean as the shutdown command is, anyway. >> >> No, that's not the case. >> This is a plain issue with iscsiadm; it's waiting in 'recv' to receive >> an response packet from iscsid which of course never will come. >> The reason why this becomes an issue now is that systemd holds the >> socket for us, so the socket is available; hence 'recv()' will not >> return with ENOTCONN or somesuch, indicating that the socket isn't present. >> The socket _is_ present, it's just not sending any responses back. >> >> Attached is a patch for open-iscsi to use 'poll()' before recv(), and >> terminating it if we didn't get a response in time. >> >> This doesn't do anything from the mentioned socket activation issue for >> iscsid, but at least iscsiadm doesn't hang anymore. >> >> Cheers, >> >> Hannes > > One issue with the patch. It looks like you might need a “break” added > after the “log_error()” when poll() fails, or it could be an infinite loop: > > + while (len) { > + struct pollfd pfd; > + > + pfd.fd = fd; > + pfd.events = POLLIN; > + err = poll(&pfd, 1, timeout); > + if (!err) { > + if (poll_wait) > + continue; > + return ISCSI_ERR_ISCSID_NOTCONN; > + } else if (err < 0) { > + if (errno == EINTR) > + continue; > + log_error("got poll error (%d/%d), daemon died?", > + err, errno); > + return ISCSI_ERR_ISCSID_COMM_ERR; > + } else if (!pfd.revents & POLLIN) > + continue; > + err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL); > + if (err < 0) { > + log_error("got read error (%d/%d), daemon died?", > + err, errno); > break; > <<===== Add a break here? > + } else { > + len -= err; > + iscsi_err = rsp->err; > + } > + } > > I can make that adjustment, if you agree it’s needed, when applying the > patch. I also think that a look with three "continue" and one "break" is a bit complicated. Maybe the fix was too "quick and dirty". Instead of the "break" a "len = 0" would also do. The recv() seems to be done only when err >= 0 and (pfd.events & POLLIN) != 0. I think some more "if .. else" could avoid those "continue"s Ulrich > -- > The Lee-Man > Prepare to be unprepared. -- Me > > -- > You received this message because you are subscribed to the Google Groups > "open-iscsi" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at https://groups.google.com/group/open-iscsi. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/d/optout.
