On Wed, Oct 12, 2011 at 08:27:53AM -0500, Dan McGee wrote: > On Tue, Oct 11, 2011 at 1:33 PM, Lukas Fleischer > <[email protected]> wrote: > > On Tue, Oct 11, 2011 at 07:03:13PM +0200, Lukas Fleischer wrote: > >> On Tue, Oct 11, 2011 at 11:29:31AM -0500, Dan McGee wrote: > >> > On Tue, Oct 11, 2011 at 11:13 AM, Lukas Fleischer > >> > <[email protected]> wrote: > >> > > There's no need to keep the lock file if we interrupt repo-{add,remove} > >> > > using ^C (same applies to other signals). > >> > Definite +1 here. > >> > > >> > > In contrast to a clean exit, we do not `rm -rf` the temporary directory > >> > > here, so that it's still possible to analyze what went wrong. > >> > I'm not sure what the right answer is here- I see where you're coming > >> > from, but given that we give the user no indication we're leaving a > >> > mess behind, I think we should either clean it up or tell the user we > >> > did not. Also noticed that we print an error message with TERM even > >> > when HUP and QUIT are caught. > >> > >> I'm not really sure either, but I'd say that showing a message that > >> includes "$tmpdir" might be the best thing to do here. > >> > >> I also submitted a separate patch that fixes the SIGTERM message issue. > > > > Ugh. It's a bit more complicated, even. Given that we use exit at the > > very end of trap_exit(), this should already trigger clean_up() > > automatically. > Ahh, true. Odd I didn't think about this before. So this only happens > when? If someone hits Ctrl-C in quick succession just right?
Well, I was able to reproduce that a couple of times pressing ^C once. It only works as of a certain moment, tough (probably after we entered clean_up() - that's by best guess anyway): ---- $ ./repo-add extra.db.tar.gz foobar ==> Extracting database to a temporary location... ^C ==> ERROR: Aborted by user! Exiting... $ ./repo-add extra.db.tar.gz foobar ==> Extracting database to a temporary location... ==> ERROR: File 'foobar' not found. ==> No packages modified, nothing to do. ^C ==> ERROR: Aborted by user! Exiting... $ ./repo-add extra.db.tar.gz foobar ==> ERROR: Failed to acquire lockfile: extra.db.tar.gz.lck. ==> ERROR: Held by process 18522 ---- > > > The only case where this doesn't work is if we send a > > signal when clean_up() is already being executed (it's not too hard to > > reproduce that, actually). So just copying the two lines that unlink the > > lock file and remove the directory from clean_up() might be the easiest > > way to go? > I'm not sure we can totally eliminate all race conditions. Can we get > closer by unhooking all traps at the start of both trap_exit() and > clean_up(), and then explicitly calling clean_up() from trap_exit()? I thought of unhooking traps as well but I'm not too enthusiastic about that. We `rm -r` a directory in clean_up() and if this ever happens to hang for some reason, there will be no way to cancel, except for sending SIGKILL. > The bash manpage is woefully non-descriptive about signal handlers and > their reentry or recall procedure- I'm assuming you could > theoretically keep sending signals fast enough that signal handlers > pile up on the stack and none of them ever complete, so attempting to > unhook as soon as possible and not depend on the signal handlers being > in place anymore would at least prevent that stack growth. Doing some more tests, it seems like signals are blocked if the signal handler for a particular signal is already running. So, basically just call clean_up() from trap_exit() (or copy-paste the two lines, like I suggested before)?
