Re: [ccache] ccache interrupt handling bug

2015-08-30 Thread Nadav Har'El
On Sat, Aug 29, 2015 at 8:01 PM, Joel Rosdahl j...@rosdahl.net wrote:

 On 20 August 2015 at 21:04, Joel Rosdahl j...@rosdahl.net wrote:

 I will not have time to work on improving the signal handler until next
 week at the earliest.


 I've now had a look at improving the signal handler. It would be
 appreciated if somebody wants to review it:
 https://git.samba.org/?p=ccache.git;a=commitdiff;h=737c96fabeab9d66407758b848468aa49190c946


Hi,

Looking at this code, I don't see any obvious holes and it all seems
correct.

One thing you should perhaps consider is whether this change (using
sigaction, signal sets, instead of the antique signal(), etc.) might break
compilation on some antique UNIX machines or on MS-Windows (!?) or
something. But I doubt any present-day system actually lacks sigaction()
and friends, so I think your code is good.




-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] ccache interrupt handling bug

2015-08-16 Thread Nadav Har'El
On Sun, Aug 9, 2015 at 11:23 PM, Joel Rosdahl j...@rosdahl.net wrote:

 Hi Nadav,

 (Sorry for the delayed reply.)

 Hi, I found a bug in ccache, which makes it impossible to
 correctly interrupt a compilation with a control-C (I tried this on Linux).
 [...]


 Thanks for the bug report and analysis. This will be fixed in ccache 3.2.3.

 Regards,
 -- Joel


Thanks. I see now the fix in 3.2.3, which if I understand correctly, is
simply:

--- a/ccache.c
+++ b/ccache.c
@@ -341,6 +341,7 @@ signal_handler(int signo)
 {
(void)signo;
clean_up_pending_tmp_files();
+   _exit(1);
 }

First of all, thanks. This indeed fixes the bug, and is exactly the first
patch I tried to fix it this problem.

I want to explain, though, why I ended up sending a different patch for
this problem - a 4-line patch instead of this one-liner. The explanation is
not very important, and I should probably have done this earlier, but
perhaps someone would find it interesting.

When the user types ^C, the operating system sends SIGINT to all processes
belonging to the terminal's process group - and in particular it sends
SIGINT to *both* the child compiler and ccache, separately.

Your patch makes ccache exit as soon as it gets the SIGINT - but the child
compiler might still be running for a while longer. This will usually be
fine, but I thought the user can be surprised if he sees ccache (which he
considers to be the compiler) exit, but ps shows the compiler is still
running. This would be especially annoying if some hypothetical compiler
trapped SIGINT deliberately, and continued to run long after ccache exits.

My approach, was to let ccache continue after the SIGINT, and continue to
wait (as it always does) for the child compiler to finish. The child, who
received a SIGINT too, willl indeed finish quickly. The code already
recognizes this case of the child killed by a signal, and returns status ==
-1. So all my 4-line patch did was to not restart the compiler if it died
with a status = -1 (i.e., a signal) and not an ordinary failure. But ccache
does not exit immediately on a signal - just after the child compiler exits.

Again, I guess this destinction is not very important, but I just wanted to
point it out in case anybody thinks it is important.

-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


[ccache] ccache interrupt handling bug

2015-07-23 Thread Nadav Har'El
Hi, I found a bug in ccache, which makes it impossible to correctly
interrupt a compilation with a control-C (I tried this on Linux).

Consider the following C++ program from hell that takes 13 seconds to
compile on my machine (change 27 to a higher number to make it even
slower):

template int TreePos, int N struct FibSlow_t {
enum { value = FibSlow_tTreePos, N - 1::value +
FibSlow_tTreePos + (1  N), N - 2::value, };
};
template int TreePos struct FibSlow_tTreePos, 2 { enum { value = 1 }; };
template int TreePos struct FibSlow_tTreePos, 1 { enum { value = 1 }; };
static int s_value = FibSlow_t0, 27::value;


Compile this with: CCACHE_RECACHE=1 ccache g++ -c example.cc

Now try to interrupt this with control-C and note something really strange:
The first control-C is seemingly ignored and compilation continues! The
second control-C does work and stops compilation. When this is run from
some build system (e.g., ninja or make), control-C doesn't work at all.

The reason why the second interrupt works is simple: you used signal(),
whose archaic behavior is to set the signal handler for only one go. So
after the first ^C, the second one gets the default signal handling, i.e.,
exit, which works :-)

But why doesn't the first ^C, which calls signal_handler(), work, and cause
the compilation to continue?

It turns out that signal_handler() doesn't exit after cleaning up. Rather,
the log shows:

[2015-07-23T15:04:02.088253 21059] Executing /usr/bin/g++ -c -o z.o
/home/nyh/.ccache/tmp/z.stdout.rice.21059.nHZLHE.ii

here I pressed control-C

[2015-07-23T15:04:05.847059 21059] Unlink
/home/nyh/.ccache/1/a/873cb37b579cd5dd45ca9c43ae8030-644.o.tmp.stdout.rice.21059.9FTbHl
[2015-07-23T15:04:05.847138 21059] Failed opening
/home/nyh/.ccache/tmp/tmp.cpp_stderr.rice.21059.AQUGJX: No such file or
directory
[2015-07-23T15:04:05.847147 21059] Failed; falling back to running the real
compiler
[2015-07-23T15:04:05.847151 21059] Executing /usr/bin/g++ -c z.cc

So, after the signal, we delete the temporary file but continue (in
waitpid() in execute.c). Very soon afterwards, waitpid() discovers the
child also died (it also got the SIGINT signal, like all the processes
connected to the terminal's process group). The execute() call returns -1.

But the code ignores that, and considers this a general failure to run
the compiler, which then causes it to run the compiler again! So it's not
that the compilation isn't interrupted - it actually is, and then restarted!

One way to fix this bug is to recognize that the fact execute() returned -1
has a special meaning (a signal), and ccache should exit and not try to run
the compiler again. The following patch fixes the bug. I'm not sure it's
the best fix, but it works:

@@ -839,10 +840,15 @@
 args_add(args, i_tmpfile);
 }

 cc_log(Running real compiler);
 status = execute(args-argv, tmp_stdout_fd, tmp_stderr_fd);
+if (status == -1) {
+/* The compiler was interrupted by a signal */
+exit(1);
+}
+
 args_pop(args, 3);

 if (x_stat(tmp_stdout, st) != 0) {
 /* The stdout file was removed - cleanup in progress? Better bail
out. */
 stats_update(STATS_MISSING);




-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache