Re: [ccache] ccache interrupt handling bug

2015-08-30 Thread Nadav Har'El
On Sat, Aug 29, 2015 at 8:01 PM, Joel Rosdahl  wrote:

> On 20 August 2015 at 21:04, Joel Rosdahl  wrote:
>
>> I will not have time to work on improving the signal handler until next
>> week at the earliest.
>
>
> I've now had a look at improving the signal handler. It would be
> appreciated if somebody wants to review it:
> https://git.samba.org/?p=ccache.git;a=commitdiff;h=737c96fabeab9d66407758b848468aa49190c946
>

Hi,

Looking at this code, I don't see any obvious holes and it all seems
correct.

One thing you should perhaps consider is whether this change (using
sigaction, signal sets, instead of the antique signal(), etc.) might break
compilation on some antique UNIX machines or on MS-Windows (!?) or
something. But I doubt any present-day system actually lacks sigaction()
and friends, so I think your code is good.




-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] ccache interrupt handling bug

2015-08-16 Thread Nadav Har'El
On Mon, Aug 17, 2015 at 12:50 AM, Tom Lane  wrote:

>
> * make launches "ccache gcc -o test.o test.c"
> * user types control-C
> * ccache receives SIGINT and exits
> * make does unlink("test.o") to clean up after failed compile step, which
>   it thinks is done
> * gcc creates test.o
>

Stupid question: Does ccache let gcc write to test.o directly, or does it
write to some temporary file and then ccache copies it?


> * gcc receives SIGINT and exits
>
> We're left with a probably-broken test.o in the filesystem, despite
> make's attempt to clean up.  Now, the window for this is pretty narrow,
> but I think it could happen: I don't believe that sending signals to
> multiple processes is atomic in most kernels.
>
> You really don't want ccache to exit until after all externally-observable
> things have stopped happening.  (I recently fixed a somewhat related bug
> in Postgres :-(, which is why this caught my eye.)
>
> regards, tom lane
>



-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] ccache interrupt handling bug

2015-08-16 Thread Nadav Har'El
On Sun, Aug 9, 2015 at 11:23 PM, Joel Rosdahl  wrote:

> Hi Nadav,
>
> (Sorry for the delayed reply.)
>
> Hi, I found a bug in ccache, which makes it impossible to
>> correctly interrupt a compilation with a control-C (I tried this on Linux).
>> [...]
>
>
> Thanks for the bug report and analysis. This will be fixed in ccache 3.2.3.
>
> Regards,
> -- Joel
>
>
Thanks. I see now the fix in 3.2.3, which if I understand correctly, is
simply:

--- a/ccache.c
+++ b/ccache.c
@@ -341,6 +341,7 @@ signal_handler(int signo)
 {
(void)signo;
clean_up_pending_tmp_files();
+   _exit(1);
 }

First of all, thanks. This indeed fixes the bug, and is exactly the first
patch I tried to fix it this problem.

I want to explain, though, why I ended up sending a different patch for
this problem - a 4-line patch instead of this one-liner. The explanation is
not very important, and I should probably have done this earlier, but
perhaps someone would find it interesting.

When the user types ^C, the operating system sends SIGINT to all processes
belonging to the terminal's process group - and in particular it sends
SIGINT to *both* the child compiler and ccache, separately.

Your patch makes ccache exit as soon as it gets the SIGINT - but the child
compiler might still be running for a while longer. This will usually be
fine, but I thought the user can be surprised if he sees ccache (which he
considers to be the compiler) exit, but "ps" shows the compiler is still
running. This would be especially annoying if some hypothetical compiler
trapped SIGINT deliberately, and continued to run long after ccache exits.

My approach, was to let ccache continue after the SIGINT, and continue to
wait (as it always does) for the child compiler to finish. The child, who
received a SIGINT too, willl indeed finish quickly. The code already
recognizes this case of the child killed by a signal, and returns status ==
-1. So all my 4-line patch did was to not restart the compiler if it died
with a status = -1 (i.e., a signal) and not an ordinary failure. But ccache
does not exit immediately on a signal - just after the child compiler exits.

Again, I guess this destinction is not very important, but I just wanted to
point it out in case anybody thinks it is important.

-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] ccache interrupt handling bug

2015-08-10 Thread Nadav Har'El
On Sun, Aug 9, 2015 at 11:23 PM, Joel Rosdahl  wrote:

>
> Hi, I found a bug in ccache, which makes it impossible to
>> correctly interrupt a compilation with a control-C (I tried this on Linux).
>> [...]
>
>
> Thanks for the bug report and analysis. This will be fixed in ccache 3.2.3.
>

Thanks.

Funny thing is that after I fixed this bug, I realized that several of my
colleagues also encountered it many times before, but assumed was a problem
in "make" or "gcc" and treated it as a fact of life - not realizing it was
actually caused by ccache and simple to fix. So it will be great to have it
fixed. Thanks.

-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


[ccache] ccache interrupt handling bug

2015-07-23 Thread Nadav Har'El
Hi, I found a bug in ccache, which makes it impossible to correctly
interrupt a compilation with a control-C (I tried this on Linux).

Consider the following C++ program from hell that takes 13 seconds to
compile on my machine (change "27" to a higher number to make it even
slower):

template  struct FibSlow_t {
enum { value = FibSlow_t::value +
FibSlow_t::value, };
};
template  struct FibSlow_t { enum { value = 1 }; };
template  struct FibSlow_t { enum { value = 1 }; };
static int s_value = FibSlow_t<0, 27>::value;


Compile this with: "CCACHE_RECACHE=1 ccache g++ -c example.cc"

Now try to interrupt this with control-C and note something really strange:
The first control-C is seemingly ignored and compilation continues! The
second control-C does work and stops compilation. When this is run from
some build system (e.g., "ninja" or "make"), control-C doesn't work at all.

The reason why the second interrupt works is simple: you used signal(),
whose archaic behavior is to set the signal handler for only one go. So
after the first ^C, the second one gets the default signal handling, i.e.,
exit, which works :-)

But why doesn't the first ^C, which calls signal_handler(), work, and cause
the compilation to continue?

It turns out that signal_handler() doesn't exit after cleaning up. Rather,
the log shows:

[2015-07-23T15:04:02.088253 21059] Executing /usr/bin/g++ -c -o z.o
/home/nyh/.ccache/tmp/z.stdout.rice.21059.nHZLHE.ii



[2015-07-23T15:04:05.847059 21059] Unlink
/home/nyh/.ccache/1/a/873cb37b579cd5dd45ca9c43ae8030-644.o.tmp.stdout.rice.21059.9FTbHl
[2015-07-23T15:04:05.847138 21059] Failed opening
/home/nyh/.ccache/tmp/tmp.cpp_stderr.rice.21059.AQUGJX: No such file or
directory
[2015-07-23T15:04:05.847147 21059] Failed; falling back to running the real
compiler
[2015-07-23T15:04:05.847151 21059] Executing /usr/bin/g++ -c z.cc

So, after the signal, we delete the temporary file but continue (in
waitpid() in execute.c). Very soon afterwards, waitpid() discovers the
child also died (it also got the SIGINT signal, like all the processes
connected to the terminal's process group). The execute() call returns -1.

But the code ignores that, and considers this a general "failure" to run
the compiler, which then causes it to run the compiler again! So it's not
that the compilation isn't interrupted - it actually is, and then restarted!

One way to fix this bug is to recognize that the fact execute() returned -1
has a special meaning (a signal), and ccache should exit and not try to run
the compiler again. The following patch fixes the bug. I'm not sure it's
the "best" fix, but it works:

@@ -839,10 +840,15 @@
 args_add(args, i_tmpfile);
 }

 cc_log("Running real compiler");
 status = execute(args->argv, tmp_stdout_fd, tmp_stderr_fd);
+if (status == -1) {
+/* The compiler was interrupted by a signal */
+exit(1);
+}
+
 args_pop(args, 3);

 if (x_stat(tmp_stdout, &st) != 0) {
     /* The stdout file was removed - cleanup in progress? Better bail
out. */
 stats_update(STATS_MISSING);




-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache