[ccache] ccache interrupt handling bug

2015-07-23 Thread Nadav Har'El
Hi, I found a bug in ccache, which makes it impossible to correctly
interrupt a compilation with a control-C (I tried this on Linux).

Consider the following C++ program from hell that takes 13 seconds to
compile on my machine (change "27" to a higher number to make it even
slower):

template  struct FibSlow_t {
enum { value = FibSlow_t::value +
FibSlow_t::value, };
};
template  struct FibSlow_t { enum { value = 1 }; };
template  struct FibSlow_t { enum { value = 1 }; };
static int s_value = FibSlow_t<0, 27>::value;


Compile this with: "CCACHE_RECACHE=1 ccache g++ -c example.cc"

Now try to interrupt this with control-C and note something really strange:
The first control-C is seemingly ignored and compilation continues! The
second control-C does work and stops compilation. When this is run from
some build system (e.g., "ninja" or "make"), control-C doesn't work at all.

The reason why the second interrupt works is simple: you used signal(),
whose archaic behavior is to set the signal handler for only one go. So
after the first ^C, the second one gets the default signal handling, i.e.,
exit, which works :-)

But why doesn't the first ^C, which calls signal_handler(), work, and cause
the compilation to continue?

It turns out that signal_handler() doesn't exit after cleaning up. Rather,
the log shows:

[2015-07-23T15:04:02.088253 21059] Executing /usr/bin/g++ -c -o z.o
/home/nyh/.ccache/tmp/z.stdout.rice.21059.nHZLHE.ii



[2015-07-23T15:04:05.847059 21059] Unlink
/home/nyh/.ccache/1/a/873cb37b579cd5dd45ca9c43ae8030-644.o.tmp.stdout.rice.21059.9FTbHl
[2015-07-23T15:04:05.847138 21059] Failed opening
/home/nyh/.ccache/tmp/tmp.cpp_stderr.rice.21059.AQUGJX: No such file or
directory
[2015-07-23T15:04:05.847147 21059] Failed; falling back to running the real
compiler
[2015-07-23T15:04:05.847151 21059] Executing /usr/bin/g++ -c z.cc

So, after the signal, we delete the temporary file but continue (in
waitpid() in execute.c). Very soon afterwards, waitpid() discovers the
child also died (it also got the SIGINT signal, like all the processes
connected to the terminal's process group). The execute() call returns -1.

But the code ignores that, and considers this a general "failure" to run
the compiler, which then causes it to run the compiler again! So it's not
that the compilation isn't interrupted - it actually is, and then restarted!

One way to fix this bug is to recognize that the fact execute() returned -1
has a special meaning (a signal), and ccache should exit and not try to run
the compiler again. The following patch fixes the bug. I'm not sure it's
the "best" fix, but it works:

@@ -839,10 +840,15 @@
 args_add(args, i_tmpfile);
 }

 cc_log("Running real compiler");
 status = execute(args->argv, tmp_stdout_fd, tmp_stderr_fd);
+if (status == -1) {
+/* The compiler was interrupted by a signal */
+exit(1);
+}
+
 args_pop(args, 3);

 if (x_stat(tmp_stdout, &st) != 0) {
 /* The stdout file was removed - cleanup in progress? Better bail
out. */
 stats_update(STATS_MISSING);




-- 
Nadav Har'El
n...@cloudius-systems.com
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Caching failed compilations

2015-07-23 Thread Joel Rosdahl
(Resend since my original mail didn't reach the mailing list properly.)

(Sorry for the delayed reply, I have been on vacation.)

On cache-hit, there's currently no reason to actually look inside the file,
> right? It just does the copy blind (I forget exactly how). Reading the
> initial data from every binary on every cache-hit (the case we want to be
> most optimal) sounds like a Bad Thing.


No no, doing an extra read of initial data is not needed. If something I
wrote implied that I must have been unclear.

A cache hit in the current implementation (simplified, of course):

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. Write buffer content to the destination object file.
5. Repeat 3 and 4 until EOF.
6. Close file.

A cache hit in the suggested solution where special data (e.g. encoding the
exit code) is written to the cached "object file":

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. If the buffer contains special data (e.g. starts with a ccache-specific
header), exit with the encoded exit code (and write stderr, etc.). Else:
5. Write buffer content to the destination object file.
6. Repeat 3 and 5 until EOF.
7. Close file.

So, exactly the same system calls in both cases for a normal cache hit.
That's what I tried to summarize with "On a cache hit, we need to open and
read the file regardless of whether it's a real object file or special data
encoding an exit code".

The most common case must always be the "quick path" [...]


Heh, I certainly know that it's important not to slow down common cases
with e.g. more system calls. I didn't mean to ask you to explain what you
meant by the term "slow path" but why you thought that there *would be* a
slow path in the special-data-in-object-file solution. Again I must have
been unclear, sorry about that. (I'm a bit surprised that you felt the need
to explain basic stuff about what's important, to be honest.)

And as described above, I see no reason for why special-data-in-object-file
would be slower. Yes, the non-zero-exit-code case could perhaps be
optimized with a stat size hack, but that doesn't feel important.

I hope this is more clear. I'm beginning to lose faith in my English
communication skills. :-)

-- Joel
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache