Re: [ccache] ccache direct mode

2016-04-25 Thread Anders Björklund
vijay nag wrote:

> The documentation under direct mode suggests that it works by hashing
> the contents of include files, but it is not clear whether it hashes the
> contents of every included file in BFS or DFS inclusion order?

The files are indexed in the order the compiler (preprocessor) reports them.
You can check the order yourself by running gcc -c with the -H flag.

Or you can dump the .manifest file afterwards, using --dump-manifest.
Normally the order is top-to-bottom, which corresponds to DFS order.
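To illustrate the top-to-bottom (DFS) order described above, here is a small C sketch that walks a made-up include graph the way the preprocessor opens files. The `struct header` graph and the `visit` function are hypothetical and not ccache code; the sketch also assumes every header has an include guard, so a header is only entered once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory include graph, not ccache's data structure:
 * each entry lists the headers it #includes, in source order. */
struct header {
    const char *name;
    const char *includes[4]; /* NULL-terminated list of direct includes */
};

static const struct header *find_header(const struct header *graph, size_t n,
                                        const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(graph[i].name, name) == 0)
            return &graph[i];
    return NULL;
}

/* Record headers in the order they are first opened: a header is recorded
 * on entry, and its own #includes are expanded before the walk returns to
 * the includer's next line (depth-first, pre-order). Headers seen before
 * are skipped, as if every header had an include guard. */
static void visit(const struct header *graph, size_t n, const char *name,
                  const char **order, size_t *count, size_t cap)
{
    for (size_t i = 0; i < *count; i++)
        if (strcmp(order[i], name) == 0)
            return;
    if (*count >= cap)
        return;
    order[(*count)++] = name;

    const struct header *h = find_header(graph, n, name);
    if (h == NULL)
        return;
    for (size_t i = 0; i < 4 && h->includes[i] != NULL; i++)
        visit(graph, n, h->includes[i], order, count, cap);
}
```

For a main.c that includes a.h and b.h, where a.h includes c.h and b.h includes c.h and d.h, the recorded order is main.c, a.h, c.h, b.h, d.h: each header's includes are visited before its next sibling.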

/Anders
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Direct mode with -B and -iprefix

2016-04-25 Thread vijay nag
On Sat, Apr 23, 2016 at 3:29 PM, Anders Björklund  wrote:
> vijay nag wrote:
>> Hello ccache,
>> Why isn't ccache converting -B and -iprefix paths to relative paths when
>> CCACHE_BASEDIR is set? Should they be converted to relative paths?
>
> Historically, the base dir has been mostly about "your" code...
>
> So not everything related to the toolchain, whether it was -MD or -B,
> was converted, as the compiler was usually sitting still in /usr.
> Lately there have been some changes to this (e.g. 60178b7), so
> maybe there are more paths that _could_ be converted to relative.
>
> But personally I would probably need to have *two* base paths
> for that to work (one for the source code, one for the toolchain),
> so that I could have a base dir of $HOME/myproject but a toolchain
> dir of e.g. /usr, just in case I wanted to relocate the toolchain.
>
> Do you have some more examples of where this is useful (to you)?
>
> /Anders

The GCC libraries such as libgcc_s.a and libgcc.a are part of the source
tree we build every time. So we first build these GCC-related
libraries and then use -B incantations in the CFLAGS. Does direct
mode take compiler flags into consideration for hashing purposes? We
can work around the -B problem by setting GCC_EXEC_PREFIX, but I'm
not sure whether we have to convert -iprefix to relative paths when
basedir is set.
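A minimal C sketch of how a base-dir style rewrite could be applied to the directory given to -B or -iprefix, assuming the rule ccache uses for base_dir (absolute paths under the base dir become relative to the current working directory). The function names are hypothetical and not ccache's API, and paths are assumed to be already normalized (no "." or ".." components and no trailing '/', which -iprefix values often have).

```c
#include <stdio.h>
#include <string.h>

/* True if `path` equals `dir` or lies below it (whole components only). */
static int path_is_under(const char *path, const char *dir)
{
    size_t n = strlen(dir);
    return strncmp(path, dir, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Write `path` relative to `cwd` into `out`. Both are assumed absolute and
 * normalized. Returns 0 on success, -1 if `out` is too small. */
static int make_relative(const char *cwd, const char *path, char *out,
                         size_t outsz)
{
    size_t common = 0, len = 0;
    int w;

    if (outsz == 0)
        return -1;
    /* Longest common prefix that ends on a component boundary. */
    for (size_t i = 0;; i++) {
        char a = cwd[i], b = path[i];
        if ((a == '/' || a == '\0') && (b == '/' || b == '\0'))
            common = i;
        if (a != b || a == '\0')
            break;
    }
    out[0] = '\0';
    /* One ".." for each component of cwd below the common prefix. */
    for (const char *p = cwd + common; *p != '\0'; p++) {
        if (p[0] == '/' && p[1] != '\0') {
            w = snprintf(out + len, outsz - len, "%s", len == 0 ? ".." : "/..");
            if (w < 0 || (size_t)w >= outsz - len)
                return -1;
            len += (size_t)w;
        }
    }
    /* Then the part of `path` below the common prefix. */
    const char *rest = path + common;
    if (*rest == '/')
        rest++;
    if (*rest != '\0') {
        w = snprintf(out + len, outsz - len, "%s%s", len == 0 ? "" : "/", rest);
        if (w < 0 || (size_t)w >= outsz - len)
            return -1;
        len += (size_t)w;
    }
    if (len == 0) {
        if (outsz < 2)
            return -1;
        strcpy(out, ".");
    }
    return 0;
}

/* Hypothetical base_dir rule for one option value (e.g. the -B or -iprefix
 * directory). Returns 1 if rewritten, 0 if copied unchanged, -1 on error. */
static int relativize_if_under_base(const char *base_dir, const char *cwd,
                                    const char *path, char *out, size_t outsz)
{
    if (!path_is_under(path, base_dir)) {
        int w = snprintf(out, outsz, "%s", path);
        return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
    }
    return make_relative(cwd, path, out, outsz) == 0 ? 1 : -1;
}
```

With base dir /home/u/proj and cwd /home/u/proj/build, a -B directory of /home/u/proj/gcc-build/gcc would become ../gcc-build/gcc, while a toolchain under /usr stays absolute.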



Re: [ccache] ccache direct mode

2016-04-24 Thread vijay nag
Hello ccache,

The documentation under direct mode suggests that it works by hashing
the contents of include files, but it is not clear whether it hashes the
contents of every included file in BFS or DFS inclusion order?



Re: [ccache] Direct mode with -B and -iprefix

2016-04-23 Thread Anders Björklund
vijay nag wrote:
> Hello ccache,
> Why isn't ccache converting -B and -iprefix paths to relative paths when
> CCACHE_BASEDIR is set? Should they be converted to relative paths?

Historically, the base dir has been mostly about "your" code...

So not everything related to the toolchain, whether it was -MD or -B,
was converted, as the compiler was usually sitting still in /usr.
Lately there have been some changes to this (e.g. 60178b7), so
maybe there are more paths that _could_ be converted to relative.

But personally I would probably need to have *two* base paths
for that to work (one for the source code, one for the toolchain),
so that I could have a base dir of $HOME/myproject but a toolchain
dir of e.g. /usr, just in case I wanted to relocate the toolchain.
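The two-base-path idea can be sketched in C under an explicit assumption: a second, hypothetical toolchain dir exists alongside the base dir, and instead of producing relative paths, the sketch replaces the longest matching prefix with a fixed tag before hashing. That substitution is a swapped-in technique for illustration, not what ccache does, and `location_independent` is a made-up name.

```c
#include <stdio.h>
#include <string.h>

/* Length of `dir` if `path` equals it or lies below it, else 0. */
static size_t prefix_len(const char *path, const char *dir)
{
    size_t n;

    if (dir == NULL || *dir == '\0')
        return 0;
    n = strlen(dir);
    if (strncmp(path, dir, n) == 0 && (path[n] == '/' || path[n] == '\0'))
        return n;
    return 0;
}

/* Hypothetical: replace the longest matching prefix (toolchain dir or base
 * dir) with a fixed tag, so relocating either tree leaves the hashed
 * string unchanged. Returns 0 on success, -1 if `out` is too small. */
static int location_independent(const char *path, const char *base_dir,
                                const char *toolchain_dir, char *out,
                                size_t outsz)
{
    size_t nb = prefix_len(path, base_dir);
    size_t nt = prefix_len(path, toolchain_dir);
    const char *tag = "";
    size_t skip = 0;

    if (nt > 0 && nt >= nb) {
        tag = "<toolchain>";
        skip = nt;
    } else if (nb > 0) {
        tag = "<base>";
        skip = nb;
    }
    int w = snprintf(out, outsz, "%s%s", tag, path + skip);
    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

This only stays correct if the compiler check still tells genuinely different toolchains apart, because two toolchains in different dirs now produce identical strings.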

Do you have some more examples of where this is useful (to you)?

/Anders


[ccache] Direct mode with -B and -iprefix

2016-04-23 Thread vijay nag
Hello ccache,
Why isn't ccache converting -B and -iprefix paths to relative paths when
CCACHE_BASEDIR is set? Should they be converted to relative paths?


Re: [ccache] Solve Android building problem when using ccache direct mode

2014-04-12 Thread Joel Rosdahl
Hi Dongsheng,

Basically, you have not set up ccache correctly. ccache relies on the
assumption that if the compiler check (by default, the compiler's mtime)
gives the same result, then the whole toolchain is the same. If you use
toolchains in different locations but the compilers' mtimes are the same,
then you have to tell ccache how to properly identify the compiler by
setting CCACHE_COMPILERCHECK to a suitable command. Alternatively, disable
direct mode, as you mentioned.
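Why identical prebuilt toolchains in different trees collide can be sketched in C. ccache documents its default `mtime` check as hashing the compiler's mtime and size; the FNV-1a hash, the function names, and the idea of folding the compiler's path into the identity are illustrative assumptions, not ccache's implementation.

```c
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* 64-bit FNV-1a, used here only as a simple stand-in hash. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Roughly what an mtime-style check sees: only size and mtime. Two copies
 * of the same prebuilt compiler in different trees get the same key. */
static uint64_t key_mtime(const struct stat *st)
{
    int64_t size = (int64_t)st->st_size;
    int64_t mtime = (int64_t)st->st_mtime;
    uint64_t h = fnv1a(FNV_OFFSET, &size, sizeof size);

    return fnv1a(h, &mtime, sizeof mtime);
}

/* Hypothetical stricter identity that also covers the compiler's location,
 * standing in for whatever a custom check command would report. */
static uint64_t key_with_location(const char *compiler_path,
                                  const struct stat *st)
{
    return fnv1a(key_mtime(st), compiler_path, strlen(compiler_path));
}
```

Hashing the binary's content would not help in the Android case, since the per-tree copies are identical; what differs is where they live.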

One option is to use -MMD instead of -MD when compiling so that toolchain
headers don't end up in the .d files at all.

It might be possible to improve ccache to convert absolute paths to
relative paths in .d files in some clever way. Patches are welcome.
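One possible approach for .d files, sketched in C under stated assumptions: on a cache miss, the tree root is replaced with a placeholder before the dependency file is stored, and on a hit the placeholder is expanded with the current tree's root. This uses placeholder substitution rather than the relative paths mentioned above; `replace_all` and the `@ROOT@` token are hypothetical.

```c
#include <string.h>

/* Copy `in` to `out`, replacing every occurrence of `from` with `to`.
 * Plain substring matching: a real implementation would have to match only
 * at path boundaries. Returns 0 on success, -1 if `out` is too small. */
static int replace_all(const char *in, const char *from, const char *to,
                       char *out, size_t outsz)
{
    size_t from_len = strlen(from), to_len = strlen(to), len = 0;

    if (outsz == 0 || from_len == 0)
        return -1;
    while (*in != '\0') {
        const char *hit = strstr(in, from);
        size_t chunk = hit ? (size_t)(hit - in) : strlen(in);

        if (len + chunk + 1 > outsz)
            return -1;
        memcpy(out + len, in, chunk); /* text before the match */
        len += chunk;
        if (hit == NULL)
            break;
        if (len + to_len + 1 > outsz)
            return -1;
        memcpy(out + len, to, to_len); /* the replacement */
        len += to_len;
        in = hit + from_len;
    }
    out[len] = '\0';
    return 0;
}
```

Storing with replace_all(dep, my_root, "@ROOT@", ...) and restoring with replace_all(stored, "@ROOT@", current_root, ...) would turn one user's .d file into one that points into the other user's own tree.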




[ccache] Solve Android building problem when using ccache direct mode

2014-04-05 Thread Qu Dongsheng

Hello

When I was building Android source code on our build server, which many
people share and on which everyone uses ccache, I got an error: the build
tried to stat other people's system headers! After some debugging, I found
that ccache had returned the wrong dependency file.

It was caused by the following combination:
- ccache is using direct mode
- Every Android source tree has its own prebuilt toolchain
- Android compiles the C/C++ code with dependency files generated
- We replaced Android's prebuilt ccache with ccache 3.1.7

In this case, ccache simply returns the wrong dependency file, one that
contains the system headers with absolute paths into another person's
directory! I think direct mode is not safe when a dependency file is
generated. This issue happens even with the latest ccache. I made a patch
to solve it, shown below. Of course, setting CCACHE_NODIRECT or
CCACHE_UNIFY can solve the above build problem, but doing so disables
direct mode completely, and we would no longer benefit from it.

Is there a better solution?



commit 63da529963afa9e69b217d4dd0624f9828e294df
Author: Qu Dongsheng
Date:   Sat Apr 5 22:45:14 2014 +0800

    Enabling direct mode is not safe when a dependency file is generated.
    As the dependency file contains the absolute paths to the system headers,
    a wrong dependency file can be returned if the toolchain's location is
    changed. This typically happens when multiple Android source trees are
    built on the same machine. (Every Android source tree contains a copy of
    the toolchain.)

diff --git a/ccache.c b/ccache.c
index 02dbdfa..8efe921 100644
--- a/ccache.c
+++ b/ccache.c
@@ -1995,6 +1995,7 @@ ccache(int argc, char *argv[])
 	cc_log("Source file: %s", input_file);
 	if (generating_dependencies) {
 		cc_log("Dependency file: %s", output_dep);
+		enable_direct = false;
 	}
 	cc_log("Object file: %s", output_obj);
 




Dongsheng


Re: [ccache] direct mode

2013-10-28 Thread Andrew Stubbs

On 27/10/13 20:03, Joel Rosdahl wrote:
> Assume that /example/foo/foo/bar.h doesn't exist and that
> /example/foo/bar.h exists. The preprocessed output will indicate that
> /example/foo/bar.h was read. Which of the directories where the header
> wasn't found should we store in the manifest? The correct answer is
> "/example/foo", which I don't see how ccache can conclude from the
> available information.

Sigh, I had hoped that there'd be only false negatives, and no missed
positives.

I think there's enough line-number information in the preprocessed
source to locate the #include directive in the original source file. The
data has to be read from disk anyway, but all that extra processing
might make a cache miss very expensive.
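The line-number information referred to above is GCC's linemarkers, of the form `# linenum "filename" flags`, where flag 1 marks entry into a new file, 2 a return to a file after an include, and 3 a system header. A C sketch of parsing them follows; the struct and function names are hypothetical. After a flag-2 marker, the includer's #include directive is normally on the line just before `linenum`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one GCC linemarker line. */
struct linemarker {
    unsigned long line;
    char file[256];
    int entering;  /* flag 1: start of a new file */
    int returning; /* flag 2: back in a file after an #include */
    int system;    /* flag 3: text comes from a system header */
};

/* Parse `# linenum "filename" flags...`. Escaped quotes inside the file
 * name are not handled. Returns 0 on success, -1 if not a linemarker. */
static int parse_linemarker(const char *s, struct linemarker *m)
{
    int used = 0;
    const char *p;

    memset(m, 0, sizeof *m);
    if (sscanf(s, "# %lu \"%255[^\"]\"%n", &m->line, m->file, &used) != 2 ||
        used == 0)
        return -1;
    /* Remaining tokens are optional numeric flags. */
    for (p = s + used;;) {
        char *end;
        long flag = strtol(p, &end, 10);

        if (end == p)
            break;
        if (flag == 1)
            m->entering = 1;
        else if (flag == 2)
            m->returning = 1;
        else if (flag == 3)
            m->system = 1;
        p = end;
    }
    return 0;
}
```

For Ian's hello.c, a marker such as `# 2 "hello.c" 2` would point back to line 1, where `#include "test.h"` is written, and that directive gives the spelling needed to work out the directories in which the header was not found.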



> Regarding how to know whether we can pass compiler-specific options like
> -v, I have had some thoughts about that some years ago, see comment 3 on
> https://bugzilla.samba.org/show_bug.cgi?id=7556#c3.

Agreed, capturing and caching the identity of any given compiler should
not be too hard. "cc --version" is one way, or it might just be easier
to load the binary into memory and memmem your way to victory. The
result can be cached and indexed by a hash of the path and mtime.
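Caching the identity, indexed by a hash of the path and mtime, could look roughly like the direct-mapped table below. Everything here is hypothetical (names, table size, FNV-1a as the hash); the identity string itself would come from something like running "cc --version", which is not shown.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define IDENTITY_SLOTS 64

struct identity_entry {
    uint64_t key;
    int used;
    char identity[128];
};

/* Hypothetical direct-mapped memo table (not ccache code). */
static struct identity_entry identity_table[IDENTITY_SLOTS];

/* FNV-1a over the path bytes followed by the mtime bytes. */
static uint64_t identity_key(const char *path, time_t mtime)
{
    uint64_t h = 14695981039346656037ULL;
    int64_t m = (int64_t)mtime;
    const unsigned char *b = (const unsigned char *)&m;

    for (const unsigned char *p = (const unsigned char *)path; *p != '\0'; p++) {
        h ^= *p;
        h *= 1099511628211ULL;
    }
    for (size_t i = 0; i < sizeof m; i++) {
        h ^= b[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static const char *lookup_identity(uint64_t key)
{
    const struct identity_entry *e = &identity_table[key % IDENTITY_SLOTS];
    return (e->used && e->key == key) ? e->identity : NULL;
}

/* A colliding slot is simply overwritten; a later lookup then misses and
 * the identity is probed again, which is safe. */
static void store_identity(uint64_t key, const char *identity)
{
    struct identity_entry *e = &identity_table[key % IDENTITY_SLOTS];

    e->key = key;
    e->used = 1;
    strncpy(e->identity, identity, sizeof e->identity - 1);
    e->identity[sizeof e->identity - 1] = '\0';
}
```

Since ccache runs as a separate process per compilation, a real version would have to persist such a table on disk rather than keep it in a static array.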


Andrew



Re: [ccache] direct mode

2013-10-27 Thread Joel Rosdahl
On 23 October 2013 12:01, Andrew Stubbs  wrote:

> I've thought about this a little, since our last exchange.

Thanks! Sounds like the best solution so far.

Some thoughts:

> 1. When running the preprocessor, on cache-miss, use -v to capture the
> compiler's search path. This ought to be almost zero extra cost.

This only works with very GCC-like compilers. For instance, ccache
currently works with Solaris's compiler as well, and it doesn't support -v.
Although I have written "Only works with GCC and compilers that behave
similar enough" in the documentation, it would be nice to continue
supporting other slightly less GCC-like compilers if possible.

> 2. For each header file found in the preprocessed source, record the
> directories in which the compiler did *not* find that file.
> [...]
> 4. On cache-lookup, load the manifest file, and do
> "access(potential_include_file, R_OK)" for each include path listed in the
> manifest. If any call succeeds (returns zero), then skip direct mode.


I don't think that it's possible to calculate potential_include_file
correctly in all cases with only the information from -E -v. Consider this
.c file:

#include 

When compiling with "-I/example/foo -I/example", "gcc -E -v" will give
something like this:

#include "..." search starts here:
#include <...> search starts here:
 /example/foo
 /example
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include

Assume that /example/foo/foo/bar.h doesn't exist and that
/example/foo/bar.h exists. The preprocessed output will indicate that
/example/foo/bar.h was read. Which of the directories where the header
wasn't found should we store in the manifest? The correct answer is
"/example/foo", which I don't see how ccache can conclude from the
available information.
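The ambiguity can be made concrete: given only the resolved path and the `-E -v` search list, more than one (directory, #include spelling) pair can explain the same result. The C sketch below enumerates them; `candidate_origins` is a hypothetical helper, not ccache code.

```c
#include <stddef.h>
#include <string.h>

/* For a resolved header path, list every search directory that is a
 * prefix of it, together with the #include spelling that directory would
 * imply. More than one candidate means the directories in which the
 * header was *not* found cannot be derived from this information alone. */
static size_t candidate_origins(const char *found, const char **dirs,
                                size_t ndirs, const char **names,
                                size_t *dir_idx, size_t cap)
{
    size_t count = 0;

    for (size_t i = 0; i < ndirs && count < cap; i++) {
        size_t n = strlen(dirs[i]);
        if (strncmp(found, dirs[i], n) == 0 && found[n] == '/') {
            names[count] = found + n + 1; /* implied #include spelling */
            dir_idx[count] = i;
            count++;
        }
    }
    return count;
}
```

For /example/foo/bar.h with search dirs /example/foo and /example, this yields two candidates: `bar.h` found in /example/foo (nothing to record) and `foo/bar.h` found in /example (record /example/foo). Choosing between them needs the spelling from the #include directive itself.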

> In terms of false negatives, it's hard to tell the difference between ""
> and <> includes (in which the search path starts in a different place), and
> it'll be suboptimal where the user's source contains explicit paths to
> header files.


Right.

I do think the problems are quite minor, though - your proposal is
certainly better than no detection mechanism, and it will hopefully cover
the absolute majority of use cases in practice.

Regarding how to know whether we can pass compiler-specific options like
-v, I have had some thoughts about that some years ago, see comment 3 on
https://bugzilla.samba.org/show_bug.cgi?id=7556#c3.

-- Joel


Re: [ccache] direct mode

2013-10-23 Thread Andrew Stubbs

On 23/10/13 16:27, Ian Norton wrote:
>> This scheme does not allow for the case where the compiler's search
>> path has been changed, but the "gcc" binary has not, but it should
>> certainly catch the cases where the include directories have been
>> updated, or the user's own -I include paths have new content.
>
> Is it even possible to change the search paths without a rebuild?

Yes, you can modify the default specs file. (Actually, the default specs
are built into the gcc binary, but gcc will automatically load specs
from a file in the right spot, if you create one.)

>> In terms of false negatives, it's hard to tell the difference between
>> "" and <> includes (in which the search path starts in a different
>> place), and it'll be suboptimal where the user's source contains
>> explicit paths to header files.
>
> I would hole that statting missing would be quite inexpensive

Hope? Yes, that's my expectation, but any syscall is more expensive than
not having it.

Andrew



Re: [ccache] direct mode

2013-10-23 Thread Ian Norton
On 23 Oct 2013 11:01, "Andrew Stubbs"  wrote:
>
> On 20/10/13 10:18, Joel Rosdahl wrote:
>>>
>>> bug?
>>
>>
>> Yes, see the discussion on
>> http://www.mail-archive.com/[email protected]/msg00920.html.
>>
>> By the way: I'm still torn on what to do, but I'm leaning towards keeping
>> direct mode on by default (documenting the behavior, of course).
>
>
> I've thought about this a little, since our last exchange.
>
> I like Christian Lohmaier's suggestion of a "safe-direct mode". Or
> rather, something like it: I'd suggest making the existing direct mode
> (the default) safe, and then having a "fast-mode" flag that short-cuts
> back to the current behavior. "CCACHE_FAST_MODE" would be documented in
> the same spirit as CCACHE_HARDLINK and CCACHE_SLOPPINESS; that is, with
> lots of caveats.
>
> Here's my idea for how to make it safe:
>
> 1. When running the preprocessor, on cache miss, use -v to capture the
> compiler's search path. This ought to be almost zero extra cost.
>
> 2. For each header file found in the preprocessed source, record the
> directories in which the compiler did *not* find that file. There should
> be no need to stat anything; it ought to be obvious from reading the
> path name where the compiler found it.
>
> 3. Store those paths in the manifest file.
>
> 4. On cache lookup, load the manifest file, and do
> "access(potential_include_file, R_OK)" for each include path listed in
> the manifest. If any call succeeds (returns zero), then skip direct mode.
>
> For example, let's say that the include search path is this:
>
>  /usr/lib/gcc/x86_64-linux-gnu/4.8/include
>  /usr/local/include
>  /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
>  /usr/include/x86_64-linux-gnu
>  /usr/include
>
> And the source contains a reference to "/usr/include/stdio.h".
>
> We can see that "stdio.h" was not found in any of the first four include
> paths, so we encode that data into the manifest.
>
> On lookup, ccache would only have to do four extra syscalls to make that
> header "safe". (Possibly the CWD would also have to be checked.)
>
>
> This scheme does not allow for the case where the compiler's search path
> has been changed but the "gcc" binary has not, but it should certainly
> catch the cases where the include directories have been updated, or the
> user's own -I include paths have new content.

Is it even possible to change the search paths without a rebuild?

> In terms of false negatives, it's hard to tell the difference between ""
> and <> includes (in which the search path starts in a different place),
> and it'll be suboptimal where the user's source contains explicit paths
> to header files.

I would hope that statting missing files would be quite inexpensive.

Ian


Re: [ccache] direct mode

2013-10-23 Thread Andrew Stubbs

On 20/10/13 10:18, Joel Rosdahl wrote:
>> bug?
>
> Yes, see the discussion on
> http://www.mail-archive.com/[email protected]/msg00920.html.
>
> By the way: I'm still torn on what to do, but I'm leaning towards keeping
> direct mode on by default (documenting the behavior, of course).

I've thought about this a little, since our last exchange.

I like Christian Lohmaier's suggestion of a "safe-direct mode". Or
rather, something like it: I'd suggest making the existing direct mode
(the default) safe, and then having a "fast-mode" flag that short-cuts
back to the current behavior. "CCACHE_FAST_MODE" would be documented in
the same spirit as CCACHE_HARDLINK and CCACHE_SLOPPINESS; that is, with
lots of caveats.


Here's my idea for how to make it safe:

1. When running the preprocessor, on cache miss, use -v to capture the
compiler's search path. This ought to be almost zero extra cost.

2. For each header file found in the preprocessed source, record the
directories in which the compiler did *not* find that file. There should
be no need to stat anything; it ought to be obvious from reading the
path name where the compiler found it.

3. Store those paths in the manifest file.

4. On cache lookup, load the manifest file, and do
"access(potential_include_file, R_OK)" for each include path listed in
the manifest. If any call succeeds (returns zero, meaning the header now
exists there), then skip direct mode.

For example, let's say that the include search path is this:

 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include

And the source contains a reference to "/usr/include/stdio.h".

We can see that "stdio.h" was not found in any of the first four include 
paths, so we encode that data into the manifest.


On lookup, ccache would only have to do four extra syscalls to make that 
header "safe". (Possibly the CWD would also have to be checked.)
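Steps 2 and 4 can be sketched in C, assuming the header's #include spelling is already known. `find_dir_index`, `dirs_before_hit` and `still_shadow_free` are hypothetical names; `access()` is the POSIX call, which returns 0 when the file exists and is readable.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Index of the search directory in which `name` resolved to `found`,
 * or -1 if none matches. */
static long find_dir_index(const char **search, size_t n, const char *found,
                           const char *name)
{
    char path[4096];

    for (size_t i = 0; i < n; i++) {
        snprintf(path, sizeof path, "%s/%s", search[i], name);
        if (strcmp(path, found) == 0)
            return (long)i;
    }
    return -1;
}

/* Step 2: every directory searched before the hit is one in which the
 * header was *not* found; these are what would go into the manifest. */
static size_t dirs_before_hit(const char **search, size_t nsearch,
                              size_t found_at, const char **out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < found_at && i < nsearch && n < cap; i++)
        out[n++] = search[i];
    return n;
}

/* Step 4: on lookup, probe each recorded directory. A successful access()
 * means a header has appeared that would now shadow the cached one, so
 * direct mode must be skipped. Returns 1 if still safe, 0 otherwise. */
static int still_shadow_free(const char **missing, size_t n, const char *name)
{
    char path[4096];

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(path, sizeof path, "%s/%s", missing[i], name);
        if (w < 0 || (size_t)w >= sizeof path)
            return 0; /* be conservative */
        if (access(path, R_OK) == 0)
            return 0;
    }
    return 1;
}
```

For the stdio.h example, `dirs_before_hit` yields the first four directories, so a lookup costs four `access()` calls for that header.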



This scheme does not allow for the case where the compiler's search path
has been changed but the "gcc" binary has not, but it should certainly
catch the cases where the include directories have been updated, or the
user's own -I include paths have new content.

In terms of false negatives, it's hard to tell the difference between ""
and <> includes (in which the search path starts in a different place),
and it'll be suboptimal where the user's source contains explicit paths
to header files.


Andrew


Re: [ccache] direct mode

2013-10-20 Thread Joel Rosdahl
> Perhaps a rename is needed, as the title implies it is Ubuntu specific and
> intermittent?

Yes, done.

-- Joel




Re: [ccache] direct mode

2013-10-20 Thread Joel Rosdahl
> bug?

Yes, see the discussion on
http://www.mail-archive.com/[email protected]/msg00920.html.

By the way: I'm still torn on what to do, but I'm leaning towards keeping
direct mode on by default (documenting the behavior, of course).

-- Joel




Re: [ccache] direct mode

2013-10-18 Thread Ian Norton
I've updated https://bugzilla.samba.org/show_bug.cgi?id=8424.

Perhaps a rename is needed, as the title implies it is Ubuntu specific
and intermittent?



Re: [ccache] direct mode

2013-10-18 Thread Ian Norton
OK. I should have tried it beforehand. ccache *doesn't* notice the
addition of the new header and still gives me a .o file from the first
invocation.

bug?



[ccache] direct mode

2013-10-18 Thread Ian Norton
Hello All,

I have a question about direct mode; it follows on from an old thread
I've seen in the archives:

http://www.mail-archive.com/[email protected]/msg00150.html

I'll quote inline and follow on.

Joel Rosdahl wrote:
> tridge wrote:
> > Also, does the hashtable used for included_files preserve the
> > ordering? (the order of includes is also vital). Or do you rely on the
> > hash of the file that does the #include changing for that?

> The hashtable is unordered, and yes, I rely on the hash of the input
> file to keep track of the ordering, and also of course on the include
> file hashes. For a given manifest, the source file (and therefore the
> order of the first level of include files) is known since the manifest
> is looked up given the hash of the input file (and some more
> information), and all other levels of include files are taken
> care of using the same kind of reasoning. In other words, if the
> include file order changes in some file, then the hash of that file
> changes too, which leads to a cache miss. Which include files the
> preprocessor reads is of course also a function of compiler options
> like -I, but that is handled by also hashing those options when
> computing the hash in direct mode. Do you see any potential problem
> here?

I realise I'm probably missing something, but how does direct mode
handle the case where the command line args have not changed, and nor
have the source file or previously used headers, *but* a header file has
been added to a folder on one of the -I paths? E.g.:

hello.c:
#include "test.h"

inc1/test.h:
void hello(void);

gcc -I inc2 -I inc1 -c hello.c

later, someone makes a new file:

inc2/test.h:
int hello(void);

The same command line and original inputs would result in a different file.
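To make the failure concrete, here is a small Python sketch (not ccache code) that replays this example. It uses a simplified model of the search for #include "...": the including file's directory first, then the -I directories in command-line order. The manifest is modelled as a plain mapping from the headers that were actually read to a hash of their contents. The helper names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Hash a file's contents (SHA-256 is an arbitrary choice here)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def resolve_quoted_include(name, source_dir, include_dirs):
    """Simplified search for #include "name": the including file's
    directory first, then each -I directory in command-line order."""
    for directory in [source_dir, *include_dirs]:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def run_scenario(root):
    """Replay the example and report what a used-headers-only manifest sees."""
    inc1 = os.path.join(root, "inc1")
    inc2 = os.path.join(root, "inc2")
    os.makedirs(inc1)
    os.makedirs(inc2)
    with open(os.path.join(inc1, "test.h"), "w") as f:
        f.write("void hello(void);\n")
    include_dirs = [inc2, inc1]  # gcc -I inc2 -I inc1 -c hello.c

    first = resolve_quoted_include("test.h", root, include_dirs)
    # Only the header that was actually read gets recorded.
    manifest = {first: file_digest(first)}

    # Later, someone creates inc2/test.h.
    with open(os.path.join(inc2, "test.h"), "w") as f:
        f.write("int hello(void);\n")

    second = resolve_quoted_include("test.h", root, include_dirs)
    still_matches = all(file_digest(p) == h for p, h in manifest.items())
    return os.path.relpath(first, root), os.path.relpath(second, root), still_matches


with tempfile.TemporaryDirectory() as tmp:
    result = run_scenario(tmp)
print(result)  # ('inc1/test.h', 'inc2/test.h', True) on POSIX systems
```

The preprocessor switches from inc1/test.h to inc2/test.h, yet every recorded hash still matches, which is the stale hit described here.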

How does direct mode cover this case ( all our common input data has
not changed )

Many Thanks, ccache is fantastic btw!

Ian
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-08 Thread Andrew Stubbs

On 08/11/12 19:25, Mathias De Maré wrote:

>> Is it safe?
>>
>> Yes. The most important aspect of a compiler cache is to always produce exactly
>> the same output that the real compiler would produce. This includes providing
>> exactly the same object files and exactly the same compiler warnings that would
>> be produced if you use the real compiler. The only way you should be able to
>> tell that you are using ccache is the speed.
>
> ...
>
> Performance is an important aspect, but it seems better to me to
> prefer the safe option (right now, that's preprocessing mode, perhaps
> in the future it can be a 'safe direct mode'), rather than the fast
> option.
> Application defaults (this is my personal opinion of course) benefit
> from being safe. It's always possible for users to go with a more
> advanced option if they do feel confident.


The above statement remains true if you do not change the 
toolchain/library installation without also wiping the cache.


If you do change the installation, without telling ccache, then ccache 
produces the exact same output that the real compiler *used to* produce.


To me, the question is whether that is broken enough to disable, by 
default, a feature that is such a huge performance win?


So, to answer this, I thought of who cares about the distinction, and 
decided that only the power users are likely to trip over the problem. 
Based on this I decided that a good first step would be to simply 
document the problem.


In any case, I do believe that that's only an interim measure, it is a 
problem that could well affect my users, and that's why I've been giving 
some thought to how best to solve it, properly.


Andrew
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-08 Thread Mathias De Maré
Hello all,

I believe ccache is an incredibly useful application (which is why I'm
introducing it at my work place).
Besides the great performance improvements for builds, I particularly
like this feature mentioned on the front page:

> Is it safe?
>
> Yes. The most important aspect of a compiler cache is to always produce 
> exactly the same output that the real compiler would produce. This includes 
> providing exactly the same object files and exactly the same compiler 
> warnings that would be produced if you use the real compiler. The only way 
> you should be able to tell that you are using ccache is the speed.

Ccache is really good at giving the same results as the actual
compiler. I know, there may be a few exceptions, but those should be
kept to a minimum.

Performance is an important aspect, but it seems better to me to
prefer the safe option (right now, that's preprocessing mode, perhaps
in the future it can be a 'safe direct mode'), rather than the fast
option.
Application defaults (this is my personal opinion of course) benefit
from being safe. It's always possible for users to go with a more
advanced option if they do feel confident.

Greetings,
Mathias
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-08 Thread Andrew Stubbs

On 07/11/12 19:19, Joel Rosdahl wrote:

> It would be nice if ccache were only used and enabled by conscious users
> who have read and understood the documentation, but in reality that
> doesn't happen in many cases. For instance, Linux distributions like
> Fedora install and enable ccache by default (masquerading the system
> compiler), at least when installing the development environment or
> similar. That's not surprising given that ccache works very well for
> most people and that it is advertised as being very safe.


Hmm, I was not aware Fedora did that, but then I don't use Fedora much, 
and when I have, Ccache is transparent enough that I wouldn't necessarily 
notice. :)


I am aware that Yocto uses it, by default, and certainly their users 
could stumble on this problem, but again, only rarely.



>> A similar issue, albeit not so interesting, perhaps, is what happens
>> when a user changes some part of the toolchain, but does not alter
>> the "gcc" binary. Ccache won't notice a new back-end compiler, a new
>> assembler, a new linker, a new default specs file or anything like
>> that. Chances are that any differences in the output are harmless,
>> but the cached objects are technically invalid.


> Right. However, isn't the fact that ccache may be affected by
> toolchain changes much less surprising than the fact that ccache may
> fail to pick up header files correctly?


That's why it's less interesting.


>> [In fact, I have a use-case in which I have multiple users sharing a
>> cache, and I wanted to be able to uniquely identify the same
>> toolchain across all the installations. The mtime etc. varies from
>> machine to machine, as might the exact tool mix, so I have some
>> local patches to do a much deeper hash of the toolchain binaries,
>> and include those in the object hashes. Even then, in the interests
>> of performance, those toolchain IDs are cached according to the
>> location and mtime, so changing the binutils will cause temporarily
>> wrong toolchain hashes. Would you be interested in such a feature
>> upstream?]


> Perhaps, it depends on how intrusive it is and how toolchain-specific it is.


Basically, it first does the same as CCACHE_COMPILERCHECK=mtime, and 
uses that to look for a .toolid file in the cache. If the tool-id 
is cached it reads it from that file, and uses that ID to calculate the 
object hashes as usual. If the tool-id is not cached then it runs "gcc 
-print-prog-name=..." a few times, hashes the binaries it finds, and 
caches the result for next time. CCACHE_COMPILERCHECK=content causes the 
ID to be re-cached, and =none and = are unaltered.


By this means, cached files can be shared across machines whose 
toolchains really are the same (all the way to the bottom) but happen 
to have different installation times: such toolchains are recognised 
as the same and hashed as the same, without having to re-hash the 
binaries every time.
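A rough Python sketch of the scheme described above, under stated assumptions: the cache key mimics CCACHE_COMPILERCHECK=mtime by hashing the compiler's path, size and mtime; the deep ID hashes tool base names and contents so that install location and installation time do not matter; and the discovery step done with "gcc -print-prog-name=..." is replaced by an explicit list of tool paths. The function names and the .toolid file layout are hypothetical, not the actual patch.

```python
import hashlib
import os
import tempfile


def mtime_key(compiler_path):
    """Cheap lookup key in the spirit of CCACHE_COMPILERCHECK=mtime."""
    st = os.stat(compiler_path)
    raw = "%s:%d:%d" % (os.path.abspath(compiler_path), st.st_size, st.st_mtime_ns)
    return hashlib.sha256(raw.encode()).hexdigest()


def content_id(tool_paths):
    """Deep toolchain ID built from tool names and contents only, so the
    install location and installation time do not affect it."""
    parts = []
    for path in tool_paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        parts.append(os.path.basename(path) + "=" + digest)
    return hashlib.sha256("\n".join(sorted(parts)).encode()).hexdigest()


def toolchain_id(compiler_path, tool_paths, cache_dir):
    """Return the deep ID, cached in a hypothetical <key>.toolid file."""
    id_file = os.path.join(cache_dir, mtime_key(compiler_path) + ".toolid")
    try:
        with open(id_file) as f:
            return f.read().strip()
    except FileNotFoundError:
        pass
    ident = content_id(tool_paths)  # slow path: hash every binary
    with open(id_file, "w") as f:
        f.write(ident + "\n")
    return ident


def make_toolchain(root, machine):
    """Create fake tool binaries (a stand-in for gcc -print-prog-name=...)."""
    bindir = os.path.join(root, machine)
    os.makedirs(bindir)
    paths = []
    for name, data in (("gcc", b"driver v1"), ("as", b"assembler v1")):
        path = os.path.join(bindir, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as root:
    cache = os.path.join(root, "cache")
    os.makedirs(cache)
    tools_a = make_toolchain(root, "machine-a")
    tools_b = make_toolchain(root, "machine-b")
    id_a = toolchain_id(tools_a[0], tools_a, cache)
    id_b = toolchain_id(tools_b[0], tools_b, cache)  # same contents, same ID
    # Replace the assembler on machine A without touching gcc.
    with open(tools_a[1], "wb") as f:
        f.write(b"assembler v2")
    id_a_cached = toolchain_id(tools_a[0], tools_a, cache)  # stale: old ID
    id_a_fresh = content_id(tools_a)
```

The last lookup shows the caveat mentioned in the thread: changing a tool other than gcc leaves the mtime-keyed entry in place, so the old ID is returned until that entry is refreshed.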


An interesting side-effect is that binaries cached in 
CCACHE_COMPILERCHECK=mtime mode are now compatible with those cached in 
CCACHE_COMPILERCHECK=content mode, although those cached in the other 
modes remain incompatible.


My implementation is currently GCC specific.


> Not sure about that. I maybe overlook something, but ccache would "only"
> have to follow all #include statements and note all header files that
> don't exist in the include path list. (When #include is used with a
> #defined token for the filename, fall back to the real compiler.) When
> considering a potential cache hit, reject it if any of the header files
> that didn't exist then exist now.


I was thinking of cases like:

#ifdef SOMETHING_NOT_DEFINED
#include "mystery-header.h"
#endif

Presumably you mean that it will note all the *directories* in which a 
particular header file was not found, on the way to finding it?



>>> Anybody got other ideas?


>> Running the compiler with -v prints the header search directories.
>> You could use that to do your own scan.


> To use the directories from "cpp -v" (plus directories from the command
> line) to do some optimistic validation was my first thought as well, but
> after thinking more about it I came to the conclusion that it wouldn't
> buy much safety because no subdirectories will be checked, and you can't
> tell which subdirectories to check unless you have parsed the #include
> statements. Also, it would trigger many false negatives.


Yes, false negatives would happen, especially if there are include 
directories within the project source tree. :(


The problem is that I've not been able to think of a way that both 
solves your bug, and doesn't have a serious time-impact on either a 
direct-mode lookup, or a cache-miss.


As it happens, I've been thinking of ways to speed up adding things into 
the cache. I've been profiling the code, and found that, on a 
cache-miss, it spends a significant portion of its runtime between the 
compiler exiting and ccache exiting. It has occurred to me that if we 
were to return the compiler's res

Re: [ccache] direct mode design bug

2012-11-08 Thread Christian Lohmaier
Hi Joel, *,

On Sun, Nov 4, 2012 at 8:10 PM, Joel Rosdahl  wrote:
>
> The direct mode, which was introduced in version 3.0 almost three years
> ago, has a design bug. The essence of the problem is that in the direct
> mode, ccache records header files that were used by the compiler, but it
> doesn't record header files that were not used but could have been used if
> they existed.

I wouldn't call it a bug, it is just how it works.

If you're going to "fix" this, then please by introducing a "safe
direct" mode in addition to the current direct mode.

If people create headers with the same name all over the place, only
they should "suffer".

From my understanding this is impossible to fix without sacrificing
what direct mode is about (avoiding running the preprocessor, or
something similar that simulates a preprocessor).

ciao
Christian
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-07 Thread Eitan Adler
On 7 November 2012 14:19, Joel Rosdahl  wrote:
> Hm. Coming to think of it, nothing stops Fedora et al from disabling direct
> mode by default even if ccache's own default is to enable it.

As a package maintainer I would like to discourage this view.
Downstream maintainers shouldn't have to modify the upstream default
except in extreme cases. This makes things confusing for the users and
results in weird questions on the mailing lists.

-- 
Eitan Adler
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-07 Thread Joel Rosdahl
On 5 November 2012 16:31, Andrew Stubbs  wrote:

> Incidentally, you appear to have committed a patch updating the
> documentation stating that direct mode is off by default, but in the code
> direct_mode is still true, by default.


Yes, I started sketching on disabling it by default but stopped halfway
because I couldn't make up my mind at the time. I'll fix it, thanks.

-- Joel
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-07 Thread Joel Rosdahl
Many thanks for the answer!

On 5 November 2012 14:53, Andrew Stubbs  wrote:

> My first reaction to this issue, rightly or wrongly, is that it's more of
> a documentation issue than a real bug. I mean, it can only occur if two
> people share a cache, or if the user installs new software and then reuses
> an old cache.
>

It can happen in other cases as well. Contrived example, but still:

rm -rf subdir file.c config.h
echo '#include "config.h"' >file.c
mkdir subdir
echo '#warning subdir/config.h used' >subdir/config.h
sleep 1
ccache gcc -Isubdir -c file.c
# User: "Oops, forgot to create ./config.h."
echo '#warning config.h used' >config.h
sleep 1
ccache gcc -Isubdir -c file.c
# User: "Wat? Why isn't ./config.h used?"


For a real life example, see
https://bugzilla.samba.org/show_bug.cgi?id=8424#c0.

> If the documentation simply says that you have to wipe your cache whenever
> you do that sort of thing then does that solve the problem?
>

It would be nice if ccache were only used and enabled by conscious users
who have read and understood the documentation, but in reality that doesn't
happen in many cases. For instance, Linux distributions like Fedora install
and enable ccache by default (masquerading the system compiler), at least
when installing the development environment or similar. That's not
surprising given that ccache works very well for most people and that it is
advertised as being very safe.

There are several other cases where ccache's behavior doesn't fully match
that of the real compiler - I'm just a bit worried that the direct mode
issue we're discussing perhaps is too much of a behavior mismatch.

Hm. Coming to think of it, nothing stops Fedora et al from disabling direct
mode by default even if ccache's own default is to enable it.

> A similar issue, albeit not so interesting, perhaps, is what happens when a
> user changes some part of the toolchain, but does not alter the "gcc"
> binary. Ccache won't notice a new back-end compiler, a new assembler, a new
> linker, a new default specs file or anything like that. Chances are that
> any differences in the output are harmless, but the cached objects are
> technically invalid.


Right. However, isn't the fact that ccache may be affected by toolchain
changes much less surprising than the fact that ccache may fail to pick up
header files correctly?


> [In fact, I have a use-case in which I have multiple users sharing a
> cache, and I wanted to be able to uniquely identify the same toolchain
> across all the installations. The mtime etc. varies from machine to
> machine, as might the exact tool mix, so I have some local patches to do a
> much deeper hash of the toolchain binaries, and include those in the object
> hashes. Even then, in the interests of performance, those toolchain IDs are
> cached according to the location and mtime, so changing the binutils will
> cause temporarily wrong toolchain hashes. Would you be interested in such a
> feature upstream?]


Perhaps, it depends on how intrusive it is and how toolchain-specific it is.

>> 3. ccache could try to imitate what the preprocessor does.
>>
>
> Yuck. If you can program a faster preprocessor I'm sure the GCC folks
> would love to see it.


Thankfully, my suggestion wasn't to create a preprocessor substitute. :-)

> You wouldn't get to dumb much down unless you're fine with running both
> your own preprocessor and then the real one for the preprocessor mode cache
> check.


Yes, that's of course what I had in mind.


> Even if you only wanted to look for #include statements you'd still need
> to evaluate all the #if directives.


Not sure about that. I maybe overlook something, but ccache would "only"
have to follow all #include statements and note all header files that don't
exist in the include path list. (When #include is used with a #defined
token for the filename, fall back to the real compiler.) When considering a
potential cache hit, reject it if any of the header files that didn't exist
then exist now.
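The idea can be sketched as follows. This is a hypothetical Python illustration, not ccache code: it follows #include lines textually and ignores #if/#ifdef (so a header guarded by an undefined macro is still probed, which can only cause extra cache misses, never stale hits), uses a simplified quoted/angle-bracket search order, and gives up on computed includes as suggested. The demo replays the config.h example above.

```python
import os
import re
import tempfile

# Quoted or angle-bracket includes; the #if state is deliberately ignored.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*([<"])([^>"\n]+)[>"]', re.MULTILINE)
# "#include SOME_MACRO": the file name is computed, so give up.
COMPUTED_INCLUDE_RE = re.compile(r"^[ \t]*#[ \t]*include[ \t]+[A-Za-z_]", re.MULTILINE)


def scan_includes(source, include_dirs):
    """Follow #include lines textually starting from `source`.

    Returns (found, missing): the files that were found, and every
    candidate path probed before a header was found (or all candidates
    if the header was not found anywhere).
    """
    found, missing = set(), set()
    stack = [source]
    while stack:
        path = stack.pop()
        if path in found:
            continue
        found.add(path)
        with open(path) as f:
            text = f.read()
        if COMPUTED_INCLUDE_RE.search(text):
            raise ValueError("computed #include in %s; use the real compiler" % path)
        for delim, name in INCLUDE_RE.findall(text):
            # Simplified: "..." also searches the including file's directory.
            dirs = list(include_dirs) if delim == "<" else [os.path.dirname(path), *include_dirs]
            for d in dirs:
                candidate = os.path.join(d, name)
                if os.path.isfile(candidate):
                    stack.append(candidate)
                    break
                missing.add(candidate)
    return found, missing


def hit_still_valid(missing):
    """Reject a cached result if any previously absent header now exists."""
    return not any(os.path.exists(p) for p in missing)


with tempfile.TemporaryDirectory() as root:
    subdir = os.path.join(root, "subdir")
    os.makedirs(subdir)
    with open(os.path.join(root, "file.c"), "w") as f:
        f.write('#include "config.h"\n')
    with open(os.path.join(subdir, "config.h"), "w") as f:
        f.write("#warning subdir/config.h used\n")
    found, missing = scan_includes(os.path.join(root, "file.c"), [subdir])
    before = hit_still_valid(missing)  # nothing new has appeared yet
    with open(os.path.join(root, "config.h"), "w") as f:
        f.write("#warning config.h used\n")
    after = hit_still_valid(missing)   # ./config.h now shadows subdir/config.h
```

Real preprocessors have more search rules (-iquote, -isystem, -idirafter, #include_next, the compiler's default directories), none of which this sketch models.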

>> Anybody got other ideas?
>>
>
> Running the compiler with -v prints the header search directories. You
> could use that to do your own scan.


To use the directories from "cpp -v" (plus directories from the command
line) to do some optimistic validation was my first thought as well, but
after thinking more about it I came to the conclusion that it wouldn't buy
much safety because no subdirectories will be checked, and you can't tell
which subdirectories to check unless you have parsed the #include
statements. Also, it would trigger many false negatives.

> BTW, gcc has an option "--trace-includes" that might be faster than
> scanning the preprocessor output, although the compiler still has to do all
> the same work. Like this: "gcc -E hello.c -o /dev/null".


How do you use --trace-includes? It doesn't seem to be documented and
nothing happens when I try it.
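For reference, GCC documents -H as printing the name of each header file used, one per line on stderr, indented with dots to show how deep in the #include stack it is. As far as I know, the gcc driver accepts --trace-includes as a long form of -H, so it has to appear on the command line (the example quoted above does not include it). A small Python sketch that runs the preprocessor with -H and parses that output; the function names are hypothetical and the sample text is illustrative only.

```python
import os
import re
import subprocess

# Each -H line: one or more dots (the nesting depth), a space, then the path.
TRACE_LINE = re.compile(r"^(\.+) (.+)$")


def parse_trace_includes(stderr_text):
    """Return (depth, path) pairs in the order the preprocessor opened them."""
    headers = []
    for line in stderr_text.splitlines():
        m = TRACE_LINE.match(line)
        if m:
            headers.append((len(m.group(1)), m.group(2)))
    return headers


def trace_includes(source, extra_args=()):
    """Run only the preprocessor, discard its output, and parse the -H trace."""
    proc = subprocess.run(
        ["gcc", "-H", "-E", source, "-o", os.devnull, *extra_args],
        capture_output=True, text=True, check=True)
    return parse_trace_includes(proc.stderr)


sample = """\
. /usr/include/stdio.h
.. /usr/include/features.h
... /usr/include/x86_64-linux-gnu/sys/cdefs.h
Multiple include guards may be useful for:
/usr/include/example-without-guard.h
"""
headers = parse_trace_includes(sample)
```

Because lines are printed in the order headers are opened, the listing comes out in depth-first order; the trailing "Multiple include guards" report has no dots and is skipped.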

> Please leave it on. The difference is like night and day, and the bug is
> rare and avoidable.


OK, we so far have one vote

Re: [ccache] direct mode design bug

2012-11-05 Thread Andrew Stubbs

On 04/11/12 19:10, Joel Rosdahl wrote:

> Since a quick fix likely isn't possible in the short term, and I would like
> to release ccache 3.2 soon, we have to decide whether the direct mode
> should default to off or on. Please share any opinions!


Incidentally, you appear to have committed a patch updating the 
documentation stating that direct mode is off by default, but in the 
code direct_mode is still true, by default.


Andrew

___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] direct mode design bug

2012-11-05 Thread Andrew Stubbs

On 04/11/12 19:10, Joel Rosdahl wrote:

> The direct mode, which was introduced in version 3.0 almost three years
> ago, has a design bug. The essence of the problem is that in the direct
> mode, ccache records header files that were used by the compiler, but it
> doesn't record header files that were not used but could have been used if
> they existed. So, when ccache checks if a result could be taken from
> the cache, it can't check if the existence of a new header file should
> invalidate the result.


My first reaction to this issue, rightly or wrongly, is that it's more 
of a documentation issue than a real bug. I mean, it can only occur if 
two people share a cache, or if the user installs new software and then 
reuses an old cache. If the documentation simply says that you have to 
wipe your cache whenever you do that sort of thing then does that solve 
the problem?


A similar issue, albeit not so interesting, perhaps, is what happens 
when a user changes some part of the toolchain, but does not alter the 
"gcc" binary. Ccache won't notice a new back-end compiler, a new 
assembler, a new linker, a new default specs file or anything like that. 
Chances are that any differences in the output are harmless, but the 
cached objects are technically invalid.


Having said all that, if Ccache Just Worked, that would be no bad thing.

[In fact, I have a use-case in which I have multiple users sharing a 
cache, and I wanted to be able to uniquely identify the same toolchain 
across all the installations. The mtime etc. varies from machine to 
machine, as might the exact tool mix, so I have some local patches to do 
a much deeper hash of the toolchain binaries, and include those in the 
object hashes. Even then, in the interests of performance, those 
toolchain IDs are cached according to the location and mtime, so 
changing the binutils will cause temporarily wrong toolchain hashes. 
Would you be interested in such a feature upstream?]



> 1. ccache could use strace or similar ways of monitoring the compiler and
> tracing the performed system calls to find out where headers were probed. I
> haven't measured, but I suspect that this would be slow.


ptrace is quite easy to use, but it would be slow, and not terribly 
portable, plus you'd have to ignore all the other stat gubbins that a 
toolchain indulges in.



> 2. ccache could override strategic functions using LD_PRELOAD, thus
> snooping on system calls without involving the kernel. This should be
> possible and quite fast, but it's tricky to get right, and it's not very
> portable. (By the way: This is what
> http://audited-objects.sourceforge.net does, although I don't know if
> it monitors and acts on probes of
> nonexistent files.)


Faster, but more fragile, and I still don't like it.


> 3. ccache could try to imitate what the preprocessor does. That is, read
> the source code file and follow #include statements instead of looking at
> the preprocessor output. This essentially means implementing a dumbed down
> version of a preprocessor, a task that doesn't sound trivial: It has to be
> significantly faster than the real preprocessor to make a difference, it
> will be more coupled to the behavior of the compiler and its various
> options (-I, -idirafter, -isystem, etc), and it probably has to know the
> compiler's default include directories.


Yuck. If you can program a faster preprocessor I'm sure the GCC folks 
would love to see it. You wouldn't get to dumb much down unless you're 
fine with running both your own preprocessor and then the real one for 
the preprocessor mode cache check. Even if you only wanted to look for 
#include statements you'd still need to evaluate all the #if directives. 
You could make it faster by ignoring the tokenization pass, but then 
you'd get other subtle bugs.



> Anybody got other ideas?


Running the compiler with -v prints the header search directories. You 
could use that to do your own scan. It would be difficult to 
differentiate files specified by the user with absolute paths from files 
found by the compiler.


I suggest it would be better to do just the minimum to determine if a 
cached file is unsafe. Perhaps you could hash the directory stat for the 
include directories listed by "gcc -v"? (I've checked, and there doesn't 
seem to be a "-print-..." option for the include path.)


E.g. "gcc -v -c hello.c" gives:
...
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
...

so, you could stat the directories listed, and disallow direct mode if 
the mtime has changed since the manifest was last written. The paths to 
stat could be cached in the manifest.
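A Python sketch of this proposal, assuming the "gcc -v" text format shown above: extract the search list, record each directory's mtime (or its absence) alongside the manifest, and only allow a direct-mode hit if nothing has changed. The names are hypothetical. As noted elsewhere in this thread, this does not notice new files in subdirectories (for #include "sys/foo.h"), since creating a file only updates the mtime of the directory that directly contains it.

```python
import os
import tempfile


def parse_search_dirs(verbose_output):
    """Collect the directories listed between '#include "..." search starts
    here:' and 'End of search list.' in gcc -v / cpp -v output."""
    dirs, inside = [], False
    for line in verbose_output.splitlines():
        if line.startswith('#include "..." search starts here:'):
            inside = True
        elif line.startswith("End of search list."):
            inside = False
        elif inside and line.startswith(" "):
            dirs.append(line.strip())
    return dirs


def snapshot_dirs(dirs):
    """Record each directory's mtime in ns, or None if it does not exist."""
    snap = {}
    for d in dirs:
        try:
            snap[d] = os.stat(d).st_mtime_ns
        except FileNotFoundError:
            snap[d] = None
    return snap


def direct_mode_allowed(snap):
    """Allow a direct-mode hit only if no search directory has changed."""
    return snapshot_dirs(snap) == snap


SAMPLE = """\
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
"""
search_dirs = parse_search_dirs(SAMPLE)

with tempfile.TemporaryDirectory() as inc:
    os.utime(inc, ns=(0, 0))  # pretend the directory is old
    snap = snapshot_dirs([inc])
    unchanged = direct_mode_allowed(snap)
    with open(os.path.join(inc, "new.h"), "w") as f:
        f.write("/* a new header appears */\n")
    changed = not direct_mode_allowed(snap)
```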


Extra points if direct mode 

[ccache] direct mode design bug

2012-11-04 Thread Joel Rosdahl
Hi,

The direct mode, which was introduced in version 3.0 almost three years
ago, has a design bug. The essence of the problem is that in the direct
mode, ccache records header files that were used by the compiler, but it
doesn't record header files that were not used but could have been used if
they existed. So, when ccache checks if a result could be taken from
the cache, it can't check if the existence of a new header file should
invalidate the result.

This scenario is probably quite rare since only few people have reported it
during the years (there are two public bug reports:
https://bugzilla.samba.org/show_bug.cgi?id=8424 and
https://bugzilla.samba.org/show_bug.cgi?id=8728), but the problem may of
course happen without the user reporting it or knowing about it. Anyway,
regardless of frequency, it makes ccache's behavior differ from that of the
unwrapped compiler.

Unfortunately, I don't know how to fix the issue in a good way.

One obvious way would be to try to figure out in which directories the
preprocessor has looked for header files, store that information and do the
same search when considering a cache result. But how to do that?

1. ccache could use strace or similar ways of monitoring the compiler and
tracing the performed system calls to find out where headers were probed. I
haven't measured, but I suspect that this would be slow.

2. ccache could override strategic functions using LD_PRELOAD, thus
snooping on system calls without involving the kernel. This should be
possible and quite fast, but it's tricky to get right, and it's not very
portable. (By the way: This is what
http://audited-objects.sourceforge.net does, although I don't know if
it monitors and acts on probes of
nonexistent files.)

3. ccache could try to imitate what the preprocessor does. That is, read
the source code file and follow #include statements instead of looking at
the preprocessor output. This essentially means implementing a dumbed down
version of a preprocessor, a task that doesn't sound trivial: It has to be
significantly faster than the real preprocessor to make a difference, it
will be more coupled to the behavior of the compiler and its various
options (-I, -idirafter, -isystem, etc), and it probably has to know the
compiler's default include directories.

Anybody got other ideas?

Regarding option 3: If I understand correctly, distcc's pump mode does
something similar, so perhaps there is code to borrow or be inspired by?

Since a quick fix likely isn't possible in the short term, and I would like
to release ccache 3.2 soon, we have to decide whether the direct mode
should default to off or on. Please share any opinions!

-- Joel
___
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache


[ccache] ccache direct mode

2010-01-07 Thread Joel Rosdahl
On Thu, 7 Jan 2010 09:37:58 +1100
tridge at samba.org wrote:

>  > The basic idea of how to achieve this is finding out which files were
>  > included by the preprocessor and then storing the hash sums of those
>  > files in a file associated with the input file and compiler arguments.
> 
> How dependent is this on the compiler? Is the format of the include
> file names on different compilers (eg. sun compiler) consistent enough
> for this to be reliable?

If you mean the format used in the preprocessor output to indicate from
which files code comes from, then I don't know. I haven't investigated
other compilers than GCC, and I've only checked versions 3.3-4.3, so
earlier versions are left to check as well.
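For GCC specifically, the preprocessor output marks file boundaries with linemarkers of the form '# linenum "filename" flags', where flag 1 means a new file is being entered and flag 2 means returning to a file. A minimal Python sketch of collecting file names from such output; it is an illustration, not the parser in ccache, and it assumes file names only use backslash escapes for quotes and backslashes.

```python
import re

# '# linenum "filename" flags' as emitted by GCC's preprocessor.
LINEMARKER = re.compile(r'^# (\d+) "((?:[^"\\]|\\.)*)"((?: [1-4])*)\s*$')


def included_files(preprocessed):
    """Return file names mentioned in linemarkers, in first-seen order,
    skipping pseudo-files such as <built-in> and <command-line>."""
    files = []
    for line in preprocessed.splitlines():
        m = LINEMARKER.match(line)
        if not m:
            continue
        name = re.sub(r"\\(.)", r"\1", m.group(2))  # undo \" and \\ escapes
        if name.startswith("<"):
            continue
        if name not in files:
            files.append(name)
    return files


SAMPLE = '''\
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "hello.c"
# 1 "inc1/test.h" 1
void hello(void);
# 2 "hello.c" 2
'''
print(included_files(SAMPLE))  # ['hello.c', 'inc1/test.h']
```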

On the other hand, is ccache supposed to work with compilers that don't
behave like GCC? If so, I guess it should be possible to detect
GCC-ness in some way and select features accordingly.

> Also, does the hashtable used for included_files preserve the
> ordering? (the order of includes is also vital). Or do you rely on the
> hash of the file that does the #include changing for that?

The hashtable is unordered, and yes, I rely on the hash of the input
file to keep track of the ordering, and also of course on the include
file hashes. For a given manifest, the source file (and
therefore the order of the first level of include files) is known since
the manifest is looked up given the hash of the input file (and some
more information), and all other levels of include files are taken
care of using the same kind of reasoning. In other words, if the
include file order changes in some file, then the hash of that file
changes too, which leads to a cache miss. Which include files the
preprocessor reads is of course also a function of compiler options
like -I, but that is handled by also hashing those options when
computing the hash in direct mode. Do you see any potential problem
here?
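A toy Python illustration of that argument (hypothetical file contents; SHA-256 stands in for whatever hash ccache uses; the source file itself is left out because it is covered by the manifest lookup key): the recorded table is an unordered path-to-hash mapping, yet swapping two #include lines in a.h is still detected, because doing so changes the hash of a.h itself.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()


# Hypothetical headers before and after swapping two #include lines in a.h.
before = {
    "a.h": '#include "x.h"\n#include "y.h"\n',
    "x.h": "int x;\n",
    "y.h": "int y;\n",
}
after = {**before, "a.h": '#include "y.h"\n#include "x.h"\n'}

# The recorded include table is an unordered mapping: path -> hash.
recorded = {path: digest(text) for path, text in before.items()}


def entry_matches(entry, files):
    """An entry matches only if every recorded header still hashes the same."""
    return all(digest(files[path]) == h for path, h in entry.items())


print(entry_matches(recorded, before))  # True
print(entry_matches(recorded, after))   # False: a.h's own hash changed
```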

Regards,
Joel


[ccache] ccache direct mode

2010-01-07 Thread tridge at samba.org
Hi Joel,

 > The basic idea of how to achieve this is finding out which files were
 > included by the preprocessor and then storing the hash sums of those
 > files in a file associated with the input file and compiler arguments.

How dependent is this on the compiler? Is the format of the include
file names on different compilers (eg. sun compiler) consistent enough
for this to be reliable?

Also, does the hashtable used for included_files preserve the
ordering? (the order of includes is also vital). Or do you rely on the
hash of the file that does the #include changing for that?

Cheers, Tridge


[ccache] ccache direct mode

2010-01-06 Thread Joel Rosdahl
Hi,

Some time ago, I observed that running the preprocessor on the input
file and then hashing the output was quite a bit slower than just
hashing the input file and all included files. I then got an idea:
Would it be possible to make ccache hash the source code directly
without running the preprocessor to speed things up? After some
thinking, experimenting and coding, I have concluded that the answer is
yes.

The basic idea of how to achieve this is finding out which files were
included by the preprocessor and then storing the hash sums of those
files in a file associated with the input file and compiler arguments.
When compiling the same input file (with the same arguments), the list
of include files and their hash sums can be read and verified in order
to look up the correct object file. In my implementation, this ccache
mode is called the "direct mode" and the standard ccache way is called 
the "preprocessor mode".

I've chosen the name "manifest" for the data structure containing the
include file list, their hash sums and the associated object file
names. The manifest file is stored in the ccache directory under the
name X.manifest, where X is the hash sum of the input file and the
compiler arguments. The manifest doesn't include the object file data,
just the name (which happens to be the hash sum of the preprocessor
output associated with the object file).

So, when ccache is asked to compile a file, the manifest is read, and
for each object file name in the manifest, the associated include
files' hash sums are verified. If there is a match, the compilation
result is known. If no object file matches, ccache falls back to the
preprocessor mode. After preprocessing and compiling, the manifest is
updated with the read include files and their hash sums.
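The lookup described above could be sketched like this in Python. It is a simplified model under assumptions, not the real manifest format: a manifest is a list of (object name, {include path: hash}) entries, files are read through a caller-supplied function that returns None for missing files, and SHA-256 stands in for ccache's hash. Names such as manifest_name, lookup and update are hypothetical.

```python
import hashlib


def content_hash(data):
    return hashlib.sha256(data).hexdigest()


def manifest_name(source, args):
    """X.manifest, where X covers the input file and the compiler arguments."""
    hasher = hashlib.sha256(source)
    for arg in args:
        hasher.update(b"\0" + arg.encode())  # separator keeps args distinct
    return hasher.hexdigest() + ".manifest"


def lookup(manifest, read_file):
    """Return the first object name whose recorded include hashes all match.

    `manifest` is a list of (object_name, {include_path: hash}) entries and
    `read_file(path)` returns the file's bytes, or None if it is missing.
    """
    for object_name, includes in manifest:
        for path, recorded in includes.items():
            data = read_file(path)
            if data is None or content_hash(data) != recorded:
                break
        else:
            return object_name
    return None  # no match: fall back to preprocessor mode


def update(manifest, object_name, include_paths, read_file):
    """After preprocessing and compiling, record the headers that were read."""
    manifest.append(
        (object_name, {p: content_hash(read_file(p)) for p in include_paths}))


name = manifest_name(b'#include "test.h"\n', ["-I", "inc1", "-c"])
files = {"inc1/test.h": b"void hello(void);\n"}
manifest = []
first = lookup(manifest, files.get)   # None: compile in preprocessor mode
update(manifest, "obj-1", ["inc1/test.h"], files.get)
second = lookup(manifest, files.get)  # "obj-1": direct-mode hit
files["inc1/test.h"] = b"int hello(void);\n"
third = lookup(manifest, files.get)   # None: the header changed
```

A changed header simply makes an entry fail to match, so the caller falls back to preprocessor mode and afterwards appends a new entry for the new combination of header hashes.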

By not running the preprocessor, CPU usage is reduced; the runtime is
about 0.2-1.0 times that of ccache in preprocessor mode. The relative
speedup is higher when I/O is fast (e.g., when files are in the disk
cache). Here are some unscientific measurements of compiling Samba on
my Linux system (with a filled disk cache):

Without ccache:              321 s
With original ccache:        100 s
With ccache in direct mode:   28 s

I've never seen the direct mode make ccache slower, although it should
be possible in pathological cases.

The implementation is based on the latest CVS revision of ccache plus
most of the patches accumulated in the Debian ccache package. While
experimenting and implementing, I have done some other cleanup and
improvements as well. See
http://github.com/jrosdahl/ccache/raw/master/NEWS for a high-level list
of changes (including those committed to CVS but not yet released).

If you're interested, try it out! And please report any bugs, design
flaws or other problems you find. I'm not aware of any bugs, but I'd be
surprised if there aren't any left. In particular, I have probably made
ccache less portable since I've only built and tried it on relatively
modern GNU/Linux systems and GCCs.

Source code snapshot:

  http://cloud.github.com/downloads/jrosdahl/ccache/ccache-2.4_direct.1.tar.gz

Git repository:

  http://github.com/jrosdahl/ccache

Comments and improvements are welcome.

Regards,
Joel