[ccache] Questions about two hot functions in ccache

2010-10-17 Thread Justin Lebar
Hi, all.

I ran ccache through |perf| on my x64 Linux box today.  In my testcase
(|make clean && perf record -g make| within a subdirectory of the
Firefox tree), there are only four functions that see more than 2% of
the samples:

25.39%   c++  ccache             [.] hash_source_code_string
10.15%   c++  ccache             [.] mdfour64
 4.04%   c++  [kernel.kallsyms]  [k] copy_user_generic_string
 3.14%   c++  ccache             [.] mdfour_update

So it appears that 13% of my CPU time is spent computing md4 hashes,
while another 25% is spent in hash_source_code_string but outside the
MD4 code.

To someone new to the code like me, it appears that there's some room
for optimization here.

* hash_source_code_string is doing twice as much work as anything else
in ccache, but only to catch edge cases (comments and special macros).
 If it could be simplified, the speed gains might offset the cost of
additional false positives.  If all we really care about is finding
the strings "__DATE__" and "__TIME__", there are faster algorithms
than a character-by-character search.

(Note also that the current implementation copies the whole file into
hashbuf one character at a time.  Again, do the benefits of stripping
out comments really offset this?)

* Why does ccache still use MD4?  Surely there's a better / faster
hash out there.  I noticed that ccache includes murmurhash, but it
doesn't seem like it's used in too many places.  There's probably a
good reason for this, but it's not too apparent to me.

You all probably know better than I if ccache should use a secure hash
function, or if something like murmurhash is sufficient -- a secure
hash function seems like overkill to me, fwiw.  But either way, is MD4
the right function to use?  On the one hand it's no longer a secure
hash function, and on the other I'd imagine it's nowhere near as fast
as something like murmurhash.

I'm curious what you all think about this.

Regards,
-Justin
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Questions about two hot functions in ccache

2010-10-19 Thread Justin Lebar
> Indeed. (This was by the way mentioned on the list a couple of weeks
> ago: http://www.mail-archive.com/ccache@lists.samba.org/msg00532.html)

To respond to Ramiro in that thread:

> Joel, is there a way we can time how long each part of ccache takes?

You can probably use xperf on Windows.

> Do you want to work on this? That would be awesome! I currently don't
> have much ccache time, and when I get some, I would like to work on
> other things first.

I have a few other projects to finish first, but I'll definitely add
this to my (short, but non-empty) list of toolchain patches to write.
It looks like there's serious performance to gain here.

> Even the 64-bit version of murmurhash has way too high
> collision rate.

Ah, I didn't realize that murmurhash gave a single word as output.
Yes, that's no good.

> MD4 has been there from the start and neither Tridge nor I have seen any
> reason to switch it. MD5, SHA1 and other even more modern cryptographic
> hash functions are indeed stronger but also slower, and the increased
> resistance against various crypto attacks doesn't seem necessary in a
> tool like ccache.

I'm also no expert on hash functions, but I'll e-mail around.

-Justin


Re: [ccache] Questions about two hot functions in ccache

2010-10-19 Thread Justin Lebar
My cryptographically-inclined friend suggested we use a universal hash
function or something a bit stronger, such as VHASH.

These functions take a "key", which we could choose at random and fix
in the code.

VHASH outputs 64-bit digests with collision probability 2^-61, so in
expectation you'd need to hash about 2^30 files before you saw a collision.
If that wasn't good enough, we could compute two VHASH digests with
different keys and concatenate them.
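(For context, the birthday-bound arithmetic behind that estimate -- my back-of-the-envelope version, not a statement from the VHASH paper -- is:)

```latex
P(\text{collision among } n \text{ files})
  \;\approx\; \binom{n}{2} \cdot 2^{-61}
  \;=\; \frac{n(n-1)}{2} \cdot 2^{-61},
\qquad
P \approx 1 \;\text{ once }\; n \approx \sqrt{2 \cdot 2^{61}} = 2^{31},
```

which is within a factor of two of the 2^30 figure.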

-Justin

On Tue, Oct 19, 2010 at 2:31 PM, Martin Pool  wrote:
> On 20 October 2010 08:15, Joel Rosdahl  wrote:
>> MD4 has been there from the start and neither Tridge nor I have seen any
>> reason to switch it. MD5, SHA1 and other even more modern cryptographic
>> hash functions are indeed stronger but also slower, and the increased
>> resistance against various crypto attacks doesn't seem necessary in a
>> tool like ccache. That said, I'm sure there nowadays may exist hash
>> functions that are both better (i.e., with lower collision rate) AND
>> faster than MD4. Do you (or anyone else) know of any with properties
>> that would be a good fit for ccache?
>
> I think any of the cryptographic hash functions will have an even
> distribution of outputs, so nothing else will give stronger resistance
> to accidental collision.  The only problem with MD4 is that it might
> be vulnerable to malicious collisions (which seems pointless in ccache
> as it currently exists) and that others might be faster.
>
> --
> Martin


Re: [ccache] Questions about two hot functions in ccache

2010-10-20 Thread Justin Lebar
Skimming the VHASH paper, it looks like it runs at about 1 cycle per
byte on a 64-bit Core 2 Merom machine when generating a 128-bit
digest.  (They don't have timings for 32-bit x86.)  It looks like they
just run the hash algorithm twice (with different keys) to generate a
128-bit digest.

I couldn't find great numbers on MD4, but [1] says 3.8 cycles per byte
on really old hardware.  Who knows what that would be today.

-Justin

[1] 
http://books.google.com/books?id=Xq4M8YTSeloC&pg=PA75&lpg=PA75&dq=md4+cpb&source=bl&ots=rtrQSLDtoG&sig=bh0mr2SN9_p0NCEF6_ZmxoadTTw&hl=en&ei=C52-TLXZPIuyngfy8f3JBw&sa=X&oi=book_result&ct=result&resnum=1&sqi=2&ved=0CBIQ6AEwAA#v=onepage&q=md4%20cpb&f=false

On Wed, Oct 20, 2010 at 12:34 AM, Martin Pool  wrote:
> On 20 October 2010 17:44, Justin Lebar  wrote:
>> My cryptographically-inclined friend suggested we use a universal hash
>> function or something a bit stronger, such as VHASH.
>>
>> These functions take a "key", which we could choose at random and fix
>> in the code.
>>
>> VHASH outputs 64-bit digests with collision probability 2^-61, so in
>> expectation you'd need to hash about 2^30 files before you saw a collision.
>> If that wasn't good enough, we could compute two VHASH digests with
>> different keys and concatenate them.
>
> Is VHASH expected to be faster than MD4?  I don't think adding more
> strength will help with anything.  The odds of an accidental MD4
> collision are low, and I don't know of any attack by which being able
> to predict or produce ccache collisions accomplishes anything for the
> attacker.  (If they can write to the cache you have bigger problems.)
>
> --
> Martin


[ccache] [Patch] Faster direct-mode hash

2010-11-07 Thread Justin Lebar
This patch is a followup to the discussion in "Questions about two hot
functions in ccache".

On my machine, the patch speeds up direct mode cache hits by about a
factor of 1.7 for the c++_includes.cc test file.  My benchmark of
|make clean && time make| in Mozilla's docshell/base went from 1.04s
(git master) to 0.64s, a 1.63x speedup.

Full output from ../perf.py on c++_includes.cc is included below.

I suspect we could use the fast_hash function for preprocessor mode
without much work.  I also suspect that switching to a smarter
algorithm for searching for "#include" would decrease the cost of
cache misses.  But I haven't profiled either of these cases.

I'm a bit concerned about the fact that I had to change the reported
file lengths in the manifest test (in test.sh).  I'm not sure what's
going on here; I may well have messed something up.  Hopefully not.
:)

-Justin

$ ../perf.py -n10 --hit-factor=10 --ccache=../ccache gcc-4.5 c++_includes.cc
Compilation command: gcc-4.5 c++_includes.cc -c -o c++_includes.o
Compilercheck: mtime
Compression: off
Hardlink: off
Nostats: off

* git master (9cdd1154)
Without ccache:   3.55 s (100.00 %) ( 1.00 x)
With ccache, preprocessor mode, cache miss:   4.16 s (117.10 %) ( 0.85 x)
With ccache, preprocessor mode, cache hit:0.87 s ( 24.51 %) ( 4.08 x)
With ccache, direct mode, cache miss: 4.22 s (118.98 %) ( 0.84 x)
With ccache, direct mode, cache hit:  0.15 s (  4.36 %) (22.92 x)

* patched
Without ccache:   3.53 s (100.00 %) ( 1.00 x)
With ccache, preprocessor mode, cache miss:   4.13 s (116.90 %) ( 0.86 x)
With ccache, preprocessor mode, cache hit:0.86 s ( 24.25 %) ( 4.12 x)
With ccache, direct mode, cache miss: 4.15 s (117.55 %) ( 0.85 x)
With ccache, direct mode, cache hit:  0.09 s (  2.47 %) (40.43 x)

* Speedup: 0.15 / 0.09 = 1.7x


diff
Description: Binary data


Re: [ccache] [Patch] Faster direct-mode hash

2010-11-07 Thread Justin Lebar
> check_for_temporal_macros could stop searching if both macros have been found?
> I cannot tell if doing that would make any real difference.

We could certainly do that, although I don't think it would help much.
 If your code has lots of __TIME__s, you're screwed anyway.  :)

On Sun, Nov 7, 2010 at 1:47 PM, Anders Furuhed
 wrote:
> Hi,
>
> check_for_temporal_macros could stop searching if both macros have been found?
> I cannot tell if doing that would make any real difference.
>
> Regards,
> Anders
>
>
> Anders Furuhed
> Pantor Engineering AB
> +46 8 412 9781
>
>


Re: [ccache] [Patch] Faster direct-mode hash

2010-11-08 Thread Justin Lebar
> The improved search for __{DATE,TIME}__ is uncontroversial, so that can be
> applied right away. However, I would like to make the
> LFG-based digest opt-in, at least for now, since I think we
> need time to test it and to collect hash-savvy people's opinions.

That sounds pretty reasonable to me.  In this case, you'll probably
just want to substitute the patch's fast_hash_buffer() call with
hash_buffer() -- that is, don't accumulate the string to hash one
character at a time like the code currently does.

> By the way, can you provide some reference to why LFG (and the properties
> you chose) would work well as a digest for ccache's purpose? What's the
> expected collision rate? Or in other words: how well can we sleep at night,
> knowing that we haven't messed up people's builds, if we would introduce the
> LFG-based algorithm? :-)

I don't have as good a reason as I should; I was just implementing
Michael Niedermayer's suggestion from the previous thread, as it
seemed pretty reasonable.  Hopefully he can justify my decision for
me.  :D

-Justin


[ccache] Would backdating source files allow sharing hardlinks between trees?

2010-11-25 Thread Justin Lebar
Hi, all.

One of the things we'd like to do at Mozilla is use ccache with
CCACHE_HARDLINK between two separate checkouts of our code.

This is currently problematic, because all of the hardlinked object
files share the same modified-time.  As a result, when a source file
matches an object file in the cache, we have two unappealing options:

1) Update the object file's mtime to now.  If we do this, then when we
rebuild in another tree, the object file will be newer than the things
which depend on it (e.g. shared libraries which include the file), so
we'll have to relink.

2) Leave the object file's mtime unmodified.  If we do this, then when
we rebuild this same tree, the object file will appear out of date.

I believe ccache takes the first approach, which seems sensible.

But as an alternative, what if we backdated the source file's mtime to
before the object file was created and left the object file's mtime
unchanged?  (If we didn't want to arbitrarily choose a time, we could
save the mtime of the source file which originally generated the
object file and use that.)

This would be an option, of course -- it's kind of strange behavior,
so it certainly shouldn't be on unless the user asks for it.  But I
wonder if this actually solves the problem.  I suspect it might screw
up some important tool, but I can't think of a potential bad
interaction off the top of my head.

What do you all think?

-Justin


Re: [ccache] [Patch] Faster direct-mode hash

2010-11-26 Thread Justin Lebar
Here's my earlier patch split in two.

Unfortunately, the faster search algorithm doesn't seem to help much
on its own -- looks like you really need the faster hash, too.  The
results below are from

   perf.py gcc-4.5 c++_includes.cpp


*** All patches:

Without ccache:   6.51 s (100.00 %) ( 1.00 x)
With ccache, preprocessor mode, cache miss:   7.76 s (119.11 %) ( 0.84 x)
With ccache, preprocessor mode, cache hit:1.65 s ( 25.34 %) ( 3.95 x)
With ccache, direct mode, cache miss: 7.78 s (119.50 %) ( 0.84 x)
With ccache, direct mode, cache hit:  0.13 s (  1.96 %) (51.06 x)

*** Just temporal macro search:

Without ccache:   6.50 s (100.00 %) ( 1.00 x)
With ccache, preprocessor mode, cache miss:   7.74 s (118.99 %) ( 0.84 x)
With ccache, preprocessor mode, cache hit:1.65 s ( 25.37 %) ( 3.94 x)
With ccache, direct mode, cache miss: 7.86 s (120.81 %) ( 0.83 x)
With ccache, direct mode, cache hit:  0.23 s (  3.47 %) (28.82 x)

*** unpatched:

Without ccache:   6.51 s (100.00 %) ( 1.00 x)
With ccache, preprocessor mode, cache miss:   7.73 s (118.75 %) ( 0.84 x)
With ccache, preprocessor mode, cache hit:1.66 s ( 25.56 %) ( 3.91 x)
With ccache, direct mode, cache miss: 7.94 s (121.95 %) ( 0.82 x)
With ccache, direct mode, cache hit:  0.28 s (  4.34 %) (23.04 x)



fast-hash
Description: Binary data


fast-temporal-macro-search
Description: Binary data


Re: [ccache] Stumbling blocks with ccache and embedded/encapsulated environments

2010-12-02 Thread Justin Lebar
>> Even on a ccache *hit* both copies of the .o file wind up occupying
>> buffer cache space, because the ccached .o is read from disk [paging
>> it in] in order to write the .o file to the build output directory.
>> On a ccache miss the copy runs the other direction but you still wind
>> up with both sets of pages in the buffer cache.
>
> In the hit case I would have thought that the .o file you read would
> still create less memory pressure than the working memory of running
> the real compiler on that file?  Perhaps the difference is that the
> kernel knows that when the compiler exits, its anonymous pages can be
> thrown away, whereas it doesn't know which .o file it ought to retain.
>  So perhaps madvise might help.  (Just speculating.)

I'm curious about this.  I guess you'd madvise to tell the kernel that
the .o you just wrote shouldn't be cached?  But presumably it should
be, because you're going to link your program.

Alternatively, you could madvise and tell the kernel not to cache the
.o file from ccache's cache.  But if you re-compile, you want ccache's
cache to be in memory.

I'm not sure how one might win here without hardlinking.

-Justin

On Thu, Dec 2, 2010 at 4:24 PM, Martin Pool  wrote:
> On 3 December 2010 03:42, Christopher Tate  wrote:
>>> I'd love to know whether you also tried distcc for it, and if so what
>>> happened or what went wrong.  (Obviously it can only help for the
>>> C/C++ phases.)
>>
>> distcc can certainly help a great deal.  For us, it's a bit
>> problematic to use because more than half of our total build is
>> non-C/C++ that depends on the C/C++ targets [e.g. Java-language
>> modules that have partially native implementations],
>
> ... and you suspect that the Makefile dependencies are not solid
> enough to safely do a parallel build?
>
>> plus we have a
>> highly heterogeneous set of build machines: both Mac hosts and Linux,
>> not all the same distro of Linux, etc.  The inclusion of Macs in
>> particular makes distcc more of a pain to get up and running cleanly.
>
> That can certainly be a problem.
>
>>> I'm just trying to understand how this happens.  Is it that when
>>> ccache misses it writes out an object file both to the cache directory
>>> and into the build directory, and both will be in the buffer cache?
>>> So it's not so much they're paged in, but they are dirtied in memory
>>> and will still be held there.
>>
>> Even on a ccache *hit* both copies of the .o file wind up occupying
>> buffer cache space, because the ccached .o is read from disk [paging
>> it in] in order to write the .o file to the build output directory.
>> On a ccache miss the copy runs the other direction but you still wind
>> up with both sets of pages in the buffer cache.
>
> In the hit case I would have thought that the .o file you read would
> still create less memory pressure than the working memory of running
> the real compiler on that file?  Perhaps the difference is that the
> kernel knows that when the compiler exits, its anonymous pages can be
> thrown away, whereas it doesn't know which .o file it ought to retain.
>  So perhaps madvise might help.  (Just speculating.)
>
> --
> Martin


Re: [ccache] Using git file hashes for ccache

2010-12-29 Thread Justin Lebar
> It is my understanding that in the ccache hit case, a significant
> fraction of the running time is spent computing hashes of the original
> source files.

Yes, ccache spends most of its time hashing when it gets a direct mode
cache hit, at least according to my measurements.  I wrote a patch a
little while ago which uses a less-secure hash function which speeds
up ccache somewhat; you may want to try applying it and see if it
speeds up your builds.  (Interestingly, the ccache speed improvement
didn't translate into faster Firefox builds for me -- I haven't had a
chance to investigate why.)

> git is also frequently used for development, makes use of file hashes,
> and is extremely fast. When doing operations such as git diff, in the
> common case where the source file has not been modified, git will
> notice that the file's attributes (including mtime) matches these
> stored in the git index file, and thus it won't have to actually read
> the file to conclude that the contents have not changed.

Maybe the right thing to do would be to have ccache keep track of the
source files' attributes.  If some environment variable was set,
ccache would treat a file with unchanged attributes as unchanged.
(ccache could maintain a new index into its cache, indexed on absolute
path, or it could hash a string "magic-bitstring | file-path | file
attributes" and use the current cache infrastructure.)  This seems a
lot simpler than trying to interface with git.

I imagine this would be a safe optimization for most users to turn on
--  I don't think too many users modify files without changing their
mtimes, as this would mess up most build systems.  But it might be
especially useful if users could give a list of paths and say that
this optimization applied for all subdirectories of each of the given
paths.  That way you could turn on the optimization for your system's
header files, which AIUI get hashed over and over again in direct
mode, but almost never change.

-Justin

On Wed, Dec 29, 2010 at 11:54 PM, Michel Lespinasse  wrote:
> Hi,
>
> It is my understanding that in the ccache hit case, a significant
> fraction of the running time is spent computing hashes of the original
> source files.
>
> git is also frequently used for development, makes use of file hashes,
> and is extremely fast. When doing operations such as git diff, in the
> common case where the source file has not been modified, git will
> notice that the file's attributes (including mtime) match those
> stored in the git index file, and thus it won't have to actually read
> the file to conclude that the contents have not changed.
>
> I often use ccache to compile files out of git trees, and I was
> thinking that it could make use of the git index as well. The idea
> would be to use sha1 hashes instead of md4, and get these hashes out
> of the index (rather than computing them from the source file) when
> the file attributes match.
>
> I am wondering, has this been considered before ? what would project
> maintainers think of going that direction ?
>
> Thanks,
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.


Re: [ccache] Using git file hashes for ccache

2010-12-31 Thread Justin Lebar
> B) index based on hash of file contents, but have a ccache maintain
> database of (file name + attributes) -> (hash of file contents) pairs

> C) index based on hash of file contents, and use git index for looking
> up (file name + attributes) -> (hash of file contents) pairs

> C benefits people who frequently switch their git workspace between
> multiple branches. When switching back to a previously compiled
> branch, the file mtimes will be updated, but the git index shows that
> the contents haven't.

I expect that approach B would speed up ccache direct mode hits
significantly, as most of the time, you'd only hash the source file,
and you'd use the cached hashes of the files it includes.  If it runs
faster, presumably there would be less to gain by invoking ccache less
often.

I'm very skeptical that we want to add to ccache the kind of
complexity (and tight coupling!) that option C requires.  Furthermore,
it seems to me that some of this logic (e.g. "don't build me because,
although my mtime has changed, my contents haven't") belongs in the
build system.

I'd also guess that C wouldn't be much faster than B, since in the
steady state, B hashes only the source file and has cached hashes of
most or all of the source file's includes.

On Fri, Dec 31, 2010 at 8:12 AM, Michel Lespinasse  wrote:
> On Fri, Dec 31, 2010 at 4:27 AM, Wilson Snyder  wrote:
>> I also think this is a good approach, though having been
>> down the road before, mtime isn't always enough as you
>> noted, but including the size also makes it *almost*
>> perfect.  Most edits change the size.
>>
>> Note several tools like scons use this technique, and some
>> store the hashes in a single hash file inside each source
>> directory.  That has the nice advantage of allowing sharing,
>> though the downside of polluting the source areas so I don't
>> really like it.  I think putting it into the ccache
>> infrastructure is nicer; but you may still want multiple
>> hashes to be stored under a hash of the directory name,
>> instead of a hash of the filename, because that allows
>> reading fewer files.  (Otherwise reading the hundreds of
>> hash files will become the new bottleneck.)
>
> I actually see 3 different variants being discussed in this thread:
>
> A) index based on hash of file name + attributes instead of hash of
> file contents
> B) index based on hash of file contents, but have a ccache maintain
> database of (file name + attributes) -> (hash of file contents) pairs
> C) index based on hash of file contents, and use git index for looking
> up (file name + attributes) -> (hash of file contents) pairs
>
> A is simplest, and would probably work well enough for system include
> files. Not so much for project files though, especially if we want to
> support CCACHE_BASEDIR (ctime/mtime probably won't match across
> checked out versions).
>
> B could work pretty well, I think. There is the question of where to
> store that new database, but it's probably doable - the database is
> only a cache, so it's always OK to expire entries if it grows too
> much.
>
> C benefits people who frequently switch their git workspace between
> multiple branches. When switching back to a previously compiled
> branch, the file mtimes will be updated, but the git index shows that
> the contents haven't. This type of operation is the source of many
> ccache hits for me (after all, the compiler wouldn't even get invoked
> by make if no mtimes had changed).
>
> Making C work seems complicated, as we'd need to be able to read the
> git index. OTOH, this also nicely solves the problem of expiring
> database entries: git is in charge of maintaining the index so we
> don't need to care about it for project files, and out-of-project
> files such as system headers shouldn't change nearly as often so we'd
> hardly ever need to expire them from the ccache database. We could
> even avoid any problems of concurrent database updates by just never
> having ccache update any (file name + attributes) -> (hash of file
> contents) database - git would be in charge of updating its index for
> in-project files, and we could have an out-of-line ccache option to do
> it for infrequently-modified system files...
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.
>


Re: [ccache] improving ccache to work with debug symbols?

2011-02-28 Thread Justin Lebar
I like this idea.

I imagine the overhead of even a naive search-and-replace will be
pretty small compared to the hashing ccache does in direct mode, since
ccache currently hashes all of a file's #includes.

But if you want to make the search-and-replace really fast, you could
choose your nonce in such a way as to make a Boyer-Moore(-Horspool)
search particularly efficient.  Just pick characters which don't
usually appear in object files.

(This might become more important if ccache adds an option to look at
files' mtimes instead of always hashing, as has been discussed here.)

-Justin

On Mon, Feb 28, 2011 at 12:18 AM, James Donald  wrote:
> For usage of ccache at our company, here is one of our biggest hassles today: 
> even with the CCACHE_BASEDIR feature, users complain when gdb points to files 
> that no longer exist.
>
> Independently, a couple of our engineers have proposed the following fix :
>
> 1. On a cache miss, generate preprocessor output.
> 2. Modify it to extend debug symbol path names to some arbitrary path e.g. :
>  # 1 "/any/path/you/want/ca44a4def837bde348a738112/a.c"
>  # 1 "<built-in>"
>  # 1 "<command-line>"
>  # 1 "/any/path/you/want/ca44a4def837bde348a738112/a.c"
>  int main() {
>      return 0;
>  }
> 3. Compile the modified text into an object binary and store in the cache.
> 4. On a cache hit, retrieve the binary, search-and-replace the binary 
> representation of ca44a4def837bde348a738112 to the desired path name padded 
> with zeroes.
>
> All in all, it's not too different from the distcc patch to fix a similar 
> problem (http://testbit.eu/~timj/patches/), but adapted for ccache.
>
> I have run this by Joel so far. We have yet to hear of any existing patch to 
> fix the debug symbol path problem, but the idea seems feasible at first 
> glance.


Re: [ccache] ccache cache in RAM -- bypassing file cache?

2011-06-24 Thread Justin Lebar
I'm not sure that ramfs copies data into the buffer cache.  According
to [1], a ramfs mount *is* the buffer cache.

[1] 
http://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

On Fri, Jun 24, 2011 at 2:40 PM, David Coppit  wrote:
> We're trying to debug a performance difference of 12% speedup on a machine
> with 12GB versus a 41% speedup on a similar machine with 48GB of RAM.
>
> We've profiled the build and believe it degrades when the machine has less
> than 10GB of RAM. (5% slower builds with 10GB of system RAM, 10% slower
> builds with 9GB of system RAM.)
>
> Our ccache cache is in memory, on a ramfs mounted this way: sudo mount -t
> ramfs -o size=1G,mode=0700 ramfs /mnt/ramfs
>
> Our cache size after the initial warming run is 1GB.
>
> Here's what I'm thinking is happening: The OS is reading from the ramdisk
> and then caching the files in memory like it normally would. This is
> creating file cache contention with other file I/O in the build, offsetting
> our performance gain.
>
> For the machine with plenty of spare RAM, there is no file cache contention
> in the OS, and we see much better perf. Does this sound reasonable?
>
> I don't know if there's a way to prevent the OS from caching files on the
> ramfs mount, except to modify ccache's open() calls to include O_DIRECT. If
> I go that route, do I need to just modify the calls in hash.c, manifest.c,
> util.c, and gzlib.c?
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-07-21 Thread Justin Lebar
Do you need to hash in the cwd when compiling with -fprofile-generate?

If I use direct mode and don't have any absolute paths in my build, it
looks like I could have a cache hit between two compiles in different
directories.  That would be bad, since I think the absolute path the
.gcda is output to is hardcoded into the object file...

-Justin

On Thu, Jul 21, 2011 at 2:22 PM, Chris AtLee  wrote:
> Hi,
>
> I recently did some work to get ccache to support gcc's various
> -fprofile-* options. In some local testing it works great.
>
> I've got the code up on github right now:
> https://github.com/catlee/ccache/compare/jrosdahl:master...catlee:profile
>
> Does this approach look like it will work?
>
> Cheers,
> Chris
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-07-25 Thread Justin Lebar
+  if (output_to_real_object_first) {
+    if (copy_file(tmp_obj, cached_obj, enable_compression) != 0) {
+      cc_log("Failed to move %s to %s: %s", tmp_obj, cached_obj,
+             strerror(errno));
+      stats_update(STATS_ERROR);
+      failed();
+    }

Maybe you should hardlink, if CCACHE_HARDLINK is set.  Unless
copy_file already does that?

+  if (profile_use) {
+    /* Calculate gcda name */
+    char *gcda_name;
+    char *base_name;
+    output_to_real_object_first = true;
+    base_name = remove_extension(output_obj);
+    gcda_name = format("%s/%s.gcda", profile_dir, base_name);
+    cc_log("Adding profile data %s to our hash", gcda_name);
+    /* Add the gcda to our hash */
+    hash_delimiter(hash, "-fprofile-use");
+    hash_file(hash, gcda_name);

Could you add a comment saying that profile_dir is the cwd if
-fprofile-use doesn't have a parameter?  I initially thought it was
null (and that output_obj was an absolute path), and was confused as
to how this might work.

What happens if I specify -fprofile-use=dir1
-fbranch-probabilities=dir2?  I think ccache should bail and say it's
too hard in this and similar cases.

This looks good to me, for whatever that's worth.  :)

-Justin

On Mon, Jul 25, 2011 at 3:19 PM, Chris AtLee  wrote:
> I've pushed some changes up here that I hope addresses all the comments:
> https://github.com/catlee/ccache/compare/jrosdahl:master...catlee:profile
>
>
> On Fri, Jul 22, 2011 at 3:19 PM, Chris AtLee  wrote:
>> On Fri, Jul 22, 2011 at 1:49 PM, Joel Rosdahl  wrote:
>>> Justin Lebar's point that the cwd probably needs to be hashed seems
>>> valid. Other than that, I think it looks generally fine, but I only have
>>> limited knowledge about the -fprofile-* options so I can't say I
>>> understand their interaction completely. Some comments and questions,
>>> though:
>>>
>>> I think that the "output_to_real_object_first mode" should really be the
>>> default and only mode. That would of course mean that the things in
>>> from_cache() that need to be done after storing an object in the cache
>>> should be refactored out somehow, but we can do that later.
>>
>> Sounds good. I'll need to add another flag to indicate that we need to
>> hash cwd then if output_to_real_object_first is the default.
>>
>>> What about the -fprofile-generate=path and -fprofile-use=path forms?
>>> Should those be detected as well?
>>
>> For -fprofile-generate, I don't think we need to worry about the
>> profile directory since the inputs to the compilation aren't affected
>> by where the profile data is stored. -fprofile-use does need to know
>> where to look so that it can add the profile data to the hash. I'm
>> half tempted to remove -fprofile-use support, I'm not sure how likely
>> you are to get cache hits there. Running the same executable twice in
>> a row generates different profile data.
>>
>>> tmp_obj is freed at the end of to_cache(), so output_obj will refer to
>>> freed memory when dereferenced in from_cache() later. An x_strdup is needed.
>>
>> Fixed.
>>
>>> //-style comments won't compile on systems with C89 compilers, so use
>>> /**/ instead.
>>
>> Fixed.
>>
>>> You should probably use hash_delimiter(hash, "-fprofile-use") or so
>>> before hashing the gcda file (and then I guess that the hashing of "no
>>> data" won't be necessary).
>>
>> Ah, that's what that function is for!
>>
>> Thanks for the feedback, I should have some new commits up shortly.
>>
>> Cheers,
>> Chris
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-02 Thread Justin Lebar
Catlee,

Have you done any experiments to determine whether this actually gets
cache hits with -fprofile-use?  That is, do you ever get identical
.gcda files?  They could have a timestamp or something inside...

-Justin

On Mon, Aug 1, 2011 at 9:52 PM, Chris AtLee  wrote:
> Thanks again for all your feedback.
>
> My latest code on github should address all the comments so far:
> https://github.com/catlee/ccache/compare/jrosdahl:master...catlee:profile
>
> Keep the feedback coming!
>
> Cheers,
> Chris
>
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-08 Thread Justin Lebar
On Mon, Aug 8, 2011 at 9:58 AM, Chris AtLee  wrote:
> Any thoughts on if caching the .gcda files is useful? Maybe have
> another environment variable that switches it on?

What do you mean?  These files aren't generated until the program is
run, right?  So I guess what you're suggesting is that on the
-fprofile-use pass, we'd remember the gcda file that was there.  Then
on a later -fprofile-generate pass, if we have a cache hit, we'd
restore both the object and the gcda file that object eventually
created when it ran.

I'm not sure it would be useful, particularly because gcda files are (in
theory -- I haven't tested) cumulative.  So if ccache restores the
gcda file and I then run the -fprofile-generate build, I'm going to
accumulate data into the file ccache restored.  Now some of my gcda
files have data from one run and some have data from two runs,
depending on whether ccache had a cache hit.

-Justin

> Other than that, are there any other concerns with the new code?
>
> Cheers,
> Chris
>
> On Tue, Aug 2, 2011 at 12:05 PM, Chris AtLee  wrote:
>> If you use the same .gcda files with -fprofile-use, then you do get
>> cache hits. However, running even a simple program with no loops or
>> branches twice in a row generates different .gcda files, which then
>> results in cache misses. So I'm not sure how useful the -fprofile-use
>> side of things is in general.
>>
>> On Tue, Aug 2, 2011 at 11:17 AM, Justin Lebar  wrote:
>>> Catlee,
>>>
>>> Have you done any experiments to determine whether this actually gets
>>> cache hits with -fprofile-use?  That is, do you ever get identical
>>> .gcda files?  They could have a timestamp or something inside...
>>>
>>> -Justin
>>>
>>> On Mon, Aug 1, 2011 at 9:52 PM, Chris AtLee  wrote:
>>>> Thanks again for all your feedback.
>>>>
>>>> My latest code on github should address all the comments so far:
>>>> https://github.com/catlee/ccache/compare/jrosdahl:master...catlee:profile
>>>>
>>>> Keep the feedback coming!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>
>>
>
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-08 Thread Justin Lebar
> The .gcda files themselves aren't cached, their contents are used to
> calculate the hash for a -fprofile-use run. So if the .o file doesn't
> exist, and you have the same .gcda file, you get a cache hit.

Ah, I see.  What if the .o file does exist?  Why should that matter,
if gcc is going to overwrite it anyway?

You mentioned earlier that a simple program without branches or loops
didn't generate the same .gcda files when the program was run twice.
Would you mind including that code?  I wrote a pretty simple testcase
and observed the opposite result.  Maybe it's different in different
versions of gcc or something.

-Justin

On Mon, Aug 8, 2011 at 10:27 AM, Chris AtLee  wrote:
> On Mon, Aug 8, 2011 at 10:06 AM, Justin Lebar  wrote:
>> On Mon, Aug 8, 2011 at 9:58 AM, Chris AtLee  wrote:
>>> Any thoughts on if caching the .gcda files is useful? Maybe have
>>> another environment variable that switches it on?
>>
>> What do you mean?  These files aren't generated until the program is
>> run, right?  So I guess what you're suggesting is that on the
>> -fprofile-use pass, we'd remember the gcda file that was there.  Then
>> on a later -fprofile-generate pass, if we have a cache hit, we'd
>> restore both the object and the gcda file that object eventually
>> created when it ran.
>
> The .gcda files themselves aren't cached, their contents are used to
> calculate the hash for a -fprofile-use run. So if the .o file doesn't
> exist, and you have the same .gcda file, you get a cache hit.
>
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-10 Thread Justin Lebar
gcda files are cumulative.  Try

$ ./test; md5sum test.gcda; rm test.gcda
hello world
1c14199a60b2e5b9e6f1e96360adc40c  test.gcda
$ ./test; md5sum test.gcda; rm test.gcda
hello world
1c14199a60b2e5b9e6f1e96360adc40c  test.gcda

On Wed, Aug 10, 2011 at 10:51 AM, Chris AtLee  wrote:
> On Mon, Aug 8, 2011 at 4:24 PM, Justin Lebar  wrote:
>>> The .gcda files themselves aren't cached, their contents are used to
>>> calculate the hash for a -fprofile-use run. So if the .o file doesn't
>>> exist, and you have the same .gcda file, you get a cache hit.
>>
>> Ah, I see.  What if the .o file does exist?  Why should that matter,
>> if gcc is going to overwrite it anyway?
>
> It doesn't really matter..my point was that hopefully your make
> dependencies are set up so you're not calling gcc if you don't need
> to.
>
>> You mentioned earlier that a simple program without branches or loops
>> didn't generate the same .gcda files when the program was run twice.
>> Would you mind including that code?  I wrote a pretty simple testcase
>> and observed the opposite result.  Maybe it's different in different
>> versions of gcc or something.
>
> I have a simple hello world program. Even if I comment out the printf
> I get different .gcda files after each run.
>
> #include <stdio.h>
>
> int main() {
>    printf("hello world\n");
>    return 0;
> }
>
> gcc -fprofile-generate   -c -o test.o test.c
> gcc -fprofile-generate  test.o   -o test
> ./test; md5sum test.gcda
> 83fdede120951b154184271416082bdb  test.gcda
> ./test; md5sum test.gcda
> 230d10c340e6ae068a7d65b4bc355472  test.gcda
>
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-10 Thread Justin Lebar
On Wed, Aug 10, 2011 at 11:22 AM, Chris AtLee  wrote:
> Ah ha!
>
> What does this mean for whether it's worthwhile to include the .gcda
> contents in the hash when using -fprofile-use, and therefore if it's
> worthwhile to cache -fprofile-use at all?

This suggests to me that it might be worthwhile to include the .gcda
file in the hash.  You'd presumably get cache hits on simple files, or
files whose code is never run.

I'd be curious whether Firefox gets any cache hits during -fprofile-use.

-Justin

> On Wed, Aug 10, 2011 at 11:03 AM, Justin Lebar  wrote:
>> gcda files are cumulative.  Try
>>
>> $ ./test; md5sum test.gcda; rm test.gcda
>> hello world
>> 1c14199a60b2e5b9e6f1e96360adc40c  test.gcda
>> $ ./test; md5sum test.gcda; rm test.gcda
>> hello world
>> 1c14199a60b2e5b9e6f1e96360adc40c  test.gcda
>>
>> On Wed, Aug 10, 2011 at 10:51 AM, Chris AtLee  wrote:
>>> On Mon, Aug 8, 2011 at 4:24 PM, Justin Lebar  wrote:
>>>>> The .gcda files themselves aren't cached, their contents are used to
>>>>> calculate the hash for a -fprofile-use run. So if the .o file doesn't
>>>>> exist, and you have the same .gcda file, you get a cache hit.
>>>>
>>>> Ah, I see.  What if the .o file does exist?  Why should that matter,
>>>> if gcc is going to overwrite it anyway?
>>>
>>> It doesn't really matter..my point was that hopefully your make
>>> dependencies are set up so you're not calling gcc if you don't need
>>> to.
>>>
>>>> You mentioned earlier that a simple program without branches or loops
>>>> didn't generate the same .gcda files when the program was run twice.
>>>> Would you mind including that code?  I wrote a pretty simple testcase
>>>> and observed the opposite result.  Maybe it's different in different
>>>> versions of gcc or something.
>>>
>>> I have a simple hello world program. Even if I comment out the printf
>>> I get different .gcda files after each run.
>>>
>>> #include <stdio.h>
>>>
>>> int main() {
>>>    printf("hello world\n");
>>>    return 0;
>>> }
>>>
>>> gcc -fprofile-generate   -c -o test.o test.c
>>> gcc -fprofile-generate  test.o   -o test
>>> ./test; md5sum test.gcda
>>> 83fdede120951b154184271416082bdb  test.gcda
>>> ./test; md5sum test.gcda
>>> 230d10c340e6ae068a7d65b4bc355472  test.gcda
>>>
>>
>
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Support for -fprofile-generate, -fprofile-use, -fprofile-arcs

2011-08-21 Thread Justin Lebar
> Assuming -fprofile-generate/-fprofile-use also work on windows, that
> code shouldn't be ifdef'ed out. Instead I need an equivalent to
> x_realpath for windows (or windows support for x_realpath).

I'm confused; do you mean MSVC or GCC on Windows?

MSVC PGO doesn't have a set of compilation flags equivalent to
-fprofile-generate and -fprofile-use.  Most of the work is done in the
linker.  See the bottom of [1].

Does ccache support GCC on Windows running outside a cygwin-like
environment?  Presumably if you're in cygwin, you have x_realpath?

-Justin

[1] http://msdn.microsoft.com/en-us/library/aa289170%28v=vs.71%29.aspx

>
> Cheers,
> Chris
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Duplicate object files in the ccache - possible optimization?

2011-11-07 Thread Justin Lebar
> Looking at a ccache with about 40,000 .o files in it (created with direct
> mode turned on); of the 55 largest files, I found 11 pairs and one triplet
> of identical object files.  That's almost 25% of redundant storage that
> could have been avoided by looking at the preprocessed hash when there is no
> hit in direct mode.

It's much more interesting to look at the whole cache, I think.

$ find -name '*.o' -type f | wc -l
39312
jlebar@turing:~/.ccache$ find -name '*.o' -type f | xargs -P16 sha1sum
| cut -f 1 -d ' ' | sort | uniq -d | wc -l
1230

So it looks like there's some duplication on my machine, but not a
ton.  I'd be curious if you got significantly different numbers.

On Mon, Nov 7, 2011 at 12:49 PM, Frank Klotz
 wrote:
>  Hi Martin,
>
> Thanks for your responses.
>
> s/index hash/direct mode hash/g
>
> Apologies - I had a brain burp and was using the wrong terminology.
>
> That aside, however, with  the advent of direct mode, there ARE two hashes
> possible for any given object file - the direct mode hash (hashing all the
> sources that go into the compilation) and the preprocessed hash (hashing the
> result of running all those sources through the preprocessor).  And any time
> there is a cache miss, ccache has computed both those hashes, hasn't it?
>  (Or maybe not - if not, see discussion below.)  And it appears to me that
> in many cases, the resulting object file occurs twice in the cache, once
> under each hash.  And currently, those two occurrences are two separate
> files, which could be combined into a single inode with two hard-linked
> directory entries.
>
> Or am I confused about how direct mode interacts with preprocessed mode?  If
> running in direct mode, does ccache never compute the preprocessed hash?  If
> not, it obviously could, and I would recommend that it should.  Why?
>  Because when changes are made to a widely-used header file, it very
> commonly occurs that those changes only actually modify the preprocessor
> output of a small subset of the sources that include that header file, while
> many other sources don't use the changes (say, definition of new macros or
> new constants), so end up with the same preprocessed output, and the same
> object file, even though the input header files and direct mode hash did
> change.  In that case, ccache could still find hits in the cache with the
> preprocessed mode, even if it's a miss with the direct mode hash.  If ccache
> does not get a direct mode hit, it certainly will have to RUN the
> preprocessor to recompile the file - how much extra cost to compute the
> preprocessed hash, look it up (to avoid recompiling if it is found with THAT
> hash), and if a compile is still needed, store the resulting object file
> inode with 2 directory entries rather than just one?
>
> The way I read the doc about how direct mode works, I thought it would
> compute the direct mode hash, and if no hit, "fall back to preprocessed
> mode".  I thought that meant it would compute the preprocessed hash and look
> for that too.   Is that incorrect - does it only compute ONE hash in all
> cases - a direct mode hash if running in direct mode and a preprocessed hash
> if not in direct mode?  If so, then let's modify my suggested enhancement to
> be that in direct mode, calculate and use the preprocessed hash whenever
> there is no hit with direct mode, and create hard links using all computed
> hashes to the one single object file inode that eventually exists in the
> ccache.  I don't think direct mode and preprocessed mode HAVE to be mutually
> exclusive - when direct mode gets a miss, preprocessed mode can still often
> provide a hit.
>
> And if no preprocessed hash gets computed/stored when running in direct
> mode, then I suspect that the reason I see so many pairs of identical object
> files in my ccache is because of the situation I describe above, where a
> header file change has triggered a direct mode hash miss, but preprocessing
> the sources has resulted in an identical preprocessed file which was then
> passed to the compiler which produced an identical object file.  But ccache
> didn't KNOW that they were identical, because it didn't compute the
> preprocessed hash.
>
> Looking at a ccache with about 40,000 .o files in it (created with direct
> mode turned on); of the 55 largest files, I found 11 pairs and one triplet
> of identical object files.  That's almost 25% of redundant storage that
> could have been avoided by looking at the preprocessed hash when there is no
> hit in direct mode.
>
> Thanks,
> Frank
>
>
> On 11/07/2011 12:53 AM, Martin Pool wrote:
>>
>> On 5 November 2011 11:12, Frank Klotz
>>  wrote:
>>>
>>>  I used ccache at my previous employer, and was very convinced of its
>>> value.
>>>  Now that I have started a new job, I am in the process of trying to
>>> bring
>>> the new shop on board with ccache, so I have been doing lots of test runs
>>> and looking at things.  Here is one thing I am thinking could add some
>>> value.

[ccache] PATCH: Look at include files' mtimes

2012-05-20 Thread Justin Lebar
This patch lets ccache examine an include file's mtime and size in
lieu of hashing it, during direct mode.  If the mtime and size don't
match, we fall back to hashing.

The net result is roughly a factor-of-two speedup in ccache hits (*),
on my machine.

I'm not sure if this is a desirable feature, because obviously mtimes
can be tampered with.

I didn't provide a way to disable the feature in this patch because,
presuming we wanted to take this patch, I'm not sure if we'd want
mtime-snooping enabled by default.  Since most projects already rely
on accurate mtimes in their build systems, turning this on by default
doesn't seem particularly outrageous to me.

Please let me know what you think about this.

Regards,
-Justin

(*) Experimental procedure: In a Firefox debug objdir
(CCACHE_HARDLINK, Linux-64, Ubuntu 12.04, 4 CPU cores), let

* Let |nop| be the average real time from a few runs of

$ time make -C dom -sj16

  when there's nothing to do.

* Let |orig| be the average real time from a few runs of

$ find dom -name '*.o' && time make -C dom -sj16

  with ccache master (701f13192ee) (discarding the first run, of course).

* Let |mtime| be the real time from the same command as |orig|, but
with patched ccache.

Speedup is (orig - nop) / (mtime - nop).  On my machine, nop = 3.71,
orig = 4.88, mtime = 4.31.  Yes, our nop build times are atrocious.
From 2bd9951a076993f9cd1874fc2413660711b7a07a Mon Sep 17 00:00:00 2001
From: Justin Lebar 
Date: Sun, 20 May 2012 15:18:44 -0400
Subject: [PATCH] Look at mtime before hashing include files.

---
 ccache.c   |2 +-
 ccache.h   |1 +
 manifest.c |   73 +++-
 test.sh|2 +-
 4 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/ccache.c b/ccache.c
index 8b50c36..af8898e 100644
--- a/ccache.c
+++ b/ccache.c
@@ -129,7 +129,7 @@ static char *manifest_path;
  * Time of compilation. Used to see if include files have changed after
  * compilation.
  */
-static time_t time_of_compilation;
+time_t time_of_compilation;
 
 /*
  * Files included by the preprocessor and their hashes/sizes. Key: file path.
diff --git a/ccache.h b/ccache.h
index 7e25883..3d0d93a 100644
--- a/ccache.h
+++ b/ccache.h
@@ -211,6 +211,7 @@ void lockfile_release(const char *path);
 /* - */
 /* ccache.c */
 
+extern time_t time_of_compilation;
 bool cc_process_args(struct args *orig_args, struct args **preprocessor_args,
 struct args **compiler_args);
 void cc_reset(void);
diff --git a/manifest.c b/manifest.c
index fc60503..9e58ee5 100644
--- a/manifest.c
+++ b/manifest.c
@@ -41,10 +41,12 @@
  *   index of include file path  (4 bytes unsigned int)
 *   hash of include file        (<hash_size> bytes)
 *   size of include file        (4 bytes unsigned int)
+ *   mtime of include file   (8 bytes signed int)
  * ...
+ * 
  * 
  * 
- * 
+ * 
  * 
  *  number of object name entries   (4 bytes unsigned int)
  *   number of include file hash indexes (4 bytes unsigned int)
@@ -63,7 +65,7 @@
  */
 
 static const uint32_t MAGIC = 0x63436d46U;
-static const uint8_t  VERSION = 0;
+static const uint8_t  VERSION = 1;
 static const uint32_t MAX_MANIFEST_ENTRIES = 100;
 
 #define static_assert(e) do { enum { static_assert__ = 1/(e) }; } while (false)
@@ -75,6 +77,8 @@ struct file_info {
uint8_t hash[16];
/* Size of referenced file. */
uint32_t size;
+   /* mtime of referenced file. */
+   int64_t mtime;
 };
 
 struct object {
@@ -109,10 +113,15 @@ struct manifest {
struct object *objects;
 };
 
+struct file_mtime_and_size {
+   uint32_t size;
+   int64_t mtime;
+};
+
 static unsigned int
 hash_from_file_info(void *key)
 {
-   static_assert(sizeof(struct file_info) == 24); /* No padding. */
+   static_assert(sizeof(struct file_info) == 32); /* No padding. */
return murmurhashneutral2(key, sizeof(struct file_info), 0);
 }
 
@@ -123,7 +132,8 @@ file_infos_equal(void *key1, void *key2)
struct file_info *fi2 = (struct file_info *)key2;
return fi1->index == fi2->index
   && memcmp(fi1->hash, fi2->hash, 16) == 0
-  && fi1->size == fi2->size;
+  && fi1->size == fi2->size
+  && fi1->mtime == fi2->mtime;
 }
 
 static void
@@ -262,6 +272,7 @@ read_manifest(gzFile f)
READ_INT(4, mf->file_infos[i].index);
READ_BYTES(mf->hash_size, mf->file_infos[i].hash);
READ_INT(4, mf->file_infos[i].size);
+   READ_INT(8, mf->file_infos[i].mtime);
}
 
READ_INT(4, mf->

Re: [ccache] PATCH: Look at include files' mtimes

2012-05-21 Thread Justin Lebar
> I've been burned by mtime only checking before as
> (excluding some recent file systems) mtime has a resolution
> only down to one second.

I tried to address this in the patch, although come to think of it, I
did it wrong.

The trick is only to *cache* mtimes that are at least one second older
than now.  Then the resolution of the clock isn't a problem.

But if the system clock is set back (e.g. by NTP), we're in trouble.
And hardlinks are often created without bumping the inode's mtime,
which is also problematic.  (It's problematic for make, too.)

So I tend to agree that this should be off by default, unless we have
some brilliant way of detecting system clock changes.

I can't say I'm particularly interested in supporting two manifest
versions simultaneously, but that's up to Joel.

-Justin

On Mon, May 21, 2012 at 9:43 PM, Wilson Snyder  wrote:
>
>>This patch lets ccache examine an include file's mtime and size in
>>lieu of hashing it, during direct mode.  If the mtime and size don't
>>match, we fall back to hashing.
>>
>>The net result is roughly a factor-of-two speedup in ccache hits (*),
>>on my machine.
>>
>>I'm not sure if this is a desirable feature, because obviously mtimes
>>can be tampered with.
>
> IMO I at my site I would be reluctant to use ccache with
> this enabled, and believe the default should be as safe as
> possible. I've been burned by mtime only checking before as
> (excluding some recent file systems) mtime has a resolution
> only down to one second.
>
> However including it as an option seems fine, and recording
> it into the manifest also seems good.  I would also suggest
> making the manifest format backward compatible when it is
> easy as in this case; just test the version number when
> reading the file.
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] PATCH: Look at include files' mtimes

2012-05-22 Thread Justin Lebar
> Better to do 2 seconds, since FAT (and maybe some other Windows
> related setups) has only a 2-second resolution.
>
> The other thing you can do is, on Unix, use the latest of ctime and
> mtime, which should catch cases where the mtime gets reset.

Thanks for the tips!

I'm happy to update the patch, but I'd first want to hear Joel's thoughts.

-Justin
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


[ccache] PATCH: Clarify docs for sloppiness option include_file_mtime

2012-05-26 Thread Justin Lebar
Perhaps instead of attaching patches it's easier to send you a pull request?

If so, https://jle...@github.com/jlebar/ccache.git branch
include-file-mtime-docs.

https://github.com/jlebar/ccache/compare/include-file-mtime-docs

Or alternatively, see below.  :)

diff --git a/MANUAL.txt b/MANUAL.txt
index 4b0cfb2..4be33ae 100644
--- a/MANUAL.txt
+++ b/MANUAL.txt
@@ -420,17 +420,18 @@ WRAPPERS>>.
 ccache can't take for granted. This setting makes it possible to tell
 ccache to relax some checks in order to increase the hit rate. The value
 should be a comma-separated string with options. Available options are:
 +
 --
 *file_macro*::
 Ignore *\_\_FILE__* being present in the source.
 *include_file_mtime*::
-Don't check the modification time of include files in the direct mode.
+By default, ccache will not cache a file if it includes a header whose
+mtime is too new.  This option disables that check.
 *time_macros*::
 Ignore *\_\_DATE\__* and *\_\_TIME__* being present in the source code.
 --
 +
 See the discussion under <<_troubleshooting,TROUBLESHOOTING>> for more
 information.

 *temporary_dir* (*CCACHE_TEMPDIR*)::
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Justin Lebar
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?

On a large project with many inputs to ld, you'd have to hash a /lot/
of object files, increasing the overhead of ccache substantially.  I
understand that this isn't your particular use-case, but it's the
common one.

If you're on Linux, have you tried the gold linker?

-Justin

On Tue, Sep 18, 2012 at 8:44 AM, Andrew Stubbs  wrote:
> Hi all, again,
>
> I've just posted about improving compile speed by caching compiler failures,
> and in the same vein I'd like to consider caching called-for-link compile
> tasks.
>
> This is partly interesting for the many small autoconf tests, but is also
> increasingly interesting for real compilations, now that
> whole-program-optimization and link-time-optimization is more available in
> GCC. Even without all this link-time compilation activity, there are some
> link operations that simply take forever, mostly due to large file sizes.
>
> Clearly there are some technical challenges in doing this: we'd have to hash
> all the object files and libraries (a la direct mode), but those problems
> are surmountable, I think. The linker does not use any libraries not listed
> with "gcc '-###' whatever".
>
> I'm also aware that it's not that interesting for many incremental builds,
> where the final link will always be different, but my use case is
> accelerating rebuilds of projects that my have many outputs, most of which
> are likely to be unaffected by small code changes. It's also worth noting
> that incremental builds are not the target use case for ccache in general.
>
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?
>
> Has anybody else ever tried to do this? Is anybody trying to do it now?
>
> Thanks
>
> Andrew
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Why not cache compile failures?

2012-09-18 Thread Justin Lebar
> I'm looking at ways to improve compile speed, and one obvious option is to
> cache compile failures. I'm thinking of certain non-called-for-link autoconf
> tests, in particular.

Doesn't autoconf have a cache of its own?

Anyway, ccache makes running the compiler faster.  In the case of
giving the compiler a small program to compile to test a feature,
surely running the compiler takes virtually zero time, and the
overhead is elsewhere.

-Justin
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Justin Lebar
> What I'm looking for is more concrete
> roadblocks I haven't considered.

You'd basically have to rewrite all of ccache.

ccache hashes header files and spits out object files.

ldcache would hash object files and spit out linked files.  It would
use an entirely separate cache.  Its handling of command-line options
would be entirely different.  Its processing of input files would be
entirely different.  ISTM that very little would be shared.
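To make that concrete, even the cache-key computation of a hypothetical "ldcache" would differ fundamentally: it would have to fold the full link command line plus the contents of every input object file into one hash before it could even do a lookup.  A rough sketch, purely for illustration (names are made up, and FNV-1a stands in here for whatever hash would actually be used — ccache itself uses MD4):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fold len bytes into an FNV-1a hash state. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	for (size_t i = 0; i < len; i++) {
		h = (h ^ p[i]) * 1099511628211ULL;
	}
	return h;
}

/* Hypothetical ldcache key: hash every argument (including its NUL,
 * so "a" "bc" and "ab" "c" don't collide), and additionally hash the
 * contents of arguments that look like object files. */
static uint64_t link_cache_key(int argc, char **argv)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */
	for (int i = 0; i < argc; i++) {
		size_t n = strlen(argv[i]);
		h = fnv1a_update(h, argv[i], n + 1);
		if (n > 2 && strcmp(argv[i] + n - 2, ".o") == 0) {
			FILE *f = fopen(argv[i], "rb");
			if (f) {
				unsigned char buf[4096];
				size_t got;
				while ((got = fread(buf, 1, sizeof buf, f)) > 0) {
					h = fnv1a_update(h, buf, got);
				}
				fclose(f);
			}
		}
	}
	return h;
}
```

Note how little of this overlaps with what ccache does today for preprocessing and header hashing — which is the point above about very little being shared.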

Since this is targeting a niche use-case and is a large change to
ccache, I'd be hesitant to take this change upstream, if I were Joel.

-Justin

On Tue, Sep 18, 2012 at 11:27 AM, Andrew Stubbs  wrote:
> On 18/09/12 15:31, Justin Lebar wrote:
>>>
>>> So, again, before I waste my time implementing this feature, are there
>>> any
>>> other fundamental gotchas that would prevent it ever working or ever
>>> being
>>> useful?
>>
>>
>> On a large project with many inputs to ld, you'd have to hash a /lot/
>> of object files, increasing the overhead of ccache substantially.  I
>> understand that this isn't your particular use-case, but it's the
>> common one.
>
>
> Yes, that's true, but those are also the most expensive link commands, so
> maybe it's not so bad.
>
> I realise that there's some risk that a cache miss can be expensive, and
> that a cache hit might be only a very little cheaper than the real link, but
> I'm prepared to take that risk. What I'm looking for is more concrete
> roadblocks I haven't considered.
>
> Incidentally, I'm also considering the possibility of caching the hashes and
> using the inode/size/mtime etc. to short-cut that process (perhaps as a
> "sloppiness" option), not only for objects, but also for sources.
>
>
>> If you're on Linux, have you tried the gold linker?
>
>
> Let's limit this discussion to what can be done with ccache, please. I
> assure you, we know about the toolchain options.
>
> Andrew


Re: [ccache] [PATCH] Detect __DATE__ and __TIME__ correctly.

2012-10-08 Thread Justin Lebar
> I've encountered a bug while playing with ccache: temporal macros are not
> detected correctly.

Ouch!

>* the assumption that 'E' is less common in source than '_', 
> we check
>* str[i-2] first.

Update the comment?

>   while (i < len) {

I think this fails on at least one edge case: if the file contains
only the string "__date__", then len == i == 8 and we never enter the
loop, right?  I think that with this patch we generally fail to detect
temporal macros at the very end of the file.

The solution isn't as simple as making it |i <= len|, of course,
because the end of the loop reads str[i].

-Justin

On Mon, Oct 8, 2012 at 8:56 AM, Andrew Stubbs  wrote:
> Hi Joel,
>
> I've encountered a bug while playing with ccache: temporal macros are not
> detected correctly.
>
> Patch attached.
>
> Andrew
>


Re: [ccache] [PATCH] Detect __DATE__ and __TIME__ correctly.

2012-10-09 Thread Justin Lebar
Can you update the Python script in the comment right above the code
and confirm that it matches your new table?  It's hard for me to see
what you did here based on just the patch...

On Tue, Oct 9, 2012 at 10:43 AM, Andrew Stubbs  wrote:
> On 08/10/12 19:25, Justin Lebar wrote:
>>
>> I think this fails on at least one edge case: If the file contains
>> only the string "__date__", then len == i == 8 and we never enter the
>> loop, right?  I think we in general fail to detect temporal macros at
>> the very end of the file, with this patch.
>>
>> The solution isn't as simple as making it |i <= len|, of course,
>> because the end of the loop reads str[i].
>
>
> Grrr, those pesky fenceposts!
>
> Ok, after looking at it some more, I think the correct solution is to fix
> the table, not the code.
>
> New patch attached.
>
> Thanks
>
> Andrew
>


Re: [ccache] [PATCH] Detect __DATE__ and __TIME__ correctly.

2012-10-09 Thread Justin Lebar
Okay, I think this is right.  Thanks a lot for fixing my bug.  :)

It's not so surprising that the tests pass as-is, since we only test
one offset of "__TIME__".  But I tried modifying test.sh to move
"__TIME__" to different offsets, and everything still worked.

We should really get test coverage of this, since it's obviously
tricky to get right!

On Tue, Oct 9, 2012 at 11:05 AM, Andrew Stubbs  wrote:
> On 09/10/12 15:46, Justin Lebar wrote:
>>
>> Can you update the Python script in the comment right above the code
>> and confirm that it matches your new table?  It's hard for me to see
>> what you did here based on just the patch...
>
>
> In fact, I did do that. I just cocked up the git magic and didn't post it. I
> guess I need more sleep!
>
> I've also removed the unnecessary comma change.
>
> New version attached, again.
>
> Andrew
>


[ccache] Fwd: [PATCH] Detect __DATE__ and __TIME__ correctly.

2012-11-04 Thread Justin Lebar
Here's the last patch I got.  Gmail tells me it was also sent to the
list; maybe it somehow got filtered.

I /think/ I'm including the attachment here...


-- Forwarded message --
From: Andrew Stubbs 
Date: Tue, Oct 9, 2012 at 11:05 AM
Subject: Re: [PATCH] Detect __DATE__ and __TIME__ correctly.
To: Justin Lebar 
Cc: Andrew Stubbs , "ccache@lists.samba.org"



On 09/10/12 15:46, Justin Lebar wrote:
>
> Can you update the Python script in the comment right above the code
> and confirm that it matches your new table?  It's hard for me to see
> what you did here based on just the patch...


In fact, I did do that. I just cocked up the git magic and didn't post
it. I guess I need more sleep!

I've also removed the unnecessary comma change.

New version attached, again.

Andrew
From 8ea4e57f434640ebfa09d75756b7f56a097638d2 Mon Sep 17 00:00:00 2001
From: Andrew Stubbs 
Date: Tue, 9 Oct 2012 15:17:19 +0100
Subject: [PATCH] Detect __DATE__ and __TIME__ correctly.
To: ccache@lists.samba.org

The code to detect __DATE__ and __TIME__ was off-by-one, and therefore
failed entirely to detect temporal macros unless the alignment happened
to be right by chance (roughly one in eight macros would be correctly
aligned).

The problem is that the code expects 'i' to point to the last
underscore, whereas the skip table expects 'i' to point just past the
end of the string. For example, if str[i] == 'E' then the skip table
moves 'i' on 3 bytes, whereas the code only works with a 2-byte skip.

I've corrected the problem by adjusting the table to match the code.

I've confirmed the tests still pass.

Signed-off-by: Andrew Stubbs 
---
 macroskip.h |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/macroskip.h b/macroskip.h
index 1452201..cb32ec4 100644
--- a/macroskip.h
+++ b/macroskip.h
@@ -7,23 +7,23 @@
  * The other characters map as follows:
  *
  *   _ -> 1
- *   A -> 5
- *   D -> 6
- *   E -> 3
- *   I -> 5
- *   M -> 4
- *   T -> 4
+ *   A -> 4
+ *   D -> 5
+ *   E -> 2
+ *   I -> 4
+ *   M -> 3
+ *   T -> 3
  *
  *
  * This was generated with the following Python script:
  *
  * m = {'_': 1,
- *  'A': 5,
- *  'D': 6,
- *  'E': 3,
- *  'I': 5,
- *  'M': 4,
- *  'T': 4}
+ *  'A': 4,
+ *  'D': 5,
+ *  'E': 2,
+ *  'I': 4,
+ *  'M': 3,
+ *  'T': 3}
  *
  * for i in range(0, 256):
  * if chr(i) in m:
@@ -41,8 +41,8 @@ static const uint32_t macro_skip[] = {
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
-	8,  5,  8,  8,  6,  3,  8,  8,  8,  5,  8,  8,  8,  4,  8,  8,
-	8,  8,  8,  8,  4,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  1,
+	8,  4,  8,  8,  5,  2,  8,  8,  8,  4,  8,  8,  8,  3,  8,  8,
+	8,  8,  8,  8,  3,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  1,
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
 	8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,
-- 
1.7.9.5



Re: [ccache] PATCH: Look at include files' mtimes

2012-12-24 Thread Justin Lebar
Hi, all.

I've resurrected these patches to look at files' mtimes and ctimes.
Hopefully the three patches here (with their commit messages) don't
need further explanation.  Note that the second patch here increases
safety for everyone, not just those who choose to have mtime matching
on.

These patches seem to be working, but I'm not seeing a significant
speedup on my Mac.  I think that may be a separate issue, as this
machine isn't particularly good at I/O.  I don't have access to my
Linux box for a while, so I'd certainly appreciate if someone could
verify whether there's a speedup here.

I'd also appreciate if some of you could test this patch by turning on
CCACHE_SLOPPINESS=file_stat_matches and letting me know if you have
any problems.

Happy holidays.

-Justin

On Sun, May 20, 2012 at 4:49 PM, Justin Lebar  wrote:
> This patch lets ccache examine an include file's mtime and size in
> lieu of hashing it, during direct mode.  If the mtime and size don't
> match, we fall back to hashing.
>
> The net result is roughly a factor-of-two speedup in ccache hits (*),
> on my machine.
>
> I'm not sure if this is a desirable feature, because obviously mtimes
> can be tampered with.
>
> I didn't provide a way to disable the feature in this patch because,
> presuming we wanted to take this patch, I'm not sure if we'd want
> mtime-snooping enabled by default.  Since most projects already rely
> on accurate mtimes in their build systems, turning this on by default
> doesn't seem particularly outrageous to me.
>
> Please let me know what you think about this.
>
> Regards,
> -Justin
>
> (*) Experimental procedure: In a Firefox debug objdir
> (CCACHE_HARDLINK, Linux-64, Ubuntu 12.04, 4 CPU cores), let
>
> * Let |nop| be the average real time from a few runs of
>
> $ time make -C dom -sj16
>
>   when there's nothing to do.
>
> * Let |orig| be the average real time from a few runs of
>
> $ find dom -name '*.o' && time make -C dom -sj16
>
>   with ccache master (701f13192ee) (discarding the first run, of course).
>
> * Let |mtime| be the real time from the same command as |orig|, but
> with patched ccache.
>
> Speedup is (orig - nop) / (mtime - nop).  On my machine, nop = 3.71,
> orig = 4.88, mtime = 4.31.  Yes, our nop build times are atrocious.
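The mtime/size short-circuit described above can be sketched roughly as follows (structure and names are hypothetical; the actual patch wires this into ccache's manifest handling, and on any mismatch the caller falls back to hashing the file contents as before):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* What we record per include file at cache-write time. */
struct file_record {
	time_t mtime;
	off_t size;
};

/* If the mtime and size still match the recorded values, the caller
 * may skip rehashing the include file; a stat() failure or any
 * mismatch means "rehash". */
static bool stat_matches(const char *path, const struct file_record *rec)
{
	struct stat st;
	if (stat(path, &st) != 0) {
		return false;
	}
	return st.st_mtime == rec->mtime && st.st_size == rec->size;
}
```

For reference, the timings quoted above work out to a hit-path speedup of (4.88 - 3.71) / (4.31 - 3.71) ≈ 1.95, i.e. the claimed factor of two.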
From ecac7e50e02d4a8319caff9faf4dbb9527a182e1 Mon Sep 17 00:00:00 2001
From: Justin Lebar 
Date: Mon, 24 Dec 2012 16:16:46 -0500
Subject: [PATCH 1/3] Extern time_of_compilation.

---
 ccache.c | 2 +-
 ccache.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/ccache.c b/ccache.c
index eceedff..b77f999 100644
--- a/ccache.c
+++ b/ccache.c
@@ -138,7 +138,7 @@ static char *manifest_path;
  * Time of compilation. Used to see if include files have changed after
  * compilation.
  */
-static time_t time_of_compilation;
+time_t time_of_compilation;
 
 /*
  * Files included by the preprocessor and their hashes/sizes. Key: file path.
diff --git a/ccache.h b/ccache.h
index 18a2b9e..5bcbf71 100644
--- a/ccache.h
+++ b/ccache.h
@@ -213,6 +213,7 @@ void lockfile_release(const char *path);
 /* - */
 /* ccache.c */
 
+extern time_t time_of_compilation;
 bool cc_process_args(struct args *args, struct args **preprocessor_args,
 struct args **compiler_args);
 void cc_reset(void);
-- 
1.8.0

From 6c2ffcab24538b00522dae63c74ff3d4ec960467 Mon Sep 17 00:00:00 2001
From: Justin Lebar 
Date: Mon, 24 Dec 2012 23:09:14 -0500
Subject: [PATCH 2/3] Check that included files' ctimes aren't too new.

ccache currently checks that a file's mtime isn't too new, unless
CCACHE_SLOPPINESS includes "include_file_mtime".

This patch adds a similar check that a file's ctime isn't too new.  We
skip this check if CCACHE_SLOPPINESS includes "include_file_ctime".
---
 MANUAL.txt   |  3 +++
 ccache.c |  6 ++
 ccache.h |  5 +++--
 conf.c   |  5 +
 test.sh  | 51 ---
 test/test_conf.c |  6 --
 6 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/MANUAL.txt b/MANUAL.txt
index 3a4afde..e7411b4 100644
--- a/MANUAL.txt
+++ b/MANUAL.txt
@@ -428,6 +428,9 @@ WRAPPERS>>.
 *include_file_mtime*::
 By default, ccache will not cache a file if it includes a header whose
 mtime is too new.  This option disables that check.
+*include_file_ctime*::
+ccache also will not cache a file if it includes a header whose ctime is
+too new.  This option disables that check.
 *time_macros*::
 Ignore *\_\_DATE\__* and *\_\_TIME__* being present in the source code.
 --
diff --git a/ccache.c b/ccache.c
index b77

Re: [ccache] PATCH: Look at include files' mtimes

2013-06-03 Thread Justin Lebar
FWIW I've been using ccache with mtime checking for the past few weeks
and I haven't noticed any problems.  That is a pretty low bar, I
admit, but it's something.  :)

On Sun, Mar 3, 2013 at 3:54 PM, Joel Rosdahl  wrote:
> Hi Justin,
>
>> I've resurrected these patches to look at files' mtimes and ctimes. [...]
>
> I just found out that I forgot to have a look at your patches. Sorry
> about the delay.
>
> They seem fine, so I've applied them. I did need to fix the unit tests
> since they failed, though. Please have a look and see if it looks all
> right.
>
> Thanks,
> -- Joel
>
> On 25 December 2012 08:18, Justin Lebar  wrote:
>> Hi, all.
>>
>> I've resurrected these patches to look at files' mtimes and ctimes.
>> Hopefully the three patches here (with their commit messages) don't
>> need further explanation.  Note that the second patch here increases
>> safety for everyone, not just those who choose to have mtime matching
>> on.
>>
>> These patches seem to be working, but I'm not seeing a significant
>> speedup on my Mac.  I think that may be a separate issue, as this
>> machine isn't particularly good at I/O.  I don't have access to my
>> Linux box for a while, so I'd certainly appreciate if someone could
>> verify whether there's a speedup here.
>>
>> I'd also appreciate if some of you could test this patch by turning on
>> CCACHE_SLOPPINESS=file_stat_matches and letting me know if you have
>> any problems.
>>
>> Happy holidays.
>>
>> -Justin
>>
>> On Sun, May 20, 2012 at 4:49 PM, Justin Lebar  wrote:
>>> This patch lets ccache examine an include file's mtime and size in
>>> lieu of hashing it, during direct mode.  If the mtime and size don't
>>> match, we fall back to hashing.
>>>
>>> The net result is roughly a factor-of-two speedup in ccache hits (*),
>>> on my machine.
>>>
>>> I'm not sure if this is a desirable feature, because obviously mtimes
>>> can be tampered with.
>>>
>>> I didn't provide a way to disable the feature in this patch because,
>>> presuming we wanted to take this patch, I'm not sure if we'd want
>>> mtime-snooping enabled by default.  Since most projects already rely
>>> on accurate mtimes in their build systems, turning this on by default
>>> doesn't seem particularly outrageous to me.
>>>
>>> Please let me know what you think about this.
>>>
>>> Regards,
>>> -Justin
>>>
>>> (*) Experimental procedure: In a Firefox debug objdir
>>> (CCACHE_HARDLINK, Linux-64, Ubuntu 12.04, 4 CPU cores), let
>>>
>>> * Let |nop| be the average real time from a few runs of
>>>
>>> $ time make -C dom -sj16
>>>
>>>   when there's nothing to do.
>>>
>>> * Let |orig| be the average real time from a few runs of
>>>
>>> $ find dom -name '*.o' && time make -C dom -sj16
>>>
>>>   with ccache master (701f13192ee) (discarding the first run, of course).
>>>
>>> * Let |mtime| be the real time from the same command as |orig|, but
>>> with patched ccache.
>>>
>>> Speedup is (orig - nop) / (mtime - nop).  On my machine, nop = 3.71,
>>> orig = 4.88, mtime = 4.31.  Yes, our nop build times are atrocious.
>>