Re: [ccache] Caching failed compilations

2015-07-24 Thread Andrew Stubbs

On 23/07/15 22:28, Joel Rosdahl wrote:

(Sorry for the delayed reply, I have been on vacation.)


No problem; me too!


No no, doing an extra read of initial data is not needed. If something I
wrote implied that I must have been unclear.


OK, all clear now. I don't recall the exact code path for this stuff, 
and I suspect you'll need to rewrite/inline the copy routine in an ugly 
way, but I see now.



(I'm a bit surprised that you
felt the need to explain basic stuff about what's important, to be honest.)


Sorry, it just felt like we were talking past each other, and maybe not 
using the same terminology, so I tried to break the cycle by going back 
to basics.



I hope this is more clear. I'm beginning to lose faith in my English
communication skills. :-)


Internet conversations always do that. Not enough non-verbal communication.

Andrew

___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Caching failed compilations

2015-07-23 Thread Joel Rosdahl
(Resend since my original mail didn't reach the mailing list properly.)

(Sorry for the delayed reply, I have been on vacation.)

On cache-hit, there's currently no reason to actually look inside the file,
 right? It just does the copy blind (I forget exactly how). Reading the
 initial data from every binary on every cache-hit (the case we want to be
 most optimal) sounds like a Bad Thing.


No no, doing an extra read of initial data is not needed. If something I
wrote implied that I must have been unclear.

A cache hit in the current implementation (simplified, of course):

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. Write buffer content to the destination object file.
5. Repeat 3 and 4 until EOF.
6. Close file.

A cache hit in the suggested solution where special data (e.g. encoding the
exit code) is written to the cached object file:

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. If the buffer contains special data (e.g. starts with a ccache-specific
header), exit with the encoded exit code (and write stderr, etc.). Else:
5. Write buffer content to the destination object file.
6. Repeat 3 and 5 until EOF.
7. Close file.
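The only extra work in the suggested scheme is a comparison on the first chunk, which is already in memory. A minimal sketch, assuming a hypothetical 9-byte marker (the magic string ccFAIL01 followed by one exit-code byte); serve_hit and the marker layout are illustrative assumptions, not ccache's format.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical failure marker: 8 magic bytes, then one byte holding the
   compiler's exit code. This layout is an assumption, not ccache's format. */
#define FAIL_MAGIC "ccFAIL01"
#define FAIL_MAGIC_LEN 8

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Steps 3-6 of the suggested scheme (the caller opens and closes src_fd).
   On success returns 0 and sets *exit_code to 0 (a real object file was
   copied) or to the exit code stored in a failure marker (nothing written). */
static int serve_hit(int src_fd, int dst_fd, int *exit_code)
{
    char buf[8192];
    int first_chunk = 1;
    assert(exit_code != NULL);
    *exit_code = 0;
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);   /* step 3 */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;                                /* EOF */
        /* Step 4: only the first chunk can start with the marker. A regular
           file normally returns the whole chunk at once, so a short first
           read is not handled in this sketch. */
        if (first_chunk && (size_t)n > FAIL_MAGIC_LEN &&
            memcmp(buf, FAIL_MAGIC, FAIL_MAGIC_LEN) == 0) {
            *exit_code = (unsigned char)buf[FAIL_MAGIC_LEN];
            return 0;
        }
        first_chunk = 0;
        if (write_all(dst_fd, buf, (size_t)n) != 0) /* step 5 */
            return -1;
    }
}
```

A normal hit issues the same read and write calls as the plain copy loop; the only addition is one memcmp on data that is already in the buffer.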

So, exactly the same system calls in both cases for a normal cache hit.
That's what I tried to summarize with "On a cache hit, we need to open and
read the file regardless of whether it's a real object file or special data
encoding an exit code."

The most common case must always be the quick path [...]


Heh, I certainly know that it's important not to slow down common cases
with e.g. more system calls. I didn't mean to ask you to explain what you
meant by the term "slow path" but why you thought that there *would be* a
slow path in the special-data-in-object-file solution. Again I must have
been unclear, sorry about that. (I'm a bit surprised that you felt the need
to explain basic stuff about what's important, to be honest.)

And as described above, I see no reason why special-data-in-object-file
would be slower. Yes, the non-zero-exit-code case could perhaps be
optimized with a stat size hack, but that doesn't feel important.

I hope this is more clear. I'm beginning to lose faith in my English
communication skills. :-)

-- Joel


Re: [ccache] Caching failed compilations

2015-07-07 Thread Anders Björklund
Hi Joel and all!

I also found the idea of storing failures interesting, and made a quick sample 
implementation earlier:

https://github.com/itension/ccache/compare/store_failures

The feature is enabled by setting $CCACHE_STOREFAILURES.

It does store the status as a separate file, but on the other hand there is no 
object file stored for failures.
One could look for an object file (success) *before* looking for a status file 
(failure), to cut down on stats?
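Checking for the object file first keeps a cached success at a single stat() call; the status file is only probed when the object is missing. A sketch under that assumption; the paths and the lookup function are hypothetical and not taken from the linked branch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

enum lookup_result { LOOKUP_MISS, LOOKUP_OBJECT, LOOKUP_FAILURE };

/* Probe the object file first: a cached success then costs one stat(),
   and only a missing object triggers a second stat() of the status file. */
static enum lookup_result lookup(const char *object_path, const char *status_path)
{
    struct stat st;
    assert(object_path != NULL && status_path != NULL);
    if (stat(object_path, &st) == 0)
        return LOOKUP_OBJECT;
    if (stat(status_path, &st) == 0)
        return LOOKUP_FAILURE;
    return LOOKUP_MISS;
}
```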

My biggest fear is that it will store I/O errors and whatnot, with no easy 
way to rebuild (that needs a recache).
So I made it opt-in rather than the default. So far, it seems that almost 
everything in cc returns exit code 1.


I was investigating cutting down on the number of files, and stored everything 
in an LMDB* database instead...
The biggest downside is that adding new files to the cache (and cleaning) now 
becomes much more involved.

* https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

We already had a compact format developed for the memcache extension* (that is 
also proving to be very useful).
It does make extending the format harder, but I suppose one could use the 
version in the CCH1 header for that?

* https://github.com/tardyp/ccache/commit/33852da77f54c9227cb90e013e1bb186a7d315c2


I am hesitant to replace the default (files), but see great potential when 
combining memcached with distcc.
It opens up sharing a secondary cache between *several* servers, without 
having to do a recompile.

So we use it like: ccache - memcached - distcc

Will clean up database/memcached for public testing...

It is possible to convert between the different cache formats, since the actual 
files inside are all the same.
A simple conversion script (in python) is provided. A little slow if the cache 
is in use, but otherwise it's OK.

/Anders


From: ccache-boun...@lists.samba.org [ccache-boun...@lists.samba.org] on behalf of 
Andrew Stubbs [a...@codesourcery.com]
Sent: 7 July 2015 10:58
To: Joel Rosdahl
Cc: Akim Demaille; ccache list
Subject: Re: [ccache] Caching failed compilations

On 06/07/15 21:44, Joel Rosdahl wrote:
 That sounds like a reasonable idea, but I have occasionally seen empty
 object files in large and busy caches (it could be due to filesystem
 failure, hardware failure or hard system reset), so I'm afraid that
 using zero-length object files won't work out in practice. See also
 https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some
 special content to the object file would be OK?

OK, fair enough, but I'd say that once you've opened the file and
checked the magic data then you've already killed performance. How about
a magic length that can be observed in the stat data?

A failure can be confirmed by a read, if and only if the length matches,
but a compile success will remain on the quick path.

A cache-hit for a compile failure need not be the *most* efficient code
path; it will likely end the build process. As long as it's faster than
the slow compile failures the OP cares about then all is well.
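The magic-length idea could be sketched as follows, assuming a hypothetical fixed-size marker file (8 magic bytes plus one exit-code byte, 9 bytes in total). An object file of any other size is recognised from the stat data alone; only a file of exactly the marker size is opened and read to confirm. The function name and marker layout are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical marker: 8 magic bytes plus one exit-code byte, so every
   marker file is exactly 9 bytes long. */
#define FAIL_MAGIC "ccFAIL01"
#define FAIL_MAGIC_LEN 8
#define FAIL_FILE_SIZE (FAIL_MAGIC_LEN + 1)

/* Sets *exit_code to 0 for an ordinary object file or to the recorded exit
   code for a confirmed marker. Returns 0 on success, -1 on error. */
static int classify_cached_object(const char *path, int *exit_code)
{
    struct stat st;
    unsigned char buf[FAIL_FILE_SIZE];
    assert(exit_code != NULL);
    *exit_code = 0;
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_size != FAIL_FILE_SIZE)
        return 0;                    /* quick path: size alone rules out a marker */
    int fd = open(path, O_RDONLY);   /* size matches: confirm by reading */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    close(fd);
    if (n == FAIL_FILE_SIZE && memcmp(buf, FAIL_MAGIC, FAIL_MAGIC_LEN) == 0)
        *exit_code = buf[FAIL_MAGIC_LEN];
    return 0;
}
```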

 Sorry, I don't see any advantage in this scheme. You might save a
 few bytes of disk space, and maybe a few inodes, but I've not seen
 any evidence that those are a problem. You'll also add extra file
 copies to every cache miss, and those are already expensive enough.


 My primary motivation for considering the mentioned scheme is to reduce
 disk seeks, not disk space. If you have a cold disk cache (on a rotating
 device), every new i-node that needs to be visited potentially/likely
 needs a new disk seek, which is slow. If all parts of the result are
 stored in one contiguous file, it should likely be quicker to retrieve.
 But as mentioned earlier, I have no data to back up this theory yet.

My understanding is that when a disk read occurs the kernel reads the
entire page into the memory cache. Subsequent inode reads will likely
hit that cache, so reading two inodes is nearly as cheap as reading one.
The system call overhead is constant, however.

 A secondary motivation for the scheme is that various code paths in
 ccache need to handle multiple files for a single result. There can now
 be between two (stderr, object) and six (stderr, object, dependency,
 coverage, diagnostics, split dwarf) files for each cached result. If one
 of those files is missing, then the result should be invalid. This is
 quite painful and there are most likely some lurking bugs related to this.

OK, that's quite a lot of files. Hopefully it does not look for a file
unless it really ought to be there? I worry that you'll hurt the common
case (just two files) in order to help the uncommon case, and that that
is already about as good as it can be (especially with hard-links).

 A third motivation is that it would be easier to include a checksum of
 the cached data to detect corruption so that ccache won't repeatedly
 deliver a bad object file (due to hardware error

Re: [ccache] Caching failed compilations

2015-07-06 Thread Andrew Stubbs

On 05/07/15 16:47, Joel Rosdahl wrote:

Hi,

I did have a look at how feasible it is, and basically I think it
can be done.


Yes, caching failures (from the compiler, not the preprocessor) would be
feasible and I think that it's a good idea, at least as an optional feature.

I'm not very tempted to add a new kind of file for storing the exit code
in the cache, though.


I certainly agree that it's not appropriate to have an "exit code: 0" 
file for every successful compile.


I'd also suggest that having a separate file hold error exit codes would 
be confusing should a compile fail (due to out-of-memory, say) and a 
subsequent run then succeed and be cached (using CCACHE_RECACHE). 
And having every cache miss check for, and maybe delete, another 
file is only going to hurt performance more.


After thinking further, I'd be tempted to say that ccache should *not* 
cache failures with exit codes other than 1 as they're likely not 
repeatable (OOM, Ctrl-C, etc.).
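As a sketch of that policy, applied to a status returned by waitpid(): exit code 0 or 1 is cacheable, while any other exit code or a process killed by a signal (for example by Ctrl-C) is not. The function name is hypothetical, and whether every non-repeatable failure really shows up as something other than exit code 1 depends on the compiler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Cache successes (exit code 0) and ordinary compile errors (exit code 1);
   skip anything else, including processes terminated by a signal. */
static int should_cache(int wait_status)
{
    if (!WIFEXITED(wait_status))
        return 0;                    /* e.g. interrupted by Ctrl-C (SIGINT) */
    int code = WEXITSTATUS(wait_status);
    return code == 0 || code == 1;
}
```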


Perhaps just signal a failed compile with a cache result that is present 
but zero-length? (We could also say that it failed if the binary 
cache is missing, but the stderr cache is present, but that might be 
problematic.)



Instead I think that we should switch to storing
only one file per result in the cache (except the manifest) and store
everything (exit code, stderr, object file, dependency file, etc.) in
it. The downside would be that hard_link mode wouldn't be possible
anymore but the upside is that fewer i-nodes will be used, which should
improve performance in theory. (Today at least two i-nodes are used per
cached result plus one for the manifest.)


Sorry, I don't see any advantage in this scheme. You might save a few 
bytes of disk space, and maybe a few inodes, but I've not seen any 
evidence that those are a problem. You'll also add extra file copies to 
every cache miss, and those are already expensive enough.


If you do go this route, please consider using an open format, like tar, 
and turn on compression by default to offset the internal padding (if 
the benchmarks don't show it hurting).


Andrew


Re: [ccache] Caching failed compilations

2015-07-06 Thread Joel Rosdahl

 After thinking further, I'd be tempted to say that ccache should *not*
 cache failures with exit codes other than 1 as they're likely not
 repeatable (OOM, Ctrl-C, etc.).


 Perhaps just signal a failed compile with a cache result that is present
 but zero-length? (We could also say that it failed if the binary cache
 is missing, but the stderr cache is present, but that might be problematic.)


That sounds like a reasonable idea, but I have occasionally seen empty
object files in large and busy caches (it could be due to filesystem
failure, hardware failure or hard system reset), so I'm afraid that using
zero-length object files won't work out in practice. See also
https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some
special content to the object file would be OK?

Sorry, I don't see any advantage in this scheme. You might save a few bytes
 of disk space, and maybe a few inodes, but I've not seen any evidence that
 those are a problem. You'll also add extra file copies to every cache miss,
 and those are already expensive enough.


My primary motivation for considering the mentioned scheme is to reduce
disk seeks, not disk space. If you have a cold disk cache (on a rotating
device), every new i-node that needs to be visited potentially/likely needs
a new disk seek, which is slow. If all parts of the result are stored in
one contiguous file, it should likely be quicker to retrieve. But as
mentioned earlier, I have no data to back up this theory yet.

A secondary motivation for the scheme is that various code paths in ccache
need to handle multiple files for a single result. There can now be between
two (stderr, object) and six (stderr, object, dependency, coverage,
diagnostics, split dwarf) files for each cached result. If one of those
files is missing, then the result should be invalid. This is quite painful
and there are most likely some lurking bugs related to this.
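The all-or-nothing rule for multi-file results can be sketched like this. The base path and suffixes are hypothetical; in practice the set of expected parts would depend on the options used for the compilation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Returns 1 only if every expected part of a cached result exists. */
static int result_is_complete(const char *base, const char *const *suffixes, size_t n)
{
    char path[4096];
    struct stat st;
    assert(base != NULL);
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(path, sizeof path, "%s%s", base, suffixes[i]);
        if (len < 0 || (size_t)len >= sizeof path)
            return 0;                /* path too long: treat as unusable */
        if (stat(path, &st) != 0)
            return 0;                /* one missing part invalidates the result */
    }
    return 1;
}
```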

A third motivation is that it would be easier to include a checksum of the
cached data to detect corruption so that ccache won't repeatedly deliver a
bad object file (due to hardware error or whatnot).
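A sketch of such a check: the checksum is computed when the result is stored and verified on every hit. Adler-32 is used purely as an example of a cheap checksum; the choice of algorithm is an assumption, not something proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32 (the checksum used by zlib), chosen only because it is short. */
static uint32_t adler32(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* On a cache hit, recompute the checksum over the stored payload and compare
   it with the value recorded when the result was written. */
static int payload_is_intact(const unsigned char *payload, size_t len, uint32_t stored)
{
    assert(payload != NULL || len == 0);
    return adler32(payload, len) == stored;
}
```

On a mismatch, ccache could treat the entry as a miss and recompile, instead of delivering the same bad object file again.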

Does this sound reasonable? What disadvantages do you see?

-- Joel


Re: [ccache] Caching failed compilations

2015-07-05 Thread Joel Rosdahl
Hi,

I did have a look at how feasible it is, and basically I think it can be
 done.


Yes, caching failures (from the compiler, not the preprocessor) would be
feasible and I think that it's a good idea, at least as an optional feature.

I'm not very tempted to add a new kind of file for storing the exit code in
the cache, though. Instead I think that we should switch to storing only one
file per result in the cache (except the manifest) and store everything
(exit code, stderr, object file, dependency file, etc.) in it. The downside
would be that hard_link mode wouldn't be possible anymore but the upside is
that fewer i-nodes will be used, which should improve performance in
theory. (Today at least two i-nodes are used per cached result plus one for
the manifest.)
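To make the one-file-per-result idea concrete, here is a sketch of a possible container: a small header with the exit code, followed by typed, length-prefixed entries. The "ccR1" magic, the entry types and the byte layout are all hypothetical. A failed compilation would simply carry a non-zero exit code and a stderr entry, but no object entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry types; a real design would also cover coverage
   files, diagnostics, split DWARF and so on. */
enum { ENTRY_STDERR = 1, ENTRY_OBJECT = 2, ENTRY_DEPENDENCY = 3 };

struct entry {
    uint8_t type;
    const unsigned char *data;
    uint32_t len;
};

static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Assumed layout: "ccR1" | exit code (1 byte) | entry count (1 byte) |
   per entry: type (1 byte) | length (4 bytes, little-endian) | data.
   Returns the number of bytes written, or 0 if cap is too small. */
static size_t pack_result(unsigned char *buf, size_t cap, uint8_t exit_code,
                          const struct entry *e, uint8_t n)
{
    size_t need = 6;
    for (uint8_t i = 0; i < n; i++)
        need += 5 + (size_t)e[i].len;
    if (need > cap)
        return 0;
    memcpy(buf, "ccR1", 4);
    buf[4] = exit_code;
    buf[5] = n;
    size_t off = 6;
    for (uint8_t i = 0; i < n; i++) {
        buf[off] = e[i].type;
        put_u32le(buf + off + 1, e[i].len);
        memcpy(buf + off + 5, e[i].data, e[i].len);
        off += 5 + (size_t)e[i].len;
    }
    return off;
}

/* Finds the entry of the given type. Returns 1 and sets *data and *len if
   found; returns 0 if it is absent or the buffer is truncated or foreign. */
static int find_entry(const unsigned char *buf, size_t size, uint8_t type,
                      const unsigned char **data, uint32_t *len)
{
    assert(data != NULL && len != NULL);
    if (size < 6 || memcmp(buf, "ccR1", 4) != 0)
        return 0;
    size_t off = 6;
    for (unsigned i = 0; i < buf[5]; i++) {
        if (size - off < 5)
            return 0;
        uint32_t l = get_u32le(buf + off + 1);
        if (size - off - 5 < l)
            return 0;
        if (buf[off] == type) {
            *data = buf + off + 5;
            *len = l;
            return 1;
        }
        off += 5 + (size_t)l;
    }
    return 0;
}
```

Because every part lives in one buffer, a missing part shows up as an absent entry in an otherwise valid file, and a truncated file is rejected as a whole.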

-- Joel


Re: [ccache] Caching failed compilations

2015-07-01 Thread Andrew Stubbs

On 30/06/15 15:30, Akim Demaille wrote:

Could ccache offer a mode where even failed compilations are ccached?


I'd like to see this too!

Or rather, I would have liked to see this on a project I worked on a 
while ago. It's not really a common use case for most people, I'd 
imagine, and not something I'd use now (my compile failures would be 
almost entirely cache misses).


I did have a look at how feasible it is, and basically I think it can be 
done. The caching is based on a hash of the input source files, combined 
with the command line arguments and various other environmental factors; 
in particular, the content of the output binaries is immaterial. The 
warning messages that accompany the binary are already cached, so 
caching error messages is not a big deal. It merely needs some sort of 
mechanism to record the exit code.
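A rough sketch of that idea: the lookup key is derived only from inputs (here the preprocessed source and the argument list), never from the output. The exit code and error messages can therefore be stored under the same key that a successful result would use. FNV-1a is used only as a stand-in hash, and result_key is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a, a stand-in for whatever hash ccache actually uses. */
static uint64_t fnv1a64_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;           /* FNV 64-bit prime */
    }
    return h;
}

/* Key over the source text and the compiler arguments. The compiler's
   output is deliberately not part of the key, so the key is known before
   compiling. */
static uint64_t result_key(const char *source, const char *const *argv, int argc)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */
    assert(source != NULL);
    h = fnv1a64_update(h, source, strlen(source) + 1);  /* NUL acts as separator */
    for (int i = 0; i < argc; i++)
        h = fnv1a64_update(h, argv[i], strlen(argv[i]) + 1);
    return h;
}
```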


I don't believe it would be possible to cache compilations where a 
header file is missing, for example, but those fail very quickly anyway.


Andrew