Re: [ccache] Caching failed compilations
On 23/07/15 22:28, Joel Rosdahl wrote:
> (Sorry for the delayed reply, I have been on vacation.)

No problem; me too!

> No no, doing an extra read of initial data is not needed. If something I wrote implied that, I must have been unclear.

OK, all clear now. I don't recall the exact code path for this stuff, and I suspect you'll need to rewrite/inline the copy routine in an ugly way, but I see now.

> (I'm a bit surprised that you felt the need to explain basic stuff about what's important, to be honest.)

Sorry, it just felt like we were talking past each other, and maybe not using the same terminology, so I tried to break the cycle by going back to basics.

> I hope this is more clear. I'm beginning to lose faith in my English communication skills. :-)

Internet conversations always do that. Not enough non-verbal communication.

Andrew
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache
Re: [ccache] Caching failed compilations
(Resend since my original mail didn't reach the mailing list properly.)

(Sorry for the delayed reply, I have been on vacation.)

> On cache-hit, there's currently no reason to actually look inside the file, right? It just does the copy blind (I forget exactly how). Reading the initial data from every binary on every cache-hit (the case we want to be most optimal) sounds like a Bad Thing.

No no, doing an extra read of initial data is not needed. If something I wrote implied that, I must have been unclear.

A cache hit in the current implementation (simplified, of course):

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. Write buffer content to the destination object file.
5. Repeat 3 and 4 until EOF.
6. Close file.

A cache hit in the suggested solution where special data (e.g. encoding the exit code) is written to the cached object file:

1. Stat object file in cache. If it exists, we have a hit.
2. Open file.
3. Read a chunk of the file into a buffer.
4. If the buffer contains special data (e.g. starts with a ccache-specific header), exit with the encoded exit code (and write stderr, etc.). Else:
5. Write buffer content to the destination object file.
6. Repeat 3 and 5 until EOF.
7. Close file.

So, exactly the same system calls in both cases for a normal cache hit. That's what I tried to summarize with:

>> On a cache hit, we need to open and read the file regardless of whether it's a real object file or special data encoding an exit code.

> The most common case must always be the quick path [...]

Heh, I certainly know that it's important not to slow down common cases with e.g. more system calls. I didn't mean to ask you to explain what you meant by the term slow path, but why you thought that there *would be* a slow path in the special-data-in-object-file solution. Again I must have been unclear, sorry about that. (I'm a bit surprised that you felt the need to explain basic stuff about what's important, to be honest.)

And as described above, I see no reason why special-data-in-object-file would be slower. Yes, the non-zero-exit-code case could perhaps be optimized with a stat size hack, but that doesn't feel important.

I hope this is more clear. I'm beginning to lose faith in my English communication skills. :-)

--
Joel
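The second list of cache-hit steps in Joel's message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ccache's code: the header `FAILURE_MAGIC`, the record layout (exit code on the first line, then the captured stderr) and the function name `serve_cache_hit` are all hypothetical. Only the first chunk, which is read anyway, gets inspected.

```python
import os
from typing import BinaryIO, Optional

# Hypothetical header marking a cached *failure*. ccache defines no such
# format; this is an assumption for illustration. Object files begin with a
# binary magic number (ELF uses b"\x7fELF"), so they cannot match this text.
FAILURE_MAGIC = b"CCACHE-FAILURE\n"
CHUNK_SIZE = 64 * 1024


def serve_cache_hit(cache_path: str, dest_path: str,
                    stderr: BinaryIO) -> Optional[int]:
    """Return None on a miss, else the exit code to report (0 = object)."""
    if not os.path.exists(cache_path):          # 1. stat: no file, no hit
        return None
    with open(cache_path, "rb") as cached:      # 2. open
        chunk = cached.read(CHUNK_SIZE)         # 3. read the first chunk
        if chunk.startswith(FAILURE_MAGIC):     # 4. special data?
            # Assumed layout: b"<exit code>\n<captured stderr>"
            record = chunk[len(FAILURE_MAGIC):] + cached.read()
            code, _, diagnostics = record.partition(b"\n")
            stderr.write(diagnostics)           # replay the compiler output
            return int(code)
        # Normal hit: the destination is only created on this path.
        with open(dest_path, "wb") as dest:
            while chunk:                        # 5./6. write, read, repeat
                dest.write(chunk)
                chunk = cached.read(CHUNK_SIZE)
    return 0                                    # 7. `with` closed the files
```

Joel's point shows up directly: a successful hit performs the same open/read/write sequence as a blind copy, and the header check is a single `startswith` on a buffer that had to be read regardless.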
Re: [ccache] Caching failed compilations
Hi Joel and all!

I also found the idea of storing failures interesting, and made a quick sample implementation earlier:
https://github.com/itension/ccache/compare/store_failures

The feature is enabled by setting $CCACHE_STOREFAILURES.

It does store the status as a separate file, but on the other hand there is no object file stored for failures. One could look for an object file (success) *before* looking for a status file (failure), to cut down on stat calls?

My biggest fear is that it will store I/O errors and whatnot, with no easy way to rebuild (it needs a recache). So I made it opt-in, rather than the default. So far, it seems that almost everything in cc returns exit code 1.

I was investigating cutting down on the number of files, and stored everything in an LMDB* database instead... The biggest downside is that adding new files to the cache (and cleaning) now becomes much more involved.

* https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

We already had a compact format developed for the memcache extension* (that is also proving to be very useful). It does make extending the format harder, but I suppose one could use the version in the CCH1 header for that?

* https://github.com/tardyp/ccache/commit/33852da77f54c9227cb90e013e1bb186a7d315c2

I am hesitant to replace the default (files), but see great potential when combining memcached with distcc. It makes it possible to share a secondary cache between *several* servers, without having to do a recompile. So we use it like:

ccache - memcached - distcc

Will clean up database/memcached for public testing...

It is possible to convert between the different cache formats, since the actual files inside are all the same. A simple conversion script (in Python) is provided. It is a little slow if the cache is in use, but otherwise it's OK.
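The lookup order Anders suggests (object file first, separate status file only on a miss) and the opt-in store check can be sketched like this. It is not the code from the store_failures branch: the `.o`/`.status` suffixes, the helper names, and treating any value of CCACHE_STOREFAILURES as "enabled" are assumptions, and the exit-code-1 filter borrows Andrew's suggestion from elsewhere in the thread.

```python
import os
from typing import Mapping, Optional, Tuple


def may_store_failure(exit_code: int, env: Mapping[str, str]) -> bool:
    # Opt-in via the environment. Only exit code 1 is treated as a
    # repeatable compile error (assumption: OOM, signals etc. are not).
    return "CCACHE_STOREFAILURES" in env and exit_code == 1


def lookup(cache_base: str) -> Optional[Tuple[str, object]]:
    """Return ("object", path), ("failure", exit_code) or None on a miss."""
    obj_path = cache_base + ".o"           # hypothetical naming scheme
    if os.path.exists(obj_path):           # common case: a single stat
        return ("object", obj_path)
    try:                                   # rarer case: try the status file
        with open(cache_base + ".status") as f:
            return ("failure", int(f.read().strip()))
    except FileNotFoundError:              # neither file: plain cache miss
        return None
```

Opening the status file directly, rather than stat-then-open, saves one system call on the failure path; a plain miss still costs two lookups.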
/Anders

From: ccache-boun...@lists.samba.org [ccache-boun...@lists.samba.org] on behalf of Andrew Stubbs [a...@codesourcery.com]
Sent: 7 July 2015 10:58
To: Joel Rosdahl
Cc: Akim Demaille; ccache list
Subject: Re: [ccache] Caching failed compilations

On 06/07/15 21:44, Joel Rosdahl wrote:
> That sounds like a reasonable idea, but I have occasionally seen empty object files in large and busy caches (it could be due to filesystem failure, hardware failure or hard system reset), so I'm afraid that using zero-length object files won't work out in practice. See also https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some special content to the object file would be OK?

OK, fair enough, but I'd say that once you've opened the file and checked the magic data then you've already killed performance. How about a magic length that can be observed in the stat data? A failure can be confirmed by a read, if and only if the length matches, but a compile success will remain on the quick path. A cache-hit for a compile failure need not be the *most* efficient code path; it will likely end the build process. As long as it's faster than the slow compile failures the OP cares about, all is well.

>> Sorry, I don't see any advantage in this scheme. You might save a few bytes of disk space, and maybe a few inodes, but I've not seen any evidence that those are a problem. You'll also add extra file copies to every cache miss, and those are already expensive enough.
>
> My primary motivation for considering the mentioned scheme is to reduce disk seeks, not disk space. If you have a cold disk cache (on a rotating device), every new i-node that needs to be visited potentially/likely needs a new disk seek, which is slow. If all parts of the result are stored in one contiguous file, it should likely be quicker to retrieve. But as mentioned earlier, I have no data to back up this theory yet.
My understanding is that when a disk read occurs the kernel reads the entire page into the memory cache. Subsequent inode reads will likely hit that cache, so reading two inodes is nearly as cheap as reading one. The system call overhead is constant, however.

> A secondary motivation for the scheme is that various code paths in ccache need to handle multiple files for a single result. There can now be between two (stderr, object) and six (stderr, object, dependency, coverage, diagnostics, split dwarf) files for each cached result. If one of those files is missing, then the result should be invalid. This is quite painful and there are most likely some lurking bugs related to this.

OK, that's quite a lot of files. Hopefully it does not look for a file unless it really ought to be there? I worry that you'll hurt the common case (just two files) in order to help the uncommon case, and that the common case is already about as good as it can be (especially with hard-links).

> A third motivation is that it would be easier to include a checksum of the cached data to detect corruption so that ccache won't repeatedly deliver a bad object file (due to hardware error
Re: [ccache] Caching failed compilations
On 05/07/15 16:47, Joel Rosdahl wrote:
> Hi,
>
>> I did have a look at how feasible it is, and basically I think it can be done.
>
> Yes, caching failures (from the compiler, not the preprocessor) would be feasible and I think that it's a good idea, at least as an optional feature.
>
> I'm not very tempted to add a new kind of file for storing the exit code in the cache, though.

I certainly agree that it's not appropriate to have an "exit code: 0" file for every successful compile. I'd also suggest that having a separate file hold error exit codes would be confusing should a compile fail (due to out-of-memory, say) and then a subsequent run succeed and be cached (using CCACHE_RECACHE), and having every cache miss check for, and maybe delete, another file is only going to hurt performance more.

After thinking further, I'd be tempted to say that ccache should *not* cache failures with exit codes other than 1, as they're likely not repeatable (OOM, Ctrl-C, etc.).

Perhaps just signal a failed compile with a cache result that is present but zero-length? (We could also say that it failed if the binary cache is missing but the stderr cache is present, but that might be problematic.)

> Instead I think that we should switch to storing only one file per result in the cache (except the manifest) and storing everything (exit code, stderr, object file, dependency file, etc.) in it. The downside would be that hard_link mode wouldn't be possible anymore, but the upside is that fewer i-nodes will be used, which should improve performance in theory. (Today at least two i-nodes are used per cached result plus one for the manifest.)

Sorry, I don't see any advantage in this scheme. You might save a few bytes of disk space, and maybe a few inodes, but I've not seen any evidence that those are a problem. You'll also add extra file copies to every cache miss, and those are already expensive enough.

If you do go this route, please consider using an open format, like tar, and turn on compression by default to offset the internal padding (if the benchmarks don't show it hurting).

Andrew
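Andrew's suggestion of an open container can be sketched with Python's standard `tarfile` module: every part of one cached result goes into a single gzip-compressed tar file. The member names and helper functions are hypothetical. Tar pads each member to 512-byte blocks, which is the kind of internal padding that compression would offset.

```python
import io
import tarfile
from typing import Dict


def pack_result(path: str, parts: Dict[str, bytes]) -> None:
    """Write all parts of one cached result into a single .tar.gz file."""
    with tarfile.open(path, "w:gz") as tar:
        for name, data in parts.items():
            info = tarfile.TarInfo(name)   # header for a regular-file member
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def unpack_result(path: str) -> Dict[str, bytes]:
    """Read every regular-file member back into memory."""
    with tarfile.open(path, "r:gz") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}
```

Reading any single part means decompressing the gzip stream from its start, so the benchmarks Andrew mentions would need to cover the cache-hit path, not just disk usage.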
Re: [ccache] Caching failed compilations
> After thinking further, I'd be tempted to say that ccache should *not* cache failures with exit codes other than 1, as they're likely not repeatable (OOM, Ctrl-C, etc.).
>
> Perhaps just signal a failed compile with a cache result that is present but zero-length? (We could also say that it failed if the binary cache is missing but the stderr cache is present, but that might be problematic.)

That sounds like a reasonable idea, but I have occasionally seen empty object files in large and busy caches (it could be due to filesystem failure, hardware failure or hard system reset), so I'm afraid that using zero-length object files won't work out in practice. See also https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some special content to the object file would be OK?

> Sorry, I don't see any advantage in this scheme. You might save a few bytes of disk space, and maybe a few inodes, but I've not seen any evidence that those are a problem. You'll also add extra file copies to every cache miss, and those are already expensive enough.

My primary motivation for considering the mentioned scheme is to reduce disk seeks, not disk space. If you have a cold disk cache (on a rotating device), every new i-node that needs to be visited potentially/likely needs a new disk seek, which is slow. If all parts of the result are stored in one contiguous file, it should likely be quicker to retrieve. But as mentioned earlier, I have no data to back up this theory yet.

A secondary motivation for the scheme is that various code paths in ccache need to handle multiple files for a single result. There can now be between two (stderr, object) and six (stderr, object, dependency, coverage, diagnostics, split dwarf) files for each cached result. If one of those files is missing, then the result should be invalid. This is quite painful and there are most likely some lurking bugs related to this.

A third motivation is that it would be easier to include a checksum of the cached data to detect corruption so that ccache won't repeatedly deliver a bad object file (due to hardware error or whatnot).

Does this sound reasonable? What disadvantages do you see?

--
Joel
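Joel's third motivation, a checksum over the whole cached result, can be sketched with a small single-file container. Everything here is hypothetical (the `XRES` magic, the field layout, the choice of SHA-256); it only illustrates that with one file per result, a whole-result check is simple, and an empty or truncated file is rejected instead of being delivered.

```python
import hashlib
import struct
from typing import Dict

MAGIC = b"XRES"                  # hypothetical magic, not a ccache format
VERSION = 1
HEADER = struct.Struct("<BI")    # version, number of entries
ENTRY = struct.Struct("<HQ")     # name length, data length
DIGEST_SIZE = hashlib.sha256().digest_size


class CorruptResult(Exception):
    """Raised instead of delivering a damaged cached result."""


def serialize(entries: Dict[str, bytes]) -> bytes:
    body = bytearray(MAGIC + HEADER.pack(VERSION, len(entries)))
    for name, data in entries.items():
        raw_name = name.encode("utf-8")
        body += ENTRY.pack(len(raw_name), len(data)) + raw_name + data
    return bytes(body) + hashlib.sha256(body).digest()  # trailing checksum


def deserialize(blob: bytes) -> Dict[str, bytes]:
    if len(blob) < len(MAGIC) + HEADER.size + DIGEST_SIZE:
        raise CorruptResult("truncated or empty result")
    body, digest = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(body).digest() != digest:
        raise CorruptResult("checksum mismatch")
    if body[:len(MAGIC)] != MAGIC:
        raise CorruptResult("unknown format")
    version, count = HEADER.unpack_from(body, len(MAGIC))
    if version != VERSION:
        raise CorruptResult(f"unsupported version {version}")
    pos = len(MAGIC) + HEADER.size
    entries = {}
    for _ in range(count):
        name_len, data_len = ENTRY.unpack_from(body, pos)
        pos += ENTRY.size
        name = body[pos:pos + name_len].decode("utf-8")
        pos += name_len
        entries[name] = body[pos:pos + data_len]
        pos += data_len
    return entries
```

Because the checksum trails the data, the whole result has to be read and verified before anything is written to the destination, which gives up the streaming copy of the normal cache-hit path; per-chunk checksums would be one possible way around that.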
Re: [ccache] Caching failed compilations
Hi,

> I did have a look at how feasible it is, and basically I think it can be done.

Yes, caching failures (from the compiler, not the preprocessor) would be feasible and I think that it's a good idea, at least as an optional feature.

I'm not very tempted to add a new kind of file for storing the exit code in the cache, though. Instead I think that we should switch to storing only one file per result in the cache (except the manifest) and storing everything (exit code, stderr, object file, dependency file, etc.) in it. The downside would be that hard_link mode wouldn't be possible anymore, but the upside is that fewer i-nodes will be used, which should improve performance in theory. (Today at least two i-nodes are used per cached result plus one for the manifest.)

--
Joel
Re: [ccache] Caching failed compilations
On 30/06/15 15:30, Akim Demaille wrote:
> Could ccache offer a mode where even failed compiles are ccached?

I'd like to see this too! Or rather, I would have liked to see this on a project I worked on a while ago. It's not really a common use case for most people, I'd imagine, and not something I'd use now (my compile failures would be almost entirely cache misses).

I did have a look at how feasible it is, and basically I think it can be done. The caching is based on a hash of the input source files, combined with the command line arguments and various other environmental factors; in particular, the content of the output binaries is immaterial. The warning messages that accompany the binary are already cached, so caching error messages is not a big deal. It merely needs some sort of mechanism to record the exit code.

I don't believe it would be possible to cache compilations where a header file is missing, for example, but those fail very quickly anyway.

Andrew
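Andrew's point that the cache key depends only on the inputs, never on the output, is what makes caching failures straightforward: a failed compile can be stored under its key exactly like a successful one. A minimal sketch, with hypothetical names, SHA-256 standing in for whatever hash ccache actually uses, and a plain dict standing in for the on-disk cache:

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Result:
    exit_code: int
    stderr: bytes
    obj: Optional[bytes]          # None when the compiler failed


def cache_key(compiler_id: str, args: Sequence[str],
              preprocessed: bytes) -> str:
    """Hash the inputs only; the compiler's output never enters the key."""
    h = hashlib.sha256()
    for part in (compiler_id.encode(), "\0".join(args).encode(), preprocessed):
        h.update(len(part).to_bytes(8, "little"))   # length-prefix each part
        h.update(part)
    return h.hexdigest()


def cached_compile(cache: Dict[str, Result], key: str,
                   compile_fn: Callable[[], Result]) -> Result:
    hit = cache.get(key)
    if hit is not None:
        return hit                # success or failure, replayed as-is
    result = compile_fn()
    cache[key] = result           # failures are stored just like successes
    return result
```

Assuming the key is computed from the preprocessed source, as in this sketch, a compile whose preprocessing fails (a missing header, say) never produces a key at all, which fits Andrew's remark that such cases can't be cached.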