Re: [ccache] Why not cache link commands?

2012-09-19 Andrew Stubbs

On 18/09/12 22:59, Mike Frysinger wrote:

 the linker's --build-id and associated .note.gnu.build-id section.  you can't
 hash the entire object because it can change between compiles.  build-id lets
 you say that, regardless of the hash of the entire object, we know the content
 that matters is unchanged.


Ah, excellent, this is the sort of detail I was looking for!

My own brief experimentation shows that static libraries contain 
troublesome datestamps, but object files appear to be reproducible, 
given the same source and command line (the case ccache handles).


Under what circumstances can the binary change but the build-id remain 
the same? I'm aware of line-number and file-path differences in the 
debug info. Is there anything else?


Anyway, as I understand it, ccache could hash the contents of the 
build-id section if there is one, and fall back to hashing the entire 
binary if there isn't.
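A minimal Python sketch of that lookup order, assuming 64-bit little-endian ELF inputs only; anything else (32-bit or big-endian ELF, non-ELF files) simply takes the fallback path. The names `read_build_id_section` and `input_fingerprint` are hypothetical, not part of ccache. The sketch uses the raw bytes of the `.note.gnu.build-id` section, roughly what `objdump -j .note.gnu.build-id -s` displays, and otherwise hashes the whole file with SHA-256. Because the build-id is written by the linker, inputs such as shared libraries linked with `--build-id` would take the first path, while plain object files usually carry no such note and take the second. It also inherits the concern raised in the next paragraph: files with the same build-id get the same fingerprint even if they differ elsewhere.

```python
import hashlib
import struct
from typing import Optional

SECTION_NAME = b".note.gnu.build-id"


def read_build_id_section(data: bytes) -> Optional[bytes]:
    """Return the raw bytes of .note.gnu.build-id from an ELF64
    little-endian image, or None if absent or not understood."""
    # e_ident: magic, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian)
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        return None
    try:
        # Elf64_Ehdr; we only need e_shoff, e_shentsize, e_shnum, e_shstrndx
        hdr = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
        e_shoff, e_shentsize, e_shnum, e_shstrndx = hdr[6], hdr[11], hdr[12], hdr[13]
        if e_shoff == 0 or e_shnum == 0 or e_shstrndx >= e_shnum:
            return None

        def section(i: int):
            # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
            return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

        strtab_off = section(e_shstrndx)[4]  # file offset of the section-name table
        for i in range(e_shnum):
            sh = section(i)
            start = strtab_off + sh[0]
            end = data.find(b"\0", start)
            if end >= 0 and data[start:end] == SECTION_NAME:
                return data[sh[4]:sh[4] + sh[5]]
    except struct.error:  # truncated or corrupt headers: treat as "no build-id"
        return None
    return None


def input_fingerprint(path: str) -> str:
    """Prefer the build-id note; fall back to hashing the entire file."""
    with open(path, "rb") as f:
        data = f.read()
    note = read_build_id_section(data)
    if note is not None:
        return "build-id:" + note.hex()
    return "sha256:" + hashlib.sha256(data).hexdigest()
```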


I'm a bit concerned about the build-id though. As I read it, the 
build-id can't tell the difference between a stripped binary and one 
with full debug info, and the two are certainly different outputs (OK, a 
*very* smart tool could determine that, with a certain link command or 
script, two different inputs are equivalent, but let's not go there). It 
can't even tell the difference between a full binary and an object 
containing *only* the debug info.


Hashing the entire binary could lead to additional cache misses in the 
case that the user has made minor, unimportant changes to the build, but 
in the normal case the object file will have come from the cache anyway 
so this won't be a problem.


The library datestamps problem can be worked around by hashing the output 
of 'ar p libNAME.a' (perhaps combined with 'ar t libNAME.a', just to be 
safe, but certainly not with 'ar tv', whose verbose listing includes the 
datestamps), or perhaps 'objdump -j .note.gnu.build-id -s libNAME.a' if 
we want to use build-ids.
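The 'ar p' plus 'ar t' idea can also be done without running ar at all. The hypothetical `archive_fingerprint` below is a sketch that walks the common System V/GNU ar layout (an 8-byte `!<arch>\n` magic followed by 60-byte member headers) and hashes only each member's name field and contents, skipping the date, uid, gid and mode fields that make archives non-reproducible. It assumes that layout and has not been checked against every ar variant.

```python
import hashlib

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed-size member header in the System V/GNU ar format


def archive_fingerprint(data: bytes) -> str:
    """Hash member names and contents of an ar archive, ignoring the
    per-member date, uid, gid and mode fields (hypothetical helper)."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    h = hashlib.sha256()
    pos = len(AR_MAGIC)
    while pos < len(data):
        header = data[pos:pos + HEADER_LEN]
        # Layout: name[0:16] date[16:28] uid[28:34] gid[34:40]
        #         mode[40:48] size[48:58] terminator[58:60] == b"`\n"
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].rstrip(b" ")
        size = int(header[48:58].decode("ascii").strip())
        body = data[pos + HEADER_LEN:pos + HEADER_LEN + size]
        if len(body) != size:
            raise ValueError(f"truncated member at offset {pos}")
        # Length-prefix each field so name/content boundaries are unambiguous.
        for field in (name, body):
            h.update(len(field).to_bytes(8, "little"))
            h.update(field)
        pos += HEADER_LEN + size + (size % 2)  # members are padded to even offsets
    return h.hexdigest()
```

Special members such as the GNU symbol table (`/`) and long-name table (`//`) are hashed like any other member, so long member names are still covered.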



  -### isn't meant to be a wildcard. That's an actual GCC option. I put
  quotes around it because most shells would interpret the hashes as the
  start of a comment.


 hmm, gotcha.  it does seem to include all the necessary info.  whether it's
 easy for a machine to parse across gcc versions is a different question :).
 the format seems to have changed subtly between 3.3.6 and 4.7.1.


Probably true, but it ought to be possible to determine if we do 
understand it, or not, and fall back to the old behaviour if not.
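One way to "determine if we do understand it" is to accept the `-###` output only when it matches a narrow expected shape, and return nothing otherwise so the caller can fall back. The sketch below assumes the format printed by GCC 4.x-era drivers (each sub-command on its own line, starting with a space, every argument in double quotes); as noted above, that format has shifted between versions. The function name and the set of accepted linker names are assumptions for illustration.

```python
import os
import shlex
from typing import List, Optional

# Hypothetical whitelist: the GCC driver normally runs collect2, which in turn runs ld.
LINKER_NAMES = {"collect2", "ld", "ld.bfd", "ld.gold"}


def extract_link_command(driver_output: str) -> Optional[List[str]]:
    """Return the linker argv from `gcc -###` output, or None when the
    output does not match the expected shape (caller falls back)."""
    commands = []
    for line in driver_output.splitlines():
        # Assumed shape of a command line: ' "/path/to/prog" "arg1" "arg2" ...'
        if not line.startswith(' "'):
            continue  # banner lines, COLLECT_GCC=..., Target: ..., etc.
        try:
            commands.append(shlex.split(line))
        except ValueError:  # unbalanced quotes: not a format we understand
            return None
    links = [argv for argv in commands if os.path.basename(argv[0]) in LINKER_NAMES]
    # Exactly one link step is the only case this sketch claims to understand.
    return links[0] if len(links) == 1 else None
```

The text would come from the driver's standard error, where `gcc -###` prints its commands, e.g. `subprocess.run([...], capture_output=True, text=True).stderr`.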


Andrew
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Why not cache link commands?

2012-09-19 Andrew Stubbs

On 19/09/12 13:18, Eitan Adler wrote:

  Under what circumstances can the binary change but the build-id remain the
  same? I'm aware of line-number and file-path differences in the debug info.
  Is there anything else?


 differing -frandom-seed options perhaps?


If you've changed the command line then you've already got a cache miss, 
but yes, that sounds plausible.


Andrew



Re: [ccache] Why not cache link commands?

2012-09-18 Mike Frysinger
On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
 Clearly there are some technical challenges in doing this: we'd have to
 hash all the object files and libraries (a la direct mode), but those
 problems are surmountable, I think.

or just re-use build-id ...
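For comparison, the "hash everything" approach in the quoted paragraph can be sketched in a few lines of Python: hash the exact linker command line, then the contents of each input in link order (order matters to the linker). `link_cache_key` is a hypothetical name, and the caller is assumed to supply the resolved input list; as the reply below points out, the compiler driver can add implicit libraries and crt objects, so that list has to come from the real link command rather than from the user's gcc arguments.

```python
import hashlib
from typing import Iterable, Sequence


def link_cache_key(linker_argv: Sequence[str], input_paths: Iterable[str]) -> str:
    """Hypothetical cache key: the full link command plus the content of
    every input file, taken in link order."""
    h = hashlib.sha256()
    h.update(b"argc=%d\0" % len(linker_argv))
    for arg in linker_argv:
        h.update(arg.encode("utf-8") + b"\0")  # NUL cannot occur inside a C argv string
    for path in input_paths:
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())  # fixed 32-byte digest per input
    return h.hexdigest()
```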

 The linker does not use any libraries not listed with gcc '-###' whatever.

mmm different gcc flags can implicitly expand into -l### or different crt 
objects, so you can't cache linking at the compiler driver level w/out re-
implementing much of the guts of gcc, and even then you'd break with 
moderately patched gcc versions.

 I'm also aware that it's not that interesting for many incremental
 builds, where the final link will always be different, but my use case
 is accelerating rebuilds of projects that may have many outputs, most of
 which are likely to be unaffected by small code changes. It's also worth
 noting that incremental builds are not the target use case for ccache in
 general.

gold should already support incremental linking (ala build-id), so i don't 
think that's already a fixed problem
-mike




Re: [ccache] Why not cache link commands?

2012-09-18 Mike Frysinger
On Tuesday 18 September 2012 17:07:53 Andrew Stubbs wrote:
 On 18/09/12 21:04, Mike Frysinger wrote:
  On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
  Clearly there are some technical challenges in doing this: we'd have to
  hash all the object files and libraries (a la direct mode), but those
  problems are surmountable, I think.
  
  or just re-use build-id ...
 
 Sorry, I'm probably being thick, but what do you mean?

the linker's --build-id and associated .note.gnu.build-id section.  you can't 
hash the entire object because it can change between compiles.  build-id lets 
you say that, regardless of the hash of the entire object, we know the content 
that matters is unchanged.

  The linker does not use any libraries not listed with gcc '-###'
  whatever.
  
  mmm different gcc flags can implicitly expand into -l### or different crt
  objects, so you can't cache linking at the compiler driver level w/out
  re-implementing much of the guts of gcc, and even then you'd break with
  moderately patched gcc versions.
 
 -### isn't meant to be a wildcard. That's an actual GCC option. I put
 quotes around it because most shells would interpret the hashes as the
 start of a comment.

hmm, gotcha.  it does seem to include all the necessary info.  whether it's 
easy for a machine to parse across gcc versions is a different question :).  
the format seems to have changed subtly between 3.3.6 and 4.7.1.

  I'm also aware that it's not that interesting for many incremental
  builds, where the final link will always be different, but my use case
  is accelerating rebuilds of projects that may have many outputs, most of
  which are likely to be unaffected by small code changes. It's also worth
  noting that incremental builds are not the target use case for ccache in
  general.
  
  gold should already support incremental linking (ala build-id), so i
  don't think that's already a fixed problem

err, typo here.  s/don't//.

 As I said, the interesting use case is *not* incremental links. The
 interesting use case is accelerating clean builds. ccache can never
 help where genuinely new inputs are involved.

right, i was just agreeing with you and providing more details as to how it 
already works today.
-mike

