Re: [ccache] Why not cache link commands?

2012-09-19 Thread Andrew Stubbs

On 19/09/12 13:18, Eitan Adler wrote:
>> Under what circumstances can the binary change but the build-id remain the
>> same? I'm aware of line number, and file path differences in the debug info.
>> Is there anything else?
>
> differing -frandom-seed options perhaps?

If you've changed the command line then you've already got a cache miss,
but yes, that sounds plausible.


Andrew

___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Why not cache link commands?

2012-09-19 Thread Eitan Adler
On 19 September 2012 05:43, Andrew Stubbs  wrote:
> On 18/09/12 22:59, Mike Frysinger wrote:
>>
>> the linker's --build-id and associated .note.gnu.build-id section.  you
>> can't
>> hash the entire object because it can change between compiles.  build-id
>> lets
>> you say "regardless of the hash of the entire object, we know the content
>> that
>> matters is unchanged".
>
>
> Ah, excellent, this is the sort of detail I was looking for!
>
> My own brief experimentation shows that static libraries contain troublesome
> datestamps, but object files appear to be reproducible, given the same
> source and command line (the case ccache handles).
>
> Under what circumstances can the binary change but the build-id remain the
> same? I'm aware of line number, and file path differences in the debug info.
> Is there anything else?

differing -frandom-seed options perhaps?


-- 
Eitan Adler


Re: [ccache] Why not cache link commands?

2012-09-19 Thread Andrew Stubbs

On 18/09/12 22:59, Mike Frysinger wrote:
> the linker's --build-id and associated .note.gnu.build-id section.  you can't
> hash the entire object because it can change between compiles.  build-id lets
> you say "regardless of the hash of the entire object, we know the content that
> matters is unchanged".


Ah, excellent, this is the sort of detail I was looking for!

My own brief experimentation shows that static libraries contain 
troublesome datestamps, but object files appear to be reproducible, 
given the same source and command line (the case ccache handles).


Under what circumstances can the binary change but the build-id remain 
the same? I'm aware of line number, and file path differences in the 
debug info. Is there anything else?


Anyway, as I understand it, ccache could dump the build-id section 
first, if there is one, and hash the entire binary second, if there 
isn't one.
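
A minimal sketch of that fallback order, in Python. Nothing here is existing ccache code: `input_key` is a made-up name, and `read_build_id` is a hypothetical hook (it might wrap objdump or readelf) that returns the raw build-id bytes, or None when the file has no such section.

```python
import hashlib
from typing import Callable, Optional


def input_key(path: str,
              read_build_id: Callable[[str], Optional[bytes]]) -> str:
    """Return a cache key for one linker input.

    read_build_id is a hypothetical hook returning the contents of
    .note.gnu.build-id, or None if the file does not carry one.
    """
    build_id = read_build_id(path)
    if build_id is not None:
        # Prefix the key so a build-id can never be mistaken for a content hash.
        return "build-id:" + build_id.hex()

    # No build-id available: fall back to hashing the whole file.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return "content:" + digest.hexdigest()
```

The choice of SHA-256 is only for illustration; any hash the cache already uses would do.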


I'm a bit concerned about the build-id though. As I read it, the 
build-id can't tell the difference between a stripped binary and one 
with full debug, and the two certainly produce different output (OK, a 
*very* smart tool could determine that, with a certain link command or 
script, two different inputs are equivalent, but let's not go there). It 
can't even tell either of those apart from an object with *only* debug info.


Hashing the entire binary could lead to additional cache misses in the 
case that the user has made minor, unimportant changes to the build, but 
in the normal case the object file will have come from the cache anyway 
so this won't be a problem.


The library datestamps problem can be got around by hashing the output 
of "ar p libNAME.a" (perhaps combined with "ar t libNAME.a", just to be 
safe, but certainly not with "-v"), or perhaps "objdump -j 
.note.gnu.build-id -s libNAME.a" if we want to use build-ids.
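
To illustrate why hashing the member contents sidesteps the datestamps, here is a rough pure-Python sketch. It is a stand-in for the "ar p" / "ar t" approach rather than the same thing: it walks the common ar layout directly, assumes the usual 60-byte member header, and hashes each member's raw name field and data while skipping the timestamp, owner and mode fields. The function name is made up, and there is no attempt to resolve GNU long file names beyond hashing the raw fields.

```python
import hashlib

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name[16] mtime[12] uid[6] gid[6] mode[8] size[10] end[2]


def hash_archive_members(data: bytes) -> str:
    """Hash member names and contents of an ar archive, ignoring metadata."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    digest = hashlib.sha256()
    offset = len(AR_MAGIC)
    while offset < len(data):
        header = data[offset:offset + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("malformed member header")
        name = header[0:16]
        size = int(header[48:58].decode("ascii").strip())
        start = offset + HEADER_SIZE
        body = data[start:start + size]
        if len(body) != size:
            raise ValueError("truncated member data")
        # Feed name, length and contents; mtime/uid/gid/mode are skipped.
        digest.update(name)
        digest.update(size.to_bytes(8, "big"))
        digest.update(body)
        # Member data is padded to an even offset.
        offset = start + size + (size & 1)
    return digest.hexdigest()
```

Two archives that differ only in member timestamps hash the same here, while any change to a member's name, order or bytes changes the result.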



>> "-###" isn't meant to be a wildcard. That's an actual GCC option. I put
>> quotes around it because most shells would interpret the hashes as the
>> start of a comment.
>
> hmm, gotcha.  it does seem to include all the necessary info.  whether it's
> easy for a machine to parse across gcc versions is a diff question :).  seems
> to have changed subtly over time between 3.3.6 and 4.7.1.


Probably true, but it ought to be possible to determine if we do 
understand it, or not, and fall back to the old behaviour if not.


Andrew


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Mike Frysinger
On Tuesday 18 September 2012 17:07:53 Andrew Stubbs wrote:
> On 18/09/12 21:04, Mike Frysinger wrote:
> > On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
> >> Clearly there are some technical challenges in doing this: we'd have to
> >> hash all the object files and libraries (a la direct mode), but those
> >> problems are surmountable, I think.
> > 
> > or just re-use build-id ...
> 
> Sorry, I'm probably being thick, but what do you mean?

the linker's --build-id and associated .note.gnu.build-id section.  you can't 
hash the entire object because it can change between compiles.  build-id lets 
you say "regardless of the hash of the entire object, we know the content that 
matters is unchanged".

> >> The linker does not use any libraries not listed with "gcc '-###'
> >> whatever".
> > 
> > mmm different gcc flags can implicitly expand into -l### or different crt
> > objects, so you can't cache linking at the compiler driver level w/out
> > re-implementing much of the guts of gcc, and even then you'd break with
> > moderately patched gcc versions.
> 
> "-###" isn't meant to be a wildcard. That's an actual GCC option. I put
> quotes around it because most shells would interpret the hashes as the
> start of a comment.

hmm, gotcha.  it does seem to include all the necessary info.  whether it's 
easy for a machine to parse across gcc versions is a diff question :).  seems 
to have changed subtly over time between 3.3.6 and 4.7.1.

> >> I'm also aware that it's not that interesting for many incremental
> >> builds, where the final link will always be different, but my use case
> >> is accelerating rebuilds of projects that may have many outputs, most of
> >> which are likely to be unaffected by small code changes. It's also worth
> >> noting that incremental builds are not the target use case for ccache in
> >> general.
> > 
> > gold should already support incremental linking (ala build-id), so i
> > don't think that's already a fixed problem

err, typo here.  s/don't//.

> As I said, the interesting use case is *not* incremental links. The
> interesting use case is accelerating "clean" builds. ccache can never
> help where genuinely new inputs are involved.

right, i was just agreeing with you and providing more details as to how it 
already works today.
-mike




Re: [ccache] Why not cache link commands?

2012-09-18 Thread Andrew Stubbs

On 18/09/12 21:04, Mike Frysinger wrote:
> On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
>> Clearly there are some technical challenges in doing this: we'd have to
>> hash all the object files and libraries (a la direct mode), but those
>> problems are surmountable, I think.
>
> or just re-use build-id ...


Sorry, I'm probably being thick, but what do you mean?


>> The linker does not use any libraries not listed with "gcc '-###' whatever".
>
> mmm different gcc flags can implicitly expand into -l### or different crt
> objects, so you can't cache linking at the compiler driver level w/out
> re-implementing much of the guts of gcc, and even then you'd break with
> moderately patched gcc versions.


"-###" isn't meant to be a wildcard. That's an actual GCC option. I put 
quotes around it because most shells would interpret the hashes as the 
start of a comment.


"-###" causes gcc to print the commands that it would run, including the 
link line (well, collect2, but same difference). We can read that and 
bypass reimplementing all of gcc. As you say, without this feature we 
couldn't predict what gcc will do: the compiler wouldn't even need to be 
patched if custom specs files were used.
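
As a rough sketch of reading that output, in Python: the helper names are made up, and the parsing rests on two assumptions about the "-###" format that may not hold for every gcc version, namely that command lines are indented by one space and that arguments are quoted in a way shlex can split. If either assumption fails, the function returns None so the caller can fall back to the old behaviour.

```python
import os
import shlex
import subprocess
from typing import List, Optional

LINKER_NAMES = {"collect2", "ld"}  # assumption: programs the driver runs to link


def find_link_command(driver_output: str) -> Optional[List[str]]:
    """Return the argv of the link step from `gcc -###` output, or None."""
    for line in driver_output.splitlines():
        # Assumption: command lines start with a space, while version
        # banners and environment lines (COLLECT_GCC=...) do not.
        if not line.startswith(" "):
            continue
        try:
            argv = shlex.split(line)
        except ValueError:
            return None  # quoting we don't understand: let the caller fall back
        if argv and os.path.basename(argv[0]) in LINKER_NAMES:
            return argv
    return None


def link_command_for(gcc_args: List[str]) -> Optional[List[str]]:
    """Ask the driver what it would run; -### prints to stderr and runs nothing."""
    result = subprocess.run(["gcc", "-###", *gcc_args],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE,
                            text=True)
    return find_link_command(result.stderr)
```

The returned argv lists every object, library and crt file the link step would be given, which is the set of inputs that would need hashing.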



>> I'm also aware that it's not that interesting for many incremental
>> builds, where the final link will always be different, but my use case
>> is accelerating rebuilds of projects that may have many outputs, most of
>> which are likely to be unaffected by small code changes. It's also worth
>> noting that incremental builds are not the target use case for ccache in
>> general.
>
> gold should already support incremental linking (ala build-id), so i don't
> think that's already a fixed problem


As I said, the interesting use case is *not* incremental links. The 
interesting use case is accelerating "clean" builds. ccache can never 
help where genuinely new inputs are involved.


Andrew



Re: [ccache] Why not cache link commands?

2012-09-18 Thread Mike Frysinger
On Tuesday 18 September 2012 08:44:29 Andrew Stubbs wrote:
> Clearly there are some technical challenges in doing this: we'd have to
> hash all the object files and libraries (a la direct mode), but those
> problems are surmountable, I think.

or just re-use build-id ...

> The linker does not use any libraries not listed with "gcc '-###' whatever".

mmm different gcc flags can implicitly expand into -l### or different crt 
objects, so you can't cache linking at the compiler driver level w/out re-
implementing much of the guts of gcc, and even then you'd break with 
moderately patched gcc versions.

> I'm also aware that it's not that interesting for many incremental
> builds, where the final link will always be different, but my use case
> is accelerating rebuilds of projects that may have many outputs, most of
> which are likely to be unaffected by small code changes. It's also worth
> noting that incremental builds are not the target use case for ccache in
> general.

gold should already support incremental linking (ala build-id), so i don't 
think that's already a fixed problem
-mike




Re: [ccache] Why not cache link commands?

2012-09-18 Thread Andrew Stubbs

On 18/09/12 16:37, Justin Lebar wrote:
> ldcache would hash object files and spit out linked files.  It would
> use an entirely separate cache.  Its handling of command-line options
> would be entirely different.  Its processing of input files would be
> entirely different.  ISTM that very little would be shared.


It takes multiple input files and returns a single output file, plus 
stderr. This much is the same.


An input object file is just as hashable as an input header file, you 
just find them a different way. I think the manifest file would need 
little or no modification.


Similarly, the output file is just as cacheable. There's probably no 
need to even use a different suffix in the cache.


I've yet to get into the precise details, but I think the file discovery 
mechanism would need to be abstracted out a little; that's the biggest 
change.


The command line parsing would need a once over, of course. The biggest 
change there is that it's more normal to list multiple input files on 
the command line, and there's no "language" to determine.



> Since this is targeting a niche use-case and is a large change to
> ccache, I'd be hesitant to take this change upstream, if I were Joel.


Right, as little churn as possible, and no extra overhead in the most 
common cases.


Andrew


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Justin Lebar
> What I'm looking for is more concrete
> roadblocks I haven't considered.

You'd basically have to rewrite all of ccache.

ccache hashes header files and spits out object files.

ldcache would hash object files and spit out linked files.  It would
use an entirely separate cache.  Its handling of command-line options
would be entirely different.  Its processing of input files would be
entirely different.  ISTM that very little would be shared.

Since this is targeting a niche use-case and is a large change to
ccache, I'd be hesitant to take this change upstream, if I were Joel.

-Justin

On Tue, Sep 18, 2012 at 11:27 AM, Andrew Stubbs  wrote:
> On 18/09/12 15:31, Justin Lebar wrote:
>>>
>>> So, again, before I waste my time implementing this feature, are there
>>> any
>>> other fundamental gotchas that would prevent it ever working or ever
>>> being
>>> useful?
>>
>>
>> On a large project with many inputs to ld, you'd have to hash a /lot/
>> of object files, increasing the overhead of ccache substantially.  I
>> understand that this isn't your particular use-case, but it's the
>> common one.
>
>
> Yes, that's true, but those are also the most expensive link commands, so
> maybe it's not so bad.
>
> I realise that there's some risk that a cache miss can be expensive, and
> that a cache hit might be only a very little cheaper than the real link, but
> I'm prepared to take that risk. What I'm looking for is more concrete
> roadblocks I haven't considered.
>
> Incidentally, I'm also considering the possibility of caching the hashes and
> using the inode/size/mtime etc. to short-cut that process (perhaps as a
> "sloppiness" option), not only for objects, but also for sources.
>
>
>> If you're on Linux, have you tried the gold linker?
>
>
> Let's limit this discussion to what can be done with ccache, please. I
> assure you, we know about the toolchain options.
>
> Andrew


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Andrew Stubbs

On 18/09/12 15:31, Justin Lebar wrote:
>> So, again, before I waste my time implementing this feature, are there any
>> other fundamental gotchas that would prevent it ever working or ever being
>> useful?
>
> On a large project with many inputs to ld, you'd have to hash a /lot/
> of object files, increasing the overhead of ccache substantially.  I
> understand that this isn't your particular use-case, but it's the
> common one.


Yes, that's true, but those are also the most expensive link commands, 
so maybe it's not so bad.


I realise that there's some risk that a cache miss can be expensive, and 
that a cache hit might be only a very little cheaper than the real link, 
but I'm prepared to take that risk. What I'm looking for is more 
concrete roadblocks I haven't considered.


Incidentally, I'm also considering the possibility of caching the hashes 
and using the inode/size/mtime etc. to short-cut that process (perhaps 
as a "sloppiness" option), not only for objects, but also for sources.
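
A minimal sketch of that short-cut, in Python. The names are made up and the cache is only an in-memory dict; a real version would have to persist the table between runs, and trusting the metadata is exactly the "sloppy" part.

```python
import hashlib
import os
from typing import Dict, Tuple

StatKey = Tuple[str, int, int, int]  # path, inode, size, mtime in nanoseconds

_hash_cache: Dict[StatKey, str] = {}


def cached_file_hash(path: str) -> str:
    """Hash a file, reusing the earlier digest if its stat data is unchanged."""
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_ino, st.st_size, st.st_mtime_ns)
    cached = _hash_cache.get(key)
    if cached is not None:
        return cached  # the sloppy path: trust inode/size/mtime
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    _hash_cache[key] = digest
    return digest
```

A file rewritten with the same size within the timestamp granularity would be missed, which is why this belongs behind an opt-in option.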



> If you're on Linux, have you tried the gold linker?


Let's limit this discussion to what can be done with ccache, please. I 
assure you, we know about the toolchain options.


Andrew


Re: [ccache] Why not cache link commands?

2012-09-18 Thread Justin Lebar
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?

On a large project with many inputs to ld, you'd have to hash a /lot/
of object files, increasing the overhead of ccache substantially.  I
understand that this isn't your particular use-case, but it's the
common one.

If you're on Linux, have you tried the gold linker?

-Justin

On Tue, Sep 18, 2012 at 8:44 AM, Andrew Stubbs  wrote:
> Hi all, again,
>
> I've just posted about improving compile speed by caching compiler failures,
> and in the same vein I'd like to consider caching called-for-link compile
> tasks.
>
> This is partly interesting for the many small autoconf tests, but is also
> increasingly interesting for real compilations, now that
> whole-program-optimization and link-time-optimization are more available in
> GCC. Even without all this link-time compilation activity, there are some
> link operations that simply take forever, mostly due to large file sizes.
>
> Clearly there are some technical challenges in doing this: we'd have to hash
> all the object files and libraries (a la direct mode), but those problems
> are surmountable, I think. The linker does not use any libraries not listed
> with "gcc '-###' whatever".
>
> I'm also aware that it's not that interesting for many incremental builds,
> where the final link will always be different, but my use case is
> accelerating rebuilds of projects that may have many outputs, most of which
> are likely to be unaffected by small code changes. It's also worth noting
> that incremental builds are not the target use case for ccache in general.
>
> So, again, before I waste my time implementing this feature, are there any
> other fundamental gotchas that would prevent it ever working or ever being
> useful?
>
> Has anybody else ever tried to do this? Is anybody trying to do it now?
>
> Thanks
>
> Andrew