Re: [ccache] Stumbling blocks with ccache and embedded/encapsulated environments

2010-12-02 Thread Christopher Tate
On Wed, Dec 1, 2010 at 9:00 PM, Martin Pool m...@canonical.com wrote:
 On 11 November 2010 10:56, Christopher Tate ct...@google.com wrote:
 I don't want to rain on people's parade here, because ccache is a
 great product that has real benefits, but I do want to share some of
 our findings regarding the use of ccache in our very large product --
 we were surprised by them, and you may be as well.  These findings are
 specifically for *large products*.  In our case, the total source code
 file size is on the order of 3 gigabytes (which includes not only
 C/C++ but also Java source files, a couple hundred thousand lines of
 makefiles, etc).  It's the Android mobile phone OS, fwiw: it builds
 something like 1-2 gigabytes of .o files from C/C++ during a full
 build, and does a ton of Java compilation, resource compilation,
 Dalvik compilation, etc as well.

 I'd love to know whether you also tried distcc for it, and if so what
 happened or what went wrong.  (Obviously it can only help for the
 C/C++ phases.)

distcc can certainly help a great deal.  For us, it's a bit
problematic to use because more than half of our total build is
non-C/C++ that depends on the C/C++ targets [e.g. Java-language
modules that have partially native implementations], plus we have a
highly heterogeneous set of build machines: both Mac hosts and Linux,
not all the same distro of Linux, etc.  The inclusion of Macs in
particular makes distcc more of a pain to get up and running cleanly.

 The issue is around VM/file system buffer cache management.  If you're
 using ccache, then you'll effectively be doubling the number of .o
 files that are paged into memory during the course of a build.

 I'm just trying to understand how this happens.  Is it that when
 ccache misses it writes out an object file both to the cache directory
 and into the build directory, and both will be in the buffer cache?
 So it's not so much they're paged in, but they are dirtied in memory
 and will still be held there.

Even on a ccache *hit* both copies of the .o file wind up occupying
buffer cache space, because the ccached .o is read from disk [paging
it in] in order to write the .o file to the build output directory.
On a ccache miss the copy runs the other direction but you still wind
up with both sets of pages in the buffer cache.

 It seems like turning on compression would reduce the effect.

At the expense of the extra cpu time, sure.  That might be a decent
tradeoff; modern cpus are getting quite fast relative to I/O.
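
[Editor's sketch of the compression tradeoff mentioned above. It uses zlib as a stand-in compressor; the thread does not say which compressor ccache uses, and the payload here is synthetic and far more compressible than a real .o file. The names compress_object and demo are hypothetical.]

```python
import zlib


def compress_object(data, level=6):
    """Compress an object file's bytes; costs CPU, saves cache pages."""
    return zlib.compress(data, level)


def demo():
    # Synthetic stand-in for an object file (assumption: 64 KiB, repetitive).
    data = (b"\x7fELF" + bytes(60)) * 1024
    packed = compress_object(data)
    # Round-trip check: the cached copy must decompress to the original.
    assert zlib.decompress(packed) == data
    return len(data), len(packed)


if __name__ == "__main__":
    raw, packed = demo()
    print("raw bytes:", raw, "compressed bytes:", packed)
```

Only the cached copy shrinks; the uncompressed .o in the build output directory still occupies its full share of the buffer cache.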

 Turning on hardlinking might eliminate it altogether, though that
 could have other bad effects.

Right.  We haven't tried pursuing this because for other reasons the
marginal returns are still pretty low, and tinkering with the build
system is fraught with peril.  :)
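
[Editor's sketch of the difference between copying and hardlinking a cached object. This is not ccache's implementation; materialize, same_inode, and demo are hypothetical names. It only shows that a copy produces a second inode, and therefore a second set of cached pages, while a hard link shares one.]

```python
import os
import shutil
import tempfile


def materialize(cached_obj, output_obj, hardlink=False):
    """Place a cached object at output_obj by copying or hard-linking."""
    if os.path.exists(output_obj):
        os.remove(output_obj)
    if hardlink:
        # Same inode as the cache entry: one set of pages in the buffer cache.
        os.link(cached_obj, output_obj)
    else:
        # New inode with its own pages: the cache entry is read, the copy written.
        shutil.copyfile(cached_obj, output_obj)


def same_inode(a, b):
    """True if both paths refer to the same file on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cached.o")
        with open(cached, "wb") as f:
            f.write(b"\x7fELF" + bytes(4096))  # placeholder contents
        copied = os.path.join(tmp, "copied.o")
        linked = os.path.join(tmp, "linked.o")
        materialize(cached, copied)
        materialize(cached, linked, hardlink=True)
        return same_inode(cached, copied), same_inode(cached, linked)


if __name__ == "__main__":
    print(demo())
```

One plausible "bad effect" of the shared inode: anything that modifies the output .o in place also modifies the cache entry.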

--
christopher tate
android framework engineer
___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] Stumbling blocks with ccache and embedded/encapsulated environments

2010-12-02 Thread Paul Smith
On Wed, 2010-12-01 at 21:47 -0500, Paul Smith wrote:
 Now I'm on to my next problem.  In order to get this to happen I have
 to set CCACHE_BASEDIR to strip off the workspace directory prefix, so
 that the per-workspace filenames are not embedded in the cache.  This
 works (see above), however the result is not so nice.
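
[Editor's sketch of what stripping the workspace prefix amounts to: absolute paths under the base directory are rewritten relative to the compile's working directory, so two workspaces yield the same path. This is an illustration, not ccache's code; rewrite_under_basedir and the second workspace /path/to/TWO are hypothetical.]

```python
import posixpath


def rewrite_under_basedir(path, basedir, cwd):
    """Return path relative to cwd if it lies under basedir, else unchanged."""
    base = basedir.rstrip("/") + "/"
    if path.startswith(base):
        return posixpath.relpath(path, start=cwd)
    return path


def demo():
    results = []
    # Two workspaces with identical layouts (the second one is assumed).
    for workspace in ("/path/to/ONE", "/path/to/TWO"):
        src = workspace + "/src/subdir/foo/foo.c"
        cwd = workspace + "/obj/subdir/foo"
        results.append(rewrite_under_basedir(src, workspace, cwd))
    return results


if __name__ == "__main__":
    for rewritten in demo():
        print(rewritten)
```

Paths outside the base directory, such as system headers, are left absolute in this sketch.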

Ugh.  I lied.  Actually GDB handles this just fine with no special
instruction; I had a problem on my test server (and then I misunderstood
how the GDB substitute-path feature worked).

This works because GDB remembers not only the path of the source file,
but also the working directory when the compile happened:

(gdb) info source
Current source file is ../../../src/subdir/foo/foo.c
Compilation directory is /path/to/ONE/obj/subdir/foo
Source language is c.
Compiled with DWARF 2 debugging format.
Does not include preprocessor macro info.

So GDB intelligently notices that the source file path is relative and
appends it to the compilation directory, and viola! [1]
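
[Editor's sketch of the path resolution described above, using the two values from the "info source" output. It mirrors the behaviour only in outline; resolve_source is a hypothetical name, not a GDB function.]

```python
import posixpath


def resolve_source(comp_dir, source_file):
    """Join a relative source path onto the compilation directory."""
    if posixpath.isabs(source_file):
        return posixpath.normpath(source_file)
    return posixpath.normpath(posixpath.join(comp_dir, source_file))


if __name__ == "__main__":
    print(resolve_source("/path/to/ONE/obj/subdir/foo",
                         "../../../src/subdir/foo/foo.c"))
```

Because the compilation directory is recorded per workspace, the same relative source path resolves correctly in each one.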


I _think_ I'm all set now.  I'll check back if more issues surface.

Cheers!


-
[1] sic



Re: [ccache] Stumbling blocks with ccache and embedded/encapsulated environments

2010-12-02 Thread Justin Lebar
 Even on a ccache *hit* both copies of the .o file wind up occupying
 buffer cache space, because the ccached .o is read from disk [paging
 it in] in order to write the .o file to the build output directory.
 On a ccache miss the copy runs the other direction but you still wind
 up with both sets of pages in the buffer cache.

 In the hit case I would have thought that the .o file you read would
 still create less memory pressure than the working memory of running
 the real compiler on that file?  Perhaps the difference is that the
 kernel knows that when the compiler exits, its anonymous pages can be
 thrown away, whereas it doesn't know which .o file it ought to retain.
  So perhaps madvise might help.  (Just speculating.)

I'm curious about this.  I guess you'd madvise to tell the kernel that
the .o you just wrote shouldn't be cached?  But presumably it should
be, because you're going to link your program.

Alternatively, you could madvise and tell the kernel not to cache the
.o file from ccache's cache.  But if you re-compile, you want ccache's
cache to be in memory.

I'm not sure how one might win here without hardlinking.
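
[Editor's sketch of the second alternative: copy the cached .o to the build directory, then hint that the cache file's pages are not needed. The thread only mentions madvise; posix_fadvise is swapped in here because the copy uses read/write rather than mmap. The hint is advisory, the call is unavailable on some platforms (hence the guard), and copy_and_drop_source and demo are hypothetical names.]

```python
import os
import tempfile


def copy_and_drop_source(src, dst, chunk=1 << 16):
    """Copy src to dst, then advise the kernel that src's pages can go.

    Returns True if the hint was issued, False if unavailable or refused.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 covers the whole file.
                os.posix_fadvise(fin.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                return True
            except OSError:
                return False
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cached.o")
        dst = os.path.join(tmp, "out.o")
        payload = os.urandom(200_000)  # placeholder object contents
        with open(src, "wb") as f:
            f.write(payload)
        copy_and_drop_source(src, dst)
        with open(dst, "rb") as f:
            return f.read() == payload


if __name__ == "__main__":
    print("copy intact:", demo())
```

This carries exactly the cost noted above: the next hit on the same cache entry has to go back to disk.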

-Justin

On Thu, Dec 2, 2010 at 4:24 PM, Martin Pool m...@sourcefrog.net wrote:
 On 3 December 2010 03:42, Christopher Tate ct...@google.com wrote:
 I'd love to know whether you also tried distcc for it, and if so what
 happened or what went wrong.  (Obviously it can only help for the
 C/C++ phases.)

 distcc can certainly help a great deal.  For us, it's a bit
 problematic to use because more than half of our total build is
 non-C/C++ that depends on the C/C++ targets [e.g. Java-language
 modules that have partially native implementations],

 ... and you suspect that the Makefile dependencies are not solid
 enough to safely do a parallel build?

 plus we have a
 highly heterogeneous set of build machines: both Mac hosts and Linux,
 not all the same distro of Linux, etc.  The inclusion of Macs in
 particular makes distcc more of a pain to get up and running cleanly.

 That can certainly be a problem.

 I'm just trying to understand how this happens.  Is it that when
 ccache misses it writes out an object file both to the cache directory
 and into the build directory, and both will be in the buffer cache?
 So it's not so much they're paged in, but they are dirtied in memory
 and will still be held there.

 Even on a ccache *hit* both copies of the .o file wind up occupying
 buffer cache space, because the ccached .o is read from disk [paging
 it in] in order to write the .o file to the build output directory.
 On a ccache miss the copy runs the other direction but you still wind
 up with both sets of pages in the buffer cache.

 In the hit case I would have thought that the .o file you read would
 still create less memory pressure than the working memory of running
 the real compiler on that file?  Perhaps the difference is that the
 kernel knows that when the compiler exits, its anonymous pages can be
 thrown away, whereas it doesn't know which .o file it ought to retain.
  So perhaps madvise might help.  (Just speculating.)

 --
 Martin