>> For all you guys, is the current caching - all filesystem based - useful
>> enough? I've been chewing on a network based extension, for all those
>> disposable builders that don't really have great ways to cache

I am indeed finding that the built-in SCons caching isn’t conducive to network 
caching. I was preparing a separate e-mail about it but I’ll just include it 
here. Let me know if you want me to start a new thread for the discussion.

The basic summary is that the current cache implementation asks for the file 
when it needs it and doesn’t have any bulk frontload capability, so if I have 
5000 targets, I would have to do 5000 roundtrips to the server. That isn’t 
going to work for network caching, especially given that we want to integrate 
with virtual filesystems so we only hydrate the targets that are actually used 
by developers.

--- Background ---

One of the things we are working on at VMware is implementing remote caching 
using SCons. We are hoping to upstream as many of our changes as possible, so I 
am hoping to get ideas on how to do this right. I hope to send out a summary of 
the plans we have soon (dependency enforcement, remote caching, and 
platform-specific virtual filesystem integration), but for now I need help with 
one specific problem in our prototype: bulk cache frontloading.

--- Background on existing caching mechanism ---

Currently the SCons caching mechanism (as implemented by the Taskmaster and 
CacheDir classes) does just-in-time caching: for each action, SCons first asks 
a CacheDir object whether it has a target file in the cache. If it does, SCons 
proceeds to the next target file in the targets list for that action (if any). 
If it doesn’t, SCons skips asking the cache for the rest of the target files in 
the targets list of that action and just runs the action. If all targets are 
retrieved from the cache, the action is not run.
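
The per-action lookup described above can be sketched in a few lines of Python. This is only an illustration, not SCons code: `FakeCache`, `retrieve`, and `build_action` are hypothetical names, and each `retrieve` call stands in for one roundtrip to the cache.

```python
class FakeCache:
    """Hypothetical stand-in for a cache; each retrieve() is one roundtrip."""

    def __init__(self, cached_targets):
        self.cached_targets = set(cached_targets)
        self.roundtrips = 0

    def retrieve(self, target):
        self.roundtrips += 1
        return target in self.cached_targets


def build_action(targets, cache):
    """Return True if the action had to run, False if fully served from cache."""
    for target in targets:
        if not cache.retrieve(target):
            # First miss: stop asking about the remaining targets
            # and run the action instead.
            return True
    return False


cache = FakeCache({"a.obj", "b.obj", "b.pdb"})
ran_a = build_action(["a.obj", "a.pdb"], cache)  # a.pdb misses, action runs
ran_b = build_action(["b.obj", "b.pdb"], cache)  # both hit, action skipped
ran_c = build_action(["c.obj", "c.pdb"], cache)  # c.obj misses, c.pdb never asked
```

Every target that is looked up costs its own roundtrip, which is the property the next section is concerned with.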

--- Downsides of existing caching mechanism ---

This mechanism wouldn’t work for remote caching because it is 
latency-sensitive. If I am building 5000 files, checking the cache would 
require up to 5000 roundtrips to the cache server. If I have 100ms latency to 
the cache server, that is an overhead of up to 500 seconds (5000 x 0.1s).
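
The arithmetic behind that figure, as a small Python sketch. The helper name is hypothetical, and the bulk figure assumes a single request pays the latency once, ignoring transfer time and server-side work.

```python
def sequential_overhead_seconds(num_roundtrips, latency_ms):
    """Latency overhead when each roundtrip is paid one after another."""
    return num_roundtrips * latency_ms / 1000.0


# One roundtrip per target, as in the example above.
overhead = sequential_overhead_seconds(5000, 100)

# Assumption: one bulk request covering all targets pays the latency once.
bulk_overhead = sequential_overhead_seconds(1, 100)
```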

--- What I’d like to do ---

I’d like to implement a --cache-frontload parameter that does two runs through 
the node graph:


  1.  An initial dry run where we generate the content signatures of all
      target files.
     *   This run culminates with a call to the CacheDir object, e.g.
         retrieve_all(allNodes) where “allNodes” contains (in the example
         from the previous section) 5000 entries, each of which has the
         result of get_cachedir_csig and the full file path.
  2.  A second run where we run any actions that were not fulfilled from cache.
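
A minimal sketch of the two runs, written as standalone Python rather than against the real SCons classes. `FakeNode`, `FakeRemoteCache`, and `frontload_build` are hypothetical; only the names retrieve_all and get_cachedir_csig come from the proposal above. For simplicity each node is treated as its own action, so multi-target actions are not modelled.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a target node."""
    path: str
    csig: str  # what get_cachedir_csig would return for this target
    built: bool = False
    from_cache: bool = False


class FakeRemoteCache:
    """Hypothetical cache server keyed by signature."""

    def __init__(self, known_sigs):
        self.known_sigs = set(known_sigs)
        self.roundtrips = 0

    def retrieve_all(self, entries):
        """Take (csig, path) pairs and return the paths that were hits.

        The whole batch counts as one roundtrip.
        """
        self.roundtrips += 1
        return {path for csig, path in entries if csig in self.known_sigs}


def frontload_build(nodes, cache):
    # Run 1 (dry run): collect the signature and path of every target,
    # then issue a single bulk request.
    entries = [(n.csig, n.path) for n in nodes]
    hits = cache.retrieve_all(entries)
    # Run 2: only "run" the actions that were not fulfilled from cache.
    for n in nodes:
        if n.path in hits:
            n.from_cache = True
        else:
            n.built = True  # stands in for running the action
    return hits


nodes = [FakeNode(f"obj/file{i}.o", f"sig{i}") for i in range(5000)]
cache = FakeRemoteCache(f"sig{i}" for i in range(0, 5000, 2))
hits = frontload_build(nodes, cache)
```

In this toy version nothing has to be reset between the two runs, because the dry run does not change any node state. The real node graph does not have that property.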

I tried implementing this but ran into problems resetting the node graph 
between steps #1 and #2. Anything not retrieved from the cache needs to be 
reset to “pending” or “no state”, but ideally the cached children should be 
retained so we don’t need to scan the files again. The problems I am running 
into with resetting the node graph include:


  1.  Easily and quickly accessing all nodes that were iterated over during
      the dry run.
  2.  Resetting the state of all nodes not retrieved from cache.
  3.  Reversing seemingly destructive “end of lifecycle” actions from objects.

The most I could try to do was to remember all Node objects (including build 
targets and containing directories) and then do the following on each object:


  1.  node.set_state(SCons.Node.no_state)
  2.  node.clear()
  3.  node.clear_memoized_values()
  4.  node.executor_cleanup()

But it seems like a hack and I haven’t been able to get it to work well.
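
For reference, the four calls above can be wrapped in one helper. This is a sketch only: `reset_nodes` and `RecordingNode` are hypothetical, the helper is exercised here against a test double rather than real SCons nodes, and it makes no claim that these four calls are sufficient to reset the graph.

```python
def reset_nodes(nodes, no_state, retrieved):
    """Apply the four reset calls to every node not retrieved from cache."""
    reset = []
    for node in nodes:
        if node in retrieved:
            continue  # keep cache hits as they are
        node.set_state(no_state)
        node.clear()
        node.clear_memoized_values()
        node.executor_cleanup()
        reset.append(node)
    return reset


class RecordingNode:
    """Hypothetical test double that records which methods were called."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def set_state(self, state):
        self.calls.append(("set_state", state))

    def clear(self):
        self.calls.append(("clear",))

    def clear_memoized_values(self):
        self.calls.append(("clear_memoized_values",))

    def executor_cleanup(self):
        self.calls.append(("executor_cleanup",))


NO_STATE = 0  # placeholder for the "no state" constant from SCons.Node
hit, miss = RecordingNode("hit.o"), RecordingNode("miss.o")
reset = reset_nodes([hit, miss], NO_STATE, retrieved={hit})
```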

Has anyone tried doing something like this before? Any recommendations on 
where to start?


From: Mats Wichmann <m...@wichmann.us>
Sent: Friday, May 24, 2019 5:51 PM
To: SCons developer list <scons-dev@scons.org>; Andrew C. Morrow 
<andrew.c.mor...@gmail.com>
Cc: Adrian Oney <ao...@vmware.com>; Adam Gross <gros...@vmware.com>
Subject: Re: [Scons-dev] Looking for help mapping Windows pdb semantics to SCons

For all you guys, is the current caching - all filesystem based - useful 
enough? I've been chewing on a network based extension, for all those 
disposable builders that don't really have great ways to cache

On May 24, 2019 3:45:01 PM MDT, "Andrew C. Morrow" 
<andrew.c.mor...@gmail.com> wrote:

Hi Adam -

I'm working in this same area (caching and debug info handling) for the SCons 
based MongoDB build system, right now.

Overall, I am trying to move to a model on Windows that is more like using 
-gsplit-dwarf with the GNU tools, where every object file gets a (cacheable) 
.pdb, and then we link with /DEBUG:fastlink, and defer the final per 
library/executable PDB to a post link step by using mspdbcmf. This is similar 
to using dwp to package up the .dwo files.

You can see some of my very much work-in-progress state here: 
https://github.com/acmorrow/mongo/blob/SERVER-33661/site_scons/site_tools/separate_debug.py

Unfortunately, I've encountered one showstopper issue for us: 
https://developercommunity.visualstudio.com/content/problem/573023/absolute-paths-for-associated-pdb-files-are-record.html,
and I'm waiting to hear back on it.

The next steps in my current approach would be to move the actions that produce 
the finalized .dwp, .dSYM, or .pdb file into separate builders, rather than 
adding them as actions to the .Program and .SharedLibrary builders. That would 
allow the build tasks that finalize the debug information to be executed 
separately, or not at all for developer builds where keeping the debug info in 
separate per-object files is sufficient.

If you are interested, I'd be happy to collaborate (off-list initially?) to 
discuss some of the issues we have encountered and find a way to avoid 
duplication of effort. Improving the debug info handling situation is something 
I'm keenly interested in, as it is a major bottleneck in our build performance.

Thanks,
Andrew



On Fri, May 24, 2019 at 3:45 PM Tomasz Gajewski 
<to...@wp.pl> wrote:

Adam Gross via Scons-dev <scons-dev@scons.org>
writes:

> I am investigating better supporting caching with SCons at VMware and
> am trying to see if I can teach SCons about pdb files.

Is there any problem for your use cases with using the /Z7 option for
compilation? That tells the compiler to embed debug data in the .obj file,
like on Linux. The PDBs are then created during linking. It works at least
for shared libraries and executables.

Regards
Tomasz Gajewski

_______________________________________________
Scons-dev mailing list
Scons-dev@scons.org
https://pairlist2.pair.net/mailman/listinfo/scons-dev

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.