Jonathon,

I've seen the clone-before-passing pattern in other builds. I'm wondering: if you could put your environment into a read-only mode before passing it (so that no changes can be made to it), would that suffice and remove the desire/need to clone()?
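Here is a minimal Python sketch of the read-only idea, for illustration only. ReadOnlyEnvironment is a hypothetical wrapper, not an SCons class, and a plain dict stands in for a writable Environment. It relies on the standard library's types.MappingProxyType, which rejects item assignment but only guards the top level.

```python
from types import MappingProxyType


class ReadOnlyEnvironment:
    """Hypothetical stand-in for an SCons Environment in a read-only mode.

    Not an SCons API: it only illustrates handing a SConscript a view it
    cannot modify, instead of a defensive Clone().
    """

    def __init__(self, variables):
        # MappingProxyType is a live, read-only view of the dict:
        # lookups work, item assignment raises TypeError.
        self._vars = MappingProxyType(variables)

    def __getitem__(self, key):
        return self._vars[key]

    def __setitem__(self, key, value):
        raise TypeError(f"environment is read-only; Clone() it before setting {key!r}")

    def Clone(self, **overrides):
        # Only a SConscript that actually needs changes pays for a copy.
        # A plain dict stands in for a writable Environment here.
        return dict(self._vars, **overrides)
```

One caveat the sketch makes visible: the view is shallow, so a child SConscript could still mutate a shared list value in place (for example, appending to CPPPATH). A read-only mode would likely need immutable values or a deeper freeze before it could fully replace a defensive clone().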
-Bill

On Sun, Jul 23, 2017 at 4:51 PM, Jonathon Reinhart <[email protected]> wrote:

> I just wanted to add some quick anecdotes. In some of our largest, most complicated builds, we have observed a lot of the same things as you all have.
>
> One time we did some quick profiling, and saw that much of the CPU time during a null build was spent in variable substitution.
>
> Additionally, we also have a habit of cloning the environment before passing it to a SConscript. This is for safety: to ensure that a child SConscript can't mess up the environment for its siblings.
>
> Jonathon Reinhart
>
> On Sat, Jul 22, 2017 at 5:23 PM, Bill Deegan <[email protected]> wrote:
>
>> Jason,
>>
>> Any chance you could add these comments to the wiki page?
>> https://bitbucket.org/scons/scons/wiki/NeedForSpeed
>>
>> -Bill
>>
>> On Sat, Jul 22, 2017 at 10:09 AM, Jason Kenny <[email protected]> wrote:
>>
>>> Some additional thoughts
>>>
>>> Serial DAG traversal:
>>>
>>> - One issue here as well is that the DAG for doing builds is based on nodes. There is a bit of logic to deal with handling side effects and build actions that have multiple outputs. Greg Noel had made a push for something called the TNG taskmaster. I understand now that the main fix he was going for was to tweak SCons to navigate a builder DAG instead of a node DAG. The node DAG is great for getting the main organization, but after that it is generally trivial to make a DAG based on builders at the same time. Traversing this is much faster, requires less "special" logic, and will be easier to parallelize.
>>> - One big improvement this provides is that we only need to test whether the sources or targets are out of date if the dependent builders are all up to date. If one of them is out of date, we just build. Compare that with the current logic, where we check each node to see whether its build action has been done, which requires extra scans and work.
>>> - Given that a builder is out of date, you just mark all of its parents out of date. We only care about builders in a set that we don't yet know are out of date. Simple tweaks to how we go through the tree can mean we only need to touch a few nodes.
>>>
>>> Start up time:
>>>
>>> - Null build time is going to be the worst case for an up-to-date build, as we have to make sure all items are in a good state. Time to start building on a diff should be a lot faster. SCons spends a lot of time having to read everything on second passes. We can use our cache much better to store state about what builds what, etc., to avoid even having to read a file. If the file did not change, we already know the node/builder tree it will provide. We already know the actions. Most of the time, we can start building items as soon as an MD5/timestamp check fails. Globs can store information about what they read and processed, and only need to re-run when we notice a directory timestamp change. Avoiding processing build files and loading known state is much faster than processing the Python code; my work in Parts has shown this. The trick is knowing when you might have to load a file again to make sure custom logic gets processed correctly.
>>> - In the case of Parts, it would be great to load files concurrently and in parallel. I think I have a way to do this concurrently, which I have not done yet. The main issue is that the node FS object tree is a sync point for being parallel.
>>>
>>> CacheDir:
>>>
>>> 100% agree.
>>>
>>> SConsign generation:
>>>
>>> - I think this is a bigger deal for larger builds. I have found in Parts that, as I store more data, I try to break up the items into different files. This helps, but in the end, at some point a pickle or JSON dump takes time.
>>>   It also takes time to load them: in some builds I have had, loading 700 MB files takes even the best systems a moment. This is a big waste when I only need a little bit of data. Likewise, the storing of the data could and should be happening as we build items. As noted, we don't have a good way to store a single item without storing the whole file. If the file is large (100 MB to GBs), this can take time, as in many seconds, which in the end annoys users. Given what I do have working well in Parts, I would say that data storage and retrieval is the big time suck. Addressing this would have the largest impact for me.
>>>
>>> Process spawning:
>>>
>>> - I add this because we had submitted a subprocess fix for POSIX systems. The problem affects larger builds more than smaller builds because of forking behavior. I don't believe it has been added to SCons as of yet.
>>> - As a side design note, if we did make a multiprocessing setup for SCons, this might be less of an issue, as the "process" workers only need information about a build to run on. Changes to node state would have to be synced with the main process via messages, as there would be no fast, efficient way to share the whole tree across all the processes.
>>> - Another thought is that we might want to look at some nested parallel strategies to make a task-like setup that might allow us to use the TBB Python library to avoid the GIL issue. However, given my time on SCons/Parts, I think changing the taskmaster to go over a builder DAG will have the biggest effect.
>>>
>>> Variable Substitution:
>>>
>>> I abuse this in Parts to share data in a lazy fashion between components. It has been a sore point for me, for the reasons stated below. We have done some work to address these issues by reusing state better. I can say there are some issues with the current code that cause memory bloat and wasted time.
>>> I don't want to dwell on this, but will say that this is the second biggest item in my mind that would have a big impact on overall time for the user. I know I want to change the load logic in Parts to avoid using the substitution engine as much as possible.
>>>
>>> Environment creation:
>>>
>>> It is easy to define lots of different environments in a large build. How you do this can be subtle and have a huge effect on build time. Ideally, you always want to clone the "default" environment you have, or pass values into builders rather than the environment. I feel that it is better for SCons to define more of a default environment and have all environments created be clones of it. I would also push to have every Clone be a copy-on-write environment. There are still cases in which the user needs a "clean" environment; however, in my experience, the common case for all the environments I have made in Parts is a small copy-on-write clone from a common base. I think we should have more copy-on-write higher up the stack. At the moment, the copy-on-write classes are used in builders, not in Clones.
>>>
>>> Configure check performance:
>>>
>>> - So far I try to avoid this feature as much as I can. However, it does have its uses. Comparing with automake, which I am using at the moment, I feel the SCons version is faster but lacks some common features. The main issue I have seen is that a user can write complex logic that runs slowly. For a project I am porting from automake, the question for me is whether there is a better way to say this in SCons. At the moment it is a lot of code that is easy to break, and I would like a better way to express it. I feel this could help address maintainability issues with configure logic, as well as avoid certain speed issues by making better use of SCons logic to check if we need to
>>>
>>> Some last thoughts:
>>>
>>> 1. The big value SCons tends to have for me is the ability to create reproducible environments to do a build, ones that are not broken by whatever shell the user might be running in. This ability to reproduce a build exactly from a dumb shell is a huge win. Using SConsign to help store tool state is an item I want to improve on in the Parts toolchain improvements, and I think it would be a win for SCons as well, more so for people using SCons to cross-build. There is start-up time we can avoid with smarter logic that uses what we already know about tools. Honestly, tools don't get added or removed as often as we change build files or source files.
>>> 2. Given that the common case for most devs is building changes in the source, it seems to me that using our cache better to speed this up would have a big effect. We can detect changes in inputs that would require us to load build files. Most of the time, the user added or removed code that has no effect on the actions we would call in the end. Even with changes to imports/includes, we don't need to reload build files we already processed; the Scanner can deal with that for us.
>>> 3. Being smarter about how we store data could help us reduce what we keep in memory for a non-interactive build. This can help large builds, as loading a 2-3 GB tree takes resources we would rather use on other items. I think we have options for storing information, and possibly using generators, to reduce memory overhead and improve build speeds.
>>> 4. With multiprocessing in mind, the main issue is that we have a large data tree. Sharing this tree across processes will be slow, and we need to avoid that as much as we can. Using processes to do work that is as independent as possible, and passing node state back to the main thread, which holds the main data structure, will work much better.
>>>    This should have a positive effect on builders based on Python code, as they can build independently. In all cases, we have to address the fact that I have seen builders that try to set state in the environment or globally. This state has to be shared or avoided in some way. I'm not suggesting how to solve this, but it will be a design issue to address.
>>> 5. The last item is that no matter how good SCons is, people will want to be able to generate build files for a different system. The current logic for Visual Studio, for example, tries to make a makefile project to run SCons. Users really want an MSBuild project, and we should do that. Likewise, we should be better at working with other build systems' projects. Having good middleware to allow building or working with an automake or CMake project will help adoption. CMake is doing well because it is a build generator; the same goes for Meson. You want to cover your bases with your users, and systems like these make it easy to do so.
>>>
>>> When I was at Intel, some of the people helping me made a profiler for Python in Intel VTune. I believe they are still working on it. It was useful for finding non-obvious fixes that improved speed in Parts. Since SCons is open source, you can use this tool for free. I would recommend it, as it will give you some insight the default tools will not provide.
>>>
>>> Jason
>>>
>>> *From:* Scons-dev [mailto:[email protected]] *On Behalf Of *Andrew C. Morrow
>>> *Sent:* Friday, July 21, 2017 10:40 AM
>>> *To:* SCons developer list <[email protected]>
>>> *Subject:* [Scons-dev] SCons performance investigations
>>>
>>> Hi scons-dev -
>>>
>>> The following is a revised draft of an email that I had originally intended to send as a follow-up to https://pairlist4.pair.net/pipermail/scons-users/2017-June/006018.html. Instead, Bill Deegan and I took some time to expand on my first draft and add some ideas about how to address some of the issues. We hope to migrate this to the wiki, but wanted to share it here first for feedback.
>>>
>>> ----
>>>
>>> Performance is one of the major challenges facing SCons. When compared with other current options, particularly Ninja, in many cases performance can lag significantly. That said, other options by and large lack the extensibility and many of the features of SCons.
>>>
>>> Bill Deegan (SCons project co-manager) and I have been working together to understand some of the issues that lead to poor SCons performance in a real-world (and fairly modestly sized) C++ codebase. Here is a summary of some of our findings:
>>>
>>> - Python code usage: There are many places in the codebase where, while the code is correct, performance based on CPython's implementation can be improved by minor changes.
>>>   - Examples
>>>     - Using for loops and hashes to uniquify a list. A simple change in the Node class yielded approximately a 15% speedup for a null build.
>>>     - Using if x.find('some character') >= 0 instead of 'some character' in x (a timeit benchmark shows a 10x speed difference).
>>>   - Method to address
>>>     - Profile the code looking for hotspots with cProfile and line_profiler. Then look for the best implementations of that code. (Use timeit if useful to compare implementations.
>>>       There are examples of such in the bench dir (see: https://bitbucket.org/scons/scons/src/68a8afebafbefcf88217e9e778c1845db4f81823/bench/?at=default).)
>>>
>>> - Serial DAG traversal: SCons walks the DAG to find out-of-date targets in a serial fashion. Once it finds them, it farms the work out to other threads, but the DAG walk remains serial. Given the proliferation of multicore machines since SCons' initial implementation, a parallel walk of the DAG would yield a significant speedup. Likely this would require an implementation using the multiprocessing Python library (instead of threads), since the GIL would otherwise block real parallelism. Packages like Boost, where there are many header files, can cause large increases in the size of the DAG, exacerbating this issue. There are two serious consequences of the slow DAG walk:
>>>   - Incremental rebuilds in large projects. The typical developer workflow is to edit a file, rebuild, and test. In our modestly sized codebase, we see that the incremental time to do an 'all' rebuild for a one-file change can reach well over a minute. This time is completely dominated by the serial dependency walk.
>>>   - Inability to saturate distributed build clusters. In a distcc/icecream build, the serial DAG walk is slow enough that not enough jobs can be farmed out in parallel to saturate even a modest (400 CPU) build cluster. In our example, using Ninja to drive a distributed full build results in an approximately 15x speedup, but SCons can only achieve a 2x speedup.
>>>   - Method to address:
>>>     - Investigate changing the tree walk to a generator
>>>     - Investigate implementing the tree walk using the multiprocessing library
>>>
>>> - The dependency graph is the Python object graph: The target dependency DAG is modeled via Python Node-object-to-Node-object linkages (e.g. a list of child nodes held in a node). As a result, the only way to determine up-to-dateness is by deeply nested method calls that repeatedly traverse the Python object graph. An attempt is made to mitigate this by memoizing state at the leaves (e.g. to cache the result of stat calls), but this still results in a large number of Python function invocations for even the simplest state checks, where a result is already known. Similarly, the lack of global visibility precludes using externally provided change information to bypass scans.
>>>   - See above re: generator
>>>   - Investigate modeling state separately from the Python Node graph via some sort of centralized scoreboarding mechanism. It seems likely that both the function call overhead could be eliminated and that local knowledge could be propagated globally more effectively.
>>>
>>> - CacheDir: There are some issues listed below. The end-to-end caching functionality of SCons, including generated files, object files, shared libraries, whole executables, etc., is one of its great strengths, but its performance has much room for improvement.
>>>   - Existing bug(s) when combining CacheDir with MD5-Timestamp devalue CacheDir.
>>>     - Bug: http://scons.tigris.org/issues/show_bug.cgi?id=2980
>>>   - Performance issues:
>>>     - CacheDir re-creates signature data when extracting nodes from the cache, even though it could have recorded the signature when entering the objects into the cache.
>>>   - Method to address
>>>     - Store signatures for items in CacheDir and then use them directly when copying items from the cache.
>>>     - Fix the CacheDir / MD5-Timestamp integration bug.
>>>
>>> - SConsign generation: The generation of the SConsign file is monolithic, not incremental. This means that if only one object file changed, the entire database needs to be rewritten. It also appears that the mechanism used to serialize it is itself slow. Moving to a faster serialization model would be good, but even better would be to move to a faster serialization model that also admits incremental updates to single items.
>>>   - Method to address:
>>>     - Replace sconsign with something faster than the current implementation, which is based on Pickle.
>>>     - And/or improve sconsign with something that can incrementally write only what has changed.
>>>
>>> - Configure check performance: Even cached Configure checks seem slow, and for a complexly configured build this can add significant startup cost. Improvements here would be useful.
>>>   - Method to address:
>>>     - Code inspection, look for improvements
>>>     - Profile
>>>
>>> - Variable Substitution: Currently variable substitution, which is largely used to create the command lines run by SCons, uses an appreciable percentage (approximately 18% for a null incremental build) of SCons' CPU runtime. By and large, much of this evaluation is duplicate (and thus avoidable) work. For the moderately sized build discussed above, there are approximately 100k calls to evaluate substitutions, but only 413 unique strings to be evaluated. Consider that the CXXCOM variable is expanded 2412 times for this build. The only variables which are guaranteed unique are SOURCES and TARGETS; all others could be evaluated once and cached.
>>>   - Prior work on this item:
>>>     - https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/Discussion
>>>   - Working doc on the current state and areas for improvement:
>>>     - https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/SubstImprovement2017
>>>   - Method to address:
>>>     - Consider pre-evaluating Environment() variables where reasonable. This could use some sort of copy-on-write between cloned Environments. This pre-evaluation would skip known target-specific variables (TARGET, SOURCES, CHANGED_SOURCES, and a few others), so at minimum the per-command-line substitution should be faster.
>>>
>>> Bill and I would appreciate any feedback or thoughts on the above items, or suggestions for other areas to investigate. We are hoping that by addressing some or all of these items, the runtime overhead of SCons could be brought down enough to make it competitive again with other build systems. We hope to begin work on the above items once SCons 3.0 has shipped.
>>>
>>> Thanks,
>>>
>>> Andrew
>>>
>>> _______________________________________________
>>> Scons-dev mailing list
>>> [email protected]
>>> https://pairlist2.pair.net/mailman/listinfo/scons-dev
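Andrew's "Python code usage" examples can be reproduced with timeit. The thread does not show the actual Node change, so the sketch below (hypothetical function names) simply compares three common ways to de-duplicate a list while preserving order, plus the find() versus in membership test. Absolute timings will vary by Python version and data.

```python
import timeit


def unique_quadratic(items):
    """Order-preserving de-duplication via list membership: O(n^2) overall."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def unique_with_set(items):
    """Same result using a for loop plus a hash set for membership tests."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def unique_fromkeys(items):
    """Same result via dict.fromkeys (dicts keep insertion order in Python 3.7+)."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    data = [f"node{i % 500}" for i in range(5000)]
    flags = "-I/usr/include -DNDEBUG -O2 " * 10
    for name in ("unique_quadratic", "unique_with_set", "unique_fromkeys"):
        secs = timeit.timeit(f"{name}(data)", globals=globals(), number=20)
        print(f"{name:18} {secs:.4f}s")
    # Time the two membership tests inline so call overhead does not mask them.
    for stmt in ("flags.find('$') >= 0", "'$' in flags"):
        secs = timeit.timeit(stmt, globals=globals(), number=200_000)
        print(f"{stmt:24} {secs:.4f}s")
```

Timing the expressions inline, rather than through wrapper functions, keeps function-call overhead from hiding the difference being measured.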
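Jason's point about walking a builder DAG, and Andrew's suggestion of a centralized scoreboard, can be sketched in a few lines. This is a serial illustration under assumptions, not SCons's Taskmaster: plan_builds, deps and is_stale are hypothetical names, builders are plain strings, and is_stale stands in for the expensive signature and scanner work. Once any upstream builder is known to be out of date, downstream builders are marked out of date without being checked.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_builds(deps, is_stale):
    """Hypothetical builder-level up-to-date walk (not SCons's Taskmaster).

    deps maps each builder to the builders whose outputs it consumes.
    is_stale(builder) stands in for the expensive signature/scan check and is
    only called when every upstream builder is already known to be up to date.
    Returns (builders to run in dependency order, number of checks performed).
    """
    out_of_date = set()  # central "scoreboard" of builders known to need a run
    to_run = []
    checks = 0
    for builder in TopologicalSorter(deps).static_order():
        if any(dep in out_of_date for dep in deps.get(builder, ())):
            # An input is about to change, so skip the check and just rebuild.
            out_of_date.add(builder)
        else:
            checks += 1
            if is_stale(builder):
                out_of_date.add(builder)
        if builder in out_of_date:
            to_run.append(builder)
    return to_run, checks


if __name__ == "__main__":
    deps = {
        "link:app": ["cc:a.o", "cc:b.o"],
        "test:app": ["link:app"],
    }
    print(plan_builds(deps, lambda b: b == "cc:a.o"))
```

For the parallel walk Andrew describes, graphlib.TopologicalSorter also exposes prepare(), get_ready() and done(), which could hand ready builders to worker processes while the main process keeps the scoreboard; that part is not shown here.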
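The first CacheDir "method to address" item, recording the content signature when a file is pushed into the cache and reusing it on retrieval instead of re-hashing, might look roughly like this. The directory layout, the JSON side file and SignatureCachingDir are assumptions for illustration, not SCons's actual CacheDir format, and the sketch does not attempt atomic writes or concurrent access.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


class SignatureCachingDir:
    """Hypothetical cache layout: <root>/<build_sig> holds the cached file and
    <root>/<build_sig>.json holds its content signature."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.hash_calls = 0  # counts full-content hashes, to show what is saved

    def _content_sig(self, path):
        self.hash_calls += 1
        return hashlib.md5(Path(path).read_bytes()).hexdigest()

    def push(self, build_sig, built_file):
        # Hash once, when the file enters the cache, and record the result.
        shutil.copy2(built_file, self.root / build_sig)
        meta = {"csig": self._content_sig(built_file)}
        (self.root / f"{build_sig}.json").write_text(json.dumps(meta))

    def retrieve(self, build_sig, dest):
        """Copy a cached file to dest and return its recorded signature (or None)."""
        cached = self.root / build_sig
        if not cached.exists():
            return None
        shutil.copy2(cached, dest)
        # Reuse the stored signature instead of re-reading and re-hashing dest.
        return json.loads((self.root / f"{build_sig}.json").read_text())["csig"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        obj = Path(tmp, "foo.o")
        obj.write_bytes(b"object code")
        cache = SignatureCachingDir(Path(tmp, "cache"))
        cache.push("abc123", obj)
        print(cache.retrieve("abc123", Path(tmp, "restored.o")), cache.hash_calls)
```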
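For the SConsign item, the standard library's sqlite3 module is one readily available store that allows reading or rewriting a single entry without touching the rest of the file, which also speaks to Jason's point about loading 700 MB files to get a little data. IncrementalSignatureDB and its schema are hypothetical, and JSON stands in for whatever per-node record a real implementation would keep.

```python
import json
import sqlite3


class IncrementalSignatureDB:
    """Hypothetical per-entry signature store (not the real .sconsign format)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sigs (node TEXT PRIMARY KEY, info TEXT NOT NULL)"
        )

    def get(self, node):
        # Reads a single row; nothing else in the database is loaded.
        row = self._conn.execute(
            "SELECT info FROM sigs WHERE node = ?", (node,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def put(self, node, info):
        # Upserts a single row; nothing else in the database is rewritten.
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO sigs (node, info) VALUES (?, ?)",
                (node, json.dumps(info)),
            )

    def close(self):
        self._conn.close()
```

Because each put() commits one row, entries could be written as items are built, as Jason suggests, rather than in one dump at the end. Whether per-row commits are fast enough for large builds would need profiling; batching several puts per transaction is the obvious knob.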
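Andrew's numbers (about 100k substitution calls for only 413 unique strings) suggest caching everything except the per-target variables. The sketch below uses string.Template as a stand-in for SCons's own $VAR expansion, which handles many more cases than this; SubstCache and its method names are hypothetical. It expands a variable such as CXXCOM once, leaves $TARGET and $SOURCES as placeholders, and fills only those in per call.

```python
import string


class SubstCache:
    """Hypothetical sketch, not SCons's subst engine: expand the
    target-independent part of a command-line variable once, cache it, and
    fill in only the per-target variables on each call."""

    # Per-action variables that must never be baked into the cache.
    PER_TARGET = frozenset({"TARGET", "TARGETS", "SOURCE", "SOURCES", "CHANGED_SOURCES"})

    def __init__(self, env):
        # Snapshot the variables so later edits to `env` cannot make cached
        # expansions stale (a real version would need invalidation instead).
        self._env = {k: v for k, v in env.items() if k not in self.PER_TARGET}
        self._cache = {}
        self.misses = 0

    def _expand_env(self, text):
        # Values may reference other variables, so expand until nothing changes.
        # safe_substitute leaves unknown placeholders such as $TARGET in place.
        for _ in range(10):  # crude guard against self-referencing variables
            expanded = string.Template(text).safe_substitute(self._env)
            if expanded == text:
                break
            text = expanded
        return text

    def subst(self, var_name, target, sources):
        partial = self._cache.get(var_name)
        if partial is None:
            self.misses += 1
            partial = self._expand_env(self._env[var_name])
            self._cache[var_name] = partial
        # Only the per-target part is evaluated on every call.
        return string.Template(partial).safe_substitute(
            TARGET=target, TARGETS=target, SOURCES=" ".join(sources)
        )
```

Because the environment is snapshotted, this only works if variables are not changed after the cache is built; a real implementation would need to invalidate cached entries when an Environment variable changes.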
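Jason's wish that every Clone() be copy-on-write can be sketched with collections.ChainMap: a clone shares the parent's layers and only writes into its own top layer, so cloning costs time proportional to the number of layers rather than the number of variables. CowEnvironment is hypothetical, implements only item access, and is not SCons's Environment class.

```python
from collections import ChainMap


class CowEnvironment:
    """Hypothetical copy-on-write environment (not SCons's Environment class).

    Variables live in a ChainMap of layers; Clone() shares the existing layers
    and gives each side a new, empty top layer for its own writes.
    """

    def __init__(self, _layers=None, **values):
        self._layers = ChainMap(dict(values)) if _layers is None else _layers

    def Clone(self, **overrides):
        shared = self._layers
        # The parent also moves to a fresh top layer, so writes it makes after
        # cloning stay invisible to the clone (snapshot semantics).
        self._layers = shared.new_child()
        return CowEnvironment(shared.new_child(dict(overrides)))

    def __getitem__(self, key):
        return self._layers[key]

    def __setitem__(self, key, value):
        # ChainMap only ever writes to its first mapping, i.e. this env's own layer.
        self._layers[key] = value

    def __contains__(self, key):
        return key in self._layers
```

Two trade-offs are visible: lookups walk one dictionary per layer, so long Clone() chains may need periodic flattening, and mutable values such as lists are still shared between parent and clone unless copied on first write.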
