As Jason said, if you run a profile on a reasonably sized build, the MD5 hashing doesn't account for much of the runtime.

Regarding the file reads fed to MD5: I did some experiments with different block sizes and didn't really see much change, though perhaps my experiments were too casual to be definitive. I tried 1MB and at least one other size.

-Bill
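For anyone who wants to repeat that block-size experiment, here is a minimal standalone sketch (plain Python, not SCons internals) that times hashing one file with a few different read sizes; the file path and the sizes tried are placeholders.

    import hashlib
    import time

    def hash_file(path, blocksize):
        """Hash a file by reading it in fixed-size blocks."""
        md5 = hashlib.md5()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(blocksize), b''):
                md5.update(block)
        return md5.hexdigest()

    if __name__ == '__main__':
        path = 'some_large_file.bin'   # placeholder: point at any large file
        for blocksize in (64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
            start = time.time()
            digest = hash_file(path, blocksize)
            print('blocksize=%-9d %.3fs %s' % (blocksize, time.time() - start, digest))

Once the OS page cache is warm the read cost mostly disappears, which is consistent with the block size making so little difference.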
On Mon, Jul 24, 2017 at 1:45 PM, Jason Kenny <[email protected]> wrote:

> From a performance point of view, I have found that the best performance comes from timestamp-then-csig checks, since most of the time the concern is that the file did not change but, for various reasons, the timestamp did. The csig check is mostly about making sure we don't rebuild stuff that system quirks would otherwise cause us to waste time on. I have never seen a case where the timestamp did not change but the file content did.
>
> Also, the main cost I have seen for csig checks is reading the contents, not computing the signature (this is total time, as there is a delay from the OS in reading the file data). So for me there is not much speed to be gained here.
>
> From a design point of view, I agree this would be a nice improvement that should not be hard to add on top of the default decider logic. Keep in mind that the decider logic is the way it is to give the user full control; it is not user friendly. Small tweaks, such as providing an interface implementation to control how the csig is created from a given input, would be nice in certain cases. Keep in mind that the hash logic has to be reproducible between runs; secure hashes give us that property by default. A non-cryptographic hash that is just as collision-free would be equally useful, and if it were faster it would not hurt. However, I don't think it would help much in terms of speed in the general case, as again reading data off disk (or out of main memory) is the main time limiter.
>
> Another reason this could be useful: it can be nice to allow a way to filter the content before it is hashed, for example removing comments from the content that feeds the csig. I have seen that as a common request, and as a common reason to define a more complex decider object. Such an interface would have more of a functional use than a performance one, I believe.
>
> Another tweak that might help is a configurable way to control how SCons reads the file data. At the moment it is hardcoded to a best-guess block size; different sizes may help greatly with read times.
>
> Jason
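To make the timestamp-then-csig and content-filtering ideas concrete, here is a rough SConstruct-level sketch of a custom Decider along those lines. It assumes the classic three-argument Decider callback (newer SCons versions pass an extra repo_node argument, accepted here with a default), and the comment-stripping helper is a deliberately naive placeholder; a real implementation would also need to store its own filtered signatures rather than comparing against the csig SCons recorded.

    # SConstruct sketch -- illustrative only, not production decider code.
    import hashlib
    import re

    def _filtered_csig(path):
        # Hypothetical helper: hash the file with C/C++-style comments removed,
        # so comment-only edits do not change the content signature.
        with open(path, 'rb') as f:
            text = f.read()
        text = re.sub(rb'//[^\n]*|/\*.*?\*/', b'', text, flags=re.DOTALL)
        return hashlib.md5(text).hexdigest()

    def timestamp_then_filtered_csig(dependency, target, prev_ni, repo_node=None):
        """Return True if 'dependency' should be treated as changed for 'target'."""
        prev_ts = getattr(prev_ni, 'timestamp', None)
        prev_csig = getattr(prev_ni, 'csig', None)
        if prev_ts is None or prev_csig is None:
            return True                      # no previous build info: rebuild
        if dependency.get_timestamp() == prev_ts:
            return False                     # timestamp unchanged: trust it
        # Timestamp moved; check (filtered) content before deciding to rebuild.
        # Caveat: prev_ni.csig was computed from the unfiltered content, so a
        # real version would keep its own cache of filtered signatures.
        return _filtered_csig(dependency.get_abspath()) != prev_csig

    env = Environment()
    # The built-in env.Decider('MD5-timestamp') already does the plain
    # timestamp-then-csig check; the function above adds the filtering twist.
    env.Decider(timestamp_then_filtered_csig)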
> *From:* Scons-dev [mailto:[email protected]] *On Behalf Of* Andrew Featherstone
> *Sent:* Monday, July 24, 2017 2:38 PM
> *To:* SCons developer list <[email protected]>
> *Subject:* Re: [Scons-dev] SCons performance investigations
>
> Could SCons use a faster hashing algorithm, if one is available, in preference to the md5 default? I know the user can override the Decider, but it'd be nice if SCons did this by itself.
>
> Andrew
>
> On 24 July 2017 at 18:42, Jason Kenny <[email protected]> wrote:
>
> I believe we are all clear on why we Clone the environment. I did not understand that you were asking whether a new feature would be useful.
>
> I would say it would be useful to allow read-only environments in certain cases. However, my worry is that it would be used as a way to enforce values that might actually need to be tweaked, so it might lead to more cloning, or to a feature request to prevent cloning. That goes against what I find useful in making components build in a larger project in an easily pluggable way. I think a more useful feature would be to allow individual keys to be marked read-only and warn if their value changes. Then have a build-level "error on warning" option to turn that warning into an error, and also allow sections of code that need it to be exceptions (in either direction) to those rules.
>
> I believe a more aggressive copy-on-write environment would decrease build time and memory usage. The primary issue is dealing with direct updates to list variables like CPPFLAGS, or better yet CPPDEFINES, which would require proxies so that "native" updates can happen safely in a COW-like way.
>
> Jason
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> *From:* Bill Deegan <[email protected]>
> *Sent:* Monday, July 24, 2017 10:18 AM
> *To:* SCons developer list <[email protected]>
> *Subject:* Re: [Scons-dev] SCons performance investigations
>
> Jason,
>
> A somewhat common use model is to create a configured Environment and then clone it to pass to a subordinate SConscript. The only reason the clone is done is to prevent the subordinate SConscript from polluting the environment. It is for this use that I was asking whether a read-only Environment would be useful.
>
> I'm also curious: in this case, is it expected that the SConscript would or should not modify the Environment? Is this a functional clone, or a protective clone?
>
> -Bill
>
> On Sun, Jul 23, 2017 at 8:41 PM, Jason Kenny <[email protected]> wrote:
>
> I am not sure what a "read-only" mode would be. But in Parts I make lots of clones (lots!), so making a lot of clones is not the issue. It is done because, when you have a larger build, you start to break items up into groups/components (i.e. Parts), and you want to make sure a user of one component does not change a value that would affect another component. We could do better on how the data is copied and shared, by making clones more of a copy-on-write setup.
>
> Jason
>
> *From:* Scons-dev [mailto:[email protected]] *On Behalf Of* Bill Deegan
> *Sent:* Sunday, July 23, 2017 7:38 PM
> *To:* SCons developer list <[email protected]>
> *Subject:* Re: [Scons-dev] SCons performance investigations
>
> Jonathon,
>
> I've seen the clone-before-passing pattern in other builds. I'm wondering: if you could put your environment in a read-only mode before passing it (so no changes could be made), would that suffice and remove the desire/need to clone()?
>
> -Bill
>
> On Sun, Jul 23, 2017 at 4:51 PM, Jonathon Reinhart <[email protected]> wrote:
>
> I just wanted to add some quick anecdotes. In some of our largest, most complicated builds, we have observed a lot of the same things you all have.
>
> One time we did some quick profiling and saw that much of the CPU time during a null build was spent in variable substitution.
>
> Additionally, we also have a habit of cloning the environment before passing it to a SConscript. This is for safety - to ensure that a child SConscript can't mess up the environment for its siblings.
>
> Jonathon Reinhart
>
> On Sat, Jul 22, 2017 at 5:23 PM, Bill Deegan <[email protected]> wrote:
>
> Jason,
>
> Any chance you could add these comments to the wiki page? https://bitbucket.org/scons/scons/wiki/NeedForSpeed
>
> -Bill
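Since both read-only keys and copy-on-write clones keep coming up, here is a small pure-Python sketch (not SCons code, names invented) showing how the two ideas can combine: a clone shares its parent's storage until it writes, and writes to keys marked read-only produce a warning.

    import warnings

    class CowEnv:
        """Chained copy-on-write environment with optional read-only keys."""

        def __init__(self, values=None, parent=None, readonly_keys=()):
            self._parent = parent                 # shared ancestors, never mutated
            self._local = dict(values or {})      # this environment's own entries
            self._readonly = set(readonly_keys)
            if parent is not None:
                self._readonly |= parent._readonly

        def Clone(self, readonly_keys=()):
            return CowEnv(parent=self, readonly_keys=readonly_keys)

        def __getitem__(self, key):
            env = self
            while env is not None:                # walk up the clone chain
                if key in env._local:
                    return env._local[key]
                env = env._parent
            raise KeyError(key)

        def __setitem__(self, key, value):
            if key in self._readonly:
                warnings.warn('overriding read-only key %r' % key)
            self._local[key] = value              # write lands here only

    if __name__ == '__main__':
        base = CowEnv({'CC': 'gcc', 'CPPDEFINES': ['NDEBUG']}, readonly_keys=['CC'])
        child = base.Clone()
        child['CPPDEFINES'] = ['NDEBUG', 'FOO']   # base stays untouched
        child['CC'] = 'clang'                     # warns: CC is read-only
        print(base['CPPDEFINES'], child['CPPDEFINES'])

As Jason notes, the hard part in real Environments is in-place mutation of shared list values (env['CPPDEFINES'].append(...)), which a scheme like this would have to intercept with proxies.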
> On Sat, Jul 22, 2017 at 10:09 AM, Jason Kenny <[email protected]> wrote:
>
> Some additional thoughts.
>
> Serial DAG traversal:
>
> - One issue here is that the DAG used for doing builds is based on nodes, with a fair amount of extra logic to handle side effects and build actions that have multiple outputs. Greg Noel had made a push for something called the TNG taskmaster. I understand now that the main fix he was going for was to tweak SCons to navigate a builder DAG instead of a node DAG. The node DAG is great for establishing the main organization, but after that it is generally trivial to build a builder-based DAG at the same time. Traversing that is much faster, requires less "special" logic, and will be easier to parallelize.
>   - One big improvement this provides is that we only need to test whether sources or targets are out of date when the builders they depend on are all up to date. If one of them is out of date, we just build. Compare that with the current logic, where we check each node and see whether its build action has been done, which requires extra scans and work.
>   - Once a builder is known to be out of date, you just mark all of its parents out of date. We only care about builders in the set we don't yet know to be out of date. Simple tweaks to how we go through the tree can mean we only need to touch a few nodes.
>
> Start-up time:
>
> - A null build is always going to be the worst case for an up-to-date build, as we have to make sure all items are in a good state; the time to start building after a change should be a lot faster. SCons spends a lot of time re-reading everything on second passes. We could use our cache much better, storing state about what builds what, and so on, to avoid even having to read a file. If a file did not change, we already know the node/builder tree it will produce and we already know the actions, so most of the time we could start building items as soon as an md5/timestamp check fails. Globs can store information about what they read and processed, and only need to re-run when we notice a directory timestamp change. Loading known state is much faster than processing the Python build files - my work in Parts has shown this. The trick is knowing when you have to load a file again so that custom logic gets processed correctly.
> - In the case of Parts it would be great to load build files concurrently and in parallel. I think I have a way to do this which I have not implemented yet. The main issue is that the node FS object tree is a synchronization point for anything parallel.
>
> CacheDir:
>
> - 100% agree.
>
> SConsign generation:
>
> - I think this is a bigger deal for larger builds. In Parts, as I store more data, I try to break the items up into different files. This helps, but in the end a pickle or JSON dump takes time, and it takes time to load as well: for some builds I have had, loading 700MB files takes even the best systems a moment, which is a big waste when I only need a little bit of the data. Likewise, the storing of the data could and should happen as we build items. As noted, we don't have a good way to store a single item without rewriting the whole file, and if the file is large (100MB to GBs) that can take many seconds, which annoys users. From what I have working in Parts, data storage and retrieval is the big time sink; addressing it would have the largest impact for me.
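To make "store a single item without rewriting the whole file" concrete, here is a small standalone sketch of an incremental per-target signature store built on the stdlib dbm module; the record layout and class name are invented for the example, and the real .sconsign format carries considerably more information.

    # Sketch of an incremental signature store: one key per target, so updating
    # one entry does not rewrite the whole database the way a monolithic
    # pickle dump does.
    import dbm
    import json

    class IncrementalSigDB:
        def __init__(self, path):
            self._db = dbm.open(path, 'c')      # created on first use

        def store(self, target, csig, timestamp, dependencies):
            record = {'csig': csig, 'ts': timestamp, 'deps': dependencies}
            self._db[target.encode()] = json.dumps(record).encode()

        def lookup(self, target):
            key = target.encode()
            if key in self._db:
                return json.loads(self._db[key].decode())
            return None

        def close(self):
            self._db.close()

    if __name__ == '__main__':
        db = IncrementalSigDB('sconsign-sketch.db')
        db.store('build/foo.o', 'd41d8cd98f00b204e9800998ecf8427e',
                 1500000000.0, ['src/foo.c', 'src/foo.h'])
        print(db.lookup('build/foo.o'))
        db.close()

A per-key store like this (or sqlite) would also allow writing entries as targets finish building, instead of serializing everything at the end of the run.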
> Process spawning:
>
> - I add this because we had submitted a subprocess fix for POSIX systems. It affects larger builds more than smaller builds because of forking behavior. I don't believe it has been added to SCons yet.
> - As a side design note, if we did make a multiprocessing setup for SCons, this might be less of an issue, as the "process" workers only need information about the piece of the build they are running. Changes to node state would have to be synced with the main process via messages, as there would be no fast, efficient way to share the whole tree across all the processes. (A small sketch of this worker pattern follows this section.)
> - Another thought: we might want to look at nested parallel strategies, with a task-like setup that might let us use the TBB Python library to avoid the GIL issue. However, given my time on SCons/Parts, I think changing the taskmaster to walk a builder DAG will have the biggest effect.
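Here is a minimal standalone sketch (plain Python, not SCons code) of the worker pattern described above: child processes receive only the file paths they need, compute content signatures, and message the results back, while the full dependency tree stays in the parent process.

    import hashlib
    from multiprocessing import Pool

    def content_signature(path):
        """Worker job: hash one file and report (path, digest) back."""
        md5 = hashlib.md5()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(1024 * 1024), b''):
                md5.update(block)
        return path, md5.hexdigest()

    def check_sources(paths, jobs=4):
        # The parent keeps the node/builder graph; workers only compute csigs.
        with Pool(jobs) as pool:
            return dict(pool.map(content_signature, paths))

    if __name__ == '__main__':
        # Placeholder file list; in a real build this would come from the DAG.
        for path, digest in check_sources(['SConstruct']).items():
            print(path, digest)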
> Variable Substitution:
>
> - I abuse this in Parts to share data in a lazy fashion between components. It has been a sore point for me, for the reasons stated below. We have done some work to address it by reusing state better. I can say there are some issues with the current code that cause memory bloat and wasted time. I don't want to dwell on this, but I will say it is the second biggest item in my mind that would have a big impact on overall time for the user. I know I want to change the load logic in Parts to avoid using the substitution engine as much as possible.
>
> Environment creation:
>
> - It is easy to define lots of different environments in a large build, and how you do this can be subtle and have a huge effect on build time. Ideally, you always want to clone the "default" environment you have, or pass values into builders rather than into the environment. I feel it would be better for SCons to define more of a default environment and have all created environments be clones of it. I would also push to have every Clone be a copy-on-write environment. There are still cases in which the user needs a "clean" environment; however, in my experience the common case is that all the environments I have made in Parts are only small copy-on-write deltas from a common base. I think we should have more copy-on-write higher up the stack; at the moment the classes that do copy-on-write are used in builders, not in Clones.
>
> Configure check performance:
>
> - So far I try to avoid this feature as much as I can, though it does have its uses. Coming from automake, I feel the SCons version is faster but lacks some common features. The main issue I have seen is that a user can write complex logic that runs slowly. For a project I am porting from automake, the question for me is whether there is a better way to say this in SCons; at the moment it is a lot of code that is easy to break, and I would like a better way to express it. I feel that could help address maintainability issues with configure logic, as well as avoid certain speed issues, by making better use of SCons logic to decide whether a check needs to run at all.
>
> Some last thoughts:
>
> 1. The big value SCons tends to have for me is the ability to create reproducible environments to do a build - ones that are not broken by whatever shell the user happens to be running in. This ability to reproduce a build exactly in a dumb shell is a huge win. Using the SConsign to help store tool state is something I want to improve on in the Parts toolchain work, and I think it would be a win for SCons as well, more so for people using SCons to cross-build. There is start-up time we can avoid with smarter logic about what we already know about the tools; honestly, tools don't get added or removed as often as we change build files or source files.
> 2. Given that the common case for most developers is building changes to source, it seems to me that using our cache better to speed this up would have a big effect. We can detect changes in inputs that would force us to load build files. Most of the time the user added or removed code that has no effect on the actions we would ultimately call. Even with changes to imports/includes, we don't need to reload build files we have already processed; the Scanner can deal with that for us.
> 3. Being smarter about how we store data could help us reduce what we keep in memory for a non-interactive build. This can help large builds, as having to load a 2-3GB tree takes resources we would rather use on other things. I think we have options for how we store information, and possibly for using generators, to reduce memory overhead and improve build speed.
> 4. On the multiprocessing idea, the main issue is that we have a large data tree. Sharing this tree across processes will be slow, and we need to avoid that as much as we can. Using processes to do work that is as independent as possible, and passing state about nodes back to the main process (which owns the main data structure), will work much better. This should have a positive effect on builders written in Python, as they can build independently. In all builder cases we have to deal with the fact that I have seen builders that try to set state in the environment or globally; that state has to be shared or avoided in some way. I am not suggesting how to solve this, but it will be a design issue to address.
> 5. The last item is that no matter how good SCons is, people will want to be able to generate build files for a different system. The current logic for Visual Studio, for example, makes a makefile project that runs SCons; what users really want is a real MSBuild project. We should do that. Likewise, we should be better at working with other build systems' projects. Having good middleware for building or working with an automake or CMake project will help adoption. CMake is doing well because it is a build generator, same with Meson. You want to cover your bases with your users, and systems like these make it easy to do so.
>
> When I was at Intel, some of the people helping me made a profiler for Python in Intel VTune; I believe they are still working on it. It was useful for finding fixes in Parts that were not obvious but yielded speed improvements. Since SCons is open source, you can use this tool for free. I would recommend it, as it will give you some insight the default tools will not provide.
>
> Jason
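As a toy illustration of the builder-DAG points above (once a builder is out of date, everything downstream is out of date, so mark it and move on), combined with the "use generators" idea, here is a small standalone sketch with an invented graph representation:

    # Toy sketch: each key is a builder, each value lists the builders that
    # consume its outputs.  Once a builder is dirty, everything downstream is
    # dirty too, so it is never re-checked; and because the walk is a
    # generator, callers can start scheduling work before the walk finishes.
    from collections import deque

    def dirty_builders(consumers, changed):
        """Yield builders that must re-run, given the initially changed ones."""
        seen = set()
        queue = deque(changed)
        while queue:
            builder = queue.popleft()
            if builder in seen:
                continue
            seen.add(builder)
            yield builder                       # caller can schedule this now
            queue.extend(consumers.get(builder, ()))

    if __name__ == '__main__':
        # foo.o and bar.o feed prog; only foo.c changed.
        consumers = {'compile foo.o': ['link prog'],
                     'compile bar.o': ['link prog'],
                     'link prog': []}
        for b in dirty_builders(consumers, ['compile foo.o']):
            print('needs rebuild:', b)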
> *From:* Scons-dev [mailto:[email protected]] *On Behalf Of* Andrew C. Morrow
> *Sent:* Friday, July 21, 2017 10:40 AM
> *To:* SCons developer list <[email protected]>
> *Subject:* [Scons-dev] SCons performance investigations
>
> Hi scons-dev -
>
> The following is a revised draft of an email that I had originally intended to send as a follow-up to https://pairlist4.pair.net/pipermail/scons-users/2017-June/006018.html. Instead, Bill Deegan and I took some time to expand on my first draft and add some ideas about how to address some of the issues. We hope to migrate this to the wiki, but wanted to share it here first for feedback.
>
> ----
>
> Performance is one of the major challenges facing SCons. When compared with other current options, particularly Ninja, in many cases performance can lag significantly. That said, the other options by and large lack the extensibility and many of the features of SCons.
>
> Bill Deegan (SCons project co-manager) and I have been working together to understand some of the issues that lead to poor SCons performance in a real-world (and fairly modestly sized) C++ codebase. Here is a summary of some of our findings:
>
> - Python code usage: There are many places in the codebase where, while the code is correct, performance on CPython's implementation can be improved by minor changes. (A quick timeit sketch follows this list.)
>   - Examples:
>     - Using for loops and hashes to uniquify a list: a simple change in the Node class yielded approximately a 15% speedup for a null build.
>     - Using x.find('some character') >= 0 instead of 'some character' in x (a timeit benchmark shows a 10x speed difference).
>   - Method to address:
>     - Profile the code looking for hotspots with cProfile and line_profiler, then look for better implementations. Use timeit where useful to compare implementations; there are examples of this in the bench dir (see: https://bitbucket.org/scons/scons/src/68a8afebafbefcf88217e9e778c1845db4f81823/bench/?at=default).
> - Serial DAG traversal: SCons walks the DAG to find out-of-date targets in a serial fashion. Once it finds them, it farms the work out to other threads, but the DAG walk remains serial. Given the proliferation of multicore machines since SCons' initial implementation, a parallel walk of the DAG would yield significant speedup. Likely this would require an implementation using the multiprocessing Python library (instead of threads), since the GIL would otherwise block real parallelism. Packages like Boost, with many header files, can cause large increases in the size of the DAG, exacerbating this issue. There are two serious consequences of the slow DAG walk:
>   - Incremental rebuilds in large projects. The typical developer workflow is to edit a file, rebuild, test. In our modestly sized codebase, we see that the incremental time to do an 'all' rebuild for a one-file change can reach well over a minute. This time is completely dominated by the serial dependency walk.
>   - Inability to saturate distributed build clusters. In a distcc/icecream build, the serial DAG walk is slow enough that not enough jobs can be farmed out in parallel to saturate even a modest (400 CPU) build cluster. In our example, using Ninja to drive a distributed full build results in an approximately 15x speedup, but SCons can only achieve a 2x speedup.
>   - Method to address:
>     - Investigate changing the tree walk to a generator.
>     - Investigate implementing the tree walk using the multiprocessing library.
> - The dependency graph is the Python object graph: The target dependency DAG is modeled via Python Node-object-to-Node-object linkages (e.g. a list of child nodes held in a node). As a result, the only way to determine up-to-date-ness is through deeply nested method calls that repeatedly traverse the Python object graph. An attempt is made to mitigate this by memoizing state at the leaves (e.g. to cache the result of stat calls), but this still results in a large number of Python function invocations for even the simplest state checks, where the result is already known. Similarly, the lack of global visibility precludes using externally provided change information to bypass scans.
>   - Method to address:
>     - See above re: generators.
>     - Investigate modeling state separately from the Python Node graph via some sort of centralized scoreboarding mechanism; it seems likely both that the function call overhead could be eliminated and that local knowledge could be propagated globally more effectively.
> - CacheDir: There are some issues listed below. The end-to-end caching functionality of SCons - including generated files, object files, shared libraries, whole executables, etc. - is one of its great strengths, but its performance has much room for improvement.
>   - Existing bug(s) when combining CacheDir with MD5-Timestamp devalue CacheDir.
>     - Bug: http://scons.tigris.org/issues/show_bug.cgi?id=2980
>   - Performance issues:
>     - CacheDir re-creates signature data when extracting nodes from the cache, even though it could have recorded the signature when entering the objects into the cache.
>   - Method to address:
>     - Store signatures for items in the CacheDir and then use them directly when copying items from the cache.
>     - Fix the CacheDir / MD5-Timestamp integration bug.
> - SConsign generation: The generation of the SConsign file is monolithic, not incremental. This means that if only one object file changed, the entire database needs to be re-written. It also appears that the mechanism used to serialize it is itself slow. Moving to a faster serialization model would be good, but even better would be a faster serialization model that also admitted incremental updates of single items.
>   - Method to address:
>     - Replace sconsign with something faster than the current implementation, which is based on pickle.
>     - And/or improve sconsign with something which can incrementally write only what has changed.
> - Configure check performance: Even cached Configure checks seem slow, and for a complexly configured build this can add significant startup cost. Improvements here would be useful.
>   - Method to address:
>     - Code inspection, looking for improvements.
>     - Profiling.
> - Variable Substitution: Currently variable substitution, which is largely used to create the command lines run by SCons, uses an appreciable percentage (approximately 18% for a null incremental build) of SCons' CPU runtime. By and large much of this evaluation is duplicate (and thus avoidable) work. For the moderately sized build discussed above there are approximately 100k calls to evaluate substitutions, but only 413 unique strings to be evaluated. Consider that the CXXCOM variable is expanded 2412 times for this build. The only variables which are guaranteed unique are SOURCES and TARGETS; all others could be evaluated once and cached.
>   - Prior work on this item:
>     - https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/Discussion
>   - Working doc on the current state and areas for improvement:
>     - https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/SubstImprovement2017
>   - Method to address:
>     - Consider pre-evaluating Environment() variables where reasonable. This could use some sort of copy-on-write between cloned Environments. This pre-evaluation would skip known target-specific variables (TARGET, SOURCES, CHANGED_SOURCES, and a few others), so minimally the per-command-line substitution should be faster.
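The timeit sketch referenced in the "Python code usage" item above: it compares a list-membership uniquify against a set-based one, and str.find() against the in operator. The data sizes are arbitrary, so the exact ratios will differ from the numbers quoted in the list.

    # Quick benchmark of the two micro-optimizations called out above.
    import timeit

    setup = "data = list(range(1000)) * 5; s = 'x' * 200 + '#'"

    quadratic = """
    result = []
    for x in data:
        if x not in result:          # list membership: O(n) per element
            result.append(x)
    """

    hashed = """
    result = []
    seen = set()
    for x in data:
        if x not in seen:            # set membership: O(1) per element
            seen.add(x)
            result.append(x)
    """

    print('uniquify via list lookups:', timeit.timeit(quadratic, setup=setup, number=200))
    print('uniquify via a set       :', timeit.timeit(hashed, setup=setup, number=200))
    print("s.find('#') >= 0         :", timeit.timeit("s.find('#') >= 0", setup=setup, number=1000000))
    print("'#' in s                 :", timeit.timeit("'#' in s", setup=setup, number=1000000))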
> Bill and I would appreciate any feedback or thoughts on the above items, or suggestions for other areas to investigate.
> We are hoping that, by addressing some or all of them, the runtime overhead of SCons can be brought down enough to make it competitive again with other build systems. We hope to begin work on the above items once SCons 3.0 has shipped.
>
> Thanks,
>
> Andrew
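Finally, to make "evaluate once and cache" for variable substitution concrete, here is a rough standalone sketch (nothing like the real subst() machinery, which also handles callables, nested expansion, and escaping). It caches expansions keyed by the raw string plus the values of the construction variables it references, and never caches strings that mention per-target variables such as TARGET or SOURCES.

    import re

    PER_TARGET = {'TARGET', 'TARGETS', 'SOURCE', 'SOURCES', 'CHANGED_SOURCES'}
    _VAR = re.compile(r'\$(\w+)')
    _cache = {}

    def subst(raw, env, target='', source=''):
        names = _VAR.findall(raw)
        special = {'TARGET': target, 'TARGETS': target,
                   'SOURCE': source, 'SOURCES': source,
                   'CHANGED_SOURCES': source}

        def lookup(match):
            name = match.group(1)
            if name in special:
                return str(special[name])
            return str(env.get(name, ''))

        if any(name in PER_TARGET for name in names):
            return _VAR.sub(lookup, raw)        # per-target: expand every time
        key = (raw, tuple(str(env.get(name, '')) for name in names))
        if key not in _cache:                   # everything else: expand once
            _cache[key] = _VAR.sub(lookup, raw)
        return _cache[key]

    if __name__ == '__main__':
        env = {'CXX': 'g++', 'CXXFLAGS': '-O2'}
        print(subst('$CXX $CXXFLAGS -c -o $TARGET $SOURCES', env, 'foo.o', 'foo.cpp'))
        print(subst('$CXX $CXXFLAGS', env))     # repeat calls hit the cache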
_______________________________________________
Scons-dev mailing list
[email protected]
https://pairlist2.pair.net/mailman/listinfo/scons-dev
