On Apr 27, 2013, at 11:34 AM, Adam Seering wrote:

> On 04/27/2013 09:10 AM, Martin Morgan wrote:
>> On 04/26/2013 07:50 AM, Adam Seering wrote:
>>> Hi,
>>> I've been playing around with the R source code a little; mostly just
>>> trying to familiarize myself. I have access to some computers on a
>>> reservation system, so I've been reserving a computer, downloading and
>>> compiling R, and going from there.
>>>
>>> I'm finding that R takes a long time to build, though. (Well, OK,
>>> maybe 5 minutes -- I'm impatient :-) ) Most of that time, it's sitting
>>> there byte-compiling one internal package or another, which uses just
>>> one CPU core and so leaves the system mostly idle.
>>>
>>> I'm just curious whether anyone has thought about parallelizing that
>>> process?
>>
>> Hi Adam -- parallel builds are supported by adding the '-j' flag when
>> you invoke make:
>>
>>     make -j
>>
>> The packages are built in parallel, in as much as their dependency
>> structure allows. You can also configure without byte compilation; see
>> ~/src/R-devel/configure --help to make this part of the build go more
>> quickly. And after an initial build, subsets of R -- e.g., just the
>> 'main' sources or a single package like 'stats' -- can be rebuilt with
>> (assuming R's source, e.g., from svn, is in ~/src/R-devel, and you're
>> building R in ~/bin/R-devel):
>>
>>     cd ~/bin/R-devel/src/main
>>     make -j
>>     cd ~/bin/R-devel/src/library/stats
>>     make -j
>>
>> The definitive source for answers to questions like these is
>>
>>     > RShowDoc("R-admin")
>>
>> Martin
>
> Hi Martin,
> Thanks for the reply -- but I'm afraid the question you've answered
> isn't the question that I intended to ask.
>
> Based on your response, I think the answer to my question is likely
> "no."
> But let me try rephrasing anyway, just in case:
>
> I'm certainly quite aware of "-j" as a make argument; if I weren't, the
> bottleneck would not be the byte-compilation, and the build would take
> rather more than 5 minutes :-) That was the very first thing I tried. I
> don't believe that parallel make is as parallel as it theoretically
> could be. (In fact, I see almost no parallelism between libraries on my
> system; individual .c files are parallelized nicely, but only one
> library builds at a time. This mostly matters at the compiling-bytecode
> step, since that's the biggest serial operation per library.) My
> question is: has anyone thought about what it would take to parallelize
> the build further?
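Adam's observation -- that individual .c files compile in parallel but libraries build one at a time -- is characteristic of how 'make -j' treats a shell 'for' loop: the whole loop is a single job, so its iterations never overlap, whereas separate make targets can be scheduled concurrently by the jobserver. A toy sketch of the difference (hypothetical file and target names, not taken from R's makefiles):

```shell
# Toy comparison (hypothetical, not R's actual build): a 'for' loop inside
# one recipe runs its steps strictly in sequence even under 'make -j',
# because make sees the loop as a single job.
workdir=$(mktemp -d); cd "$workdir"

# Serial version: one target, one recipe, no intra-recipe parallelism.
printf 'serial:\n\tfor p in one two three; do touch "$$p.serial"; done\n' > Makefile.serial
make -f Makefile.serial -j4 serial

# Unwound version: one target per "package"; -j4 may build all three at once.
printf 'parallel: one.par two.par three.par\n%%.par:\n\ttouch $@\n' > Makefile.parallel
make -f Makefile.parallel -j4 parallel

ls
```

Both runs produce the same files; the difference is only in how many jobs make is free to run at once, which is exactly the distinction between a 'for' loop in a recipe and proper targets with dependency information.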
I think you may have failed to notice that installation of packages *is*
parallelized. The *output* is shown only en bloc, to avoid mixing the
output of the parallel installations. But there are dependencies among
the packages, so those that require most of the others have to be built
last -- nonetheless, in current R you can install 9 recommended packages
in parallel.

> I'm not sure that this can be done with just the makefiles. But the
> following comment makes me at least a little suspicious:
>
> """ src/library/Makefile
> ## FIXME: do some of this in parallel?
> """
>
> Surely some of the 'for' loops there could be unwound into proper make
> targets with dependency information? I'm not sure whether the
> dependency information would effectively force a serial compilation
> anyway, though...
>
> Another approach, if the above is hard for some reason: what I'm seeing
> is that the byte compilation is largely serial; but as you note,
> byte-compilation is optional. Could the makefiles just defer it -- skip
> it up front and then do all the byte-compilations for all of the
> packages concurrently?

The problem is, again, dependencies -- you cannot defer the compilation,
since it would change a package *after* it has already been used by
another package, which can cause inconsistencies (note that lazy loading
is a red herring -- it's used regardless of compilation). That said, you
won't save a significant amount of time anyway (did you actually profile
the time, or are you relying on your eyes to deceive you? ;)), so it's
not worth the bother (try enabling LTO ;)). Personally, I simply disable
package compilation for all development builds; you won't notice the
difference for testing anyway. Moreover, you'll rarely be doing a full
build repeatedly, so the 4 minutes it takes are certainly nothing
compared to other projects of this size... It becomes more fun when you
start building all CRAN packages ;).
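Simon's point -- that the package dependency graph, not make itself, bounds the available parallelism -- can be sketched with a toy dependency chain (hypothetical target names, not R's real package graph):

```shell
# Toy illustration (hypothetical): under 'make -j', only targets whose
# prerequisites are already satisfied can run concurrently. Targets that
# everything else depends on must finish first, just as packages that most
# other packages require are installed before their dependents.
workdir=$(mktemp -d); cd "$workdir"

printf 'all: pkgC\npkgA:\n\ttouch pkgA\npkgB:\n\ttouch pkgB\npkgC: pkgA pkgB\n\ttouch pkgC\n' > Makefile

# pkgA and pkgB have no prerequisites, so -j2 is free to build them in
# parallel; pkgC must wait for both -- deferring pkgA's build until after
# pkgC has consumed it is not an option, which is Simon's inconsistency
# argument against deferred byte-compilation.
make -j2
ls
```

The wall-clock lower bound of such a build is the longest dependency chain, no matter how large the -j value is.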
Cheers,
Simon

> From a very cursory read of the code, it looks like the relevant code
> is in src/library/tools/R/makeLazyLoad.R; and that file doesn't
> immediately look like it's doing anything that fundamentally couldn't
> be parallelized (i.e., running multiple R processes at once, one per
> library; at a glance, the logic looks nicely per-library).
>
> A third approach could be to try to parallelize the logic in
> makeLazyLoad.R itself. I would expect that to be at best much more
> difficult, though.
>
> Anyway, there are lots of things that look like they could in theory be
> done here. And I know just enough at this point to be dangerous, not
> enough to contribute :-) Hence my asking: has anyone thought about
> this? If not, I assume the best thing for me to do would be to poke at
> it and try to figure out on my own how this works and what's most
> feasible. But if anyone has any pointers, that would likely save me a
> bunch of time. And if this is something that you prefer to keep serial
> for some reason, that would be good to know too, so I don't spend time
> on it.
>
> Thanks,
> Adam

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel