On 04/27/2013 09:10 AM, Martin Morgan wrote:
On 04/26/2013 07:50 AM, Adam Seering wrote:
Hi,
     I've been playing around with the R source code a little, mostly just trying to familiarize myself. I have access to some computers on a reservation system, so I've been reserving a computer, downloading and compiling R, and going from there.

     I'm finding that R takes a long time to build, though. (Well, OK, maybe 5 minutes -- I'm impatient :-) ) Most of that time, it's sitting there byte-compiling one internal package or another, which uses just one CPU core and so leaves the system mostly idle.

     I'm just curious if anyone has thought about parallelizing that process?

Hi Adam -- parallel builds are supported by adding the '-j' flag when you invoke make

   make -j

The packages are being built in parallel, inasmuch as this is possible given their dependency structure. Also, you can configure without byte compilation (see ~/src/R-devel/configure --help) to make this part of the build go more quickly. And after an initial build, subsets of R, e.g., just the 'main' source or a single package like 'stats', can be rebuilt (assuming R's source, e.g., from svn, is in ~/src/R-devel, and you're building R in ~/bin/R-devel) with

   cd ~/bin/R-devel/src/main
   make -j
   cd ~/bin/R-devel/src/library/stats
   make -j

The definitive source for answers to questions like these is

   > RShowDoc("R-admin")

Martin

Hi Martin,
Thanks for the reply -- but I'm afraid the question you've answered isn't the question that I intended to ask.

Based on your response, I think the answer to my question is likely "no." But let me try rephrasing anyway, just in case:

I'm certainly quite aware of "-j" as a make argument; if I weren't, the bottleneck would not be the byte-compilation, and the build would take rather more than 5 minutes :-) That was the very first thing I tried. I don't believe that parallel make is as parallel as it theoretically could be. (In fact, I see almost no parallelism between libraries on my system; individual .c files are parallelized nicely but only one library at a time. This mostly matters at the compiling-bytecode step, since that's the biggest serial operation per library.) My question is, has anyone thought about what it would take to parallelize the build further?

I'm not sure that this can be done with just the makefiles. But the following comment makes me at least a little suspicious:

""" src/library/Makefile
## FIXME: do some of this in parallel?
"""

Surely some of the 'for' loops there could be unwound into proper make targets with dependency information? I'm not sure whether the dependency information would effectively force a serial compilation anyway, though.
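
A minimal sketch of what "unwinding" such a loop into make targets might look like. Everything here is an assumption for illustration: the package names pkgA..pkgD and their dependencies are hypothetical, and each recipe only sleeps and touches a stamp file in place of the real install/byte-compile step. The script writes a Makefile with one target per package and runs it with make -j4; pkgB and pkgC share only pkgA as a prerequisite, so make is free to build them at the same time, while pkgD still waits for both.

```shell
#!/bin/sh
# Hypothetical sketch: one make target per package instead of a serial
# "for pkg in ...; do ...; done" loop, so that make -j can run independent
# packages concurrently while still honouring their dependencies.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical dependency list, one "package: prerequisites" entry per line.
deps='pkgA:
pkgB: pkgA
pkgC: pkgA
pkgD: pkgB pkgC'

{
  printf '.PHONY: all\n'
  printf 'all: pkgD.stamp\n'
  echo "$deps" | while IFS=: read -r pkg pre; do
    # Each prerequisite package becomes a prerequisite stamp file.
    prereqs=""
    for p in $pre; do prereqs="$prereqs $p.stamp"; done
    printf '%s.stamp:%s\n' "$pkg" "$prereqs"
    # Placeholder recipe (tab-indented): stands in for the real per-package work.
    printf '\t@echo "building %s"; sleep 1; touch $@\n' "$pkg"
  done
} > Makefile

make -j4 all
```

With these placeholder recipes the run takes about three "steps" rather than four, because pkgB and pkgC overlap. How much the real tree would gain depends on how wide its dependency graph is: if every package needed the previous one, the targets would still run one after another even under make -j.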

Another approach, if the above is hard for some reason: what I'm seeing is that the byte compilation is largely serial, but as you note, byte compilation is optional. Could the makefiles just defer it, skipping it up front and then doing all the byte-compilations for all of the packages concurrently? From a very cursory read of the code, it looks like the relevant code is in src/library/tools/R/makeLazyLoad.R, and that file doesn't immediately look like it's doing anything that fundamentally couldn't be parallelized (i.e., running multiple R processes at once, one per library; at a glance the logic looks nicely per-library).
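
A rough sketch of the "defer, then batch" idea, under stated assumptions: every package is taken to be already installed without byte compilation, and byte-compiling one package is assumed to be independent of the others at that point (the per-library observation above, not something verified here). The package names are hypothetical, and the "sleep 1" is only a placeholder for the real per-library step (the makeLazyLoad.R logic), which is not shown. It uses xargs -P, which GNU and BSD xargs support but POSIX does not require, to run at most JOBS processes at once.

```shell
#!/bin/sh
# Hypothetical "defer, then batch" sketch: after a build that skipped byte
# compilation, byte-compile every package in parallel, one process per
# package, with at most JOBS processes running at the same time.
set -eu

JOBS=4
outdir=$(mktemp -d)

# Hypothetical package names; a real build would take the list from the
# makefiles under src/library.
pkgs='pkgA pkgB pkgC pkgD pkgE'

# xargs appends one package name per command; the inline sh script receives
# the output directory as $0 and the package name as $1.
printf '%s\n' $pkgs |
  xargs -n 1 -P "$JOBS" sh -c '
    sleep 1  # placeholder for starting one R process to byte-compile $1
    echo "byte-compiled $1" > "$0/$1.done"
  ' "$outdir"

ls "$outdir"
```

Capping the process count with JOBS, rather than starting one process per package unconditionally, keeps a machine with few cores from being oversubscribed, in the same spirit as make -j N.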

A third approach could be to try to parallelize the logic in makeLazyLoad.R. I would expect that to be at best much more difficult, though.

Anyway, there are lots of things that look like they could in theory be done here. And I know just enough at this point to be dangerous, not enough to contribute :-) Hence my asking: has anyone thought about this? If not, I assume the best thing for me to do would be to poke at it and try to figure out on my own how this works and what's most feasible. But if anyone has any pointers, that would likely save me a bunch of time. And if this is something that you prefer to keep serial for some reason, that would be good to know too, so I don't spend time on it.

Thanks,
Adam

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel