Re: [Rpy] Multiple R Processes

Nathaniel Smith Wed, 11 Mar 2009 14:24:51 -0700

On Wed, Mar 11, 2009 at 1:04 PM, Mark Larsen <larsen...@gmail.com> wrote:
>
> I realize this has been hammered to death, but I can't seem to get my head 
> around it.
>
> Is the mapping from rpy (or rpy2) to R a one-to-one?


Yes.

> Can either invoke multiple R processes from a single python program?

No (at least not using rpy2 directly).

> In the past I've used python threads and piped stdin/stdout to multiple R 
> processes to take advantage of multi-core servers.  This method is getting 
> tired, though.  Can rpy2 invoke multiple R processes that require no 
> inter-communication?

See above. It's just a limitation of how R itself is written -- if you
want to run two pieces of R code at the same time, then you need two
operating system processes. rpy2 runs R within the same process as
python.

You have two options:
  -- Spawn multiple R processes and talk to them over pipes from python
  -- Spawn multiple Python processes, each of which uses rpy2 to load
R, and talk to them over any Python-level IPC library.
The latter is really a huge improvement over the former, because there
are great IPC libraries for python -- e.g., the pyprocessing library
(part of the standard library, under the name 'multiprocessing', as of
python 2.6). For instance, I have lots of regression problems to solve,
and prefer to use (computationally expensive) robust regression, so
it's very helpful to be able to say things like:
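----
import processing  # named 'multiprocessing' in python 2.6 and later
from rpy2.robjects import r

# Load the R package first so that forked workers inherit it
r.library("robustbase")

def do_lmrob(formula_frame):
    # Fit one robust regression and return its coefficients
    formula, frame = formula_frame
    return r.coef(r.lmrob(formula, frame))

# Create the pool only after do_lmrob exists, so the workers can find it;
# by default this spawns one worker per cpu
pool = processing.Pool()
pool.map(do_lmrob, [(formula1, frame1), (formula2, frame2), ...])
----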
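
For comparison, the first option (separate processes driven over pipes) can be sketched roughly as below. This is only an illustration: run_in_parallel is a made-up helper name, and the jobs use the Python interpreter as a stand-in so the snippet runs without R installed. With R available, each command could instead be something like ["Rscript", "-e", "cat(mean(c(1, 2, 3)))"].

```python
import subprocess
import sys

def run_in_parallel(commands):
    """Start every command at once, then collect each one's stdout."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
        for cmd in commands
    ]
    # All children are already running; communicate() just gathers the
    # output of each in turn and waits for it to exit.
    return [p.communicate()[0].strip() for p in procs]

# Stand-in jobs (hypothetical): each child prints one number.
jobs = [[sys.executable, "-c", "print(%d * 2)" % i] for i in range(3)]
results = run_in_parallel(jobs)
print(results)
```

Everything crossing the pipe is text, so anything richer than a few numbers has to be formatted and parsed by hand, which is a large part of why the second option tends to be more pleasant.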

> I see that Nathaniel Smith has a patch to allow concurrent access (from 
> multiple python threads) to the same R process.  Would this save me any 
> computational time since my R code pegs the CPU to 100% until completion?

Nope. My patch doesn't really allow concurrent access; it makes sure
that if you write code that attempts to access a single R interpreter
concurrently, those requests are serialized instead.
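
To illustrate what "serialized" means here, a toy sketch follows. It is not the patch itself: SerializedInterpreter is a hypothetical pure-Python stand-in for the single R interpreter, guarded by one lock. Eight threads call it at once, yet only one is ever inside at a time, so the total time is roughly the sum of the individual calls.

```python
import threading
import time

class SerializedInterpreter:
    """Stand-in for one interpreter that must never be entered concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0  # highest number of simultaneous callers seen

    def call(self, x):
        with self._lock:  # concurrent callers queue up here
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            time.sleep(0.01)  # pretend to do CPU-bound work
            self._active -= 1
            return x * x

interp = SerializedInterpreter()
results = [None] * 8

def worker(i):
    results[i] = interp.call(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(interp.max_active, results, round(elapsed, 3))
```

The requests all complete safely, but there is no speedup over running them one after another, which is why this does not help with code that already pegs a CPU.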

There are some important cases where this is useful, but they're
pretty specific:
  -- It lets one spawn a graphical update thread in the interactive
python interpreter, so that R graphing functions work properly,
without interfering with running interactive R code.
  -- Potentially, it lets one release the GIL while R code is running,
which would let you run R code and Python code in parallel (but not
Python code and Python code, or R code and R code).

-- Nathaniel

_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
