Yes, JRI loads an R dynamic library into the executor JVM, which raises 
thread-safety issues when there are multiple task threads within the executor.
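
For illustration, a shared R engine on an executor typically looks something 
like the sketch below (Scala, using JRI's Rengine; the startup arguments and 
the evaluated expression are placeholders). Every task thread has to funnel 
its R calls through the one synchronized object, which is exactly the 
serialization you are seeing:

    import org.rosuda.JRI.Rengine

    // JRI permits only one Rengine per JVM, so it has to be shared by
    // all task threads running in the executor.
    object RSession {
      // Assumption: the engine is started without the R REPL loop and
      // with no callbacks registered.
      lazy val engine = new Rengine(Array("--vanilla"), false, null)

      // All calls into R are serialized on this object, so only one
      // task thread can be inside R at any given moment.
      def eval(expr: String): Double = synchronized {
        engine.eval(expr).asDouble()
      }
    }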

If you are running Spark in Standalone mode, it is possible to run multiple 
workers per node while limiting each worker to a single core.
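
For example, something along these lines in conf/spark-env.sh on each node 
(a sketch; the instance count and memory values are illustrative and need 
tuning for your machines):

    # conf/spark-env.sh (Standalone mode)
    # Run 8 workers per node, each limited to a single core, so each
    # executor JVM ends up with only one task thread.
    SPARK_WORKER_INSTANCES=8
    SPARK_WORKER_CORES=1
    SPARK_WORKER_MEMORY=4g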

You could use RDD.pipe(), but you may need to handle binary-to-text 
conversion, since the input/output to/from the R process is string-based.
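
A rough sketch of such a pipe-based setup (sc is assumed to be a 
SparkContext; my_model.R is a hypothetical script that reads one number per 
line from stdin and writes one result per line to stdout):

    import org.apache.spark.rdd.RDD

    val input: RDD[Double] = sc.parallelize(1 to 100).map(_.toDouble)
    val output: RDD[Double] = input
      .map(_.toString)                        // encode each record as text
      .pipe("Rscript --vanilla my_model.R")   // one R process per partition
      .map(_.toDouble)                        // decode the R output back

Each partition gets its own R process, so there is no shared-engine locking, 
at the cost of the text round-trip.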

If use cases like yours (calling R code in RDD transformations) prove to be 
in high demand, we may consider refactoring RRDD for this purpose, although 
it is currently intended for internal use by SparkR and is not a public API.

-----Original Message-----
From: Simon Hafner [mailto:reactorm...@gmail.com] 
Sent: Monday, February 15, 2016 5:09 AM
To: user <user@spark.apache.org>
Subject: Running synchronized JRI code

Hello

I'm currently running R code in an executor via JRI. Because R is 
single-threaded, every call into R needs to be wrapped in `synchronized`. As a 
result, I can only use a little more than one core per executor, which is 
undesirable. Is there a way to tell Spark that this specific application (or 
even a specific UDF) needs multiple JVMs? Or should I switch from JRI to a 
(slower) pipe-based setup?

Cheers,
Simon

