Github user sun-rui commented on the pull request:
https://github.com/apache/spark/pull/6743#issuecomment-117369661
@shivaram, the rLibDir parameter of sparkR.init() was intended for locating the
SparkR package on worker nodes back when SparkR was a separate project. Now
that SparkR is part of Spark, I think this parameter is no longer needed, because:
1. For YARN modes, the SparkR package is dynamically shipped to workers and
placed in the current working directory, so users do not need to care about its
location. This patch is for this purpose (see the sketch after this list);
2. For standalone mode, SparkR is part of the Spark distribution and can be
located within a specific sub-directory of the distribution. This patch allows
a worker-specific SPARK_HOME, so SPARK_HOME is not required to be the same
across all workers, whereas the rLibDir parameter assumes SparkR is at the same
location on all workers (a limitation);
3. For Mesos mode, spark.mesos.executor.home is used to specify the location of
Spark on workers, and SparkR can be located relative to this location.
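As a concrete illustration (not part of the patch), here is a minimal sketch of
initialization without rLibDir. The master URLs and the /opt/spark path are
placeholders, and passing spark.mesos.executor.home through sparkEnvir is my
assumption of how one would set that property from R:

```r
library(SparkR)

# One of the following, depending on the cluster manager.

# YARN: the SparkR package is shipped with the job and unpacked into each
# executor's working directory, so no SparkR library path is needed.
sc <- sparkR.init(master = "yarn-client", appName = "example")

# Mesos: tell executors where Spark lives on the workers (placeholder path);
# SparkR can then be found relative to spark.mesos.executor.home.
sc <- sparkR.init(master = "mesos://host:5050", appName = "example",
                  sparkEnvir = list(spark.mesos.executor.home = "/opt/spark"))
```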
This question makes me think about SparkConf.setSparkHome(), which confused me
during the creation of this patch. I am not sure how that setting is honored
across the different deployment modes.