Github user sun-rui commented on the pull request:
https://github.com/apache/spark/pull/6743#issuecomment-117369661
@shivaram, the rLibDir parameter of sparkR.init() was intended for locating the
SparkR package on worker nodes back when SparkR was a separate project. Now
that SparkR is part of Spark, I think this parameter is no longer needed, because:
1. For YARN modes, the SparkR package is dynamically shipped to workers and
placed in the current working directory, so users do not need to care about its
location. This patch is for this purpose (see the sketch after this list);
2. For standalone mode, SparkR is part of the Spark distribution and can be
located within a specific sub-directory of the distribution. This patch allows
a worker-specific SPARK_HOME, so SPARK_HOME is not required to be the same
across all workers, whereas the rLibDir parameter assumes SparkR is at the same
location on all workers (a limitation);
3. For Mesos mode, spark.mesos.executor.home is used to specify the location of
Spark on workers, and SparkR can be located relative to this location.
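As a concrete illustration (not part of the patch), here is a minimal sketch of
initialization without rLibDir. The master URLs and the /opt/spark path are
placeholders, and passing spark.mesos.executor.home through sparkEnvir is my
assumption of how one would set that property from R:

```r
library(SparkR)

# One of the following, depending on the cluster manager.

# YARN: the SparkR package is shipped with the job and unpacked into each
# executor's working directory, so no SparkR library path is needed.
sc <- sparkR.init(master = "yarn-client", appName = "example")

# Mesos: tell executors where Spark lives on the workers (placeholder path);
# SparkR can then be found relative to spark.mesos.executor.home.
sc <- sparkR.init(master = "mesos://host:5050", appName = "example",
                  sparkEnvir = list(spark.mesos.executor.home = "/opt/spark"))
```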
This question makes me think about SparkConf.setSparkHome(), which confused me
during the creation of this patch. I am not sure how that setting is honored
across the different deployment modes.