Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/9390#issuecomment-153225609
  
    @shivaram, there are two changes to the JVM-to-R protocol:
    1. The format of the environment variable used to convey the SparkR 
package path to R. Previously a single path for the SparkR package was conveyed; 
in this PR, a comma-separated path list is conveyed. The first element is the 
path for the SparkR package, and the other is for additional R packages 
specified via spark-submit command line options.
    2. Within sparkR.init(), after launching a JVM backend, a path for 
additional R packages is passed from the JVM backend to R. The path is then 
added to .libPaths() so that the additional R packages can be loaded within the 
R environment.
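
    To make the two changes concrete, here is a rough, language-neutral sketch 
(written in Python rather than the actual Scala/R code). The variable name 
EXAMPLE_R_PACKAGE_PATHS, the function names, and the example paths are all 
hypothetical and are not taken from the PR; extend_lib_paths only mimics what 
adding a path to .libPaths() does in R.

```python
# Hypothetical variable name, used only for this illustration.
ENV_VAR = "EXAMPLE_R_PACKAGE_PATHS"


def encode_package_paths(sparkr_path, extra_paths):
    """Sending side: join the SparkR path and any extra paths with commas."""
    return ",".join([sparkr_path] + list(extra_paths))


def decode_package_paths(value):
    """Receiving side: the first element is the SparkR package path,
    the remaining elements are paths for additional R packages."""
    parts = [p for p in value.split(",") if p]
    if not parts:
        raise ValueError("expected at least the SparkR package path")
    return parts[0], parts[1:]


def extend_lib_paths(lib_paths, extra_paths):
    """Mimic adding to .libPaths(): put new paths first and skip
    any path that is already in the search list."""
    new = [p for p in extra_paths if p not in lib_paths]
    return new + lib_paths


if __name__ == "__main__":
    env = {ENV_VAR: encode_package_paths("/opt/spark/R/lib", ["/tmp/rpkgs"])}
    sparkr_path, extras = decode_package_paths(env[ENV_VAR])
    print(sparkr_path, extend_lib_paths(["/usr/lib/R/library"], extras))
```

    A consequence of this format is that a path containing a comma cannot be 
represented, so the sketch assumes paths never contain one.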
    
    So the basic change is that we have separate paths for the SparkR package 
itself and for additional R packages. An alternative design is:
    When there are additional R packages, zip them and the SparkR package into 
an archive in a temporary directory. Then we can distribute only one file to 
the cluster, and change No.1 is not needed.
    However, I think this alternative design has disadvantages:
    a. Each time there are additional R packages, the SparkR package has to be 
re-zipped.
    b. With a standalone mode cluster, only the additional R packages actually 
need to be distributed. But in this design, the SparkR package will be 
redistributed together with them, which is unnecessary and a waste of network 
traffic.
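
    The size difference between the two designs can be illustrated with a 
small sketch. It is only an illustration in Python: the package names, file 
names, and placeholder contents are made up, and build_archive is a 
hypothetical helper, not code from the PR.

```python
import io
import zipfile


def build_archive(packages):
    """Zip every file of every package into one in-memory archive.

    `packages` maps a package name to a dict of {relative path: bytes}.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for pkg_name, files in packages.items():
            for rel_path, data in files.items():
                zf.writestr(f"{pkg_name}/{rel_path}", data)
    return buf.getvalue()


# Placeholder contents; real packages would be read from disk.
sparkr = {"SparkR": {"R/SparkR.rdb": b"x" * 1000}}
extra = {"mypkg": {"R/mypkg.rdb": b"y" * 1000}}

# Alternative design: one archive holding SparkR plus the extras,
# rebuilt whenever the set of extra packages changes.
combined = build_archive({**sparkr, **extra})

# Design in this PR: only the extras need to be archived and shipped.
extras_only = build_archive(extra)

if __name__ == "__main__":
    print(len(combined), len(extras_only))
```

    The combined archive is always at least as large as the extras-only one, 
and it must be rebuilt for every new set of additional packages.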
    
    So I think the current PR is more efficient and potentially more flexible.
    
    Whichever design is chosen, change No.2 is necessary for the SparkR shell, 
because there needs to be a way for the SparkR shell to access the additional R 
packages (the location of these packages becomes available after the R shell 
has launched).

