GitHub user sun-rui commented on the pull request:
https://github.com/apache/spark/pull/9390#issuecomment-153225609
@shivaram, there are two changes to the JVM-to-R protocol:
1. The format of the env variable used to convey the SparkR package path to R.
Previously a single path for the SparkR package was conveyed; in this PR, a
comma-separated path list is conveyed. The first element is the path for the
SparkR package, the other is the path for additional R packages specified via
spark-submit command line options.
2. Within sparkR.init(), after launching a JVM backend, a path for
additional R packages is passed from the JVM backend to R. The path is then
added to .libPaths() so that the additional R packages can be loaded within
the R environment.
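To make the two changes concrete, here is a rough sketch of the logic,
written in Python only for illustration (the real code lives on the R and JVM
sides). The function names and the example paths are hypothetical and are not
taken from the PR. It splits a comma-separated path list into the SparkR path
and whatever follows, then puts the extra paths at the front of a library
search path, which is roughly what adding to .libPaths() achieves.

```python
def split_package_paths(value):
    """Split a comma-separated path list into (sparkr_path, extra_paths).

    Hypothetical helper: the first entry is treated as the SparkR package
    path, and any remaining entries as paths for additional R packages.
    """
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("expected at least the SparkR package path")
    return parts[0], parts[1:]


def prepend_lib_paths(current, extra):
    """Return a new search path with `extra` entries first, without duplicates.

    Hypothetical stand-in for adding a path to R's .libPaths().
    """
    merged = []
    for path in list(extra) + list(current):
        if path not in merged:
            merged.append(path)
    return merged


if __name__ == "__main__":
    # Example paths are made up for illustration.
    sparkr_path, extra_paths = split_package_paths("/opt/spark/R/lib,/tmp/rpkgs")
    print(sparkr_path)
    print(prepend_lib_paths(["/usr/lib/R/library"], extra_paths))
```

A list with only one element still works here, so the old single-path form
would be read as "SparkR path, no additional packages".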
So the basic change is that we have separate paths for the SparkR package
itself and additional R packages. An alternative design is:
When there are additional R packages, zip them and the SparkR package into
an archive in a temporary directory. Then we can distribute only one file to
the cluster, and change No.1 is not needed.
However, I think this alternative design has disadvantages:
a. Each time there are additional R packages, the SparkR package has to be
re-zipped.
b. With a standalone mode cluster, only the additional R packages actually
need to be distributed. But in this design, the SparkR package will be
redistributed together with them, which is unnecessary and a waste of network
traffic.
So I think the current PR is more efficient and potentially more flexible.
No matter which design is chosen, change No.2 is necessary for the SparkR
shell, because there needs to be a way for the SparkR shell to access the
additional R packages (the location of these packages is available after the
R shell has launched).