Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/7139#issuecomment-121880406
  
    @brkyvz, could you give more explanation of the usage scenario that this
PR is expected to support?
    
    1. This PR introduces a manifest keyword and a hybrid JAR format containing
both R code and Java classes that the R code may depend on. That feels not so
natural. I would rather (see the sketch after this list):
       - use --jars or --packages to specify JARs containing only Java classes;
       - introduce new spark-submit flags such as --R-src-packages /
--R-binary-packages to let users specify the R packages to be used. No hybrid
format would then be required.
    
    2. This PR installs the R packages only on the local host, which makes it
less useful in a production cluster environment. For example, in YARN cluster or
Standalone cluster mode the R packages still need to be installed on the Driver
node (assuming the DataFrame API is used). So I would hope for support for
distributing R source packages or binary packages to the Driver and worker nodes
(source packages also need to be installed there; a minimal install sketch
follows below). This needs further discussion.
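    A minimal sketch of what such a spark-submit invocation might look like.
The --R-src-packages / --R-binary-packages flags are only a proposal and do
not exist in spark-submit today; the JAR, Maven coordinate, and file names
below are made up for illustration:

        # existing flags carry the Java side; proposed flags carry the R side
        bin/spark-submit \
          --master yarn \
          --jars /path/to/extra-java-classes.jar \
          --packages com.example:some-java-lib:1.0 \
          --R-src-packages myRpkg_0.1.tar.gz \
          --R-binary-packages prebuiltRpkg_0.2.tgz \
          my_script.R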
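    For the install step on each node, a distributed source package could in
principle be installed into a SparkR library path with a plain R CMD INSTALL
call; the library path and package file name here are hypothetical:

        R CMD INSTALL --library=/opt/spark/R/lib myRpkg_0.1.tar.gz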

