GitHub user shivaram opened a pull request:
https://github.com/apache/spark/pull/5096
[SPARK-5654] Integrate SparkR
This pull request integrates SparkR, an R frontend for Spark. The SparkR
package contains both RDD and DataFrame APIs in R and is integrated with
Spark's submission scripts to work on different cluster managers.
Some integration points that would be great to get feedback on:
1. Build procedure: Building SparkR requires R to be installed on the build
machine. Right now we have a new Maven profile `-PsparkR` that can be used to
enable SparkR builds.
2. YARN cluster mode: The R package that is built needs to be present on
the driver and all the worker nodes during execution. The R package location is
currently set using SPARK_HOME, but this might not work in YARN cluster mode.
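As a sketch of the build step described in point 1 (the `-PsparkR` profile comes from this PR; the rest of the Maven invocation is an assumption based on Spark's standard build conventions, not something this PR specifies):

```shell
# Build Spark with the SparkR profile enabled.
# Requires R to be installed on the build machine.
# -DskipTests is the usual flag to skip tests during packaging (assumed here).
mvn -DskipTests -PsparkR package
```

After a successful build, the R package is located relative to SPARK_HOME at runtime, which is the limitation point 2 raises for YARN cluster mode.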
The SparkR package represents the work of many contributors; listed below are
the people along with the areas they worked on:
edwardt (@edwart) - Documentation improvements
Felix Cheung (@felixcheung) - Documentation improvements
Hossein Falaki (@falaki) - Documentation improvements
Chris Freeman (@cafreeman) - DataFrame API, Programming Guide
Todd Gao (@7c00) - R worker Internals
Ryan Hafen (@hafen) - SparkR Internals
Qian Huang (@hqzizania) - RDD API
Hao Lin (@hlin09) - RDD API, Closure cleaner
Evert Lammerts (@evertlammerts) - DataFrame API
Davies Liu (@davies) - DataFrame API, R worker internals, Merging with
Spark
Yi Lu (@lythesia) - RDD API, Worker internals
Matt Massie (@massie) - Jenkins build
Harihar Nahak (@hnahak87) - SparkR examples
Oscar Olmedo (@oscaroboto) - Spark configuration
Antonio Piccolboni (@piccolbo) - SparkR examples, Namespace bug fixes
Dan Putler (@dputler) - DataFrame API, SparkR Install Guide
Ashutosh Raina (@ashutoshraina) - Build improvements
Josh Rosen (@joshrosen) - Travis CI build
Sun Rui (@sun-rui) - RDD API, JVM Backend, Shuffle improvements
Shivaram Venkataraman (@shivaram) - RDD API, JVM Backend, Worker Internals
Zongheng Yang (@concretevitamin) - RDD API, Pipelined RDDs, Examples and
EC2 guide
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/amplab-extras/spark R
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5096.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5096
----
commit 9aa4acfeb2180b5b7c44302e1500d1bfe0639485
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-02-27T18:56:32Z
Merge pull request #184 from davies/socket
[SPARKR-155] use socket in R worker
commit 798f4536d9dfb069e0c8f1bbd1fb24be404a7c14
Author: cafreeman <[email protected]>
Date: 2015-02-27T20:04:22Z
Merge branch 'sparkr-sql' into dev
commit 3b4642980547714373ab1960cb9a096e2fcf233a
Author: Davies Liu <[email protected]>
Date: 2015-02-27T22:07:30Z
Merge branch 'master' of github.com:amplab-extras/SparkR-pkg into random
commit 5ef66fb8b03a635e309a5004a1b411b50f63ef9c
Author: Davies Liu <[email protected]>
Date: 2015-02-27T22:33:07Z
send back the port via temporary file
commit 2808dcfd2c0630625a5aa723cf0dbce642cd8f95
Author: cafreeman <[email protected]>
Date: 2015-02-27T23:54:17Z
Three more DataFrame methods
- `repartition`
- `distinct`
- `sampleDF`
commit cad0f0ca8c11ec5b3412b9926c92e89297a31b0a
Author: cafreeman <[email protected]>
Date: 2015-02-28T00:46:58Z
Fix docs and indents
commit 27dd3a09ce37d8afe385ccda35b425ac5655905c
Author: lythesia <[email protected]>
Date: 2015-02-28T02:00:41Z
modify tests for repartition
commit 889c265ee41f8faf3ee72e253cf019cb3a9a65a5
Author: cafreeman <[email protected]>
Date: 2015-02-28T02:08:18Z
numToInt utility function
Added `numToInt` converter function for allowing numeric arguments when
integers are required. Updated `repartition`.
commit 7b0d070bc0fd18e26d94dfd4dbcc500963faa5bb
Author: lythesia <[email protected]>
Date: 2015-02-28T02:10:35Z
keep partitions check
commit b0e7f731f4c64daac27a975a87b22c7276bbfe61
Author: cafreeman <[email protected]>
Date: 2015-02-28T02:28:08Z
Update `sampleDF` test
commit ad0935ef12fc6639a6ce45f1860d0f62c07ae838
Author: lythesia <[email protected]>
Date: 2015-02-28T02:50:34Z
minor fixes
commit 613464951add64f1f42a1bb814d86c0aa979cc18
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-02-28T03:05:45Z
Merge pull request #187 from cafreeman/sparkr-sql
Three more DataFrame methods
commit 0346e5fc907aab71aef122e6ddc1b96f93d9abbf
Author: Davies Liu <[email protected]>
Date: 2015-02-28T07:05:42Z
address comment
commit a00f5029279ca1e14afb4f1b63d91e946bddfd73
Author: lythesia <[email protected]>
Date: 2015-02-28T07:43:58Z
fix indents
commit e425437d54493d2c687310eb54eb195f01b08252
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-02-28T07:52:49Z
Merge pull request #177 from lythesia/master
[SPARKR-152] Support functions to change number of RDD partitions
(coalesce, repartition)
commit 5c72e73fb9e1971b66e359687807490a8fdc4d40
Author: Davies Liu <[email protected]>
Date: 2015-02-28T08:08:51Z
wait atmost 100 seconds
commit eb8ac119a0e266e656cbd3eeaf44c6722fd66045
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-02-28T08:35:20Z
Set Spark version 1.3.0 in Windows build
commit abb4bb9da2cfc65ccc9d58f3e48cdf8e3ad20a68
Author: Davies Liu <[email protected]>
Date: 2015-02-28T08:38:16Z
add Column and expression
commit ae05bf1c1374e454c98f8a4de716b8d8970f46f3
Author: Davies Liu <[email protected]>
Date: 2015-02-28T08:42:19Z
Merge branch 'sparkr-sql' of github.com:amplab-extras/SparkR-pkg into column
Conflicts:
pkg/R/utils.R
pkg/inst/tests/test_sparkSQL.R
commit 7b7248759c228fe8b0d9418447f8e1fd7f71b723
Author: hlin09 <[email protected]>
Date: 2015-03-01T17:20:37Z
Fix comments.
commit 3f57e56e3f67603bd2fda165370930fd39ad5117
Author: hlin09 <[email protected]>
Date: 2015-03-01T20:43:01Z
Fix comments.
commit 4d36ab10389a6bccb0385a519ce0ce36dfc46696
Author: hlin09 <[email protected]>
Date: 2015-03-01T21:33:53Z
Add tests for broadcast variables.
commit 7afa4c9d31fc3a7e9676a75ac51e0983708ccb1a
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-03-01T22:44:59Z
Merge pull request #186 from hlin09/funcDep3
[SPARKR-142][SPARKR-196] (Step 2) Replaces getDependencies() with
cleanClosure to capture UDF closures and serialize them to worker.
commit 6e51c7ff25388bcf05776fa1ee353401b31b9443
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-03-01T23:00:24Z
Fix stderr redirection on executors
commit 8c4deaedc570c2753a2103d59aba20178d9ef777
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-03-01T23:06:29Z
Remove unused function
commit f7caeb84321f04291214f17a7a6606cb3a0ddee8
Author: Davies Liu <[email protected]>
Date: 2015-03-01T23:11:37Z
Update SparkRBackend.scala
commit b457833ea90575fb11840a18ff616f2d94be2aeb
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-03-01T23:15:05Z
Merge pull request #189 from shivaram/stdErrFix
Fix stderr redirection on executors
commit 862f07c337705337ca8719485e6fe301a711bac7
Author: Shivaram Venkataraman <[email protected]>
Date: 2015-03-01T23:20:35Z
Merge pull request #190 from shivaram/SPARKR-79
[SPARKR-79] Remove unused function
commit 773baf064c923d3f44ea8fdbb5d2f36194245040
Author: Zongheng Yang <[email protected]>
Date: 2015-03-02T00:35:23Z
Merge pull request #178 from davies/random
[SPARKR-204] use random port in backend
commit 5c0bb24bd77a6e1ed4474144f14b6458cdd2c157
Author: Felix Cheung <[email protected]>
Date: 2015-03-02T06:20:41Z
Doc updates: build and running on YARN
----
---