GitHub user sun-rui opened a pull request:
https://github.com/apache/spark/pull/12493
[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR.
## What changes were proposed in this pull request?
dapply() applies an R function to each partition of a DataFrame and returns
a new DataFrame.
The function signature is:
dapply(df, function(localDF) {}, schema = NULL)
R function input: a local data.frame holding the data of one partition on a worker node
R function output: a local data.frame
The schema specifies the Row format of the resulting DataFrame; it must match
the output of the R function.
If the schema is not specified, each partition of the resulting DataFrame is
serialized in R into a single byte array. Such a DataFrame can then be
processed by successive calls to dapply().
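For illustration, a minimal sketch of both modes, assuming an existing
sqlContext and the SparkR schema helpers structType()/structField(); the
data and column names are only examples:

df <- createDataFrame(sqlContext, data.frame(a = 1:10))

# With an explicit schema, which the R function's output must match:
schema <- structType(structField("a", "integer"),
                     structField("a2", "integer"))
doubled <- dapply(df, function(localDF) {
  data.frame(a = localDF$a, a2 = localDF$a * 2L)
}, schema)
head(collect(doubled))

# With schema = NULL (the default), each partition's result remains a
# serialized byte array, so the DataFrame is mainly useful as input to
# another dapply() call:
staged <- dapply(df, function(localDF) { localDF })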
## How was this patch tested?
SparkR unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sun-rui/spark SPARK-12919
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12493.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12493
----
commit ef52c46ef6899248db6a81b11eaf051c22a11d27
Author: Sun Rui <[email protected]>
Date: 2016-04-19T07:29:47Z
[SPARK-12919][SPARKR] Implement dapply() on DataFrame in SparkR.
----