GitHub user NarineK opened a pull request:
https://github.com/apache/spark/pull/12836
[SPARK-12922][SparkR] Implement gapply() on DataFrame in SparkR [WIP]
## What changes were proposed in this pull request?
gapply() applies an R function to each group of rows in a DataFrame, where
the groups are formed by one or more columns, and returns a DataFrame. It is
similar to GroupedDataSet.flatMapGroups() in the Dataset API.
Due to current limitations on data frames, I've implemented and tested
gapply for one column. It can later be extended to multiple columns.
Please let me know what you think and whether you have any ideas for
improving it.
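For reviewers less familiar with the flatMapGroups-style semantics described above, here is a minimal sketch in plain Python. It is not the SparkR implementation or its API: `group_apply`, `mean_of_b`, and the sample rows are hypothetical names chosen for illustration, and rows are modelled as simple dictionaries grouped by a single column, matching the one-column scope of this patch.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]


def group_apply(
    rows: List[Row],
    key_col: str,
    func: Callable[[Hashable, List[Row]], List[Row]],
) -> List[Row]:
    """Group rows by one column, apply func to each group, concatenate results.

    Hypothetical helper: func receives the group key and that group's rows,
    and may return zero or more output rows (flatMap-style).
    """
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)

    result: List[Row] = []
    for key, group_rows in groups.items():
        result.extend(func(key, group_rows))
    return result


def mean_of_b(key: Hashable, group_rows: List[Row]) -> List[Row]:
    """Example per-group function: one output row holding the mean of column 'b'."""
    total = sum(r["b"] for r in group_rows)
    return [{"a": key, "mean_b": total / len(group_rows)}]


# Illustrative data only; not taken from the patch's unit tests.
rows = [
    {"a": 1, "b": 2.0},
    {"a": 1, "b": 4.0},
    {"a": 2, "b": 10.0},
]
out = group_apply(rows, "a", mean_of_b)
```

In this sketch the per-group function decides the shape of the output rows, which is why a distributed version would presumably need the output schema to be known up front; how the patch handles that is not stated here.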
## How was this patch tested?
Unit tests.
1. Primitive test with different column types
2. Add a boolean column
3. Remove columns and do operations
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NarineK/spark gapply2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12836.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12836
----
commit 19dcb2dccd5fca4e9014dac6fb57e38f56165530
Author: NarineK <[email protected]>
Date: 2016-05-02T06:31:15Z
First commit gapply
----