GitHub user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/12836#issuecomment-221109113
@NarineK thank you for working on it, this is a great addition to SparkR!
Regarding the API, it is very close to R's `aggregate` function in the
`stats` package:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/aggregate.html
We should probably follow that function rather than `gapply`. It is part of
the very common `stats` package, so R users are likely more familiar with it
than with `gapply`. Furthermore, its description unifies the two API styles
presented in the ticket, and we can later add support for R formulas, as the
standard `aggregate` does.
One difference from `gapply` is that the argument function `FUN` does not
receive the key as an argument, but this should not be a problem: the
aggregation function can then decide whether or not to use the key of the
group. It should be a small change compared to the current PR.
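To make the difference in callback signatures concrete, here is a minimal,
language-neutral sketch in Python (not SparkR). The helpers `gapply_style`,
`aggregate_style` and `_group`, and the list-of-dicts row format, are
hypothetical illustrations, not SparkR's or R's actual API. It assumes that in
the `aggregate` style the key column is still present in the rows handed to
the function, so the function can read the key or ignore it.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Row = Dict[str, Any]


def _group(rows: List[Row], key_col: str) -> Dict[Hashable, List[Row]]:
    """Bucket rows by the value of key_col, keeping first-seen key order."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)
    return groups


def gapply_style(rows: List[Row], key_col: str,
                 func: Callable[[Hashable, List[Row]], Any]) -> Dict[Hashable, Any]:
    # gapply-style (hypothetical helper): the callback is passed the group key explicitly.
    return {k: func(k, g) for k, g in _group(rows, key_col).items()}


def aggregate_style(rows: List[Row], key_col: str,
                    func: Callable[[List[Row]], Any]) -> Dict[Hashable, Any]:
    # aggregate-style (hypothetical helper, assumed semantics): the callback gets
    # only the group's rows; the key column is still inside each row, so func
    # may read it or ignore it.
    return {k: func(g) for k, g in _group(rows, key_col).items()}


rows = [
    {"dept": "a", "salary": 10},
    {"dept": "b", "salary": 5},
    {"dept": "a", "salary": 20},
]

by_gapply = gapply_style(rows, "dept", lambda key, g: sum(r["salary"] for r in g))
by_aggregate = aggregate_style(rows, "dept", lambda g: sum(r["salary"] for r in g))
print(by_gapply, by_aggregate)  # {'a': 30, 'b': 5} {'a': 30, 'b': 5}
```

When the function does not need the key, both styles give the same result; the
`aggregate`-style signature is simply one argument shorter, and a function that
does need the key can still read it from the group's rows.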