[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

thunterdb Mon, 23 May 2016 15:00:48 -0700

Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/12836#issuecomment-221109113
  
    @NarineK thank you for working on it, this is a great addition to SparkR!
    
    Regarding the API, it is very close to R's `aggregate` function in the 
`stats` package:
    https://stat.ethz.ch/R-manual/R-devel/library/stats/html/aggregate.html
    
    We should probably follow this function rather than `gapply`, because it is 
part of the very common `stats` package, so R users are probably more familiar 
with it than with `gapply`. Furthermore, its description unifies the two 
API styles presented in the ticket, and we can later add support for R 
formulas, as the standard `aggregate` does.
    
    One difference with `gapply` is that the argument function `FUN` does not 
get the key as an argument, but this should not be a problem. The 
aggregation function can then decide whether or not to use the key of the 
group. It should be a small change compared to the current PR.
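
    To make the difference concrete, here is a rough, language-neutral sketch in Python (not SparkR code, and not the PR's implementation). The names `group_rows`, `gapply_style` and `aggregate_style` are hypothetical. It also assumes that the rows handed to the function still contain the grouping column, which is one way the function could recover the key if it wants it.

```python
from collections import defaultdict


def group_rows(rows, key_col):
    """Bucket rows (dicts) by the value of key_col. Hypothetical helper."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)
    return dict(groups)


def gapply_style(rows, key_col, func):
    """gapply-like style: func receives (key, rows_of_group)."""
    return {key: func(key, grp) for key, grp in group_rows(rows, key_col).items()}


def aggregate_style(rows, key_col, fun):
    """aggregate-like style: fun receives only rows_of_group, never the key."""
    return {key: fun(grp) for key, grp in group_rows(rows, key_col).items()}


rows = [
    {"dept": "a", "salary": 10},
    {"dept": "a", "salary": 20},
    {"dept": "b", "salary": 5},
]


def total_salary(grp):
    return sum(r["salary"] for r in grp)


# Same aggregation expressed in both styles.
aggregate_result = aggregate_style(rows, "dept", total_salary)
gapply_result = gapply_style(rows, "dept", lambda key, grp: total_salary(grp))

# Assumption: the grouping column is still present in each row, so a
# function that wants the key can read it from the data itself.
keys_from_rows = aggregate_style(rows, "dept", lambda grp: grp[0]["dept"])
```

    Under that assumption, both styles give the same result here, and dropping the key argument only removes a convenience.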


