[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2016-10-07 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557182#comment-15557182
 ] 

holdenk commented on SPARK-3312:


I'm going to go ahead and close this. Now that `Datasets` are here, they do a 
much better version of this than we could have made with RDDs.

> Add a groupByKey which returns a special GroupBy object like in pandas
> --
>
> Key: SPARK-3312
> URL: https://issues.apache.org/jira/browse/SPARK-3312
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: holdenk
> Priority: Minor
>
> A common pattern which causes problems for new Spark users is using 
> groupByKey followed by a reduce. I'd like to make a special version of 
> groupByKey which returns a groupBy object (like the pandas groupby object). 
> The resulting class would have a number of functions (min, max, stats, reduce) 
> which could all be implemented efficiently.
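
For illustration, the idea in the description can be sketched without Spark. The plain-Python stand-in below is only a sketch: `KeyedGroups` and its methods are hypothetical names, not a Spark or pandas API. It shows why such a grouped object could implement `reduce`, `min`, and `max` efficiently: each value is folded into one running result per key, so no per-key list of values is ever built.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


class KeyedGroups:
    """Hypothetical grouped view over (key, value) pairs; not a Spark API."""

    def __init__(self, pairs: Iterable[Tuple[Hashable, Any]]) -> None:
        self._pairs = list(pairs)

    def reduce(self, func: Callable[[Any, Any], Any]) -> Dict[Hashable, Any]:
        """Fold values per key, keeping a single running result for each key."""
        acc: Dict[Hashable, Any] = {}
        for key, value in self._pairs:
            # The first value for a key seeds the fold; later values are combined.
            acc[key] = func(acc[key], value) if key in acc else value
        return acc

    def min(self) -> Dict[Hashable, Any]:
        """Smallest value per key, expressed as a pairwise reduce."""
        return self.reduce(min)

    def max(self) -> Dict[Hashable, Any]:
        """Largest value per key, expressed as a pairwise reduce."""
        return self.reduce(max)


pairs = [("a", 3), ("b", 7), ("a", 5), ("b", 1)]
groups = KeyedGroups(pairs)
totals = groups.reduce(lambda x, y: x + y)
lowest = groups.min()
highest = groups.max()
```

In Spark itself, this kind of per-key fold is what `reduceByKey` does: it combines values within each partition before the shuffle, instead of collecting every value for a key first.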



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3312) Add a groupByKey which returns a special GroupBy object like in pandas

2015-02-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341727#comment-14341727
 ] 

Sean Owen commented on SPARK-3312:
--

Interesting, is the reduce / max / min in question here by key? We have the 
{{stats()}} method for RDDs of {{Double}} already to take care of this for a 
whole RDD. Rather than add an API method for the by-key case, it's possible to 
use {{StatCounter}} to compute all of these at once over a bunch of values that 
have been collected by key. Does that do the trick or is this something more?
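
As a rough illustration of the suggestion above, the sketch below computes count, mean, variance, min, and max per key in a single pass using a mergeable accumulator. It is plain Python and only imitates the idea behind {{StatCounter}}; the `RunningStats` class and the `stats_by_key` function are hypothetical names and are not Spark's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass
class RunningStats:
    """Hypothetical mergeable accumulator (a stand-in, not Spark's StatCounter)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> "RunningStats":
        """Fold one value in, using Welford's online update."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial results, e.g. from two partitions."""
        if other.count == 0:
            return self
        total = self.count + other.count
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)
        return self

    @property
    def variance(self) -> float:
        """Population variance; NaN when no values have been seen."""
        return self.m2 / self.count if self.count else math.nan


def stats_by_key(pairs: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, RunningStats]:
    """Keep one accumulator per key instead of a list of values per key."""
    out: Dict[Hashable, RunningStats] = {}
    for key, value in pairs:
        out.setdefault(key, RunningStats()).add(value)
    return out


readings = [("x", 1.0), ("x", 3.0), ("y", 10.0)]
by_key = stats_by_key(readings)
```

In Spark, a similar per-key aggregation could presumably be written with `aggregateByKey`, using a {{StatCounter}} as the accumulator and its merge operations as the combine functions.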



