Re: Dataset aggregateByKey equivalent

2016-04-25 Thread Lee Becker
On Sat, Apr 23, 2016 at 8:56 AM, Michael Armbrust wrote:
> Have you looked at aggregators?
>
> https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html

Thanks for the pointer to aggregators. I wasn't yet aware of them. However, I still get similar error

Re: Dataset aggregateByKey equivalent

2016-04-23 Thread Michael Armbrust
Have you looked at aggregators?

https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html

On Fri, Apr 22, 2016 at 6:45 PM, Lee Becker wrote:
> Is there a way to do aggregateByKey on Datasets the way one can on an RDD?
>
> Consider the following RDD code to b

Dataset aggregateByKey equivalent

2016-04-22 Thread Lee Becker
Is there a way to do aggregateByKey on Datasets the way one can on an RDD? Consider the following RDD code to build a set of KeyVals into a DataFrame containing a column with the KeyVals' keys and a column containing lists of KeyVals. The end goal is to join it with collections which will b
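
For readers unfamiliar with aggregateByKey, here is a rough sketch of the semantics the question relies on. It is not the code from the original message (which is cut off above) and it does not use Spark at all: it models partitions as plain Python lists. The KeyVal fields, the aggregate_by_key helper, and the sample data are all hypothetical, and the zero value is passed as a factory here for simplicity, whereas Spark takes a value.

```python
from collections import namedtuple

# Hypothetical record type; the fields of the thread's real KeyVal are not shown.
KeyVal = namedtuple("KeyVal", ["key", "val"])


def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Plain-Python model of aggregateByKey over in-memory partitions."""
    per_partition = []
    for part in partitions:
        acc = {}
        for k, v in part:
            # seq_op folds one value into the partition-local accumulator
            acc[k] = seq_op(acc.get(k, zero()), v)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for k, a in acc.items():
            # comb_op merges accumulators that came from different partitions
            merged[k] = comb_op(merged[k], a) if k in merged else a
    return merged


# Two made-up partitions of (key, KeyVal) pairs
partitions = [
    [("a", KeyVal("a", 1)), ("b", KeyVal("b", 2))],
    [("a", KeyVal("a", 3))],
]

# Collect every KeyVal for a key into one list, as the question describes
grouped = aggregate_by_key(
    partitions,
    zero=list,
    seq_op=lambda acc, kv: acc + [kv],
    comb_op=lambda left, right: left + right,
)
```

The result maps each key to the list of its KeyVals, which corresponds to the two-column shape (key, list of KeyVals) described in the question.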