This would also be possible with an Aggregator in Spark 1.6: https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
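
To make the idea concrete, here is a rough sketch in plain Scala (no Spark dependency) of the zero / reduce / merge / finish shape that an Aggregator is built around. The names SimpleAggregator, CollectNames and AggregatorSketch are made up for this illustration and are not Spark classes. The real Spark API also involves encoders and differs in its details, so treat this as an outline of the pattern rather than code to paste into a job.

```scala
// Plain Scala, no Spark dependency. SimpleAggregator, CollectNames and
// AggregatorSketch are hypothetical names used only for this illustration.
trait SimpleAggregator[I, B, O] {
  def zero: B                 // empty buffer
  def reduce(b: B, a: I): B   // fold one input value into a buffer
  def merge(b1: B, b2: B): B  // combine two partial buffers
  def finish(b: B): O         // turn the final buffer into the result
}

// Collects the distinct names seen for a key, returned in sorted order.
object CollectNames extends SimpleAggregator[String, Set[String], Seq[String]] {
  def zero: Set[String] = Set.empty
  def reduce(b: Set[String], a: String): Set[String] = b + a
  def merge(b1: Set[String], b2: Set[String]): Set[String] = b1 ++ b2
  def finish(b: Set[String]): Seq[String] = b.toSeq.sorted
}

object AggregatorSketch {
  // Aggregates each pretend "partition" on its own, then merges the partial
  // buffers per key, loosely mimicking how a distributed aggregation proceeds.
  def aggregateByKey[K, I, B, O](
      partitions: Seq[Seq[(K, I)]],
      agg: SimpleAggregator[I, B, O]): Map[K, O] = {
    val partials: Seq[Map[K, B]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, B]) { case (acc, (k, v)) =>
        acc.updated(k, agg.reduce(acc.getOrElse(k, agg.zero), v))
      }
    }
    val merged: Map[K, B] = partials.foldLeft(Map.empty[K, B]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, b)) =>
        a.updated(k, a.get(k).map(agg.merge(_, b)).getOrElse(b))
      }
    }
    merged.map { case (k, b) => k -> agg.finish(b) }
  }

  def main(args: Array[String]): Unit = {
    // The example table from the question, split into two pretend partitions.
    val partitions = Seq(
      Seq(1 -> "abc", 2 -> "ab"),
      Seq(1 -> "bdf", 2 -> "cd"))
    val result = aggregateByKey(partitions, CollectNames)
    result.toSeq.sortBy(_._1).foreach { case (id, names) =>
      println(s"$id ${names.mkString("[", ",", "]")}")
    }
    // prints:
    // 1 [abc,bdf]
    // 2 [ab,cd]
  }
}
```

Note that a Set buffer, as in the aggregateByKey suggestion quoted below, drops duplicate names. If duplicates should be kept, a Vector[String] buffer with :+ and ++ would be the corresponding change.
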

On Tue, Jan 5, 2016 at 2:59 PM, Ted Yu <[email protected]> wrote:

> Something like the following:
>
> val zeroValue = collection.mutable.Set[String]()
>
> val aggregated = data.aggregateByKey(zeroValue)((set, v) => set += v,
>   (setOne, setTwo) => setOne ++= setTwo)
>
> On Tue, Jan 5, 2016 at 2:46 PM, Gavin Yue <[email protected]> wrote:
>
>> Hey,
>>
>> For example, a table df with two columns
>> id name
>> 1 abc
>> 1 bdf
>> 2 ab
>> 2 cd
>>
>> I want to group by the id and concat the string into array of string.
>> like this
>>
>> id
>> 1 [abc,bdf]
>> 2 [ab, cd]
>>
>> How could I achieve this in dataframe? I stuck on df.groupBy("id"). ???
>>
>> Thanks
