This would also be possible with an Aggregator in Spark 1.6:
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
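
For comparison, here is a self-contained sketch of the aggregateByKey approach quoted below. It is only an illustration under a few assumptions: the object name, the local master setting, and the sample pair RDD (built with parallelize from the rows in the original question) are all made up for the example. With a real DataFrame you would first map its rows to (id, name) pairs. It also accumulates into an immutable List instead of a mutable Set, so duplicate names are kept.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; the name and the local master are assumptions.
object GroupNamesById {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("group-names-by-id").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample (id, name) pairs taken from the table in the question.
    val data = sc.parallelize(Seq((1, "abc"), (1, "bdf"), (2, "ab"), (2, "cd")))

    // seqOp prepends a value to the per-partition list for its key;
    // combOp concatenates the partial lists built on different partitions.
    val grouped = data.aggregateByKey(List.empty[String])(
      (names, name) => name :: names,
      (left, right) => left ::: right
    )

    // The order of names within each list is not guaranteed.
    grouped.collect().foreach { case (id, names) =>
      println(s"$id ${names.mkString("[", ", ", "]")}")
    }

    sc.stop()
  }
}
```

An immutable zero value avoids any question about sharing a mutable buffer between keys, at the cost of allocating a new list cell per element.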

On Tue, Jan 5, 2016 at 2:59 PM, Ted Yu <[email protected]> wrote:

> Something like the following:
>
> val zeroValue = collection.mutable.Set[String]()
>
> val aggregated = data.aggregateByKey(zeroValue)((set, v) => set += v,
>   (setOne, setTwo) => setOne ++= setTwo)
>
> On Tue, Jan 5, 2016 at 2:46 PM, Gavin Yue <[email protected]> wrote:
>
>> Hey,
>>
>> For example, a table df with two columns
>> id  name
>> 1   abc
>> 1   bdf
>> 2   ab
>> 2   cd
>>
>> I want to group by id and collect the names into an array of strings,
>> like this:
>>
>> id
>> 1 [abc,bdf]
>> 2 [ab, cd]
>>
>> How can I achieve this with a DataFrame? I am stuck at df.groupBy("id"). ???
>>
>> Thanks
>>
>>
>
