You can just repartition by id if the final objective is to have all
data for the same key in the same partition.
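For intuition: hash partitioning gives you this guarantee because the partition is chosen purely from the key, so equal keys always map to the same partition. A minimal plain-Scala sketch of the idea (not Spark itself; Spark's HashPartitioner works along these lines, using a non-negative modulus of the key's hashCode):

```scala
// Sketch: choose a partition from the key alone, so records with the
// same id always land in the same partition.
object PartitionSketch {
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod // keep the index non-negative
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, "abc"), (2, "ab"), (1, "bdf"), (2, "cd"))
    val byPartition = data.groupBy { case (id, _) => partitionFor(id, 4) }
    byPartition.foreach { case (p, recs) => println(s"partition $p -> $recs") }
  }
}
```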
Regards
Sab
On Wed, Jan 6, 2016 at 11:02 AM, Gavin Yue wrote:
I found that in 1.6 a DataFrame can do repartition.
Do I still need to do an orderBy first, or is repartitioning enough?
On Tue, Jan 5, 2016 at 9:25 PM, Gavin Yue wrote:
I tried Ted's solution and it works, but I keep hitting the JVM
out-of-memory problem.
And grouping by the key causes a lot of data shuffling.
So I am trying to order the data by ID first and save it as Parquet. Is
there a way to make sure that the data is partitioned so that each ID's data is
in the same partition?
This would also be possible with an Aggregator in Spark 1.6:
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html
On Tue, Jan 5, 2016 at 2:59 PM, Ted Yu wrote:
Something like the following:
val zeroValue = collection.mutable.Set[String]()
val aggregated = data.aggregateByKey(zeroValue)(
  (set, v) => set += v,
  (setOne, setTwo) => setOne ++= setTwo)
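For readers unfamiliar with aggregateByKey: the first function folds each value into the running set within a partition, and the second merges the per-partition results per key. The same semantics can be sketched on plain Scala collections, no Spark needed (the two-partition split below is an illustrative assumption):

```scala
import scala.collection.mutable

// Simulate aggregateByKey: fold values into a per-key set within each
// "partition" (seqOp), then merge the per-partition maps key by key (combOp).
object AggregateByKeySketch {
  def aggregate(partitions: Seq[Seq[(Int, String)]]): Map[Int, Set[String]] = {
    // seqOp: add each value to the running set for its key
    val perPartition = partitions.map { part =>
      part.foldLeft(Map.empty[Int, mutable.Set[String]]) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, mutable.Set.empty[String]) += v)
      }
    }
    // combOp: merge the per-partition sets for each key
    val merged = perPartition.reduce { (a, b) =>
      b.foldLeft(a) { case (acc, (k, set)) =>
        acc.updated(k, acc.getOrElse(k, mutable.Set.empty[String]) ++= set)
      }
    }
    merged.map { case (k, set) => k -> set.toSet }
  }

  def main(args: Array[String]): Unit =
    println(aggregate(Seq(Seq((1, "abc"), (2, "ab")), Seq((1, "bdf"), (2, "cd")))))
}
```

Note that because the values are collected into sets, duplicate names per id would be dropped; the real aggregateByKey call above has the same behavior.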
On Tue, Jan 5, 2016 at 2:46 PM, Gavin Yue wrote:
Hey,
For example, a table df with two columns:
id name
1 abc
1 bdf
2 ab
2 cd
I want to group by id and concatenate the names into an array of strings,
like this:
id names
1 [abc, bdf]
2 [ab, cd]
How could I achieve this with DataFrames? I am stuck at df.groupBy("id").???
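For reference, the DataFrame-side answer is groupBy plus a collect_list aggregate, i.e. something like df.groupBy("id").agg(collect_list("name")) (in Spark 1.6, collect_list lives in org.apache.spark.sql.functions and, as far as I recall, requires a HiveContext; that is an assumption to verify against your setup). Since that needs a running Spark context, here is the same transformation sketched on plain Scala collections:

```scala
// Plain-Scala sketch of the desired result (no Spark): group (id, name)
// pairs by id and collect the names into a list per id.
object GroupConcatSketch {
  def groupNames(rows: Seq[(Int, String)]): Map[Int, Seq[String]] =
    rows.groupBy(_._1).map { case (id, pairs) => id -> pairs.map(_._2) }

  def main(args: Array[String]): Unit =
    // prints something like Map(1 -> List(abc, bdf), 2 -> List(ab, cd))
    println(groupNames(Seq((1, "abc"), (1, "bdf"), (2, "ab"), (2, "cd"))))
}
```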
Thanks