Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
On 10 February 2016 at 13:04, wrote:
> Hi Viktor,
>
> Try to create a UDF. It's quite simple!
>
> Ardo.
>
>
> On 10 Feb 2016, at 10:34, Viktor ARDELEAN wrote:
>
> Hello,
>
> I want to add a new String column to the dataframe based on an existing
> co
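
The UDF suggestion in this reply could look roughly like the sketch below. It is only an illustration under assumptions: the column name `str` is taken from the `df.str` mentioned in the original question, while the helper names, the target column `str_replaced`, and the replaced substrings ("a" to "b") are hypothetical, since the thread does not say what is being replaced.

```python
def replace_text(value, old="a", new="b"):
    """Plain-Python replacement logic (hypothetical defaults 'a' -> 'b').

    Spark passes SQL NULLs to a Python UDF as None, so guard against that.
    """
    if value is None:
        return None
    return value.replace(old, new)


def add_replaced_column(df, source_col="str", target_col="str_replaced"):
    """Return df with an extra string column computed by a UDF.

    pyspark is imported lazily so that replace_text can be exercised
    without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Wrap the Python function so Spark applies it to each row's value.
    replace_udf = udf(replace_text, StringType())
    return df.withColumn(target_col, replace_udf(df[source_col]))
```

Calling `add_replaced_column(df)` on an existing DataFrame returns a new DataFrame; the original is left unchanged. The Python function runs per row on the executors, which is why it receives ordinary strings rather than a `Column`.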

Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
, line 1, in
AttributeError: 'Column' object has no attribute 'replace'

So in fact I somehow need to get the value of the column df.str in order to call replace on it. Any ideas how to do this?

--
Viktor ARDELEAN
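
The error arises because `df.str` is a `Column`, which is an expression describing a computation rather than a Python string, so string methods such as `replace` are not available on it. One possible way around this without a UDF is the built-in `regexp_replace` function, sketched below. The helper names, the target column `str_replaced`, and the use of `re.escape` to treat the search text literally are assumptions made for illustration; the thread does not state which approach was finally used.

```python
import re


def literal_pattern(text):
    """Escape text so it can be used as a literal in a regular expression.

    Assumption: Python's escaping is also acceptable to the Java regex
    engine Spark uses, which holds for ordinary punctuation.
    """
    return re.escape(text)


def add_replaced_column_builtin(df, old, new,
                                source_col="str", target_col="str_replaced"):
    """Return df with target_col = source_col with `old` replaced by `new`.

    pyspark is imported lazily so literal_pattern can be tested on its own.
    Note: `$` and `\\` in `new` have special meaning in Java replacement
    strings, so this sketch assumes `new` contains neither.
    """
    from pyspark.sql.functions import regexp_replace

    replaced = regexp_replace(df[source_col], literal_pattern(old), new)
    return df.withColumn(target_col, replaced)
```

For example, `add_replaced_column_builtin(df, "a", "b")` builds the replacement as a column expression that Spark evaluates itself, so no Python function has to run per row.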

Pyspark - how to use UDFs with dataframe groupby

2016-02-09 Thread Viktor ARDELEAN
Hello,

I am using the following transformations on an RDD:

    rddAgg = df.map(lambda l: (Row(a = l.a, b= l.b, c = l.c), l))\
        .aggregateByKey([], lambda accumulatorList, value: accumulatorList + [value], lambda list1, list2: [list1] + [list2])

I want to use the dataframe groupBy + agg transforma
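
As a rough sketch of how the grouping above might be written both ways: the first helper keeps the RDD approach, the second swaps in the built-in `collect_list` aggregate rather than a UDF. Several things here are assumptions and not from the thread: a Spark 2.0 or later API (where `collect_list` accepts struct columns and the RDD is reached via `df.rdd`), a plain tuple key in place of `Row`, the function names, and the output column name `rows`. The merge function uses `left + right`; the `[list1] + [list2]` form in the snippet would nest the partial lists, which may or may not be what was intended.

```python
def append_value(acc, value):
    """Sequence function: add one row to the per-partition list."""
    return acc + [value]


def merge_lists(left, right):
    """Combine function: concatenate two partial lists into one flat list."""
    return left + right


def group_rows_rdd(df):
    """RDD version: key each row by (a, b, c) and collect rows per key."""
    return (df.rdd
            .map(lambda row: ((row.a, row.b, row.c), row))
            .aggregateByKey([], append_value, merge_lists))


def group_rows_df(df):
    """DataFrame version: groupBy + agg with the built-in collect_list.

    pyspark is imported lazily so the helpers above can be tested alone.
    """
    from pyspark.sql.functions import collect_list, struct

    # Pack every column of a row into a struct, then gather structs per group.
    return (df.groupBy("a", "b", "c")
              .agg(collect_list(struct(*df.columns)).alias("rows")))
```

`group_rows_df(df)` yields one row per distinct `(a, b, c)` with an array column holding the grouped rows, which stays inside the DataFrame API instead of dropping down to RDDs.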