
Re: Sum array values by row in new column

2016-08-16, Javier Rey

Hi,

Thanks!! This works, but I also need the mean :) I am looking for a way to do it.

Regards.

2016-08-16 5:30 GMT-05:00 ayan guha:
> Here is a more generic way of doing this:
>
> from pyspark.sql import Row
> df = sc.parallelize([[1,2,3,4],[10,20,30]]).map(lambda x:
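One way to get the mean in the same style as ayan guha's sum UDF is a second UDF with a `DoubleType()` return type. A minimal sketch of the per-row function in plain Python, assuming that null arrays, empty arrays and null elements should yield `None` (or be skipped) rather than raise an error; `array_mean` is a hypothetical name, and it would be wrapped as `udf(array_mean, DoubleType())`:

```python
def array_mean(values):
    """Per-row body for a mean UDF (hypothetical helper).

    Returns None for a null or empty array instead of raising.
    """
    if not values:
        return None
    # Skip null elements so they neither break sum() nor inflate the count.
    present = [v for v in values if v is not None]
    if not present:
        return None
    # float() keeps the result fractional even where int / int truncates (Python 2).
    return float(sum(present)) / len(present)


rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print([array_mean(r) for r in rows])  # [20.0, 50.0, 80.0]
```

Without a UDF, another option is to divide the summed column by `pyspark.sql.functions.size(...)` of the array column, bearing in mind that `size` counts null elements as well.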

Re: Sum array values by row in new column

2016-08-16, ayan guha

Here is a more generic way of doing this:

from pyspark.sql import Row
df = sc.parallelize([[1,2,3,4],[10,20,30]]).map(lambda x: Row(numbers=x)).toDF()
df.show()

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
u = udf(lambda c: sum(c), IntegerType())
df1 =
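A small variation on the UDF above, written as plain Python so the per-row logic can be checked without a cluster. The function passed to `udf` receives each row's array as a Python list, or `None` when the value is null, and `sum(None)` raises a `TypeError`. The name `array_sum` is hypothetical; it could be registered the same way, e.g. `udf(array_sum, LongType())`. Since Python ints are usually inferred as `bigint`, `LongType()` may be a better match than `IntegerType()` for large sums.

```python
def array_sum(values):
    """Per-row body for a sum UDF (hypothetical helper).

    Null arrays stay null; empty arrays sum to 0.
    """
    if values is None:
        return None
    # Null elements inside the array also arrive as None; skip them.
    return sum(v for v in values if v is not None)


# Plain-Python check on the rows from the original question.
rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print([array_sum(r) for r in rows])  # [60, 150, 240]
```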

Re: Sum array values by row in new column

2016-08-15, Mike Metzger

Assuming you know the number of elements in the list, this should work:

df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) + df["_1"].getItem(2))

Mike

On Mon, Aug 15, 2016 at 12:02 PM, Javier Rey wrote:
> Hi everyone,
>
> I have one dataframe with one column
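If the array length is known but too long to type out, the same idea can be generated instead. A minimal sketch, assuming Spark SQL's 0-based `column[index]` array indexing inside `pyspark.sql.functions.expr`; the helper `fixed_length_sum_expr` is hypothetical and only builds the expression string:

```python
def fixed_length_sum_expr(column, n):
    """Build 'col[0] + col[1] + ... + col[n-1]' for an array column of known length n.

    Hypothetical helper: the result is meant to be passed to pyspark.sql.functions.expr.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return " + ".join("{}[{}]".format(column, i) for i in range(n))


print(fixed_length_sum_expr("numbers", 3))  # numbers[0] + numbers[1] + numbers[2]
```

Usage would be along the lines of `df.withColumn('total', expr(fixed_length_sum_expr('numbers', 3)))`. As with the `getItem` version, rows whose arrays are shorter than `n` will not get a correct total.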

Sum array values by row in new column

2016-08-15, Javier Rey

Hi everyone,

I have a DataFrame with one column, and this column is an array of numbers. How can I sum each array by row and obtain a new column with the sum, in PySpark?

Example:

+------------+
|     numbers|
+------------+
|[10, 20, 30]|
|[40, 50, 60]|
|[70, 80, 90]|
+------------+

The idea is to obtain