Hi, thanks! This works, but I also need the mean :) I'm still looking for a way to do that.
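Following the UDF pattern from Ayan's reply below, the mean can be computed the same way by dividing the sum by the element count. Here is a minimal sketch: the function bodies are plain Python so they run outside Spark; in PySpark you would wrap them with `udf(...)` as shown in the comments (the return type for the mean would be `DoubleType` rather than `IntegerType`):

```python
# Lambda bodies for two per-row UDFs: sum and mean of an array column.
# In PySpark these would be registered roughly as:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import IntegerType, DoubleType
#   sum_udf  = udf(row_sum,  IntegerType())
#   mean_udf = udf(row_mean, DoubleType())
#   df.withColumn("total", sum_udf(df.numbers)) \
#     .withColumn("mean",  mean_udf(df.numbers))

def row_sum(numbers):
    return sum(numbers)

def row_mean(numbers):
    # Guard against empty arrays to avoid ZeroDivisionError;
    # returning None maps to a SQL NULL in Spark.
    return sum(numbers) / len(numbers) if numbers else None

rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
totals = [row_sum(r) for r in rows]   # [60, 150, 240]
means = [row_mean(r) for r in rows]   # [20.0, 50.0, 80.0]
print(totals, means)
```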
Regards.

2016-08-16 5:30 GMT-05:00 ayan guha <guha.a...@gmail.com>:

> Here is a more generic way of doing this:
>
> from pyspark.sql import Row
> df = sc.parallelize([[1, 2, 3, 4], [10, 20, 30]]).map(lambda x: Row(numbers=x)).toDF()
> df.show()
> from pyspark.sql.functions import udf
> from pyspark.sql.types import IntegerType
> u = udf(lambda c: sum(c), IntegerType())
> df1 = df.withColumn("s", u(df.numbers))
> df1.show()
>
> On Tue, Aug 16, 2016 at 11:50 AM, Mike Metzger <m...@flexiblecreations.com> wrote:
>
>> Assuming you know the number of elements in the list, this should work:
>>
>> df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) +
>> df["_1"].getItem(2))
>>
>> Mike
>>
>> On Mon, Aug 15, 2016 at 12:02 PM, Javier Rey <jre...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I have a dataframe with one column that is an array of numbers. How can I
>>> sum each array by row and obtain a new column with the sum, in PySpark?
>>>
>>> Example:
>>>
>>> +------------+
>>> |     numbers|
>>> +------------+
>>> |[10, 20, 30]|
>>> |[40, 50, 60]|
>>> |[70, 80, 90]|
>>> +------------+
>>>
>>> The idea is to obtain the same df with a new column containing the totals:
>>>
>>> +------------+-----+
>>> |     numbers|total|
>>> +------------+-----+
>>> |[10, 20, 30]|   60|
>>> |[40, 50, 60]|  150|
>>> |[70, 80, 90]|  240|
>>> +------------+-----+
>>>
>>> Regards!
>>>
>>> Samir
>>
>
> --
> Best Regards,
> Ayan Guha
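For reference, the two quoted approaches differ in generality: Mike's `getItem` version hard-codes the number of elements, while Ayan's UDF handles arrays of any length (note his sample data mixes a 4-element and a 3-element row). A plain-Python sketch of the difference, with function names of my own choosing:

```python
def sum_first_three(numbers):
    # Mirrors df["_1"].getItem(0) + getItem(1) + getItem(2):
    # only valid when every row has at least three elements.
    return numbers[0] + numbers[1] + numbers[2]

def sum_all(numbers):
    # Mirrors udf(lambda c: sum(c), IntegerType()): any length works.
    return sum(numbers)

print(sum_all([1, 2, 3, 4]))       # 10
print(sum_first_three([1, 2, 3]))  # 6
# sum_first_three([10, 20]) would raise IndexError,
# and in Spark the fixed getItem sum would yield NULL for short rows.
```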