Hi, thanks!!
This works, but I also need the mean :) I am looking for a way to do that.
Regards.
2016-08-16 5:30 GMT-05:00 ayan guha :
Here is a more generic way of doing this:

from pyspark.sql import Row
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

df = sc.parallelize([[1,2,3,4],[10,20,30]]).map(lambda x: Row(numbers=x)).toDF()
df.show()

u = udf(lambda c: sum(c), IntegerType())
df1 = df.withColumn('total', u(df.numbers))
df1.show()
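For what it's worth, the per-row sum (and the mean asked about at the top of the thread) can be sketched in plain Python, no Spark cluster needed; `rows` here is just a stand-in for the `numbers` array column:

```python
# Plain-Python sketch of what the udf computes per row.
# 'rows' stands in for the 'numbers' array column; names are illustrative.
rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

totals = [sum(r) for r in rows]          # what udf(lambda c: sum(c)) yields per row
means = [sum(r) / len(r) for r in rows]  # the mean requested in the reply above

print(totals)  # [60, 150, 240]
print(means)   # [20.0, 50.0, 80.0]
```

The same udf pattern works for the mean: `udf(lambda c: sum(c) / len(c), DoubleType())`.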
Assuming you know the number of elements in the list, this should work:
df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) + df["_1"].getItem(2))
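The indexing approach above, sketched in plain Python for a known length of three (list names are illustrative):

```python
rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Equivalent of getItem(0) + getItem(1) + getItem(2) per row.
# Only valid when every array has exactly three elements;
# a shorter array would raise an IndexError here (and yield null in Spark).
totals = [r[0] + r[1] + r[2] for r in rows]

print(totals)  # [60, 150, 240]
```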
Mike
On Mon, Aug 15, 2016 at 12:02 PM, Javier Rey wrote:
Hi everyone,

I have a dataframe with one column; this column is an array of numbers. How can I sum each array per row and obtain a new column with the sum, in PySpark?
Example:
+------------+
|     numbers|
+------------+
|[10, 20, 30]|
|[40, 50, 60]|
|[70, 80, 90]|
+------------+
The idea is to obtain a new column containing the sum of each array.
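A plain-Python sketch of the transformation being asked for, pairing each array with its sum (data copied from the example above):

```python
rows = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Desired result: each original array alongside its new 'sum' value.
result = [(r, sum(r)) for r in rows]

print(result)
# [([10, 20, 30], 60), ([40, 50, 60], 150), ([70, 80, 90], 240)]
```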