Issue with PySpark UDF on a column of Vectors

2015-06-18 Thread calstad
I am having trouble using a UDF on a column of Vectors in PySpark, which can be illustrated as follows:

from pyspark import SparkContext
from pyspark.sql import Row
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import udf
from pyspark.mllib.linalg import Vectors
FeatureRow = Row('i
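The preview is cut off before the failing code, so the actual error isn't visible. Under that caveat, here is a minimal sketch of one frequent cause of trouble with a `DoubleType` UDF over a Vector column: returning a NumPy scalar (for example from `v.norm(2)` or arithmetic on `v.toArray()`) instead of a built-in Python `float`, which PySpark has often failed to convert. The sketch assumes a recent PySpark with `SparkSession`. `l2_norm` and `run_spark_example` are hypothetical names used only for illustration.

```python
import math
from typing import Iterable


def l2_norm(values: Iterable[float]) -> float:
    """Euclidean norm; math.sqrt always returns a built-in float, even for NumPy inputs."""
    return math.sqrt(sum(x * x for x in values))


def run_spark_example() -> None:
    # PySpark is imported here (an assumed layout) so l2_norm() works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType
    from pyspark.mllib.linalg import Vectors

    spark = SparkSession.builder.master("local[1]").appName("vector-udf").getOrCreate()
    df = spark.createDataFrame(
        [(1, Vectors.dense([3.0, 4.0])), (2, Vectors.sparse(3, [0], [1.0]))],
        ["id", "features"],
    )
    # toArray() yields a NumPy array; l2_norm() reduces it to a plain float,
    # matching the declared DoubleType. Returning v.norm(2) directly would
    # hand Spark a numpy.float64 instead.
    norm_udf = udf(lambda v: l2_norm(v.toArray()), DoubleType())
    df.withColumn("norm", norm_udf("features")).show()
    spark.stop()
```

Call `run_spark_example()` where PySpark and a Java runtime are available. An explicit `float(v.norm(2))` inside the lambda would serve the same purpose. Keeping the arithmetic in a plain helper just makes the return type easy to check outside Spark.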

Add Custom Aggregate Column to Spark DataFrame

2015-05-28 Thread calstad
I have a Spark DataFrame that looks like:

| id | value | bin |
|----|-------|-----|
| 1  | 3.4   | 2   |
| 2  | 2.6   | 1   |
| 3  | 1.8   | 1   |
| 4  | 9.6   | 2   |

I have a function `f` that takes an array of values and returns a number. I want to add a column to the
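The message is truncated before it says what the new column should hold. Assuming the goal is to attach, to every row, `f` applied to all values sharing that row's `bin`, one sketch collects each bin's values with a window and applies `f` as a UDF. Here `f` is a hypothetical stand-in (the median) for the unspecified function, and `add_bin_aggregate` restates the same logic in plain Python. The Spark part assumes PySpark 2.x or later.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def f(values: Sequence[float]) -> float:
    """Hypothetical stand-in for the thread's `f`: the median, as a built-in float."""
    return float(median(values))


def add_bin_aggregate(rows: List[Tuple[int, float, int]]) -> List[Tuple[int, float, int, float]]:
    """Plain-Python version: group values by bin, apply f, attach the result to each row."""
    by_bin: Dict[int, List[float]] = {}
    for _id, value, bin_ in rows:
        by_bin.setdefault(bin_, []).append(value)
    agg = {b: f(vals) for b, vals in by_bin.items()}
    return [(i, v, b, agg[b]) for i, v, b in rows]


def run_spark_example() -> None:
    # PySpark is imported here (an assumed layout) so the helpers work without Spark installed.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql.functions import collect_list, udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[1]").appName("bin-aggregate").getOrCreate()
    df = spark.createDataFrame(
        [(1, 3.4, 2), (2, 2.6, 1), (3, 1.8, 1), (4, 9.6, 2)], ["id", "value", "bin"]
    )
    f_udf = udf(f, DoubleType())
    # For each row, gather every value in the same bin, then apply f to that array.
    w = Window.partitionBy("bin")
    df.withColumn("f_of_bin", f_udf(collect_list("value").over(w))).show()
    spark.stop()
```

An equivalent alternative is `df.groupBy("bin").agg(collect_list("value"))`, applying the UDF to the collected array and joining the result back on `bin`. The window form avoids the explicit join.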