Have you checked:
- the MLlib doc for Python: https://spark.apache.org/docs/1.6.0/api/python/pyspark.mllib.html#pyspark.mllib.linalg.DenseVector
- the UDF doc: https://spark.apache.org/docs/1.6.0/api/python/pyspark.sql.html#pyspark.sql.functions.udf
You should be fine returning a DenseVector as the return type of the UDF, since it has an associated SQL schema (a vector user-defined type). These are just directions to explore; I haven't used PySpark myself.

On Wed, Jan 27, 2016 at 10:38 AM, Stefan Panayotov <spanayo...@msn.com> wrote:
> Hi,
>
> I have defined a UDF in Scala like this:
>
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
> import org.apache.spark.mllib.linalg.DenseVector
>
> val determineVector = udf((a: Double, b: Double) => {
>   val data: Array[Double] = Array(a, b)
>   val dv = new DenseVector(data)
>   dv
> })
>
> How can I write the corresponding function in Python/PySpark?
>
> Thanks for your help
>
> Stefan Panayotov, PhD
> Home: 610-355-0919
> Cell: 610-517-5586
> email: spanayo...@msn.com
> spanayo...@outlook.com
> spanayo...@comcast.net