Have you checked:
- the MLlib doc for Python: https://spark.apache.org/docs/1.6.0/api/python/pyspark.mllib.html#pyspark.mllib.linalg.DenseVector
- the UDF doc: https://spark.apache.org/docs/1.6.0/api/python/pyspark.sql.html#pyspark.sql.functions.udf
You should be fine returning a DenseVector as the return type of the UDF, since it has an associated SQL schema (a vector user-defined type). These are just directions to explore; I haven't used PySpark myself.

On Wed, Jan 27, 2016 at 10:38 AM, Stefan Panayotov <spanayo...@msn.com> wrote:
> Hi,
>
> I have defined a UDF in Scala like this:
>
> import org.apache.spark.mllib.linalg.Vector
> import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
> import org.apache.spark.mllib.linalg.DenseVector
>
> val determineVector = udf((a: Double, b: Double) => {
>   val data: Array[Double] = Array(a, b)
>   val dv = new DenseVector(data)
>   dv
> })
>
> How can I write the corresponding function in Python/PySpark?
>
> Thanks for your help
>
> Stefan Panayotov, PhD
> Home: 610-355-0919
> Cell: 610-517-5586
> email: spanayo...@msn.com
> spanayo...@outlook.com
> spanayo...@comcast.net