[jira] [Created] (SPARK-30154) Allow PySpark code efficiently convert MLlib vectors to dense arrays

Xiangrui Meng (Jira) Fri, 06 Dec 2019 09:42:48 -0800

Xiangrui Meng created SPARK-30154:
-------------------------------------

             Summary: Allow PySpark code efficiently convert MLlib vectors to 
dense arrays
                 Key: SPARK-30154
                 URL: https://issues.apache.org/jira/browse/SPARK-30154
             Project: Spark
          Issue Type: New Feature
          Components: ML, MLlib, PySpark
    Affects Versions: 3.0.0
            Reporter: Xiangrui Meng



If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame 
into dense arrays, the efficient approach is to do the conversion in the JVM. 
However, that requires the user to write Scala code and register it as a UDF, 
which is often infeasible for a pure Python project.

Instead, we can predefine those converters in Scala and expose them in 
PySpark, e.g.:

{code}
from pyspark.ml.functions import vector_to_dense_array
from pyspark.sql.functions import col

df.select(vector_to_dense_array(col("features")))
{code}
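
For reference, the per-row conversion itself is simple. The sketch below shows it in plain Python, outside Spark, assuming a sparse vector is described by its size, indices, and values (the same three fields MLlib's SparseVector carries). The function name sparse_to_dense is hypothetical and is not part of the proposal. Running logic like this as a Python UDF would work, but every row has to be serialized between the JVM and the Python worker, which is the overhead a JVM-side converter avoids.

```python
from typing import List, Sequence


def sparse_to_dense(
    size: int, indices: Sequence[int], values: Sequence[float]
) -> List[float]:
    """Expand a sparse vector (size, indices, values) into a dense list.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Positions that are not listed in `indices` are implicitly zero.
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = float(v)
    return dense


# A vector of size 4 with non-zero entries at positions 1 and 3.
example = sparse_to_dense(4, [1, 3], [2.0, 5.0])
print(example)
```

A dense vector needs no expansion: its values are already the dense array.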

cc: [~weichenxu123]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

