Xiangrui Meng created SPARK-30154:
-------------------------------------

             Summary: Allow PySpark code efficiently convert MLlib vectors to dense arrays
                 Key: SPARK-30154
                 URL: https://issues.apache.org/jira/browse/SPARK-30154
             Project: Spark
          Issue Type: New Feature
          Components: ML, MLlib, PySpark
    Affects Versions: 3.0.0
            Reporter: Xiangrui Meng

If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, the efficient way is to do the conversion in the JVM. However, that requires the PySpark user to write Scala code and register it as a UDF, which is often infeasible for a pure Python project.

What we can do is predefine those converters in Scala and expose them in PySpark, e.g.:

{code}
from pyspark.sql.functions import col
from pyspark.ml.functions import vector_to_dense_array

df.select(vector_to_dense_array(col("features")))
{code}

cc: [~weichenxu123]
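
For reference, the conversion being discussed amounts to expanding a sparse vector's (size, indices, values) triple into a full array with zeros in the unlisted positions. The sketch below illustrates only that logic in plain Python; `sparse_to_dense` is a hypothetical helper name, not a Spark API, and it does not reflect how the proposed JVM-side converter would be implemented.

```python
from typing import List, Sequence


def sparse_to_dense(size: int,
                    indices: Sequence[int],
                    values: Sequence[float]) -> List[float]:
    """Expand a sparse vector (size, indices, values) into a dense list.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Start from all zeros, then fill in the explicitly stored entries.
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise ValueError(f"index {i} out of range for size {size}")
        dense[i] = float(v)
    return dense
```

Running this row by row in a Python UDF would pay the serialization cost between the JVM and the Python worker for every vector, which is the overhead a predefined Scala converter is meant to avoid.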