[GitHub] [spark] mengxr commented on a change in pull request #27522: [SPARK-30762] Add dtype=float32 support to vector_to_array UDF

GitBox Wed, 12 Feb 2020 10:34:14 -0800

mengxr commented on a change in pull request #27522: [SPARK-30762] Add dtype=float32 support to vector_to_array UDF
URL: https://github.com/apache/spark/pull/27522#discussion_r378435400


 ##########
 File path: mllib/src/test/scala/org/apache/spark/ml/FunctionsSuite.scala
 ##########
 @@ -61,5 +59,34 @@ class FunctionsSuite extends MLTest {
         "`org.apache.spark.ml.linalg.Vector` or `org.apache.spark.mllib.linalg.Vector`, " +
         s"but got ${valType}"))
     }
+
+    val df3 = Seq(
+      (Vectors.dense(1.0, 2.0, 3.0), OldVectors.dense(10.0, 20.0, 30.0)),
+      (Vectors.sparse(3, Seq((0, 2.0), (2, 3.0))), OldVectors.sparse(3, Seq((0, 20.0), (2, 30.0))))
+    ).toDF("vec", "oldVec")
+    val df_array_float = df3.select(
+      vector_to_array('vec, dtype = "float32"), vector_to_array('oldVec, dtype = "float32"))
+
+    // Check values are correct
+    val result3 = df_array_float.as[(Seq[Float], Seq[Float])].collect().toSeq
+
+    val expected3 = Seq(
+      (Seq(1.0, 2.0, 3.0), Seq(10.0, 20.0, 30.0)),
+      (Seq(2.0, 0.0, 3.0), Seq(20.0, 0.0, 30.0))
+    )
+    assert(result3 === expected3)
+
+    // Check data types are correct
+    df_array_double.schema.fields(0).dataType.simpleString == "array<double>"
 
 Review comment:
   * there is no check here: the result of `==` is discarded, so this line can never fail
   * you can assert on the simple string of the entire schema directly:
   
   ~~~
   assert(df.schema.simpleString === "...")
   ~~~
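A minimal sketch of the suggested check, assuming Spark 3.0+ (where `org.apache.spark.ml.functions.vector_to_array` takes a `dtype` argument) and a local `SparkSession`. The object name `VectorToArraySchemaCheck` is hypothetical, and the `.as("vec")` / `.as("oldVec")` aliases are added only so the expected schema string does not depend on auto-generated column names.

```scala
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.sql.SparkSession

// Hypothetical standalone check (assumes Spark 3.0+ on the classpath);
// inside FunctionsSuite the same assertion would use the suite's session instead.
object VectorToArraySchemaCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("vector_to_array-schema-check")
      .getOrCreate()
    import spark.implicits._

    try {
      // Same input rows as df3 in the diff: new-style and old-style (mllib) vectors.
      val df3 = Seq(
        (Vectors.dense(1.0, 2.0, 3.0), OldVectors.dense(10.0, 20.0, 30.0)),
        (Vectors.sparse(3, Seq((0, 2.0), (2, 3.0))), OldVectors.sparse(3, Seq((0, 20.0), (2, 30.0))))
      ).toDF("vec", "oldVec")

      // Alias the outputs so the schema string has predictable field names.
      val dfArrayFloat = df3.select(
        vector_to_array($"vec", dtype = "float32").as("vec"),
        vector_to_array($"oldVec", dtype = "float32").as("oldVec"))

      // Unlike a bare `==` expression, `assert` throws when the schema is wrong,
      // and one comparison covers every column's name and element type.
      val actual = dfArrayFloat.schema.simpleString
      assert(actual == "struct<vec:array<float>,oldVec:array<float>>", s"unexpected schema: $actual")
    } finally {
      spark.stop()
    }
  }
}
```

Comparing the whole `schema.simpleString` catches a wrong element type in any column at once, rather than inspecting `fields(0)` alone.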

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

