GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22391#discussion_r216793201
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -147,4 +147,12 @@ class VectorAssemblerSuite
       .filter(vectorUDF($"features") > 1)
       .count() == 1)
   }
+
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val inputDF = Seq(
+      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
+    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
+    val output = vectorAssembler.transform(inputDF)
+    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
+  }
--- End diff --
Since `inputDF` is not important here, can we minimize the change like the
following? That would make the test look more similar to the original one.
```scala
test("SPARK-25371: VectorAssembler with empty inputCols") {
  val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
  val output = vectorAssembler.transform(Seq(1).toDF("x"))
  assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
}
```