GitHub user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22391#discussion_r216793201
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -147,4 +147,12 @@ class VectorAssemblerSuite
           .filter(vectorUDF($"features") > 1)
           .count() == 1)
       }
    +
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val inputDF = Seq(
    +      (1, Vectors.dense(1.0, 2.0)), (2, Vectors.sparse(2, Array(1), Array(3.0)))).toDF("i", "v")
    +    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
    +    val output = vectorAssembler.transform(inputDF)
    +    assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
    +  }
    --- End diff --
    
    Since `inputDF` is not important here, can we minimize the change as follows?
    This version looks more similar to the original one.
    ```scala
      test("SPARK-25371: VectorAssembler with empty inputCols") {
        val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
        val output = vectorAssembler.transform(Seq(1).toDF("x"))
        assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
      }
    ```


---
