[GitHub] spark pull request #22373: [SPARK-25371][ML] VectorAssembler should not fail...

srowen Sun, 09 Sep 2018 21:04:42 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22373#discussion_r216193411
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
    @@ -256,4 +256,9 @@ class VectorAssemblerSuite
         assert(runWithMetadata("keep", additional_filter = "id1 > 2").count() == 4)
       }
     
    +  test("SPARK-25371: VectorAssembler with empty inputCols") {
    +    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
    --- End diff --
    
    This doesn't sound that useful, but the JIRA suggests it was the behavior
    in 2.2, whereas 2.3 throws a weird error. I could imagine either just
    allowing this behavior or throwing a better exception. Is there a use case
    for no input? Maybe you have some reusable pipeline that is applied to a
    subset of columns, and sometimes it matches nothing. The output is empty,
    but maybe that doesn't matter for whatever purpose it serves; maybe it's
    assembled with something else afterwards. I could picture a valid use case.
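
To make the "sometimes it matches nothing" scenario concrete, here is a minimal plain-Scala sketch with no Spark dependency. The object name, the `selectInputCols` helper, and the column names are all hypothetical, chosen only for illustration; they are not part of the PR or of Spark.

```scala
object EmptyInputColsSketch {
  // Hypothetical helper: a reusable pipeline might choose its feature
  // columns by name prefix rather than from a fixed list.
  def selectInputCols(allCols: Seq[String], prefix: String): Array[String] =
    allCols.filter(_.startsWith(prefix)).toArray

  def main(args: Array[String]): Unit = {
    // Illustrative column names only.
    val cols = Seq("id", "feat_a", "feat_b")

    val matched = selectInputCols(cols, "feat_")
    val unmatched = selectInputCols(cols, "geo_")

    println(matched.mkString(","))
    println(unmatched.isEmpty)
  }
}
```

In this sketch, `unmatched` is the kind of empty array that would end up being passed to `setInputCols`, which is the case the new test exercises.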


