GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22373#discussion_r216193411
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala ---
@@ -256,4 +256,9 @@ class VectorAssemblerSuite
     assert(runWithMetadata("keep", additional_filter = "id1 > 2").count() == 4)
   }
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
--- End diff --
It doesn't sound that useful, but the JIRA suggests this is the behavior in
2.2, whereas 2.3 throws a weird error. I could imagine either allowing this
behavior or throwing a better exception. Is there a use case for no input?
Maybe you have some reusable pipeline that is applied to a subset of columns,
and sometimes that subset matches nothing. The output vector is empty, but
maybe that doesn't matter for whatever purpose it serves, or maybe it's
assembled with something else afterwards. I could picture a valid use case.
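
To make the two options concrete, here is a minimal sketch of the "reusable pipeline over a subset of columns" scenario. The object `AssembleSubset`, its helper names, and the prefix-based column selection are hypothetical and only for illustration. Only `VectorAssembler`, `setInputCols`, `setOutputCol`, `transform`, and `DataFrame.columns` are existing Spark APIs. What the lenient variant does with an empty array depends on the Spark version, as discussed above.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

object AssembleSubset {
  // Hypothetical selection rule: keep columns whose name starts with `prefix`.
  // The result may legitimately be empty.
  def matchingCols(allCols: Array[String], prefix: String): Array[String] =
    allCols.filter(_.startsWith(prefix))

  // Option A: fail fast in caller code with a descriptive message
  // when nothing matches, instead of relying on VectorAssembler's error.
  def assembleStrict(df: DataFrame, prefix: String, outputCol: String): DataFrame = {
    val cols = matchingCols(df.columns, prefix)
    require(cols.nonEmpty,
      s"No input columns match prefix '$prefix'; available: ${df.columns.mkString(", ")}")
    new VectorAssembler().setInputCols(cols).setOutputCol(outputCol).transform(df)
  }

  // Option B: pass whatever matched, possibly nothing, straight through.
  // The outcome for an empty array is version-dependent (see the JIRA).
  def assembleLenient(df: DataFrame, prefix: String, outputCol: String): DataFrame =
    new VectorAssembler()
      .setInputCols(matchingCols(df.columns, prefix))
      .setOutputCol(outputCol)
      .transform(df)
}
```

A caller that assembles the result with other features afterwards would probably want the lenient variant. A caller that treats "no matching columns" as a configuration mistake would want the strict one.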