[ https://issues.apache.org/jira/browse/SPARK-25371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor Alor updated SPARK-25371:
--------------------------------
    Description:
When `VectorAssembler` is given an empty array as its input columns (`inputCols`), it throws an opaque error. In versions before 2.3, `VectorAssembler` simply appends a column containing empty vectors.

{code:java}
import org.apache.spark.ml.feature.VectorAssembler

val inputCols = Array.empty[String]
val outputCol = "A"
val vectorAssembler = new VectorAssembler()
  .setInputCols(inputCols)
  .setOutputCol(outputCol)
// VectorAssembler is a Transformer, so it is applied with transform directly; df is an existing DataFrame
vectorAssembler.transform(df)
{code}

In versions 2.3.0 and later, this throws the exception below:

{code:java}
org.apache.spark.sql.AnalysisException: cannot resolve 'named_struct()' due to data type mismatch: input to function named_struct requires at least one argument;;
{code}

In versions before 2.3, the same call just adds a column containing an empty vector.

I'm not certain whether this is an intentional choice or an actual bug. If it is a bug, `VectorAssembler` should be modified to append an empty vector column when it detects no inputCols.

If it is a design decision, it would be nice to throw a human-readable exception explicitly stating that inputCols must not be empty. The current error is somewhat opaque.

> Vector Assembler with no input columns throws an exception
> ----------------------------------------------------------
>
>                 Key: SPARK-25371
>                 URL: https://issues.apache.org/jira/browse/SPARK-25371
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Victor Alor
>            Priority: Trivial
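
The two outcomes proposed in the description (append an empty vector column, or fail with a readable message) can be approximated on the caller side. The following is only a sketch under stated assumptions: `assembleOrEmpty` and its `allowEmpty` flag are hypothetical names that do not exist in Spark, and this is not necessarily how the issue would be fixed inside `VectorAssembler` itself.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

// Hypothetical caller-side helper; not part of Spark's API.
def assembleOrEmpty(
    df: DataFrame,
    inputCols: Array[String],
    outputCol: String,
    allowEmpty: Boolean = true): DataFrame = {
  if (inputCols.isEmpty) {
    // Option 2 from the description: fail early with a readable message.
    require(allowEmpty, "inputCols must not be empty")
    // Option 1 from the description: append a column of empty vectors.
    val emptyVector = udf(() => Vectors.dense(Array.empty[Double]))
    df.withColumn(outputCol, emptyVector())
  } else {
    new VectorAssembler()
      .setInputCols(inputCols)
      .setOutputCol(outputCol)
      .transform(df)
  }
}
```

With `allowEmpty = false`, an empty `inputCols` raises an `IllegalArgumentException` whose message names the offending parameter instead of surfacing the `named_struct()` analysis error.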