Github user WeichenXu123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19516#discussion_r146546628
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -291,9 +291,13 @@ final class ChiSqSelectorModel private[ml] (
     val featureAttributes: Array[Attribute] = if (origAttrGroup.attributes.nonEmpty) {
       origAttrGroup.attributes.get.zipWithIndex.filter(x => selector.contains(x._2)).map(_._1)
     } else {
-      Array.fill[Attribute](selector.size)(NominalAttribute.defaultAttr)
+      null
--- End diff --
So I suggest keeping the current code and instead improving the error
description at
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/util/MetadataUtils.scala#L66:
```
throw new IllegalArgumentException(s"Feature $idx is marked as" +
  " Nominal (categorical), but it does not have the number of values specified.")
```
Add a tip to this message telling the user that they can process categorical
features with `VectorIndexer` first to resolve this issue.
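
For illustration, the extended message could look roughly like the sketch
below. This is not the actual `MetadataUtils` code: the `NominalFeatureError`
object, its `message` and `fail` helpers, and the exact wording of the tip are
all hypothetical, and the snippet is kept free of Spark dependencies so it can
be tried on its own.

```scala
// Hypothetical standalone sketch; the real check lives in MetadataUtils.
object NominalFeatureError {

  // Builds the original message plus an assumed tip about VectorIndexer.
  def message(idx: Int): String =
    s"Feature $idx is marked as" +
      " Nominal (categorical), but it does not have the number of values specified." +
      " Consider using VectorIndexer to index categorical features first."

  // Throws the same exception type as the snippet quoted above.
  def fail(idx: Int): Nothing =
    throw new IllegalArgumentException(message(idx))
}
```

The tip is appended as a separate sentence so the existing first part of the
message stays unchanged for anyone matching on it.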
---