Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19516#discussion_r146546628
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
    @@ -291,9 +291,13 @@ final class ChiSqSelectorModel private[ml] (
         val featureAttributes: Array[Attribute] = if (origAttrGroup.attributes.nonEmpty) {
           origAttrGroup.attributes.get.zipWithIndex.filter(x => selector.contains(x._2)).map(_._1)
         } else {
    -      Array.fill[Attribute](selector.size)(NominalAttribute.defaultAttr)
    +      null
    --- End diff --
    
    I suggest keeping the current code and instead improving the error message at https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/util/MetadataUtils.scala#L66:
    ```
     throw new IllegalArgumentException(s"Feature $idx is marked as" +
       " Nominal (categorical), but it does not have the number of values specified.")
    ```
    Add a hint to this message telling the user that they can run `VectorIndexer` on the categorical features first to resolve the issue.
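
    The extended message could look roughly like the sketch below. It is a minimal, Spark-free illustration rather than the actual `MetadataUtils` code: the object `NominalCheckSketch`, the helper `requireNumValues`, and the exact wording of the hint are all hypothetical.

    ```scala
    // Hypothetical standalone sketch; not the real MetadataUtils implementation.
    object NominalCheckSketch {
      // Returns the number of categories for a nominal feature, or fails with a
      // message that also points the user at VectorIndexer.
      def requireNumValues(idx: Int, numValues: Option[Int]): Int =
        numValues.getOrElse {
          throw new IllegalArgumentException(s"Feature $idx is marked as" +
            " Nominal (categorical), but it does not have the number of values specified." +
            " Consider using VectorIndexer to index the categorical features first.")
        }
    }
    ```

    With this shape, `NominalCheckSketch.requireNumValues(2, None)` would throw an exception whose message names both the feature index and `VectorIndexer`.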

