[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-27 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-699726267 Thanks @srowen @viirya @huaxingao for reviewing!

[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-26 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-699571847 > This is still 'required' right? Yes. This algorithm was designed to have this requirement. But I guess `must` may suggest an exception/error, so what about

[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-25 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698693191

[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-24 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698722302 @huaxingao Great catch! Yes, I need to modify https://github.com/apache/spark/pull/29850 to make sure only columns in `inputCols` can be taken into account. Thanks!
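To make the point about `inputCols` concrete, here is a minimal plain-Scala sketch of the idea that only categorical columns also listed in `inputCols` should be taken into account. It is not Spark's code and not the change made in either PR; the helper `partitionCategorical` and the column names are hypothetical, chosen only for illustration.

```scala
// Hypothetical helper, NOT part of Spark: splits the requested categorical
// columns into those that also appear in inputCols and those that do not.
object CategoricalColsCheck {
  def partitionCategorical(
      inputCols: Seq[String],
      categoricalCols: Seq[String]): (Seq[String], Seq[String]) = {
    val inputs = inputCols.toSet
    // First element: columns that can be taken into account.
    // Second element: columns that are not inputs and would be ignored.
    categoricalCols.partition(inputs.contains)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative column names (assumed, not taken from the thread).
    val inputCols = Seq("real", "int")
    val categoricalCols = Seq("real", "string")

    val (used, ignored) = partitionCategorical(inputCols, categoricalCols)
    println(s"used: $used, ignored: $ignored")
  }
}
```

A caller could log or warn about the second element of the result instead of throwing, which matches the concern in the thread that wording like `must` may suggest an exception.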

[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-24 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698693361 friendly ping @srowen @huaxingao

[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols

2020-09-24 Thread GitBox
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698693191 repl:
```
import org.apache.spark.ml.feature._
import org.apache.spark.ml.linalg.{Vector, Vectors}
val df = Seq((2.0, 1, "foo"),(3.0, 2, "bar")).toDF("real",
```