GitHub user ymazari commented on a diff in the pull request:
https://github.com/apache/spark/pull/20367#discussion_r163358747
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -169,7 +201,7 @@ class CountVectorizer @Since("1.5.0") (@Since("1.5.0") override val uid: String)
       }.reduceByKey { case ((wc1, df1), (wc2, df2)) =>
         (wc1 + wc2, df1 + df2)
       }.filter { case (word, (wc, df)) =>
-        df >= minDf
+        (df >= minDf) && (df <= maxDf)
--- End diff --
> from @mgaido91: nit: the parentheses are not needed
Right. I added them for clarity.
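
For reference, a minimal sketch of why the two forms are equivalent: in Scala, `>=` and `<=` bind tighter than `&&`, so the parentheses only affect readability. This is not the PR code. Plain Scala collections stand in for the RDD, and the object name `DfFilterSketch`, the threshold values, and the sample counts are all made up for illustration.

```scala
// Sketch only: plain Scala collections stand in for the RDD used in the diff.
object DfFilterSketch {
  // Hypothetical thresholds; the values here are chosen only for the example.
  val minDf: Double = 2.0
  val maxDf: Double = 3.0

  // Predicate without parentheses: comparisons bind tighter than &&.
  def keep(df: Long): Boolean = df >= minDf && df <= maxDf

  // Same predicate with the explicit parentheses from the diff.
  def keepParenthesized(df: Long): Boolean = (df >= minDf) && (df <= maxDf)

  // (word, (wordCount, documentFrequency)) pairs, mirroring the shape in the diff.
  val counts: Seq[(String, (Long, Long))] = Seq(
    ("a", (10L, 5L)), // df above maxDf, dropped
    ("b", (4L, 3L)),  // df within range, kept
    ("c", (2L, 2L)),  // df within range, kept
    ("d", (1L, 1L))   // df below minDf, dropped
  )

  // Keep only the words whose document frequency lies in [minDf, maxDf].
  val kept: Seq[String] = counts.collect {
    case (word, (_, df)) if keep(df) => word
  }
}
```

Both predicates select the same terms, so keeping or dropping the parentheses is purely a style choice.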
---