Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20777#discussion_r173336451
--- Diff: python/pyspark/ml/feature.py ---
@@ -455,6 +506,12 @@ class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable,
         " If this is an integer >= 1, this specifies the number of documents the term must" +
         " appear in; if this is a double in [0,1), then this specifies the fraction of documents." +
         " Default 1.0", typeConverter=TypeConverters.toFloat)
+    maxDF = Param(
+        Params._dummy(), "maxDF", "Specifies the minimum number of" +
+        " different documents a term must appear in to be included in the vocabulary." +
+        " If this is an integer >= 1, this specifies the number of documents the term must" +
+        " appear in; if this is a double in [0,1), then this specifies the fraction of documents." +
+        " Default (2^63) - 1", typeConverter=TypeConverters.toFloat)
--- End diff ---
I think this documentation is exactly the same as `minDF`; please refer to
the Scala docs. Actually, I think the Scala doc is a little confusing and
could be clearer. Would you like to take a shot at rewording it?
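
For reference, here is a minimal pure-Python sketch of the minDF/maxDF threshold semantics described in the docstring (a value >= 1 is an absolute document count, a value in [0, 1) is a fraction of documents). This is only an illustration of the intended behavior, not Spark's implementation; the function name is made up:

```python
# Sketch of the minDF/maxDF semantics described in the docstring above.
# Plain Python for illustration only -- not Spark's actual implementation.

def included_in_vocabulary(doc_freq, num_docs, min_df=1.0, max_df=2**63 - 1):
    """Return True if a term with document frequency `doc_freq` passes
    the minDF/maxDF thresholds.

    A threshold >= 1 is an absolute number of documents; a threshold in
    [0, 1) is interpreted as a fraction of the total document count.
    """
    min_count = min_df if min_df >= 1 else min_df * num_docs
    max_count = max_df if max_df >= 1 else max_df * num_docs
    return min_count <= doc_freq <= max_count

# Toy corpus: count how many documents each term appears in.
docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
num_docs = len(docs)
doc_freq = {}
for doc in docs:
    for term in set(doc):
        doc_freq[term] = doc_freq.get(term, 0) + 1

# "a" appears in all 3 docs, so max_df=0.7 (70% of docs) filters it out;
# "c" and "d" appear in only 1 doc, so min_df=2 filters them out.
vocab = sorted(t for t, df in doc_freq.items()
               if included_in_vocabulary(df, num_docs, min_df=2, max_df=0.7))
print(vocab)  # ['b']
```

The default maxDF of (2^63) - 1 is effectively "no upper bound", which is why it never filters anything unless the user sets it.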
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]