Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20777#discussion_r173336451
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -455,6 +506,12 @@ class CountVectorizer(JavaEstimator, HasInputCol, HasOutputCol, JavaMLReadable,
             " If this is an integer >= 1, this specifies the number of documents the term must" +
             " appear in; if this is a double in [0,1), then this specifies the fraction of documents." +
             " Default 1.0", typeConverter=TypeConverters.toFloat)
    +    maxDF = Param(
    +        Params._dummy(), "maxDF", "Specifies the minimum number of" +
    +        " different documents a term must appear in to be included in the vocabulary." +
    +        " If this is an integer >= 1, this specifies the number of documents the term must" +
    +        " appear in; if this is a double in [0,1), then this specifies the fraction of documents." +
    +        " Default (2^63) - 1", typeConverter=TypeConverters.toFloat)
    --- End diff ---
    
    I think this documentation is exactly the same as `minDF`'s; please refer to 
the Scala docs.  Actually, I think the Scala doc is a little confusing and 
could be clearer.  Would you like to take a shot at rewording it?
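    
    For reference, the semantics both docstrings are trying to describe (an 
integer >= 1 is a document count, a double in [0,1) is a fraction of the 
corpus, with `minDF` a lower bound and `maxDF` an upper bound) can be 
sketched in plain Python.  This is only an illustrative model of the 
filter, not Spark's implementation, and `keep_term` is a hypothetical 
helper:
    
    ```python
    def keep_term(doc_freq, num_docs, min_df=1.0, max_df=2**63 - 1):
        """Illustrative sketch: True if a term appearing in `doc_freq` of
        `num_docs` documents passes a minDF/maxDF-style vocabulary filter."""
        def to_count(threshold):
            # Values in [0, 1) are read as a fraction of the corpus;
            # values >= 1 are read as an absolute document count.
            return threshold * num_docs if 0 <= threshold < 1 else threshold
        return to_count(min_df) <= doc_freq <= to_count(max_df)
    
    # A term in 9 of 10 documents passes the default bounds, but is dropped
    # once maxDF caps terms at half the corpus.
    keep_term(9, 10)              # passes: defaults keep nearly everything
    keep_term(9, 10, max_df=0.5)  # dropped: 9 > 0.5 * 10
    ```
    
    A reworded doc might lean on this framing: both params accept the same 
two kinds of thresholds, and differ only in which bound they set.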


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
