Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20777#discussion_r175184503
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -70,19 +70,21 @@ private[feature] trait CountVectorizerParams extends Params with HasInputCol wit
       def getMinDF: Double = $(minDF)
     
       /**
    -   * Specifies the maximum number of different documents a term must appear in to be included
    -   * in the vocabulary.
    -   * If this is an integer greater than or equal to 1, this specifies the number of documents
    -   * the term must appear in; if this is a double in [0,1), then this specifies the fraction of
    -   * documents.
    +   * Specifies the maximum number of different documents a term could appear in to be included
    +   * in the vocabulary. A term that appears more than the threshold will be ignored. If this is an
    +   * integer greater than or equal to 1, this specifies the maximum number of documents the term
    +   * could appear in; if this is a double in [0,1), then this specifies the maximum fraction of
    +   * documents the term could appear in.
        *
    -   * Default: (2^64^) - 1
    +   * Default: (2^63) - 1
    --- End diff --
    
    I think the Scaladoc format actually needs the closing '^' (i.e. `(2^63^) - 1`) to display correctly as a superscript; see the `vocabSize` default.
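The `maxDF` semantics in the revised doc comment can be sketched as a small standalone Scala helper. This is a hypothetical illustration, not Spark's actual `CountVectorizer` code: `MaxDfSketch.keepTerm` is an invented name, and it assumes a term is kept when its document frequency is at most the threshold. Its own doc comment uses the Scaladoc `^...^` superscript syntax discussed above.

```scala
// Hypothetical standalone sketch of the maxDF semantics described in the
// CountVectorizer doc comment; not Spark's actual implementation.
object MaxDfSketch {

  /**
   * Returns true if a term appearing in `docFreq` of `numDocs` documents
   * passes the `maxDF` threshold.
   *
   * Default threshold: (2^63^) - 1, i.e. `Long.MaxValue`.
   * (In Scaladoc, the `^...^` pair renders its contents as a superscript.)
   */
  def keepTerm(docFreq: Long, numDocs: Long, maxDF: Double = Long.MaxValue.toDouble): Boolean = {
    require(maxDF >= 0.0, "maxDF must be non-negative")
    // >= 1: absolute document count; in [0, 1): fraction of all documents
    val maxDocs = if (maxDF >= 1.0) maxDF else maxDF * numDocs
    // Terms appearing in more documents than the threshold are ignored
    docFreq <= maxDocs
  }

  def main(args: Array[String]): Unit = {
    println(keepTerm(docFreq = 3, numDocs = 10, maxDF = 2.0)) // false: 3 > 2 documents
    println(keepTerm(docFreq = 3, numDocs = 10, maxDF = 0.5)) // true: 3 <= 0.5 * 10
    println(keepTerm(docFreq = 3, numDocs = 10))              // true: default is effectively unbounded
    println(BigInt(2).pow(63) - 1 == BigInt(Long.MaxValue))   // true: Long.MaxValue is 2^63 - 1
  }
}
```

The last line checks the numeric point behind the diff: `Long.MaxValue` is 2^63^ - 1, not 2^64^ - 1, so only the superscript markup needs restoring.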

