[GitHub] spark pull request #20777: [SPARK-23615][ML][PYSPARK]Add maxDF Parameter to ...

srowen Thu, 15 Mar 2018 12:52:22 -0700

GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20777#discussion_r174911085
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -70,19 +70,21 @@ private[feature] trait CountVectorizerParams extends Params with HasInputCol wit
       def getMinDF: Double = $(minDF)
     
       /**
    -   * Specifies the maximum number of different documents a term must appear in to be included
    -   * in the vocabulary.
    -   * If this is an integer greater than or equal to 1, this specifies the number of documents
    -   * the term must appear in; if this is a double in [0,1), then this specifies the fraction of
    -   * documents.
    +   * Specifies the maximum number of different documents a term could appear in to be included
    +   * in the vocabulary. A term that appears more than the threshold will be ignored. If this is an
    +   * integer greater than or equal to 1, this specifies the maximum number of documents the term
    +   * could appear in; if this is a double in [0,1), then this specifies the maximum fraction of
    +   * documents the term could appear in.
    --- End diff --
    
    Agree, your wording is clearer.
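
For readers skimming the thread, the semantics described in the revised doc comment can be sketched roughly as below. This is a minimal plain-Python illustration, not Spark's implementation: the helper names `resolve_max_df` and `filter_vocabulary` are hypothetical, and the sketch assumes a term is kept when its document frequency is at or below the threshold, which is how the sentence "a term that appears more than the threshold will be ignored" reads.

```python
from collections import Counter


def resolve_max_df(max_df, num_docs):
    """Turn a maxDF-style setting into an absolute document-count threshold.

    Hypothetical helper: a value >= 1 is read as a document count,
    a value in [0, 1) as a fraction of all documents.
    """
    if max_df < 0:
        raise ValueError("max_df must be non-negative")
    if max_df >= 1.0:
        return float(max_df)
    return max_df * num_docs


def filter_vocabulary(docs, max_df):
    """Keep terms whose document frequency does not exceed the threshold."""
    # Document frequency: number of distinct documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    threshold = resolve_max_df(max_df, len(docs))
    return sorted(term for term, df in doc_freq.items() if df <= threshold)


docs = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["a"]]

# Absolute count: drop terms found in more than 2 documents.
print(filter_vocabulary(docs, 2))

# Fraction: drop terms found in more than 25% of the documents.
print(filter_vocabulary(docs, 0.25))
```

In this toy corpus "a" occurs in every document, so both settings drop it, while the fractional setting also drops "b" because it occurs in half of the documents.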


