GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20777#discussion_r174911085
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
@@ -70,19 +70,21 @@ private[feature] trait CountVectorizerParams extends Params with HasInputCol wit
 def getMinDF: Double = $(minDF)
 /**
- * Specifies the maximum number of different documents a term must appear in to be included
- * in the vocabulary.
- * If this is an integer greater than or equal to 1, this specifies the number of documents
- * the term must appear in; if this is a double in [0,1), then this specifies the fraction of
- * documents.
+ * Specifies the maximum number of different documents a term could appear in to be included
+ * in the vocabulary. A term that appears more than the threshold will be ignored. If this is an
+ * integer greater than or equal to 1, this specifies the maximum number of documents the term
+ * could appear in; if this is a double in [0,1), then this specifies the maximum fraction of
+ * documents the term could appear in.
--- End diff --
Agree, your wording is clearer.
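
For reference, a minimal sketch of the threshold semantics the new wording describes. This is not Spark's implementation: `MaxDFSketch`, `resolveMaxDF`, and `filterVocab` are hypothetical names, and the sketch assumes a value >= 1 is an absolute document count while a value in [0,1) is a fraction of the corpus size.

```scala
// Hypothetical illustration of the documented maxDF semantics; not Spark code.
object MaxDFSketch {
  // Assumption: maxDF >= 1.0 is an absolute document count,
  // and a value in [0, 1) is a fraction of numDocs.
  def resolveMaxDF(maxDF: Double, numDocs: Long): Double =
    if (maxDF >= 1.0) maxDF else maxDF * numDocs

  // Keep only terms whose document frequency does not exceed the threshold;
  // a term that appears in more documents than the threshold is ignored.
  def filterVocab(docFreq: Map[String, Long], maxDF: Double, numDocs: Long): Set[String] = {
    val limit = resolveMaxDF(maxDF, numDocs)
    docFreq.collect { case (term, df) if df <= limit => term }.toSet
  }
}
```

With 10 documents, a hypothetical `maxDF` of 0.5 resolves to a limit of 5 documents, so a term appearing in all 10 documents would be dropped while one appearing in 5 would be kept.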