GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/20777
[SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python CountVectorizer
## What changes were proposed in this pull request?
The maxDF parameter filters out terms that occur in too many documents. This
param was recently added to the Scala CountVectorizer and needs to be added to
the Python API as well.
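
For illustration only, here is a minimal pure-Python sketch of the filtering that maxDF is meant to perform. It is not the code in this patch and does not use Spark. The helper name `filter_vocab_by_max_df` is hypothetical, and the rule that a value >= 1 is an absolute document count while a value below 1 is a fraction of the corpus is an assumption based on how the Scala param is understood to behave.

```python
from collections import Counter


def filter_vocab_by_max_df(docs, max_df):
    """Hypothetical helper: keep terms whose document frequency is <= the threshold."""
    num_docs = len(docs)
    # Assumption: values >= 1 are an absolute document count,
    # values below 1 are a fraction of the total number of documents.
    threshold = max_df if max_df >= 1.0 else max_df * num_docs
    # Count each term at most once per document (document frequency).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in doc_freq.items() if count <= threshold)


docs = [["a", "b", "c"], ["a", "b"], ["a"]]
print(filter_vocab_by_max_df(docs, 2))    # "a" appears in 3 documents and is dropped
print(filter_vocab_by_max_df(docs, 0.5))  # threshold is 1.5 documents
```

In this toy corpus, "a" appears in every document, so any threshold below 3 documents removes it from the vocabulary.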
## How was this patch tested?
Added a doctest.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-23615
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20777.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20777
----
commit cbf70bb9ff874af3b6fa76871798767c0174c266
Author: Huaxin Gao <huaxing@...>
Date: 2018-03-08T22:29:32Z
[SPARK-23615][ML][PYSPARK]Add maxDF Parameter to Python CountVectorizer
----