Github user hhbyyh commented on the pull request:

    https://github.com/apache/spark/pull/7388#issuecomment-131104897
  
    Thanks @jkbradley for the update. Copy the comment of Joseph from 
https://github.com/hhbyyh/spark/pull/3 to here:
    
    Updates:
    
    Renamed "minCount" to "minTokenCount"
    Added "minTermFreq" back, including unit test
    Moved all Params to include in both Estimator and Model so that they can be 
viewed in either.
    New thoughts:
    
    Previously, we had decided to call "minTermFreq" a "frequency" even though 
it is a count. That was to follow other things such as sklearn and "TF-IDF."
    Now, I would like to rename it to "minTermCount." Yuhao got me thinking 
along these lines, and it actually matches other libraries to some extent:
    sklearn calls it "min_tf," but it accepts both integers (counts) and 
doubles (fractions or frequencies). Since we cannot take both the way Params 
are set up, the best we could do is have a minTermCount Param and a minTermFreq 
Param. Frequency might actually be a better way to specify this threshold.
    TF-IDF arguably uses fractional frequencies (not counts)...depending on 
whether "TF-IDF" refers to the value or the normalizer.
    Proposal: I propose we do 1 of the following:
    
    (A) Rename "minTermFreq" to "minTermCount"
    (B) Change "minTermFreq" to take a Double fraction
    (C) Remove "minTermFreq" for now pending further discussion
    CC: @hhbyyh @mengxr What do you think?
    
    @hhbyyh Feel free to merge, take parts, or modify as you see fit. It may be 
easiest if you go ahead and merge/modify so that @mengxr can view your full PR.
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to