[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387748#comment-14387748 ]
Littlestar edited comment on SPARK-6239 at 3/31/15 1:09 AM:
------------------------------------------------------------

>>>I would imagine a relative value is more usually useful.

When recnum=12345678 and minsupport=0.003, recnum*minsupport is near an integer. Some results that differ only slightly are lost because of double precision.

was (Author: cnstar9988):
>>If I want to set minCount=2, I must use .setMinSupport(1.99/(rdd.count())),
>>because of double's precision.

How can I reopen this PR and mark it as related to pull/5246? Thanks.

> Spark MLlib fpm#FPGrowth minSupport should use long instead
> -----------------------------------------------------------
>
>                 Key: SPARK-6239
>                 URL: https://issues.apache.org/jira/browse/SPARK-6239
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Littlestar
>            Priority: Minor
>
> Spark MLlib fpm#FPGrowth minSupport should use long instead
> ==============
> val minCount = math.ceil(minSupport * count).toLong
> because:
> 1. [count] The number of records in the dataset is not known before it is read.
> 2. [minSupport] A double has limited precision.
> from mahout#FPGrowthDriver.java
> addOption("minSupport", "s", "(Optional) The minimum number of times a
> co-occurrence must be present."
> + " Default Value: 3", "3");
> I just want to set minCount=2 for a test.
> Thanks.
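
The workaround mentioned in the comment (passing 1.99/count to get a minimum count of 2) can be sketched in Scala roughly as follows. This is only an illustration under stated assumptions, not the MLlib source: minCount copies the formula quoted in the issue description, while the object name MinSupportSketch and the helper supportFor are hypothetical names introduced here. The helper aims half a unit below the wanted count so that a small rounding error in the division or multiplication cannot push math.ceil up to the next integer.

```scala
object MinSupportSketch {

  // Formula quoted in the issue description: relative support -> absolute count.
  def minCount(minSupport: Double, count: Long): Long =
    math.ceil(minSupport * count).toLong

  // Hypothetical helper (not part of Spark): choose a relative support that
  // maps back to exactly `targetCount` under the formula above.
  // Aiming at targetCount - 0.5 leaves a margin far larger than any
  // double rounding error for realistic dataset sizes.
  def supportFor(targetCount: Long, count: Long): Double = {
    require(targetCount >= 1 && count >= 1, "targetCount and count must be positive")
    (targetCount - 0.5) / count
  }

  def main(args: Array[String]): Unit = {
    val count = 12345678L               // record count used in the comment
    val support = supportFor(2L, count) // relative support intended to mean "at least 2"
    println(s"minSupport=$support -> minCount=${minCount(support, count)}")
  }
}
```

The record count still has to be known before the support can be computed, so this sketch addresses only the precision point (2) of the issue, not point (1).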