[ https://issues.apache.org/jira/browse/SPARK-11585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996174#comment-14996174 ]
zhengruifeng edited comment on SPARK-11585 at 11/9/15 8:11 AM:
---------------------------------------------------------------

I have implemented it based on Apriori's rule-generation algorithm:
https://github.com/zhengruifeng/spark-rules

It's compatible with fpm's APIs.

import org.apache.spark.mllib.fpm._

// Each input line is one transaction of space-separated integer item IDs.
val data = sc.textFile("hdfs://ns1/whale/T40I10D100K.dat")
val transactions = data.map(s => s.trim.split(' ').map(_.toInt)).persist()

// Mine the frequent itemsets with FP-growth.
val fpg = new FPGrowth().setMinSupport(0.01)
val model = fpg.run(transactions)

// Generate rules from the frequent itemsets, with consequents of up to 15 items.
val ar = new AprioriRules().setMinConfidence(0.1).setMaxConsequent(15).setNumPartitions(10)
val results = ar.run(model.freqItemsets)

It outputs rule-generation information like this:

15/11/04 11:28:46 INFO AprioriRules: Candidates for 1-consequent rules : 312917
15/11/04 11:28:58 INFO AprioriRules: Generated 1-consequent rules : 306703
15/11/04 11:29:10 INFO AprioriRules: Candidates for 2-consequent rules : 707747
15/11/04 11:29:35 INFO AprioriRules: Generated 2-consequent rules : 704000
15/11/04 11:29:55 INFO AprioriRules: Candidates for 3-consequent rules : 1020253
15/11/04 11:30:38 INFO AprioriRules: Generated 3-consequent rules : 1014002
15/11/04 11:31:14 INFO AprioriRules: Candidates for 4-consequent rules : 972225
15/11/04 11:32:00 INFO AprioriRules: Generated 4-consequent rules : 956483
15/11/04 11:32:44 INFO AprioriRules: Candidates for 5-consequent rules : 653749
15/11/04 11:33:32 INFO AprioriRules: Generated 5-consequent rules : 626993
15/11/04 11:34:07 INFO AprioriRules: Candidates for 6-consequent rules : 331038
15/11/04 11:34:50 INFO AprioriRules: Generated 6-consequent rules : 314455
15/11/04 11:35:10 INFO AprioriRules: Candidates for 7-consequent rules : 138490
15/11/04 11:35:43 INFO AprioriRules: Generated 7-consequent rules : 136260
15/11/04 11:35:57 INFO AprioriRules: Candidates for 8-consequent rules : 48567
15/11/04 11:36:14 INFO AprioriRules: Generated 8-consequent rules : 47331
15/11/04 11:36:24 INFO AprioriRules: Candidates for 9-consequent rules : 12430
15/11/04 11:36:33 INFO AprioriRules: Generated 9-consequent rules : 11925
15/11/04 11:36:37 INFO AprioriRules: Candidates for 10-consequent rules : 2211
15/11/04 11:36:47 INFO AprioriRules: Generated 10-consequent rules : 2064
15/11/04 11:36:55 INFO AprioriRules: Candidates for 11-consequent rules : 246
15/11/04 11:36:58 INFO AprioriRules: Generated 11-consequent rules : 219
15/11/04 11:37:00 INFO AprioriRules: Candidates for 12-consequent rules : 13
15/11/04 11:37:03 INFO AprioriRules: Generated 12-consequent rules : 11
15/11/04 11:37:03 INFO AprioriRules: Candidates for 13-consequent rules : 0


was (Author: podongfeng):
I have implemented it based on Apriori's Rule-Generation Algorithm:
https://github.com/zhengruifeng/spark-rules

It's compatible with fpm's APIs.

import org.apache.spark.mllib.fpm._
import org.apache.spark.mllib.fpm.FPGrowth

val data = sc.textFile("hdfs://ns1/whale/T40I10D100K.dat")
val transactions = data.map(s => s.trim.split(' ').map(_.toInt)).persist()

val fpg = new FPGrowth().setMinSupport(0.01)
val model = fpg.run(transactions)

val ar = new AprioriRules().setMinConfidence(0.1).setMaxConsequent(1).setNumPartitions(10)
val results = ar.run(model.freqItemsets)


> AssociationRules should generates all association rules with consequents of arbitrary length
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11585
>                 URL: https://issues.apache.org/jira/browse/SPARK-11585
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: zhengruifeng
>
> AssociationRules should generate all association rules with consequents of
> arbitrary length, not just rules which have a single item as the consequent.
> Such as:
> 39 804 ==> 413 743 819 #SUP: 1023 #CONF: 0.70117
> 39 743 ==> 413 804 819 #SUP: 1023 #CONF: 0.93939
> 39 413 ==> 743 804 819 #SUP: 1023 #CONF: 0.6007
> 819 ==> 39 413 743 804 #SUP: 1023 #CONF: 0.15418
> 804 ==> 39 413 743 819 #SUP: 1023 #CONF: 0.12997
> 743 ==> 39 413 804 819 #SUP: 1023 #CONF: 0.7276
> 39 ==> 413 743 804 819 #SUP: 1023 #CONF: 0.12874
> ...
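
For readers unfamiliar with multi-item consequents, here is a minimal sketch of what such rules are. It assumes the usual definition, confidence(X ==> Y) = count(X ∪ Y) / count(X), which is consistent with the example rules in the issue description sharing one #SUP value for the same itemset. Everything named here (`Rule`, `RuleSketch`, `generateRules`, `toyFreq`) is hypothetical and is neither Spark's API nor the code in the spark-rules repository. It is plain in-memory Scala that enumerates every non-empty proper subset of each frequent itemset as an antecedent, so it does not do Apriori's level-wise candidate pruning and would not scale to the rule counts in the log above. The counts in `toyFreq` are made up for illustration.

```scala
// Hypothetical illustration only: brute-force rule generation over an
// in-memory map from frequent itemset to its occurrence count.
final case class Rule(
    antecedent: Set[Int],
    consequent: Set[Int],
    support: Long,       // count of antecedent ∪ consequent
    confidence: Double)  // count(antecedent ∪ consequent) / count(antecedent)

object RuleSketch {

  // Emits every rule X ==> (itemset -- X) whose confidence reaches minConfidence.
  // X ranges over all non-empty proper subsets of the itemset, so consequents
  // can have any length from 1 to itemset.size - 1.
  def generateRules(freq: Map[Set[Int], Long], minConfidence: Double): Seq[Rule] =
    for {
      (itemset, count) <- freq.toSeq
      if itemset.size >= 2
      antecedent <- itemset.subsets().toSeq
      if antecedent.nonEmpty && antecedent.size < itemset.size
      // Every subset of a frequent itemset is itself frequent, so this lookup
      // succeeds whenever freq is a complete set of frequent itemsets.
      antecedentCount <- freq.get(antecedent).toSeq
      confidence = count.toDouble / antecedentCount
      if confidence >= minConfidence
    } yield Rule(antecedent, itemset -- antecedent, count, confidence)

  // Made-up counts for three items; not taken from T40I10D100K.
  val toyFreq: Map[Set[Int], Long] = Map(
    Set(1) -> 4L, Set(2) -> 4L, Set(3) -> 2L,
    Set(1, 2) -> 3L, Set(1, 3) -> 2L, Set(2, 3) -> 2L,
    Set(1, 2, 3) -> 2L)
}
```

Calling `RuleSketch.generateRules(RuleSketch.toyFreq, 0.6)` keeps, for example, the rule 3 ==> 1 2 (confidence 2/2) and drops 1 ==> 2 3 (confidence 2/4). A level-wise approach would instead grow consequents one item at a time and skip supersets of consequents that already failed the confidence threshold, which is presumably what the per-level "Candidates" and "Generated" counts in the log reflect.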