Hi, The Pfp-growth on my cluster is fine when I run it onto *webdocs.dat*dataset.( http://fimi.ua.ac.be/data/) My cluster has about 15 nodes and this will only take less than 10 mins. I looked into you log and found that cache misses. Which version of mahout are you using? For mahout-0.4, it would not use distributed-cache provided by hadoop. For mahout-0.5, the code is a little bit different and it uses distributed cache, which will fetch FList and gList file before execution. (setup method, in reducer) For more info about distributed cache in hadoop, see http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html
Best, Roger Dai 2011/9/22 bing wang <[email protected]> > Hi all, > > I run PfpGrowth in a big hadoop cluster , but the ParallelFPgrowth MR job > got stuck when some ParallelFPGrowthReducer tasks run into > fpGrowth.generateTopKFrequentPatterns(). > > I feed 6,468,284 record for PfpGrowth, and 110825 features greater than > minSupport(5), numGroups(20000), treeCacheSize(10) and with other paramters > as default. > I also set the hadoop mapred.child.java.opts option to 3g. > > The cluster started 20 ParallelFPGrowthReducer tasks, but about 10 tasks > was > stuck at the same reason. > > See the reducer task log(Log 1): > > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size > 871 > for key 8120 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items > 871 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48722=>3 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48723=>4 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48720=>5 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48721=>0 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48724=>2 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48725=>1 > 2011-09-20 21:11:49,860 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned > items 27 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the > FP > Tree: 93 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 5 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with > Least Support 10 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 4 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with > Least Support 13 > 2011-09-20 21:11:49,861 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 3 > 2011-09-20 21:11:49,862 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with > Least Support 13 > 2011-09-20 21:11:49,862 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 2 > 2011-09-20 21:11:49,862 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with > Least Support 14 > 2011-09-20 21:11:49,862 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 1 > 2011-09-20 21:11:49,863 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with > Least Support 15 > 2011-09-20 21:11:49,863 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 0 > 2011-09-20 21:11:49,863 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with > Least Support 17 > 2011-09-20 21:11:49,863 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level: > Cache hits=20 Cache Misses=127 > 2011-09-20 21:11:49,864 INFO > org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size > 1410 for key 8140 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items > 1410 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48842=>22 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48843=>23 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48840=>21 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48841=>27 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 48844=>266 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48845=>28 > 2011-09-20 21:11:49,865 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned > items 353 > 2011-09-20 21:11:49,866 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the > FP > Tree: 2512 > 2011-09-20 21:11:49,866 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 266 > > > From last line of Log 1, we can see the reducer task which get stuck is > mining FTree Tree for all patterns with the attribute 266. > > However I think the task is still running because some reducer tasks will > keep moving after long time mining(see bellow log, Log 2) > > 2011-09-20 21:24:08,577 INFO > org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size > 1183 for key 13596 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items > 1183 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81581=>250 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81580=>132 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81577=>129 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81576=>128 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81579=>131 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern > 81578=>130 > 2011-09-20 21:24:08,578 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned > items 251 > 2011-09-20 21:24:08,579 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the > FP > Tree: 1376 > 2011-09-20 21:24:08,579 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 250 > 2011-09-20 21:24:08,579 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with > Least Support 5 > 2011-09-20 21:24:08,579 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 132 (take 2.5 hours ) > 2011-09-21 00:08:56,616 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 40 Patterns with > Least Support 7 > 2011-09-21 00:08:56,665 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 131 > 2011-09-21 00:08:56,703 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 3 Patterns with > Least Support 7 > 2011-09-21 00:08:56,703 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 130 > 2011-09-21 00:08:56,736 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with > Least Support 7 > 2011-09-21 00:08:56,736 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 129 > 2011-09-21 00:08:56,769 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with > Least Support 7 > 2011-09-21 00:08:56,769 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for > all > patterns with 128 > 2011-09-21 00:08:56,809 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with > Least Support 7 > 2011-09-21 00:08:56,809 INFO > org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level: > Cache hits=13 Cache Misses=475 > > And, the task status showed *"Processing FPTree: Bottom Up FP Growth > > reduce"*, which indicates that the job run into FpGrowth.growthBottomUp() > and be stuck within FpGrowth.growthBottomUp(). > > Some reduers will hold at this status about 25 hours till I kill the whole > PfpGrowth job manually.(see job history as follows) > > *JobName: * PFP Parallel FPGrowth running over input (ignore)/sortedoutput > *JobConf: * > > hdfs://hdpnn:9000/home/hadoop/cluster-data/mapred/system/job_201108301259_1089136/job.xml< > http://10.249.54.103/jobconf_history.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&jobLogDir=file:/home/hadoop/cluster-data/logs/history&jobUniqueString=hdpjt_1314680340456_job_201108301259_1089136 > > > *Submitted At: * 20-Sep-2011 20:31:57 > *Launched At: * 20-Sep-2011 20:32:04 (6sec) > *Finished At: * 21-Sep-2011 22:31:51 (25hrs, 59mins, 47sec) > *Status: * KILLED > *Analyse This Job< > http://10.249.54.103/analysejobhistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp > > > * > ------------------------------ > > Counter Map Reduce Total KindTotal > Tasks(successful+failed+killed)Successful > tasksFailed tasksKilled tasksStart TimeFinish Time Setup > 1< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=all > > > 1< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=SUCCESS > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=FAILED > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=KILLED > > > 20-Sep-2011 > 20:32:06 20-Sep-2011 20:32:07 (0sec) Map > 28< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=all > > > 22< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=SUCCESS > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=FAILED > > > 6< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=KILLED > > > 20-Sep-2011 > 20:32:13 20-Sep-2011 20:56:01 (23mins, 48sec) Reduce > 24< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=all > > > 18< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=SUCCESS > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=FAILED > > > 6< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=KILLED > > > 20-Sep-2011 > 20:32:33 21-Sep-2011 22:31:51 (25hrs, 59mins, 18sec) Cleanup > 1< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=all > > > 1< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=SUCCESS > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=FAILED > > > 0< > http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=KILLED > > > 21-Sep-2011 > 22:31:50 21-Sep-2011 22:31:51 (0sec) > > > In Log 2, we see FP Tree for key 13596 only includes 1376 nodes, > however it take 2.5 hours to ming patterns for 132. > > So why it was so slowly when mining frequent patterns for some attributes? > What's the reason behinds that or how can I resolve? > > Thanks in advance! > > > > -- > Gmail/talk: [email protected] >
