I suggest you take a sample of your data and run it through these non-Hadoop implementations of itemset miners; FPGrowth is one of the available algorithms.
http://www.borgelt.net/fpm.html

If you have success on a small sample, then start scaling the sample up and investigate the distribution of your data.

- Neal

On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
> In order to encourage your excellent practice of reposting, I will repeat my
> (non)-answer here.
>
> -------------------------------------------
> I don't know the answer to this, but previously this kind of problem was
> caused by highly skewed statistics in the input data.
>
> If there are things that cooccur with everything, then you will have
> problems with the speed of the algorithm.
>
> Can you say something about the distribution of your data? Can you post a
> frequency by rank table?
>
> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
>
>> I am trying to run FPGrowth:
>>
>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
>> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
>> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t] -s 5 -g
>> 17500 -k 50
>>
>> However the 3rd task: "Processing FPTree: Bottom Up FP Growth reduce"
>> will not finish. It's basically stuck at 85% and hasn't budged in over an
>> hour. The output of the first task outputted there were about 37K features
>> so I set -g to 17500. Does anyone know whats going on and how I can speed
>> this up?
>>
>> Thanks
>>
>
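P.S. In case it helps with both the sampling and the frequency-by-rank table Ted asked for, here is a rough Python sketch. It assumes a local copy of the input in which each line is one transaction with tab-separated items (my reading of the -regex [\\t] option in the command above, not something I have verified). The function names are made up for illustration and are not part of Mahout.

```python
import random
from collections import Counter


def sample_transactions(lines, fraction, seed=0):
    """Keep each transaction independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]


def frequency_by_rank(lines, sep="\t"):
    """Return (rank, item, count, share) rows, most frequent item first.

    `share` is the fraction of non-empty transactions containing the item.
    Assumes one transaction per line with items separated by `sep`.
    """
    counts = Counter()
    n_transactions = 0
    for line in lines:
        # Count each item at most once per transaction.
        items = {tok for tok in line.rstrip("\n").split(sep) if tok}
        if not items:
            continue  # skip blank lines
        n_transactions += 1
        counts.update(items)
    # Sort by descending count, then by item name, so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, item, count, count / n_transactions)
        for rank, (item, count) in enumerate(ordered, start=1)
    ]
```

Reading the file with open(path) and passing the resulting lines through sample_transactions and then frequency_by_rank gives a table you can post. Items whose share is close to 1.0 occur in nearly every transaction, which is the kind of skew Ted describes; whether that is what stalls the reducer here is still a guess until the table is in.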
