+1. Try data other than your own as well.

On 9/18/10, Ted Dunning <[email protected]> wrote:
> Good advice relative to Mahout as well. Trying it on a smaller sample will
> tell you if it is due to bad scaling or really a hangup.
>
> On Sat, Sep 18, 2010 at 12:03 PM, Mark <[email protected]> wrote:
>
>> Thanks. I'll give this a try and see how it performs.
>>
>> On 9/18/10 12:01 PM, Neal Richter wrote:
>>
>>> I suggest you take a sample of your data and run it on these
>>> non-hadoop implementations of itemset miners; FPGrowth is one of the
>>> available algorithms.
>>>
>>> http://www.borgelt.net/fpm.html
>>>
>>> If you have success on a small sample then start upscaling the sample
>>> as well as investigate the distributions of your data.
>>>
>>> - Neal
>>>
>>> On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
>>>
>>>> In order to encourage your excellent practice of reposting, I will
>>>> repeat my (non)-answer here.
>>>>
>>>> -------------------------------------------
>>>> I don't know the answer to this, but previously this kind of problem was
>>>> caused by highly skewed statistics in the input data.
>>>>
>>>> If there are things that cooccur with everything, then you will have
>>>> problems with the speed of the algorithm.
>>>>
>>>> Can you say something about the distribution of your data? Can you post
>>>> a frequency by rank table?
>>>>
>>>> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
>>>>
>>>>> I am trying to run FPGrowth:
>>>>>
>>>>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
>>>>> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
>>>>> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t] -s 5
>>>>> -g 17500 -k 50
>>>>>
>>>>> However, the 3rd task, "Processing FPTree: Bottom Up FP Growth> reduce",
>>>>> will not finish. It's basically stuck at 85% and hasn't budged in over
>>>>> an hour. The first task reported about 37K features, so I set -g to
>>>>> 17500. Does anyone know what's going on and how I can speed this up?
>>>>>
>>>>> Thanks
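
For the frequency-by-rank table Ted asks about above, here is a minimal Python sketch. It assumes one transaction per line with tab-separated items (my reading of the -regex [\\t] option in the command, not something confirmed in the thread). The function name frequency_by_rank and the sample data are made up for illustration and are not part of Mahout.

```python
import re
from collections import Counter


def frequency_by_rank(lines, splitter=r"[\t]"):
    """Return (rank, item, count, share_of_transactions) rows, most frequent first.

    Assumption: each line is one transaction and items are separated by
    the `splitter` regex (a tab by default).
    """
    pattern = re.compile(splitter)
    counts = Counter()
    n_transactions = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        n_transactions += 1
        # Count each item once per transaction (support, not raw occurrences).
        items = {tok for tok in pattern.split(line) if tok}
        counts.update(items)
    # Sort by descending count, then by item name so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, item, count, count / n_transactions)
        for rank, (item, count) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical sample: item "a" appears in every transaction.
    sample = ["a\tb\tc", "a\tb", "a\td", "a"]
    for rank, item, count, share in frequency_by_rank(sample):
        print(f"{rank}\t{item}\t{count}\t{share:.2f}")
```

Items near rank 1 whose share is close to 1.0 are the "things that cooccur with everything" mentioned above. Running this on a sample of the input first, as Neal suggests, should show whether the data is that skewed.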
