I suggest you take a sample of your data and run it through these
non-Hadoop implementations of itemset miners; FPGrowth is one of the
available algorithms.

http://www.borgelt.net/fpm.html
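If the input is too big to sample by hand, a single-pass reservoir sample works. Below is a minimal Python sketch. It assumes one transaction per line, as in the Hadoop input. The function names are my own and not part of Mahout or of the tools linked above, and you should check that tool's documentation for the input format it expects.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Uniformly sample k lines in a single pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Keep line i with probability k / (i + 1) by evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


def sample_file(in_path: str, out_path: str, k: int, seed: Optional[int] = None) -> None:
    """Write a k-transaction sample of in_path (one transaction per line) to out_path."""
    with open(in_path, encoding="utf-8") as src:
        sample = reservoir_sample(src, k, seed)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(sample)
```

For example, sample_file("part-r-00000", "sample.txt", 10000, seed=42) (hypothetical paths) gives a small file to try first. Rerunning with a larger k is one way to scale up.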

If that works on a small sample, gradually scale up the sample size and
also investigate how your data is distributed.
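To check the distribution, you can build the frequency-by-rank table Ted asks for and flag items that appear in almost every transaction. Below is a minimal Python sketch. It assumes one transaction per line with tab-separated items, matching the `-regex [\\t]` option in the quoted command. Each item is counted at most once per transaction. The function names and the 0.5 cutoff are hypothetical choices for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def item_support(transactions: Iterable[str], sep: str = r"[\t]") -> Tuple[Counter, int]:
    """Count how many transactions contain each item; also return the transaction count."""
    splitter = re.compile(sep)
    support: Counter = Counter()
    n = 0
    for line in transactions:
        # Use a set so an item repeated on one line is counted once.
        items = {tok for tok in splitter.split(line.rstrip("\n")) if tok}
        if not items:
            continue  # skip blank lines
        n += 1
        support.update(items)
    return support, n


def rank_table(support: Counter, n: int, top: int = 20) -> List[Tuple[int, str, int, float]]:
    """Rows of (rank, item, count, fraction of transactions), most frequent first."""
    return [(rank, item, count, count / n)
            for rank, (item, count) in enumerate(support.most_common(top), start=1)]


def near_universal(support: Counter, n: int, threshold: float = 0.5) -> List[str]:
    """Items found in more than `threshold` of all transactions (cutoff is arbitrary)."""
    return sorted(item for item, count in support.items() if count / n > threshold)


def format_table(rows: List[Tuple[int, str, int, float]]) -> str:
    """Render rank-table rows as tab-separated text suitable for pasting into a reply."""
    lines = ["rank\titem\tcount\tfraction"]
    lines += [f"{r}\t{i}\t{c}\t{f:.4f}" for r, i, c, f in rows]
    return "\n".join(lines)
```

If a few items have a fraction close to 1.0, they co-occur with nearly everything. That is the kind of skew Ted describes below.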

- Neal

On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
> In order to encourage your excellent practice of reposting, I will repeat my
> (non)-answer here.
>
> -------------------------------------------
> I don't know the answer to this, but previously this kind of problem was
> caused by highly skewed statistics in the input data.
>
> If there are things that cooccur with everything, then you will have
> problems with the speed of the algorithm.
>
> Can you say something about the distribution of your data?  Can you post a
> frequency by rank table?
>
> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
>
>>  I am trying to run FPGrowth:
>>
>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job \
>>   org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
>>   -i output/product/part-r-00000 -o pfp -method mapreduce \
>>   -regex [\\t] -s 5 -g 17500 -k 50
>>
>> However, the third task, "Processing FPTree: Bottom Up FP Growth > reduce",
>> will not finish. It's stuck at 85% and hasn't budged in over an
>> hour. The output of the first task showed about 37K features, so I set
>> -g to 17500. Does anyone know what's going on and how I can speed
>> this up?
>>
>> Thanks
>>
>
