+1

Try data other than your own as well.
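
For the frequency-by-rank table Ted asks about below, and the sampling Neal suggests, here is a rough sketch. It is not part of Mahout. It assumes one transaction per line with tab-separated items (guessed from the -regex [\\t] option in the original command), and the function names are made up for illustration.

```python
import random
import re
from collections import Counter


def frequency_by_rank(lines, sep=r"\t"):
    """Return (rank, item, count) rows, most frequent item first.

    Assumes each line is one transaction and `sep` is a regex that
    splits it into items (an assumption, not something Mahout defines).
    """
    counts = Counter()
    for line in lines:
        # Count an item at most once per transaction.
        items = {tok for tok in re.split(sep, line.strip()) if tok}
        counts.update(items)
    # Sort by descending count, then by item name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, n) for rank, (item, n) in enumerate(ranked, start=1)]


def sample_lines(lines, fraction, seed=0):
    """Keep each line with probability `fraction` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [ln for ln in lines if rng.random() < fraction]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python rank_table.py transactions.tsv 0.1
    path, fraction = sys.argv[1], float(sys.argv[2])
    with open(path) as fh:
        sampled = sample_lines(fh, fraction)
    for rank, item, n in frequency_by_rank(sampled)[:50]:
        print(f"{rank}\t{item}\t{n}")
```

If the top-ranked items show up in nearly every transaction, that is the kind of skew described further down the thread. Raising the fraction step by step should show whether the run time grows smoothly or blows up.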

On 9/18/10, Ted Dunning <[email protected]> wrote:
> Good advice relative to Mahout as well.  Trying it on a smaller sample will
> tell you whether the problem is bad scaling or a genuine hang.
>
> On Sat, Sep 18, 2010 at 12:03 PM, Mark <[email protected]> wrote:
>
>>  Thanks. I'll give this a try and see how it performs.
>>
>>
>> On 9/18/10 12:01 PM, Neal Richter wrote:
>>
>>> I suggest you take a sample of your data and run it on these
>>> non-Hadoop implementations of itemset miners; FPGrowth is one of the
>>> available algorithms.
>>>
>>> http://www.borgelt.net/fpm.html
>>>
>>> If you have success on a small sample, then start scaling up the sample
>>> and investigate the distribution of your data.
>>>
>>> - Neal
>>>
>>> On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
>>>
>>>> In order to encourage your excellent practice of reposting, I will
>>>> repeat my (non)-answer here.
>>>>
>>>> -------------------------------------------
>>>> I don't know the answer to this, but previously this kind of problem was
>>>> caused by highly skewed statistics in the input data.
>>>>
>>>> If there are items that co-occur with everything, then you will have
>>>> problems with the speed of the algorithm.
>>>>
>>>> Can you say something about the distribution of your data?  Can you post
>>>> a frequency-by-rank table?
>>>>
>>>> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
>>>>
>>>>> I am trying to run FPGrowth:
>>>>>
>>>>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
>>>>> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
>>>>> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t] -s 5
>>>>> -g 17500 -k 50
>>>>>
>>>>> However, the 3rd task, "Processing FPTree: Bottom Up FP Growth > reduce",
>>>>> will not finish. It's basically stuck at 85% and hasn't budged in over
>>>>> an hour. The output of the first task reported about 37K features, so I
>>>>> set -g to 17500. Does anyone know what's going on and how I can speed
>>>>> this up?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>
