OK, I've now succeeded in running fpgrowth in both sequential and
mapreduce modes, using the 'fpg' job and the flag that selects
'sequential' or 'mapreduce'. I've done this with two different datasets,
accidents.dat and retail.dat. To save time, I only ran the first
thousand lines of each dataset.

Both sequential and mapreduce identify the same ids as belonging to
patterns. Examined in detail, the patterns themselves do not match, but
the patterns involving a given id X are generally the same size.
Successive runs of each variant give exactly the same results, so it is
puzzling that sequential and mapreduce give different result sets.
Measuring the distance between the two result sets is a little
difficult with text processing alone.
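
For what it's worth, here is a rough sketch of how the two result sets
could be compared once the patterns are pulled out of the output. It
assumes each pattern has already been extracted to one line of
whitespace-separated item ids; that is a hypothetical intermediate
format, not the actual fpg output format, and the sample ids below are
made up for illustration. The function names (load_patterns, jaccard,
compare) are likewise my own.

```python
from collections import defaultdict


def load_patterns(lines):
    """Parse one pattern per line into a set of frozensets of item ids.

    Assumed (hypothetical) format: whitespace-separated ids per line.
    Item order within a pattern is ignored.
    """
    patterns = set()
    for line in lines:
        items = line.split()
        if items:
            patterns.add(frozenset(items))
    return patterns


def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def index_by_id(patterns):
    """Map each item id to the set of patterns that contain it."""
    index = defaultdict(set)
    for pattern in patterns:
        for item in pattern:
            index[item].add(pattern)
    return index


def compare(seq_patterns, mr_patterns):
    """Summarize how two pattern sets differ, overall and per item id."""
    seq_index = index_by_id(seq_patterns)
    mr_index = index_by_id(mr_patterns)
    all_ids = sorted(set(seq_index) | set(mr_index))
    return {
        "only_sequential": seq_patterns - mr_patterns,
        "only_mapreduce": mr_patterns - seq_patterns,
        "overall_jaccard": jaccard(seq_patterns, mr_patterns),
        "per_id_jaccard": {
            item: jaccard(seq_index[item], mr_index[item]) for item in all_ids
        },
    }


# Made-up sample patterns, only to exercise the functions.
seq = load_patterns(["1 2", "1 3", "2 3 1"])
mr = load_patterns(["2 1", "1 3", "2 3"])
report = compare(seq, mr)

print("overall Jaccard:", report["overall_jaccard"])
for item, score in report["per_id_jaccard"].items():
    print("id", item, "Jaccard:", round(score, 3))
```

A per-id score near 1.0 would mean both modes produce essentially the
same patterns for that id; low scores point at the ids worth inspecting
by hand.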

What can account for the different outputs of map/reduce and
sequential (pseudo-distributed) modes?




On 7/27/11, Lance Norskog <[email protected]> wrote:
> I'll prep a current version.
>
> On 7/27/11, Robin Anil <[email protected]> wrote:
>> On Tue, Jul 26, 2011 at 11:06 PM, Lance Norskog <[email protected]>
>> wrote:
>>
>>> The parameters and files mentioned on this page do not find any
>>> frequent patterns:
>>>
>>> https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
>>
>> Let me run and correct this doc.
>>
>>>
>>>
>>> With 'accidents.dat.gz' from the given site, or 'retail.dat.gz' from
>>> the same site, what parameters should find some frequent patterns?
>>
>>
>>> Also, what is the magic to get maven to pass JDK options to an exec'd
>>> class?
>>
>> Did you try using the bin/mahout script? The memory size is configurable
>> inside it.
>>
>>
>>> FPGrowth sequential needs the memory size bumped up.
>>
>>
>>> Cheers,
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>>>
>>
>
>
> --
> Lance Norskog
> [email protected]
>


-- 
Lance Norskog
[email protected]
