Thanks for the help, Tom :-)

On Sun, Feb 26, 2012 at 11:56 PM, tom <[email protected]> wrote:

> It's not well documented, but there are actually two distinct
> implementations of FPGrowth, each of which can be run sequentially or as
> a mapreduce job.
>
> The --method option lets you select sequential/mapreduce, and the
> --useFPG2/-2 flag selects the alternate implementation.
>
> However you run FPG, patterns are collected in FrequentPatternMaxHeaps;
> every implementation/mode combination makes use of this class.
>
> I do not recall the precise details right now, but something about the
> mining/aggregation strategy used in the original (default) implementation
> leads to redundant patterns appearing when running in mapreduce mode.  If
> your question is driven by finding unexpected redundancies in FPG output,
> I'd be interested to hear if this persists after trying --useFPG2.
>
> -tom
>
>
>
> On 02/26/2012 12:06 PM, gaurav singh wrote:
>
>> Hi Tom,
>>
>> I don't understand why you say I will get a lot of redundant patterns.
>> Each group-dependent shard generates patterns with respect to the
>> elements of that shard. As far as I know, fpg-2 is only a new sequential
>> implementation of fp-growth, not a map/reduce implementation.
>>
>> My question was specifically whether we eliminate subpatterns from the
>> output in Mahout's parallel fp-growth (the map/reduce version). I know
>> that the function exists in FrequentPatternMaxHeap, but that's the
>> sequential algorithm; I am asking only about the map/reduce version.
>>
>> On Sun, Feb 26, 2012 at 9:39 PM, tom<[email protected]>  wrote:
>>
>>> Hi Gaurav,
>>>
>>> The patterns are accumulated in a heap (see FrequentPatternMaxHeap),
>>> which uses isSubPatternOf.
>>>
>>> That said, I do think the default implementation of PFPGrowth will get
>>> you many redundant patterns under certain circumstances, but the "-2"
>>> implementation will reduce (perhaps eliminate?) redundant patterns.
>>>
>>> -tom
>>>
>>>
>>> On 02/26/2012 09:39 AM, gaurav singh wrote:
>>>
>>>> Hi Guys,
>>>>
>>>>
>>>> There is a function in Mahout's sequential fp-growth algorithm named
>>>> isSubPatternOf() which returns whether one pattern is a subpattern of
>>>> another; if both have equal support, only the larger of the two is
>>>> output. I can't find any such function being used in parallel fp-growth.
>>>> Does that mean that parallel fp-growth outputs all the possible
>>>> patterns without eliminating such subpatterns?
>>>>
>>>> Thanks for help!
>>>>
>>>>
>>>>
>>
>
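
For anyone reading this thread later, the equal-support subpattern rule discussed above can be sketched roughly as follows. This is only an illustration in Python, not Mahout's Java code: the names is_sub_pattern_of and prune_sub_patterns and the (items, support) tuple layout are assumptions made for the example, and it does not claim to reflect how FrequentPatternMaxHeap or the map/reduce job actually performs the check.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a pattern is (set of items, support count).
Pattern = Tuple[FrozenSet[str], int]


def is_sub_pattern_of(small: FrozenSet[str], large: FrozenSet[str]) -> bool:
    """Return True if every item of `small` also occurs in `large`."""
    return small <= large


def prune_sub_patterns(patterns: Iterable[Pattern]) -> List[Pattern]:
    """Drop any pattern contained in an already kept pattern of equal support."""
    kept: List[Pattern] = []
    # Visit longer patterns first so a superset is kept before its subsets.
    for items, support in sorted(patterns, key=lambda p: len(p[0]), reverse=True):
        redundant = any(
            support == kept_support and is_sub_pattern_of(items, kept_items)
            for kept_items, kept_support in kept
        )
        if not redundant:
            kept.append((items, support))
    return kept
```

With the patterns {a}:3, {a,b}:3, {a,b,c}:2 and {b}:4, only {a}:3 is dropped, because {a,b} contains it and has the same support. {b}:4 survives because its support differs from that of every pattern containing it.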


-- 
regards
Gaurav Singh
