Thanks for the help Tom :-)

On Sun, Feb 26, 2012 at 11:56 PM, tom <[email protected]> wrote:
> It's not well documented, but there are actually two distinct
> implementations of FPGrowth, which each can be run sequentially or as
> mapreduce jobs.
>
> The --method option lets you select sequential/mapreduce, and the
> --useFPG2/-2 flag selects the alternate implementation.
>
> Any way you run FPG, patterns will be collected in
> FrequentPatternMaxHeaps; all implementation/mode combinations will make use
> of this class.
>
> I do not recall the precise details right now, but something about the
> mining/aggregation strategy used in the original (default) implementation
> leads to redundant patterns appearing when running in mapreduce mode. If
> your question is driven by finding unexpected redundancies in FPG output,
> I'd be interested to hear if this persists after trying --useFPG2.
>
> -tom
>
> On 02/26/2012 12:06 PM, gaurav singh wrote:
>
>> Hi Tom,
>>
>> I don't understand, why do you say I will get a lot of redundant patterns?
>> In each group dependent shard generates patterns with respect to the
>> elements of that shard. The fpg-2 as far as I know and if I am correct is
>> only a new sequential implementation of fp-growth and not map/reduce
>> implementation.
>>
>> My question was specifically if we eliminate subpatterns from output in
>> mahout parallel fp-growth(map/reduce version)? I know that the function
>> exists in FrequentPatternMaxHeap, but that's the sequential algorithm, I am
>> asking only about the map/reduce version?
>>
>> On Sun, Feb 26, 2012 at 9:39 PM, tom<[email protected]> wrote:
>>
>>> Hi Gaurav,
>>>
>>> The patterns are accumulated in a heap (see FrequentPatternMaxHeap), which
>>> uses isSubPatternOf.
>>>
>>> That said, I do think the default implementation of PFPGrowth will get you
>>> many redundant patterns under certain circumstances, but the "-2"
>>> implementation will reduce (perhaps eliminate?) redundant patterns.
>>>
>>> -tom
>>>
>>> On 02/26/2012 09:39 AM, gaurav singh wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> There is a function in mahout sequential fp-growth algorithm named
>>>> isSubPatternof() which returns whether one pattern is subpattern of
>>>> another pattern and if both have equal support only the one larger of
>>>> the two is output. I can't find any such function being used in
>>>> parallel fp-growth. Does that mean that in parallel fp-growth we
>>>> display all the possible patterns without eliminating such subpatterns?
>>>>
>>>> Thanks for help!

-- 
regards
Gaurav Singh
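
For readers following the thread, the sub-pattern rule being discussed (when two patterns have equal support and one is contained in the other, only the larger one is kept) can be sketched roughly as below. This is an illustrative sketch in Python, not Mahout's code: the names `is_sub_pattern_of` and `prune_sub_patterns` and the `(items, support)` tuple layout are assumptions made for the example, and the real FrequentPatternMaxHeap may differ in detail.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a pattern is (set of items, support count).
Pattern = Tuple[FrozenSet[str], int]


def is_sub_pattern_of(a: Pattern, b: Pattern) -> bool:
    """Return True if every item of pattern a also occurs in pattern b."""
    return a[0] <= b[0]


def prune_sub_patterns(patterns: List[Pattern]) -> List[Pattern]:
    """Drop any pattern that is contained in another pattern of equal support."""
    kept: List[Pattern] = []
    for cand in patterns:
        # Skip the candidate if a kept pattern with the same support covers it.
        if any(cand[1] == k[1] and is_sub_pattern_of(cand, k) for k in kept):
            continue
        # Otherwise remove kept patterns that the candidate makes redundant.
        kept = [
            k for k in kept
            if not (k[1] == cand[1] and is_sub_pattern_of(k, cand))
        ]
        kept.append(cand)
    return kept


if __name__ == "__main__":
    patterns = [
        (frozenset({"a"}), 3),
        (frozenset({"a", "b"}), 3),       # same support as {a}, so {a} is dropped
        (frozenset({"a", "b", "c"}), 2),  # different support, kept alongside {a, b}
        (frozenset({"b"}), 4),            # higher support than {a, b}, so it is kept
    ]
    for items, support in prune_sub_patterns(patterns):
        print(sorted(items), support)
```

Note that a sub-pattern with a different support (such as {b} with support 4 above) is not redundant under this rule and stays in the output.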
