Re: a bug of fpgrowth?

tom pierce Wed, 22 Aug 2012 07:18:05 -0700

Hello,

Could you try re-running FP-Growth with the '-2' flag, and let us know if you have more success?

This uses an alternate implementation of the FPGrowth algorithm; I have had problems similar to what you are seeing when using the default implementation.

I am skeptical of the change you suggest. That line seems to be related to maintaining a "least" pointer, and the logic seems right to me on the surface.
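For anyone following along, here is a rough sketch of what a "least" pointer does in a size-bounded collection. This is not the code in FrequentPatternMaxHeap.java; the class LeastPointerSketch and its methods are hypothetical, and it compares raw support values rather than calling compareTo. Whether "< 0" or "> 0" is the right test in the real class depends on how the pattern's compareTo orders patterns, which this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; not the code in FrequentPatternMaxHeap.java.
class LeastPointerSketch {
    private final int maxSize;                        // hypothetical capacity (the role of k)
    private final List<Long> kept = new ArrayList<>(); // support values currently kept
    private long least = Long.MAX_VALUE;              // smallest support currently kept

    LeastPointerSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
    }

    // Offers a support value; returns true if it was admitted.
    boolean offer(long support) {
        if (kept.size() < maxSize) {
            // Still filling up: admit everything and track the smallest value seen.
            kept.add(support);
            if (support < least) {
                least = support;
            }
            return true;
        }
        if (support <= least) {
            // Full, and the newcomer is no better than the weakest entry: reject.
            return false;
        }
        // Full, and the newcomer beats the weakest entry: evict it and recompute least.
        kept.remove(Long.valueOf(least));
        kept.add(support);
        least = Collections.min(kept);
        return true;
    }

    long least() {
        return least;
    }

    public static void main(String[] args) {
        LeastPointerSketch heap = new LeastPointerSketch(2);
        heap.offer(5);
        heap.offer(3);
        System.out.println(heap.offer(2)); // false
        System.out.println(heap.offer(4)); // true
        System.out.println(heap.least());  // 4
    }
}
```

If the "least" value were updated in the wrong direction while filling, it would end up pointing at the largest entry, and later patterns would be rejected too eagerly. That is the kind of symptom worth checking against the real comparison order.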

It is interesting to hear that this change improves the diversity of patterns you receive. The default implementation of FPGrowth will often "mine" the same pattern several times. There is also a limit on how many patterns will be returned (k). Together, these can limit the number of unique patterns found.
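To make the interaction between duplicates and k concrete, here is a minimal sketch. It is not Mahout code; the class TopKDuplicatesSketch, the Pattern type, and the sample data are all made up for illustration. It keeps the k highest-support patterns from a list of mined patterns, with and without skipping item sets that were already seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Illustrative sketch only; not Mahout's implementation.
class TopKDuplicatesSketch {

    // Hypothetical pattern type: an item set (as a string) and its support.
    static final class Pattern {
        final String items;
        final long support;

        Pattern(String items, long support) {
            this.items = items;
            this.support = support;
        }
    }

    // Keeps the k highest-support patterns. If dedupe is true,
    // an item set that has already been seen is skipped.
    static List<Pattern> topK(List<Pattern> mined, int k, boolean dedupe) {
        // Min-heap on support: the weakest kept pattern sits at the head.
        PriorityQueue<Pattern> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.support, b.support));
        Set<String> seen = new HashSet<>();
        for (Pattern p : mined) {
            if (dedupe && !seen.add(p.items)) {
                continue; // duplicate item set, do not spend a slot on it
            }
            heap.add(p);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest pattern
            }
        }
        return new ArrayList<>(heap);
    }

    // Counts distinct item sets among the kept patterns.
    static int uniqueCount(List<Pattern> kept) {
        Set<String> unique = new HashSet<>();
        for (Pattern p : kept) {
            unique.add(p.items);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // Made-up data: the same pattern is "mined" three times.
        List<Pattern> mined = Arrays.asList(
            new Pattern("a,b", 5), new Pattern("a,b", 5), new Pattern("a,b", 5),
            new Pattern("a,c", 4), new Pattern("b,c", 3));
        System.out.println(uniqueCount(topK(mined, 3, false))); // 1
        System.out.println(uniqueCount(topK(mined, 3, true)));  // 3
    }
}
```

With k = 3, the three copies of the same pattern fill every slot, so only one unique pattern survives. Skipping duplicates or raising k both recover the others.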

The '-2' flag should eliminate duplicate patterns. You can also try increasing k if you want to find more patterns.


-tom

On 08/21/2012 11:22 PM, 林泽桢 wrote:

Hello, when I use fpgrowth to get association rules, the results always come
out wrong, and I am confused.

Then I read the source code, and I think I found a bug in line #102
of FrequentPatternMaxHeap.java, where " least.compareTo(frequentPattern) <
0 " should change to " least.compareTo(frequentPattern) > 0 ". The former
will filter out a lot of the frequent patterns that come after.

After the modification, the results are better, but when running on a file
of size 400m with maxHeapSize=1000 and minsupport=2, fpgrowth takes more than
10 hours, and sometimes it spends 2 hours computing one feature. Is something
else wrong?

Thanks for your help.
