One possible explanation is that Mahout's FPG avoids reporting
patterns that are subsumed by a larger pattern with the same support.

For example, if you have pattern [a, b, c] with support 3, you clearly
must also have [a, b], [b, c] and [a, c] with support >= 3.  Mahout
will not report any of those unless the support is strictly greater
than 3.
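To make that concrete, here is a small sketch of the filtering I have in
mind. It is not Mahout's code, only a brute-force illustration on a made-up
four-row data set; the function names frequent_itemsets and closed_only are
hypothetical. It counts every frequent itemset, then keeps only those with
no proper superset of equal support (usually called "closed" patterns).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support count for every itemset (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
    return result


def closed_only(frequent):
    """Keep an itemset only if no proper superset has the same support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


# Made-up data: [a, b, c] occurs 3 times, [a, b] occurs once more on its own.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
]

all_patterns = frequent_itemsets(transactions, min_support=3)
closed_patterns = closed_only(all_patterns)

print(len(all_patterns), "frequent patterns in total")
print(len(closed_patterns), "patterns left after dropping subsumed ones")
```

On this toy input, [a, b] survives because its support (4) is strictly
greater than that of [a, b, c], while [a, c], [b, c] and the single items
are dropped. A gap like 65,000 versus 17,500 would be consistent with this
kind of filtering, though I have not checked that against your data.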

Does that help explain your discrepancies?  If not, can you share an
example data set along with a pattern that Mahout missed?

-tom

On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <[email protected]> wrote:
> Hi All,
>
> I am using Mahout on Ubuntu 10.04 from the repository and running it on a
> data set of 1472 rows. I am running it in sequential mode with k=200,000 and
> s=400. I have implemented FP-growth in PHP, but when I compare the output
> of my implementation with Mahout's FPG, I find that Mahout's output
> consists of just 17,500 patterns, whereas my implementation gives
> around 65,000 unique patterns (I have verified their uniqueness!) for
> the same support threshold. I have also checked my output against
> the actual data set and found that all my patterns are correct and
> do exist in the data set with the correct support values.
>
>
> Can anyone please explain the reason?
>
> Thanks!!
>
> --
> regards
> Gaurav Singh
