Maybe it's easiest to give an example.

If you have input:

a b c
a b c d
a    c d
a b c

You should expect Mahout to output (say, for support 2):

[a, b, c],3
[a, c, d],2
[a, c],4

You might also expect to see [a],4 or [a, b],3 but these are implied
by the other patterns.  Note that [a, b] and [a, c] are both
subpatterns of [a, b, c].  Only [a, c] is emitted because it has
greater support than [a, b, c]; [a, b] is not emitted, since it has
support equal to a reported superpattern.
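
To make the rule concrete, here is a minimal Python sketch that reproduces the example above by brute force. It is not Mahout's code and does not use FP-growth; the function names (frequent_itemsets, drop_subsumed) are made up for illustration, and it only assumes the rule as described: a pattern is dropped when a strict superpattern has the same support.

```python
from itertools import combinations

# The four example transactions from above.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b", "c"},
]

def frequent_itemsets(transactions, min_support):
    """Brute force: count every non-empty subset of every transaction."""
    counts = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] = counts.get(subset, 0) + 1
    return {p: s for p, s in counts.items() if s >= min_support}

def drop_subsumed(patterns):
    """Keep a pattern only if no strict superpattern has equal support."""
    kept = {}
    for p, s in patterns.items():
        subsumed = any(
            set(p) < set(q) and s == t for q, t in patterns.items()
        )
        if not subsumed:
            kept[p] = s
    return kept

all_patterns = frequent_itemsets(transactions, min_support=2)
reported = drop_subsumed(all_patterns)

for pattern, support in sorted(reported.items()):
    print(list(pattern), support)
```

With support 2 this finds 11 frequent patterns in total and keeps only [a, b, c], [a, c, d] and [a, c], matching the output listed above.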

If you were to create all possible subsets of each output pattern
(with the same support as the generating pattern), then dedup these
by taking the max support, you'd have "fully-verbose" results.
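
As a rough sketch of that post-processing step in Python (the function name expand is hypothetical, and the input is simply the three patterns from the example, typed in by hand rather than parsed from Mahout's output):

```python
from itertools import combinations

# The reported patterns from the example, with their supports.
reported = {
    ("a", "b", "c"): 3,
    ("a", "c", "d"): 2,
    ("a", "c"): 4,
}

def expand(reported):
    """Generate every non-empty subset of each reported pattern,
    give it the generating pattern's support, and dedup by max support."""
    verbose = {}
    for pattern, support in reported.items():
        for size in range(1, len(pattern) + 1):
            for subset in combinations(pattern, size):
                verbose[subset] = max(verbose.get(subset, 0), support)
    return verbose

verbose = expand(reported)

for pattern, support in sorted(verbose.items()):
    print(list(pattern), support)
```

For the example this recovers [a],4 and [a, b],3 along with the rest, 11 patterns in all. Note that the number of subsets grows exponentially with pattern length, so this is only practical when the reported patterns are short.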

Currently there is no trivial way to disable this behavior; it would
require code changes.  I'm not sure how easy it would be in the
current code, but I think it'd be reasonably easy in an alternate
implementation I've been trying to contribute.

-tom

On Mon, Dec 19, 2011 at 6:34 AM, gaurav singh <[email protected]> wrote:
> That seems to make sense. What do you mean by "Mahout will not report any
> of those unless the support is strictly greater
> than 3"? Is there a way for me to get all the patterns with support
> strictly greater than a particular value?
>
> Thanks
> Gaurav
>
> On Mon, Dec 19, 2011 at 4:58 PM, Tom Pierce <[email protected]> wrote:
>
>> One possible explanation is that Mahout's FPG avoids reporting
>> patterns that are subsumed by others.
>>
>> For example, if you have pattern [a, b, c] with support 3, you clearly
>> must also have [a, b], [b, c] and [a, c] with support >= 3.  Mahout
>> will not report any of those unless the support is strictly greater
>> than 3.
>>
>> Does that help explain your discrepancies?  If not can you share an
>> example data set along with a missed pattern?
>>
>> -tom
>>
>> On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <[email protected]> wrote:
>> > Hi All,
>> >
>> > I am using Mahout on Ubuntu 10.04 from the repository and running it on a
>> > data set of 1472 rows. I am running it in sequential mode with k=200,000
>> > and s=400. I have implemented FP-growth in PHP, but when I compare the
>> > output of my implementation with that of Mahout's FPG, I find that Mahout's
>> > output consists of just 17,500 patterns, whereas my implementation gives
>> > around 65,000 unique patterns (I have verified their uniqueness!) for
>> > the same support threshold. I have also verified my output against
>> > the actual data set and found that all my patterns are correct and
>> > do exist in the data set with the correct support values.
>> >
>> >
>> > Can anyone please explain the reason?
>> >
>> > Thanks!!
>> >
>> > --
>> > regards
>> > Gaurav Singh
>>
>
>
>
> --
> regards
> Gaurav Singh
