One possible explanation is that Mahout's FPG avoids reporting patterns that are subsumed by others.
For example, if you have the pattern [a, b, c] with support 3, you must also have [a, b], [b, c] and [a, c] with support >= 3. Mahout will not report any of those sub-patterns unless its support is strictly greater than 3.

Does that help explain your discrepancies? If not, can you share an example data set along with a missed pattern?

-tom

On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <[email protected]> wrote:
> Hi All,
>
> I am using mahout on Ubuntu 10.04 from the repository and running it on a
> data set of 1472 rows. I am running it in sequential mode with k=200,000 and
> s=400. I have implemented fp-growth in php, but when I compare the output
> of my implementation of fp-growth and mahout fpg, I find that in mahout the
> output consists of just 17,500 patterns whereas from my implementation I
> get around 65,000 unique patterns (I have verified their uniqueness!), for
> the same value of support threshold. I have also verified my outputs against
> the actual data set and have found that all my patterns are correct and
> do exist in the data set with the correct value of their support.
>
> Can anyone please explain the reason to me?
>
> Thanks!!
>
> --
> regards
> Gaurav Singh
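
P.S. To make the subsumption idea concrete, here is a minimal Python sketch of the filter described above: a pattern is dropped when some proper superset has exactly the same support. This is only an illustration of the idea, not Mahout's actual code. The function name `closed_patterns` and the toy supports are made up for the example.

```python
def closed_patterns(patterns):
    """Keep only patterns that have no proper superset with equal support.

    `patterns` maps frozenset(items) -> support count, e.g. the full output
    of an exhaustive FP-growth run. (Hypothetical helper, not a Mahout API.)
    """
    closed = {}
    for items, support in patterns.items():
        # `items < other` is the proper-subset test on frozensets.
        subsumed = any(
            items < other and other_support == support
            for other, other_support in patterns.items()
        )
        if not subsumed:
            closed[items] = support
    return closed


# Toy, made-up supports; frozenset("abc") is the itemset {a, b, c}.
all_patterns = {
    frozenset("abc"): 3,
    frozenset("ab"): 3,
    frozenset("bc"): 3,
    frozenset("ac"): 4,
    frozenset("a"): 4,
    frozenset("b"): 3,
    frozenset("c"): 4,
}

reported = closed_patterns(all_patterns)
for items, support in sorted(reported.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(items), support)
```

In this toy input, [a, b] and [b, c] are dropped because [a, b, c] has the same support of 3, while [a, c] survives because its support of 4 is strictly greater. Running your own output through a filter like this and comparing the count with Mahout's would show whether subsumption accounts for the gap.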
