Hello all,
I am new to Mahout. I have just started looking into Mahout to replace our 
current FP-growth implementation with the parallel FP-growth that Mahout 
provides, since we have started having scalability issues. I looked at the 
PFPGrowth documentation and noticed that it produces only the top K frequent 
patterns, not the associations, and associations are what we need. So I was 
thinking of implementing a simple AssociationGenerator that takes the frequent 
patterns as input. However, I am not sure what the best way is to generate 
associations from the frequent patterns produced by Mahout.

I have the following sample output from Mahout.

Key: 46485: Value: ([46485],936), ([46705, 46485],355)
Key: 46705: Value: ([46705],2526)
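
As a minimal sketch, lines in this format could be parsed as follows. The line layout is assumed from the two sample lines above only; `parse_line` and `PATTERN_RE` are hypothetical names, not part of Mahout, and real PFPGrowth output would normally be read from its sequence files rather than from text.

```python
import re

# Assumed line format, taken from the sample above:
#   Key: <item>: Value: ([i1, i2, ...],<support>), ...
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_line(line):
    """Return (key, [(frozenset_of_items, support), ...]) for one output line."""
    head, _, tail = line.partition(": Value:")
    key = head.replace("Key:", "").strip()
    patterns = []
    for items, support in PATTERN_RE.findall(tail):
        # Item ids are kept as strings; order inside an item set does not matter.
        itemset = frozenset(i.strip() for i in items.split(","))
        patterns.append((itemset, int(support)))
    return key, patterns
```

Storing each pattern as a `frozenset` makes [46705, 46485] and [46485, 46705] compare equal, which matters when the same pair is looked up from either item.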

We are interested only in item sets of size 2, since we need only associations 
with 1 antecedent and 1 consequent.

I was planning to calculate associations with confidence as follows:
For each key above as A {
        for each two-item set as [A,C] {
                confidence(A->C) = support([A,C]) / support([A]);
                add association (A, C, confidence(A->C)) to the list;
        }
}
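
One way this could be sketched in Python, using the standard definition confidence(A -> C) = support({A, C}) / support({A}). The function name `one_to_one_rules` and the input shape are assumptions made for illustration, not Mahout API. The sketch first indexes supports across all keys, so a pair that is listed under only one of its items can still be used in both directions.

```python
def one_to_one_rules(patterns_by_key):
    """Build 1-antecedent -> 1-consequent rules.

    patterns_by_key: {key: [(frozenset_of_items, support), ...]}
    (a hypothetical in-memory form of the frequent-pattern output).
    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    # Pass 1: index the support of every 1- and 2-item set across all keys.
    support = {}
    for patterns in patterns_by_key.values():
        for itemset, count in patterns:
            if len(itemset) <= 2:
                support[itemset] = count

    # Pass 2: emit both directions of each pair, skipping a direction
    # when the antecedent's own support is not present in the output.
    rules = []
    for pair, pair_count in support.items():
        if len(pair) != 2:
            continue
        for antecedent in pair:
            (consequent,) = pair - {antecedent}
            single = support.get(frozenset([antecedent]))
            if single:
                rules.append((antecedent, consequent, pair_count / single))
    return rules
```

With the sample output above this would give confidence(46485 -> 46705) = 355/936 and confidence(46705 -> 46485) = 355/2526. The index holds only 1- and 2-item sets, so memory grows with the number of frequent pairs rather than with the full pattern output.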

Keeping the above requirement and pseudocode in mind, my questions are as follows:
1. Is the above algorithm efficient?
2. In the first pattern list, [46705, 46485] occurs 355 times, but the same 
pattern is not repeated in the second list (for key 46705). Why not? Because of 
this, calculating confidence (46705 -> 46485) becomes difficult. As you can see 
from the pseudocode above, I was planning to read the patterns for each feature 
and calculate the confidence of every association with that feature as 
antecedent. But when I read feature 46705, I cannot calculate the confidence of 
(46705 -> 46485), since the item set is not included with that feature.
3. Has anyone implemented association generation from the generated frequent 
patterns?


Thanks
Praveen
