Hello all,
I am new to Mahout. I have just started looking into it as a way to replace our
current FP-Growth implementation with the parallel FP-Growth that Mahout
provides, since we have started having scalability issues. Looking at the
PFPGrowth documentation, I noticed that it only produces the top-K frequent
patterns, not association rules, and association rules are what we need. So I
was thinking of implementing a simple AssociationGenerator that takes the
frequent-pattern output as input. However, I am not sure what the best way is
to generate association rules from the frequent patterns produced by Mahout.
Here is some sample output from Mahout:
Key: 46485: Value: ([46485],936), ([46705, 46485],355)
Key: 46705: Value: ([46705],2526)
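The lines above look like a text dump of the PFPGrowth output. Here is a minimal Java sketch that pulls the itemsets and their support counts out of such a line with a regular expression. It assumes every line has exactly the "Key: ...: Value: ([items],support), ..." shape shown above. PatternLineParser and ItemsetSupport are hypothetical names, not Mahout classes, and reading Mahout's output files directly instead of a text dump would likely be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that parses one dumped line of frequent patterns. */
public class PatternLineParser {

    /** One frequent itemset together with its support count. */
    public static final class ItemsetSupport {
        public final List<Long> items;
        public final long support;

        public ItemsetSupport(List<Long> items, long support) {
            this.items = items;
            this.support = support;
        }
    }

    // Matches each "([item, item, ...],support)" group; the "Key: ...: Value:" prefix never matches.
    private static final Pattern GROUP =
            Pattern.compile("\\(\\[([^\\]]*)\\],\\s*(\\d+)\\)");

    public static List<ItemsetSupport> parse(String line) {
        List<ItemsetSupport> result = new ArrayList<>();
        Matcher m = GROUP.matcher(line);
        while (m.find()) {
            // Group 1 is the comma-separated item list, group 2 the support count.
            List<Long> items = new ArrayList<>();
            for (String token : m.group(1).split(",")) {
                String t = token.trim();
                if (!t.isEmpty()) {
                    items.add(Long.parseLong(t));
                }
            }
            result.add(new ItemsetSupport(items, Long.parseLong(m.group(2))));
        }
        return result;
    }
}
```

Parsing the first sample line gives [46485] with support 936 and [46705, 46485] with support 355.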
We are interested only in itemsets of size 2, since we need only rules with
exactly one antecedent and one consequent.
I was planning to compute the confidence of each association as follows:
For each key A above {
    for each two-item set [A, C] listed under A {
        confidence(A -> C) = support([A, C]) / support([A]);
        add association (A, C, confidence(A -> C)) to the list;
    }
}
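A minimal Java sketch of that per-key loop follows, assuming the patterns have already been grouped by key in memory. It uses the standard definition confidence(A -> C) = support([A, C]) / support([A]) and takes support([A]) from the single-item pattern listed under key A. PerKeyAssociations, Itemset and Rule are hypothetical names, not Mahout classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the per-key loop: confidence(A -> C) = support([A, C]) / support([A]). */
public class PerKeyAssociations {

    /** A frequent itemset and its support count. */
    public static final class Itemset {
        public final List<Long> items;
        public final long support;

        public Itemset(List<Long> items, long support) {
            this.items = items;
            this.support = support;
        }
    }

    /** A rule with one antecedent and one consequent. */
    public static final class Rule {
        public final long antecedent;
        public final long consequent;
        public final double confidence;

        public Rule(long antecedent, long consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }
    }

    public static List<Rule> generate(Map<Long, List<Itemset>> patternsByKey) {
        List<Rule> rules = new ArrayList<>();
        for (Map.Entry<Long, List<Itemset>> entry : patternsByKey.entrySet()) {
            long a = entry.getKey();
            List<Itemset> patterns = entry.getValue();

            // support([A]) is taken from the single-item pattern listed under key A.
            long supportA = 0;
            for (Itemset p : patterns) {
                if (p.items.size() == 1 && p.items.get(0) == a) {
                    supportA = p.support;
                }
            }
            if (supportA == 0) {
                continue; // cannot compute confidence without support([A])
            }

            // Every two-item set [A, C] under key A gives the rule A -> C.
            for (Itemset p : patterns) {
                if (p.items.size() == 2 && p.items.contains(a)) {
                    long c = (p.items.get(0) == a) ? p.items.get(1) : p.items.get(0);
                    rules.add(new Rule(a, c, (double) p.support / supportA));
                }
            }
        }
        return rules;
    }
}
```

Fed the two sample lines, this produces only 46485 -> 46705 with confidence 355/936 (about 0.38). Nothing is produced for key 46705, because no two-item set is listed under it.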
With the above requirement and pseudocode in mind, my questions are as follows:
1. Is the above algorithm efficient?
2. Under key 46485, the itemset [46705, 46485] occurs 355 times, but the same
itemset is not repeated under key 46705. Why is that? It makes computing
confidence(46705 -> 46485) difficult. As the pseudocode above shows, I was
planning to read the patterns for each feature and compute the confidence of
every association that has that feature as its antecedent. But when I read
feature 46705, I cannot compute confidence(46705 -> 46485), because the itemset
is not listed under that feature.
3. Has anyone implemented association-rule generation from the frequent
patterns that Mahout produces?
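One possible way around the problem in question 2, whatever the reason the pair is listed under only one key, is to make two passes. The first pass collects every single-item support and every two-item support from all keys, treating [46705, 46485] and [46485, 46705] as the same pair. The second pass emits both rule directions. This is a minimal Java sketch, assuming the patterns fit in memory and that each item's own support appears somewhere as a single-item pattern (as [46485] and [46705] do above). GlobalPairAssociations and the minConfidence parameter are hypothetical. Any pair that is missing from the output entirely (for example, because of the top-K cut-off) simply yields no rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass sketch: gather supports from every key, then emit both rule directions. */
public class GlobalPairAssociations {

    /** A frequent itemset and its support count. */
    public static final class Itemset {
        public final List<Long> items;
        public final long support;

        public Itemset(List<Long> items, long support) {
            this.items = items;
            this.support = support;
        }
    }

    /** A rule with one antecedent and one consequent. */
    public static final class Rule {
        public final long antecedent;
        public final long consequent;
        public final double confidence;

        public Rule(long antecedent, long consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }
    }

    public static List<Rule> generate(Map<Long, List<Itemset>> patternsByKey, double minConfidence) {
        Map<Long, Long> singleSupport = new HashMap<>();
        // Pairs are stored in ascending order, so [46705, 46485] and [46485, 46705] are one entry.
        Map<List<Long>, Long> pairSupport = new HashMap<>();

        // Pass 1: collect supports from all keys, regardless of which key lists an itemset.
        for (List<Itemset> patterns : patternsByKey.values()) {
            for (Itemset p : patterns) {
                if (p.items.size() == 1) {
                    // If an itemset is listed more than once, keep one count (the larger, defensively).
                    singleSupport.merge(p.items.get(0), p.support, Long::max);
                } else if (p.items.size() == 2) {
                    long i0 = p.items.get(0);
                    long i1 = p.items.get(1);
                    List<Long> pair = List.of(Math.min(i0, i1), Math.max(i0, i1));
                    pairSupport.merge(pair, p.support, Long::max);
                }
                // Larger itemsets are ignored: only one-to-one rules are needed.
            }
        }

        // Pass 2: each pair yields up to two rules, x -> y and y -> x.
        List<Rule> rules = new ArrayList<>();
        for (Map.Entry<List<Long>, Long> e : pairSupport.entrySet()) {
            long x = e.getKey().get(0);
            long y = e.getKey().get(1);
            long both = e.getValue();
            addIfConfident(rules, x, y, both, singleSupport, minConfidence);
            addIfConfident(rules, y, x, both, singleSupport, minConfidence);
        }
        return rules;
    }

    // confidence(a -> c) = support([a, c]) / support([a]); skipped if support([a]) is unknown.
    private static void addIfConfident(List<Rule> rules, long a, long c, long both,
                                       Map<Long, Long> singleSupport, double minConfidence) {
        Long supportA = singleSupport.get(a);
        if (supportA == null || supportA <= 0) {
            return;
        }
        double confidence = (double) both / supportA;
        if (confidence >= minConfidence) {
            rules.add(new Rule(a, c, confidence));
        }
    }
}
```

With the sample output and minConfidence = 0.0, this yields both 46485 -> 46705 (355/936, about 0.38) and 46705 -> 46485 (355/2526, about 0.14). Each pass is a single scan, so the cost grows linearly with the size of the pattern output.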
Thanks
Praveen