Hi Vipul,
Frequent patterns are reported per feature, which is why you are seeing the same pattern twice. The first one is for feature 1518311 and the second one is for feature 1476937.
However, both should have exactly the same support. I am not sure why you are getting different supports for the same itemset. If you send the full output from Mahout as is, we could take a look. Are you running on a multi-node Hadoop cluster? If so, did you read all the output files?

Praveen
________________________________________
From: ext Vipul Pandey [[email protected]]
Sent: Thursday, February 03, 2011 8:21 PM
To: [email protected]
Subject: PFPGrowth - weird output?

Hi all!

I'm trying to run PFPGrowth on my data and this is the output I get. (Please note that I parse the output in the frequentpatterns folder and generate this output with the support followed by the itemset.)

support : Itemset
*234 1518311 1476937*
235 55843184
238 1238079
244 34541
247 4516454
252 106478
252 670864
*254 1476937 1518311*

You can see that two items are reported twice (*1518311 1476937*) with different supports. Below are all the occurrences of these two items together. If you look closely, it has all the permutations of the three items (*1476937* *720020* *1518311*):

22 *1476937* 720020 *1518311*
30 *1518311* *1476937* 720020
30 720020 *1518311* *1476937*
34 720020 *1476937* *1518311*
38 *1518311* 720020 *1476937*
42 *1476937* *1518311* 720020
234 *1518311* *1476937*
254 *1476937* *1518311*

Does this mean that to get the support of just the pair (*1476937* *1518311*) I will have to add all of them up? Even in that case, the total comes out to *684*, whereas if I count the number of co-occurrences of these two items in the original baskets the support is *766*. Why is there a difference? Any idea?

Thanks!
Vipul
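
For anyone who wants to cross-check numbers like these outside of Mahout, here is a minimal Python sketch. It is not Mahout code and does not use any Mahout API. The function names `pair_support` and `group_by_itemset` are made up for illustration, and the baskets are toy data, not the real input. It counts how many baskets contain both items regardless of order, and groups reported rows by their unordered itemset so that duplicates with different supports stand out.

```python
from collections import defaultdict


def pair_support(baskets, a, b):
    """Count baskets that contain both items, ignoring item order."""
    return sum(1 for basket in baskets if a in basket and b in basket)


def group_by_itemset(patterns):
    """Group (support, items) rows by the unordered set of items."""
    grouped = defaultdict(list)
    for support, items in patterns:
        grouped[frozenset(items)].append(support)
    return dict(grouped)


# Toy baskets, invented for illustration only (not the real data set).
baskets = [
    ["1476937", "1518311"],
    ["1518311", "720020", "1476937"],
    ["720020"],
]
print(pair_support(baskets, "1476937", "1518311"))

# The two rows for the pair, as quoted in the message above.
reported = [
    (234, ["1518311", "1476937"]),
    (254, ["1476937", "1518311"]),
]
for itemset, supports in group_by_itemset(reported).items():
    print(sorted(itemset), supports)
```

Running `pair_support` over the original baskets gives the reference count to compare against, and any itemset that `group_by_itemset` maps to more than one distinct support is worth reporting along with the full Mahout output.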
