Hey Praveen, thanks for responding.
> Frequent patterns are reported per feature which is why you are seeing the two
> patterns twice. First one is for feature 1518311 and second one is for
> feature 1476937.

That's what I thought, but then the different support values made me dizzy! Also, it seems like it's not just about reporting the pattern for each feature but for each combination of features:

> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020

Here you can see each possible permutation of the three items registering different support.

> Are you running on multi node Hadoop cluster. If so did you read all the
> output files?

I ran locally and then on a small 4-node cluster. I'm reading the parts file under the frequentpatterns directory.

Let me try to run it on a smaller scale and get you the output soon.

Thanks!
Vipul

On Feb 3, 2011, at 6:44 PM, <[email protected]> <[email protected]> wrote:

> Hi Vipul,
>
> Frequent patterns are reported per feature which is why you are seeing the two
> patterns twice. First one is for feature 1518311 and second one is for
> feature 1476937.
>
> However both should have the same exact support. I am not sure why you have
> different support for the same item set. Maybe if you send the full output
> from Mahout as it is we could take a look.
>
> Are you running on multi node Hadoop cluster. If so did you read all the
> output files?
>
> Praveen
> ________________________________________
> From: ext Vipul Pandey [[email protected]]
> Sent: Thursday, February 03, 2011 8:21 PM
> To: [email protected]
> Subject: PFPGrowth - weird output?
>
> Hi all!
>
> I'm trying to run PFPgrowth on my data and this is an output I get.
> (Please note that I parse the output in the frequentpatterns folder and
> generate this output with the support followed by the itemset)
>
> support : Itemset
> *234 1518311 1476937 *
> 235 55843184
> 238 1238079
> 244 34541
> 247 4516454
> 252 106478
> 252 670864
> *254 1476937 1518311 *
>
> You can see that two items are reported twice (*1518311 1476937*) with
> different supports.
>
> And below are all the occurrences of these two items together .... if you
> notice it has all the permutations of the three items (*1476937* *720020*
> *1518311*)
>
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
> 234 *1518311* *1476937*
> 254 *1476937* *1518311*
>
> Does this mean if I have to get the support of just the pair (*1476937*
> *1518311*) I will have to add all of them up!?
>
> Even in that case ... this total comes out to *684* and if I count the
> number of co-occurrences of these two items in the original baskets the
> support is *766*? Why's there a difference? Any idea?
>
>
> Thanks!
> Vipul
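
For reference, the arithmetic in the quoted message can be checked with a short script. This is only a sketch: the rows are copied by hand from the output quoted above, and the names `rows`, `group_by_itemset` and `total_with_pair` are made up for illustration, not part of Mahout. It groups the reported lines by itemset regardless of item order, then adds up every line that contains both 1476937 and 1518311. Whether summing is the right way to read PFPGrowth output is exactly the open question in this thread; the script only reproduces the numbers.

```python
from collections import defaultdict

# (support, itemset) rows copied from the parsed output quoted above.
rows = [
    (22, ("1476937", "720020", "1518311")),
    (30, ("1518311", "1476937", "720020")),
    (30, ("720020", "1518311", "1476937")),
    (34, ("720020", "1476937", "1518311")),
    (38, ("1518311", "720020", "1476937")),
    (42, ("1476937", "1518311", "720020")),
    (234, ("1518311", "1476937")),
    (254, ("1476937", "1518311")),
]


def group_by_itemset(rows):
    """Collect supports per itemset, ignoring the order of the items."""
    groups = defaultdict(list)
    for support, items in rows:
        groups[frozenset(items)].append(support)
    return dict(groups)


groups = group_by_itemset(rows)

# Sum of every reported line that contains both items of the pair.
pair = frozenset({"1476937", "1518311"})
total_with_pair = sum(support for support, items in rows if pair <= set(items))

for items, supports in groups.items():
    print(sorted(items), supports, sum(supports))
print("total over all lines containing the pair:", total_with_pair)
```

The six orderings of the three-item set add up to 196 and the two orderings of the pair to 488, which gives the 684 mentioned above, still short of the 766 counted in the original baskets.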
