Hey Praveen, 

thanks for responding.

> Frequent patterns are reported per feature which is why you are seeing the two
> patterns twice. First one is for feature 1518311 and second one is for 
> feature 1476937.
That's what I thought but then different support values made me dizzy! 

Also, it seems like it's not just about reporting the pattern for each
feature but for each combination of features:
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
Here you can see each possible permutation of the three items registering 
different support. 
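
To make the grouping concrete, here is a minimal Python sketch (not part of Mahout; the function name is made up for illustration) that treats each reported line as an unordered itemset and collects the supports reported for it. The rows are copied from the parsed output quoted in this thread. Whether summing the supports is the right way to combine them is exactly the open question, so the sum is only printed for comparison.

```python
from collections import defaultdict

# (support, items) rows copied from the parsed output quoted in this thread
rows = [
    (22, ["1476937", "720020", "1518311"]),
    (30, ["1518311", "1476937", "720020"]),
    (30, ["720020", "1518311", "1476937"]),
    (34, ["720020", "1476937", "1518311"]),
    (38, ["1518311", "720020", "1476937"]),
    (42, ["1476937", "1518311", "720020"]),
    (234, ["1518311", "1476937"]),
    (254, ["1476937", "1518311"]),
]

def group_by_itemset(rows):
    """Collect the reported supports per itemset, ignoring item order."""
    grouped = defaultdict(list)
    for support, items in rows:
        grouped[frozenset(items)].append(support)
    return dict(grouped)

grouped = group_by_itemset(rows)
for itemset, supports in grouped.items():
    # Print each distinct itemset once, with every support reported for it
    print(sorted(itemset), supports, "sum =", sum(supports))
```

The three-item set shows up six times and the pair twice; all eight rows together add up to the 684 mentioned in the original mail.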


> Are you running on a multi-node Hadoop cluster? If so, did you read all the
> output files?
I ran it locally and then on a small 4-node cluster. I'm reading the part files
under the frequentpatterns directory.

Let me try to run it on a smaller scale and get you the output soon.
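
For a small run, the reference support of the pair can be checked directly against the baskets. A minimal sketch, assuming each basket is just a list of item ids; the baskets below are invented toy data, not the real input, and `pair_support` is a hypothetical helper name:

```python
def pair_support(baskets, a, b):
    """Count baskets containing both a and b (order and duplicates ignored)."""
    return sum(1 for basket in baskets if {a, b} <= set(basket))

# Toy baskets, invented for illustration only
baskets = [
    ["1476937", "1518311", "720020"],
    ["1518311", "1476937"],
    ["1476937", "34541"],
    ["720020", "1518311", "1476937", "1476937"],
]

print(pair_support(baskets, "1476937", "1518311"))
```

Each basket is counted at most once, even if an item is repeated in it, which is the usual definition of support for an itemset.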

Thanks!
Vipul

On Feb 3, 2011, at 6:44 PM, <[email protected]> <[email protected]> 
wrote:

> Hi Vipul,
> Frequent patterns are reported per feature which is why you are seeing the two
> patterns twice. First one is for feature 1518311 and second one is for 
> feature 1476937.
> 
> However both should have the same exact support. I am not sure why you have
> different support for the same item set. Maybe if you send the full output
> from Mahout as it is, we could take a look.
> 
> Are you running on a multi-node Hadoop cluster? If so, did you read all the
> output files?
> 
> Praveen
> ________________________________________
> From: ext Vipul Pandey [[email protected]]
> Sent: Thursday, February 03, 2011 8:21 PM
> To: [email protected]
> Subject: PFPGrowth - weird output?
> 
> Hi all!
> 
> I'm trying to run PFPgrowth on my data and this is an output I get. (Please
> note that I parse the output in frequentpatterns folder and generate this
> output with the support followed by the itemset)
> 
> support : Itemset
> *234     1518311    1476937  *
> 235     55843184
> 238     1238079
> 244     34541
> 247     4516454
> 252     106478
> 252     670864
> *254     1476937   1518311  *
> 
> You can see that two items are reported twice (*1518311    1476937*) with
> different supports.
> 
> And below are all the occurrences of these two items together .... if you
> notice, it has all the permutations of the three items (*1476937* *720020*
> *1518311*)
> 
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
> 234 *1518311* *1476937*
> 254 *1476937* *1518311*
> 
> Does this mean that to get the support of just the pair (*1476937*
> *1518311*) I will have to add all of them up!?
> 
> Even in that case ... this total comes out to *684*, and if I count the
> number of co-occurrences of these two items in the original baskets the
> support is *766*. Why is there a difference? Any idea?
> 
> 
> Thanks!
> Vipul
