Re: PFPGrowth - weird output?

Robin Anil Sat, 05 Mar 2011 02:14:21 -0800

Hi Vipul, is it possible for you to attach some test data to a JIRA issue for
me to investigate?


Robin

On Sat, Mar 5, 2011 at 12:09 PM, Vipul Pandey <[email protected]> wrote:

> Hi All,
>
>
> I'm running into a different issue with PFPGrowth now. I see output
> like:
>
> $ cat part-r-00000 | grep 1678807047
> 12      1678807047
> 38      1678807047 3159925415
>
> which says that the support (12) for the item (1678807047) is less than
> the support (38) of a pair containing that item. Needless to say, this
> is ridiculous.
> I get this even with the sequential version of FPGrowth.
>
> $ cat part-r-00000  | grep 1441690161
> 12              1441690161 3910019844
> 18              1604285941 1441690161 3910019844
> 75              1441690161
>
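One way to confirm that the output itself is inconsistent, independent of the parsing code, is to check the downward-closure (anti-monotone) property of support: an itemset can never have lower support than any of its supersets, because every basket that contains the superset also contains the subset. The Java sketch below is a hypothetical helper (`SupportConsistencyCheck` is not part of Mahout) that reads lines in the `support item item ...` form shown above and reports every subset/superset pair that breaks this rule. It only compares itemsets that actually appear in the input, since the top-k output does not list every subset.

```java
import java.util.*;

/**
 * Hypothetical sketch: checks that no itemset has lower support than one of
 * its supersets, for lines of the form "support item item ...".
 */
public class SupportConsistencyCheck {

    /** Returns one message per (subset, superset) pair where the subset has lower support. */
    public static List<String> findViolations(List<String> lines) {
        Map<Set<String>, Long> supports = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) continue; // skip blank or malformed lines
            long support = Long.parseLong(tokens[0]);
            // Sorted set so that item order does not matter
            Set<String> items = new TreeSet<>(Arrays.asList(tokens).subList(1, tokens.length));
            // Assumption: if an itemset appears twice, keep the larger support
            supports.merge(items, support, Long::max);
        }
        List<String> violations = new ArrayList<>();
        for (Map.Entry<Set<String>, Long> sub : supports.entrySet()) {
            for (Map.Entry<Set<String>, Long> sup : supports.entrySet()) {
                boolean properSubset = sup.getKey().size() > sub.getKey().size()
                        && sup.getKey().containsAll(sub.getKey());
                if (properSubset && sub.getValue() < sup.getValue()) {
                    violations.add(sub.getKey() + "=" + sub.getValue()
                            + " < " + sup.getKey() + "=" + sup.getValue());
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "12      1678807047",
                "38      1678807047 3159925415");
        findViolations(lines).forEach(System.out::println);
    }
}
```

Run on the two lines from the first grep, it prints one violation: `[1678807047]=12 < [1678807047, 3159925415]=38`. On the second grep it flags the pair `[1441690161, 3910019844]=12` against its three-item superset with support 18.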
>
> I'm sure I'm doing something "crafty" somewhere.
>
> For sequential, I supply the file containing baskets and get the output as
> a file of sequences.
>
> I run the following code to read the sequence file and write out the
> support and itemsets in plain text:
>
> (The MapReduce job was written for the PFPGrowth output, which is bigger.
> My reducer is just an identity reducer.)
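>     // Mapper over the frequentpatterns sequence file: Text key -> TopKStringPatterns
>     @Override
>     protected void map(Text key, TopKStringPatterns input, Context context)
>         throws IOException, InterruptedException {
>       for (Pair<List<String>, Long> pair : input.getPatterns()) {
>         // Join the items of the pattern with single spaces
>         StringBuilder sb = new StringBuilder();
>         for (String item : pair.getFirst()) {
>           sb.append(item).append(" ");
>         }
>         // Emit (support, itemset); the identity reducer writes these out as text
>         context.write(new LongWritable(pair.getSecond()), new Text(sb.toString()));
>       }
>     }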
>
> This gives me the output above.
> Is this the right way? Am I doing something wrong while parsing the output?
>
> My command line arguments are:
> -i ./baskets/part-r-00000 -o ./patterns -k 50 -method sequential -g 10 -regex '[\t]' -s 10
>
> Any help would be highly appreciated.
>
> Regards,
> Vipul
>
>
>
>
> On Feb 3, 2011, at 6:44 PM, <[email protected]> <[email protected]> wrote:
>
> > Hi Vipul,
> > Frequent patterns are reported per feature, which is why you are seeing
> > the same pattern twice: the first one is for feature 1518311 and the
> > second one is for feature 1476937.
> >
> > However, both should have exactly the same support. I am not sure why
> > you have different supports for the same itemset. Maybe if you send the
> > full output from Mahout as-is, we could take a look.
> >
> > Are you running on a multi-node Hadoop cluster? If so, did you read all
> > the output files?
> >
> > Praveen
> > ________________________________________
> > From: ext Vipul Pandey [[email protected]]
> > Sent: Thursday, February 03, 2011 8:21 PM
> > To: [email protected]
> > Subject: PFPGrowth - weird output?
> >
> > Hi all!
> >
> > I'm trying to run PFPGrowth on my data, and this is the output I get.
> > (Please note that I parse the output in the frequentpatterns folder and
> > generate this output with the support followed by the itemset.)
> >
> > support : Itemset
> > *234     1518311    1476937  *
> > 235     55843184
> > 238     1238079
> > 244     34541
> > 247     4516454
> > 252     106478
> > 252     670864
> > *254     1476937   1518311  *
> >
> > You can see that the same pair of items (*1518311    1476937*) is
> > reported twice with different supports.
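Praveen's point that the same itemset, reported once per feature, should carry the same support can be checked mechanically. The sketch below is a hypothetical helper (`DuplicateItemsetCheck` is not a Mahout class) that treats each line as `support item item ...`, ignores item order by storing the items in a sorted set, and lists every itemset that appears with more than one distinct support value.

```java
import java.util.*;

/**
 * Hypothetical sketch: groups "support item item ..." lines by itemset
 * (order-insensitive) and keeps only itemsets reported with differing supports.
 */
public class DuplicateItemsetCheck {

    public static Map<Set<String>, Set<Long>> conflictingSupports(List<String> lines) {
        Map<Set<String>, Set<Long>> seen = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) continue; // skip blank or malformed lines
            // Sorted set: "1518311 1476937" and "1476937 1518311" become the same key
            Set<String> itemset = new TreeSet<>(Arrays.asList(tokens).subList(1, tokens.length));
            seen.computeIfAbsent(itemset, k -> new TreeSet<>()).add(Long.parseLong(tokens[0]));
        }
        // Drop itemsets that carry a single, consistent support value
        seen.values().removeIf(supports -> supports.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "234     1518311    1476937",
                "235     55843184",
                "254     1476937   1518311");
        conflictingSupports(lines).forEach((items, supports) ->
                System.out.println(items + " -> " + supports));
    }
}
```

On the rows above (with the `*` emphasis markers removed), it prints `[1476937, 1518311] -> [234, 254]`.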
> >
> > And below are all the occurrences of these two items together. As you
> > can see, they include all the permutations of the three items
> > (*1476937* *720020* *1518311*):
> >
> > 22 *1476937* 720020 *1518311*
> > 30 *1518311* *1476937* 720020
> > 30 720020 *1518311* *1476937*
> > 34 720020 *1476937* *1518311*
> > 38 *1518311* 720020 *1476937*
> > 42 *1476937* *1518311* 720020
> > 234 *1518311* *1476937*
> > 254 *1476937* *1518311*
> >
> > Does this mean that to get the support of just the pair
> > (*1476937* *1518311*) I will have to add all of them up?!
> >
> > Even in that case, the total comes out to *684*, whereas if I count the
> > number of co-occurrences of these two items in the original baskets, the
> > support is *766*. Why is there a difference? Any idea?
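To reproduce the *766* figure, the pair's support can be counted directly from the baskets. By definition, the support of {1476937, 1518311} is the number of baskets containing both items, so it already includes the baskets that also contain 720020. Adding the supports of the three-item permutations on top of it would count those baskets more than once. The Java sketch below (the class `CooccurrenceCount` is hypothetical) assumes one basket per line with items separated by tabs, matching the `-regex '[\t]'` argument above. If the basket file also carries a key column, that token would have to be dropped first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: counts how many baskets contain every item of an itemset.
 * Assumes one basket per line, items separated by splitRegex.
 */
public class CooccurrenceCount {

    public static long countSupport(Stream<String> baskets, String splitRegex, Set<String> itemset) {
        return baskets
                // Turn each line into a set of items; repeated items in a basket count once
                .map(line -> Arrays.stream(line.split(splitRegex))
                        .map(String::trim)
                        .collect(Collectors.toSet()))
                .filter(basket -> basket.containsAll(itemset))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the basket file, e.g. ./baskets/part-r-00000
        Set<String> pair = new HashSet<>(Arrays.asList("1476937", "1518311"));
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println(countSupport(lines, "[\t]", pair));
        }
    }
}
```

A basket such as `1476937<TAB>720020<TAB>1518311` is counted once toward the pair's support, which is why the triples' supports should not be added to it.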
> >
> >
> > Thanks!
> > Vipul
>
>
