I've rewritten the FPGrowth wiki page. It is still a bit ragged. Please critique for content.
https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining

On Thu, Jul 28, 2011 at 12:59 AM, Lance Norskog <[email protected]> wrote:

> Ok, now I've succeeded in running fpgrowth, both sequential and
> mapreduce, from the 'fpg' job and the flag that chooses 'sequential'
> from 'mapreduce'. I've done this from two different datasets,
> accidents.dat and retail.dat. I only ran the first thousand lines of
> both datasets for time reasons.
>
> Both sequential and mapreduce locate the same ids as being in
> patterns. Examining the patterns in detail, they do not match but
> patterns involving id X generally the same size. Successive runs of
> each variant give exactly the same results, so having sequential and
> mapreduce give different result sets is puzzling. Pulling the
> distances is a little difficult with text processing.
>
> What can account for the different outputs of map/reduce and
> sequential (pseudo-distributed) modes?
>
> On 7/27/11, Lance Norskog <[email protected]> wrote:
>> I'll prep a current version.
>>
>> On 7/27/11, Robin Anil <[email protected]> wrote:
>>> On Tue, Jul 26, 2011 at 11:06 PM, Lance Norskog <[email protected]>
>>> wrote:
>>>
>>>> The parameters and files mentioned on this page do not find any
>>>> frequent patterns:
>>>>
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
>>>
>>> Let me run and correct this doc.
>>>
>>>> Have 'accidents.dat.gz' from the given site, or 'retail.dat.gz' from
>>>> the same site, what parameters should find some frequent patterns?
>>>
>>>> Also, what is the magic to get maven to pass JDK options to an exec'd
>>>> class?
>>>
>>> Did you try using the bin/mahout script. the memory size is configurable
>>> inside it.
>>>
>>>> FPGrowth sequential needs the memory size bumped up.
>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Lance Norskog
>>>> [email protected]
>>
>> --
>> Lance Norskog
>> [email protected]
>
> --
> Lance Norskog
> [email protected]

--
Lance Norskog
[email protected]
