Okay, got it "working" -- sort of. No errors, but no actual output either.
I installed a new AMI per the wiki article mentioned in this thread. Worked like a champ. Then I tried to run the sample dataset "accidents.dat" using the fpg algorithm, with the exact command line I found on Grant Ingersoll's site (http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/):

    ./bin/mahout fpg -i /home/mahout-in/fpm-input/accidents.dat -o patterns -k 50 -method mapreduce -g 10 -regex [\ ]

Okay. So the job "completed" "successfully". But the folder that should contain the final output (./patterns/fpgrowth/part-r-00000) is only *121 bytes total*. When I run the next command (also copied from the same blog article above):

    ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000

that also "successfully" runs, with a result of:

    Input Path: patterns/fpgrowth/part-r-00000
    Key class: class org.apache.hadoop.io.Text
    Value Class: class org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
    Count: 0

So, what's the haps? Can anybody produce real output using my command line and data?

Thanks.

P.S. For the purpose of loosely documenting (and preserving for search) an issue I ran into and solved: I hit a Java heap out-of-memory error while running this. That put HDFS into safe mode, which essentially stopped the datanode from running when I did a stop-all.sh and start-all.sh. The workaround was to blow away and reformat HDFS, like so:

    $HADOOP_HOME/bin/stop-all.sh
    rm -r /usr/local/hadoop-data/*   <-- location set in hdfs-site.xml
    $HADOOP_HOME/bin/hadoop namenode -format
    $HADOOP_HOME/bin/hadoop datanode -format
    $HADOOP_HOME/bin/start-all.sh
    $HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
    $HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input

On Tue, May 18, 2010 at 7:40 PM, Jeff Eastman <[email protected]> wrote:

> Ya, I put it up last Thursday. Thought it might come in handy :).
> Of course, it runs LDA, k-Means, Dirichlet too, though LDA is taking *forever* to run build-reuters.sh with a single mapper. Gotta look into that next...
>
> Jeff
>
> On 5/18/10 7:11 PM, Mike Roberts wrote:
>
>> Ah, nice! That's new -- er, very recently updated. Cool. Thanks.
>>
>> On Tue, May 18, 2010 at 6:50 PM, Jeff Eastman <[email protected]> wrote:
>>
>>> I'm running on a Cloudera Ubuntu based AMI that I subsequently configured as in https://cwiki.apache.org/confluence/display/MAHOUT/MahoutEC2
>>>
>>> Jeff
>>>
>>> On 5/18/10 6:37 PM, Mike Roberts wrote:
>>>
>>>> Nuts, and I was just about to finish my *"A Complete Newb's Guide to (Installing on EC2) and Actually Running Mahout from the Command Line"* wiki post.
>>>>
>>>> Now I'll have to see where I went wrong. Which distro are you running? I started with an Alestic Ubuntu 10.4 AMI (ami-cb97c68e).
>>>>
>>>> On Tue, May 18, 2010 at 5:34 PM, Jeff Eastman <[email protected]> wrote:
>>>>
>>>>> I also brought up a single instance at http://ec2-184-73-30-93.compute-1.amazonaws.com:50030/jobtracker.jsp and that ran fine too. It looks to me like the problem, whatever it is, is in your AMI or its configuration.
>>>>>
>>>>> Jeff
>>>>>
>>>>> On 5/18/10 5:15 PM, Jeff Eastman wrote:
>>>>>
>>>>>> Well, I just brought up a 2 node cluster at http://ec2-174-129-148-227.compute-1.amazonaws.com:50030/jobtracker.jsp and it ran fine.
>>>>>>
>>>>>> On 5/18/10 4:56 PM, Mike Roberts wrote:
>>>>>>
>>>>>>> Single instance. Thx.
>>>>>>>
>>>>>>> On Tue, May 18, 2010 at 4:49 PM, Jeff Eastman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> Shouldn't happen.
>>>>>>>> You running this on a single instance or on a hadoop cluster? I will see if I can duplicate.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> On 5/18/10 4:27 PM, Mike Roberts wrote:
>>>>>>>>
>>>>>>>>> Hey Guys,
>>>>>>>>>
>>>>>>>>> Just trying to get the example mentioned here working: https://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html.
>>>>>>>>>
>>>>>>>>> I downloaded the accidents.dat file and placed it in /home/ubuntu/mahout-in/fpm-input. I created a directory for the output as /home/ubuntu/mahout-in/fpm-out. Then I ran the following command:
>>>>>>>>>
>>>>>>>>> ./bin/mahout fpg --input /home/ubuntu/mahout-in/fpm-input --output /home/ubuntu/mahout-in/fpm-out --method mapreduce
>>>>>>>>>
>>>>>>>>> It runs for a bit, and after the first step I get the following error:
>>>>>>>>>
>>>>>>>>> java.io.IOException: java.lang.ClassNotFoundException: org.apache.mahout.common.Pair
>>>>>>>>>     at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
>>>>>>>>>     at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
>>>>>>>>>     at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>>>>>>>>>     at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.deserializeList(PFPGrowth.java:84)
>>>>>>>>>     at org.apache.mahout.fpm.pfpgrowth.TransactionSortingMapper.setup(TransactionSortingMapper.java:77)
>>>>>>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>
>>>>>>>>> The step that it was running:
>>>>>>>>>
>>>>>>>>> 10/05/18 23:10:18 INFO pfpgrowth.PFPGrowth: No of Features: 30
>>>>>>>>> 10/05/18 23:10:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>>>>>>>>> 10/05/18 23:10:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.JobClient: Running job: job_local_0002
>>>>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: io.sort.mb = 100
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: record buffer = 262144/327680
>>>>>>>>> 10/05/18 23:10:19 WARN mapred.LocalJobRunner: job_local_0002
>>>>>>>>>
>>>>>>>>> Anyone know what's going on here, or have a solution? I verified that the class file (Pair.java) exists in /trunk/core/src/main/java/org/apache/mahout/common. I did an mvn install in core just to be sure. I'm running Hadoop 20.2 on Ubuntu 10.4 on EC2. BTW, if it's not obvious, I'm a total Mahout n00b.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mike
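[Editor's note on the -regex argument used in the top post: fpg splits each input line into items with a Java regular expression, so one cheap sanity check before launching a full MapReduce run is to apply the same pattern to a sample transaction locally. A minimal sketch; the sample line and the space-separated-item-ID format are assumptions about accidents.dat, not taken from this thread.]

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitterCheck {
    // Same pattern the thread passes as "-regex [ ]": split on a single space.
    // In a shell, quote the argument (-regex "[ ]") so the brackets
    // are not interpreted as a filename glob.
    static final Pattern SPLITTER = Pattern.compile("[ ]");

    // Split one transaction line into item tokens.
    public static String[] splitTransaction(String line) {
        return SPLITTER.split(line);
    }

    public static void main(String[] args) {
        // Hypothetical accidents.dat-style line: space-separated item IDs.
        System.out.println(Arrays.toString(splitTransaction("12 31 45 7")));
        // prints [12, 31, 45, 7]
    }
}
```

If the pattern does not cleanly tokenize a sample line here, the MapReduce job will happily "succeed" while counting nothing useful.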

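[Editor's note on the heap exhaustion mentioned in the P.S. of the top post: reformatting HDFS works, but a less drastic first step is usually to give the task JVMs more heap. A sketch for Hadoop 0.20's mapred-site.xml; the 1024m value is an arbitrary example, not a figure from this thread. MAHOUT_HEAPSIZE, if your bin/mahout script honors it, similarly controls the client-side driver JVM.]

```xml
<!-- mapred-site.xml: raise the per-task child JVM heap.
     1024m is an example value, not a recommendation from this thread. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```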