Okay, got it "working" -- sort of. No errors, but no actual output either.

I installed a new AMI per the wiki article mentioned in this thread.  Worked
like a champ.

Then, I tried to run the sample dataset "accidents.dat" using the fpg
algorithm.  I used the exact command line I found here on Grant Ingersoll's
site:
http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/

 ./bin/mahout fpg -i /home/mahout-in/fpm-input/accidents.dat -o patterns \
   -k 50 -method mapreduce -g 10 -regex [\ ]
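One thing I'd rule out first (just a guess on my part, not a confirmed cause of the empty output): whether the shell mangles that -regex argument before mahout ever sees it. A quick local check of what bash actually passes, without touching the mahout invocation itself:

```shell
# Compare what the shell hands to a program for the unquoted vs. single-quoted
# form of the -regex value. Unquoted, bash eats the backslash; quoted, it
# survives. (Demonstration only; run the real fpg command separately.)
unquoted=$(printf '%s' [\ ])     # shell strips the backslash -> "[ ]"
quoted=$(printf '%s' '[\ ]')     # single quotes preserve it   -> "[\ ]"
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

If the two differ on your shell, re-running fpg with -regex '[\ ]' is a cheap thing to try.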

Okay.  So the job "completed" "successfully".  But the file that should
contain the final output (./patterns/fpgrowth/part-r-00000) is only *121
bytes total*.

When I run the next command (also copied from the same blog article above):
 ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000

and that "successfully" runs with a result of:

Input Path: patterns/fpgrowth/part-r-00000
Key class: class org.apache.hadoop.io.Text Value Class: class
org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
Count: 0
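Before blaming fpg itself, it's also worth sanity-checking the input format: the algorithm expects one transaction per line with items separated by the -regex delimiter. A sketch of the check -- note the sample line below is invented, since I don't have accidents.dat in front of me; on the real box I'd substitute `head -n 1 /home/mahout-in/fpm-input/accidents.dat`:

```shell
# Count items in a single transaction line by splitting on spaces.
# NOTE: 'sample' is a made-up stand-in for the first line of accidents.dat.
sample='12 16 17 19 21'
n_items=$(echo "$sample" | tr ' ' '\n' | wc -l)
echo "items in first transaction: $n_items"
```

If that count comes back as 1 on the real data, the delimiter never matched and Count: 0 downstream is exactly what you'd expect.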

So, what's the haps?  Can anybody produce real output using my command line
and data?

Thanks.

P.S. For the purpose of loosely documenting (and preserving for
search) an issue I ran into and solved -- I hit a Java heap out-of-memory
error while running this.  That then left HDFS stuck in safemode, which
essentially stopped the datanode from running after a stop-all.sh and
start-all.sh.  So, the workaround was to blow away and reformat the
datanode, like so:

$HADOOP_HOME/bin/stop-all.sh
rm -r /usr/local/hadoop-data/*    # location set in hdfs-site.xml
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/bin/hadoop datanode -format
$HADOOP_HOME/bin/start-all.sh
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
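For anyone else who hits the same lockup: before resorting to a full reformat, it may be enough to force the namenode out of safemode. dfsadmin -safemode is a standard Hadoop command, though whether it recovers this particular wedged state is untested on my side:

```shell
# Build the dfsadmin command against the local Hadoop install, then run it
# against the live namenode. HADOOP_HOME is assumed to be set as in the
# steps above; the echo here just shows the command that would be executed.
safemode_cmd="$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave"
echo "$safemode_cmd"
```

If HDFS comes back read-write after that, you keep your data and skip the reformat-and-reupload dance entirely.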

On Tue, May 18, 2010 at 7:40 PM, Jeff Eastman <[email protected]> wrote:

> Ya, I put it up last Thursday. Thought it might come in handy :). Of
> course, it runs LDA, k-Means, Dirichlet too, though LDA is taking *forever*
> to run build-reuters.sh with a single mapper. Gotta look into that next...
>
> Jeff
>
>
> On 5/18/10 7:11 PM, Mike Roberts wrote:
>
>> Ah, nice!  That's new -- er, very recently updated.  Cool.  Thanks.
>>
>> On Tue, May 18, 2010 at 6:50 PM, Jeff Eastman <[email protected]> wrote:
>>
>>> I'm running on a Cloudera Ubuntu based AMI that I subsequently configured
>>> as in https://cwiki.apache.org/confluence/display/MAHOUT/MahoutEC2
>>>
>>> Jeff
>>>
>>> On 5/18/10 6:37 PM, Mike Roberts wrote:
>>>
>>>> Nuts, and I was just about to finish my *"A Complete Newb’s Guide to
>>>> (Installing on EC2) and Actually Running Mahout from the Command Line"*
>>>> wiki post.
>>>>
>>>> Now, I'll have to see where I went wrong.  Which distro are you running?
>>>> I started with an Alestic Ubuntu 10.4 AMI (ami-cb97c68e).
>>>>
>>>> On Tue, May 18, 2010 at 5:34 PM, Jeff Eastman <[email protected]> wrote:
>>>>
>>>>> I also brought up a single instance at
>>>>> http://ec2-184-73-30-93.compute-1.amazonaws.com:50030/jobtracker.jsp and
>>>>> that ran fine too. It looks to me like the problem, whatever it is, is
>>>>> in your AMI or its configuration.
>>>>>
>>>>> Jeff
>>>>>
>>>>> On 5/18/10 5:15 PM, Jeff Eastman wrote:
>>>>>
>>>>>> Well, I just brought up a 2 node cluster at
>>>>>> http://ec2-174-129-148-227.compute-1.amazonaws.com:50030/jobtracker.jsp
>>>>>> and it ran fine.
>>>>>>
>>>>>> On 5/18/10 4:56 PM, Mike Roberts wrote:
>>>>>>
>>>>>>> Single instance.  Thx.
>>>>>>>
>>>>>>> On Tue, May 18, 2010 at 4:49 PM, Jeff Eastman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mike,
>>>>>>>>
>>>>>>>> Shouldn't happen. You running this on a single instance or on a
>>>>>>>> hadoop cluster? I will see if I can duplicate.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/18/10 4:27 PM, Mike Roberts wrote:
>>>>>>>>
>>>>>>>>> Hey Guys,
>>>>>>>>>
>>>>>>>>> Just trying to get the example mentioned here working:
>>>>>>>>> https://cwiki.apache.org/MAHOUT/parallelfrequentpatternmining.html
>>>>>>>>>
>>>>>>>>> I downloaded the accidents.dat file and placed it in
>>>>>>>>> /home/ubuntu/mahout-in/fpm-input.
>>>>>>>>> I created a directory for the output as
>>>>>>>>> /home/ubuntu/mahout-in/fpm-out.
>>>>>>>>> Then, I ran the following command:
>>>>>>>>> ./bin/mahout fpg --input /home/ubuntu/mahout-in/fpm-input --output
>>>>>>>>> /home/ubuntu/mahout-in/fpm-out --method mapreduce
>>>>>>>>>
>>>>>>>>> It runs for a bit and after the first step I get the following
>>>>>>>>> error:
>>>>>>>>>
>>>>>>>>> java.io.IOException: java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.mahout.common.Pair
>>>>>>>>>         at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
>>>>>>>>>         at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
>>>>>>>>>         at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
>>>>>>>>>         at org.apache.mahout.fpm.pfpgrowth.PFPGrowth.deserializeList(PFPGrowth.java:84)
>>>>>>>>>         at org.apache.mahout.fpm.pfpgrowth.TransactionSortingMapper.setup(TransactionSortingMapper.java:77)
>>>>>>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>>>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>
>>>>>>>>> The step that it was running:
>>>>>>>>> 10/05/18 23:10:18 INFO pfpgrowth.PFPGrowth: No of Features: 30
>>>>>>>>> 10/05/18 23:10:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>>>>>>>>> 10/05/18 23:10:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.JobClient: Running job: job_local_0002
>>>>>>>>> 10/05/18 23:10:19 INFO input.FileInputFormat: Total input paths to process : 1
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: io.sort.mb = 100
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: data buffer = 79691776/99614720
>>>>>>>>> 10/05/18 23:10:19 INFO mapred.MapTask: record buffer = 262144/327680
>>>>>>>>> 10/05/18 23:10:19 WARN mapred.LocalJobRunner: job_local_0002
>>>>>>>>>
>>>>>>>>> Anyone know what's going on here, or have a solution?  I verified
>>>>>>>>> that the class file (Pair.java) exists in
>>>>>>>>> /trunk/core/src/main/java/org/apache/mahout/common.  I did an mvn
>>>>>>>>> install in core just to be sure.  I'm running Hadoop 0.20.2 on
>>>>>>>>> Ubuntu 10.4 on EC2.  BTW, if it's not obvious, I'm a total Mahout
>>>>>>>>> n00b.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mike
