I put the 0.9 distribution onto my cluster and updated my MAHOUT_HOME. The discrepancy from the tutorial (wikipediaXMLSplitter vs. org.apache.mahout.text.wikipedia.WikipediaXmlSplitter) still stands.
Then I got a new issue: NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)

From peeking around, I tried running it as:

    .../hadoop jar $MAHOUT_HOME/mahout-examples-0.9-job.jar org.apache.mahout.text.wikipedia.WikipediaXmlSplitter -d $MAHOUT_HOME/examples/tmp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

and I am back to the heap space error. Full details here: http://pastebin.com/rCNyTypf

Any advice is greatly appreciated.

Jessie

---------- Forwarded message ----------
From: Suneel Marthi <[email protected]>
To: "[email protected]" <[email protected]>
Date: Sat, 1 Mar 2014 09:15:22 -0800 (PST)
Subject: Re: wikipedia bayes quickstart example on EC2 (cloudera)

Please work off of the latest Mahout 0.9; most of these issues from Mahout 0.7 have been addressed in later releases.

On Saturday, March 1, 2014 12:14 PM, Jessie Wright <[email protected]> wrote:

Hi,

I'm a noob trying to run the wikipedia bayes example on EC2 (using a cdh4.5 setup). I've searched the archives and haven't been able to find info on this; I apologize if this is a duplicate question.

The cloudera install comes with Mahout 0.7. I've run into a few snags on the first step (chunking the data into pieces). The first was that it couldn't find wikipediaXMLSplitter, but I found that substituting org.apache.mahout.text.wikipedia.WikipediaXmlSplitter in the command got past that error (just changing the capitalization wasn't enough).

However, I am now stuck. I'm getting a java.lang.OutOfMemoryError: Java heap space error. I upped MAHOUT_HEAPSIZE to 5000 and am still getting the same error. See the full error here: http://pastebin.com/P5PYuR8U (I added a print statement to bin/mahout just to confirm that my export of MAHOUT_HEAPSIZE was being successfully detected.)

I'm wondering whether some other setting is overriding MAHOUT_HEAPSIZE, perhaps one of the hadoop- or cloudera-specific ones? Does anyone have any experience with this or suggestions?

Thank you,
Jessie Wright
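A note on the heap error at the top of this thread: MAHOUT_HEAPSIZE is only read by the bin/mahout launcher script. When WikipediaXmlSplitter is invoked directly through hadoop jar, as above, the client JVM's heap comes from Hadoop's own environment instead. A sketch of a workaround, assuming a stock bin/hadoop that honors HADOOP_CLIENT_OPTS (the 4g value here is an arbitrary example, not a recommendation):

    # Raise the client-side heap for the hadoop jar invocation itself;
    # bin/hadoop appends HADOOP_CLIENT_OPTS to the java command line.
    export HADOOP_CLIENT_OPTS="-Xmx4g $HADOOP_CLIENT_OPTS"
    hadoop jar $MAHOUT_HOME/mahout-examples-0.9-job.jar \
        org.apache.mahout.text.wikipedia.WikipediaXmlSplitter \
        -d $MAHOUT_HOME/examples/tmp/enwiki-latest-pages-articles.xml \
        -o wikipedia/chunks -c 64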

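On the original MAHOUT_HEAPSIZE question: in a cluster setup, bin/mahout typically hands the job off to bin/hadoop, at which point Hadoop's heap settings (HADOOP_HEAPSIZE, HADOOP_CLIENT_OPTS) take precedence over MAHOUT_HEAPSIZE, which would explain why raising it to 5000 had no effect. One way to confirm which flags the running JVM actually received, assuming standard JDK and procps tools on the node:

    # Show running java processes with their full command lines (including -Xmx)
    ps -ef | grep [W]ikipediaXmlSplitter
    # Or use the JDK's jps to list main classes together with their JVM arguments
    jps -lv | grep -i mahout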