Suneel, is it possible to create some kind of parallelism in the process of creating the chunks, in order to divide the resources into smaller pieces?
Let me explain it this way. Assume one thread needs 20GB of heap and my system cannot afford that. So I will divide it into 10 threads, each needing 2GB. If my system supports 10GB of heap, then I will feed 5 threads at a time. When the first 5 threads are done (the chunks), I will feed the next 5 threads, and so on.

Regards,
Mahmood

On Monday, March 10, 2014 9:42 PM, Mahmood Naderan <[email protected]> wrote:

UPDATE: I split another 5.4GB XML file with 4GB of RAM and -Xmx128m, and it took 5 minutes.

Regards,
Mahmood

On Monday, March 10, 2014 7:16 PM, Mahmood Naderan <[email protected]> wrote:

The extracted size is about 960MB (enwiki-latest-pages-articles10.xml). With 4GB of RAM set for the OS and -Xmx128m for Hadoop, it took 77 seconds to create 64MB chunks. I was able to see 15 chunks with "hadoop dfs -ls".

P.S.: Whenever I modify the -Xmx value in mapred-site.xml, I run
$HADOOP/sbin/stop-all.sh && $HADOOP/sbin/start-all.sh
Is that necessary?

Regards,
Mahmood

On Monday, March 10, 2014 5:30 PM, Suneel Marthi <[email protected]> wrote:

Morning Mahmood, please first try running this on a smaller dataset like 'enwiki-latest-pages-articles10.xml', as opposed to running on the entire English wikipedia.

On Monday, March 10, 2014 2:59 AM, Mahmood Naderan <[email protected]> wrote:

Thanks for the update. The thing is, while that command is running, I run the 'top' command in another terminal and I see that the java process takes less than 1GB of memory. As another test, I increased the memory size to 48GB (since I am working with VirtualBox) and set the heap size to -Xmx45000m. I still get the heap error. I would expect a more meaningful error message saying *who* needs more heap: Hadoop, Mahout, Java, ...?

Regards,
Mahmood

On Monday, March 10, 2014 1:31 AM, Suneel Marthi <[email protected]> wrote:

Mahmood, firstly, thanks for starting this email thread and for highlighting the issues with the wikipedia example.
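The batching idea in the question above (run only as many workers at a time as the available heap allows, feeding the next batch as workers finish) can be sketched in plain shell with xargs. This is only an illustration of the scheduling idea; the file names, the per-piece command, and the output layout are assumptions, not Mahout's actual interface:

```shell
#!/bin/sh
# Sketch of batched parallelism: at most 5 workers run concurrently (-P 5),
# and xargs starts the next piece as soon as a worker finishes. In the real
# scenario each worker would be a JVM launched with a small heap (e.g. -Xmx2g).
# The "pieces" and the per-piece command here are stand-ins for illustration.
mkdir -p pieces out

# Create stand-ins for the pre-split input pieces.
for i in 0 1 2 3 4 5 6 7 8 9; do
  printf 'piece %s\n' "$i" > "pieces/part-$i.xml"
done

# Process each piece, never more than 5 at a time.
ls pieces | xargs -P 5 -I {} sh -c 'wc -l < "pieces/{}" > "out/{}.count"'

ls out | wc -l   # one output file per input piece
```

With 10 pieces and -P 5, this behaves like the "feed 5 threads, then the next 5" scheme described above, except that xargs refills worker slots individually instead of waiting for a whole batch to drain, which keeps all slots busy.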
Since you raised this issue, I updated the new wikipedia examples page at http://mahout.apache.org/users/classification/wikipedia-bayes-example.html and also responded to a similar question on StackOverflow at http://stackoverflow.com/questions/19505422/mahout-error-when-try-out-wikipedia-examples/22286839#22286839. I am assuming that you are running this locally on your machine and are just trying out the examples. Try Sebastian's suggestion, or else try running the example on a much smaller dataset of wikipedia articles.

Lastly, we do realize that you have been struggling with this for about 3 days now. Mahout presently lacks an entry for 'wikipediaXmlSplitter' in driver.classes.default.props. Not sure at what point in time and in which release that happened. Please file a Jira for this and submit a patch.

On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <[email protected]> wrote:

Hi Suneel, do you have any idea? Searching the web shows many questions regarding the heap size for wikipediaXMLSplitter. I have increased the memory size to 16GB and still get that error. I have to say that, using the 'top' command, I see only 1GB of memory in use, so I wonder why it reports such an error. Is this a problem with Java, Mahout, Hadoop, ...?

Regards,
Mahmood

On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <[email protected]> wrote:

Excuse me, I added the -Xmx option and restarted the hadoop services using
sbin/stop-all.sh && sbin/start-all.sh
however, I still get the heap size error. How can I find the correct and needed heap size?
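For anyone preparing the patch mentioned above: entries in driver.classes.default.props map a short program name to its driver class and a one-line description. A fix for the missing splitter entry might look roughly like the line below. The exact class name is an assumption based on where the splitter lived in Mahout sources of this era; verify it against your own checkout before submitting:

```
# Hypothetical entry; confirm the fully-qualified class name in your source tree.
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXMLSplitter : Reads wikipedia data and creates chunks
```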
Regards,
Mahmood

On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <[email protected]> wrote:

OK, I found that I have to add this property to mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>

Regards,
Mahmood

On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <[email protected]> wrote:

Hello, I ran this command:

./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

but got this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

There are many web pages regarding this, and the solution is to add "-Xmx2048M", for example. My question is: that option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048M" shows that there is no such option. What should I do?

Regards,
Mahmood
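On the "how do I pass -Xmx to Mahout" question that opens this thread: launcher scripts like bin/mahout conventionally do not accept -Xmx directly; instead they read a MAHOUT_HEAPSIZE environment variable (a number of MB) and expand it into the JVM's -Xmx flag themselves. The sketch below shows that convention; it is an assumption modeled on typical bin/mahout scripts of this period (including the fallback default), so check your own copy of the script:

```shell
#!/bin/sh
# Sketch of the MAHOUT_HEAPSIZE convention (assumed to mirror the logic in
# bin/mahout; verify against the actual script). The heap is given in MB and
# expanded into a -Xmx flag for the JVM that the launcher starts.
MAHOUT_HEAPSIZE=2048

if [ -n "$MAHOUT_HEAPSIZE" ]; then
  JAVA_HEAP_MAX="-Xmx${MAHOUT_HEAPSIZE}m"
else
  JAVA_HEAP_MAX="-Xmx1000m"   # assumed fallback default
fi

echo "$JAVA_HEAP_MAX"
```

So instead of "./bin/mahout -Xmx2048M", the invocation would look like "MAHOUT_HEAPSIZE=2048 ./bin/mahout wikipediaXMLSplitter ..." (again, assuming the launcher honors this variable). Note this only sizes the client JVM; heap for Hadoop child tasks is still governed by mapred.child.java.opts, as found later in the thread.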
