Thanks. Let me post a new thread on both lists with the details.

 
Regards,
Mahmood



On Tuesday, March 11, 2014 10:24 PM, Andrew Musselman 
<[email protected]> wrote:
 
Mahmood, just an observation and reminder, Suneel's not the only one on the
list.  We're all here to help.

This may be a question for a Hadoop list, unless I'm misunderstanding.
When you say "resume" what do you mean?



On Tue, Mar 11, 2014 at 11:46 AM, Mahmood Naderan <[email protected]> wrote:

> Suneel,
> One more thing.... Right now it has created 500 chunks, so 32GB out of
> 48GB (the original size of the XML file) has been processed. Is it
> possible to resume from there?
>
>
>
> Regards,
> Mahmood
>
>
>
> On Tuesday, March 11, 2014 9:47 PM, Mahmood Naderan <[email protected]>
> wrote:
>
> Suneel,
> Is it possible to create some kind of parallelism in the process of
> creating the chunks, in order to divide the work into smaller pieces?
>
> Let me explain it this way. Assume one thread needs 20GB of heap and
> my system cannot afford that. So I will divide the work among 10 threads,
> each needing 2GB.
> If my system supports 10GB of heap, then I will feed 5 threads at a
> time. When the first 5 threads are done (the chunks), I will feed the
> next 5 threads, and so on.
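The batch-scheduling idea above can be sketched in shell. This is only an illustration of the scheduling, under the assumption that the input has already been split into independent pieces; the file names and the per-piece command are placeholders, since wikipediaXMLSplitter itself consumes a single XML file:

```shell
# Illustration only: run at most 5 workers concurrently, starting the next
# piece as soon as a slot frees up ("feed 5 threads at a time").
# In the real scenario each worker would be a separate JVM with its own
# 2 GB heap; here the per-piece command is a placeholder copy.
mkdir -p pieces out
for i in 0 1 2 3 4 5 6 7 8 9; do
    echo "piece $i" > "pieces/part-$i"      # stand-in input pieces
done
ls pieces | xargs -P 5 -I{} sh -c 'cp "pieces/{}" "out/{}.done"'
ls out | wc -l                              # number of processed pieces (10)
```

`xargs -P 5` keeps exactly the invariant described above: never more than five concurrent workers, so the total heap in use stays bounded while all pieces eventually get processed.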
>
>
>
>
> Regards,
> Mahmood
>
>
>
> On Monday, March 10, 2014 9:42 PM, Mahmood Naderan <[email protected]>
> wrote:
>
> UPDATE:
> I split another 5.4GB XML file with 4GB of RAM and -Xmx128m, and it took 5
> minutes.
>
> Regards,
> Mahmood
>
>
>
> On Monday, March 10, 2014 7:16 PM, Mahmood Naderan <[email protected]>
> wrote:
>
> The extracted size is about 960MB (enwiki-latest-pages-articles10.xml).
> With 4GB of RAM set for the OS and -Xmx128m for Hadoop, it took 77 seconds
> to create 64MB chunks.
> I was able to see 15 chunks with "hadoop dfs -ls".
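That listing lines up with the arithmetic; a quick sanity check on the chunk count:

```shell
# 960 MB of extracted XML split into 64 MB chunks:
echo $((960 / 64))   # prints 15, matching the "hadoop dfs -ls" listing
```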
>
> P.S.: Whenever I modify the -Xmx value in mapred-site.xml, I run
>    $HADOOP/sbin/stop-all.sh && $HADOOP/sbin/start-all.sh
> Is that necessary?
>
>
> Regards,
> Mahmood
>
>
>
>
> On Monday, March 10, 2014 5:30 PM, Suneel Marthi <[email protected]>
> wrote:
>
> Morning Mahmood,
>
> Please first try running this on a smaller dataset like
> 'enwiki-latest-pages-articles10.xml', as opposed to running on the entire
> English Wikipedia.
>
>
>
> On Monday, March 10, 2014 2:59 AM, Mahmood Naderan <[email protected]>
> wrote:
>
> Thanks for the update.
> The thing is, when that command is running, I run the 'top' command in
> another terminal and I see that the java process takes less than 1GB of
> memory. As another test, I increased the size of memory to 48GB (since I
> am working with VirtualBox) and set the heap size to -Xmx45000m.
>
> Still I get the heap error.
>
>
> I expect that there should be a more meaningful error message that says
> *who* needs more heap space: Hadoop, Mahout, Java, ...?
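On the *who* question: the OutOfMemoryError is raised by whichever single JVM exhausts its own -Xmx limit, not by the machine as a whole, which is why 'top' can show plenty of free memory while the error still occurs. One way to inspect this (assuming a JDK is on the PATH; output shape varies by version) is to list the running JVMs and their flags:

```shell
# Lists running JVM pids, their main classes, and JVM arguments such as -Xmx.
jps -lvm
```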
>
>
> Regards,
> Mahmood
>
>
>
>
> On Monday, March 10, 2014 1:31 AM, Suneel Marthi <[email protected]>
> wrote:
>
> Mahmood,
>
> Firstly, thanks for starting this email thread and for
> highlighting the issues with the
> wikipedia example. Since you raised this issue, I updated the new
> wikipedia examples page at
> http://mahout.apache.org/users/classification/wikipedia-bayes-example.html
> and also responded to a similar question on StackOverFlow at
>
> http://stackoverflow.com/questions/19505422/mahout-error-when-try-out-wikipedia-examples/22286839#22286839
> .
>
> I am assuming that you are running this locally on your machine and are
> just trying out the examples. Try out Sebastian's suggestion, or else try
> running the example on a much smaller dataset of wikipedia articles.
>
>
> Lastly, we do realize that you have been struggling with this for about 3
> days now.  Mahout presently lacks an entry for 'wikipediaXMLSplitter' in
> driver.classes.default.props.  Not sure at what point in time and in which
> release that happened.
>
> Please file a Jira for this and submit a patch.
>
>
> On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <[email protected]>
> wrote:
>
> Hi Suneel,
> Do you have any idea? Searching the web shows many questions regarding the
> heap size for wikipediaXMLSplitter. I have increased the memory size to
> 16GB and still get that error. I have to say that using the 'top' command,
> I see only 1GB of memory in use, so I wonder why it reports such an error.
> Is this a problem with Java, Mahout, Hadoop, ...?
>
>
> Regards,
> Mahmood
>
>
>
> On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <[email protected]> wrote:
>
> Excuse me, I added the -Xmx option and restarted the Hadoop services using
> sbin/stop-all.sh && sbin/start-all.sh
>
> However, I still get the heap size error. How can I find the correct and
> needed heap size?
>
>
> Regards,
> Mahmood
>
>
>
> On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <[email protected]>
> wrote:
>
> OK, I found that I have to add this property to mapred-site.xml:
>
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx2048m</value>
> </property>
>
>
>
> Regards,
> Mahmood
>
>
>
>
> On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <[email protected]>
> wrote:
>
> Hello,
> I ran this command
>
>     ./bin/mahout wikipediaXMLSplitter -d
> examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> but got this error
>      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>
> There are many web pages regarding this, and the solution is to add
> "-Xmx2048M", for example. My question is: that option should be passed to
> the java command and not to Mahout. As a result, running
> "./bin/mahout -Xmx 2048M" shows that there is no such option. What should
> I do?
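One common approach, though it is an assumption about the launcher script rather than something confirmed in this thread: the bin/mahout wrapper reportedly reads a MAHOUT_HEAPSIZE environment variable (a value in MB) and turns it into an -Xmx flag for the JVM it spawns, so the option never appears on the mahout command line itself:

```shell
# Assumption: bin/mahout honors MAHOUT_HEAPSIZE (in MB) when launching its JVM.
export MAHOUT_HEAPSIZE=2048
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
```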
>
>
> Regards,
> Mahmood
>
