Hi Sebastian,

Thanks for the hint.
--
wassalam,
[bayu]
/sent from Android phone/

On Oct 30, 2013 7:54 PM, "Sebastian Nagel" <[email protected]> wrote:

> Hi,
>
> the script bin/crawl executes bin/nutch for every step (inject, fetch,
> etc.).
>
> bin/nutch makes use of two environment variables (see comments in
> bin/nutch):
>   NUTCH_HEAPSIZE   (in MB)
>   NUTCH_OPTS       extra Java runtime options
>
>   export NUTCH_HEAPSIZE=2048
> should work, but so does
>   export NUTCH_OPTS="-Xmx2048m"
>
> The latter also allows adding further Java options, separated by spaces.
>
> Sebastian
>
>
> 2013/10/30 Bayu Widyasanyata <[email protected]>
>
> > Hi All,
> >
> > When I ran the crawl script [1] (not nutch's crawl), I got a Java OOM
> > heap space error:
> >
> > 2013-10-29 12:56:25,407 WARN  mapred.LocalJobRunner - job_local1484958909_0001
> > java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
> >         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
> >         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
> >         at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:517)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:399)
> >         at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1698)
> >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:744)
> >
> > 2013-10-29 12:56:25,787 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
> >         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
> >
> > I use Nutch 1.7 and JDK 1.7.0_45.
> >
> > How do I set the Java max heap size (-Xmx option) for the crawl script?
> >
> > Thanks in advance.
> >
> > [1] http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
> >
> > --
> > wassalam,
> > [bayu]
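For the archive, Sebastian's advice boils down to a few lines. A minimal sketch (the crawl-script arguments are illustrative placeholders, not from any particular setup):

```shell
# Raise the heap for every bin/nutch step launched by bin/crawl.
# NUTCH_HEAPSIZE is interpreted as megabytes by bin/nutch.
export NUTCH_HEAPSIZE=2048

# ...or pass raw JVM flags instead; multiple options may be
# space-separated in this variable.
export NUTCH_OPTS="-Xmx2048m"

# Then run the crawl script as usual (Nutch 1.7 signature, shown
# commented out here because the paths are placeholders):
# bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>

echo "$NUTCH_HEAPSIZE $NUTCH_OPTS"
```

Setting either variable in the shell that invokes bin/crawl is enough, since child processes inherit exported environment variables.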

