Hi, the script bin/crawl executes bin/nutch for every step (inject, fetch, etc.).
bin/nutch makes use of two environment variables (see the comments in bin/nutch):

  NUTCH_HEAPSIZE  maximum Java heap size (in MB)
  NUTCH_OPTS      extra Java runtime options

export NUTCH_HEAPSIZE=2048 should work, but so should export NUTCH_OPTS="-Xmx2048m". The latter also allows adding further Java options, separated by spaces.

Sebastian

2013/10/30 Bayu Widyasanyata <[email protected]>:
> Hi All,
>
> When I ran the crawl script [1] (not nutch's crawl), I got a Java OOM heap space error:
>
> 2013-10-29 12:56:25,407 WARN mapred.LocalJobRunner - job_local1484958909_0001
> java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
>         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
>         at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:517)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:399)
>         at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1698)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> 2013-10-29 12:56:25,787 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>
> I use Nutch 1.7 and JDK 1.7.0_45.
>
> How do I set the Java max heap size (the -Xmx option) for the crawl script?
>
> Thanks in advance.
>
> [1] http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
>
> --
> wassalam,
> [bayu]
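P.S. For reference, a minimal shell sketch of the two options described above. The heap value and the crawl arguments are illustrative, not taken from this thread; adjust them to your own installation:

```shell
# Option 1: heap size in MB, picked up by bin/nutch
export NUTCH_HEAPSIZE=2048

# Option 2: pass JVM options directly; -Xmx2048m has the same effect here,
# and further options can be appended, separated by spaces
export NUTCH_OPTS="-Xmx2048m"

# Then run the crawl script as usual, e.g. (arguments illustrative):
# bin/crawl urls/ crawl/ http://localhost:8983/solr/ 2
```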

