Hi,

The script bin/crawl executes bin/nutch for every step (inject, fetch,
etc.).

bin/nutch makes use of two environment variables (see the comments in
bin/nutch):
 NUTCH_HEAPSIZE   maximum Java heap size (in MB)
 NUTCH_OPTS       extra Java runtime options

 export NUTCH_HEAPSIZE=2048
should work, and so should
 export NUTCH_OPTS="-Xmx2048m"

The latter also allows adding further Java options, separated by spaces.
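For example, a minimal sketch (the seed/crawl paths and round count in the commented command are placeholders, not from this thread):

```shell
# Sketch: set the JVM heap for Nutch before invoking bin/crawl.

# Option 1: heap size in MB, read by bin/nutch
export NUTCH_HEAPSIZE=2048

# Option 2: arbitrary JVM flags, space-separated
export NUTCH_OPTS="-Xmx2048m"

# Both variables are inherited by bin/nutch when bin/crawl runs it, e.g.:
# bin/crawl urls/ crawl/ 2     # <seedDir> <crawlDir> <numRounds> (placeholder args)

echo "$NUTCH_HEAPSIZE $NUTCH_OPTS"   # prints: 2048 -Xmx2048m
```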

Sebastian


2013/10/30 Bayu Widyasanyata <[email protected]>

> Hi All,
>
> When I ran the crawl script [1] (not nutch's crawl), I got a Java OOM (heap
> space) error:
>
> 2013-10-29 12:56:25,407 WARN  mapred.LocalJobRunner - job_local1484958909_0001
> java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>        at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
>        at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
>        at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
>        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
>        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
>        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:517)
>        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:399)
>        at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1698)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:744)
> 2013-10-29 12:56:25,787 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>
> I use nutch 1.7 and JDK 1.7.0.45.
>
> How can I set the Java maximum heap size (the -Xmx option) for the crawl
> script?
>
> Thanks in advance.
>
> [1]
> http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
>
> --
> wassalam,
> [bayu]
>