Hi,

One more question about NUTCH_OPTS: is it only for additional Java runtime options, or can we pass any Nutch options through it, e.g. topN, depth, etc.?
I ask because I couldn't find any tutorial on the crawl script beyond the one mentioned here [1].

[1] http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script

On Thu, Oct 31, 2013 at 8:43 PM, Bayu Widyasanyata <[email protected]> wrote:
> Hi Sebastian,
>
> Thanks for the hint.
>
> ---
> wassalam,
> [bayu]
>
> /sent from Android phone/
>
> On Oct 30, 2013 7:54 PM, "Sebastian Nagel" <[email protected]> wrote:
>> Hi,
>>
>> the script bin/crawl executes bin/nutch for every step (inject, fetch,
>> etc.).
>>
>> bin/nutch makes use of two environment variables (see comments in
>> bin/nutch):
>>   NUTCH_HEAPSIZE   (in MB)
>>   NUTCH_OPTS       extra Java runtime options
>>
>> export NUTCH_HEAPSIZE=2048
>> should work, but so should
>> export NUTCH_OPTS="-Xmx2048m"
>>
>> The latter also allows you to add further Java options, separated by
>> spaces.
>>
>> Sebastian
>>
>> 2013/10/30 Bayu Widyasanyata <[email protected]>
>>
>>> Hi All,
>>>
>>> When I ran the crawl script [1] (not nutch's crawl command), I got a
>>> Java OOM (heap space) error:
>>>
>>> 2013-10-29 12:56:25,407 WARN mapred.LocalJobRunner - job_local1484958909_0001
>>> java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
>>>     at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
>>>     at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
>>>     at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:348)
>>>     at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:368)
>>>     at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>>>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:517)
>>>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:399)
>>>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1698)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:744)
>>>
>>> 2013-10-29 12:56:25,787 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
>>>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)
>>>
>>> I use Nutch 1.7 and JDK 1.7.0_45.
>>>
>>> How do I set the Java max heap size (-Xmx) for the crawl script?
>>>
>>> Thanks in advance.
>>>
>>> [1] http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script
>>>
>>> --
>>> wassalam,
>>> [bayu]

--
wassalam,
[bayu]
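For reference, Sebastian's advice can be put into a small wrapper script. This is only a sketch: the seed directory, crawl directory, Solr URL, and number of rounds in the commented-out invocation are placeholder assumptions, not values from this thread; only the two exported environment variables come from Sebastian's reply.

```shell
#!/bin/sh
# Raise the JVM heap for every bin/nutch invocation made by bin/crawl.
# Per Sebastian's reply, NUTCH_HEAPSIZE is interpreted by bin/nutch as MB.
export NUTCH_HEAPSIZE=2048

# Alternatively (or additionally), pass raw JVM options; multiple options
# are separated by spaces, e.g. "-Xmx2048m -verbose:gc".
export NUTCH_OPTS="-Xmx2048m"

echo "NUTCH_HEAPSIZE=$NUTCH_HEAPSIZE"
echo "NUTCH_OPTS=$NUTCH_OPTS"

# Placeholder invocation (arguments are assumptions for illustration):
# bin/crawl urls/ crawl/ http://localhost:8983/solr/ 2
```

Note that crawl parameters such as topN and depth are not JVM options, so they would not belong in NUTCH_OPTS; NUTCH_OPTS is passed to the `java` command line, not to the Nutch tools themselves.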

