Hello -

I am using the latest Nutch trunk on a Linux machine (single file system).
I am trying to fetch about 5-10K pages, and every time I run the fetch
command, after fetching a few hundred pages, it starts throwing an
out-of-memory exception (not related to heap size):

2008-02-08 02:41:01,395 FATAL fetcher.Fetcher - java.io.IOException:
java.io.IOException: Cannot allocate memory
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
java.lang.ProcessImpl.start(ProcessImpl.java:65)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
java.lang.Runtime.exec(Runtime.java:591)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
java.lang.Runtime.exec(Runtime.java:464)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
org.apache.hadoop.fs.ShellCommand.runCommand(ShellCommand.java:48)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at
org.apache.hadoop.fs.ShellCommand.run(ShellCommand.java:42)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.fs.DF.getAvailable(DF.java:72)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:88)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:382)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:354)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)

The hard disk does have enough space (over 20 GB, of which <2 GB is used).
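
In case it helps narrow things down: the trace fails inside Runtime.exec, called from org.apache.hadoop.fs.DF, so the error seems to come from starting a child process rather than from the disk itself. Below is a minimal sketch for checking the two separately. It is not Nutch or Hadoop code. The class name DfCheck and both method names are made up for illustration, the exact `df -k` invocation is my assumption about what DF runs, and File.getUsableSpace needs Java 6 or later.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of Nutch or Hadoop.
public class DfCheck {

    // Asks the JVM for usable space directly; no child process is started.
    static long usableBytes(File dir) {
        return dir.getUsableSpace();
    }

    // Starts `df -k <dir>` as a child process and returns its raw output.
    // The `-k` flag is an assumption about what Hadoop's DF class runs.
    static String runDf(File dir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("df", "-k", dir.getAbsolutePath());
        pb.redirectErrorStream(true);
        // If process creation fails, the IOException is thrown here.
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("usable bytes (no child process): " + usableBytes(dir));
        System.out.println(runDf(dir));
    }
}
```

If usableBytes reports plenty of space while runDf throws the same IOException when run from a JVM under similar memory load, that would point at process creation rather than disk space.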

I am mostly using the default Hadoop and Nutch settings. I tried changing
the number of fetch threads from the default 35 to 50 and to 100, but it
doesn't have any impact: the Fetcher keeps throwing the above exception
after a while.

Any thoughts?

Thanks
Jha.
