I don't know whether you are using any custom plugins in the fetching
stage; I'm not even sure that is possible (I haven't needed it). But I
have had a similar experience with indexing. After a few thousand pages,
Nutch would start complaining about lack of memory. The culprit was my
plugin, which opened a new database connection on every call.

So, if you _are_ using custom plugins, make sure that they don't leak
resources, and keep their reliance on garbage collection to a minimum.
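Below is a minimal sketch of that advice in plain Java. It opens the expensive resource once per plugin instance, reuses it on every call, and releases it explicitly instead of leaving cleanup to the garbage collector. The class names `ExpensiveResource` and `ReusingFilter` are hypothetical and are not part of the Nutch plugin API. `ExpensiveResource` stands in for something like a JDBC connection, and the open-counter exists only to make the leak (or its absence) visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive resource such as a database connection. */
class ExpensiveResource implements AutoCloseable {
    // Counts currently open instances so a leak is easy to observe.
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed = false;

    ExpensiveResource() {
        OPEN.incrementAndGet();
    }

    String lookup(String key) {
        if (closed) throw new IllegalStateException("resource already closed");
        return "value-for-" + key;
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

/** Hypothetical plugin-style class: one resource per instance, reused across calls. */
class ReusingFilter implements AutoCloseable {
    private ExpensiveResource resource;

    // Open the resource lazily, exactly once. Synchronized because several
    // fetcher threads might call into the same plugin instance.
    private synchronized ExpensiveResource resource() {
        if (resource == null) resource = new ExpensiveResource();
        return resource;
    }

    String filter(String url) {
        return resource().lookup(url);
    }

    // Release the resource deterministically rather than waiting for GC.
    @Override
    public synchronized void close() {
        if (resource != null) {
            resource.close();
            resource = null;
        }
    }
}

public class ResourceReuseDemo {
    public static void main(String[] args) {
        try (ReusingFilter f = new ReusingFilter()) {
            for (int i = 0; i < 10_000; i++) {
                f.filter("http://example.com/" + i);
            }
            System.out.println("open while running: " + ExpensiveResource.OPEN.get());
        }
        System.out.println("open after close: " + ExpensiveResource.OPEN.get());
    }
}
```

Running `main` should report one open resource during the loop and zero after the `try`-with-resources block closes the filter. The leaky version would call `new ExpensiveResource()` inside `filter` and never close it, which would leave 10,000 open instances. If a real plugin has no shutdown hook to call `close()`, a connection pool is another option.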

Regards,

Arkadi

> -----Original Message-----
> From: DS jha [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 08, 2008 4:17 PM
> To: [email protected]
> Subject: fetcher failing with outofmemory exception
> 
> Hello -
> 
> I am using the latest Nutch trunk on a Linux machine (single file system).
> I am trying to fetch about 5-10K pages, and every time I run the fetch
> command, after fetching a few hundred pages, it starts throwing an
> OutOfMemory exception (not related to heap size):
> 
> 2008-02-08 02:41:01,395 FATAL fetcher.Fetcher - java.io.IOException: java.io.IOException: Cannot allocate memory
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.Runtime.exec(Runtime.java:591)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.Runtime.exec(Runtime.java:464)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.ShellCommand.runCommand(ShellCommand.java:48)
> 2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.ShellCommand.run(ShellCommand.java:42)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.DF.getAvailable(DF.java:72)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:88)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:382)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:354)
> 2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)
> 
> The hard disk does have enough space (over 20 GB, of which < 2 GB is used).
> 
> I am mostly using default Hadoop and Nutch settings. I tried changing
> the number of fetch threads (default 35) to 50 and to 100, but it
> doesn't have any impact: the Fetcher keeps throwing the above exception
> after a while.
> 
> Any thoughts?
> 
> Thanks
> Jha.

