You can raise the -Xms and -Xmx settings in the mapred.child.java.opts 
property in your hadoop-site.xml file to give your tasks more memory.  
Are you trying to parse extremely large pages or files such as PDFs?  
If you are, you can also set maximum size limits for downloaded 
content using the file.content.limit and ftp.content.limit options in 
your nutch-site.xml file.
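
As a rough sketch of what those two changes might look like (the heap 
sizes and byte limits below are illustrative assumptions, not 
recommended values; tune them to your hardware and content):

```xml
<!-- hadoop-site.xml: JVM options passed to each child task.
     The 256m/1024m figures are placeholder assumptions. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xms256m -Xmx1024m</value>
</property>
```

```xml
<!-- nutch-site.xml: cap downloaded content, in bytes.
     65536 is only an example value. -->
<property>
  <name>file.content.limit</name>
  <value>65536</value>
</property>
<property>
  <name>ftp.content.limit</name>
  <value>65536</value>
</property>
```

Each property goes inside the <configuration> element of its file. 
Since the failing URL in your log is fetched over HTTP, the analogous 
http.content.limit property is probably worth checking as well.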

Dennis Kubes

Manoharam Reddy wrote:
> Time and again I get this error and as a result the segment remains
> incomplete. This wastes one iteration of the for() loop in which I am
> doing generate, fetch and update.
> 
> Can someone please tell me what measures I can take to avoid
> this error? And isn't it possible to make some code changes so that
> the whole fetch doesn't have to stop suddenly when this error occurs?
> Can't we do something in the code so that the fetch still continues,
> as in the case of SocketException, where the fetch while(1) loop
> continues?
> 
> If that is not possible, please tell me how I can prevent this error
> from happening.
> 
> ----- ERROR -----
> 
> fetch of http://telephony/register.asp failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> ......
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> .......
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
> fetcher caught:java.lang.NullPointerException
> Fetcher: java.io.IOException: Job failed!
>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>  at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
>  at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
>  at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>  at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
