You can raise the -Xms and -Xmx settings in the mapred.child.java.opts property in your hadoop-site.xml file to give your tasks more memory. Are you trying to parse extremely large pages or files such as PDFs? If so, you can also cap the size of downloaded content with the file.content.limit and ftp.content.limit options in your nutch-site.xml file.
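As a rough sketch of the settings mentioned above, the two files might look something like this. The heap sizes and byte limits are illustrative assumptions, not recommendations, so tune them to your hardware and crawl. The http.content.limit entry is not named in this message; it is included on the assumption that the failing fetch is an http:// URL, which the file and ftp limits would not cover.

```xml
<!-- hadoop-site.xml: JVM options passed to each child task.
     The heap sizes below are assumed example values. -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xms256m -Xmx1024m</value>
  </property>
</configuration>
```

```xml
<!-- nutch-site.xml: per-protocol caps on downloaded content, in bytes.
     Content beyond the limit is truncated. Values are assumed examples. -->
<configuration>
  <property>
    <name>file.content.limit</name>
    <value>65536</value>
  </property>
  <property>
    <name>ftp.content.limit</name>
    <value>65536</value>
  </property>
  <!-- Assumption: added because the failing URL is fetched over HTTP. -->
  <property>
    <name>http.content.limit</name>
    <value>65536</value>
  </property>
</configuration>
```

Restart the job after changing either file so that the new values are picked up by the child tasks.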
Dennis Kubes

Manoharam Reddy wrote:
> Time and again I get this error and as a result the segment remains
> incomplete. This wastes one iteration of the for() loop in which I am
> doing generate, fetch and update.
>
> Can someone please tell me what are the measures I can take to avoid
> this error? And isn't it possible to make some code changes so that
> the whole fetch doesn't have to stop suddenly when this error occurs.
> Can't we do something in the code so that, the fetch still continues
> like in case of SocketException, in which case the fetch while(1) loop
> continues.
>
> If it is not possible, please tell me how can I prevent this error
> from happening?
>
> ----- ERROR -----
>
> fetch of http://telephony/register.asp failed with:
> java.lang.OutOfMemoryError: Java heap space
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>     at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
>     ......
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>     at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
>     .......
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
> fetcher caught:java.lang.NullPointerException
> Fetcher: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:505)
>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:477)

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers