Works fine, and my memory problem turned out to be that I was running too many threads...
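In case it helps anyone else hitting the same thing, here is a rough sketch of the knob I mean, assuming the threads in question are the fetcher threads; the property name is the standard Nutch 1.x one and the value of 10 is only an example to tune against your heap:

  <!-- conf/nutch-site.xml: cap the number of parallel fetcher threads
       (the same thing can be passed per run with: bin/nutch fetch <segment> -threads 10) -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>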
2009/12/5 MilleBii <mille...@gmail.com>

> Thx again Julien,
>
> Yes, I'm going to buy myself the Hadoop book; I thought I could do
> without, but I realize that I need to make good use of Hadoop.
>
> Didn't know you could split fetching & parsing: so I suppose you just
> issue nutch fetch <segment> -noParsing, followed by nutch parse <segment>.
> I will try it on my next run.
>
>
> 2009/12/5 Julien Nioche <lists.digitalpeb...@gmail.com>
>
>> HADOOP_HEAPSIZE specifies the memory to be used by the Hadoop daemons and
>> does NOT affect the memory used for the map/reduce jobs. Maybe you should
>> invest a bit of time reading about Hadoop first?
>>
>> As for your memory problem, it could be due to the parsing and not the
>> fetching. If you don't already do so, I suggest that you separate the
>> fetching from the parsing. First, that will tell you which part fails;
>> and if it does fail in the parsing, you would not need to refetch the
>> content.
>>
>> J.
>>
>> 2009/12/5 MilleBii <mille...@gmail.com>
>>
>> > My fetch cycle failed with the following initial error:
>> >
>> > java.io.IOException: Task process exit with nonzero status of 65.
>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
>> >
>> > Then it makes a second attempt, and after 3 hours I hit this error
>> > (although I had doubled HADOOP_HEAPSIZE):
>> >
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>> >
>> > Any idea what the initial error is or could be?
>> > For the second one, I'm going to reduce the number of threads... but I'm
>> > wondering if there could be a memory leak? And I don't know how to trace
>> > that.
>> >
>> > --
>> > -MilleBii-
>>
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>
>
> --
> -MilleBii-


--
-MilleBii-
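For anyone landing on this thread later, a rough sketch of the split Julien describes, plus the property that actually controls the map/reduce task heap; the segment path and the -Xmx value below are placeholders, not values from this crawl:

  # fetch without parsing, then parse the same segment in a separate step
  bin/nutch fetch crawl/segments/20091205123456 -noParsing
  bin/nutch parse crawl/segments/20091205123456

  <!-- mapred-site.xml (or hadoop-site.xml on older setups): heap of the
       child JVMs running the map/reduce tasks, as opposed to
       HADOOP_HEAPSIZE, which only sizes the Hadoop daemons -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>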