Works fine, and my memory problem turned out to be that I was running too many threads...
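In case it helps anyone else hitting the same thing, here is a rough sketch of the knob I mean, assuming the threads in question are the fetcher threads; the property name is the standard Nutch 1.x one and the value of 10 is only an example to tune against your heap:

  <!-- conf/nutch-site.xml: cap the number of parallel fetcher threads
       (the same thing can be passed per run with: bin/nutch fetch <segment> -threads 10) -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>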
2009/12/5 MilleBii <mille...@gmail.com>

> Thx again Julien,
>
> Yes, I'm going to buy myself the Hadoop book; I thought I could do
> without, but I realize that I need to make good use of Hadoop.
>
> Didn't know you could split fetching & parsing: so I suppose you just
> issue nutch fetch <segment> -noParsing, followed by nutch parse <segment>.
> I will try it on my next run.
>
>
> 2009/12/5 Julien Nioche <lists.digitalpeb...@gmail.com>
>
>> HADOOP_HEAPSIZE specifies the memory to be used by the Hadoop daemons and
>> does NOT affect the memory used for the map/reduce jobs. Maybe you should
>> invest a bit of time reading about Hadoop first?
>>
>> As for your memory problem, it could be due to the parsing and not the
>> fetching. If you don't already do so, I suggest that you separate the
>> fetching from the parsing. First, that will tell you which part fails;
>> and if it does fail in the parsing, you would not need to refetch the
>> content.
>>
>> J.
>>
>> 2009/12/5 MilleBii <mille...@gmail.com>
>>
>> > My fetch cycle failed with the following initial error:
>> >
>> > java.io.IOException: Task process exit with nonzero status of 65.
>> >         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
>> >
>> > Then it makes a second attempt, and after 3 hours I hit this error
>> > (although I had doubled HADOOP_HEAPSIZE):
>> >
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded
>> >
>> > Any idea what the initial error is or could be?
>> > For the second one, I'm going to reduce the number of threads... but I'm
>> > wondering if there could be a memory leak? And I don't know how to trace
>> > that.
>> >
>> > --
>> > -MilleBii-
>>
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>
>
> --
> -MilleBii-


--
-MilleBii-
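For anyone landing on this thread later, a rough sketch of the split Julien describes, plus the property that actually controls the map/reduce task heap; the segment path and the -Xmx value below are placeholders, not values from this crawl:

  # fetch without parsing, then parse the same segment in a separate step
  bin/nutch fetch crawl/segments/20091205123456 -noParsing
  bin/nutch parse crawl/segments/20091205123456

  <!-- mapred-site.xml (or hadoop-site.xml on older setups): heap of the
       child JVMs running the map/reduce tasks, as opposed to
       HADOOP_HEAPSIZE, which only sizes the Hadoop daemons -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>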