Hi,

Have you tried the mapreduce mailing list? This really looks like a Hadoop-specific error. (Note that crawl_generate is really just a sequence file.) What about using a different Hadoop version? It might be that CDH4 is simply incompatible with recent versions of Nutch; I know that CDH3 works, and CDH4 is a major upgrade with lots of changes.
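If you want to rule out a corrupt segment quickly, crawl_generate can be read with a plain SequenceFile.Reader. The following is only a rough sketch: it assumes the CDH3-era Hadoop 1.x API and the stock Text key / CrawlDatum value classes that Nutch writes, and the path comment and the 4096-byte threshold are made up for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.nutch.crawl.CrawlDatum;

    public class DumpCrawlGenerate {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // e.g. segment/crawl_generate/part-00000 (hypothetical path)
        Path part = new Path(args[0]);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        Text key = new Text();
        CrawlDatum value = new CrawlDatum();
        long n = 0;
        try {
          while (reader.next(key, value)) {
            n++;
            // A corrupt record usually shows up as an absurdly long key,
            // or as an exception thrown right here while reading.
            if (key.getLength() > 4096) {
              System.out.println("suspicious key at record " + n
                  + ": length=" + key.getLength());
            }
          }
          System.out.println("read " + n + " records cleanly");
        } finally {
          reader.close();
        }
      }
    }

If this reader blows up partway through the file, that points at the segment data rather than at the Hadoop version.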
Because you can reproduce the error, you could also try executing the job with a debug session attached to the reducer. Include the Hadoop sources in your debugger so you can actually see what is happening; there are plenty of resources explaining how to debug a MapReduce task. My guess is that crawl_generate was created with a corrupted entry. A single URL from the crawldb can cause this. The int overflow you describe in readNextKey() would fit that picture; there is a small arithmetic illustration after the quoted message below.

Ferdy.

On Tue, Jul 31, 2012 at 10:53 AM, [email protected] <[email protected]> wrote:
> I have new information.
> It seems that in Task$ValueIterator.java, in the method readNextKey(),
> there is a call to keyIn.reset(...). In there it does
>   count = start + length,
> where 'start' gets the value of nextKeyBytes.getPosition() and 'length' gets
> the value of nextKeyBytes.getLength().
>
> The sum exceeds the integer limit, so count wraps to a negative number,
> which then causes an EOFException to be thrown.
>
> Any input from anyone regarding this new info?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/NegativeArraySizeException-and-problem-advancing-port-rec-during-fetching-tp3994633p3998304.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
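For what it is worth, the wrap-around described in the quoted message is plain 32-bit int overflow. A tiny illustration with made-up values (only the arithmetic matters; the variable names just mirror the description of readNextKey()):

    public class OverflowDemo {
      public static void main(String[] args) {
        // start = nextKeyBytes.getPosition(), length = nextKeyBytes.getLength()
        int start = 2000000000;      // hypothetical position near Integer.MAX_VALUE
        int length = 200000000;      // hypothetical key length
        int count = start + length;  // 2200000000 does not fit in a 32-bit int
        System.out.println(count);   // prints -2094967296
      }
    }

A negative count at that point would line up with the EOFException described above.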

