Re: why is nutch2.1 trying to parse the same documnets again and again?

Lewis John Mcgibbney Wed, 27 Feb 2013 00:24:13 -0800

Have you looked at the Java code?
I am curious (and confused) about this "different batch id (null)" log message
and want to either get rid of it or, better, make it more informative,
which would address both of our concerns.
I would like to document this not only in the Java code but also on the
Nutch wiki.


On Wednesday, February 27, 2013, adfel70 <[email protected]> wrote:
> Hi
> I'm using Nutch 2.1 and HBase.
> I performed my first crawl and see that Nutch is trying to parse the same
> files in different cycles.
> After the first time, I always get "different batch id (null)" on the
> already parsed files, so I assume that parsing is not actually performed.
> But the question is: why does Nutch try to parse these files at all?
>
> Is this because it's the only place where the check for whether the file has
> already been parsed is performed?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/why-is-nutch2-1-trying-to-parse-the-same-documnets-again-and-again-tp4043317.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

-- 
*Lewis*
