Hi
I'm using Nutch 2.1 with HBase.
I ran my first crawl and see that Nutch tries to parse the same
files in different cycles.
After the first time I always get "different batch id (null)" for the already
parsed files, so I assume that parsing is not actually performed.
But the question is: why does Nutch try to parse these files at all?

Is this because it's the only place where the check for whether a file has
already been parsed is performed?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-is-nutch2-1-trying-to-parse-the-same-documnets-again-and-again-tp4043317.html
Sent from the Nutch - User mailing list archive at Nabble.com.
