Re: why is nutch2.1 trying to parse the same documents again and again?

adfel70 Wed, 27 Feb 2013 00:36:53 -0800

Yes, I looked at the code.
I saw that the shouldProcess() check is performed on each file in the mapper.
In nutch1.* I got used to an approach in which only a set of urls is
processed in each cycle.
Does nutch2.* process all the urls in each cycle, so that this
shouldProcess() check is needed to make sure the same file isn't parsed twice?
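
A minimal Java sketch of the pattern this question describes: a mapper that is handed every row and uses a per-row check to skip rows that do not belong to the current batch. This is not the Nutch source. The class name, the map of marks, the batch ids and the helper countParsed() are all hypothetical and only illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchCheckSketch {
    // Hypothetical stand-in for the per-row check: process a row only if
    // it carries a mark equal to the batch id of the current cycle.
    static boolean shouldProcess(String mark, String batchId) {
        return mark != null && mark.equals(batchId);
    }

    // Simulates a mapper that sees every row in the table and returns
    // how many rows it would actually parse for the given batch.
    static int countParsed(Map<String, String> marksByUrl, String batchId) {
        int parsed = 0;
        for (Map.Entry<String, String> row : marksByUrl.entrySet()) {
            if (!shouldProcess(row.getValue(), batchId)) {
                continue; // other batch, or no mark at all: skip this row
            }
            parsed++;
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Assumed sample data: url -> mark left by an earlier step.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("http://example.com/a", "batch-1");
        rows.put("http://example.com/b", "batch-2");
        rows.put("http://example.com/c", null);
        System.out.println(countParsed(rows, "batch-2"));
    }
}
```

In this sketch the loop visits all three rows but only the one marked for the current batch is counted, which is the behaviour being asked about.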



Also, I see that there is a loop over the depth parameter. So if the defined
depth is greater than the actual depth of the site I'm crawling, will the loop
just keep going until it reaches the defined depth?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-is-nutch2-1-trying-to-parse-the-same-documnets-again-and-again-tp4043317p4043323.html
Sent from the Nutch - User mailing list archive at Nabble.com.
