Hi Manish,

On Sat, Jan 16, 2016 at 5:19 AM, <[email protected]> wrote:

> I was checking the Nutch logs and observed that there are more fetching logs
> than parsed logs.
> I understand parsing does not happen for URLs whose fetch failed, but the
> difference is so high. Any idea?


How did you run the crawl? For example, did you enable filtering at each
stage, or at more than one stage? A dump of your crawldb will most likely
show you which status each URL ended up in.
Also, see the ProtocolStatus tool and the work Mike Jove has been doing
recently (linked below); it should also give you insight into what is going on:
https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/util/ProtocolStatusStatistics.java
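
If it helps, here is a rough sketch for tallying the statuses in a text dump of
the crawldb (produced with something like `bin/nutch readdb <crawldb> -dump
<out_dir>`). It assumes each record in the dump carries a line of the form
`Status: 2 (db_fetched)`; please check that against your own dump, as the
format may differ between versions. The function name `count_statuses` is just
something I made up for illustration, it is not part of Nutch.

```python
import re
from collections import Counter

# Assumption: each crawldb record in the dump has a line such as
# "Status: 2 (db_fetched)". Adjust the pattern if your dump differs.
STATUS_LINE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def count_statuses(lines):
    """Tally the status names found in an iterable of dump lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

You could feed it an open dump file, e.g. `count_statuses(open(path))` where
`path` points at one of the part files in your dump directory, and then print
`counts.most_common()`. A large share of URLs in a status other than
db_fetched would be one place to start looking.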
Ta
