Hi all,

I'm running Nutch 1.6 and Solr 3.6.2, and I'm crawling with depth 1, topN
1000000 and 'db.update.additions.allowed' set to false.
The idea is to fetch, parse and index only the URLs in the seed list.

I seed ~120K URLs, but in Solr I see only ~20K indexed.

The fetch job counters show:

moved 49,937
robots_denied 1,149
robots_denied_maxcrawldelay 267
hitByTimeLimit 6,072
exception 4,479
notmodified 2
access_denied 4
temp_moved 4,658
success 23,033
notfound 1,658

and the ParserStatus success count is 22,844.
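
A quick tally of the fetch counters above, as a minimal Python sketch. The seed count is only known as ~120K, so the 120,000 used here is an assumption and the last figure is approximate.

```python
# Fetch job counters copied from the list above.
fetch_counters = {
    "moved": 49_937,
    "robots_denied": 1_149,
    "robots_denied_maxcrawldelay": 267,
    "hitByTimeLimit": 6_072,
    "exception": 4_479,
    "notmodified": 2,
    "access_denied": 4,
    "temp_moved": 4_658,
    "success": 23_033,
    "notfound": 1_658,
}

# Assumption: "~120K" seeded URLs is treated as exactly 120,000.
SEEDED_APPROX = 120_000

# Total number of URLs that ended up in any fetch counter.
total = sum(fetch_counters.values())

# URLs that answered with a redirect instead of content.
redirected = fetch_counters["moved"] + fetch_counters["temp_moved"]

# Seeded URLs that do not show up in any counter (rough estimate).
unaccounted = SEEDED_APPROX - total

print(f"in fetch counters:        {total:,}")
print(f"moved + temp_moved:       {redirected:,}")
print(f"in no counter (approx.):  {unaccounted:,}")
```

The counters sum to 91,259, of which 54,595 are redirects (moved plus temp_moved), so roughly 28,700 of the seeded URLs do not appear in any fetch counter at all.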

What happened to all the other URLs? They are all active URLs, not some old
list...

Thanks,

Amit.
