Thank you, Upayavira, for your answer. In the case I described, maxDoc is 19263.
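For comparison, Solr's Luke request handler (`/admin/luke`) reports both `numDocs` and `maxDoc` for a core. The difference is the number of documents that are marked deleted but not yet merged away. Below is a minimal Python sketch. It assumes that the handler is reachable at `<core_url>/admin/luke` and that the 4.x JSON response carries these counts under `index`. The core URL is hypothetical, so adjust it to your setup.

```python
import json
import urllib.request


def index_counts(luke_response: dict) -> dict:
    """Extract document counts from a parsed Luke handler response."""
    index = luke_response["index"]
    num_docs = index["numDocs"]
    max_doc = index["maxDoc"]
    return {
        "numDocs": num_docs,
        "maxDoc": max_doc,
        # Deleted-but-not-yet-merged documents still count toward maxDoc.
        "deleted": max_doc - num_docs,
    }


def fetch_luke(core_url: str) -> dict:
    """Query the Luke handler; numTerms=0 skips per-field term statistics."""
    url = core_url.rstrip("/") + "/admin/luke?numTerms=0&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Usage (hypothetical core URL):
#   index_counts(fetch_luke("http://localhost:8983/solr/collection1"))
```

This thread reports maxDoc 19263 and 19250 docs. If 19250 is numDocs, only 13 deletions are pending. That hints the missing pages were mostly never sent to Solr, rather than indexed and later deleted. On its own this is not conclusive, because segment merges purge deleted documents.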
As far as I can tell, the default indexing filter in Nutch is the basic indexing
filter, and it also has a property to delete gone and permanently
redirected pages, whose value was false in my configuration.
I think the problem is still
this message in context:
http://lucene.472066.n3.nabble.com/What-kind-of-nutch-documents-does-Solr-index-tp4231646p4232034.html
Sent from the Solr - User mailing list archive at Nabble.com.
I suspect you may be better off asking this on the Nutch user list. The
decisions you are describing are made within the Nutch codebase, not
Solr. Someone here may (hopefully) know, but you may get more support
over on the Nutch list.
One suggestion: start with a clean, empty index, then run a crawl.
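One way to empty a Solr core before such a test crawl is a delete-by-query for `*:*` followed by a commit. Below is a minimal Python sketch. It assumes Solr's JSON update syntax (`{"delete": {"query": "*:*"}}`) posted to `<core_url>/update?commit=true`. The core URL is hypothetical. This removes every document in the core, so only point it at a test index.

```python
import json
import urllib.request


def build_clear_index_request(core_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that deletes every document and commits."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clear_index(core_url: str) -> int:
    """Send the delete-all request and return the HTTP status code."""
    with urllib.request.urlopen(build_clear_index_request(core_url)) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to inspect the URL and body before running it against a live core.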
Hi,
I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In
my tests there is a gap between the number of pages Nutch fetches and the
number of documents indexed in Solr. For example, one of the crawls
fetched 23343 pages and 1146 images successfully, while in Solr there are 19250
docs
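The size of that gap depends on whether the images are expected to be indexed, and the question does not say. This small Python sketch computes both cases from the figures above. The helper name `index_gap` is only illustrative.

```python
def index_gap(fetched: int, indexed: int) -> tuple[int, float]:
    """Return (missing documents, missing as a percentage of fetched)."""
    missing = fetched - indexed
    return missing, 100.0 * missing / fetched


pages, images, indexed = 23343, 1146, 19250

pages_only = index_gap(pages, indexed)                 # about 17.5% missing
pages_and_images = index_gap(pages + images, indexed)  # about 21.4% missing
```

Either way, roughly 4,000 to 5,000 successfully fetched items do not appear in Solr.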