Re: What kind of nutch documents does Solr index?

2015-09-30 Thread Daniel Holmes
Thank you, Upayavira, for your answer. In the case I described, maxDoc is 19263. As far as I can tell from checking Nutch, its default indexing filter is the basic indexing filter, and it also has a property to delete gone and permanently redirected pages, whose value was false for me. I think the problem is still
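The maxDoc figure can be compared with the indexed document count from the original question. In Lucene and Solr, maxDoc still counts deleted documents that have not yet been merged away, while numDocs counts only live documents. A minimal sketch of that arithmetic follows, assuming the 19250 figure quoted in the original question is Solr's numDocs (the thread does not say so explicitly); the function name is hypothetical.

```python
# Figures quoted in this thread.
max_doc = 19263   # maxDoc reported in this reply
num_docs = 19250  # indexed docs from the original question (assumed to be numDocs)


def deleted_but_unmerged(max_doc: int, num_docs: int) -> int:
    """Documents marked deleted that still occupy slots in the index segments."""
    return max_doc - num_docs


pending_deletes = deleted_but_unmerged(max_doc, num_docs)
print(f"deleted but not yet merged: {pending_deletes}")
```

Under that assumption the difference is small compared with the gap between fetched and indexed counts, so deletes or overwrites inside Solr would account for only a minor part of it.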

Re: What kind of nutch documents does Solr index?

2015-09-30 Thread NutchDev
this message in context: http://lucene.472066.n3.nabble.com/What-kind-of-nutch-documents-does-Solr-index-tp4231646p4232034.html

Re: What kind of nutch documents does Solr index?

2015-09-28 Thread Upayavira
I suspect you may be better off asking this on the Nutch user list. The decisions you are describing are made within the Nutch codebase, not Solr. Someone here may know (hopefully), but you may get more support over on the Nutch list. One suggestion: start with a clean, empty index. Run a crawl.
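One way to check the document count after a crawl into an empty index is a match-all query that returns no rows. The sketch below only builds such a query URL and reads `numFound` from a JSON response body; it makes no network call. The core URL and the sample response are assumptions for illustration, not values from this thread, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def count_query_url(core_url: str) -> str:
    """Build a match-all select URL that asks Solr for zero rows in JSON."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(response_text: str) -> int:
    """Read the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


# Assumed core URL; adjust host, port, and core name to the actual setup.
url = count_query_url("http://localhost:8983/solr/collection1")

# Hypothetical response body for a freshly emptied index.
sample_response = '{"response": {"numFound": 0, "start": 0, "docs": []}}'

print(url)
print(num_found(sample_response))
```

Running the same query before and after the crawl gives a count that can be set against the Nutch fetch statistics.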

What kind of nutch documents does Solr index?

2015-09-28 Thread Daniel Holmes
Hi, I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In my tests there is a gap between the number of results fetched by Nutch and the number of documents indexed in Solr. For example, one of the crawls fetched 23343 pages and 1146 images successfully, while in Solr 19250 docs
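The size of the gap described above can be worked out directly from the quoted figures. This is a small sketch of the arithmetic only; it assumes the 19250 figure is the number of live documents in Solr and does not say which fetched items were skipped or why. The function name is hypothetical.

```python
# Figures quoted in the message above.
fetched_pages = 23343
fetched_images = 1146
indexed_docs = 19250  # assumed to be the live document count in Solr


def index_gap(fetched: int, indexed: int) -> int:
    """Number of fetched items with no corresponding indexed document."""
    return fetched - indexed


total_fetched = fetched_pages + fetched_images
gap_pages_only = index_gap(fetched_pages, indexed_docs)
gap_all = index_gap(total_fetched, indexed_docs)

print(f"total fetched: {total_fetched}")
print(f"gap counting pages only: {gap_pages_only}")
print(f"gap counting pages and images: {gap_all}")
```

Whether images should be counted at all depends on whether the indexing setup sends them to Solr, which the message does not state.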