Hi all,

When I do a full re-crawl, old URLs whose content has been modified should be updated, correct? That is not happening.
Please correct me where I am wrong. Below is the list of steps:

- Properties set:
  db.fetch.interval.default=600sec
  db.injector.update=true
- Crawl: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir crawltest -depth 10
- Wait 600 sec.
- Crawl again: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir crawltest -depth 10
- Nothing is updated. The data in the Solr index remains the same. I checked the fetch segments (bin/nutch readseg) and they are also old, even though the fetch took place. Please see the brief log trace below.
- I also deleted one URL, so that it returns "site not found", expecting it to be deleted from the index as well (using -deleteGone), but it is not deleted. The log shows it as deleted, but it is still in the index and I can still search for this URL.

This seems to be some cache problem (I cleared the web server cache), or is there some setting that I have to change? Please let me know.

Please note: this question is related to my old thread, but it is a different question: the update is not successful and the data is not re-fetched.

Thanks very much
- David

The brief log trace from the second crawl:

Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 1
Injector: Merging injected urls into crawl db.
http://david.wordpress.in/ overwritten with injected record but update was specified.
Injector: finished at 2013-03-05 23:25:49, elapsed: 00:00:03
Generator: starting at 2013-03-05 23:25:49
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Fetcher: segment: crawltest/segments/20130305232551
Using queue mode : byHost
Fetcher: threads: 10
Fetcher: time-out divisor: 2
QueueFeeder finished: total 5 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
fetching http://david.wordpress.in/2011_09_01_archive.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
* queue: david.wordpress.in
(and so on ...)
Indexing 10 documents
Deleting 1 documents
SolrIndexer: finished at 2013-03-05 23:27:37, elapsed: 00:00:09
SolrDeleteDuplicates: starting at 2013-03-05 23:27:37
SolrDeleteDuplicates: Solr url: http://localhost:8080/nutch_solr4/collection1/
SolrDeleteDuplicates: finished at 2013-03-05 23:27:38, elapsed: 00:00:01
crawl finished: crawltest
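
To rule out the web server cache, the index could be queried directly for the deleted URL instead of going through the search front end. Below is a minimal sketch in Python that only builds the lookup URL. The helper name solr_lookup_url, the field name "url", and the example page URL are assumptions for illustration; they would need to be adjusted to the actual Solr schema and to the URL that was removed.

```python
from urllib.parse import urlencode


def solr_lookup_url(solr_base, page_url, field="url"):
    """Build a Solr /select URL that looks up one page by its URL.

    `field` is an assumed schema field name; change it if the index
    stores the page address under a different field.
    """
    # Quote the value so ':' and '/' in the page URL are not read as query syntax.
    query = '{}:"{}"'.format(field, page_url)
    # rows=0: only the match count is needed, not the stored documents.
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return solr_base.rstrip("/") + "/select?" + params


if __name__ == "__main__":
    # Solr base URL taken from the log above; the page URL is a placeholder.
    print(solr_lookup_url(
        "http://localhost:8080/nutch_solr4/collection1/",
        "http://david.wordpress.in/",
    ))
```

Opening the printed URL in a browser or with curl and checking numFound in the response would show whether the document is still present in the index itself.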

