Solr URL: http://localhost:8080/solrnutch. Versions: Solr 3.6, Nutch 1.6. The commands and log below were mangled by a copy-paste problem.
-David

On Wed, Mar 6, 2013 at 12:03 AM, David Philip <[email protected]> wrote:
> Hi all,
>
> When I do a full re-crawl, the old URLs that have been modified should be
> updated, correct? That is not happening.
>
> Please correct me where I am wrong. These are the steps:
>
> - set the properties db.fetch.interval.default=600 (seconds) and
>   db.injector.update=true
> - crawl: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir
>   crawltest -depth 10
> - wait 600 seconds
> - crawl again: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir
>   crawltest -depth 10
>
> Results:
>
> - Nothing is updated; the data in the Solr index stays the same. I checked the
>   fetched segments (bin/nutch readseg) and they are also old, but the fetch did
>   take place - please see the brief log below.
> - I also deleted one URL (so the site is not found) expecting it to be deleted
>   from the index as well (using -deleteGone), but it is not deleted either. The
>   log shows it was deleted, yet that URL is still searchable in the index.
>
> This seems to be some cache problem (I already cleared the web server cache),
> or is there some setting I have to change? Please let me know.
>
> Note: this question is related to my old thread, but it is a different
> question: "update not successful: data is not re-fetched".
>
> Thanks very much - David
>
> *The brief log trace from the second crawl:*
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering: 1
> Injector: Merging injected urls into crawl db.
> http://david.wordpress.in/ overwritten with injected record but update was specified.
> Injector: finished at 2013-03-05 23:25:49, elapsed: 00:00:03
> Generator: starting at 2013-03-05 23:25:49
> Generator: Selecting best-scoring urls due for fetch.
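For reference, the two properties from the first step would normally be set in conf/nutch-site.xml. A minimal sketch (the 600-second value comes from the steps above; verify the exact property names and defaults against the nutch-default.xml shipped with Nutch 1.6):

```xml
<!-- conf/nutch-site.xml (sketch) -->
<configuration>
  <property>
    <name>db.fetch.interval.default</name>
    <!-- re-fetch interval in seconds: 600 = 10 minutes -->
    <value>600</value>
  </property>
  <property>
    <name>db.injector.update</name>
    <!-- merge injected entries into existing CrawlDb records
         instead of replacing them -->
    <value>true</value>
  </property>
</configuration>
```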
> Generator: filtering: true
> Generator: normalizing: true
> Fetcher: segment: crawltest/segments/20130305232551
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: time-out divisor: 2
> QueueFeeder finished: total 5 records + hit by time limit :0
> Using queue mode : byHost
> [line repeated for each fetcher thread]
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold retries: 5
> fetching http://david.wordpress.in/2011_09_01_archive.html
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
> * queue: david.wordpress.in
>
> and so on...
>
> Indexing 10 documents
> Deleting 1 documents
> SolrIndexer: finished at 2013-03-05 23:27:37, elapsed: 00:00:09
> SolrDeleteDuplicates: starting at 2013-03-05 23:27:37
> SolrDeleteDuplicates: Solr url: http://localhost:8080/nutch_solr4/collection1/
> SolrDeleteDuplicates: finished at 2013-03-05 23:27:38, elapsed: 00:00:01
> crawl finished: crawltest
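When debugging a re-crawl like the one logged above, it can help to run the phases that the one-shot `bin/nutch crawl` command wraps individually, so you can inspect the CrawlDb and segment after each step. A command sketch, assuming the same `crawltest` layout and Solr URL as above (flag placement may vary slightly between Nutch 1.x releases; check `bin/nutch solrindex` usage on your install):

```shell
# Inject seed URLs and generate a fetch list
bin/nutch inject crawltest/crawldb urls
bin/nutch generate crawltest/crawldb crawltest/segments

# Pick the newest segment directory just created
SEGMENT=$(ls -d crawltest/segments/* | tail -1)

# Fetch, parse, and fold the results back into the CrawlDb
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawltest/crawldb "$SEGMENT"

# Index to Solr, removing documents whose pages are gone (404 etc.)
bin/nutch solrindex http://localhost:8080/solrnutch crawltest/crawldb \
    -deleteGone "$SEGMENT"
bin/nutch solrdedup http://localhost:8080/solrnutch
```

Between steps, `bin/nutch readdb crawltest/crawldb -stats` and `bin/nutch readseg -list "$SEGMENT"` show whether the re-fetch and the CrawlDb update actually happened before anything reaches Solr.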

