Solr URL is http://localhost:8080/solrnutch.
Versions: Solr 3.6, Nutch 1.6. The commands and log below were copy-pasted to illustrate the
problem.



-David


On Wed, Mar 6, 2013 at 12:03 AM, David Philip
<[email protected]>wrote:

> Hi all,
>
> When I do a full re-crawl, old URLs that have been modified should be
> updated, correct? That is not happening.
>
>  Please correct me where I am wrong. Below is the list of steps:
>
>
>    - properties set: db.fetch.interval.default=600 (seconds) and
>      db.injector.update=true
>    - crawl: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir
>      crawltest -depth 10
>    - wait 600 seconds
>    - crawl again: bin/nutch crawl urls -solr http://localhost:8080/solrnutch -dir
>      crawltest -depth 10
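>
> The two properties from the first step can be sketched in conf/nutch-site.xml
> (a minimal sketch; note that db.fetch.interval.default is given in seconds):

```xml
<!-- conf/nutch-site.xml: re-fetch pages every 600 seconds instead of the default -->
<property>
  <name>db.fetch.interval.default</name>
  <value>600</value>
</property>
<!-- let re-injected URLs update existing CrawlDb records -->
<property>
  <name>db.injector.update</name>
  <value>true</value>
</property>
```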
>
>
>    - Nothing was updated; the data in the Solr index remains the same. I
>      checked the fetched segments (bin/nutch readseg) and they are also old,
>      but the fetch did take place; please see the brief log trace below.
>    - I also took one URL down so that it returns "site not found", expecting
>      it to be deleted from the index as well (using -deleteGone), but it was
>      not deleted. The log shows it was deleted, yet the URL is still
>      searchable in the index. This seems to be some cache problem (I cleared
>      the webserver cache), or is there some setting I have to change? Please
>      let me know.
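>
> To check whether the second fetch actually rewrote the segment data, and to
> purge gone pages from the index, something like the following can be run (a
> sketch; the segment timestamp is an example, and solrclean is the Nutch 1.x
> command that removes URLs marked gone in the CrawlDb from Solr):

```shell
# list segments and dump one to inspect fetch time and content
bin/nutch readseg -list -dir crawltest/segments
bin/nutch readseg -dump crawltest/segments/20130305232551 segdump

# show CrawlDb status counts (db_fetched, db_gone, ...)
bin/nutch readdb crawltest/crawldb -stats

# remove documents whose CrawlDb status is gone from the Solr index
bin/nutch solrclean crawltest/crawldb http://localhost:8080/solrnutch
```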
>
>
> Please note: this is related to my old thread "update not successful: data is
> not re-fetched", but it is a different question.
>
>
> Thanks very much - David
> The brief log trace during the second crawl:
> Injector: Converting injected urls to crawl db entries.
> Injector: total number of urls rejected by filters: 0
> Injector: total number of urls injected after normalization and filtering:
> 1
> Injector: Merging injected urls into crawl db.
> http://david.wordpress.in/ overwritten with injected record but update
> was specified.
> Injector: finished at 2013-03-05 23:25:49, elapsed: 00:00:03
> Generator: starting at 2013-03-05 23:25:49
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Fetcher: segment: crawltest/segments/20130305232551
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: time-out divisor: 2
> QueueFeeder finished: total 5 records + hit by time limit :0
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold retries: 5
> fetching http://david.wordpress.in/2011_09_01_archive.html
> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
> * queue: david.wordpress.in
>
> so on......
>
> Indexing 10 documents
> Deleting 1 documents
> SolrIndexer: finished at 2013-03-05 23:27:37, elapsed: 00:00:09
> SolrDeleteDuplicates: starting at 2013-03-05 23:27:37
> SolrDeleteDuplicates: Solr url:
> http://localhost:8080/nutch_solr4/collection1/
> SolrDeleteDuplicates: finished at 2013-03-05 23:27:38, elapsed: 00:00:01
> crawl finished: crawltest
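>
> After the crawl finishes, one way to rule out a webserver or front-end cache
> is to query Solr directly over HTTP and see whether the deleted URL is still
> in the index. A minimal sketch in Python (the field name `url` and the core
> path are assumptions; adjust them to your schema):

```python
import urllib.parse

def solr_query_url(base, url_value, rows=1):
    """Build a Solr select URL that checks whether a document with the
    given url field value is still in the index, bypassing any cache."""
    params = urllib.parse.urlencode({
        "q": 'url:"%s"' % url_value,
        "rows": rows,
        "wt": "json",
    })
    return "%s/select?%s" % (base.rstrip("/"), params)

# Fetch this URL with curl or a browser; numFound=0 means the page is gone.
print(solr_query_url("http://localhost:8080/solrnutch",
                     "http://david.wordpress.in/"))
```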
>
>
>
