Hi, I've crawled a website using Nutch 1.12 and indexed it in Solr 6.1 using the command below:
==CODE==
[root@2a563cff0511 nutch-latest]# bin/crawl -i \
> -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
==END_CODE==

When I run the same command again, it reports the following:

==CODE==
[root@2a563cff0511 nutch-latest]# bin/crawl -i \
> -D solr.server.url=http://192.168.99.100:8983/solr/test/ urls/ crawl 5
Injecting seed URLs
/opt/nutch-latest/bin/nutch inject crawl/crawldb urls/
Injector: starting at 2016-06-19 15:29:08
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: overwrite: false
Injector: update: false
Injector: Total urls rejected by filters: 0
Injector: Total urls injected after normalization and filtering: 1
Injector: Total urls injected but already in CrawlDb: 1
Injector: Total new urls injected: 0
Injector: finished at 2016-06-19 15:29:13, elapsed: 00:00:05
Sun Jun 19 15:29:13 UTC 2016 : Iteration 1 of 1
Generating a new segment
/opt/nutch-latest/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true crawl/crawldb crawl/segments -topN 50000 -numFetchers 1 -noFilter
Generator: starting at 2016-06-19 15:29:15
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: 0 records selected for fetching, exiting ...
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now
==END_CODE==

However, the site has changed since the first crawl: a new file has been added and an existing file has been modified. How can I get Nutch to re-fetch these changes and update the Solr index?

Regards,
Munim


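P.S. My guess is that the generator selects 0 records because the URL was fetched recently and is not yet due again under Nutch's default fetch interval (the `db.fetch.interval.default` property, which I believe defaults to 30 days). A minimal sketch of what I plan to try, assuming that property is the cause; the value of 60 seconds below is my own arbitrary choice for testing, not a recommended setting:

==CODE==
# Re-run the crawl with a much shorter fetch interval so that
# previously fetched URLs become due for re-fetching immediately.
# db.fetch.interval.default is specified in seconds.
bin/crawl -i \
  -D solr.server.url=http://192.168.99.100:8983/solr/test/ \
  -D db.fetch.interval.default=60 \
  urls/ crawl 5
==END_CODE==

Does that sound like the right approach, or is there a cleaner way to force a re-crawl of changed pages?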