Hello,

I am using apache-nutch-1.8.

I have an application which should crawl about 50 to 100 URLs.

The problem is that the customer wants to change the URLs from time to time 
(and also delete some of them).

What is the correct way to handle this?

I wanted to do it this way:

1. Change the list of URLs in seed.txt
2. Change the list in regex-urlfilter.txt (e.g. add +^http://www.rlp.de/ 
for every URL)
3. Delete the crawl directory and subdirectories
4. Delete the Solr index
5. Run a cronjob every night: bin/crawl urls/seed.txt crawl 
http://localhost:8983/solr 5
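
The steps above could be wrapped in a small script for the cronjob. This is 
only a sketch: NUTCH_HOME and the script name are assumptions, and the 
delete-by-query curl call for wiping the Solr index is one common approach, 
not something bin/crawl does for you.

```shell
#!/bin/sh
# Nightly re-crawl sketch for the steps above.
# NUTCH_HOME is an assumed install location -- adjust to your setup.
NUTCH_HOME=/opt/apache-nutch-1.8
CRAWL_DIR="$NUTCH_HOME/crawl"
SOLR_URL=http://localhost:8983/solr   # Solr URL from the post

# Step 3: delete the crawl directory and subdirectories
rm -rf "$CRAWL_DIR"

# Step 4: wipe the Solr index (delete-by-query via Solr's update handler)
curl "$SOLR_URL/update?commit=true" \
     -H "Content-Type: text/xml" \
     --data-binary '<delete><query>*:*</query></delete>'

# Step 5: run the crawl (seed dir, crawl dir, Solr URL, 5 rounds)
cd "$NUTCH_HOME" && bin/crawl urls/seed.txt "$CRAWL_DIR" "$SOLR_URL" 5
```

The nightly run would then be a single crontab entry, e.g. (script path is 
hypothetical):

    0 2 * * * /opt/apache-nutch-1.8/recrawl.sh >> /var/log/recrawl.log 2>&1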

Is this OK?

Thanks for the help,
Martin
