Dear Team,

I have a query; I'm not sure if this is the right place to ask, but here goes:

I have to crawl and index my website.

These are the steps I have been asked to follow.

1. Delete the crawl folders (apache-nutch-1.10\crawl).
2. Remove the existing indexes: in the Solr Admin UI, go to Skyweb -> Documents, set Document Type to XML, and execute a delete query (e.g. <delete><query>*:*</query></delete>).
3. Go to Solr Admin -> Core Admin, click 'Reload' and then 'Optimize'.
4. Run the crawl job using the following command:

   bin/crawl -i -D solr.server.url=http://IP:8080/solr/website/ urls/ crawl/ 5
After some research, I feel that doing these tasks manually is unnecessary
overhead and that a single script should be able to take care of all of the
above, e.g. the sketch below.
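Something along these lines is what I have in mind, covering steps 1, 2 and 4
(the install path, the placeholder IP, and the core name 'website' are from my
setup; the Reload/Optimize step is what I ask about below):

    #!/bin/bash
    # Sketch of a recurring crawl-and-reindex job; paths and URLs are assumptions.
    NUTCH_HOME=/opt/apache-nutch-1.10        # assumed install location
    SOLR_URL="http://IP:8080/solr/website"   # placeholder IP and core name

    cd "$NUTCH_HOME" || exit 1

    # Step 1: delete the old crawl folders.
    rm -rf crawl/

    # Step 2: clear the existing index (the same XML delete the Admin UI sends).
    curl "$SOLR_URL/update?commit=true" \
         -H "Content-Type: text/xml" \
         --data-binary "<delete><query>*:*</query></delete>"

    # Step 4: re-run the crawl and index with 5 rounds.
    bin/crawl -i -D solr.server.url="$SOLR_URL/" urls/ crawl/ 5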

So my queries/concerns are:

Doesn't the bin/crawl script above already take care of the entire process? Do
I still need to delete the crawl folders and clear the existing indexes
manually?

What is the relevance of the Core Admin tasks 'Reload' and 'Optimize'?
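Incidentally, while digging around I noticed that both actions can also be
triggered over HTTP, so if they are really needed they could be appended to the
script (again, the IP and the core name 'website' are placeholders):

    curl "http://IP:8080/solr/admin/cores?action=RELOAD&core=website"
    curl "http://IP:8080/solr/website/update?optimize=true"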

Can I schedule the crawl script with cron to run weekly, and will it take care
of the entire process?
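If weekly scheduling is safe, I was thinking of a crontab entry along these
lines (the script path and log file are hypothetical):

    # Run every Sunday at 02:00, appending output to a log.
    0 2 * * 0 /opt/scripts/nutch_weekly_crawl.sh >> /var/log/nutch_weekly_crawl.log 2>&1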

How else can I automate the crawling and indexing to run periodically?


Regards,

Mohammed Ajmal Rahman
Tata Consultancy Services
Mailto: ajmal.rah...@tcs.com
Website: http://www.tcs.com