Dear all,

I am new to Nutch and have recently been trying Nutch 1.7 and Solr 4.4 to
build a search engine. Here are some questions after experimenting for a
while:

1. I use this command to start the crawl, as stated in the tutorial:

/bin/bash ./bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2

When will the crawled pages be sent to Solr for indexing? When I look at
the Solr dashboard, the number of docs does not increase while the crawl is
in progress.
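
For reference, my understanding of the script's usage message (please
correct me if I have misread it) is:

    Usage: crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>

so I assumed indexing would happen at some point during each round.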

2. About error handling: if some Java exceptions are thrown in the middle
of a crawl, how can I tell whether the data crawled so far has been
indexed, and where will the crawl resume if I execute the above command
again?
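
For example, would inspecting the crawldb be the right way to check? I am
thinking of something like

    bin/nutch readdb TestCrawl/crawldb -stats

(assuming TestCrawl is the crawl directory created by the command above) to
see how many URLs have been fetched versus are still unfetched.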

3. Any advice on running the crawl if I want to index frequently updated
pages, e.g. BBC News?
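
In case it helps frame the question: my current idea is to lower the
default re-fetch interval in conf/nutch-site.xml, e.g.

    <property>
      <name>db.fetch.interval.default</name>
      <!-- assumption: re-fetch after one hour instead of the 30-day default -->
      <value>3600</value>
    </property>

and re-run the crawl on a schedule, but I am not sure whether that is the
recommended approach.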

Thanks.
Regards,

Patrick


