Hi there,

Recently I've been crawling some sites with Nutch, but several problems are bothering me. I have searched Google and forums like nutch-user, but found little help, so I'm listing the problems below and hoping you guys can do me a favor. Thanks~

1. Can Nutch be interrupted while it is crawling? If it can, what is the exact handling logic after it resumes? If not, must I re-crawl the whole sites (oh, that would be a really huge re-work), or is there a better solution?
2. How does Nutch handle bad HTTP statuses like 307 or 203?
3. How does the crawl option "depth" work? For example, if I have already crawled with depth 3, what will Nutch do when I re-crawl with depth=3? Will it regenerate the fetch list of URLs from the most recent segment only, or from all segments plus the original seed file? (My current understanding of the cycle is sketched after this list.)
4. What effects will it have if I manually remove some subdirectories under the segments directory?

I've searched these questions but haven't gotten clear answers, so I hope you guys can tell me your opinions, or we can discuss them here. I'm reading the source code, but that is really huge work~~
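For context on question 3, here is the crawl cycle that I believe the one-shot "bin/nutch crawl" command repeats "depth" times, written out as the equivalent step-by-step commands. This is only a sketch based on the Nutch 1.x command-line tools as I understand them; the crawl/ and urls/ paths are just my own example names:

    # Seed the crawldb once with the initial URL list (urls/ holds seed files).
    bin/nutch inject crawl/crawldb urls

    # One "depth" iteration = generate + fetch + parse + updatedb.
    # Repeating this loop N times is what depth=N amounts to, as I understand it.
    for i in 1 2 3; do
      # Generate a fetch list from the crawldb into a new segment directory.
      bin/nutch generate crawl/crawldb crawl/segments
      # Pick up the newest segment directory just created.
      segment=crawl/segments/$(ls crawl/segments | sort | tail -1)
      # Fetch the pages in that segment's fetch list.
      bin/nutch fetch $segment
      # Parse the fetched content (needed when fetcher.parse is false).
      bin/nutch parse $segment
      # Merge fetch results back into the crawldb, adding newly discovered links.
      bin/nutch updatedb crawl/crawldb $segment
    done

If that picture is right, it would also explain how an interrupted crawl might be resumed step by step instead of re-crawled from scratch, which is the heart of my first question. But please correct me if I've misread how the depth loop works.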
-----
I'm what I am.

