re

On 2012-07-16 13:39, IT_ailen [via Lucene] wrote:
> Hi there,
>
> Recently I have been crawling some sites with Nutch, but several
> problems are bothering me. I have searched with Google and in forums
> such as nutch-user, but have found little help. So I am listing the
> problems below and hope you can do me a favor. Thanks~
>
> 1. Can Nutch be interrupted while it is crawling? If it can, what
>    exactly is the handling logic after it resumes? If not, must I
>    re-crawl the whole set of sites (that would be a really huge
>    amount of rework), or is there a better solution?
> 2. How does Nutch handle bad HTTP statuses such as 307 and 203?
> 3. How does the crawl option depth work? For example, if I have
>    crawled with a depth of 3, what will Nutch do when I re-crawl
>    with "depth=3"? Will it regenerate the list of URLs to fetch from
>    the most recent segment, or from all segments and the file of
>    original seeds?
> 4. What effect does it have when I manually remove some
>    subdirectories under the segments directory?
>
> I have searched for these questions but have not found clear answers,
> so I hope you can tell me your opinions, or we can discuss them here.
> I'm reading the source code, but that is a really huge job~~
>
> I'm what I am.
-----
I'm what I am.
--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-on-using-nutch-tp3995207p3995208.html
Sent from the Nutch - User mailing list archive at Nabble.com.

