re
On 2012-07-16 13:39, IT_ailen [via Lucene] wrote:
> Hi there,
>  Recently I have been crawling some sites with Nutch, but several
> problems are bothering me. I have searched Google and some
> forums such as nutch-user, but have found little help. So I am
> listing them below and hope you can help. Thanks~
>  1. Can Nutch be interrupted while it is crawling? If it can,
> what exactly does it do after it resumes? If it cannot,
> must I re-crawl all of the sites (that would be a huge amount of
> rework), or is there a better solution?
>  2. How does Nutch handle less common HTTP status codes such as 307 and 203?
>  3. How does the crawl option depth work? For example, if I have
> crawled with a depth of 3, what will Nutch do when I re-crawl
> with "depth=3"? Will it regenerate the list of URLs to fetch from the
> most recent segment, or from all segments and the original seed file?
>  4. What will the effect be if I manually remove some
> subdirectories under the segments directory?
> I've searched for answers to these questions but haven't found clear ones, so I hope
> you can share your opinions, or we can discuss them
> here.
> I'm reading the source code, but that is a really huge job~~
> I'm what I am.




-----
I'm what I am.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-on-using-nutch-tp3995207p3995208.html
Sent from the Nutch - User mailing list archive at Nabble.com.
