-----Original message-----
> From:pepe3059 <[email protected]>
> Sent: Mon 04-Jun-2012 20:42
> To: [email protected]
> Subject: RE: threads disminution when fetching page
> 
> Thank you for your answer, Markus.

Hi

> 
> You mean that Nutch only stores the information in HDFS once the fetch
> process finishes, and until then it sits in the defined tmp directory?

Yes. Hadoop MapReduce stores intermediate files in the tmp directory and then 
writes the reduce output to HDFS.

> 
> Nothing has changed in two days (which I think is far too long for
> writing), and the problem persists.

You mean it hasn't finished the fetch job in two days? That's odd and shouldn't 
happen; at the very least an error should be thrown.

> Could some checking agent on the
> above-mentioned site be causing a pause?

No. If a site doesn't respond, the request times out gracefully. If a site has 
too large a crawl delay, it is skipped as well.
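
Both limits are configurable. As a rough sketch, assuming you override them in 
conf/nutch-site.xml: the property names below are the ones in nutch-default.xml, 
but the values are only illustrative assumptions, so check the defaults shipped 
with your version before changing anything.

```xml
<!-- conf/nutch-site.xml (fragment); values are illustrative assumptions -->
<configuration>
  <!-- Network timeout for HTTP requests, in milliseconds -->
  <property>
    <name>http.timeout</name>
    <value>10000</value>
  </property>
  <!-- If robots.txt asks for a Crawl-Delay longer than this many seconds,
       the fetcher skips the page instead of waiting -->
  <property>
    <name>fetcher.max.crawl.delay</name>
    <value>30</value>
  </property>
</configuration>
```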

You can test with $nutch indexchecker <url> to see if a page is processed 
correctly.

> 
> 
> 
> Thank you very much for your time
> 
> José
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/threads-disminution-when-fetching-page-tp3987381p3987629.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
