-----Original message-----
> From:pepe3059 <[email protected]>
> Sent: Mon 04-Jun-2012 20:42
> To: [email protected]
> Subject: RE: threads disminution when fetching page
>
> thank you for your answer Markus
Hi

> you mean, until the fetch process finishes, is information stored using hdfs
> by nutch? meanwhile is in the defined tmp directory ?

Yes. Hadoop MapReduce stores intermediate files in the tmp directory and then writes the reduce output to HDFS.

> no change is presented in two days (and i think is so many time for
> writing). and the problem persists,

You mean it hasn't finished the fetch job in two days? That's odd and shouldn't happen; at the very least an error should be thrown.

> could be some agent revisor from the
> above mentioned site that provokes a pause?

No. If a site doesn't respond, the request times out gracefully. If a site has too large a crawl delay, it is skipped as well. You can test with $nutch indexchecker <url> to see if a page processes correctly.

>
> thank you vary much for your time
>
> José
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/threads-disminution-when-fetching-page-tp3987381p3987629.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

