Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
lei wang wrote:
> anyone help? so disappointed.
>
> On Fri, Jul 10, 2009 at 4:29 PM, lei wang <nutchmaill...@gmail.com> wrote:
>> Yes, I am also running into this problem. Can anyone help?
>>
>> On Sun, Jul 5, 2009 at 11:33 PM, xiao yang <yangxiao9...@gmail.com> wrote:
>>> I often get this error message while crawling the intranet.
>>> Is it a network problem? What can I do about it?
>>>
>>> $ bin/nutch crawl urls -dir crawl -depth 3 -topN 4
>>> [crawl log snipped; the full log is in the original message below]
>>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> Exception in thread "main" java.io.IOException: Job failed!

If you are running a large crawl on a single machine, you could be running out of file descriptors - please check ulimit -n; the value should be much, much larger than 1024. Also, please check hadoop.log for clues about why shuffle fetching failed - this could be something as trivial as a blocked port, a routing problem, a DNS resolution problem, or the file-descriptor issue mentioned above.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
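A quick way to run the checks suggested above from a shell on the crawl machine. This is only a sketch: the log path (logs/hadoop.log under the Nutch install) assumes a default layout, and the limit value is an illustration; adjust both for your setup.

    # Check the per-process file descriptor limit; 1024 (a common
    # default) is far too low for a large crawl.
    $ ulimit -n

    # Raise it for the current shell before re-running the crawl
    # (raising the hard limit may require root, or a permanent entry
    # in /etc/security/limits.conf on Linux).
    $ ulimit -n 65536

    # Look in hadoop.log for the underlying shuffle fetch failures.
    $ grep -i "shuffle" logs/hadoop.log | tail -20

    # Reduce tasks fetch map output from tasktrackers by hostname, so
    # the local hostname must resolve; if this ping fails, check
    # /etc/hosts and DNS.
    $ ping -c 1 "$(hostname)"

If the local checks all pass, also make sure no firewall is blocking the tasktracker HTTP port (50060 by default), since that is the port the shuffle fetches go through.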
Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Yes, I am also running into this problem. Can anyone help?

On Sun, Jul 5, 2009 at 11:33 PM, xiao yang <yangxiao9...@gmail.com> wrote:
> I often get this error message while crawling the intranet.
> Is it a network problem? What can I do about it?
>
> $ bin/nutch crawl urls -dir crawl -depth 3 -topN 4
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 4
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20090705212324
> Generator: filtering: true
> Generator: topN: 4
> Generator: Partitioning selected urls by host, for politeness.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)
Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
anyone help? so disappointed.

On Fri, Jul 10, 2009 at 4:29 PM, lei wang <nutchmaill...@gmail.com> wrote:
> Yes, I am also running into this problem. Can anyone help?
>
> On Sun, Jul 5, 2009 at 11:33 PM, xiao yang <yangxiao9...@gmail.com> wrote:
>> I often get this error message while crawling the intranet.
>> Is it a network problem? What can I do about it?
>>
>> $ bin/nutch crawl urls -dir crawl -depth 3 -topN 4
>> [crawl log snipped; the full log is in the previous message]
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> Exception in thread "main" java.io.IOException: Job failed!