Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-12 Thread Andrzej Bialecki

lei wang wrote:

Can anyone help? I'm so disappointed.

On Fri, Jul 10, 2009 at 4:29 PM, lei wang nutchmaill...@gmail.com wrote:


Yes, I am also running into this problem. Can anyone help?


On Sun, Jul 5, 2009 at 11:33 PM, xiao yang yangxiao9...@gmail.com wrote:


I often get this error message while crawling the intranet.
Is it a network problem? What can I do about it?

$bin/nutch crawl urls -dir crawl -depth 3 -topN 4

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 4
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20090705212324
Generator: filtering: true
Generator: topN: 4
Generator: Partitioning selected urls by host, for politeness.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Exception in thread "main" java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
   at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
   at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
   at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)







If you are running a large crawl on a single machine, you could be 
running out of file descriptors. Please check ulimit -n; the value 
should be much larger than 1024.
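
For example, a quick check along these lines (65536 is only an 
illustrative value; raising the limit permanently usually means editing 
/etc/security/limits.conf or the equivalent on your system):

$ ulimit -n          # show the current per-process open-file limit
$ ulimit -n 65536    # raise it in the shell that will start the crawl
$ ulimit -n          # confirm the new limit is in effect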


Also, please check hadoop.log for clues as to why shuffle fetching 
failed. It could be something as trivial as a blocked port, a routing 
problem, a DNS resolution problem, or the file descriptor issue 
mentioned above.
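
For example, roughly like this (logs/hadoop.log is where the default 
Nutch log4j configuration writes, and 50060 is the default TaskTracker 
HTTP port that reducers fetch map output from; adjust both to your 
setup, and replace the hostname placeholder with a real tasktracker host):

$ grep -iE 'shuffle|fetch|error' logs/hadoop.log | tail   # look for the underlying failure
$ ping -c1 $(hostname -f)                                 # host name should resolve and be reachable
$ telnet some-tasktracker-host 50060                      # should connect if the shuffle port is open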


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-10 Thread lei wang
Yes, I am also running into this problem. Can anyone help?

On Sun, Jul 5, 2009 at 11:33 PM, xiao yang yangxiao9...@gmail.com wrote:

 I often get this error message while crawling the intranet.
 Is it a network problem? What can I do about it?

 $bin/nutch crawl urls -dir crawl -depth 3 -topN 4

 crawl started in: crawl
 rootUrlDir = urls
 threads = 10
 depth = 3
 topN = 4
 Injector: starting
 Injector: crawlDb: crawl/crawldb
 Injector: urlDir: urls
 Injector: Converting injected urls to crawl db entries.
 Injector: Merging injected urls into crawl db.
 Injector: done
 Generator: Selecting best-scoring urls due for fetch.
 Generator: starting
 Generator: segment: crawl/segments/20090705212324
 Generator: filtering: true
 Generator: topN: 4
 Generator: Partitioning selected urls by host, for politeness.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)



Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-10 Thread lei wang
Can anyone help? I'm so disappointed.

On Fri, Jul 10, 2009 at 4:29 PM, lei wang nutchmaill...@gmail.com wrote:

 Yes, I am also running into this problem. Can anyone help?


 On Sun, Jul 5, 2009 at 11:33 PM, xiao yang yangxiao9...@gmail.com wrote:

 I often get this error message while crawling the intranet.
 Is it a network problem? What can I do about it?

 $bin/nutch crawl urls -dir crawl -depth 3 -topN 4

 crawl started in: crawl
 rootUrlDir = urls
 threads = 10
 depth = 3
 topN = 4
 Injector: starting
 Injector: crawlDb: crawl/crawldb
 Injector: urlDir: urls
 Injector: Converting injected urls to crawl db entries.
 Injector: Merging injected urls into crawl db.
 Injector: done
 Generator: Selecting best-scoring urls due for fetch.
 Generator: starting
 Generator: segment: crawl/segments/20090705212324
 Generator: filtering: true
 Generator: topN: 4
 Generator: Partitioning selected urls by host, for politeness.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
 Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)