Too many fetch failures
This is also how I fixed this problem.

On 6/21/08, Sayali Kulkarni sayali_s_kulka...@yahoo.co.in wrote:

Hi! My problem of "Too many fetch failures", as well as the shuffle error, was resolved when I added the list of all the slave machines to the /etc/hosts file. Earlier, every slave only had entries for the master and for itself in /etc/hosts. Now I have updated the /etc/hosts files to include the IP address and hostname of every machine in the cluster, and my problem is resolved.

One question still: I currently have just 5-6 nodes. But when Hadoop is deployed on a larger cluster, say of 1000+ nodes, is it expected that every time a new machine is added to the cluster, you add an entry to /etc/hosts on all of the (1000+) machines in the cluster?

Regards, Sayali

> Can you post the reducer logs? How many nodes are there in the cluster?

There are 6 nodes in the cluster - 1 master and 5 slaves. I tried reducing the number of nodes and found that the problem goes away only when there is a single node in the cluster, so I deduce the problem lies somewhere in the configuration.

Configuration file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/extra/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.105.41.25:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.105.41.25:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1048M</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/extra/HADOOP/hadoop-0.16.3/tmp/mapred</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>53</value>
    <description>The default number of map tasks per job. Typically set to a prime several times greater than the number of available hosts. Ignored when mapred.job.tracker is "local".</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>7</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local".</description>
  </property>
</configuration>

This is the output that I get when running the job with 2 nodes in the cluster:

08/06/20 11:07:45 INFO mapred.FileInputFormat: Total input paths to process : 1
08/06/20 11:07:45 INFO mapred.JobClient: Running job: job_200806201106_0001
08/06/20 11:07:46 INFO mapred.JobClient: map 0% reduce 0%
08/06/20 11:07:53 INFO mapred.JobClient: map 8% reduce 0%
08/06/20 11:07:55 INFO mapred.JobClient: map 17% reduce 0%
08/06/20 11:07:57 INFO mapred.JobClient: map 26% reduce 0%
08/06/20 11:08:00 INFO mapred.JobClient: map 34% reduce 0%
08/06/20 11:08:01 INFO mapred.JobClient: map 43% reduce 0%
08/06/20 11:08:04 INFO mapred.JobClient: map 47% reduce 0%
08/06/20 11:08:05 INFO mapred.JobClient: map 52% reduce 0%
08/06/20 11:08:08 INFO mapred.JobClient: map 60% reduce 0%
08/06/20 11:08:09 INFO mapred.JobClient: map 69% reduce 0%
08/06/20 11:08:10 INFO mapred.JobClient: map 73% reduce 0%
08/06/20 11:08:12 INFO mapred.JobClient: map 78% reduce 0%
08/06/20 11:08:13 INFO mapred.JobClient: map 82% reduce 0%
08/06/20 11:08:15 INFO mapred.JobClient: map 91% reduce 1%
08/06/20 11:08:16 INFO mapred.JobClient: map 95% reduce 1%
08/06/20 11:08:18 INFO mapred.JobClient: map 99% reduce 3%
08/06/20 11:08:23 INFO mapred.JobClient: map 100% reduce 3%
08/06/20 11:08:25 INFO mapred.JobClient: map 100% reduce 7%
08/06/20 11:08:28 INFO mapred.JobClient: map 100% reduce 10%
08/06/20 11:08:30 INFO mapred.JobClient: map 100% reduce 11%
08/06/20 11:08:33 INFO mapred.JobClient: map 100% reduce 12%
08/06/20 11:08:35 INFO mapred.JobClient: map 100% reduce 14%
08/06/20 11:08:38 INFO mapred.JobClient: map 100% reduce 15%
08/06/20 11:09:54 INFO mapred.JobClient: map 100% reduce 13%
08/06/20 11:09:54 INFO mapred.JobClient: Task Id :
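For reference, the fix Sayali describes amounts to giving every node an identical, complete hosts file. A sketch (the master address matches the config above; the slave hostnames and addresses are illustrative):

```
# /etc/hosts - identical on every node in the cluster
127.0.0.1     localhost
10.105.41.25  master
10.105.41.26  slave1
10.105.41.27  slave2
10.105.41.28  slave3
10.105.41.29  slave4
10.105.41.30  slave5
```

As for the 1000+-node question: hand-editing hosts files does not scale to clusters of that size. Such deployments normally rely on working forward and reverse DNS (or on configuration management that pushes the hosts file to all machines), so that adding a node does not mean touching every other machine.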
Search results return 0
Hi All, I have installed Nutch on my server, and crawling yields results; however, when I search on the site it returns 0 results. I do not know where to put the crawl directory relative to the Tomcat folder, so if you can give me a hint please do, as it is not written in the docs (that part is skipped). On the command line, the results are returned as follows:

[u...@ogn003 engine]$ bin/nutch org.apache.nutch.searcher.NutchBean linux
Total hits: 14
0 20090709133038/http://www.amjad.ws/ ... [Arabic snippet, mis-encoded in the archive] How To Linux Network php VB.NET Web ... to Windows Server 2008 Ubuntu Linux For Novices Building ...
1 20090709133038/http://www.aramcode.net/vb/ ... [Arabic snippet, mis-encoded in the archive] ... Linux Vs Windows ...
2 20090709133038/http://www.bowlfr.org/ ... et SMTP sous Mozilla-Thunderbird (Linux), Outlook (Microsoft), ainsi que POP3 ...
3 20090709133038/http://emaus.czest.pl/ ... 6,7), FireFox, Opera (Win, Linux) [Polish snippet, mis-encoded in the archive] ...
4 20090709133038/http://www.geocities.com/ivan_penkov/ ... MS Windows 2000/NT/9x, Linux (RedHat), DOS / Win 3.11 ...
5 20090709133038/http://www.mentor-it.dk/ ... 6 cell Li-Ion, Full Linux 2.246,- ex moms ... Mentor ...
6 20090709133038/http://www.webmatters.co.uk/ ... Cocoa/Objective-C, MySQL, Apache, Linux/Solaris/Mac OS X, C++ ...
7 20090709133038/http://www.yale.edu/its/help/cmc.html ... FAS IT support Educational Technologies Linux Systems Design Support Social ...
8 20090709133038/http://www.project-open.com/whitepapers/localization/ Multilingual Architecture Primer ... This site has been restructured in order to make it more accessible ...
9 20090709133038/http://m4rtin.com/ ... [Czech snippet, mis-encoded in the archive] linux ubuntu ...

Regards, Zaihan
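Regarding where Tomcat looks for the crawl: in Nutch 0.x the search webapp locates the index through the searcher.dir property, read from the nutch-site.xml on the webapp's classpath (typically WEB-INF/classes inside the deployed war), then restart Tomcat. A sketch, assuming the crawl was written to /home/user/crawl (the property name is real; the path is illustrative):

```xml
<!-- nutch-site.xml inside the deployed webapp's WEB-INF/classes -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- absolute path to the directory produced by "bin/nutch crawl" -->
    <value>/home/user/crawl</value>
  </property>
</configuration>
```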
Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
lei wang wrote:

Anyone help? So disappointed.

On Fri, Jul 10, 2009 at 4:29 PM, lei wang nutchmaill...@gmail.com wrote:

Yes, I am also running into this problem. Can anyone help?

On Sun, Jul 5, 2009 at 11:33 PM, xiao yang yangxiao9...@gmail.com wrote:

I often get this error message while crawling the intranet. Is it a network problem? What can I do about it?

$ bin/nutch crawl urls -dir crawl -depth 3 -topN 4
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 4
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20090705212324
Generator: filtering: true
Generator: topN: 4
Generator: Partitioning selected urls by host, for politeness.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
	at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
	at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)

If you are running a large crawl on a single machine, you could be running out of file descriptors - please check ulimit -n; the value should be much, much larger than 1024. Also, please check hadoop.log for clues as to why shuffle fetching failed - this could be something as trivial as a blocked port, a routing problem, a DNS resolution problem, or the problem I mentioned above.

-- Best regards, Andrzej Bialecki
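The file-descriptor check Andrzej suggests looks like this in practice (the 16384 threshold and the limits.conf lines below are illustrative choices, not Nutch requirements):

```shell
# Check the per-process open-file limit; a too-low value (e.g. the old
# default of 1024) is a known cause of failed shuffle fetches when many
# map outputs are served from a single machine.
current=$(ulimit -n)
echo "open file limit: $current"
if [ "$current" != "unlimited" ] && [ "$current" -lt 16384 ]; then
  # Raise it for the crawling user, e.g. in /etc/security/limits.conf:
  #   hadoop  soft  nofile  65536
  #   hadoop  hard  nofile  65536
  echo "limit looks low for a large local crawl"
fi
```

The new limit takes effect on the next login session, so log out and back in (or restart the Hadoop daemons) after raising it.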
http://www.sigram.com Contact: info at sigram dot com
How to search part of words?
Hi, how can I search for part of a word? For example, when I search for "ghost", I would like to also get all pages containing the word "ghostbusters", but I only get the pages with the exact word. Where is the option for that? Best regards, Stefan
Problem with nutch
Hi, this is Pranay. I want to join the Nutch forum; I am new to Nutch. I have a problem with the ontology: I am not able to clear the cached data, so can you please help me with that? Problem: I have created a different ontology for each user, but when I search for some word in the semantic search, it also gives me the previous user's results, so I want the cached data to be cleared. When I stop the server and restart it, it works fine.
Changing fieldsNorm at query time
Hi, I observe that my search results are bad only because the fieldNorm is so high for the bad results. Can anyone suggest how to change the fieldNorm factor at search time, even though I know it is an indexed field? Is there a way to set fieldNorm to a constant value while using Nutch to search over the Lucene index? Thanks, Ilayaraja
Nutch Character encoding converter
Hi, Nutch has an automatic detector for character encoding. Does it convert characters to a standard encoding automatically after detecting them?

-- View this message in context: http://www.nabble.com/Nutch-Character-encoding-converter-tp24456144p24456144.html Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Nutch Character encoding converter
> Nutch has an automatic detector for character encoding. Does it convert characters to a standard encoding automatically after detecting them?

Yes - Nutch converts text to Unicode for all subsequent processing.

-- Ken Krugler +1 530-210-6378
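Ken's answer can be illustrated with a quick shell equivalent of what happens internally (iconv here is only a stand-in for Nutch's Java-side decoding, not something Nutch itself invokes): once a page's encoding is detected, its raw bytes are decoded into Unicode text.

```shell
# 0xE9 is "é" in ISO-8859-1. Decoding it with the detected charset yields
# the Unicode character (re-encoded as UTF-8 for the terminal here).
printf '\xe9' | iconv -f ISO-8859-1 -t UTF-8
# prints: é
```

Getting this step wrong, i.e. decoding with the wrong charset, is exactly what produces the mojibake visible in some of the search snippets quoted earlier in this archive.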