Re: Ways to improve job cleanup speed
Hi Tanya, What version of Hadoop are you running? Is this a 1-node cluster running in pseudo-distributed mode with 1 physical spinning hard drive? How much intermediate data is being emitted from the Map phase? How many mappers and reducers total is the job running? -- Sameer Farooqui Systems Architect / Hortonworks On Thu, Feb 23, 2012 at 7:08 AM, tanyasch ta...@tickel.net wrote: Hi, I'm running a job that completes in about 90 seconds, but takes about 10-15 minutes to run cleanup. I'm looking for ways to affect or even monitor the cleanup time. I'd even like advice about whether this is more of a setup issue (like where I'm storing files, with Accumulo and Hadoop temporary and log files all writing to the same disk because our cluster is tiny), a job issue (can I throw more reducers at it? the brief description of the OutputCommitter says it uses available reducers for cleanup), or a programming issue (in that case I'd post a different question). Basically, I want to know if the first way to go at this is by reconfiguring the cluster, or if I should be programming my way out of this? Thanks. -- View this message in context: http://old.nabble.com/Ways-to-improve-job-cleanup-speed-tp33377374p33377374.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
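For context on what "cleanup" is doing: with FileOutputCommitter, committed task outputs get promoted out of the output directory's _temporary subtree, and job cleanup then removes that tree, which is rename- and delete-heavy on a single spinning disk. A rough Python sketch of that bookkeeping (hypothetical layout and names, not the actual Hadoop code):

```python
import os
import shutil
import tempfile

def commit_job(output_dir):
    # Promote each task attempt's files out of _temporary into the
    # job output directory, then remove the _temporary tree.
    # Many small files means many renames/deletes -- one reason the
    # cleanup phase can dwarf a 90-second Map/Reduce phase.
    tmp = os.path.join(output_dir, "_temporary")
    for attempt in os.listdir(tmp):
        attempt_dir = os.path.join(tmp, attempt)
        for name in os.listdir(attempt_dir):
            os.rename(os.path.join(attempt_dir, name),
                      os.path.join(output_dir, name))
    shutil.rmtree(tmp)

# demo with a fake task-attempt directory
out = tempfile.mkdtemp()
os.makedirs(os.path.join(out, "_temporary", "attempt_0"))
with open(os.path.join(out, "_temporary", "attempt_0", "part-00000"), "w") as f:
    f.write("data")
commit_job(out)
print(sorted(os.listdir(out)))  # ['part-00000']
```

If the job emits thousands of small output files onto one shared disk, this rename/delete traffic (plus Accumulo and log I/O on the same spindle) is a plausible place for 10-15 minutes to go.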
Re: Streaming job hanging
Hi Mohit, Can you provide some more info about the job you're trying to run? What version of Hadoop are you using? What language is the Hadoop streaming job written in? Have you been able to run any Hadoop streaming jobs successfully in this cluster? I'm wondering if all Hadoop streaming jobs fail, or just this one is failing. Instead of running this on a file with possibly 551 blocks, can you try to run it on a small file with like 1 or 2 blocks and see if it runs successfully? When I ran a Hadoop streaming job with Python, on a few small files (1-2 MB), the job ran pretty quickly in 77 seconds (for the Map+Reduce phases): packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /mnt/hadoop/tmp/hadoop-unjar5368493284653516019/] [] /tmp/streamjob8122180536767888261.jar tmpDir=null 11/09/06 23:38:04 INFO mapred.FileInputFormat: Total input paths to process : 3 11/09/06 23:38:05 INFO streaming.StreamJob: getLocalDirs(): [/mnt/hadoop/tmp/mapred/local] 11/09/06 23:38:05 INFO streaming.StreamJob: Running job: job_201109062238_0001 11/09/06 23:38:05 INFO streaming.StreamJob: To kill this job, run: 11/09/06 23:38:05 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201109062238_0001 11/09/06 23:38:05 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201109062238_0001 11/09/06 23:38:06 INFO streaming.StreamJob: map 0% reduce 0% 11/09/06 23:38:26 INFO streaming.StreamJob: map 32% reduce 0% 11/09/06 23:38:29 INFO streaming.StreamJob: map 39% reduce 0% 11/09/06 23:38:32 INFO streaming.StreamJob: map 48% reduce 0% 11/09/06 23:38:35 INFO streaming.StreamJob: map 50% reduce 0% 11/09/06 23:38:50 INFO streaming.StreamJob: map 75% reduce 0% 11/09/06 23:38:53 INFO streaming.StreamJob: map 100% reduce 0% 11/09/06 23:38:56 INFO streaming.StreamJob: map 100% reduce 17% 11/09/06 23:39:08 INFO streaming.StreamJob: map 100% reduce 67% 11/09/06 23:39:12 INFO streaming.StreamJob: map 
100% reduce 76% 11/09/06 23:39:14 INFO streaming.StreamJob: map 100% reduce 86% 11/09/06 23:39:17 INFO streaming.StreamJob: map 100% reduce 96% 11/09/06 23:39:23 INFO streaming.StreamJob: map 100% reduce 100% 11/09/06 23:39:29 INFO streaming.StreamJob: Job complete: job_201109062238_0001 11/09/06 23:39:29 INFO streaming.StreamJob: Output: /hduser/wordcount_python-output -- Sameer Farooqui Systems Architect / Hortonworks On Wed, Feb 22, 2012 at 8:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Streaming job just seems to be hanging 12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0% - On the admin page I see that it created 551 input splits. Could someone suggest a way to find out what might be causing it to hang? I increased io.sort.mb to 200 MB. I am using 5 data nodes with 12 CPU, 96G RAM.
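For reference, the mapper/reducer pair behind a streaming wordcount like the run above can be sketched as plain functions (a simplified sketch; real streaming scripts read stdin line by line and print tab-separated key/value pairs to stdout):

```python
from itertools import groupby

def run_mapper(lines):
    # streaming mapper: emit "word<TAB>1" for every token
    return ["%s\t1" % word for line in lines for word in line.split()]

def run_reducer(sorted_lines):
    # streaming reducer: input arrives grouped by key after the shuffle,
    # so summing per consecutive key is enough
    result = []
    pairs = [line.split("\t") for line in sorted_lines]
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        result.append("%s\t%d" % (word, sum(int(count) for _, count in group)))
    return result

mapped = run_mapper(["the quick fox", "the lazy dog"])
print(run_reducer(sorted(mapped)))
```

Testing the two scripts locally with `cat input | ./mapper.py | sort | ./reducer.py` before submitting is a quick way to rule out a script bug as the cause of a hang.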
BZip2 Splittable?
Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan.
Re: BZip2 Splittable?
Hi Daniel, The Bzip2 compression codec allows for splittable files. According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan.
Re: BZip2 Splittable?
Daniel, I just noticed your Hadoop version - 0.20.2. The JIRA fix below is for Hadoop 0.21.0, which is a different version. So it may not be supported on your version of Hadoop. -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote: Hi Daniel, The Bzip2 compression codec allows for splittable files. According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan.
RE: BZip2 Splittable?
Hi Rohit, thanks for the response, this is pretty much as I expected and hopefully adds weight to my other thoughts... Could this mean that all my datanodes are being sent all of the data, or that only one datanode is executing the job? Thanks again, Dan. -Original Message- From: Rohit Bakhshi [mailto:ro...@hortonworks.com] Sent: 24 February 2012 15:54 To: common-user@hadoop.apache.org Subject: Re: BZip2 Splittable? Daniel, I just noticed your Hadoop version - 0.20.2. The JIRA fix below is for Hadoop 0.21.0, which is a different version. So it may not be supported on your version of Hadoop. -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote: Hi Daniel, The Bzip2 compression codec allows for splittable files. According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan. CONFIDENTIALITY - This email and any files transmitted with it, are confidential, may be legally privileged and are intended solely for the use of the individual or entity to whom they are addressed. If this has come to you in error, you must not copy, distribute, disclose or use any of the information it contains. Please notify the sender immediately and delete them from your system. SECURITY - Please be aware that communication by email, by its very nature, is not 100% secure and by communicating with Perform Group by email you consent to us monitoring and reading any such correspondence. 
VIRUSES - Although this email message has been scanned for the presence of computer viruses, the sender accepts no liability for any damage sustained as a result of a computer virus and it is the recipient’s responsibility to ensure that email is virus free. AUTHORITY - Any views or opinions expressed in this email are solely those of the sender and do not necessarily represent those of Perform Group. COPYRIGHT - Copyright of this email and any attachments belongs to Perform Group, Companies House Registration number 6324278.
RE: BZip2 Splittable?
Support starts in 0.21, yes. It will soon be backported and available in 1.1.0. A patch to 1.0.0 to enable bzip2 splittability is here, https://issues.apache.org/jira/browse/HADOOP-7823, if you feel up to patching and rebuilding. - Tim. From: Rohit Bakhshi [ro...@hortonworks.com] Sent: Friday, February 24, 2012 7:53 AM To: common-user@hadoop.apache.org Subject: Re: BZip2 Splittable? Daniel, I just noticed your Hadoop version - 0.20.2. The JIRA fix below is for Hadoop 0.21.0, which is a different version. So it may not be supported on your version of Hadoop. -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote: Hi Daniel, The Bzip2 compression codec allows for splittable files. According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan. The information and any attached documents contained in this message may be confidential and/or legally privileged. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, dissemination, or reproduction is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender immediately by return e-mail and destroy all copies of the original message.
Re: BZip2 Splittable?
On Fri, 24 Feb 2012 15:43:10 GMT, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan. Support for bzip2 splitting was only added in 0.21.0; see https://issues.apache.org/jira/browse/MAPREDUCE-830 You need to roll forward (or backport the patch) if you want bzip2 splitting. (And since 1.0.0 is a fork from 0.20-security, it also lacks bzip2 splitting, AFAIK. Hopefully some future 1.x will pick up more of the 0.21 features.) -John Heidemann
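The reason bzip2 can be split at all (once the framework supports it) is the format itself: a bzip2 file is a sequence of independently compressed blocks, each introduced by a magic marker, so a reader can resynchronize mid-file. As a loose illustration of that independent-block structure (not Hadoop's actual split logic), Python's bz2 module will decompress a concatenation of separate streams in one pass:

```python
import bz2

# Each bz2.compress() call produces a self-contained stream starting
# with the "BZh" magic; decompress() walks them all back to back.
part1 = bz2.compress(b"hello ")
part2 = bz2.compress(b"world")
print(bz2.decompress(part1 + part2))  # b'hello world'
```

Gzip, by contrast, is a single DEFLATE stream with no resync markers, which is why it has stayed unsplittable in Hadoop.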
RE: Experience with Hadoop in production
I would add that it also depends on how thoroughly you have vetted your use cases. If you have already ironed out how ad-hoc access works, Kerberos vs Firewall and network segmentation, how code submission works, procedures for various operational issues, backup of your data, etc (the list is a couple hundred bullets long at minimum...) on your current cluster then there might be little need for that support. However if you are hoping to figure that stuff out still then you could potentially be in a world of hurt when you attempt the transition with just your own staff. It also helps to have that outside advice in certain situations to resolve cross department conflicts when it comes to how the cluster will be implemented :) Matt -Original Message- From: Mike Lyon [mailto:mike.l...@gmail.com] Sent: Thursday, February 23, 2012 2:33 PM To: common-user@hadoop.apache.org Subject: Re: Experience with Hadoop in production Just be sure you have that corporate card available 24x7 when you need to call support ;) Sent from my iPhone On Feb 23, 2012, at 10:30, Serge Blazhievsky serge.blazhiyevs...@nice.com wrote: What I have seen companies do often is that they will use the free version of the commercial vendor and only get their support if there are major problems that they cannot solve on their own. That way you will get the free distribution and insurance that you have support if something goes wrong. Serge On 2/23/12 10:42 AM, Jamack, Peter pjam...@consilium1.com wrote: A lot of it depends on your staff and their experiences. Maybe they don't have hadoop, but if they were involved with large databases, data warehouses, etc. they can utilize their skills and experience and provide a lot of help. If you have linux admins, system admins, network admins with years of experience, they will be a goldmine. At the other end, database developers who know SQL, programmers who know Java, and so on can really help staff up your 'big data' team. Having a few people who know ETL would be great too. 
The biggest problem I've run into seems to be how big the Hadoop project/team is or is not. Sometimes it's just an 'experimental' department and therefore half the people are only 25-50 percent available to help out. And if they aren't really that knowledgeable about hadoop, it tends to be one of those not-enough-time-in-the-day scenarios. And the few people dedicated to the Hadoop project(s) will get the brunt of the work. It's like any ecosystem. To do it right, you might need system/network admins, a storage person who actually knows how to set up the proper storage architecture, maybe a security expert, a few programmers, and a few data people. If you're combining analytics, that's another group. Of course most companies outside the Googles and Facebooks of the world will have a few people dedicated to Hadoop. Which means you need somebody who knows storage, knows networking, knows linux, knows how to be a system admin, knows security, and maybe other things (AKA if you have a firewall issue, somebody needs to figure out ways to make it work through or around), and then you need some programmers who either know MapReduce or can pretty much figure it out because they've done Java for years. Peter J On 2/23/12 10:17 AM, Pavel Frolov pfro...@gmail.com wrote: Hi, We are going into 24x7 production soon and we are considering whether we need vendor support or not. We use a free vendor distribution of Cluster Provisioning + Hadoop + HBase and looked at their Enterprise version but it is very expensive for the value it provides (additional functionality + support), given that we've already ironed out many of our performance and tuning issues on our own and with generous help from the community (e.g. all of you). So, I wanted to run it through the community to see if anybody can share their experience of running a Hadoop cluster (50+ nodes with Apache releases or Vendor distributions) in production, with in-house support only, and how difficult it was. 
How many people were involved, etc. Regards, Pavel This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of Viruses or other Malware. Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment.
Re: BZip2 Splittable?
Hi Daniel, Because your MapReduce jobs will not split bzip2 files, each entire bzip2 file will be processed by one Map task. Thus, if your job takes multiple bzip2 text files as the input, then you'll have as many Map tasks as you have files running in parallel. The Map tasks will be run by your TaskTrackers. Usually the cluster setup has the DataNode and the TaskTracker processes running on the same machines - so with 6 data nodes, you have 6 TaskTrackers. Hope that answers your question. Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:59 AM, Daniel Baptista wrote: Hi Rohit, thanks for the response, this is pretty much as I expected and hopefully adds weight to my other thoughts... Could this mean that all my datanodes are being sent all of the data, or that only one datanode is executing the job? Thanks again, Dan. -Original Message- From: Rohit Bakhshi [mailto:ro...@hortonworks.com] Sent: 24 February 2012 15:54 To: common-user@hadoop.apache.org (mailto:common-user@hadoop.apache.org) Subject: Re: BZip2 Splittable? Daniel, I just noticed your Hadoop version - 0.20.2. The JIRA fix below is for Hadoop 0.21.0, which is a different version. So it may not be supported on your version of Hadoop. -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote: Hi Daniel, The Bzip2 compression codec allows for splittable files. According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. 
I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan.
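The parallelism difference Rohit describes is easy to put in numbers. A back-of-the-envelope sketch (hypothetical file and block sizes, not taken from the thread):

```python
import math

def map_tasks(file_sizes, block_size, splittable):
    # Rule of thumb: an unsplittable file yields exactly one map task;
    # a splittable one yields roughly one task per HDFS block.
    if not splittable:
        return len(file_sizes)
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

block = 64 * 1024 * 1024          # 64 MB, the 0.20-era default block size
files = [640 * 1024 * 1024] * 6   # six hypothetical 640 MB bzip2 files

print(map_tasks(files, block, splittable=False))  # 6 map tasks
print(map_tasks(files, block, splittable=True))   # 60 map tasks
```

So with unsplittable bzip2 input, parallelism is capped at the file count regardless of how many TaskTracker slots the cluster has.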
Re: BZip2 Splittable?
@Daniel, If you want to process bz2 files in parallel (more than one mapper/reducer), you can go for Pig. See below. Pig has inbuilt support for processing .bz2 files in parallel (.gz support is coming soon). If the input file name extension is .bz2, Pig decompresses the file on the fly and passes the decompressed input stream to your load function. Regards, On Fri, Feb 24, 2012 at 2:59 PM, Rohit ro...@hortonworks.com wrote: Hi Daniel, Because your MapReduce jobs will not split bzip2 files, each entire bzip2 file will be processed by one Map task. Thus, if your job takes multiple bzip2 text files as the input, then you'll have as many Map tasks as you have files running in parallel. The Map tasks will be run by your TaskTrackers. Usually the cluster setup has the DataNode and the TaskTracker processes running on the same machines - so with 6 data nodes, you have 6 TaskTrackers. Hope that answers your question. Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:59 AM, Daniel Baptista wrote: Hi Rohit, thanks for the response, this is pretty much as I expected and hopefully adds weight to my other thoughts... Could this mean that all my datanodes are being sent all of the data, or that only one datanode is executing the job? Thanks again, Dan. -Original Message- From: Rohit Bakhshi [mailto:ro...@hortonworks.com] Sent: 24 February 2012 15:54 To: common-user@hadoop.apache.org (mailto:common-user@hadoop.apache.org) Subject: Re: BZip2 Splittable? Daniel, I just noticed your Hadoop version - 0.20.2. The JIRA fix below is for Hadoop 0.21.0, which is a different version. So it may not be supported on your version of Hadoop. -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote: Hi Daniel, The Bzip2 compression codec allows for splittable files. 
According to this Hadoop JIRA improvement, splitting of bzip2 compressed files in Hadoop jobs is supported: https://issues.apache.org/jira/browse/HADOOP-4012 -- Rohit Bakhshi www.hortonworks.com (http://www.hortonworks.com/) On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote: Hi All, I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 that take a series of bzip2 compressed text files as input. I have read conflicting articles regarding whether or not hadoop can split these bzip2 files, can anyone give me a definite answer? Thanks in advance, Dan. -- Regards, -- Srinivas srini...@cloudwick.com
PathFilter File Glob
Hello, I would like to use a PathFilter to filter, with a regular expression, the files read by the TextInputFormat, but I don't know how to apply the filter. I cannot find a setter. Unfortunately google was not my friend with this issue and The Definitive Guide does not help that much. I am using Hadoop 0.20.2-cdh3u3. Please help! Kind regards Simon Deutsche Telekom AG Products Innovation Simon Heeg Working Student T-Online-Allee 1, 64295 Darmstadt +49 6151 680-7835 (Tel.) E-Mail: s.h...@telekom.de www.telekom.com Deutsche Telekom AG Supervisory Board: Prof. Dr. Ulrich Lehner (Chairman) Management Board: René Obermann (Chairman), Dr. Manfred Balz, Reinhard Clemens, Niek Jan van Damme, Timotheus Höttges, Edward Kozel, Claudia Nemat, Thomas Sattelberger Commercial Register: Amtsgericht Bonn HRB 6794 Registered office: Bonn WEEE-Reg.-No. DE50478376 Big changes start small - conserve resources and don't print every email. Note: This email and/or its attachments are confidential and intended exclusively for the named addressee. Any review, forwarding or copying of this email is strictly prohibited. If you have received this email in error, please inform the sender immediately and destroy the message and all attachments. Thank you.
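On the Java side, the usual hook is to implement org.apache.hadoop.fs.PathFilter and register it via FileInputFormat's static setInputPathFilter method (check the FileInputFormat javadoc for your exact release to confirm the signature). The accept logic itself is trivial; sketched here in Python with made-up paths:

```python
import re

def accept(path, pattern):
    # the moral equivalent of PathFilter.accept(Path):
    # return True for paths the input format should keep
    return re.search(pattern, path) is not None

paths = ["/logs/app-2012-02-24.txt", "/logs/_SUCCESS", "/logs/app.tmp"]
print([p for p in paths if accept(p, r"\.txt$")])  # ['/logs/app-2012-02-24.txt']
```

Note that FileInputFormat already applies a default hidden-file filter (paths starting with "_" or "."), and a custom filter is applied in addition to it.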
MapReduce tunning
I am looking at some Hadoop tuning parameters like io.sort.mb, mapred.child.java.opts etc.
- My question was where to look for the current settings
- Are these settings configured cluster-wide or per job?
- What's the best way to look at reasons for slow performance?
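On the first question: settings come from the cluster's XML config files (core-site.xml, hdfs-site.xml, mapred-site.xml), can be overridden per job (unless marked final), and the effective per-job values appear in the job's job.xml, viewable from the JobTracker web UI. The files are plain name/value XML, so checking a value is easy to script (a sketch using a made-up config snippet, not your actual files):

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml fragment in Hadoop's standard
# <configuration>/<property>/<name>/<value> layout.
conf_xml = """
<configuration>
  <property><name>io.sort.mb</name><value>200</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx512m</value></property>
</configuration>
"""

def lookup(xml_text, key):
    # scan the <property> entries for a matching <name>
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None  # falls back to the compiled-in default

print(lookup(conf_xml, "io.sort.mb"))  # 200
```

A key absent from the files means the job runs with the compiled-in default, which is worth remembering when comparing job.xml against mapred-site.xml.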
Re: Consistent register getProtocolVersion error due to Duplicate metricsName:getProtocolVersion during cluster startup -- then various other errors during job execution
Hi again, Would you be able to make any suggestions on the below? Thanks in advance... Safdar On Feb 21, 2012 12:04 PM, Ali S Kureishy safdar.kurei...@gmail.com wrote: Hi, I've got a pseudo-distributed Hadoop (v0.20.02) setup with 1 machine (with Ubuntu 10.04 LTS) running all the hadoop processes (NN + SNN + JT + TT + DN). I've also configured the files under conf/ so that the master is referred to by its actual machine name (in this case, bali), instead of localhost (however, the issue below is seen regardless). I was able to successfully format the HDFS (by running hadoop namenode -format). However, right after I deploy the cluster using bin/start-all.sh, I see the following error in the NameNode's log file. It is an INFO-level message, but I believe it is the root cause behind various other errors I am encountering when executing actual Hadoop jobs. (For instance, at one point I see errors that the datanode and namenode were communicating using different protocol versions ... 3 vs 6 etc.). Anyway, here is the initial error:
2012-02-21 09:01:42,015 INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
 at org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
 at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
 at org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
I've scoured the web searching for other instances of this error, but none of the hits were helpful, nor relevant to my setup. 
My hunch is that this is preventing the cluster from correctly initializing. I would have switched to a later version of Hadoop, but the Nutch v1.4 distribution I'm trying to run on top of Hadoop is, AFAIK, only compatible with Hadoop v0.20. I have included with this email all my hadoop config files (config.rar), in case you need to take a quick look. Below is my /etc/hosts configuration, in case the issue is with that. I believe this is a hadoop-specific issue, and not related to Nutch, hence am posting to the hadoop mailing list.
ETC/HOSTS:
127.0.0.1 localhost
#127.0.1.1 bali
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.21 bali
FILE-SYSTEM layout:
Here's my filesystem layout. I've got all my hadoop configs pointing to folders under a root folder called /private/user/hadoop, with the following permissions.
ls -l /private/user/
total 4
drwxrwxrwx 7 user alt 4096 Feb 21 09:06 hadoop
ls -l /private/user/hadoop/
total 20
drwxr-xr-x 5 user alt 4096 Feb 21 09:01 data
drwxr-xr-x 3 user alt 4096 Feb 21 09:07 mapred
drwxr-xr-x 4 user alt 4096 Feb 21 08:59 name
drwxr-xr-x 2 user alt 4096 Feb 21 08:59 pids
drwxr-xr-x 3 user alt 4096 Feb 21 09:01 tmp
Shortly after the getProtocolVersion error above, I start seeing these errors in the namenode log:
2012-02-21 09:06:47,895 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. 
java.io.IOException: Server returned HTTP response code: 503 for URL: http://192.168.1.21:50090/getimage?getimage=1
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
 at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)
 at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:58)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at