As it turns out, this was actually a non-issue. I just had errors in /etc/hosts.
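For anyone hitting the same "no route to host" symptom: a quick sanity check is that every node's /etc/hosts maps each cluster hostname to its real LAN address rather than a loopback address. A minimal sketch of the check (the slave hostname and all addresses other than odin's 192.168.162.164 are made-up examples):

```shell
# Example /etc/hosts with the classic mistake: the host's own name
# mapped to a loopback address, which makes Hadoop daemons advertise
# an address other nodes cannot reach. Entries below are illustrative.
cat > /tmp/hosts.example <<'EOF'
127.0.0.1        localhost
127.0.1.1        odin            # BAD: master resolves to loopback
192.168.162.164  odin            # correct entry for the master
192.168.162.165  slave1          # hypothetical slave
EOF

# Flag any non-localhost hostname that resolves to a 127.x address.
awk '$1 ~ /^127\./ && $2 != "localhost" {print "loopback mapping:", $2}' /tmp/hosts.example
```

Running the same awk check against the real /etc/hosts on each node (master and slaves) will surface this class of misconfiguration.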
On Wed, Jul 30, 2014 at 9:11 AM, David Fryer <[email protected]> wrote:

> I have a cluster up, but after running a single mapreduce job (pi)
> successfully, I cannot run another job without running a chmod on the
> user's home directory to open up permissions. I get the same error as in
> the original email. The other error I get is that each time I run a job,
> when I check yarn logs -applicationId, there is a "no route to host" error
> on one of the slaves (it is random). Removing the slave from the cluster
> allows me to run another job successfully, but another slave will get the
> same error.
>
> On Tue, Jul 29, 2014 at 3:58 PM, jay vyas <[email protected]> wrote:
>
>> An update on this: David has it working; I heard from him in IRC, but
>> it's unstable. David, if you get a chance, please update this thread
>> with your issue regarding the instability of your Hadoop cluster and
>> jobs dying after a time period.
>>
>> On Tue, Jul 29, 2014 at 3:14 PM, David Fryer <[email protected]> wrote:
>>
>>> Hi Bigtop,
>>>
>>> I have hadoop/mapreduce running on a 4-node bare metal cluster. Each
>>> time I try to run a job, I get output similar to this:
>>>
>>> sudo su hdfs -c "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000"
>>> Number of Maps = 10
>>> Samples per Map = 1000
>>> 14/07/29 13:58:38 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
>>> Wrote input for Map #0
>>> Wrote input for Map #1
>>> Wrote input for Map #2
>>> Wrote input for Map #3
>>> Wrote input for Map #4
>>> Wrote input for Map #5
>>> Wrote input for Map #6
>>> Wrote input for Map #7
>>> Wrote input for Map #8
>>> Wrote input for Map #9
>>> Starting Job
>>> 14/07/29 13:58:40 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
>>> 14/07/29 13:58:40 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
>>> 14/07/29 13:58:41 INFO input.FileInputFormat: Total input paths to process : 10
>>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: number of splits:10
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
>>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406655936930_0009
>>> 14/07/29 13:58:42 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
>>> 14/07/29 13:58:42 INFO client.YarnClientImpl: Submitted application application_1406655936930_0009 to ResourceManager at odin/192.168.162.164:8032
>>> 14/07/29 13:58:42 INFO mapreduce.Job: The url to track the job: http://odin:20888/proxy/application_1406655936930_0009/
>>> 14/07/29 13:58:42 INFO mapreduce.Job: Running job: job_1406655936930_0009
>>> 14/07/29 13:58:46 INFO mapreduce.Job: Job job_1406655936930_0009 running in uber mode : false
>>> 14/07/29 13:58:46 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_0, Status : FAILED
>>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: map 30% reduce 0%
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000003_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000005_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000004_0, Status : FAILED
>>> 14/07/29 13:59:03 INFO mapreduce.Job: map 0% reduce 0%
>>> 14/07/29 13:59:06 INFO mapreduce.Job: map 10% reduce 0%
>>> 14/07/29 13:59:07 INFO mapreduce.Job: map 20% reduce 0%
>>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_1, Status : FAILED
>>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_1, Status : FAILED
>>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_1, Status : FAILED
>>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_1, Status : FAILED
>>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_1, Status : FAILED
>>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_1, Status : FAILED
>>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000005_1, Status : FAILED
>>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_1, Status : FAILED
>>> 14/07/29 13:59:19 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_r_000000_0, Status : FAILED
>>> 14/07/29 13:59:20 INFO mapreduce.Job: map 30% reduce 0%
>>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_2, Status : FAILED
>>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_2, Status : FAILED
>>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_2, Status : FAILED
>>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_2, Status : FAILED
>>> 14/07/29 13:59:33 INFO mapreduce.Job: map 40% reduce 0%
>>> 14/07/29 13:59:34 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_r_000000_1, Status : FAILED
>>> 14/07/29 13:59:40 INFO mapreduce.Job: map 60% reduce 0%
>>> 14/07/29 13:59:40 INFO mapreduce.Job: Job job_1406655936930_0009 failed with state FAILED due to: Task failed task_1406655936930_0009_m_000000
>>> Job failed as tasks failed. failedMaps:1 failedReduces:0
>>>
>>> 14/07/29 13:59:40 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
>>> 14/07/29 13:59:41 INFO mapreduce.Job: Counters: 34
>>>   File System Counters
>>>     FILE: Number of bytes read=0
>>>     FILE: Number of bytes written=300264
>>>     FILE: Number of read operations=0
>>>     FILE: Number of large read operations=0
>>>     FILE: Number of write operations=0
>>>     HDFS: Number of bytes read=1040
>>>     HDFS: Number of bytes written=0
>>>     HDFS: Number of read operations=16
>>>     HDFS: Number of large read operations=0
>>>     HDFS: Number of write operations=0
>>>   Job Counters
>>>     Failed map tasks=27
>>>     Failed reduce tasks=2
>>>     Launched map tasks=35
>>>     Launched reduce tasks=3
>>>     Other local map tasks=25
>>>     Data-local map tasks=2
>>>     Rack-local map tasks=8
>>>     Total time spent by all maps in occupied slots (ms)=351421
>>>     Total time spent by all reduces in occupied slots (ms)=24148
>>>   Map-Reduce Framework
>>>     Map input records=4
>>>     Map output records=8
>>>     Map output bytes=72
>>>     Map output materialized bytes=112
>>>     Input split bytes=568
>>>     Combine input records=0
>>>     Spilled Records=8
>>>     Failed Shuffles=0
>>>     Merged Map outputs=0
>>>     GC time elapsed (ms)=60
>>>     CPU time spent (ms)=1270
>>>     Physical memory (bytes) snapshot=1694736384
>>>     Virtual memory (bytes) snapshot=7109738496
>>>     Total committed heap usage (bytes)=1664090112
>>>   File Input Format Counters
>>>     Bytes Read=472
>>> Job Finished in 60.202 seconds
>>> java.io.FileNotFoundException: File does not exist: hdfs://odin:17020/user/hdfs/QuasiMonteCarlo_1406660318881_1358196499/out/reduce-out
>>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>>
>>> Does anyone have any idea what would cause this?
>>>
>>> -David Fryer
>>
>> --
>> jay vyas
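For anyone else hitting the permissions symptom described upthread, the chmod workaround and the log check correspond to commands along these lines. This is a sketch, not a fix: /user/hdfs assumes Hadoop's default per-user home-directory layout, the mode 755 is one reasonable choice, and the application id is the one from the log above.

```shell
# List HDFS home directories to inspect their current permissions.
sudo -u hdfs hdfs dfs -ls /user

# The workaround from the thread: re-open the submitting user's home
# directory between jobs (path and mode are illustrative).
sudo -u hdfs hdfs dfs -chmod 755 /user/hdfs

# Pull the aggregated container logs for the failed run to look for
# the "no route to host" errors on individual slaves.
yarn logs -applicationId application_1406655936930_0009
```

Note that the underlying cause turned out to be /etc/hosts errors, so the chmod only masked the real problem.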
