I have a cluster up, but after running a single mapreduce job (pi)
successfully, I cannot run another job without running a chmod on the
user's home directory to open up permissions. I get the same error as in
the original email. The other error I get is that each time I run a job,
when I check yarn logs -applicationId, there is a "no route to host" error
on one of the slaves (which one is random). Removing that slave from the
cluster allows me to run another job successfully, but then another slave
will get the same error.
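
For reference, the workaround I run between jobs looks roughly like this (the user name, home directory path, and the broad 777 mode are just examples of "opening up permissions"; the application id is taken from the run below):

```shell
# Re-open the submitting user's HDFS home directory between jobs
# (example path/user; adjust to the actual account running the job):
sudo -u hdfs hdfs dfs -chmod -R 777 /user/hdfs

# Pull the aggregated logs for the failed application and look for
# the connectivity error reported on one of the slaves:
yarn logs -applicationId application_1406655936930_0009 | grep -i "no route to host"
```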


On Tue, Jul 29, 2014 at 3:58 PM, jay vyas <[email protected]>
wrote:

> An update on this: David has it working; I heard from him in IRC, but it's
> unstable. David, if you get a chance, please update this thread with your
> issue regarding the instability of your hadoop cluster and jobs dying
> after a time period.
>
>
>
> On Tue, Jul 29, 2014 at 3:14 PM, David Fryer <[email protected]> wrote:
>
>> Hi Bigtop,
>>
>> I have hadoop/mapreduce running on a 4-node bare-metal cluster. Each
>> time I try to run a job, I get output similar to this:
>>
>> sudo su hdfs -c "hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000"
>> Number of Maps  = 10
>> Samples per Map = 1000
>> 14/07/29 13:58:38 WARN mapred.JobConf: The variable mapred.child.ulimit
>> is no longer used.
>> Wrote input for Map #0
>> Wrote input for Map #1
>> Wrote input for Map #2
>> Wrote input for Map #3
>> Wrote input for Map #4
>> Wrote input for Map #5
>> Wrote input for Map #6
>> Wrote input for Map #7
>> Wrote input for Map #8
>> Wrote input for Map #9
>> Starting Job
>> 14/07/29 13:58:40 INFO service.AbstractService:
>> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
>> 14/07/29 13:58:40 INFO service.AbstractService:
>> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
>> 14/07/29 13:58:41 INFO input.FileInputFormat: Total input paths to
>> process : 10
>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: number of splits:10
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.jar is deprecated.
>> Instead, use mapreduce.job.jar
>> 14/07/29 13:58:41 WARN conf.Configuration:
>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.map.speculative
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks is
>> deprecated. Instead, use mapreduce.job.reduces
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.value.class is
>> deprecated. Instead, use mapreduce.job.output.value.class
>> 14/07/29 13:58:41 WARN conf.Configuration:
>> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
>> mapreduce.reduce.speculative
>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.map.class is
>> deprecated. Instead, use mapreduce.job.map.class
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.job.name is
>> deprecated. Instead, use mapreduce.job.name
>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.reduce.class is
>> deprecated. Instead, use mapreduce.job.reduce.class
>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.inputformat.class is
>> deprecated. Instead, use mapreduce.job.inputformat.class
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.input.dir is
>> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.dir is
>> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.outputformat.class
>> is deprecated. Instead, use mapreduce.job.outputformat.class
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks is
>> deprecated. Instead, use mapreduce.job.maps
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.key.class is
>> deprecated. Instead, use mapreduce.job.output.key.class
>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.working.dir is
>> deprecated. Instead, use mapreduce.job.working.dir
>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1406655936930_0009
>> 14/07/29 13:58:42 WARN mapred.JobConf: The variable mapred.child.ulimit
>> is no longer used.
>> 14/07/29 13:58:42 INFO client.YarnClientImpl: Submitted application
>> application_1406655936930_0009 to ResourceManager at odin/
>> 192.168.162.164:8032
>> 14/07/29 13:58:42 INFO mapreduce.Job: The url to track the job:
>> http://odin:20888/proxy/application_1406655936930_0009/
>> 14/07/29 13:58:42 INFO mapreduce.Job: Running job: job_1406655936930_0009
>> 14/07/29 13:58:46 INFO mapreduce.Job: Job job_1406655936930_0009 running
>> in uber mode : false
>> 14/07/29 13:58:46 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000001_0, Status : FAILED
>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000000_0, Status : FAILED
>> 14/07/29 13:59:01 INFO mapreduce.Job:  map 30% reduce 0%
>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000006_0, Status : FAILED
>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000002_0, Status : FAILED
>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000008_0, Status : FAILED
>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000009_0, Status : FAILED
>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000003_0, Status : FAILED
>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000005_0, Status : FAILED
>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000007_0, Status : FAILED
>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000004_0, Status : FAILED
>> 14/07/29 13:59:03 INFO mapreduce.Job:  map 0% reduce 0%
>> 14/07/29 13:59:06 INFO mapreduce.Job:  map 10% reduce 0%
>> 14/07/29 13:59:07 INFO mapreduce.Job:  map 20% reduce 0%
>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000000_1, Status : FAILED
>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000001_1, Status : FAILED
>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000002_1, Status : FAILED
>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000006_1, Status : FAILED
>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000009_1, Status : FAILED
>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000008_1, Status : FAILED
>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000005_1, Status : FAILED
>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000007_1, Status : FAILED
>> 14/07/29 13:59:19 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_r_000000_0, Status : FAILED
>> 14/07/29 13:59:20 INFO mapreduce.Job:  map 30% reduce 0%
>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000001_2, Status : FAILED
>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000000_2, Status : FAILED
>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000002_2, Status : FAILED
>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000006_2, Status : FAILED
>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000007_2, Status : FAILED
>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000008_2, Status : FAILED
>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_m_000009_2, Status : FAILED
>> 14/07/29 13:59:33 INFO mapreduce.Job:  map 40% reduce 0%
>> 14/07/29 13:59:34 INFO mapreduce.Job: Task Id :
>> attempt_1406655936930_0009_r_000000_1, Status : FAILED
>> 14/07/29 13:59:40 INFO mapreduce.Job:  map 60% reduce 0%
>> 14/07/29 13:59:40 INFO mapreduce.Job: Job job_1406655936930_0009 failed
>> with state FAILED due to: Task failed task_1406655936930_0009_m_000000
>> Job failed as tasks failed. failedMaps:1 failedReduces:0
>>
>> 14/07/29 13:59:40 WARN mapred.JobConf: The variable mapred.child.ulimit
>> is no longer used.
>> 14/07/29 13:59:41 INFO mapreduce.Job: Counters: 34
>> File System Counters
>>   FILE: Number of bytes read=0
>>   FILE: Number of bytes written=300264
>>   FILE: Number of read operations=0
>>   FILE: Number of large read operations=0
>>   FILE: Number of write operations=0
>>   HDFS: Number of bytes read=1040
>>   HDFS: Number of bytes written=0
>>   HDFS: Number of read operations=16
>>   HDFS: Number of large read operations=0
>>   HDFS: Number of write operations=0
>> Job Counters
>>   Failed map tasks=27
>>   Failed reduce tasks=2
>>   Launched map tasks=35
>>   Launched reduce tasks=3
>>   Other local map tasks=25
>>   Data-local map tasks=2
>>   Rack-local map tasks=8
>>   Total time spent by all maps in occupied slots (ms)=351421
>>   Total time spent by all reduces in occupied slots (ms)=24148
>> Map-Reduce Framework
>>   Map input records=4
>>   Map output records=8
>>   Map output bytes=72
>>   Map output materialized bytes=112
>>   Input split bytes=568
>>   Combine input records=0
>>   Spilled Records=8
>>   Failed Shuffles=0
>>   Merged Map outputs=0
>>   GC time elapsed (ms)=60
>>   CPU time spent (ms)=1270
>>   Physical memory (bytes) snapshot=1694736384
>>   Virtual memory (bytes) snapshot=7109738496
>>   Total committed heap usage (bytes)=1664090112
>> File Input Format Counters
>>   Bytes Read=472
>> Job Finished in 60.202 seconds
>> java.io.FileNotFoundException: File does not exist:
>> hdfs://odin:17020/user/hdfs/QuasiMonteCarlo_1406660318881_1358196499/out/reduce-out
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>
>> Does anyone have any idea what would cause this?
>>
>> -David Fryer
>>
>
>
>
> --
> jay vyas
>
