Update on this: I heard from David on IRC that he has it working, but it's
unstable. David, if you get a chance, please update this thread with the
details of the instability in your Hadoop cluster and of the jobs dying
after a period of time.



On Tue, Jul 29, 2014 at 3:14 PM, David Fryer <[email protected]> wrote:

> Hi Bigtop,
>
> I have Hadoop/MapReduce running on a 4-node bare-metal cluster. Each
> time I try to run a job, I get output similar to this:
>
> sudo su hdfs -c "hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000"
> Number of Maps  = 10
> Samples per Map = 1000
> 14/07/29 13:58:38 WARN mapred.JobConf: The variable mapred.child.ulimit is
> no longer used.
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Starting Job
> 14/07/29 13:58:40 INFO service.AbstractService:
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 14/07/29 13:58:40 INFO service.AbstractService:
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> 14/07/29 13:58:41 INFO input.FileInputFormat: Total input paths to process
> : 10
> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: number of splits:10
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.jar is deprecated.
> Instead, use mapreduce.job.jar
> 14/07/29 13:58:41 WARN conf.Configuration:
> mapred.map.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.map.speculative
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks is
> deprecated. Instead, use mapreduce.job.reduces
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.value.class is
> deprecated. Instead, use mapreduce.job.output.value.class
> 14/07/29 13:58:41 WARN conf.Configuration:
> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
> mapreduce.reduce.speculative
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.map.class is
> deprecated. Instead, use mapreduce.job.map.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.job.name is deprecated.
> Instead, use mapreduce.job.name
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.reduce.class is
> deprecated. Instead, use mapreduce.job.reduce.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.inputformat.class is
> deprecated. Instead, use mapreduce.job.inputformat.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.input.dir is deprecated.
> Instead, use mapreduce.input.fileinputformat.inputdir
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.dir is
> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.outputformat.class is
> deprecated. Instead, use mapreduce.job.outputformat.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.key.class is
> deprecated. Instead, use mapreduce.job.output.key.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.working.dir is
> deprecated. Instead, use mapreduce.job.working.dir
> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1406655936930_0009
> 14/07/29 13:58:42 WARN mapred.JobConf: The variable mapred.child.ulimit is
> no longer used.
> 14/07/29 13:58:42 INFO client.YarnClientImpl: Submitted application
> application_1406655936930_0009 to ResourceManager at odin/
> 192.168.162.164:8032
> 14/07/29 13:58:42 INFO mapreduce.Job: The url to track the job:
> http://odin:20888/proxy/application_1406655936930_0009/
> 14/07/29 13:58:42 INFO mapreduce.Job: Running job: job_1406655936930_0009
> 14/07/29 13:58:46 INFO mapreduce.Job: Job job_1406655936930_0009 running
> in uber mode : false
> 14/07/29 13:58:46 INFO mapreduce.Job:  map 0% reduce 0%
> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000001_0, Status : FAILED
> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000000_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job:  map 30% reduce 0%
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000006_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000002_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000008_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000009_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000003_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000005_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000007_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000004_0, Status : FAILED
> 14/07/29 13:59:03 INFO mapreduce.Job:  map 0% reduce 0%
> 14/07/29 13:59:06 INFO mapreduce.Job:  map 10% reduce 0%
> 14/07/29 13:59:07 INFO mapreduce.Job:  map 20% reduce 0%
> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000000_1, Status : FAILED
> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000001_1, Status : FAILED
> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000002_1, Status : FAILED
> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000006_1, Status : FAILED
> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000009_1, Status : FAILED
> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000008_1, Status : FAILED
> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000005_1, Status : FAILED
> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000007_1, Status : FAILED
> 14/07/29 13:59:19 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_r_000000_0, Status : FAILED
> 14/07/29 13:59:20 INFO mapreduce.Job:  map 30% reduce 0%
> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000001_2, Status : FAILED
> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000000_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000002_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000006_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000007_2, Status : FAILED
> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000008_2, Status : FAILED
> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_m_000009_2, Status : FAILED
> 14/07/29 13:59:33 INFO mapreduce.Job:  map 40% reduce 0%
> 14/07/29 13:59:34 INFO mapreduce.Job: Task Id :
> attempt_1406655936930_0009_r_000000_1, Status : FAILED
> 14/07/29 13:59:40 INFO mapreduce.Job:  map 60% reduce 0%
> 14/07/29 13:59:40 INFO mapreduce.Job: Job job_1406655936930_0009 failed
> with state FAILED due to: Task failed task_1406655936930_0009_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
>
> 14/07/29 13:59:40 WARN mapred.JobConf: The variable mapred.child.ulimit is
> no longer used.
> 14/07/29 13:59:41 INFO mapreduce.Job: Counters: 34
> File System Counters
>   FILE: Number of bytes read=0
>   FILE: Number of bytes written=300264
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=1040
>   HDFS: Number of bytes written=0
>   HDFS: Number of read operations=16
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=0
> Job Counters
>   Failed map tasks=27
>   Failed reduce tasks=2
>   Launched map tasks=35
>   Launched reduce tasks=3
>   Other local map tasks=25
>   Data-local map tasks=2
>   Rack-local map tasks=8
>   Total time spent by all maps in occupied slots (ms)=351421
>   Total time spent by all reduces in occupied slots (ms)=24148
> Map-Reduce Framework
>   Map input records=4
>   Map output records=8
>   Map output bytes=72
>   Map output materialized bytes=112
>   Input split bytes=568
>   Combine input records=0
>   Spilled Records=8
>   Failed Shuffles=0
>   Merged Map outputs=0
>   GC time elapsed (ms)=60
>   CPU time spent (ms)=1270
>   Physical memory (bytes) snapshot=1694736384
>   Virtual memory (bytes) snapshot=7109738496
>   Total committed heap usage (bytes)=1664090112
> File Input Format Counters
>   Bytes Read=472
> Job Finished in 60.202 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://odin:17020/user/hdfs/QuasiMonteCarlo_1406660318881_1358196499/out/reduce-out
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
> Does anyone have any idea what would cause this?
>
> -David Fryer
>



-- 
jay vyas
