Update on this: David has it working (I heard from him on IRC), but it's unstable. David, if you get a chance, please update this thread again with your issue regarding the instability of your hadoop cluster and the jobs dying after a period of time.
On Tue, Jul 29, 2014 at 3:14 PM, David Fryer <[email protected]> wrote:
> Hi Bigtop,
>
> I have hadoop/mapreduce running on a 4-node bare-metal cluster. Each time I try to run a job, I get output similar to this:
>
> sudo su hdfs -c "hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000"
> Number of Maps = 10
> Samples per Map = 1000
> 14/07/29 13:58:38 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Starting Job
> 14/07/29 13:58:40 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 14/07/29 13:58:40 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> 14/07/29 13:58:41 INFO input.FileInputFormat: Total input paths to process : 10
> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: number of splits:10
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 14/07/29 13:58:41 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406655936930_0009
> 14/07/29 13:58:42 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
> 14/07/29 13:58:42 INFO client.YarnClientImpl: Submitted application application_1406655936930_0009 to ResourceManager at odin/192.168.162.164:8032
> 14/07/29 13:58:42 INFO mapreduce.Job: The url to track the job: http://odin:20888/proxy/application_1406655936930_0009/
> 14/07/29 13:58:42 INFO mapreduce.Job: Running job: job_1406655936930_0009
> 14/07/29 13:58:46 INFO mapreduce.Job: Job job_1406655936930_0009 running in uber mode : false
> 14/07/29 13:58:46 INFO mapreduce.Job: map 0% reduce 0%
> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_0, Status : FAILED
> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: map 30% reduce 0%
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_0, Status : FAILED
> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000003_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000005_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_0, Status : FAILED
> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000004_0, Status : FAILED
> 14/07/29 13:59:03 INFO mapreduce.Job: map 0% reduce 0%
> 14/07/29 13:59:06 INFO mapreduce.Job: map 10% reduce 0%
> 14/07/29 13:59:07 INFO mapreduce.Job: map 20% reduce 0%
> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_1, Status : FAILED
> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_1, Status : FAILED
> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_1, Status : FAILED
> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_1, Status : FAILED
> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_1, Status : FAILED
> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_1, Status : FAILED
> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000005_1, Status : FAILED
> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_1, Status : FAILED
> 14/07/29 13:59:19 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_r_000000_0, Status : FAILED
> 14/07/29 13:59:20 INFO mapreduce.Job: map 30% reduce 0%
> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000001_2, Status : FAILED
> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000000_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000002_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000006_2, Status : FAILED
> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000007_2, Status : FAILED
> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000008_2, Status : FAILED
> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_m_000009_2, Status : FAILED
> 14/07/29 13:59:33 INFO mapreduce.Job: map 40% reduce 0%
> 14/07/29 13:59:34 INFO mapreduce.Job: Task Id : attempt_1406655936930_0009_r_000000_1, Status : FAILED
> 14/07/29 13:59:40 INFO mapreduce.Job: map 60% reduce 0%
> 14/07/29 13:59:40 INFO mapreduce.Job: Job job_1406655936930_0009 failed with state FAILED due to: Task failed task_1406655936930_0009_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
>
> 14/07/29 13:59:40 WARN mapred.JobConf: The variable mapred.child.ulimit is no longer used.
> 14/07/29 13:59:41 INFO mapreduce.Job: Counters: 34
>   File System Counters
>     FILE: Number of bytes read=0
>     FILE: Number of bytes written=300264
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=1040
>     HDFS: Number of bytes written=0
>     HDFS: Number of read operations=16
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=0
>   Job Counters
>     Failed map tasks=27
>     Failed reduce tasks=2
>     Launched map tasks=35
>     Launched reduce tasks=3
>     Other local map tasks=25
>     Data-local map tasks=2
>     Rack-local map tasks=8
>     Total time spent by all maps in occupied slots (ms)=351421
>     Total time spent by all reduces in occupied slots (ms)=24148
>   Map-Reduce Framework
>     Map input records=4
>     Map output records=8
>     Map output bytes=72
>     Map output materialized bytes=112
>     Input split bytes=568
>     Combine input records=0
>     Spilled Records=8
>     Failed Shuffles=0
>     Merged Map outputs=0
>     GC time elapsed (ms)=60
>     CPU time spent (ms)=1270
>     Physical memory (bytes) snapshot=1694736384
>     Virtual memory (bytes) snapshot=7109738496
>     Total committed heap usage (bytes)=1664090112
>   File Input Format Counters
>     Bytes Read=472
> Job Finished in 60.202 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://odin:17020/user/hdfs/QuasiMonteCarlo_1406660318881_1358196499/out/reduce-out
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
> Does anyone have any idea what would cause this?
>
> -David Fryer
>
--
jay vyas
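
A note for anyone hitting the same symptoms: the client-side output above only prints "Status : FAILED" without the underlying error, and the final FileNotFoundException for reduce-out is just a consequence of the job failing before it could write its output. The actual cause is usually in the per-container logs of the failed map attempts. A minimal sketch of how to pull them, assuming YARN log aggregation is enabled on the cluster and reusing the application id from the run above:

    # Dump the aggregated container logs for the failed application.
    # Requires yarn.log-aggregation-enable=true; otherwise check the
    # NodeManager's local container log directories on each worker node.
    sudo su hdfs -c "yarn logs -applicationId application_1406655936930_0009" | less

    # Narrow the output down to the errors reported by the failed attempts:
    sudo su hdfs -c "yarn logs -applicationId application_1406655936930_0009" | grep -A 20 "Error\|Exception"

If log aggregation is off, the same information is reachable through the ResourceManager/JobHistory web UI by clicking through to the logs of the failed task attempts.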
