As it turns out, this was a non-issue: I just had errors in /etc/hosts.
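For anyone hitting the same symptoms: one common /etc/hosts mistake on Debian-style installs is mapping the node's own hostname to 127.0.1.1, which makes Hadoop daemons bind to loopback and produces exactly the random "no route to host" errors seen below. The hostname `odin` and address come from the logs in this thread; the check script itself is only a hypothetical sketch, not something from the original exchange.

```shell
# Hypothetical sanity check: flag any hosts-file line that maps a real
# hostname to 127.0.1.1 (a frequent cause of Hadoop daemons binding to
# loopback instead of the cluster interface).
check_hosts() {
  # $1 = path to a hosts file; prints offending lines, if any
  grep -E '^127\.0\.1\.1[[:space:]]' "$1" | grep -v 'localhost' || true
}

# Demo against a throwaway file (not the real /etc/hosts):
cat > /tmp/hosts.demo <<'EOF'
127.0.0.1   localhost
127.0.1.1   odin
192.168.162.164 odin
EOF
check_hosts /tmp/hosts.demo   # flags the 127.0.1.1 odin line
```

If the check prints anything, remove that line (or point the hostname at the node's real interface address) and restart the daemons.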


On Wed, Jul 30, 2014 at 9:11 AM, David Fryer <[email protected]> wrote:

> I have a cluster up, but after running a single mapreduce job (pi)
> successfully, I cannot run another job without first running chmod on the
> user's home directory to open up the permissions. I get the same error as
> in the original email. I also see that each time I run a job, when I check
> yarn logs -applicationId, there is a "no route to host" error on one of
> the slaves (which slave is random). Removing that slave from the cluster
> lets me run another job successfully, but then another slave hits the
> same error.
>
>
> On Tue, Jul 29, 2014 at 3:58 PM, jay vyas <[email protected]>
> wrote:
>
>> Update on this: David has it working; I heard from him on IRC, but it's
>> still unstable. David, if you get a chance, please update this thread
>> with details on the instability of your hadoop cluster and on jobs dying
>> after a period of time.
>>
>>
>>
>> On Tue, Jul 29, 2014 at 3:14 PM, David Fryer <[email protected]>
>> wrote:
>>
>>> Hi Bigtop,
>>>
>>> I have hadoop/mapreduce running on a 4-node bare-metal cluster. Each
>>> time I try to run a job, I get output similar to this:
>>>
>>> sudo su hdfs -c "hadoop jar
>>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000"
>>> Number of Maps  = 10
>>> Samples per Map = 1000
>>> 14/07/29 13:58:38 WARN mapred.JobConf: The variable mapred.child.ulimit
>>> is no longer used.
>>> Wrote input for Map #0
>>> Wrote input for Map #1
>>> Wrote input for Map #2
>>> Wrote input for Map #3
>>> Wrote input for Map #4
>>> Wrote input for Map #5
>>> Wrote input for Map #6
>>> Wrote input for Map #7
>>> Wrote input for Map #8
>>> Wrote input for Map #9
>>> Starting Job
>>> 14/07/29 13:58:40 INFO service.AbstractService:
>>> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
>>> 14/07/29 13:58:40 INFO service.AbstractService:
>>> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
>>> 14/07/29 13:58:41 INFO input.FileInputFormat: Total input paths to
>>> process : 10
>>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: number of splits:10
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.jar is deprecated.
>>> Instead, use mapreduce.job.jar
>>> 14/07/29 13:58:41 WARN conf.Configuration:
>>> mapred.map.tasks.speculative.execution is deprecated. Instead, use
>>> mapreduce.map.speculative
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.reduce.tasks is
>>> deprecated. Instead, use mapreduce.job.reduces
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.value.class is
>>> deprecated. Instead, use mapreduce.job.output.value.class
>>> 14/07/29 13:58:41 WARN conf.Configuration:
>>> mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
>>> mapreduce.reduce.speculative
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.map.class is
>>> deprecated. Instead, use mapreduce.job.map.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.job.name is
>>> deprecated. Instead, use mapreduce.job.name
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.reduce.class is
>>> deprecated. Instead, use mapreduce.job.reduce.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.inputformat.class
>>> is deprecated. Instead, use mapreduce.job.inputformat.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.input.dir is
>>> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.dir is
>>> deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapreduce.outputformat.class
>>> is deprecated. Instead, use mapreduce.job.outputformat.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.map.tasks is
>>> deprecated. Instead, use mapreduce.job.maps
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.output.key.class is
>>> deprecated. Instead, use mapreduce.job.output.key.class
>>> 14/07/29 13:58:41 WARN conf.Configuration: mapred.working.dir is
>>> deprecated. Instead, use mapreduce.job.working.dir
>>> 14/07/29 13:58:41 INFO mapreduce.JobSubmitter: Submitting tokens for
>>> job: job_1406655936930_0009
>>> 14/07/29 13:58:42 WARN mapred.JobConf: The variable mapred.child.ulimit
>>> is no longer used.
>>> 14/07/29 13:58:42 INFO client.YarnClientImpl: Submitted application
>>> application_1406655936930_0009 to ResourceManager at odin/
>>> 192.168.162.164:8032
>>> 14/07/29 13:58:42 INFO mapreduce.Job: The url to track the job:
>>> http://odin:20888/proxy/application_1406655936930_0009/
>>> 14/07/29 13:58:42 INFO mapreduce.Job: Running job: job_1406655936930_0009
>>> 14/07/29 13:58:46 INFO mapreduce.Job: Job job_1406655936930_0009 running
>>> in uber mode : false
>>> 14/07/29 13:58:46 INFO mapreduce.Job:  map 0% reduce 0%
>>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000001_0, Status : FAILED
>>> 14/07/29 13:58:59 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000000_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job:  map 30% reduce 0%
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000006_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000002_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000008_0, Status : FAILED
>>> 14/07/29 13:59:01 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000009_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000003_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000005_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000007_0, Status : FAILED
>>> 14/07/29 13:59:02 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000004_0, Status : FAILED
>>> 14/07/29 13:59:03 INFO mapreduce.Job:  map 0% reduce 0%
>>> 14/07/29 13:59:06 INFO mapreduce.Job:  map 10% reduce 0%
>>> 14/07/29 13:59:07 INFO mapreduce.Job:  map 20% reduce 0%
>>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000000_1, Status : FAILED
>>> 14/07/29 13:59:12 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000001_1, Status : FAILED
>>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000002_1, Status : FAILED
>>> 14/07/29 13:59:14 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000006_1, Status : FAILED
>>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000009_1, Status : FAILED
>>> 14/07/29 13:59:15 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000008_1, Status : FAILED
>>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000005_1, Status : FAILED
>>> 14/07/29 13:59:16 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000007_1, Status : FAILED
>>> 14/07/29 13:59:19 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_r_000000_0, Status : FAILED
>>> 14/07/29 13:59:20 INFO mapreduce.Job:  map 30% reduce 0%
>>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000001_2, Status : FAILED
>>> 14/07/29 13:59:26 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000000_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000002_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000006_2, Status : FAILED
>>> 14/07/29 13:59:28 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000007_2, Status : FAILED
>>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000008_2, Status : FAILED
>>> 14/07/29 13:59:29 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_m_000009_2, Status : FAILED
>>> 14/07/29 13:59:33 INFO mapreduce.Job:  map 40% reduce 0%
>>> 14/07/29 13:59:34 INFO mapreduce.Job: Task Id :
>>> attempt_1406655936930_0009_r_000000_1, Status : FAILED
>>> 14/07/29 13:59:40 INFO mapreduce.Job:  map 60% reduce 0%
>>> 14/07/29 13:59:40 INFO mapreduce.Job: Job job_1406655936930_0009 failed
>>> with state FAILED due to: Task failed task_1406655936930_0009_m_000000
>>> Job failed as tasks failed. failedMaps:1 failedReduces:0
>>>
>>> 14/07/29 13:59:40 WARN mapred.JobConf: The variable mapred.child.ulimit
>>> is no longer used.
>>> 14/07/29 13:59:41 INFO mapreduce.Job: Counters: 34
>>>   File System Counters
>>>     FILE: Number of bytes read=0
>>>     FILE: Number of bytes written=300264
>>>     FILE: Number of read operations=0
>>>     FILE: Number of large read operations=0
>>>     FILE: Number of write operations=0
>>>     HDFS: Number of bytes read=1040
>>>     HDFS: Number of bytes written=0
>>>     HDFS: Number of read operations=16
>>>     HDFS: Number of large read operations=0
>>>     HDFS: Number of write operations=0
>>>   Job Counters
>>>     Failed map tasks=27
>>>     Failed reduce tasks=2
>>>     Launched map tasks=35
>>>     Launched reduce tasks=3
>>>     Other local map tasks=25
>>>     Data-local map tasks=2
>>>     Rack-local map tasks=8
>>>     Total time spent by all maps in occupied slots (ms)=351421
>>>     Total time spent by all reduces in occupied slots (ms)=24148
>>>   Map-Reduce Framework
>>>     Map input records=4
>>>     Map output records=8
>>>     Map output bytes=72
>>>     Map output materialized bytes=112
>>>     Input split bytes=568
>>>     Combine input records=0
>>>     Spilled Records=8
>>>     Failed Shuffles=0
>>>     Merged Map outputs=0
>>>     GC time elapsed (ms)=60
>>>     CPU time spent (ms)=1270
>>>     Physical memory (bytes) snapshot=1694736384
>>>     Virtual memory (bytes) snapshot=7109738496
>>>     Total committed heap usage (bytes)=1664090112
>>>   File Input Format Counters
>>>     Bytes Read=472
>>> Job Finished in 60.202 seconds
>>> java.io.FileNotFoundException: File does not exist:
>>> hdfs://odin:17020/user/hdfs/QuasiMonteCarlo_1406660318881_1358196499/out/reduce-out
>>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:815)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>>
>>> Does anyone have any idea what would cause this?
>>>
>>> -David Fryer
>>>
>>
>>
>>
>> --
>> jay vyas
>>
>
>
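Since the thread resolved to a hosts-file problem, a quick way to catch the follow-up symptom (a random slave with "no route to host") before running jobs is to cross-check the hosts file against the list of workers. This is a hypothetical sketch, not from the original exchange: the worker names `thor` and `loki` and the demo file paths are made up, while `odin` and its address come from the logs above.

```shell
# Hypothetical helper: verify every worker listed in a slaves file has
# exactly one non-loopback entry in a given hosts file. Duplicate or
# missing entries are exactly the kind of error that causes random
# "no route to host" failures on one slave at a time.
check_cluster_hosts() {
  # $1 = hosts file, $2 = slaves file; prints problems, returns 1 if any
  rc=0
  while read -r node; do
    [ -z "$node" ] && continue
    n=$(awk -v h="$node" '$1 !~ /^127\./ { for (i = 2; i <= NF; i++) if ($i == h) c++ } END { print c + 0 }' "$1")
    if [ "$n" -ne 1 ]; then
      echo "$node: $n non-loopback entries (expected 1)"
      rc=1
    fi
  done < "$2"
  return $rc
}

# Demo against throwaway files: thor is listed twice, loki not at all.
cat > /tmp/hosts.demo2 <<'EOF'
127.0.0.1 localhost
192.168.162.164 odin
192.168.162.165 thor
192.168.162.165 thor
EOF
cat > /tmp/slaves.demo <<'EOF'
thor
loki
EOF
check_cluster_hosts /tmp/hosts.demo2 /tmp/slaves.demo || true
```

Run it against the real /etc/hosts and the Hadoop slaves file on each node; silence means every worker resolves to exactly one non-loopback address.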
