I cleaned up the log directory before running the job. Now there is no nodemanager jobs. when i see in userlogs directory i am getting some syslog files with the following error:
2014-04-14 11:58:23,472 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-04-14 11:58:24,474 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-04-14 11:58:25,475 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: poc-hadoop06/127.0.1.1:40221. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-04-14 11:58:25,477 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From poc-hadoop06/ 127.0.1.1 to poc-hadoop06:40221 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231) at $Proxy6.getTask(Unknown Source) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:137) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696) at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458) at org.apache.hadoop.ipc.Client.call(Client.java:1377) ... 4 more 2014-04-14 11:58:25,478 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system... 2014-04-14 11:58:25,478 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped. 2014-04-14 11:58:25,479 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete. On Mon, Apr 14, 2014 at 12:21 PM, Shengjun Xin <[email protected]> wrote: > You can check the nodemanager log > > > On Mon, Apr 14, 2014 at 2:38 PM, Rahul Singh > <[email protected]>wrote: > >> Hi, >> I am running a job(wordcount example) on 3 node cluster(1 master and 2 >> slave), some times the job passes but some times it fails(as reduce fails, >> input data few kbs). >> I am not able to nail down the reason of this inconsistency. >> >> >> >> failed log: >> >> 14/04/14 11:57:24 WARN util.NativeCodeLoader: Unable to load >> native-hadoop library for your platform... using builtin-java classes where >> applicable >> 14/04/14 11:57:25 INFO client.RMProxy: Connecting to ResourceManager at / >> 20.0.1.206:8032 >> 14/04/14 11:57:26 INFO input.FileInputFormat: Total input paths to >> process : 1 >> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: number of splits:1 >> 14/04/14 11:57:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: >> job_1397454060494_0003 >> 14/04/14 11:57:26 INFO impl.YarnClientImpl: Submitted application >> application_1397454060494_0003 >> 14/04/14 11:57:26 INFO mapreduce.Job: The url to track the job: >> http://20.0.1.206:8088/proxy/application_1397454060494_0003/ >> 14/04/14 11:57:26 INFO mapreduce.Job: Running job: job_1397454060494_0003 >> 14/04/14 11:57:34 INFO mapreduce.Job: Job job_1397454060494_0003 running >> in uber mode : false >> 14/04/14 11:57:34 INFO mapreduce.Job: map 0% reduce 0% >> 14/04/14 11:57:40 INFO mapreduce.Job: map 100% reduce 0% >> 14/04/14 11:57:46 INFO mapreduce.Job: map 100% reduce 13% >> 14/04/14 11:57:48 INFO mapreduce.Job: map 100% reduce 25% >> 14/04/14 11:57:49 INFO mapreduce.Job: map 100% reduce 38% >> 14/04/14 11:57:50 INFO mapreduce.Job: map 100% reduce 50% >> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000003_0, Status : FAILED >> 14/04/14 11:57:54 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000001_0, Status : FAILED >> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000005_0, Status : FAILED >> 14/04/14 11:57:56 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000007_0, Status : FAILED >> 14/04/14 11:58:02 INFO mapreduce.Job: map 100% reduce 63% >> 14/04/14 11:58:04 INFO mapreduce.Job: map 100% reduce 75% >> 14/04/14 11:58:09 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000003_1, Status : FAILED >> 14/04/14 11:58:11 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000005_1, Status : FAILED >> 14/04/14 11:58:24 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000003_2, Status : FAILED >> 14/04/14 11:58:26 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0003_r_000005_2, Status : FAILED >> 14/04/14 11:58:40 INFO mapreduce.Job: map 100% reduce 100% >> 14/04/14 11:58:40 INFO mapreduce.Job: Job job_1397454060494_0003 failed >> with state FAILED due to: Task failed task_1397454060494_0003_r_000003 >> Job failed as tasks failed. failedMaps:0 failedReduces:1 >> >> 14/04/14 11:58:40 INFO mapreduce.Job: Counters: 51 >> File System Counters >> FILE: Number of bytes read=80 >> FILE: Number of bytes written=596766 >> FILE: Number of read operations=0 >> FILE: Number of large read operations=0 >> FILE: Number of write operations=0 >> HDFS: Number of bytes read=175 >> HDFS: Number of bytes written=28 >> HDFS: Number of read operations=21 >> HDFS: Number of large read operations=0 >> HDFS: Number of write operations=12 >> Job Counters >> Failed reduce tasks=9 >> Killed reduce tasks=1 >> Launched map tasks=1 >> Launched reduce tasks=16 >> Data-local map tasks=1 >> Total time spent by all maps in occupied slots (ms)=3477 >> Total time spent by all reduces in occupied slots (ms)=148867 >> Total time spent by all map tasks (ms)=3477 >> Total time spent by all reduce tasks (ms)=148867 >> Total vcore-seconds taken by all map tasks=3477 >> Total vcore-seconds taken by all reduce tasks=148867 >> Total megabyte-seconds taken by all map tasks=3560448 >> Total megabyte-seconds taken by all reduce tasks=152439808 >> Map-Reduce Framework >> Map input records=3 >> Map output records=13 >> Map output bytes=110 >> Map output materialized bytes=112 >> Input split bytes=117 >> Combine input records=13 >> Combine output records=6 >> Reduce input groups=4 >> Reduce shuffle bytes=80 >> Reduce input records=4 >> Reduce output records=4 >> Spilled Records=10 >> Shuffled Maps =6 >> Failed Shuffles=0 >> Merged Map outputs=6 >> GC time elapsed (ms)=142 >> CPU time spent (ms)=6420 >> Physical memory (bytes) snapshot=1100853248 >> Virtual memory (bytes) snapshot=4468314112 >> Total committed heap usage (bytes)=1406992384 >> Shuffle Errors >> BAD_ID=0 >> CONNECTION=0 >> IO_ERROR=0 >> WRONG_LENGTH=0 >> WRONG_MAP=0 >> WRONG_REDUCE=0 >> File Input Format Counters >> Bytes Read=58 >> File Output Format Counters >> Bytes Written=28 >> >> Job Passing Logs: >> hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar >> wordcount /user/hduser/input /user/hduser/output_wordcount9 >> 14/04/14 11:47:27 WARN util.NativeCodeLoader: Unable to load >> native-hadoop library for your platform... using builtin-java classes where >> applicable >> 14/04/14 11:47:28 INFO client.RMProxy: Connecting to ResourceManager at / >> 20.0.1.206:8032 >> 14/04/14 11:47:28 INFO input.FileInputFormat: Total input paths to >> process : 1 >> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: number of splits:1 >> 14/04/14 11:47:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: >> job_1397454060494_0002 >> 14/04/14 11:47:29 INFO impl.YarnClientImpl: Submitted application >> application_1397454060494_0002 >> 14/04/14 11:47:29 INFO mapreduce.Job: The url to track the job: >> http://20.0.1.206:8088/proxy/application_1397454060494_0002/ >> 14/04/14 11:47:29 INFO mapreduce.Job: Running job: job_1397454060494_0002 >> 14/04/14 11:47:36 INFO mapreduce.Job: Job job_1397454060494_0002 running >> in uber mode : false >> 14/04/14 11:47:36 INFO mapreduce.Job: map 0% reduce 0% >> 14/04/14 11:47:50 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_m_000000_0, Status : FAILED >> 14/04/14 11:48:05 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_m_000000_1, Status : FAILED >> 14/04/14 11:48:20 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_m_000000_2, Status : FAILED >> 14/04/14 11:48:26 INFO mapreduce.Job: map 100% reduce 0% >> 14/04/14 11:48:34 INFO mapreduce.Job: map 100% reduce 13% >> 14/04/14 11:48:35 INFO mapreduce.Job: map 100% reduce 25% >> 14/04/14 11:48:37 INFO mapreduce.Job: map 100% reduce 50% >> 14/04/14 11:48:41 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_r_000001_0, Status : FAILED >> 14/04/14 11:48:42 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_r_000003_0, Status : FAILED >> 14/04/14 11:48:43 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_r_000005_0, Status : FAILED >> 14/04/14 11:48:44 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_r_000007_0, Status : FAILED >> 14/04/14 11:48:50 INFO mapreduce.Job: map 100% reduce 63% >> 14/04/14 11:48:51 INFO mapreduce.Job: map 100% reduce 75% >> 14/04/14 11:48:52 INFO mapreduce.Job: map 100% reduce 88% >> 14/04/14 11:48:58 INFO mapreduce.Job: Task Id : >> attempt_1397454060494_0002_r_000005_1, Status : FAILED >> 14/04/14 11:49:05 INFO mapreduce.Job: map 100% reduce 100% >> 14/04/14 11:49:06 INFO mapreduce.Job: Job job_1397454060494_0002 >> completed successfully >> 14/04/14 11:49:06 INFO mapreduce.Job: Counters: 52 >> File System Counters >> FILE: Number of bytes read=112 >> FILE: Number of bytes written=767175 >> FILE: Number of read operations=0 >> FILE: Number of large read operations=0 >> FILE: Number of write operations=0 >> HDFS: Number of bytes read=175 >> HDFS: Number of bytes written=40 >> HDFS: Number of read operations=27 >> HDFS: Number of large read operations=0 >> HDFS: Number of write operations=16 >> Job Counters >> Failed map tasks=3 >> Failed reduce tasks=5 >> Launched map tasks=4 >> Launched reduce tasks=13 >> Other local map tasks=3 >> Data-local map tasks=1 >> Total time spent by all maps in occupied slots (ms)=41629 >> Total time spent by all reduces in occupied slots (ms)=104530 >> Total time spent by all map tasks (ms)=41629 >> Total time spent by all reduce tasks (ms)=104530 >> Total vcore-seconds taken by all map tasks=41629 >> Total vcore-seconds taken by all reduce tasks=104530 >> Total megabyte-seconds taken by all map tasks=42628096 >> Total megabyte-seconds taken by all reduce tasks=107038720 >> Map-Reduce Framework >> Map input records=3 >> Map output records=13 >> Map output bytes=110 >> Map output materialized bytes=112 >> Input split bytes=117 >> Combine input records=13 >> Combine output records=6 >> Reduce input groups=6 >> Reduce shuffle bytes=112 >> Reduce input records=6 >> Reduce output records=6 >> Spilled Records=12 >> Shuffled Maps =8 >> Failed Shuffles=0 >> Merged Map outputs=8 >> GC time elapsed (ms)=186 >> CPU time spent (ms)=8890 >> Physical memory (bytes) snapshot=1408913408 >> Virtual memory (bytes) snapshot=5727019008 >> Total committed heap usage (bytes)=1808990208 >> Shuffle Errors >> BAD_ID=0 >> CONNECTION=0 >> IO_ERROR=0 >> WRONG_LENGTH=0 >> WRONG_MAP=0 >> WRONG_REDUCE=0 >> File Input Format Counters >> Bytes Read=58 >> File Output Format Counters >> Bytes Written=40 >> >> Thanks and Regards, >> -Rahul Singh >> > > > > -- > Regards > Shengjun >
