Restarting Hadoop with start-all.sh brought the cluster back to working condition. I do not think there is any persistent network change. I am checking with the AWS folks whether there was a temporary failure.
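In case it helps anyone hitting the same thing, here is a rough sketch of how the TaskTracker log could be scanned for the entries that matter (ERROR/FATAL lines and the SHUTDOWN_MSG banner), so the failure time can be compared against whatever AWS reports. It assumes the line layout seen in the log quoted below (timestamp, level, logger, message). The function name find_suspicious_events is made up for illustration and is not part of Hadoop.

```python
import re
from datetime import datetime

# Assumed layout, based on the quoted log:
# 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught ...
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(INFO|WARN|ERROR|FATAL) "
    r"([\w.$]+): ?(.*)$"
)


def find_suspicious_events(lines):
    """Return (timestamp, level, logger, message) tuples for ERROR/FATAL
    entries and for the SHUTDOWN_MSG banner line.

    Continuation lines (stack frames, banner body) do not start with a
    timestamp, so they fail the match and are skipped.
    """
    events = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue
        stamp, level, logger, message = match.groups()
        if level in ("ERROR", "FATAL") or "SHUTDOWN_MSG" in message:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
            events.append((when, level, logger, message))
    return events
```

The function takes any iterable of lines, so an open log file can be passed directly. The location of the TaskTracker log depends on the installation, so I have not hard-coded a path.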
On Wed, Jun 12, 2013 at 6:20 PM, Shahab Yunus <[email protected]> wrote:
> Broken Pipe is a network related issue usually. Have you verified no
> change in network connectivity?
>
> Regards,
> Shahab
>
>
> On Wed, Jun 12, 2013 at 3:17 AM, Ravi Shetye <[email protected]> wrote:
>
>> In last 4-5 of day the task tracker on one of my slave machines has gone
>> down couple of time. It has been working fine from the past 4-5 months
>>
>> The cluster configuration is
>> 4 machine cluster on AWS
>> 1 m2.xlarge master
>> 3 m2.xlarge slaves
>>
>> The cluster is dedicated to run hive queries, with the data residing on
>> s3.
>>
>> the slave on which the task tracker went down had the following log
>>
>> *******************************************************************
>> 2013-06-11 00:26:30,968 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 279198
>> 2013-06-11 00:26:30,971 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 193135
>> 2013-06-11 00:26:30,971 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 192011
>> 2013-06-11 00:26:30,972 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 178209
>> 2013-06-11 00:26:30,973 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 186452
>> 2013-06-11 00:26:30,973 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 157360
>> 2013-06-11 00:26:30,974 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 157555
>> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not
>> killed jvm_201306071409_0151_m_-435659475 but just removed
>> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks
>> it ran: 0
>> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught
>> Throwable in JVMRunner. Aborting TaskTracker.
>> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>> at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>> at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>> at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
>> at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>> at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
>> at java.io.BufferedWriter.close(BufferedWriter.java:265)
>> at java.io.PrintWriter.close(PrintWriter.java:312)
>> at
>> org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>> at
>> org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
>> at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>> at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
>> Caused by: java.io.IOException: Broken pipe
>> at java.io.FileOutputStream.writeBytes(Native Method)
>> at java.io.FileOutputStream.write(FileOutputStream.java:297)
>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>> ... 13 more
>> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 222430
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 154027
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 132067
>> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201306071409_0151_m_-495709221 spawned.
>> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
>> Writing commands to
>> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
>> 2013-06-11 00:26:31,331 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 437236
>> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
>> ************************************************************/
>>
>> --
>> RAVI SHETYE
>>
>
>
--
RAVI SHETYE
