A broken pipe is usually a network-related issue. Have you verified that network connectivity has not changed?
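If it helps, here is a minimal Python sketch for a first connectivity check from the master or another slave. It only tests whether a TCP connection to the TaskTracker port can be opened. Port 50060 is taken from the `src` field in the log quoted below; the function names and the host names in the usage comment are hypothetical placeholders, not anything from your cluster.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(hosts, port=50060, timeout=3.0):
    """Map each host to whether `port` accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Hypothetical usage (replace with the private addresses of your slaves):
#   print(check_nodes(["slave1.example.internal", "slave2.example.internal"]))
```

A successful connect only shows the port is reachable at that moment; it does not rule out intermittent drops, so it may be worth running it repeatedly around the time the failures occur.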
Regards,
Shahab

On Wed, Jun 12, 2013 at 3:17 AM, Ravi Shetye <[email protected]> wrote:

> In last 4-5 of day the task tracker on one of my slave machines has gone
> down couple of time. It has been working fine from the past 4-5 months
>
> The cluster configuration is
> 4 machine cluster on AWS
> 1 m2.xlarge master
> 3 m2.xlarge slaves
>
> The cluster is dedicated to run hive queries, with the data residing on s3.
>
> the slave on which the task tracker went down had the following log
>
> *******************************************************************
> 2013-06-11 00:26:30,968 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 279198
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 193135
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 192011
> 2013-06-11 00:26:30,972 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 178209
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 186452
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 157360
> 2013-06-11 00:26:30,974 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 157555
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not
> killed jvm_201306071409_0151_m_-435659475 but just removed
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks
> it ran: 0
> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught
> Throwable in JVMRunner. Aborting TaskTracker.
> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
> at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
> at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
> at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
> at java.io.BufferedWriter.close(BufferedWriter.java:265)
> at java.io.PrintWriter.close(PrintWriter.java:312)
> at
> org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
> at
> org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
> Caused by: java.io.IOException: Broken pipe
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:297)
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
> ... 13 more
> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 222430
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 154027
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 132067
> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201306071409_0151_m_-495709221 spawned.
> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
> Writing commands to
> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
> 2013-06-11 00:26:31,331 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 437236
> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
> ************************************************************/
>
> --
> RAVI SHETYE
>
