[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-04-09 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15219:
---
Attachment: HDFS-15219.001.patch

> DFS Client will stuck when ResponseProcessor.run throw Error
> 
>
> Key: HDFS-15219
> URL: https://issues.apache.org/jira/browse/HDFS-15219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15219.001.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application stucked more than 2 hours util we kill this 
> applicaiton. The Reason is a task attempt stucked, becuase speculative 
> execution is disable. 
> Then Exception like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
> records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for 
> block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
> threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>  at java.lang.String.valueOf(String.java:2847)
>  at java.lang.StringBuilder.append(StringBuilder.java:128)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>  ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>  at java.util.zip.ZipFile.read(Native Method)
>  at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>  at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>  at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>  at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>  at sun.misc.Resource.getBytes(Resource.java:124)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>  ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
> Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
> invocation of shutdownRequested
> {code}
> Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
> NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
> NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
> DataStream didn't know ReponseProcessor was dead, and can't trigger 
> closeResponder, so stucked in DataStream.run.
>  I tested in unit-test TestDataStream.testDfsClient. When I throw 
> NoClassDefFoundError in ResponseProcessor.run, the 
> TestDataStream.testDfsClient will failed bacause of timeout.
> I think we should catch Throwable but not Exception in ReponseProcessor.run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-25 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15219:
---
Fix Version/s: 3.2.2
   3.1.4

> DFS Client will stuck when ResponseProcessor.run throw Error
> 
>
> Key: HDFS-15219
> URL: https://issues.apache.org/jira/browse/HDFS-15219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application stucked more than 2 hours util we kill this 
> applicaiton. The Reason is a task attempt stucked, becuase speculative 
> execution is disable. 
> Then Exception like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
> records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for 
> block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
> threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>  at java.lang.String.valueOf(String.java:2847)
>  at java.lang.StringBuilder.append(StringBuilder.java:128)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>  ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>  at java.util.zip.ZipFile.read(Native Method)
>  at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>  at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>  at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>  at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>  at sun.misc.Resource.getBytes(Resource.java:124)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>  ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
> Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
> invocation of shutdownRequested
> {code}
> Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
> NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
> NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
> DataStream didn't know ReponseProcessor was dead, and can't trigger 
> closeResponder, so stucked in DataStream.run.
>  I tested in unit-test TestDataStream.testDfsClient. When I throw 
> NoClassDefFoundError in ResponseProcessor.run, the 
> TestDataStream.testDfsClient will failed bacause of timeout.
> I think we should catch Throwable but not Exception in ReponseProcessor.run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-24 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15219:

Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> DFS Client will stuck when ResponseProcessor.run throw Error
> 
>
> Key: HDFS-15219
> URL: https://issues.apache.org/jira/browse/HDFS-15219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Fix For: 3.3.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application stucked more than 2 hours util we kill this 
> applicaiton. The Reason is a task attempt stucked, becuase speculative 
> execution is disable. 
> Then Exception like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
> records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for 
> block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
> threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>  at java.lang.String.valueOf(String.java:2847)
>  at java.lang.StringBuilder.append(StringBuilder.java:128)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>  ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>  at java.util.zip.ZipFile.read(Native Method)
>  at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>  at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>  at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>  at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>  at sun.misc.Resource.getBytes(Resource.java:124)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>  ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
> Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
> invocation of shutdownRequested
> {code}
> Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
> NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
> NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
> DataStream didn't know ReponseProcessor was dead, and can't trigger 
> closeResponder, so stucked in DataStream.run.
>  I tested in unit-test TestDataStream.testDfsClient. When I throw 
> NoClassDefFoundError in ResponseProcessor.run, the 
> TestDataStream.testDfsClient will failed bacause of timeout.
> I think we should catch Throwable but not Exception in ReponseProcessor.run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15219:
---
Fix Version/s: (was: 3.2.2)

> DFS Client will stuck when ResponseProcessor.run throw Error
> 
>
> Key: HDFS-15219
> URL: https://issues.apache.org/jira/browse/HDFS-15219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: zhengchenyu
>Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application stucked more than 2 hours util we kill this 
> applicaiton. The Reason is a task attempt stucked, becuase speculative 
> execution is disable. 
> Then Exception like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
> records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for 
> block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
> threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>  at java.lang.String.valueOf(String.java:2847)
>  at java.lang.StringBuilder.append(StringBuilder.java:128)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>  ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>  at java.util.zip.ZipFile.read(Native Method)
>  at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>  at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>  at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>  at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>  at sun.misc.Resource.getBytes(Resource.java:124)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>  ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
> Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
> invocation of shutdownRequested
> {code}
> Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
> NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
> NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
> DataStream didn't know ReponseProcessor was dead, and can't trigger 
> closeResponder, so stucked in DataStream.run.
>  I tested in unit-test TestDataStream.testDfsClient. When I throw 
> NoClassDefFoundError in ResponseProcessor.run, the 
> TestDataStream.testDfsClient will failed bacause of timeout.
> I think we should catch Throwable but not Exception in ReponseProcessor.run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15219:
---
Status: Patch Available  (was: Open)

> DFS Client will stuck when ResponseProcessor.run throw Error
> 
>
> Key: HDFS-15219
> URL: https://issues.apache.org/jira/browse/HDFS-15219
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: zhengchenyu
>Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application stucked more than 2 hours util we kill this 
> applicaiton. The Reason is a task attempt stucked, becuase speculative 
> execution is disable. 
> Then Exception like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
> records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
> read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for 
> block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
> threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>  at java.lang.String.valueOf(String.java:2847)
>  at java.lang.StringBuilder.append(StringBuilder.java:128)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>  ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>  at java.util.zip.ZipFile.read(Native Method)
>  at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>  at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>  at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>  at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>  at sun.misc.Resource.getBytes(Resource.java:124)
>  at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>  ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
> BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
> |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
> Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
> Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
> invocation of shutdownRequested
> {code}
> Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
> NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
> NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
> DataStream didn't know ReponseProcessor was dead, and can't trigger 
> closeResponder, so stucked in DataStream.run.
>  I tested in unit-test TestDataStream.testDfsClient. When I throw 
> NoClassDefFoundError in ResponseProcessor.run, the 
> TestDataStream.testDfsClient will failed bacause of timeout.
> I think we should catch Throwable but not Exception in ReponseProcessor.run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-11 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15219:
---
Description: 
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked 
to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
invocation of shutdownRequested

{code}
Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
DataStream didn't know ReponseProcessor was dead, and can't trigger 
closeResponder, so stucked in DataStream.run.

 I tested in unit-test TestDataStream.testDfsClient. When I throw 
NoClassDefFoundError in ResponseProcessor.run, the TestDataStream.testDfsClient 
will failed bacause of timeout.

I think we should catch Throwable but not Exception in ReponseProcessor.run.

 

  was:
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: 

[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-11 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15219:
---
Description: 
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked 
to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
invocation of shutdownRequested

{code}
Reason is UncaughtException. When time is 01:29, a disk was error, so throw 
NoClassDefFoundError. ResponseProcessor.run only catch Exception, can't catch 
NoClassDefFoundError. So the ReponseProcessor didn't set errorState. Then 
DataStream didn't know ReponseProcessor was dead, and can't trigger 
closeResponder, so stucked in DataStream.run.

 I tested in unit-test TestDataStream.testDfsClient. When I throw 
NoClassDefFoundError, the TestDataStream.testDfsClient will failed bacause of 
timeout.

I think we should catch Throwable but not Exception in ReponseProcessor.run.

 

  was:
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at 

[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-11 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15219:
---
Description: 
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked 
to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
invocation of shutdownRequested

{code}
 

 

  was:
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:

{code}

2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at 

[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error

2020-03-11 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated HDFS-15219:
---
Description: 
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: 
Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked 
to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: 
Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an 
invocation of shutdownRequested

{code}
 Reason is UncaughtException. ResponseProcessor.run 

 

  was:
In my case, a Tez application stucked more than 2 hours util we kill this 
applicaiton. The Reason is a task attempt stucked, becuase speculative 
execution is disable. 

Then Exception like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: 
records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records 
read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] 
|yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block 
BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] 
threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native