[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15219:
---
    Attachment: HDFS-15219.001.patch

> DFS Client will stuck when ResponseProcessor.run throw Error
>
>                 Key: HDFS-15219
>                 URL: https://issues.apache.org/jira/browse/HDFS-15219
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>         Attachments: HDFS-15219.001.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that one task attempt was stuck, because speculative execution is disabled.
> The exception looked like this:
> {code:java}
> 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
> 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
> 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
> java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
>     at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
>     at java.lang.String.valueOf(String.java:2847)
>     at java.lang.StringBuilder.append(StringBuilder.java:128)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
> Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 4 more
> Caused by: java.util.zip.ZipException: error reading zip file
>     at java.util.zip.ZipFile.read(Native Method)
>     at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
>     at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
>     at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
>     at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
>     at sun.misc.Resource.getBytes(Resource.java:124)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     ... 10 more
> 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
> 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
> 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
> 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
> {code}
> The cause is an uncaught exception. At 01:29 a disk error occurred, so a NoClassDefFoundError was thrown. ResponseProcessor.run only catches Exception, so it cannot catch NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState. The DataStreamer then did not know that the ResponseProcessor was dead and could not trigger closeResponder, so it stayed stuck in DataStreamer.run.
> I tested this with the unit test TestDataStream.testDfsClient. When I throw NoClassDefFoundError in ResponseProcessor.run, TestDataStream.testDfsClient fails because of a timeout.
> I think we should catch Throwable rather than Exception in ResponseProcessor.run.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
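For readers following the thread, the failure mode described in the issue can be reproduced outside Hadoop with a small sketch. This is not the HDFS-15219 patch and does not use any Hadoop class: ResponderErrorDemo, ErrorState, processAck and the other names below are hypothetical stand-ins. It only shows that a worker thread whose run loop catches Exception never reports an Error to the thread waiting on it, while catching Throwable does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo; none of these names come from the Hadoop code base.
final class ResponderErrorDemo {

    /** Stand-in for the shared error state that the streamer thread watches. */
    static final class ErrorState {
        private final CountDownLatch failed = new CountDownLatch(1);

        void markFailed() {
            failed.countDown();
        }

        /** Returns true if a failure was reported within the timeout. */
        boolean awaitFailure(long timeoutMillis) throws InterruptedException {
            return failed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Simulates ack processing that fails with an Error, as in the log above. */
    static void processAck() {
        throw new NoClassDefFoundError("com/google/protobuf/TextFormat");
    }

    /** Responder body that only catches Exception: the Error escapes. */
    static void runCatchingException(ErrorState state) {
        try {
            processAck();
        } catch (Exception e) {
            state.markFailed(); // never reached for an Error
        }
    }

    /** Responder body that catches Throwable: the Error is reported. */
    static void runCatchingThrowable(ErrorState state) {
        try {
            processAck();
        } catch (Throwable t) {
            state.markFailed();
        }
    }

    /** Runs one responder thread and reports whether the waiter saw the failure. */
    static boolean streamerNoticesFailure(boolean catchThrowable) {
        ErrorState state = new ErrorState();
        Thread responder = new Thread(() -> {
            if (catchThrowable) {
                runCatchingThrowable(state);
            } else {
                runCatchingException(state);
            }
        }, "ResponseProcessor-demo");
        // Keep the demo quiet; a real handler would log the escaped Error.
        responder.setUncaughtExceptionHandler((thread, error) -> { });
        responder.start();
        try {
            responder.join();
            // A bounded wait here; a waiter without a timeout would hang instead.
            return state.awaitFailure(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("catch (Exception): " + streamerNoticesFailure(false));
        System.out.println("catch (Throwable): " + streamerNoticesFailure(true));
    }
}
```

With catch (Exception) the responder thread dies without marking the state, so the waiter only gets out because of the 200 ms timeout used in this sketch. With catch (Throwable) the failure is recorded before the thread exits.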
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15219:
---
    Fix Version/s: 3.2.2
                   3.1.4

> DFS Client will stuck when ResponseProcessor.run throw Error
>
>                 Key: HDFS-15219
>                 URL: https://issues.apache.org/jira/browse/HDFS-15219
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 3.3.0, 3.1.4, 3.2.2
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena updated HDFS-15219:
---
    Fix Version/s: 3.3.0
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> DFS Client will stuck when ResponseProcessor.run throw Error
>
>                 Key: HDFS-15219
>                 URL: https://issues.apache.org/jira/browse/HDFS-15219
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 3.3.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated HDFS-15219:
---
    Fix Version/s:     (was: 3.2.2)

> DFS Client will stuck when ResponseProcessor.run throw Error
>
>                 Key: HDFS-15219
>                 URL: https://issues.apache.org/jira/browse/HDFS-15219
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: zhengchenyu
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated HDFS-15219:
---
    Status: Patch Available  (was: Open)

> DFS Client will stuck when ResponseProcessor.run throw Error
>
>                 Key: HDFS-15219
>                 URL: https://issues.apache.org/jira/browse/HDFS-15219
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.3
>            Reporter: zhengchenyu
>            Priority: Major
>   Original Estimate: 672h
>  Remaining Estimate: 672h
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15219:
---
    Description:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that one task attempt was stuck, because speculative execution is disabled.
The exception looked like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at sun.misc.Resource.getBytes(Resource.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}
The cause is an uncaught exception. At 01:29 a disk error occurred, so a NoClassDefFoundError was thrown. ResponseProcessor.run only catches Exception, so it cannot catch NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState. The DataStreamer then did not know that the ResponseProcessor was dead and could not trigger closeResponder, so it stayed stuck in DataStreamer.run.
I tested this with the unit test TestDataStream.testDfsClient. When I throw NoClassDefFoundError in ResponseProcessor.run, TestDataStream.testDfsClient fails because of a timeout.
I think we should catch Throwable rather than Exception in ResponseProcessor.run.

  was:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that one task attempt was stuck, because speculative execution is disabled.
The exception looked like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException:
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15219:
---
    Description:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that one task attempt was stuck, because speculative execution is disabled.
The exception looked like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at sun.misc.Resource.getBytes(Resource.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}
The cause is an uncaught exception. At 01:29 a disk error occurred, so a NoClassDefFoundError was thrown. ResponseProcessor.run only catches Exception, so it cannot catch NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState. The DataStreamer then did not know that the ResponseProcessor was dead and could not trigger closeResponder, so it stayed stuck in DataStreamer.run.
I tested this with the unit test TestDataStream.testDfsClient. When I throw NoClassDefFoundError, TestDataStream.testDfsClient fails because of a timeout.
I think we should catch Throwable rather than Exception in ResponseProcessor.run.

  was:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that one task attempt was stuck, because speculative execution is disabled.
The exception looked like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15219:
---
Description:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The cause was a stuck task attempt, which was never replaced because speculative execution was disabled.
The task log showed this exception:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ...
4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at sun.misc.Resource.getBytes(Resource.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}

was:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The cause was a stuck task attempt, which was never replaced because speculative execution was disabled.
The task log showed this exception:
{code}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error.
Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native Method)
    at
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-15219:
---
Description:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The cause was a stuck task attempt, which was never replaced because speculative execution was disabled.
The task log showed this exception:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ...
4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at sun.misc.Resource.getBytes(Resource.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}
The cause is an uncaught Error. ResponseProcessor.run

was:
In my case, a Tez application was stuck for more than 2 hours until we killed the application. The cause was a stuck task attempt, which was never replaced because speculative execution was disabled.
The task log showed this exception:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
    at java.lang.String.valueOf(String.java:2847)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
    at java.util.zip.ZipFile.read(Native