[jira] [Commented] (HDFS-15159) Prevent adding same DN multiple times in PendingReconstructionBlocks
[ https://issues.apache.org/jira/browse/HDFS-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057604#comment-17057604 ] hemanthboyina commented on HDFS-15159: -- Thanks for the comments [~surendrasingh] [~elgoiri]. I have updated the patch with a test case. There was some problem with the build; can you please trigger the build again? > Prevent adding same DN multiple times in PendingReconstructionBlocks > > > Key: HDFS-15159 > URL: https://issues.apache.org/jira/browse/HDFS-15159 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-15159.001.patch, HDFS-15159.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
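The idea behind the patch can be sketched as follows. This is a minimal illustration of guarding against recording the same datanode twice; the class and method names are hypothetical, not the actual PendingReconstructionBlocks internals:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the duplicate-datanode guard discussed in this issue.
// Names are hypothetical, not the real Hadoop API.
public class PendingTargetsSketch {
    private final List<String> targets = new ArrayList<>();

    // Record a datanode only once; a repeated call for the same DN is a no-op.
    public boolean addTarget(String datanode) {
        if (targets.contains(datanode)) {
            return false; // already pending on this DN, skip the duplicate
        }
        targets.add(datanode);
        return true;
    }

    public int getNumTargets() {
        return targets.size();
    }

    public static void main(String[] args) {
        PendingTargetsSketch pending = new PendingTargetsSketch();
        pending.addTarget("dn1");
        pending.addTarget("dn1"); // ignored: same DN reported twice
        pending.addTarget("dn2");
        System.out.println(pending.getNumTargets()); // prints 2
    }
}
```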
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057597#comment-17057597 ] Siddharth Wagle commented on HDFS-15154: 10 => Updated the patch with the changes suggested by [~hanishakoneru]: changed the exception message to be simpler, since we already print a deprecation warning, and updated hdfs-default.xml. > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory.
[jira] [Updated] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDFS-15154: --- Attachment: HDFS-15154.10.patch > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory.
[jira] [Updated] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-15077: Fix Version/s: 2.10.1 > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15077-branch-2.10.patch > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread.
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057588#comment-17057588 ] Masatake Iwasaki commented on HDFS-15077: - Pushed the backported patch to branch-2.10. > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15077-branch-2.10.patch > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread.
[jira] [Updated] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-15077: Attachment: HDFS-15077-branch-2.10.patch > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15077-branch-2.10.patch > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread.
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-15219: --- Description: In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that a task attempt was stuck, because speculative execution is disabled. The exception was like this:
{code:java}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}
The root cause is an uncaught Error. At 01:29 a disk went bad, so a NoClassDefFoundError was thrown. ResponseProcessor.run only catches Exception, so it cannot catch NoClassDefFoundError, and the ResponseProcessor did not set errorState. The DataStreamer therefore did not know the ResponseProcessor was dead, never triggered closeResponder, and got stuck in DataStreamer.run. I reproduced this in the unit test TestDataStream.testDfsClient: when I throw NoClassDefFoundError in ResponseProcessor.run, TestDataStream.testDfsClient fails because of a timeout. I think we should catch Throwable rather than Exception in ResponseProcessor.run.
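The failure mode described above can be sketched in isolation: a worker body that only catches Exception lets an Error escape without recording it, while catching Throwable records the failure so the owner thread can react. All names here are illustrative, not the actual DFSOutputStream internals:

```java
// Minimal sketch of the bug: NoClassDefFoundError is an Error, not an
// Exception, so a `catch (Exception)` handler never sets the error state.
public class ResponderSketch {
    static volatile boolean errorState = false;

    // Mirrors the original code path: an Error bypasses the Exception handler,
    // so errorState stays false. (The extra Throwable handler only keeps this
    // demo alive; the original code had no such handler.)
    static void runCatchingException(Runnable body) {
        try {
            body.run();
        } catch (Exception e) {
            errorState = true;
        } catch (Throwable t) {
            // swallowed for demo purposes; errorState is NOT set
        }
    }

    // Mirrors the proposed fix: catching Throwable records any failure.
    static void runCatchingThrowable(Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            errorState = true;
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new NoClassDefFoundError("com/google/protobuf/TextFormat");
        };

        errorState = false;
        runCatchingException(failing);
        System.out.println("catch Exception -> errorState = " + errorState); // false

        errorState = false;
        runCatchingThrowable(failing);
        System.out.println("catch Throwable -> errorState = " + errorState); // true
    }
}
```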
[jira] [Created] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
zhengchenyu created HDFS-15219: -- Summary: DFS Client will stuck when ResponseProcessor.run throw Error Key: HDFS-15219 URL: https://issues.apache.org/jira/browse/HDFS-15219 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.3 Reporter: zhengchenyu Fix For: 3.2.2 In my case, a Tez application was stuck for more than 2 hours until we killed the application. The reason is that a task attempt was stuck, because speculative execution is disabled. The exception was like this:
{code}
2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 10
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: records written - 100
2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records read - 100
2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] threw an Error. Shutting down now...
java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat
 at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253)
 at java.lang.String.valueOf(String.java:2847)
 at java.lang.StringBuilder.append(StringBuilder.java:128)
 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 4 more
Caused by: java.util.zip.ZipException: error reading zip file
 at java.util.zip.ZipFile.read(Native Method)
 at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
 at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
 at sun.misc.Resource.getBytes(Resource.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:444)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 ... 10 more
2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] |util.ExitUtil|: Exiting with status -1
2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Received should die response from AM
2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Asked to die via task heartbeat
2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an invocation of shutdownRequested
{code}
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057516#comment-17057516 ] Masatake Iwasaki commented on HDFS-15077: - [~Jim_Brennan] I'm going to backport this to branch-2.10. We cannot use a lambda since branch-2.10 has not dropped Java 7 support. > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread.
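The Java 7 constraint mentioned above amounts to the following mechanical rewrite when backporting: each lambda in the trunk patch becomes an anonymous inner class. The `Check` interface and `waitFor` helper here are hypothetical stand-ins for whatever callback type the test uses, just to show the two equivalent forms:

```java
// Illustrative only: Java 8 lambda (trunk) vs. Java 7 anonymous inner class
// (branch-2.10 backport). The interface and helper are hypothetical.
interface Check {
    boolean done();
}

public class BackportSketch {
    // Polls the check until it reports done; returns the number of polls.
    static int waitFor(Check c) {
        int polls = 0;
        while (!c.done()) {
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        // Java 8+ (trunk): concise lambda.
        int a = waitFor(() -> true);

        // Java 7 (branch-2.10): the equivalent anonymous inner class.
        int b = waitFor(new Check() {
            @Override
            public boolean done() {
                return true;
            }
        });

        System.out.println(a == b ? "equivalent" : "different"); // prints "equivalent"
    }
}
```

Both forms compile to the same observable behavior; only the anonymous-class form is accepted by a Java 7 compiler.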
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057501#comment-17057501 ] Siddharth Wagle commented on HDFS-15154: Since we are already logging the deprecation, can we just change the warning to this: {noformat} Failed to change storage policy satisfier as storage policies have been disabled. {noformat} rather than the cryptic message? > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory.
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057494#comment-17057494 ] Siddharth Wagle commented on HDFS-15154: Thanks for the review [~hanishakoneru], I will make those changes > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057484#comment-17057484 ] Hanisha Koneru commented on HDFS-15154: --- [~swagle], patch LGTM overall. Few comments: * In StoragePolicySatisfyManager also we should call DFSUtil#getDfsStoragePolicySetting in case the deprecated config is set. * We might have to change the following log messages to indicate that either DFS_STORAGE_POLICIES_ENABLED_KEY is Disabled or DFS_STORAGE_POLICY_ENABLED_KEY is set to false. {code:java} LOG.info("Failed to change storage policy satisfier as {} set to {}.", DFSConfigKeys.DFS_STORAGE_POLICIES_ENABLED_KEY, DFSConfigKeys.DfsStoragePolicySetting.DISABLED);{code} * We could probably add a new method in DFSUtil to check if StoragePolicy is enabled as that check is done in multiple places. > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
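The last review comment above asks for a single shared check of whether storage policies are enabled, honoring both the deprecated boolean key and the newer setting. A hypothetical sketch of such a helper: `java.util.Properties` stands in for Hadoop's `Configuration`, and the key name `dfs.storage.policies.enabled` and the enum values are assumptions for illustration, not the actual `DFSConfigKeys` definitions:

```java
import java.util.Properties;

public class StoragePolicyEnabledSketch {
    // The real deprecated key from HDFS-7093.
    static final String DEPRECATED_BOOLEAN_KEY = "dfs.storage.policy.enabled";
    // Hypothetical name for the newer enum-valued key.
    static final String SETTING_KEY = "dfs.storage.policies.enabled";

    enum DfsStoragePolicySetting { DISABLED, PERMISSIVE }  // values assumed

    // Centralized check, as the review suggests: honor the deprecated boolean
    // key if the caller set it explicitly, otherwise consult the new setting.
    static boolean isStoragePolicyEnabled(Properties conf) {
        String deprecated = conf.getProperty(DEPRECATED_BOOLEAN_KEY);
        if (deprecated != null) {
            return Boolean.parseBoolean(deprecated);
        }
        String setting = conf.getProperty(SETTING_KEY,
            DfsStoragePolicySetting.PERMISSIVE.name());
        return DfsStoragePolicySetting.valueOf(setting)
            != DfsStoragePolicySetting.DISABLED;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isStoragePolicyEnabled(conf));   // true (default)

        conf.setProperty(DEPRECATED_BOOLEAN_KEY, "false");  // deprecated key wins
        System.out.println(isStoragePolicyEnabled(conf));   // false

        conf.remove(DEPRECATED_BOOLEAN_KEY);
        conf.setProperty(SETTING_KEY, "DISABLED");
        System.out.println(isStoragePolicyEnabled(conf));   // false
    }
}
```

Callers such as `StoragePolicySatisfyManager` could then use this one method instead of repeating the two-key logic, which also keeps the log message wording in one place.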
[jira] [Commented] (HDFS-15159) Prevent adding same DN multiple times in PendingReconstructionBlocks
[ https://issues.apache.org/jira/browse/HDFS-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057475#comment-17057475 ] Hadoop QA commented on HDFS-15159: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}134m 12s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}196m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestDFSClientExcludedNodes | | | hadoop.hdfs.web.TestWebHDFS | | | hadoop.hdfs.TestDFSInotifyEventInputStream | | | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.hdfs.TestErasureCodingExerciseAPIs | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM | | | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.TestDFSInputStreamBlockLocations | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile | | | hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy | | | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.TestDecommissionWithStriped | | | hadoop.hdfs.server.namenode.ha.TestHASafeMode | | | hadoop.hdfs.TestErasureCodingPolicyWithSnapshot | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | \\ \\ || Subsystem || Report/Notes || |
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057423#comment-17057423 ] Jim Brennan commented on HDFS-15077: [~iwasakims], [~aajisaka] we have seen this failure (rarely) in our automated tests for our internal branch-2.10 build. I believe the patch applies cleanly. Could we get it pulled back to branch-2.10? > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15159) Prevent adding same DN multiple times in PendingReconstructionBlocks
[ https://issues.apache.org/jira/browse/HDFS-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-15159: - Attachment: HDFS-15159.002.patch > Prevent adding same DN multiple times in PendingReconstructionBlocks > > > Key: HDFS-15159 > URL: https://issues.apache.org/jira/browse/HDFS-15159 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-15159.001.patch, HDFS-15159.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12136) BlockSender performance regression due to volume scanner edge case
[ https://issues.apache.org/jira/browse/HDFS-12136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12136: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Resolved by HDFS-11187 > BlockSender performance regression due to volume scanner edge case > -- > > Key: HDFS-12136 > URL: https://issues.apache.org/jira/browse/HDFS-12136 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.8.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-12136.branch-2.patch, HDFS-12136.trunk.patch > > > HDFS-11160 attempted to fix a volume scan race for a file appended mid-scan > by reading the last checksum of finalized blocks within the {{BlockSender}} > ctor. Unfortunately it holds the exclusive dataset lock to open and read > the metafile multiple times, so block sender instantiation becomes serialized. > Performance completely collapses under heavy disk i/o utilization or high > xceiver activity, e.g. lost node replication, balancing, or decommissioning. > The xceiver threads congest creating block senders and impair the heartbeat > processing that is contending for the same lock. Combined with other lock > contention issues, pipelines break and nodes sporadically go dead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
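The regression described in this issue, doing disk I/O while holding the exclusive dataset lock, illustrates a general remedy: hold the lock only long enough to snapshot the references you need, then perform the I/O with the lock released. A minimal sketch of that restructuring; the class and method names are hypothetical, not the actual BlockSender code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.concurrent.locks.ReentrantLock;

public class LockScopeSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final File metaFile;  // stand-in for a replica's meta file

    LockScopeSketch(File metaFile) {
        this.metaFile = metaFile;
    }

    // Anti-pattern: disk I/O serialized under the dataset lock.
    long lastChecksumLenLocked() throws IOException {
        datasetLock.lock();
        try {
            return Files.size(metaFile.toPath());  // I/O while holding the lock
        } finally {
            datasetLock.unlock();
        }
    }

    // Better: snapshot the reference under the lock, read outside it.
    long lastChecksumLenUnlocked() throws IOException {
        File snapshot;
        datasetLock.lock();
        try {
            snapshot = metaFile;
        } finally {
            datasetLock.unlock();
        }
        return Files.size(snapshot.toPath());  // I/O with the lock released
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("blk", ".meta");
        f.deleteOnExit();
        Files.write(f.toPath(), new byte[7]);
        LockScopeSketch s = new LockScopeSketch(f);
        System.out.println(s.lastChecksumLenLocked());    // 7
        System.out.println(s.lastChecksumLenUnlocked());  // 7
    }
}
```

With the second form, a slow disk stalls only the thread doing the read, not every other thread contending for the dataset lock (e.g. heartbeat processing).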
[jira] [Updated] (HDFS-14338) TestPread timeouts in branch-2.8
[ https://issues.apache.org/jira/browse/HDFS-14338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14338: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Branch-2.8 is EOL. Resolve as Won't Fix. > TestPread timeouts in branch-2.8 > > > Key: HDFS-14338 > URL: https://issues.apache.org/jira/browse/HDFS-14338 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: HDFS-14338-001.patch, > HDFS-14338-branch-2.8-001-testing.patch, HDFS-14338-branch-2.8-001.patch > > > TestPread timeouts in branch-2.8. > {noformat} > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestPread > Results : > Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057263#comment-17057263 ] Wei-Chiu Chuang commented on HDFS-15039: The patch doesn't apply any more. Updated the patch to resolve conflicts. > Cache meta file length of FinalizedReplica to reduce call File.length() > --- > > Key: HDFS-15039 > URL: https://issues.apache.org/jira/browse/HDFS-15039 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15039.006.patch, HDFS-15039.patch, > HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch > > > When using ReplicaCachingGetSpaceUsed to get the volume space used, it will > call File.length() for every meta file of a replica. That adds more disk I/O; we > found the slow log below. For a finalized replica, the size of the meta file does > not change, so I think we can cache the value. > {code:java} > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: > Refresh dfs used, bpid: BP-898717543-10.75.1.240-1519386995727 replicas > size: 1166 dfsUsed: 72227113183 on volume: > DS-3add8d62-d69a-4f5a-a29f-b7bbb400af2e duration: 17206ms{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
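The caching idea this issue describes, read the meta file length from disk once and reuse it because a finalized replica's meta file never changes, can be sketched as follows. The `FinalizedReplica` class here is a self-contained stand-in, not the actual HDFS class:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class CachedMetaLengthSketch {
    // Stand-in for FinalizedReplica: once a replica is finalized, its meta
    // file length is immutable, so one disk read can serve all later calls.
    static class FinalizedReplica {
        private final File metaFile;
        private long cachedMetaLength = -1;  // -1 means "not read yet"

        FinalizedReplica(File metaFile) {
            this.metaFile = metaFile;
        }

        long getMetaLength() {
            if (cachedMetaLength < 0) {
                cachedMetaLength = metaFile.length();  // single disk hit
            }
            return cachedMetaLength;  // later calls skip File.length()
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("replica", ".meta");
        f.deleteOnExit();
        Files.write(f.toPath(), new byte[16]);
        FinalizedReplica r = new FinalizedReplica(f);
        long first = r.getMetaLength();
        Files.write(f.toPath(), new byte[32]);    // even if the file changed...
        long second = r.getMetaLength();          // ...the cached value is returned
        System.out.println(first + " " + second); // 16 16
    }
}
```

This is why the refresh in ReplicaCachingGetSpaceUsed can avoid one File.length() call per replica, which matters when a volume holds thousands of them.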
[jira] [Updated] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15039: --- Attachment: HDFS-15039.006.patch > Cache meta file length of FinalizedReplica to reduce call File.length() > --- > > Key: HDFS-15039 > URL: https://issues.apache.org/jira/browse/HDFS-15039 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15039.006.patch, HDFS-15039.patch, > HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch > > > When using ReplicaCachingGetSpaceUsed to get the volume space used, it will > call File.length() for every meta file of a replica. That adds more disk I/O; we > found the slow log below. For a finalized replica, the size of the meta file does > not change, so I think we can cache the value. > {code:java} > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: > Refresh dfs used, bpid: BP-898717543-10.75.1.240-1519386995727 replicas > size: 1166 dfsUsed: 72227113183 on volume: > DS-3add8d62-d69a-4f5a-a29f-b7bbb400af2e duration: 17206ms{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13351) Revert HDFS-11156 from branch-2/branch-2.8
[ https://issues.apache.org/jira/browse/HDFS-13351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-13351: --- Target Version/s: 2.10.1 Labels: release-blocker (was: ) > Revert HDFS-11156 from branch-2/branch-2.8 > -- > > Key: HDFS-13351 > URL: https://issues.apache.org/jira/browse/HDFS-13351 > Project: Hadoop HDFS > Issue Type: Task > Components: webhdfs >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: release-blocker > Attachments: HDFS-13351-branch-2.001.patch, > HDFS-13351-branch-2.002.patch, HDFS-13351-branch-2.003.patch > > > Per discussion in HDFS-11156, lets revert the change from branch-2 and > branch-2.8. New patch can be tracked in HDFS-12459 . -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15039) Cache meta file length of FinalizedReplica to reduce call File.length()
[ https://issues.apache.org/jira/browse/HDFS-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057254#comment-17057254 ] Wei-Chiu Chuang commented on HDFS-15039: +1 > Cache meta file length of FinalizedReplica to reduce call File.length() > --- > > Key: HDFS-15039 > URL: https://issues.apache.org/jira/browse/HDFS-15039 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15039.patch, HDFS-15039.patch, HDFS-15039.patch, > HDFS-15039.patch, HDFS-15039.patch > > > When using ReplicaCachingGetSpaceUsed to get the volume space used, it will > call File.length() for every meta file of a replica. That adds more disk I/O; we > found the slow log below. For a finalized replica, the size of the meta file does > not change, so I think we can cache the value. > {code:java} > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: > Refresh dfs used, bpid: BP-898717543-10.75.1.240-1519386995727 replicas > size: 1166 dfsUsed: 72227113183 on volume: > DS-3add8d62-d69a-4f5a-a29f-b7bbb400af2e duration: 17206ms{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057249#comment-17057249 ] Wei-Chiu Chuang commented on HDFS-14820: I am +1 and will commit by end of week unless there's objection to my explanation above. Thanks. > The default 8KB buffer of > BlockReaderRemote#newBlockReader#BufferedOutputStream is too big > --- > > Key: HDFS-14820 > URL: https://issues.apache.org/jira/browse/HDFS-14820 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14820.001.patch, HDFS-14820.002.patch, > HDFS-14820.003.patch > > > This issue is similar to HDFS-14535. > {code:java} > public static BlockReader newBlockReader(String file, > ExtendedBlock block, > Token blockToken, > long startOffset, long len, > boolean verifyChecksum, > String clientName, > Peer peer, DatanodeID datanodeID, > PeerCache peerCache, > CachingStrategy cachingStrategy, > int networkDistance) throws IOException { > // in and out will be closed when sock is closed (by the caller) > final DataOutputStream out = new DataOutputStream(new BufferedOutputStream( > peer.getOutputStream())); > new Sender(out).readBlock(block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy); > } > public BufferedOutputStream(OutputStream out) { > this(out, 8192); > } > {code} > Sender#readBlock's parameters (block, blockToken, clientName, startOffset, len, > verifyChecksum, cachingStrategy) do not need such a big buffer. > So I think the BufferedOutputStream buffer size should be reduced. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
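The fix direction discussed in this issue is simply to pass an explicit, smaller size to the BufferedOutputStream constructor instead of relying on its 8192-byte default, since the readBlock request header written through it is tiny. A self-contained illustration; 512 is an arbitrary example size, not the value chosen in the patch, and the byte sink replaces the real peer socket stream:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SmallBufferSketch {
    // The request header is a handful of fields, so a small buffer suffices.
    static final int SMALL_BUF_SIZE = 512;  // example value only

    static byte[] writeHeader(long blockId, long offset, long len) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Explicit size instead of the 8192-byte default of
        // new BufferedOutputStream(out).
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(sink, SMALL_BUF_SIZE));
        out.writeLong(blockId);
        out.writeLong(offset);
        out.writeLong(len);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = writeHeader(42L, 0L, 1024L);
        System.out.println(header.length); // 24: three 8-byte longs
    }
}
```

The payload here is 24 bytes, so an 8 KB buffer per block reader is mostly wasted allocation, which is the motivation stated in the issue.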
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057033#comment-17057033 ] Hadoop QA commented on HDFS-15160: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 25s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}154m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | HDFS-15160 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996405/HDFS-15160.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a2221593bbe8 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cf9cf83 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28927/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28927/testReport/ | | Max. process+thread count | 3478 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-15216) Wrong Use Case of -showprogress in fsck
[ https://issues.apache.org/jira/browse/HDFS-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056932#comment-17056932 ] Hadoop QA commented on HDFS-15216: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 19 unchanged - 1 fixed = 19 total (was 20) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 37s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}155m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | HDFS-15216 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996314/HDFS-15216.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f15c0de2c4ab 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cf9cf83 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28926/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056905#comment-17056905 ] Stephen O'Donnell commented on HDFS-15160: -- Uploaded v003 switching DataNode#transferReplicaForPipelineRecovery to the read lock. > ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl > methods should use datanode readlock > --- > > Key: HDFS-15160 > URL: https://issues.apache.org/jira/browse/HDFS-15160 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, > HDFS-15160.003.patch > > > Now that we have HDFS-15150, we can start to move some DN operations to use the > read lock rather than the write lock to improve concurrency. The first step > is to make the changes to ReplicaMap, as many other methods make calls to it. > This Jira switches read operations against the volume map to use the readLock > rather than the write lock. > Additionally, some methods make a call to replicaMap.replicas() (e.g. > getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result > in a read-only fashion, so they can also be switched to using a readLock. > Next is the directory scanner and disk balancer, which only require a read > lock. > Finally (for this Jira) are various "low hanging fruit" items in BlockSender > and fsdatasetImpl where it is fairly obvious they only need a read lock. > For now, I have avoided changing anything which looks too risky, as I think it's better to do any larger refactoring or risky changes each in their own > Jira. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
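The change this issue describes follows the standard ReentrantReadWriteLock pattern: read-only lookups and snapshots take the shared read lock so they can proceed concurrently, while mutations keep the exclusive write lock. A self-contained sketch of that split, with a plain HashMap standing in for ReplicaMap:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockMapSketch {
    private final Map<Long, String> replicas = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Mutation: exclusive write lock, as before.
    void add(long blockId, String replica) {
        lock.writeLock().lock();
        try {
            replicas.put(blockId, replica);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read-only lookup: shared read lock, so many readers can run concurrently.
    String get(long blockId) {
        lock.readLock().lock();
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Read-only snapshot (analogous to deepCopyReplica or getBlockReports):
    // the result is used read-only, so the read lock is sufficient.
    Map<Long, String> snapshot() {
        lock.readLock().lock();
        try {
            return new HashMap<>(replicas);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadLockMapSketch m = new ReadLockMapSketch();
        m.add(1L, "finalized");
        System.out.println(m.get(1L));            // finalized
        System.out.println(m.snapshot().size());  // 1
    }
}
```

The safety argument is the one the issue relies on: operations that never modify shared state cannot race with each other, so downgrading them to the read lock improves concurrency without changing behavior.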
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell updated HDFS-15160:
-------------------------------------
    Attachment: HDFS-15160.003.patch

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl
> methods should use datanode readlock
[jira] [Commented] (HDFS-15216) Wrong Use Case of -showprogress in fsck
[ https://issues.apache.org/jira/browse/HDFS-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056786#comment-17056786 ]

Stephen O'Donnell commented on HDFS-15216:
------------------------------------------

+1 on this change. The unit test failures seem unrelated ("unable to create native thread" errors). I have triggered the build job again to be sure.

> Wrong Use Case of -showprogress in fsck
> ---------------------------------------
>
>                 Key: HDFS-15216
>                 URL: https://issues.apache.org/jira/browse/HDFS-15216
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Ravuri Sushma sree
>            Assignee: Ravuri Sushma sree
>            Priority: Major
>         Attachments: HDFS-15216.001.patch
>
> *-showprogress* is deprecated and progress is now shown by default, but fsck
> --help shows an incorrect use case for it:
>
> Usage: hdfs fsck [-list-corruptfileblocks | [-move | -delete | -openforwrite]
> [-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains]]]]
> [-includeSnapshots] [-showprogress] [-storagepolicies] [-maintenance] [-blockId ]
> start checking from this path
> -move move corrupted files to /lost+found
> -delete delete corrupted files
> -files print out files being checked
> -openforwrite print out files opened for write
> -includeSnapshots include snapshot data if the given path indicates a
> snapshottable directory or there are snapshottable directories under it
> -list-corruptfileblocks print out list of missing blocks and files they
> belong to
> -files -blocks print out block report
> -files -blocks -locations print out locations for every block
> -files -blocks -racks print out network topology for data-node locations
> -files -blocks -replicaDetails print out each replica details
> -files -blocks -upgradedomains print out upgrade domains for every block
> -storagepolicies print out storage policy summary for the blocks
> -maintenance print out maintenance state node details
> *-showprogress show progress in output. Default is OFF (no progress)*
> -blockId print out which file this blockId belongs to, locations (nodes,
> racks) of this block, and other diagnostics info (under replicated, corrupted
> or not, etc)