[jira] [Updated] (HDFS-16063) Add toString to EditLogFileInputStream
     [ https://issues.apache.org/jira/browse/HDFS-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-16063:
----------------------------------
    Labels: n00b newbie  (was: )

> Add toString to EditLogFileInputStream
> --------------------------------------
>
>                 Key: HDFS-16063
>                 URL: https://issues.apache.org/jira/browse/HDFS-16063
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Minor
>              Labels: n00b, newbie
>
> The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no
> {{toString}} method, so the logging is of limited value. Also, put the DEBUG
> statement behind some guards since it's printing an unbounded list of items.
>
> https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895
>
> Just need the following:
> {code:java}
> private final LogSource log;
> private final long firstTxId;
> private final long lastTxId;
> private final boolean isInProgress;
> private int maxOpSize;
> private State state = State.UNINIT;
> private int logVersion = 0;
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16063) Add toString to EditLogFileInputStream
     [ https://issues.apache.org/jira/browse/HDFS-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-16063:
----------------------------------
    Description: 
The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no {{toString}} method, so the logging is of limited value. Also, put the DEBUG statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895

Just need the following:

{code:java}
private final LogSource log;
private final long firstTxId;
private final long lastTxId;
private final boolean isInProgress;
private int maxOpSize;
private State state = State.UNINIT;
private int logVersion = 0;
{code}

  was:
The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no {{toString}} method, so the logging is of limited value. Also, put the DEBUG statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895
[jira] [Created] (HDFS-16063) Add toString to EditLogFileInputStream
David Mollitor created HDFS-16063:
-------------------------------------

             Summary: Add toString to EditLogFileInputStream
                 Key: HDFS-16063
                 URL: https://issues.apache.org/jira/browse/HDFS-16063
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: David Mollitor


The class {{EditLogFileInputStream}} is logged at DEBUG level, but has no {{toString}} method, so the logging is of limited value. Also, put the DEBUG statement behind some guards since it's printing an unbounded list of items.

https://github.com/apache/hadoop/blob/eefa664fea1119a9c6e3ae2d2ad3069019fbd4ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L895
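The requested {{toString}} could be sketched as below. This is illustrative only: the class name, the {{String}} stand-in for {{LogSource}}, and the simplified {{State}} enum are assumptions, not the real Hadoop types.

```java
// Sketch only: EditLogFileInputStreamExample stands in for the real
// EditLogFileInputStream; LogSource and State are simplified here.
class EditLogFileInputStreamExample {
  enum State { UNINIT, OPEN, CLOSED }

  private final String log;        // stand-in for the real LogSource
  private final long firstTxId;
  private final long lastTxId;
  private final boolean isInProgress;
  private int maxOpSize;
  private State state = State.UNINIT;
  private int logVersion = 0;

  EditLogFileInputStreamExample(String log, long firstTxId, long lastTxId,
      boolean isInProgress) {
    this.log = log;
    this.firstTxId = firstTxId;
    this.lastTxId = lastTxId;
    this.isInProgress = isInProgress;
  }

  @Override
  public String toString() {
    // Include every field from the snippet above so a DEBUG line
    // identifies the stream unambiguously.
    return "EditLogFileInputStream{log=" + log
        + ", firstTxId=" + firstTxId
        + ", lastTxId=" + lastTxId
        + ", isInProgress=" + isInProgress
        + ", maxOpSize=" + maxOpSize
        + ", state=" + state
        + ", logVersion=" + logVersion + '}';
  }
}
```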
[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
     [ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278864#comment-17278864 ]

David Mollitor commented on HDFS-15790:
---------------------------------------

OK.  This looks OK to me.  As I said in my original issue, both engines were loaded into the same JVM and they would both fight at the point of registration.  It looks like things are now set up so that they both register in the same static way and they don't explode when they both register.

Thanks.

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> ------------------------------------------------------------------
>
>                 Key: HDFS-15790
>                 URL: https://issues.apache.org/jira/browse/HDFS-15790
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in the Apache Hive
> project. This was not an awesome thing to do between minor versions with
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have
> some differences. Also, Protobuf 2 is not deprecated or anything, so let us
> have both protocols available at the same time. In Hadoop 4.x, Protobuf 2
> support can be dropped.
[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
     [ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273800#comment-17273800 ]

David Mollitor commented on HDFS-15790:
---------------------------------------

There also needs to be some doc somewhere on how to allow third parties to leverage this functionality (if it's meant as a public vehicle).  Like I said, I haven't figured out where the import substitution happens in the build process.
[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
     [ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273799#comment-17273799 ]

David Mollitor commented on HDFS-15790:
---------------------------------------

Thanks [~vinayakumarb].  I don't mind adding this new capability, but it broke backwards compatibility of a public class.  Thanks for taking a look.  I hope this can be considered an add-on and not a replacement.
[jira] [Commented] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
     [ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271694#comment-17271694 ]

David Mollitor commented on HDFS-15790:
---------------------------------------

Also, to use this new engine, there is some wizard magic required to make the protobuf compiler import core Protobuf functionality from {{org.apache.hadoop.thirdparty.protobuf.*;}} instead of the core protobuf JARs, but I haven't been able to find any documentation on how to pull this off.
[jira] [Moved] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
     [ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor moved HADOOP-17494 to HDFS-15790:
------------------------------------------------

        Key: HDFS-15790  (was: HADOOP-17494)
    Project: Hadoop HDFS  (was: Hadoop Common)
[jira] [Commented] (HDFS-15621) Datanode DirectoryScanner uses excessive memory
     [ https://issues.apache.org/jira/browse/HDFS-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211069#comment-17211069 ]

David Mollitor commented on HDFS-15621:
---------------------------------------

Another option would be to multi-thread the operation and use a blocking queue to regulate memory consumption.  Multiple threads scan directories and pump results into a queue.  One or more threads process the data in the queue.  If the queue is full, scanners block.  In this way, the number of objects that exist at one time is controlled.

> Datanode DirectoryScanner uses excessive memory
> -----------------------------------------------
>
>                 Key: HDFS-15621
>                 URL: https://issues.apache.org/jira/browse/HDFS-15621
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 2020-10-09 at 15.20.56.png
>
> We generally work a rule of 1GB heap on a datanode per 1M blocks. For nodes
> with a lot of blocks, this can mean a lot of heap.
> We recently captured a heap dump of a DN with about 22M blocks and found only
> about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is taken
> by the DirectoryScanner ScanInfo objects. Most of this memory was allocated
> to strings.
> Checking the strings in question, we can see two strings per ScanInfo,
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785
> _106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will
> be the same for every block in the block pool as the scanner is only
> concerned with finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather than
> "subdir28/subdir17" and then construct the path when it is requested via the
> getter.
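The producer/consumer arrangement described in the comment above can be sketched with a bounded queue. This is a sketch only, not the DirectoryScanner code: {{ScanInfo}} is simplified to a {{String}}, and the class and method names are made up.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the proposal: scanner threads push results into a bounded
// queue and block when it is full, capping how many ScanInfo-like
// objects are alive at any one time.
class BoundedScanPipeline {
  private static final String POISON = "<done>";

  static List<String> run(List<String> blockFiles, int capacity) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
    List<String> processed = new CopyOnWriteArrayList<>();

    // Producer: "scans" directories and enqueues results; put() blocks
    // when the queue is full, which is the memory-regulation point.
    Thread scanner = new Thread(() -> {
      try {
        for (String f : blockFiles) {
          queue.put(f);
        }
        queue.put(POISON); // signal end of scan
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Consumer: drains the queue and reconciles each entry.
    Thread reconciler = new Thread(() -> {
      try {
        for (String f = queue.take(); !f.equals(POISON); f = queue.take()) {
          processed.add(f);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    scanner.start();
    reconciler.start();
    try {
      scanner.join();
      reconciler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
    return processed;
  }
}
```

The queue capacity, not the total block count, bounds the number of in-flight objects, which is the point of the suggestion.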
[jira] [Created] (HDFS-15393) Review of PendingReconstructionBlocks
David Mollitor created HDFS-15393:
-------------------------------------

             Summary: Review of PendingReconstructionBlocks
                 Key: HDFS-15393
                 URL: https://issues.apache.org/jira/browse/HDFS-15393
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor


I started looking at this class based on [HDFS-15351].

* Uses {{java.sql.Time}} unnecessarily. This is confusing since Java ships with time formatters out of the box in JDK 8, and I believe it will cause issues later when trying to upgrade to JDK 9+ since SQL is a different module in Java.
* Remove code where appropriate
* Use the Java Concurrent library for higher concurrent access to the underlying map
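On the first bullet, a JDK 8 replacement for {{java.sql.Time}}-based formatting might look like the sketch below. The pattern and the fixed UTC zone are illustrative assumptions, not what PendingReconstructionBlocks actually prints.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: format a millisecond timestamp with java.time instead of
// constructing a java.sql.Time just for its toString().
class TimestampFormat {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("HH:mm:ss");

  static String format(long epochMillis) {
    // Fixed UTC zone keeps this sketch deterministic; real code would
    // presumably use ZoneId.systemDefault().
    return FMT.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC));
  }
}
```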
[jira] [Commented] (HDFS-15351) Blocks Scheduled Count was wrong on Truncate
     [ https://issues.apache.org/jira/browse/HDFS-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126748#comment-17126748 ]

David Mollitor commented on HDFS-15351:
---------------------------------------

Thanks for pinging me [~hemanthboyina] a few times.  I have been a bit all over the place, so thanks for your persistence and patience.

We should probably be using {{Collection}} classes instead of native arrays, but that's not for this ticket.

{code:java}
PendingBlockInfo remove = pendingReconstruction.remove(lastBlock);
if (remove != null) {
  List<DatanodeStorageInfo> locations = remove.getTargets();
  DatanodeStorageInfo.decrementBlocksScheduled(
      locations.toArray(new DatanodeStorageInfo[0]));
}
{code}

> Blocks Scheduled Count was wrong on Truncate
> --------------------------------------------
>
>                 Key: HDFS-15351
>                 URL: https://issues.apache.org/jira/browse/HDFS-15351
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: hemanthboyina
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-15351.001.patch, HDFS-15351.002.patch, HDFS-15351.003.patch
>
> On truncate and append we remove the blocks from the Reconstruction Queue.
> On removing the blocks from pending reconstruction, we need to decrement
> Blocks Scheduled.
[jira] [Commented] (HDFS-14452) Make Op#valueOf() Public
     [ https://issues.apache.org/jira/browse/HDFS-14452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107334#comment-17107334 ]

David Mollitor commented on HDFS-14452:
---------------------------------------

Hello Team,

Any more thoughts on this?  How do we move this forward?

> Make Op#valueOf() Public
> ------------------------
>
>                 Key: HDFS-14452
>                 URL: https://issues.apache.org/jira/browse/HDFS-14452
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: hemanthboyina
>            Priority: Minor
>              Labels: noob
>         Attachments: HDFS-14452.patch
>
> Change the signature of {{private static Op valueOf(byte code)}} to be public.
> Right now, the only easy way to look up an Op is to pass in a {{DataInput}}
> object, which is not all that flexible and efficient for other custom
> implementations that want to store the Op code a different way.
> https://github.com/apache/hadoop/blob/8c95cb9d6bef369fef6a8364f0c0764eba90e44a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Op.java#L53
> 
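The lookup the ticket asks to expose follows the usual code-to-enum pattern, sketched below. The enum name and the two opcodes shown are illustrative stand-ins; the real {{Op}} enum has more members.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a public lookup-by-code, decoupled from DataInput.
enum OpExample {
  WRITE_BLOCK((byte) 80),
  READ_BLOCK((byte) 81);

  public final byte code;

  OpExample(byte code) {
    this.code = code;
  }

  // Build the reverse index once; values() allocates, so do it eagerly.
  private static final Map<Byte, OpExample> BY_CODE = new HashMap<>();
  static {
    for (OpExample op : values()) {
      BY_CODE.put(op.code, op);
    }
  }

  // Public, so callers that already hold a raw byte need not wrap it
  // in a DataInput just to resolve the opcode. Returns null if unknown.
  public static OpExample valueOf(byte code) {
    return BY_CODE.get(code);
  }
}
```

Overloading the compiler-generated {{valueOf(String)}} with a {{valueOf(byte)}} is legal Java, which is why only the access modifier needs to change.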
[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021120#comment-17021120 ]

David Mollitor commented on HDFS-15115:
---------------------------------------

Thanks for looping me in.

I've never liked this setup with the {{StringBuilder}} from the start.  It's just not the right way to do DEBUG logging.  All the logging should be generated in one block and not concatenated piecemeal.

That said, I submitted a patch (slightly updated from v1) so that the {{builder}} is always populated and will therefore not throw an NPE.  Please note, however, that if DEBUG logging is enabled sometime during execution, the first log message may be only partial... that is, the first few concatenations happen while DEBUG is disabled, the last few happen while DEBUG is enabled, and then the {{StringBuilder}} is sent to the logging framework for output.

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically
> change logger to debug
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-15115
>                 URL: https://issues.apache.org/jira/browse/HDFS-15115
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: wangzhixiang
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
> To get debug info, we dynamically changed the logger of
> BlockPlacementPolicyDefault to debug while the namenode was running. However,
> the Namenode crashes. From the log, we find an NPE in
> BlockPlacementPolicyDefault.chooseRandom. The *StringBuilder builder* is
> used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, but the
> *builder* is only initialized the first time through. If we change the logger
> of BlockPlacementPolicyDefault to debug after that point, the *builder* in
> the remaining parts is *NULL* and causes an *NPE*.
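The "one guarded block" style the comment argues for can be sketched like this. The class and method names are made up for illustration; the real fix lives in {{BlockPlacementPolicyDefault.chooseRandom}}.

```java
import java.util.List;

// Sketch: build the entire DEBUG message inside one guarded method
// instead of threading a lazily-created StringBuilder through the
// placement logic, which is what made the NPE possible when the log
// level flipped mid-call.
class GuardedDebug {
  static String chooseRandomDebugMessage(boolean debugEnabled,
      List<String> excludedNodes, String chosen) {
    if (!debugEnabled) {
      return null; // guard: zero string-building cost when DEBUG is off
    }
    // All concatenation happens here, atomically, under one guard.
    StringBuilder sb = new StringBuilder("chooseRandom: chosen=")
        .append(chosen)
        .append(", excluded=")
        .append(excludedNodes);
    return sb.toString();
  }
}
```

Because the guard is checked exactly once, the message is either built in full or not at all; there is no window where half the concatenations ran under a disabled logger.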
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Attachment: (was: HDFS-15115.1.patch)
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Attachment: HDFS-15115.2.patch
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Attachment: HDFS-15115.1.patch
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Attachment: (was: HDFS-14103.1.patch)
[jira] [Assigned] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor reassigned HDFS-15115:
-------------------------------------

    Assignee: David Mollitor
[jira] [Updated] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
     [ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-15115:
----------------------------------
    Attachment: HDFS-14103.1.patch
[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured
[ https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975302#comment-16975302 ] David Mollitor commented on HDFS-14902: --- I don't have a great way of checking this. I was previously using the prepackaged Hadoop binaries with the default configuration. I agree that it should not even start. > RBF: NullPointer When Misconfigured > --- > > Key: HDFS-14902 > URL: https://issues.apache.org/jira/browse/HDFS-14902 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-14902.001.patch, HDFS-14902.002.patch > > > Admittedly the server was mis-configured, but this should be a bit more > elegant. > {code:none} > 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled > exception updating NN registration for null:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
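A fail-fast guard of the kind discussed above could look like the following. This is a hypothetical sketch with a stand-in class name ({{HeartbeatGuard}}), not the actual HDFS-14902 patch; the point is simply to reject a missing service address at registration time instead of letting a {{NullPointerException}} surface deep inside the generated protobuf builder.

```java
import java.util.Objects;

public class HeartbeatGuard {
  /**
   * Fail fast on misconfiguration instead of an NPE deep in protobuf code.
   * Returns the address unchanged when it is present.
   */
  public static String requireServiceAddress(String serviceAddress) {
    return Objects.requireNonNull(serviceAddress,
        "NameNode service address is not configured; "
            + "refusing to register a null:null namenode");
  }
}
```

With this shape, the misconfigured router would fail with a clear message at startup rather than logging an unhandled NPE on every heartbeat cycle.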
[jira] [Commented] (HDFS-14872) Read HDFS Blocks in Random Order
[ https://issues.apache.org/jira/browse/HDFS-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957343#comment-16957343 ] David Mollitor commented on HDFS-14872: --- Might be able to create a new copy routine at a higher level with the existing HDFS FS API. Need to check: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html#copy-java.io.File-org.apache.hadoop.fs.FileSystem-org.apache.hadoop.fs.Path-boolean-org.apache.hadoop.conf.Configuration- > Read HDFS Blocks in Random Order > > > Key: HDFS-14872 > URL: https://issues.apache.org/jira/browse/HDFS-14872 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 2.8.5, 3.2.1 >Reporter: David Mollitor >Priority: Major > > When the HDFS client is downloading (copying) an entire file, allow the > client to download the blocks in random order. If a lot of clients are > reading the same file, in parallel, they will all download the first block, > the second block, and so on, stampeding down the line. > It would be interesting to spread the load across all the available > DataNodes.
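The random-order idea can be sketched in a few lines. This is purely illustrative and not part of the HDFS client API: each client shuffles the block indices independently, so parallel readers of the same file hit different DataNodes first instead of stampeding down the block list in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomBlockOrder {
  /** Return the block indices [0, numBlocks) in a randomized read order. */
  public static List<Integer> readOrder(int numBlocks, long seed) {
    List<Integer> order = new ArrayList<>(numBlocks);
    for (int i = 0; i < numBlocks; i++) {
      order.add(i);
    }
    // Each client shuffles independently, spreading read load across
    // DataNodes; blocks are still written to their correct file offsets.
    Collections.shuffle(order, new Random(seed));
    return order;
  }
}
```

Since every block carries a known offset, reassembly on the client side is unaffected by the download order.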
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952812#comment-16952812 ] David Mollitor commented on HDFS-14854: --- [~sodonnell] Thanks. Looks good! > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, > HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, > HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, > HDFS-14854.008.patch, HDFS-14854.009.patch, HDFS-14854.010.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue, and under-replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled, giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly.
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952183#comment-16952183 ] David Mollitor commented on HDFS-14854: --- What I was saying before, now that I've dug into it a bit more, is that we should look at revamping the {{org.apache.hadoop.hdfs.server.blockmanagement.LowRedundancyBlocks}} class as part of this effort.
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952179#comment-16952179 ] David Mollitor commented on HDFS-14854: --- # Acquire the lock before entering the {{try}} block: https://stackoverflow.com/questions/10868423/lock-lock-before-try # Please grab the lock for {{dn.getStorageInfos()}} in its own block. Easier to reason about. # Using a 'null' value in this way is overloading the use of the {{Map}} class, and it's not clearly articulated in the comments how this works. I think it would be much cleaner to have {{processPendingNodes()}} return a list of nodes that need to be processed instead of populating the {{Map}} in this way. {code:java} List<DatanodeDescriptor> pendingNodes; try { ... processCancelledNodes(); pendingNodes = processPendingNodes(); } finally { namesystem.writeUnlock(); } ... check(pendingNodes); {code} 4. bq. For nodes to be added to pendingNodes, that is always done under the namenode writeLock Please state that requirement in the JavaDoc for the {{startTrackingNode}} method. 5. I worry about the needless locking because that lock is very hot and used all over the place. The time per iteration is configurable (30 seconds by default), but a user may opt to lower it to 1 second, and there is nothing to tell them that doing so will increase the lock hold time, even when there is nothing to replicate.
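Points 1 and 3 from the review above can be sketched together. This is an illustrative refactor with stand-in names ({{Monitor}}, {{check}}, {{String}} in place of {{DatanodeDescriptor}}), not the actual patch: the lock is acquired before the {{try}}, and {{processPendingNodes()}} returns the nodes to examine instead of seeding a map with null values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Monitor {
  private final Lock writeLock = new ReentrantLock();
  private final Queue<String> pendingNodes = new ArrayDeque<>();
  private final List<String> checked = new ArrayList<>();

  void addPendingNode(String dn) {
    writeLock.lock();  // lock *before* try, per the linked idiom
    try {
      pendingNodes.add(dn);
    } finally {
      writeLock.unlock();
    }
  }

  void runOneIteration() {
    List<String> toCheck;
    writeLock.lock();
    try {
      toCheck = processPendingNodes();
    } finally {
      writeLock.unlock();
    }
    // The heavy per-node work happens outside the write lock.
    check(toCheck);
  }

  private List<String> processPendingNodes() {
    List<String> nodes = new ArrayList<>(pendingNodes);
    pendingNodes.clear();
    return nodes;  // returned, rather than poked into a Map as null values
  }

  private void check(List<String> nodes) {
    checked.addAll(nodes);
  }

  List<String> checked() { return checked; }
}
```

The lock-before-try ordering matters: if {{lock()}} itself throws, no {{unlock()}} is attempted in the {{finally}} block for a lock that was never held.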
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952069#comment-16952069 ] David Mollitor commented on HDFS-14854: --- [~sodonnell] [~elgoiri] I provided some feedback for you to review regarding this specific patch. However, I would like to draw your attention to something I was saying before... I think it would be cool if we could also include {{BlockManager#neededReconstruction}} in improving decommissioning. There is a bunch of polling going on in this class, checking sizes and statuses. I think some of that could be removed by making the {{BlockManager#neededReconstruction}} collection a synchronized priority queue; perhaps it should just be its own priority-queue-backed {{ExecutorService}}. This will help in that requests from dead nodes will be prioritized ahead of requests for decommissioning. You could probably also make it a {{BlockingQueue}} with a fixed size so that threads block if the queue gets too large. In this way, there doesn't need to be batching. Just figure out the next block to replicate, give up the global lock, try to add it to the {{neededReconstruction}} queue, and once complete, go find the next block to replicate. Something like that.
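The prioritization idea above could be sketched with a {{PriorityBlockingQueue}}. The types and priority scheme here are purely illustrative, not the {{neededReconstruction}} implementation: dead-node work jumps ahead of decommission work regardless of insertion order.

```java
import java.util.concurrent.PriorityBlockingQueue;

public class ReconstructionQueue {
  // Lower number = higher priority: dead-node work precedes decommission work.
  static final int PRI_DEAD_NODE = 0;
  static final int PRI_DECOMMISSION = 1;

  static final class Work implements Comparable<Work> {
    final int priority;
    final long blockId;

    Work(int priority, long blockId) {
      this.priority = priority;
      this.blockId = blockId;
    }

    @Override
    public int compareTo(Work o) {
      return Integer.compare(priority, o.priority);
    }
  }

  // Thread-safe and prioritized. Note PriorityBlockingQueue is unbounded;
  // a bounded BlockingQueue variant would add the back-pressure described
  // above, making producers block when the queue grows too large.
  private final PriorityBlockingQueue<Work> neededReconstruction =
      new PriorityBlockingQueue<>();

  void add(Work w) { neededReconstruction.add(w); }

  Work next() { return neededReconstruction.poll(); }
}
```

With this shape, the producer loop becomes: pick the next block under the lock, release the lock, enqueue, repeat, with no batching required.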
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952040#comment-16952040 ] David Mollitor commented on HDFS-14854: --- {code:java} if (blockManager.blocksMap.getStoredBlock(block) == null) { LOG.trace("Removing unknown block {}", block); return true; } long bcId = block.getBlockCollectionId(); if (bcId == INodeId.INVALID_INODE_ID) { // Orphan block, will be invalidated eventually. Skip. return false; } {code} I think it should return 'true' if the block is orphaned, no? It should be skipped in the same way that an 'unknown' block is.
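The suggested change is a one-line flip, but the predicate can be isolated for clarity. This is a stand-in sketch (the class, method names, and the {{INVALID_INODE_ID}} value of -1 are illustrative, not the real HDFS types):

```java
public class BlockFilter {
  static final long INVALID_INODE_ID = -1;  // illustrative sentinel value

  /** Returns true when the block should be removed from tracking. */
  static boolean shouldRemove(boolean knownToBlocksMap, long blockCollectionId) {
    if (!knownToBlocksMap) {
      return true;  // unknown block: stop tracking it
    }
    if (blockCollectionId == INVALID_INODE_ID) {
      return true;  // orphan block: per the comment, skip it like an unknown one
    }
    return false;
  }
}
```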
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952019#comment-16952019 ] David Mollitor commented on HDFS-14854: --- This code already knows the pendingCount value and the pendingRepLimit... do not grab the write lock if the function is going to immediately return anyway. {code:java} int pendingCount = getPendingCount(); try { namesystem.writeLock(); long repQueueSize = blockManager.getLowRedundancyBlocksCount(); ... if (pendingCount >= pendingRepLimit) { return; } {code}
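The suggestion can be sketched as follows. The names are hypothetical, and the acquisition counter exists only to make the behavior observable; the point is that the limit check happens before the lock is taken.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PendingCheck {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicInteger lockAcquisitions = new AtomicInteger();
  private final int pendingRepLimit;
  private final int pendingCount;

  PendingCheck(int pendingRepLimit, int pendingCount) {
    this.pendingRepLimit = pendingRepLimit;
    this.pendingCount = pendingCount;
  }

  /** Returns true if any scheduling work was attempted under the lock. */
  boolean moveBlocksToPending() {
    // Early return: both values are known without touching the hot lock.
    if (pendingCount >= pendingRepLimit) {
      return false;
    }
    lock.writeLock().lock();
    lockAcquisitions.incrementAndGet();
    try {
      // ... select blocks and add them to the replication queue ...
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int lockAcquisitions() { return lockAcquisitions.get(); }
}
```

When the pending limit is already reached, the hot namesystem lock is never touched at all.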
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952014#comment-16952014 ] David Mollitor commented on HDFS-14854: --- Please remove this method. It can be replaced with {{map.computeIfAbsent(key, k -> new LinkedList<>()).add(v);}} {code:java} private void addBlockToPending(DatanodeDescriptor dn, BlockInfo block) { List<BlockInfo> blockList = pendingRep.get(dn); if (blockList == null) { blockList = new LinkedList<>(); pendingRep.put(dn, blockList); } blockList.add(block); } {code} https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-
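For illustration, the whole helper collapses to a single call. This is a generic sketch with stand-in types ({{String}} and {{Long}} in place of {{DatanodeDescriptor}} and {{BlockInfo}}), not the patch itself:

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class PendingRep {
  private final Map<String, List<Long>> pendingRep = new HashMap<>();

  /** Replaces the hand-rolled get/null-check/put dance with computeIfAbsent. */
  void addBlockToPending(String dn, long blockId) {
    pendingRep.computeIfAbsent(dn, k -> new LinkedList<>()).add(blockId);
  }

  Map<String, List<Long>> view() { return pendingRep; }
}
```

{{computeIfAbsent}} only allocates the {{LinkedList}} when the key is actually absent, so the behavior matches the original method exactly.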
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952009#comment-16952009 ] David Mollitor commented on HDFS-14854: --- Nit: this is not very java-y... {code:java} final List<DatanodeDescriptor> toRemove = new ArrayList<>(); ... processMaintenanceNodes(toRemove); ... // Check if any nodes have reached zero blocks and also update the stats // exposed via JMX for all nodes still being processed. checkForCompletedNodes(toRemove); // Finally move the nodes to their final state if they are ready. processCompletedNodes(toRemove); {code} Better to remove the coupling: {code:java} final List<DatanodeDescriptor> maintenanceExpiredNodes = getMaintenanceNodes(); ... final List<DatanodeDescriptor> completedNodes = getCompletedNodes(); Iterable<DatanodeDescriptor> nodesToRemove = Iterables.unmodifiableIterable( Iterables.concat(maintenanceExpiredNodes, completedNodes)); // Finally move the nodes to their final state if they are ready. processCompletedNodes(Lists.newArrayList(nodesToRemove)); {code}
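The decoupled shape can also be written without Guava, using {{Stream.concat}} from the JDK (the Guava calls quoted in the comment, {{Iterables.concat}} and {{Lists.newArrayList}}, do the same job); the names here are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NodeRemoval {
  /**
   * Each producer builds its own list; the caller concatenates them,
   * instead of every step mutating one shared accumulator.
   */
  static List<String> nodesToRemove(List<String> maintenanceExpired,
                                    List<String> completed) {
    return Stream.concat(maintenanceExpired.stream(), completed.stream())
        .collect(Collectors.toList());
  }
}
```

The design win is the same either way: each helper returns its own result, so no method needs to know who else writes into the shared list.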
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952001#comment-16952001 ] David Mollitor commented on HDFS-14854: --- {code:java} private void processPendingNodes() { while (!pendingNodes.isEmpty() && (maxConcurrentTrackedNodes == 0 || outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) { outOfServiceNodeBlocks.put(pendingNodes.poll(), null); } } {code} This method is accessed by the locally running thread. However, {{pendingNodes}} does not appear to be a thread-safe class. Perhaps the collection cannot be modified concurrently because of the external locking of the {{writeLock}}, but there is no such requirement stated in the {{startTrackingNode}} method.
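One way to remove the implicit dependency on the external {{writeLock}} is to back {{pendingNodes}} with a thread-safe queue. This is a sketch with stand-in types, not the patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PendingNodes {
  // Thread-safe: startTrackingNode no longer relies on the caller
  // holding any external lock.
  private final Queue<String> pendingNodes = new ConcurrentLinkedQueue<>();

  void startTrackingNode(String dn) {
    pendingNodes.add(dn);
  }

  /** Drain up to maxTracked nodes; safe against concurrent producers. */
  List<String> takePendingNodes(int maxTracked) {
    List<String> taken = new ArrayList<>();
    String dn;
    while (taken.size() < maxTracked && (dn = pendingNodes.poll()) != null) {
      taken.add(dn);
    }
    return taken;
  }
}
```

Alternatively, if the external lock really is the intended guard, the cheaper fix is simply documenting that precondition on {{startTrackingNode}}, as the comment asks.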
[jira] [Comment Edited] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952001#comment-16952001 ] David Mollitor edited comment on HDFS-14854 at 10/15/19 3:04 PM: - {code:java} private void processPendingNodes() { while (!pendingNodes.isEmpty() && (maxConcurrentTrackedNodes == 0 || outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) { outOfServiceNodeBlocks.put(pendingNodes.poll(), null); } } {code} This method is accessed by the locally running thread. However, {{pendingNodes}} does not appear to be a thread-safe collection. Perhaps the collection cannot be modified concurrently because of the external locking of the {{writeLock}}, but there is no such requirement stated in the {{startTrackingNode}} method javadoc.
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951991#comment-16951991 ] David Mollitor commented on HDFS-14854: --- The method {{scanDatanodeStorage}} uses {{namesystem.readLock()}} in a pretty verbose and complicated way. If the idea here is to grab the {{readLock}} for each DataNode, and unlock it after processing each DataNode, simply move the {{try...finally}} block inside the loop.
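Structurally, the comment is asking for the shape below. This is a stand-in sketch ({{ReentrantLock}} and {{String}} instead of the namesystem lock and DataNode types); the recorded list exists only to make the behavior checkable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class StorageScanner {
  private final Lock readLock = new ReentrantLock();
  private final List<String> scanned = new ArrayList<>();

  /** Lock once per DataNode: the try/finally sits inside the loop. */
  void scanDatanodeStorage(List<String> datanodes) {
    for (String dn : datanodes) {
      readLock.lock();
      try {
        scanned.add(dn);  // stand-in for scanning this node's storage
      } finally {
        readLock.unlock();  // released between DataNodes, letting writers in
      }
    }
  }

  List<String> scanned() { return scanned; }
}
```

Releasing between iterations keeps each lock hold short and gives writers a chance to interleave, at the cost of the scan seeing a slightly less consistent snapshot across nodes.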
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951985#comment-16951985 ] David Mollitor commented on HDFS-14854: --- {code:java} while (!pendingNodes.isEmpty() && (maxConcurrentTrackedNodes == 0 || outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) { outOfServiceNodeBlocks.put(pendingNodes.poll(), null); } {code} Using 'null' values is very out of vogue. Better to put a new {{HashMap}} here. Allows for simplification of the code by assuming that values will never be 'null'. The cost of creating a HashMap is very low here, especially it's only one per DataNode. > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, > HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, > HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, > HDFS-14854.008.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under replicated > blocks from a future node or disk failure may way for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicate, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. 
As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
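The change the comment suggests could be sketched as follows. This is a minimal, self-contained sketch; the collection and key types here stand in for the actual HDFS fields ({{DatanodeDescriptor}} keys and block maps), and {{TrackNodes}} is a made-up class name for illustration only.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class TrackNodes {
    public static void main(String[] args) {
        // Stand-ins for the real pendingNodes queue and outOfServiceNodeBlocks map.
        Queue<String> pendingNodes = new ArrayDeque<>(List.of("dn1", "dn2", "dn3"));
        Map<String, Map<String, String>> outOfServiceNodeBlocks = new HashMap<>();
        int maxConcurrentTrackedNodes = 2;

        while (!pendingNodes.isEmpty()
            && (maxConcurrentTrackedNodes == 0
                || outOfServiceNodeBlocks.size() < maxConcurrentTrackedNodes)) {
            // Empty map instead of null: downstream code never needs a null check.
            outOfServiceNodeBlocks.put(pendingNodes.poll(), new HashMap<>());
        }
        System.out.println(outOfServiceNodeBlocks.size()); // prints 2
    }
}
```

The one-map-per-DataNode allocation is the only added cost, and every read of the value can then assume non-null.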
[jira] [Comment Edited] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951978#comment-16951978 ] David Mollitor edited comment on HDFS-14854 at 10/15/19 2:43 PM: - The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}} {code:java} while (!queue.isEmpty()) { queue.poll(); } {code} was (Author: belugabehr): The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}} > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, > HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, > HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, > HDFS-14854.008.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled giving the option of the existing implementation or the new one. 
> I will attach a pdf with some more details on the design and then a version 1 > patch shortly. 
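The drain pattern in that comment works directly on a {{Queue}}, where a {{List}} would force index bookkeeping. A minimal sketch of the intent (names here are illustrative, not the actual HDFS fields):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class DrainCancelled {
    public static void main(String[] args) {
        // A Queue expresses FIFO "take and remove" directly.
        Queue<String> cancelledNodes = new ArrayDeque<>(List.of("dn1", "dn2"));
        while (!cancelledNodes.isEmpty()) {
            String node = cancelledNodes.poll(); // removes and returns the head
            System.out.println("cancelling " + node);
        }
    }
}
```

With a {{List}}, the same loop would need {{remove(0)}} (an O(n) shift on {{ArrayList}}) or an explicit iterator; {{poll()}} states the intent and is O(1) on {{ArrayDeque}}.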
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951978#comment-16951978 ] David Mollitor commented on HDFS-14854: --- The {{cancelledNodes}} data structure is a {{List}} but it should be a {{Queue}} > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, > HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, > HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, > HDFS-14854.008.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly. 
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951972#comment-16951972 ] David Mollitor commented on HDFS-14854: --- I'm looking at this now, but one nit: {code:java|title=Currently} try { namesystem.writeLock(); ... } finally { namesystem.writeUnlock(); } {code} Best practice is to grab the lock outside of the try statement. [https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html] {code:java|title=Recommended} namesystem.writeLock(); try { ... } finally { namesystem.writeUnlock(); } {code} > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, > HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, > HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, > HDFS-14854.008.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicate, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. 
As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly. 
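The rationale behind the lock-outside-try nit can be shown with a runnable sketch. If {{lock()}} is placed inside the try block and throws, the finally clause would attempt to release a lock the thread never acquired (an {{IllegalMonitorStateException}} that masks the original failure). Acquiring before the try avoids that:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockPattern {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter = 0;

    public static void main(String[] args) {
        LOCK.lock();      // outside try: only unlock what was actually acquired
        try {
            counter++;    // critical section
        } finally {
            LOCK.unlock();
        }
        System.out.println(counter); // prints 1
    }
}
```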
[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured
[ https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947053#comment-16947053 ] David Mollitor commented on HDFS-14902: --- I just downloaded the Hadoop binaries and ran no-arg {{./hdfs dfsrouter}} > RBF: NullPointer When Misconfigured > --- > > Key: HDFS-14902 > URL: https://issues.apache.org/jira/browse/HDFS-14902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Priority: Minor > > Admittedly the server was mis-configured, but this should be a bit more > elegant. > {code:none} > 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled > exception updating NN registration for null:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} 
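A more elegant failure mode, as the report asks for, would validate the configuration before building the registration. A sketch of such a guard; the method and message are hypothetical, not the actual RBF code:

```java
public class RegistrationGuard {
    // Illustrative stand-in for building a NameNode registration record.
    static String buildRegistration(String serviceAddress) {
        if (serviceAddress == null || serviceAddress.isEmpty()) {
            // Fail with a clear configuration error instead of an NPE
            // deep inside the protobuf builder.
            throw new IllegalStateException(
                "NameNode service address is not configured; "
                + "check the dfsrouter heartbeat configuration");
        }
        return "registered:" + serviceAddress;
    }

    public static void main(String[] args) {
        System.out.println(buildRegistration("nn1:8020")); // prints registered:nn1:8020
    }
}
```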
[jira] [Commented] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946988#comment-16946988 ] David Mollitor commented on HDFS-14899: --- [~goiri] Yes. I also verified with a quick and simple proxy... shout-out to [~ayushsaxena] for the idea. > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Created] (HDFS-14902) NullPointer When Misconfigured
David Mollitor created HDFS-14902: - Summary: NullPointer When Misconfigured Key: HDFS-14902 URL: https://issues.apache.org/jira/browse/HDFS-14902 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.2.0 Reporter: David Mollitor Admittedly the server was mis-configured, but this should be a bit more elegant. {code:none} 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled exception updating NN registration for null:null java.lang.NullPointerException at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} 
[jira] [Updated] (HDFS-14902) NullPointer When Misconfigured
[ https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14902: -- Priority: Minor (was: Major) > NullPointer When Misconfigured > -- > > Key: HDFS-14902 > URL: https://issues.apache.org/jira/browse/HDFS-14902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Priority: Minor > > Admittedly the server was mis-configured, but this should be a bit more > elegant. > {code:none} > 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled > exception updating NN registration for null:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} 
[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14898: -- Attachment: HDFS-14898.2.patch > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14898.1.patch, HDFS-14898.2.patch > > 
[jira] [Commented] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946925#comment-16946925 ] David Mollitor commented on HDFS-14898: --- [~ayushtkn] Thank you so much. Super helpful. I just used that method, using Caddy, to discover that my first patch was incorrect. Thanks! New patch supplied. > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14898.1.patch, HDFS-14898.2.patch > > 
[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14898: -- Status: Open (was: Patch Available) > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14898.1.patch > > 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Status: Open (was: Patch Available) > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Attachment: HDFS-14899.1.patch > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Status: Patch Available (was: Open) > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Attachment: (was: HDFS-14899.1.patch) > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Status: Patch Available (was: Open) > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Created] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
David Mollitor created HDFS-14899: - Summary: Use Relative URLS in Hadoop HDFS RBF Key: HDFS-14899 URL: https://issues.apache.org/jira/browse/HDFS-14899 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HDFS-14899.1.patch 
[jira] [Updated] (HDFS-14899) Use Relative URLS in Hadoop HDFS RBF
[ https://issues.apache.org/jira/browse/HDFS-14899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14899: -- Attachment: HDFS-14899.1.patch > Use Relative URLS in Hadoop HDFS RBF > > > Key: HDFS-14899 > URL: https://issues.apache.org/jira/browse/HDFS-14899 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14899.1.patch > > 
[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14898: -- Status: Patch Available (was: Open) > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14898.1.patch > > 
[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14898: -- Attachment: HDFS-14898.1.patch > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14898.1.patch > > 
[jira] [Created] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
David Mollitor created HDFS-14898: - Summary: Use Relative URLS in Hadoop HDFS HTTP FS Key: HDFS-14898 URL: https://issues.apache.org/jira/browse/HDFS-14898 Project: Hadoop HDFS Issue Type: Improvement Components: httpfs Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor 
[jira] [Updated] (HDFS-14898) Use Relative URLS in Hadoop HDFS HTTP FS
[ https://issues.apache.org/jira/browse/HDFS-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14898: -- Flags: Patch > Use Relative URLS in Hadoop HDFS HTTP FS > > > Key: HDFS-14898 > URL: https://issues.apache.org/jira/browse/HDFS-14898 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > 
[jira] [Commented] (HDFS-14872) Read HDFS Blocks in Random Order
[ https://issues.apache.org/jira/browse/HDFS-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938614#comment-16938614 ] David Mollitor commented on HDFS-14872: --- [~sodonnell] I imagine something like... the client looks up the size of the file in HDFS and pre-allocates the file on the local system, then it gets a list of all the blocks for the file, shuffles them, iterates over them, then starts writing blocks to the local file at the required offsets. Once the list of blocks is exhausted, the file is complete and made available to the application. The first use case that comes to mind is better supporting large files submitted to the cluster that are required for MapReduce / Spark applications. The jobs will not start unless all of the required files are first localized from HDFS into the local host by the YARN NodeManager. If the job requires a large JAR file or, even more likely, a large dependency file, all of the nodes will fight with each other to download the blocks in order. One could increase {{mapreduce.client.submit.file.replication}}, however this has its limitations as well. In a large cluster, it may take a long time for the NameNode to schedule all of the replication required to get all of the blocks up to the requested replication. https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/SharedCache.html https://blog.cloudera.com/resource-localization-in-yarn-deep-dive/ https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml > Read HDFS Blocks in Random Order > > > Key: HDFS-14872 > URL: https://issues.apache.org/jira/browse/HDFS-14872 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 2.8.5, 3.2.1 >Reporter: David Mollitor >Priority: Major > > When the HDFS client is downloading (copying) an entire file, allow the > client to download the blocks in random order. 
If a lot of clients are > reading the same file, in parallel, they will all download the first block, > the second block, and so on, stampeding down the line. > It would be interesting to spread the load across all the available > DataNodes. 
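The download scheme described in the comment above (pre-allocate the file, shuffle the block list, write each block at its own offset) can be sketched as follows. Everything here is simplified for illustration; blocks are modeled as (offset, length) pairs and the block bytes are synthesized locally, so none of this is the actual HDFS client API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffledBlockReader {
    // Illustrative block descriptor: where the block lands in the local file.
    record Block(long offset, int length) {}

    public static void main(String[] args) {
        // Four 4-byte "blocks" of a 16-byte file.
        List<Block> blocks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            blocks.add(new Block(i * 4L, 4));
        }
        Collections.shuffle(blocks);      // fetch blocks in random order
        byte[] file = new byte[16];       // stands in for the pre-allocated local file
        for (Block b : blocks) {
            // A real client would fetch the block from a DataNode here;
            // we synthesize its contents from the block index instead.
            for (int i = 0; i < b.length(); i++) {
                file[(int) b.offset() + i] = (byte) (b.offset() / 4);
            }
        }
        // Order-independent: the result is the same regardless of the shuffle.
        System.out.println(java.util.Arrays.toString(file));
        // prints [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
    }
}
```

Because each block is written at its fixed offset, the shuffle changes only the fetch order, not the final file, which is what makes spreading load across DataNodes safe.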
[jira] [Commented] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault
[ https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938012#comment-16938012 ] David Mollitor commented on HDFS-14863: --- Different unit tests failed on the second Yetus run. Flaky tests. This particular data structure is accessed in a few places, but this is the only place it is synchronized on. I just don't see a reason for it and it's not documented anywhere as to why this may be the case. > Remove Synchronization From BlockPlacementPolicyDefault > --- > > Key: HDFS-14863 > URL: https://issues.apache.org/jira/browse/HDFS-14863 > Project: Hadoop HDFS > Issue Type: Improvement > Components: block placement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010 > The {{clusterMap}} has its own internal synchronization. Also, these are > only read operations so any changes applied to the {{clusterMap}} from > another thread will be applied since no other thread synchronizes on the > {{clusterMap}} itself (that I could find). 
[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault
[ https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14863: -- Status: Open (was: Patch Available) > Remove Synchronization From BlockPlacementPolicyDefault > --- > > Key: HDFS-14863 > URL: https://issues.apache.org/jira/browse/HDFS-14863 > Project: Hadoop HDFS > Issue Type: Improvement > Components: block placement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010 > The {{clusterMap}} has its own internal synchronization. Also, these are > only read operations so any changes applied to the {{clusterMap}} from > another thread will be applied since no other thread synchronizes on the > {{clusterMap}} itself (that I could find). 
[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault
[ https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14863: -- Status: Patch Available (was: Open) > Remove Synchronization From BlockPlacementPolicyDefault > --- > > Key: HDFS-14863 > URL: https://issues.apache.org/jira/browse/HDFS-14863 > Project: Hadoop HDFS > Issue Type: Improvement > Components: block placement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010 > The {{clusterMap}} has its own internal synchronization. Also, these are > only read operations so any changes applied to the {{clusterMap}} from > another thread will be applied since no other thread synchronizes on the > {{clusterMap}} itself (that I could find). 
[jira] [Updated] (HDFS-14863) Remove Synchronization From BlockPlacementPolicyDefault
[ https://issues.apache.org/jira/browse/HDFS-14863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14863: -- Attachment: HDFS-14863.2.patch > Remove Synchronization From BlockPlacementPolicyDefault > --- > > Key: HDFS-14863 > URL: https://issues.apache.org/jira/browse/HDFS-14863 > Project: Hadoop HDFS > Issue Type: Improvement > Components: block placement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14863.1.patch, HDFS-14863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1010 > The {{clusterMap}} has its own internal synchronization. Also, these are > only read operations so any changes applied to the {{clusterMap}} from > another thread will be applied since no other thread synchronizes on the > {{clusterMap}} itself (that I could find). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14862) Review of MovedBlocks
[ https://issues.apache.org/jira/browse/HDFS-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937794#comment-16937794 ] David Mollitor commented on HDFS-14862: --- [~elgoiri] Thank you for taking a look. I'll take a look at this again. This class is a bit confusing, the {{getLocations()}} method included. There is no reason that this method needs to be synchronized at all because the variable {{locations}} is defined as {{final}} in the constructor and therefore will never change. Since it never changes, there's no need to synchronize. There are no comments in the code, so it's a bit hard to understand how it's being used, but it may be a life-cycle thing. That is, multiple threads may be used to add new locations to the block, but then at the end, only a single thread accesses the results (through {{getLocations()}}). However, I just realized that the blocks are also being synchronized externally. https://github.com/apache/hadoop/blob/1de25d134f64d815f9b43606fa426ece5ddbc430/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java#L831 I think I'll drop the external synchronization in {{Dispatcher.java}}, keep the synchronization on the collection (because it is protected) and put a comment on the {{getLocations()}} method that warns users against modifying the returned List... it is not thread safe to change the contents, and doing so may throw a {{ConcurrentModificationException}} if the underlying collection is modified. > Review of MovedBlocks > - > > Key: HDFS-14862 > URL: https://issues.apache.org/jira/browse/HDFS-14862 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14862.1.patch > > > Internal data structure needs to be protected (synchronized) but is scoped as > {{protected}} so any sub-class could modify without a lock. 
Synchronize the > collection itself for protection. It also returns the internal data > structure in {{getLocations}} so the structure could be modified outside of > the lock. Create a copy instead. > {code:java} > /** The locations of the replicas of the block. */ > protected final List<L> locations = new ArrayList<L>(3); > > public Locations(Block block) { > this.block = block; > } > > /** clean block locations */ > public synchronized void clearLocations() { > locations.clear(); > } > ... > /** @return its locations */ > public synchronized List<L> getLocations() { > return locations; > } > {code} > > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/MovedBlocks.java#L43] > Also, remove a bunch of superfluous and complicated code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
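The defensive-copy fix described in the ticket can be sketched as follows. This is a minimal toy class, not the actual {{MovedBlocks.Locations}} implementation; the generic parameter and the addLocation helper are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the proposed fix: keep synchronization on the class,
// but have getLocations() hand back a copy so callers can never mutate
// (or race against) the internal list.
class Locations<L> {
  private final List<L> locations = new ArrayList<>(3);

  public synchronized void addLocation(L location) {
    locations.add(location);
  }

  // Defensive copy: the caller gets a snapshot, so modifying it (or
  // iterating it while another thread adds locations) cannot throw
  // ConcurrentModificationException against the internal state.
  public synchronized List<L> getLocations() {
    return new ArrayList<>(locations);
  }
}
```

With this shape, mutations of the returned list are invisible to the shared state, which is exactly the hazard the proposed Javadoc comment would otherwise have to warn about.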
[jira] [Commented] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937757#comment-16937757 ] David Mollitor commented on HDFS-14865: --- [~elgoiri] So, I was looking at the code, looking to remove synchronization wherever I can. In this case, the threads require the {{namesystem}} write lock to interact with methods that modify the internal structure, so access is already quite restricted. On top of that, I did not remove synchronization from the class per se; I simply pushed it down into the collection by using a {{ConcurrentHashMap}}. The {{ConcurrentHashMap}} is nifty because it internally has several locks so that it can lock certain sections of the {{Map}} without locking the entire structure. My aim here is to make it possible for the {{get*()}} methods of this class to interact with the structure without serialized synchronization. > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, > HDFS-14865.3.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
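A minimal sketch of the pattern described in the comment above — not the actual {{DatanodeManager}} code; the class and field names here are made up for illustration. Writers mutate the map under whatever external locking the NameNode requires, while the {{get*()}}-style readers go straight to the {{ConcurrentHashMap}}:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a datanode registry. ConcurrentHashMap
// locks only a section of the table on write, and reads take no lock
// at all, so get() never serializes behind other readers.
class DatanodeRegistry {
  private final Map<String, String> datanodeMap = new ConcurrentHashMap<>();

  void register(String uuid, String address) {
    datanodeMap.put(uuid, address);
  }

  String getAddress(String uuid) {
    return datanodeMap.get(uuid); // lock-free read path
  }
}
```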
[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937147#comment-16937147 ] David Mollitor commented on HDFS-14860: --- Still getting {{java.lang.OutOfMemoryError: unable to create new native thread}} Maybe the build containers need to be larger. I'll try again in the next few days and see if the issue clears. > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, > HDFS-14860.3.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Patch Available (was: Open) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch, HDFS-14864.4.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
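The {{drainTo()}} pattern from the HDFS-14864 description can be sketched like this (illustrative names only, not the actual {{DatanodeDescriptor}} fields):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the proposed change: a thread-safe queue that is emptied
// in a single drainTo() call instead of repeated poll()s, and whose
// drain method returns an empty List rather than null when nothing
// is queued.
class PendingBlockQueue {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

  void add(String blockId) {
    pending.add(blockId);
  }

  List<String> drain() {
    List<String> out = new ArrayList<>();
    pending.drainTo(out); // atomically moves all queued entries into 'out'
    return out;           // empty, never null
  }
}
```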
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Attachment: HDFS-14864.4.patch > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch, HDFS-14864.4.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Open (was: Patch Available) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch, HDFS-14864.4.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937006#comment-16937006 ] David Mollitor commented on HDFS-14864: --- {{java.lang.OutOfMemoryError: unable to create new native thread}} seems to be a common failure. Doesn't look related, but I'll kick it off once more to see if the number of tests to review comes down. > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Status: Patch Available (was: Open) > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, > HDFS-14860.3.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager
[ https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936816#comment-16936816 ] David Mollitor commented on HDFS-14843: --- [~elgoiri] Are you able to help me out on this one too? > Double Synchronization in BlockReportLeaseManager > - > > Key: HDFS-14843 > URL: https://issues.apache.org/jira/browse/HDFS-14843 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14843.1.patch > > > {code:java|title=BlockReportLeaseManager.java} > private synchronized long getNextId() { > long id; > do { > id = nextId++; > } while (id == 0); > return id; > } > {code} > This is a private method and is synchronized; however, it is only accessed > from an already-synchronized method. No need to double-synchronize. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189 > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
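The fix proposed in HDFS-14843 can be sketched with a toy class (hypothetical names and starting value, not the actual patch): since the private helper is only reachable from methods that already hold the monitor, its own {{synchronized}} modifier can be dropped.

```java
// Toy illustration of removing double synchronization. The public
// entry point is synchronized; the private helper it calls is not,
// because the monitor is already held on every call path.
class LeaseIdGenerator {
  private long nextId = 1;

  public synchronized long register() {
    return getNextId();
  }

  // Not synchronized: only called from synchronized methods.
  private long getNextId() {
    long id;
    do {
      id = nextId++;
    } while (id == 0); // skip 0, mirroring the quoted snippet
    return id;
  }
}
```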
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936804#comment-16936804 ] David Mollitor commented on HDFS-14837: --- [~elgoiri] Are we good to move forward on this? > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch, HDFS-14837.2.patch, > HDFS-14837.3.patch, HDFS-14837.4.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14872) Read HDFS Blocks in Random Order
David Mollitor created HDFS-14872: - Summary: Read HDFS Blocks in Random Order Key: HDFS-14872 URL: https://issues.apache.org/jira/browse/HDFS-14872 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Affects Versions: 3.2.1, 2.8.5 Reporter: David Mollitor When the HDFS client is downloading (copying) an entire file, allow the client to download the blocks in random order. If a lot of clients are reading the same file, in parallel, they will all download the first block, the second block, and so on, stampeding down the line. It would be interesting to spread the load across all the available DataNodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
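The idea can be illustrated with a small sketch — this is not an HDFS client API, just a demonstration of randomizing the fetch order and reassembling by index:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Each reader shuffles the indices of the blocks it still needs, so
// parallel readers of the same file start on different DataNodes
// instead of all stampeding on block 0.
class RandomBlockOrder {
  static List<Integer> readOrder(int blockCount) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < blockCount; i++) {
      order.add(i);
    }
    Collections.shuffle(order);
    return order; // fetch blocks in this order, reassemble by index
  }
}
```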
[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936141#comment-16936141 ] David Mollitor commented on HDFS-14860: --- I'll submit as many times as it takes :) > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, > HDFS-14860.3.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Attachment: HDFS-14860.3.patch > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch, > HDFS-14860.3.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Status: Open (was: Patch Available) > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Attachment: HDFS-14864.3.patch > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Patch Available (was: Open) Same patch. Different name to kick off CI. > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch, > HDFS-14864.3.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Open (was: Patch Available) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}} which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Attachment: HDFS-14865.3.patch > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, > HDFS-14865.3.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Status: Open (was: Patch Available) > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, > HDFS-14865.3.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Status: Patch Available (was: Open) > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch, > HDFS-14865.3.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Status: Patch Available (was: Open) > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Status: Open (was: Patch Available) > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Attachment: HDFS-14865.2.patch > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch, HDFS-14865.2.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Patch Available (was: Open) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}}, which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
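The pattern the issue describes can be sketched as follows. This is a minimal, self-contained example, not the actual DatanodeDescriptor code; the class and method names here are illustrative only. It shows {{BlockingQueue.drainTo(Collection, int)}} draining a queue in one call and always returning a {{List}} (empty, never null):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainExample {
    // Drain up to maxItems from the queue in a single call.
    // Always returns a List -- an empty one when the queue is empty, never null.
    static <T> List<T> poll(BlockingQueue<T> queue, int maxItems) {
        List<T> results = new ArrayList<>();
        queue.drainTo(results, maxItems);
        return results;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.add("a");
        q.add("b");
        System.out.println(poll(q, 10)); // [a, b]
        System.out.println(poll(q, 10)); // [] -- empty list, not null
    }
}
```

Returning an empty list lets callers iterate the result unconditionally instead of null-checking first, which is the safety improvement the patch description argues for.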
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Attachment: HDFS-14864.2.patch > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch, HDFS-14864.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}}, which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Open (was: Patch Available) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}}, which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934738#comment-16934738 ] David Mollitor commented on HDFS-14864: --- Something seems up with the CI build. _TestHdfsNativeCodeLoader_ is a flaky failure for sure, and _shadedclient_ seems to be failing for all my recent patches. > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}}, which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14866) NameNode stopRequested is Marked volatile
[ https://issues.apache.org/jira/browse/HDFS-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14866: -- Status: Patch Available (was: Open) > NameNode stopRequested is Marked volatile > - > > Key: HDFS-14866 > URL: https://issues.apache.org/jira/browse/HDFS-14866 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Trivial > Attachments: HDFS-14866.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405 > "Used for testing" so not a big deal, but it's a bit odd that it's scoped as > 'protected' and is not 'volatile'. It could be accessed outside of a lock > and return a stale value. Tighten that up a little. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14866) NameNode stopRequested is Marked volatile
[ https://issues.apache.org/jira/browse/HDFS-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14866: -- Attachment: HDFS-14866.1.patch > NameNode stopRequested is Marked volatile > - > > Key: HDFS-14866 > URL: https://issues.apache.org/jira/browse/HDFS-14866 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Trivial > Attachments: HDFS-14866.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405 > "Used for testing" so not a big deal, but it's a bit odd that it's scoped as > 'protected' and is not 'volatile'. It could be accessed outside of a lock > and return a stale value. Tighten that up a little. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14866) NameNode stopRequested is Marked volatile
David Mollitor created HDFS-14866: - Summary: NameNode stopRequested is Marked volatile Key: HDFS-14866 URL: https://issues.apache.org/jira/browse/HDFS-14866 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L405 "Used for testing" so not a big deal, but it's a bit odd that it's scoped as 'protected' and is not 'volatile'. It could be accessed outside of a lock and return a stale value. Tighten that up a little. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
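The visibility problem described above can be sketched as follows. This is a minimal illustration, not the actual NameNode code; the class and field names are hypothetical. Without {{volatile}}, a thread reading the flag outside a lock may never observe another thread's write; marking it {{volatile}} guarantees the write is visible:

```java
// Sketch of a stop flag shared between threads. Declaring the field
// 'volatile' establishes a happens-before edge between the writer's
// stop() and any subsequent read, so readers cannot see a stale value.
public class StoppableWorker {
    private volatile boolean stopRequested = false;

    public void stop() {
        stopRequested = true; // immediately visible to all other threads
    }

    public boolean isStopRequested() {
        return stopRequested; // safe to read without holding a lock
    }
}
```

A worker loop such as {{while (!isStopRequested()) { ... }}} is then guaranteed to terminate after {{stop()}} is called, which is exactly the guard the issue proposes for the testing-only flag.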
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Status: Open (was: Patch Available) > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Attachment: HDFS-14860.2.patch > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14860: -- Status: Patch Available (was: Open) > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch, HDFS-14860.2.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14860) Clean Up StoragePolicySatisfyManager.java
[ https://issues.apache.org/jira/browse/HDFS-14860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934684#comment-16934684 ] David Mollitor commented on HDFS-14860: --- [~elgoiri] Are you talking about this? {code:java} LOG.debug("Storage policy satisfier service is running outside namenode," + " ignoring"); {code} The compiler will take care of that at compile time. Not to worry. > Clean Up StoragePolicySatisfyManager.java > - > > Key: HDFS-14860 > URL: https://issues.apache.org/jira/browse/HDFS-14860 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14860.1.patch > > > * Remove superfluous debug log guards > * Use {{java.util.concurrent}} package for internal structure instead of > external synchronization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
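The distinction the comment draws can be demonstrated with a small self-contained sketch (not Hadoop's actual SLF4J logging code; the names here are illustrative). Concatenation of two string literals is folded at compile time and costs nothing, but a guard still pays off when the log argument is expensive to build, such as the unbounded list mentioned in related issues:

```java
import java.util.function.Supplier;

public class LogGuardDemo {
    static int renderCalls = 0;

    // Stands in for an expensive argument, e.g. rendering a large list.
    static String renderExpensive() {
        renderCalls++;
        return "unbounded list of items";
    }

    // Guarded logging: the Supplier is only evaluated when the level is on,
    // so the expensive rendering is skipped entirely when debug is disabled.
    static void debug(boolean debugEnabled, Supplier<String> msg) {
        if (debugEnabled) {
            System.out.println("DEBUG " + msg.get());
        }
    }

    public static void main(String[] args) {
        debug(false, LogGuardDemo::renderExpensive); // supplier never invoked
        debug(true, LogGuardDemo::renderExpensive);  // invoked exactly once
        System.out.println(renderCalls); // 1
    }
}
```

SLF4J's parameterized form ({{LOG.debug("... {}", arg)}}) defers string formatting the same way, but the argument expression itself is still evaluated at the call site, which is why guards around expensive arguments remain useful while guards around constant strings are superfluous.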
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Attachment: HDFS-14865.1.patch > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Status: Patch Available (was: Open) > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Attachments: HDFS-14865.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14865) Reduce Synchronization in DatanodeManager
David Mollitor created HDFS-14865: - Summary: Reduce Synchronization in DatanodeManager Key: HDFS-14865 URL: https://issues.apache.org/jira/browse/HDFS-14865 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14865) Reduce Synchronization in DatanodeManager
[ https://issues.apache.org/jira/browse/HDFS-14865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14865: -- Flags: Patch > Reduce Synchronization in DatanodeManager > - > > Key: HDFS-14865 > URL: https://issues.apache.org/jira/browse/HDFS-14865 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14864) DatanodeDescriptor Use Concurrent BlockingQueue
[ https://issues.apache.org/jira/browse/HDFS-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HDFS-14864: -- Status: Patch Available (was: Open) > DatanodeDescriptor Use Concurrent BlockingQueue > --- > > Key: HDFS-14864 > URL: https://issues.apache.org/jira/browse/HDFS-14864 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14864.1.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L104-L106 > This collection needs to be thread safe and it needs to repeatedly poll the > queue to drain it, so use {{BlockingQueue}}, which has a {{drainTo()}} method > just for this purpose: > {quote} > This operation may be more efficient than repeatedly polling this queue. > {quote} > [https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#drainTo(java.util.Collection,%20int)] > Also, the collection returns 'null' if there is nothing to drain from the > queue. This is a confusing and error-prone effect. It should just return an > empty list. I've also updated the code to be more consistent and to return a > java {{List}} in all places instead of a {{List}} in some and a native array > in others. This will make the entire usage much more consistent and safe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org