[jira] [Updated] (HDFS-15638) Make Hive tables directory permission check flat
[ https://issues.apache.org/jira/browse/HDFS-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinli Shang updated HDFS-15638:
---
Description:
Problem: Currently, when a user tries to access a file, he/she needs the permissions of its parent and ancestors as well as the permission of the file itself. This is correct in general, but for Hive table directories/files, all the files under a partition or even a whole table usually have the same permissions for the same set of ACL groups. Although the permissions and ACL groups are the same, the writer still needs to call setfacl() for every file. This results in a huge number of RPC calls to the NN. HDFS has default ACLs to address this, but they apply only to create and copy, not to rename. In Hive ETL, however, rename is very common.

Proposal: Add a 1-bit flag to directory inodes to indicate whether or not the directory is a Hive table directory. If that flag is set, then all the sub-directories and files under it simply use its permission and ACL group settings. This way, Hive ETL does not need to set permissions at the file level. If that flag is not set (the default), behave as before. Setting/unsetting the flag would require admin privilege.

was:
Problem: Currently, when a user tries to access a file, he/she needs not only the permission of that file but also the permissions of its parent and ancestors. This is correct, but for Hive table directories/files, all the files under a partition or even a whole table usually have the same permissions for the same set of ACL groups. Although the permissions and ACL groups are the same, the writer sometimes still needs to call setfacl() for every file. This results in a huge number of RPC calls to the NN. HDFS has default ACLs to address this, but they apply only to create and copy, not to rename. In Hive ETL, however, rename is very common.

Proposal: Add a 1-bit flag to directory inodes to indicate whether or not the directory is a Hive table directory. If that flag is set, then all the sub-directories and files under it simply use its permission and ACL group settings. This way, Hive ETL does not need to set permissions at the file level. If that flag is not set (the default), behave as before. Setting/unsetting the flag would require admin privilege.

> Make Hive tables directory permission check flat
> -
>
> Key: HDFS-15638
> URL: https://issues.apache.org/jira/browse/HDFS-15638
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Xinli Shang
> Priority: Major
>
> Problem: Currently, when a user tries to access a file, he/she needs the permissions of its parent and ancestors as well as the permission of the file itself. This is correct in general, but for Hive table directories/files, all the files under a partition or even a whole table usually have the same permissions for the same set of ACL groups. Although the permissions and ACL groups are the same, the writer still needs to call setfacl() for every file. This results in a huge number of RPC calls to the NN. HDFS has default ACLs to address this, but they apply only to create and copy, not to rename. In Hive ETL, however, rename is very common.
> Proposal: Add a 1-bit flag to directory inodes to indicate whether or not the directory is a Hive table directory. If that flag is set, then all the sub-directories and files under it simply use its permission and ACL group settings. This way, Hive ETL does not need to set permissions at the file level. If that flag is not set (the default), behave as before. Setting/unsetting the flag would require admin privilege.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
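For illustration of the default-ACL limitation described above, here is a minimal sketch (not from the report; the paths and the group name "hive_etl" are hypothetical) showing that a default ACL on the table directory is inherited on create/copy but leaves a renamed-in file's ACL untouched, which is what forces the per-file setfacl() calls:

{code:java}
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class DefaultAclSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path table = new Path("/warehouse/db/tbl");   // hypothetical table directory

    // Default ACL entry: inherited by files *created* directly under the directory.
    AclEntry defaultGroup = new AclEntry.Builder()
        .setScope(AclEntryScope.DEFAULT)
        .setType(AclEntryType.GROUP)
        .setName("hive_etl")                      // hypothetical group
        .setPermission(FsAction.READ_EXECUTE)
        .build();
    fs.modifyAclEntries(table, Collections.singletonList(defaultGroup));

    // A rename into the directory keeps the source file's existing ACL, so the
    // writer still has to fix up permissions per file after every rename.
    fs.rename(new Path("/tmp/staging/part-00000"), new Path(table, "part-00000"));
  }
}
{code}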
[jira] [Created] (HDFS-15638) Make Hive tables directory permission check flat
Xinli Shang created HDFS-15638:
--
Summary: Make Hive tables directory permission check flat
Key: HDFS-15638
URL: https://issues.apache.org/jira/browse/HDFS-15638
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: Xinli Shang

Problem: Currently, when a user tries to access a file, he/she needs not only the permission of that file but also the permissions of its parent and ancestors. This is correct, but for Hive table directories/files, all the files under a partition or even a whole table usually have the same permissions for the same set of ACL groups. Although the permissions and ACL groups are the same, the writer sometimes still needs to call setfacl() for every file. This results in a huge number of RPC calls to the NN. HDFS has default ACLs to address this, but they apply only to create and copy, not to rename. In Hive ETL, however, rename is very common.

Proposal: Add a 1-bit flag to directory inodes to indicate whether or not the directory is a Hive table directory. If that flag is set, then all the sub-directories and files under it simply use its permission and ACL group settings. This way, Hive ETL does not need to set permissions at the file level. If that flag is not set (the default), behave as before. Setting/unsetting the flag would require admin privilege.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215697#comment-17215697 ]

Ahmed Hussein commented on HDFS-15618:
--
I checked the failing JUnit tests. They are unrelated to the patch. I will file new Jiras for those tests, which seem to have been broken for some time.

On multiple occasions I have started to notice a "domino effect" in HDFS tests: a test that fails or times out causes other tests to fail because they cannot bind to a port or cannot get enough resources. An example is testRead() in TestBlockTokenWithDFSStriped and TestBlockTokenWithDFS, where port 19870 is used by both test cases.

The following is the stack trace of the TestDeadNodeDetection failure reported by hadoopQA.

{code:bash}
java.net.BindException: Problem binding to [localhost:44881] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:908) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:809) at org.apache.hadoop.ipc.Server.bind(Server.java:640) at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1210) at org.apache.hadoop.ipc.Server.<init>(Server.java:3103) at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1039) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:430) at org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:350) at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:848) at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:1031) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1452) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:513) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2868) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2774) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2818) at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2494) at org.apache.hadoop.hdfs.TestDeadNodeDetection.testDeadNodeDetectionDeadNodeRecovery(TestDeadNodeDetection.java:226) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at
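As an aside on the port-conflict point above, a minimal sketch (not part of this patch) of the usual way tests avoid hard-coded ports: let MiniDFSCluster bind to ephemeral ports and ask the cluster which port it actually got:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class EphemeralPortClusterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Port 0 lets the OS pick a free ephemeral port, so concurrent test JVMs
    // do not collide the way a hard-coded port such as 19870 can.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .nameNodePort(0)
        .nameNodeHttpPort(0)
        .numDataNodes(1)
        .build();
    try {
      System.out.println("NameNode RPC port: " + cluster.getNameNodePort());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}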
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215691#comment-17215691 ] Hadoop QA commented on HDFS-15618: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 2s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 4s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 58s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 45s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:red}-1{color} | {color:red} unit {color} | {color:red}
[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215676#comment-17215676 ]

Stephen O'Donnell commented on HDFS-15634:
--
The issue you have seen is a fairly extreme example - not too many clusters will have 200 nodes decommissioned at the same time, I suspect. The empty-node problem is a valid concern on smaller clusters, and in the decommission-recommission case the empty node may never catch up to the other nodes, as the default block placement policy picks nodes randomly.

The long lock hold is a problem even when a node (or perhaps a rack) goes dead unexpectedly. I think it would be better to try to fix that generally, rather than doing something special for decommission. I am also not really comfortable changing the current "non destructive" decommission flow to something that removes the blocks from the DN.

If you look at HeartBeatManager.heartbeatCheck(...), it seems to handle only 1 DN as dead on each check interval. The check interval is either 5 minutes by default, or 30 seconds if "dfs.namenode.avoid.write.stale.datanode" is true. It then ultimately calls BlockManager.removeBlocksAssociatedTo(...), which does the expensive work of removing the blocks. In that method, I wonder if we could drop and re-take the write lock periodically so this does not hold the lock for too long?

{code}
  /** Remove the blocks associated to the given datanode. */
  void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
    providedStorageMap.removeDatanode(node);
    for (DatanodeStorageInfo storage : node.getStorageInfos()) {
      final Iterator<BlockInfo> it = storage.getBlockIterator();
      // add the BlockInfos to a new collection as the
      // returned iterator is not modifiable.
      Collection<BlockInfo> toRemove = new ArrayList<>();
      while (it.hasNext()) {
        toRemove.add(it.next());
      }
      // Could we drop and re-take the write lock in this loop every 1000 blocks?
      for (BlockInfo b : toRemove) {
        removeStoredBlock(b, node);
      }
    }
    // Remove all pending DN messages referencing this DN.
    pendingDNMessages.removeAllMessagesForDatanode(node);
    node.resetBlocks();
    invalidateBlocks.remove(node);
  }
{code}

We see some nodes with 5M, 10M or even more blocks sometimes, so this would help them in general. I am not sure whether there would be any negative consequences of dropping and retaking the write lock in this scenario?

> Invalidate block on decommissioning DataNode after replication
> --
>
> Key: HDFS-15634
> URL: https://issues.apache.org/jira/browse/HDFS-15634
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Attachments: write lock.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Right now when a DataNode starts decommission, the Namenode will mark it as decommissioning and its blocks will be replicated over to different DataNodes, then it is marked as decommissioned. These blocks are not touched since they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow caused a Namenode latency spike, since the namenode needs to remove all of the blocks from its memory and this step requires holding the write lock. If we had gradually invalidated these blocks, the deletion would be much easier and faster.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
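As a follow-up to the question in the code comment above, a rough sketch (not a tested patch; the batch size and helper shape are assumptions) of what periodically dropping and re-taking the FSNamesystem write lock inside that loop could look like:

{code:java}
// Sketch only: assumes it lives in BlockManager, where "namesystem" exposes
// writeLock()/writeUnlock() and removeStoredBlock() is available.
private static final int REMOVE_BATCH_SIZE = 1000;  // assumed batch size

void removeStoredBlocksInBatches(Collection<BlockInfo> toRemove,
                                 DatanodeDescriptor node) {
  int sinceLastYield = 0;
  namesystem.writeLock();
  try {
    for (BlockInfo b : toRemove) {
      removeStoredBlock(b, node);
      if (++sinceLastYield >= REMOVE_BATCH_SIZE) {
        // Briefly release the lock so queued RPC handlers can make progress.
        namesystem.writeUnlock();
        namesystem.writeLock();
        sinceLastYield = 0;
      }
    }
  } finally {
    namesystem.writeUnlock();
  }
}
{code}

One open question with this shape is exactly the concern raised above: while the lock is released, other operations can observe a datanode whose blocks are only partially removed, so any state read between batches would need to tolerate that.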
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215654#comment-17215654 ] Fengnan Li commented on HDFS-15634: --- Thanks for the comment [~sodonnell]. More context here: We were decommissioning to swap with better hardware so these datanodes would not be used anymore. We are running 2.8.2 with about 350K blocks on each datanode after they are decomed. We stopped ~200 datanodes at once (sounds crazy... and it does) I attached the graph for the writelock at that time. !write lock.png! The goal of the whole ticket is not really about whether there will be missing block or not. And I don't think there will be unless you are decommissioning datanodes with all replicas at the same time which is out of the discussion. What I am proposing is to mitigate the impact to namenode performance. From this perspective, recommissioning a datanode with full blocks or stopping the node to have namenode clean up all the blocks at once are not ideal. Balancing is actually not a concern for a large cluster (>3k datanodes) with high traffic since other blocks will soon fill up this new datanode. > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Attachments: write lock.png > > Time Spent: 1h > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fengnan Li updated HDFS-15634: -- Attachment: write lock.png > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Attachments: write lock.png > > Time Spent: 1h > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15630) RBF: Fix wrong client IP info in CallerContext when requests mount points with multi-destinations.
[ https://issues.apache.org/jira/browse/HDFS-15630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215650#comment-17215650 ] Íñigo Goiri commented on HDFS-15630: In addition to [~ferhui] comments, a couple minor ones. Let's put the full javadoc for the old invokeMethod signature and just add "Set clientIp and callerContext get from server context." at the end of the description. > RBF: Fix wrong client IP info in CallerContext when requests mount points > with multi-destinations. > -- > > Key: HDFS-15630 > URL: https://issues.apache.org/jira/browse/HDFS-15630 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Attachments: HDFS-15630.001.patch, HDFS-15630.002.patch, > HDFS-15630.003.patch > > > There are two issues about client IP info in CallerContext when we try to > request mount points with multi-destinations. > # the clientIp would duplicate in CallerContext when > RouterRpcClient#invokeSequential. > # the clientIp would miss in CallerContext when > RouterRpcClient#invokeConcurrent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14383) Compute datanode load based on StoragePolicy
[ https://issues.apache.org/jira/browse/HDFS-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215648#comment-17215648 ]

Íñigo Goiri commented on HDFS-14383:
[^HDFS-14383-02.patch] LGTM. +1
[~LiJinglun], can you verify this solves your use case too?

> Compute datanode load based on StoragePolicy
> 
>
> Key: HDFS-14383
> URL: https://issues.apache.org/jira/browse/HDFS-14383
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, namenode
> Affects Versions: 2.7.3, 3.1.2
> Reporter: Karthik Palanisamy
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14383-01.patch, HDFS-14383-02.patch
>
> The datanode load check logic needs to be changed because the existing computation does not consider the StoragePolicy.
> DatanodeManager#getInServiceXceiverAverage
> {code}
> public double getInServiceXceiverAverage() {
>   double avgLoad = 0;
>   final int nodes = getNumDatanodesInService();
>   if (nodes != 0) {
>     final int xceivers = heartbeatManager.getInServiceXceiverCount();
>     avgLoad = (double) xceivers / nodes;
>   }
>   return avgLoad;
> }
> {code}
>
> For example: with 10 nodes (HOT) averaging 50 xceivers and 90 nodes (COLD) averaging 10 xceivers, the threshold calculated by the NN is 28 (((500 + 900)/100)*2), which means those 10 nodes (the whole HOT tier) become unavailable while the COLD tier nodes are barely in use. Turning this check off helps to mitigate the issue; however, dfs.namenode.replication.considerLoad helps to "balance" the load of the DNs, so turning it off can lead to situations where specific DNs are "overloaded".

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
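To make the arithmetic in the example above concrete, a small self-contained sketch (not the attached patch) contrasting the current cluster-wide threshold with a per-tier threshold; the tier bookkeeping here is purely illustrative:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class PerTierXceiverLoad {
  /** Average xceivers per node for one tier; 0 if the tier is empty. */
  static double avg(int totalXceivers, int nodes) {
    return nodes == 0 ? 0 : (double) totalXceivers / nodes;
  }

  public static void main(String[] args) {
    // Numbers from the example: 10 HOT nodes averaging 50 xceivers each,
    // 90 COLD nodes averaging 10 xceivers each.
    Map<String, int[]> tiers = new LinkedHashMap<>();
    tiers.put("HOT", new int[]{10, 500});   // {nodes, total xceivers}
    tiers.put("COLD", new int[]{90, 900});

    // Today's cluster-wide threshold: ((500 + 900) / 100) * 2 = 28, so every
    // HOT node (load ~50) is rejected even though COLD nodes are nearly idle.
    double clusterThreshold = 2 * avg(500 + 900, 10 + 90);
    System.out.println("cluster-wide threshold = " + clusterThreshold);

    // A per-tier threshold keeps HOT usable: HOT -> 100, COLD -> 20.
    for (Map.Entry<String, int[]> e : tiers.entrySet()) {
      double tierThreshold = 2 * avg(e.getValue()[1], e.getValue()[0]);
      System.out.println(e.getKey() + " threshold = " + tierThreshold);
    }
  }
}
{code}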
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215646#comment-17215646 ]

Ahmed Hussein commented on HDFS-15618:
--
Thanks [~kihwal]! I think that's a good idea, to have a default value for the MiniDFSCluster. I added a default value in the {{MiniDFSCluster}} builder.

> Improve datanode shutdown latency
> -
>
> Key: HDFS-15618
> URL: https://issues.apache.org/jira/browse/HDFS-15618
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch
>
> The shutdown of Datanode is a very long latency. A block scanner waits for 5 minutes to join on each VolumeScanner thread.
> Since the scanners are daemon threads and do not alter the block content, it is safe to ignore such conditions on shutdown of Datanode.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15618: - Attachment: HDFS-15618.004.patch > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, > HDFS-15618.003.patch, HDFS-15618.004.patch > > > The shutdown of Datanode is a very long latency. A block scanner waits for 5 > minutes to join on each VolumeScanner thread. > Since the scanners are daemon threads and do not alter the block content, it > is safe to ignore such conditions on shutdown of Datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15628) HttpFS server throws NPE if a file is a symlink
[ https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215497#comment-17215497 ] Kihwal Lee commented on HDFS-15628: --- I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for working on this, [~ahussein]. > HttpFS server throws NPE if a file is a symlink > --- > > Key: HDFS-15628 > URL: https://issues.apache.org/jira/browse/HDFS-15628 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, httpfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch > > > If a directory containing a symlink is listed, the client (WebHfdsFileSystem) > blows up with a NPE. If {{type}} is {{SYMLINK}}, there must be {{symlink}} > field whose value is the link target string. HttpFS returns a response > without {{symlink}} filed. {{WebHfdsFileSystem}} assumes it is there for a > symlink and blindly tries to parse it, causing NPE. > This is not an issue if the destination cluster does not have symlinks > enabled. > > {code:bash} > java.io.IOException: localhost:55901: Response decoding failure: > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95) > at > org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106) > at > org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42) > at > org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74) > at > org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at
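For readers following the stack trace above, a minimal, hypothetical sketch of the null-safe handling being discussed: the client should not assume the {{symlink}} field is present just because {{type}} is {{SYMLINK}} (the JSON is represented here as a plain Map rather than the real WebHDFS parsing classes):

{code:java}
import java.util.Map;

class SymlinkFieldSketch {
  /** Returns the symlink target, or null if the entry is not a symlink or the field is absent. */
  static String symlinkTargetOrNull(Map<String, Object> fileStatusJson) {
    if (!"SYMLINK".equals(fileStatusJson.get("type"))) {
      return null;                                    // not a symlink at all
    }
    Object target = fileStatusJson.get("symlink");
    return target == null ? null : target.toString(); // tolerate a missing field
  }
}
{code}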
[jira] [Updated] (HDFS-15628) HttpFS server throws NPE if a file is a symlink
[ https://issues.apache.org/jira/browse/HDFS-15628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-15628: -- Fix Version/s: 3.1.5 > HttpFS server throws NPE if a file is a symlink > --- > > Key: HDFS-15628 > URL: https://issues.apache.org/jira/browse/HDFS-15628 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, httpfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15628.001.patch, HDFS-15628.002.patch > > > If a directory containing a symlink is listed, the client (WebHfdsFileSystem) > blows up with a NPE. If {{type}} is {{SYMLINK}}, there must be {{symlink}} > field whose value is the link target string. HttpFS returns a response > without {{symlink}} filed. {{WebHfdsFileSystem}} assumes it is there for a > symlink and blindly tries to parse it, causing NPE. > This is not an issue if the destination cluster does not have symlinks > enabled. > > {code:bash} > java.io.IOException: localhost:55901: Response decoding failure: > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathResponseRunner.getResponse(WebHdfsFileSystem.java:967) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:816) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:638) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:676) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:672) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1731) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testListSymLinkStatus(BaseTestHttpFSWith.java:388) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.operation(BaseTestHttpFSWith.java:1230) > at > org.apache.hadoop.fs.http.client.BaseTestHttpFSWith.testOperation(BaseTestHttpFSWith.java:1363) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.apache.hadoop.test.TestHdfsHelper$HdfsStatement.evaluate(TestHdfsHelper.java:95) > at > org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106) > at > org.apache.hadoop.test.TestExceptionHelper$1.evaluate(TestExceptionHelper.java:42) > at > org.apache.hadoop.test.TestJettyHelper$1.evaluate(TestJettyHelper.java:74) > at > org.apache.hadoop.test.TestDirHelper$1.evaluate(TestDirHelper.java:106) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at
[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks
[ https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-15627:
--
Fix Version/s: 3.2.3, 3.1.5, 3.4.0, 3.3.1
Hadoop Flags: Reviewed
Resolution: Fixed
Status: Resolved (was: Patch Available)

I've committed this to trunk, branch-3.3, branch-3.2 and branch-3.1. Thanks for working on this, [~ahussein].

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: logging, namenode
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15627.001.patch
>
> Deletes currently collect blocks in the write lock, write the edit, incrementally block delete, and finally +audit log+. It should be: collect blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is consistent to audit log the delete. There is no sense in deferring the audit into the indeterminate future.
> The problem occurred when the server hung due to large deletes, but it was not easy to identify the cause. It should have been easily identified as the first delete logged after the hang.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15627) Audit log deletes before collecting blocks
[ https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-15627:
--
Summary: Audit log deletes before collecting blocks (was: Audit log deletes after edit is written)

> Audit log deletes before collecting blocks
> --
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: logging, namenode
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Attachments: HDFS-15627.001.patch
>
> Deletes currently collect blocks in the write lock, write the edit, incrementally block delete, and finally +audit log+. It should be: collect blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is consistent to audit log the delete. There is no sense in deferring the audit into the indeterminate future.
> The problem occurred when the server hung due to large deletes, but it was not easy to identify the cause. It should have been easily identified as the first delete logged after the hang.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15627) Audit log deletes after edit is written
[ https://issues.apache.org/jira/browse/HDFS-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215476#comment-17215476 ]

Kihwal Lee commented on HDFS-15627:
--
+1 lgtm

> Audit log deletes after edit is written
> ---
>
> Key: HDFS-15627
> URL: https://issues.apache.org/jira/browse/HDFS-15627
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: logging, namenode
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Attachments: HDFS-15627.001.patch
>
> Deletes currently collect blocks in the write lock, write the edit, incrementally block delete, and finally +audit log+. It should be: collect blocks, edit log, +audit log+, incremental delete. Once the edit is durable it is consistent to audit log the delete. There is no sense in deferring the audit into the indeterminate future.
> The problem occurred when the server hung due to large deletes, but it was not easy to identify the cause. It should have been easily identified as the first delete logged after the hang.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215454#comment-17215454 ] Kihwal Lee commented on HDFS-15618: --- It can be as fancy as adding a builder method for setting it (for exceptional cases) with a default value of 30 seconds. Or simply set it to 30 in places like {{startDataNodes()}}. > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, > HDFS-15618.003.patch > > > The shutdown of Datanode is a very long latency. A block scanner waits for 5 > minutes to join on each VolumeScanner thread. > Since the scanners are daemon threads and do not alter the block content, it > is safe to ignore such conditions on shutdown of Datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
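A hedged sketch of the suggestion above (the configuration key name below is made up for illustration; the real key is whichever one the HDFS-15618 patch introduces):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

class MiniDfsClusterScannerTimeoutSketch {
  // Hypothetical key name standing in for the volume-scanner join timeout.
  static final String VOLUME_SCANNER_JOIN_TIMEOUT_KEY =
      "dfs.datanode.volume.scanner.join.timeout";

  /**
   * Apply a short default for tests, e.g. from MiniDFSCluster's builder or
   * startDataNodes(), without clobbering an explicit per-test override.
   */
  static void applyTestDefault(Configuration conf) {
    if (conf.get(VOLUME_SCANNER_JOIN_TIMEOUT_KEY) == null) {
      conf.setTimeDuration(VOLUME_SCANNER_JOIN_TIMEOUT_KEY, 30, TimeUnit.SECONDS);
    }
  }
}
{code}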
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215452#comment-17215452 ] Kihwal Lee commented on HDFS-15618: --- In production, there is no harm in exiting after waiting for 5 seconds. But in junit, as you pointed out, it might cause more failures when the environment is slow. We can set the timeout to something like 30 seconds in the mini dfs cluster's base config. > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15618.001.patch, HDFS-15618.002.patch, > HDFS-15618.003.patch > > > The shutdown of Datanode is a very long latency. A block scanner waits for 5 > minutes to join on each VolumeScanner thread. > Since the scanners are daemon threads and do not alter the block content, it > is safe to ignore such conditions on shutdown of Datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?focusedWorklogId=501492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501492 ] ASF GitHub Bot logged work on HDFS-15635: - Author: ASF GitHub Bot Created on: 16/Oct/20 10:01 Start Date: 16/Oct/20 10:01 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2389: URL: https://github.com/apache/hadoop/pull/2389#issuecomment-709951351 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 6s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 14s | | trunk passed | | +1 :green_heart: | compile | 21m 17s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | compile | 17m 49s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | checkstyle | 0m 46s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 28s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 37s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +0 :ok: | spotbugs | 2m 34s | | Used deprecated FindBugs config; considering switching to SpotBugs. | | +1 :green_heart: | findbugs | 2m 32s | | trunk passed | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 57s | | the patch passed | | +1 :green_heart: | compile | 23m 40s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | -1 :x: | javac | 23m 40s | [/diff-compile-javac-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/diff-compile-javac-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt) | root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 generated 1 new + 2055 unchanged - 1 fixed = 2056 total (was 2056) | | +1 :green_heart: | compile | 22m 36s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | javac | 22m 36s | | the patch passed | | -0 :warning: | checkstyle | 0m 49s | [/diff-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 8 unchanged - 0 fixed = 15 total (was 8) | | +1 :green_heart: | mvnsite | 1m 39s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | shadedclient | 19m 20s | | patch has no errors when building and testing our client artifacts. 
| | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 | | +1 :green_heart: | javadoc | 1m 41s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 | | +1 :green_heart: | findbugs | 3m 6s | | the patch passed | _ Other Tests _ | | -1 :x: | unit | 11m 2s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. | | | | 186m 15s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.fs.viewfs.TestViewFSOverloadSchemeCentralMountTableConfig | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2389/1/artifact/out/Dockerfile | | GITHUB PR |
[jira] [Updated] (HDFS-15637) Viewfs should make mount-table to read from central place
[ https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated HDFS-15637: Component/s: viewfs > Viewfs should make mount-table to read from central place > - > > Key: HDFS-15637 > URL: https://issues.apache.org/jira/browse/HDFS-15637 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfs >Reporter: Junfan Zhang >Priority: Major > > Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637]. > Viewfs should make mount-table to read from central place to solve the > problem of difficult mount-table conf update -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15637) Viewfs should make mount-table to read from central place
[ https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215296#comment-17215296 ]

Junfan Zhang edited comment on HDFS-15637 at 10/16/20, 9:42 AM:
Hi [~umamaheswararao], [~shv], [~abhishekd], [~hexiaoqiao]. Could you please take a look at this issue? Thanks. If that is OK, I will take it over.

was (Author: zuston): Hi [~umamaheswararao]. Could you please take a look at this issue? Thanks. If that is OK, I will take it over.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> ViewFs should read the mount table from a central place, to solve the problem of mount-table configuration being difficult to update.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15637) Viewfs should make mount-table to read from central place
[ https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215296#comment-17215296 ]

Junfan Zhang commented on HDFS-15637:
-
Hi [~umamaheswararao]. Could you please take a look at this issue? Thanks. If that is OK, I will take it over.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> ViewFs should read the mount table from a central place, to solve the problem of mount-table configuration being difficult to update.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15637) Viewfs should make mount-table to read from central place
[ https://issues.apache.org/jira/browse/HDFS-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junfan Zhang updated HDFS-15637:
Description:
Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637]. ViewFs should read the mount table from a central place, to solve the problem of mount-table configuration being difficult to update.

> Viewfs should make mount-table to read from central place
> -
>
> Key: HDFS-15637
> URL: https://issues.apache.org/jira/browse/HDFS-15637
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Priority: Major
>
> Like [HDFS-15637|https://issues.apache.org/jira/browse/HDFS-15637].
> ViewFs should read the mount table from a central place, to solve the problem of mount-table configuration being difficult to update.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15637) Viewfs should make mount-table to read from central place
Junfan Zhang created HDFS-15637: --- Summary: Viewfs should make mount-table to read from central place Key: HDFS-15637 URL: https://issues.apache.org/jira/browse/HDFS-15637 Project: Hadoop HDFS Issue Type: Improvement Reporter: Junfan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215288#comment-17215288 ] Stephen O'Donnell edited comment on HDFS-15634 at 10/16/20, 9:27 AM: - {quote} Proposal: Invalidate these blocks once they are replicated and there are enough live replicas in the cluster. {quote} Looking at the PR, you are adding these blocks to addToInvalidates(...) which will actually remove the replicas from the DNs. I am not sure this is a good idea, for a few reasons: 1. Right now, a decommissioned DN is untouched by the process - if something goes wrong with decommission (which we have seen happen) we can just recommission the node again and know all the blocks are still safely present. 2. I seem to recall there are some edge cases where a decommissioned but still online replica can be read. 3. On some clusters, nodes are decommissioned for maintenance (yes, they should use maintenance mode, but some don't) such as OS upgrades and then recommissioned. In these cases, when the DN rejoins, the blocks will become over replicated and then the NN will remove replicas randomly. This is arguably better than adding back an empty node, which may require running the balancer to move data onto it. If we remove the blocks from the DN while it is decommissioning, then on recommission we can only ever add back an empty node. {quote} A recent shutdown of decommissioned datanodes to finished the flow caused Namenode latency spike since namenode needs to remove all of the blocks from its memory and this step requires holding write lock. If we have gradually invalidated these blocks the deletion will be much easier and faster. {quote} 1. What version were you running when you saw this problem? 2. How many blocks approximately were on the DNs which were stopped after decommission completed? 3. How many decommissioned hosts were stopped when this happened? 4. How long did the NN hold the write lock for approximately? I am wondering if there would be a better way to handle this, possibly yielding the write lock which removing the blocks periodically, as this same problem would exist for a node going dead unexpectedly and not just during decommission. was (Author: sodonnell): {quote} Proposal: Invalidate these blocks once they are replicated and there are enough live replicas in the cluster. {quote} Looking at the PR, you are adding these blocks to addToInvalidates(...) which will actually remove the replicas from the DNs. I am not sure this is a good idea, for a few reasons: 1. Right now, a decommissioned DN is untouched by the process - if something goes wrong with decommission (which we have seen happen) we can just recommission the node again and know all the blocks are still safely present. 2. I seem to recall there are some edge cases where a decommissioned but still online replica can be read. 3. On some clusters, nodes are decommissioned for maintenance (yes, they should use maintenance mode, but some don't) such as OS upgrades and then recommissioned. In these cases, when the DN rejoins, the blocks will become over replicated and then the NN will remove replicas randomly. This is arguably better than adding back an empty node, which may require running the balancer to move data onto it. If we remove the blocks from the DN while it is decommissioning, then on recommission we can only ever add back an empty node. 
{quote} A recent shutdown of decommissioned datanodes to finished the flow caused Namenode latency spike since namenode needs to remove all of the blocks from its memory and this step requires holding write lock. If we have gradually invalidated these blocks the deletion will be much easier and faster. {quote} What version were you running when you saw this problem? How many blocks approximately were on the DNs which were stopped after decommission completed? How many decommissioned hosts were stopped when this happened? I am wondering if there would be a better way to handle this, possibly yielding the write lock which removing the blocks periodically, as this same problem would exist for a node going dead unexpectedly and not just during decommission. > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different
[jira] [Commented] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215288#comment-17215288 ] Stephen O'Donnell commented on HDFS-15634: -- {quote} Proposal: Invalidate these blocks once they are replicated and there are enough live replicas in the cluster. {quote} Looking at the PR, you are adding these blocks to addToInvalidates(...) which will actually remove the replicas from the DNs. I am not sure this is a good idea, for a few reasons: 1. Right now, a decommissioned DN is untouched by the process - if something goes wrong with decommission (which we have seen happen) we can just recommission the node again and know all the blocks are still safely present. 2. I seem to recall there are some edge cases where a decommissioned but still online replica can be read. 3. On some clusters, nodes are decommissioned for maintenance (yes, they should use maintenance mode, but some don't) such as OS upgrades and then recommissioned. In these cases, when the DN rejoins, the blocks will become over replicated and then the NN will remove replicas randomly. This is arguably better than adding back an empty node, which may require running the balancer to move data onto it. If we remove the blocks from the DN while it is decommissioning, then on recommission we can only ever add back an empty node. {quote} A recent shutdown of decommissioned datanodes to finished the flow caused Namenode latency spike since namenode needs to remove all of the blocks from its memory and this step requires holding write lock. If we have gradually invalidated these blocks the deletion will be much easier and faster. {quote} What version were you running when you saw this problem? How many blocks approximately were on the DNs which were stopped after decommission completed? How many decommissioned hosts were stopped when this happened? I am wondering if there would be a better way to handle this, possibly yielding the write lock while removing the blocks periodically, as this same problem would exist for a node going dead unexpectedly and not just during decommission. > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
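For context on the lock-yielding alternative raised in the comment above, the sketch below shows the general batching pattern in Java. It is purely illustrative, not NameNode code: the lock is a plain ReentrantReadWriteLock rather than the FSNamesystem lock, and removeBlock() and the batch size are hypothetical placeholders.
{code}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BatchedBlockRemovalSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Hypothetical per-block cleanup; stands in for the real NameNode bookkeeping.
  private void removeBlock(long blockId) {
    // update block map, invalidation queues, etc.
  }

  // Remove blocks in fixed-size batches, releasing the write lock between
  // batches so queued operations can make progress instead of waiting out
  // one long lock hold.
  public void removeBlocksWithYield(Iterator<Long> blockIds, int batchSize) {
    while (blockIds.hasNext()) {
      lock.writeLock().lock();
      try {
        for (int i = 0; i < batchSize && blockIds.hasNext(); i++) {
          removeBlock(blockIds.next());
        }
      } finally {
        lock.writeLock().unlock(); // yield point between batches
      }
    }
  }
}
{code}
The same pattern would apply whether the blocks belong to a stopped decommissioned node or to a node that went dead unexpectedly, which is the point made in the comment.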
[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated HDFS-15635: Description: According to HDFS-15289, the default mountable loader is {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. In some scenarios, users want to implement the mount table loader by themselves, so it is necessary to dynamically configure the loader. cc [~shv], [~abhishekd], [~hexiaoqiao] was: According to HDFS-15289, the default mountable loader is {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. In some scenarios, users want to implement the mount table loader by themselves, so it is necessary to dynamically configure the loader. cc [~shv], [~abhishekd] > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. > > cc [~shv], [~abhishekd], [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215223#comment-17215223 ] Junfan Zhang edited comment on HDFS-15635 at 10/16/20, 8:26 AM: Hi [~umamaheswararao] Can you please take a look at the PR, thanks. was (Author: zuston): Hi [~umamaheswararao] Please review it.Thanks~ > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. > > cc [~shv], [~abhishekd] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated HDFS-15635: Description: According to HDFS-15289, the default mountable loader is {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. In some scenarios, users want to implement the mount table loader by themselves, so it is necessary to dynamically configure the loader. cc [~shv], [~abhishekd] was: According to HDFS-15289, the default mountable loader is {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. In some scenarios, users want to implement the mount table loader by themselves, so it is necessary to dynamically configure the loader. > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. > > cc [~shv], [~abhishekd] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15636) NameNode computes load by group when choosing datanodes.
[ https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun resolved HDFS-15636. Resolution: Duplicate Duplicate with HDFS-14383 > NameNode computes load by group when choosing datanodes. > > > Key: HDFS-15636 > URL: https://issues.apache.org/jira/browse/HDFS-15636 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd > datanodes. The HOT files are stored with ALL_SSD and cold ones are stored > with HOT. There is a big chance the NameNode couldn't choose enough nodes for > writing disk files(with storage policy HOT) because of 'NODE_TOO_BUSY'. A > temporary solution is to increase the > 'dfs.namenode.redundancy.considerLoad.factor'. But that may cause the > unbalance of load of all the datanodes. > We should let the NameNode compute load by group. The ssd nodes and hdd nodes > are computed separately and each group has its own average load. When the > NameNode chooses a hdd node it only compares the node's load with > the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14383) Compute datanode load based on StoragePolicy
[ https://issues.apache.org/jira/browse/HDFS-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215242#comment-17215242 ] Jinglun commented on HDFS-14383: I met the same problem recently. This patch makes sense to me. Thanks [~ayushtkn] for your work. > Compute datanode load based on StoragePolicy > > > Key: HDFS-14383 > URL: https://issues.apache.org/jira/browse/HDFS-14383 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.7.3, 3.1.2 >Reporter: Karthik Palanisamy >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14383-01.patch, HDFS-14383-02.patch > > > Datanode load check logic needs to be changed because existing computation > will not consider StoragePolicy. > DatanodeManager#getInServiceXceiverAverage > {code} > public double getInServiceXceiverAverage() { > double avgLoad = 0; > final int nodes = getNumDatanodesInService(); > if (nodes != 0) { > final int xceivers = heartbeatManager > .getInServiceXceiverCount(); > avgLoad = (double)xceivers/nodes; > } > return avgLoad; > } > {code} > > For example: with 10 nodes (HOT), average 50 xceivers and 90 nodes (COLD) > with average 10 xceivers the calculated threshold by the NN is 28 (((500 + > 900)/100)*2), which means those 10 nodes (the whole HOT tier) becomes > unavailable when the COLD tier nodes are barely in use. Turning this check > off helps to mitigate this issue, however the > dfs.namenode.replication.considerLoad helps to "balance" the load of the DNs, > upon turning it off can lead to situations where specific DNs are > "overloaded". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
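As a rough illustration of the per-StoragePolicy (or per storage type) averaging discussed in this issue, here is a self-contained sketch. It is not the code from the attached patches: the NodeLoad record and the string storage-type key are simplifications invented for the example.
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StorageTypeLoadSketch {

  // Simplified stand-in for a datanode report; not the real DatanodeDescriptor.
  public static class NodeLoad {
    final String storageType;  // e.g. "SSD" or "DISK"
    final int xceiverCount;
    public NodeLoad(String storageType, int xceiverCount) {
      this.storageType = storageType;
      this.xceiverCount = xceiverCount;
    }
  }

  // Average xceiver count computed per storage type instead of cluster-wide,
  // so a lightly loaded tier cannot mask an overloaded one.
  public static Map<String, Double> averageLoadByStorageType(List<NodeLoad> nodes) {
    Map<String, int[]> totals = new HashMap<>(); // type -> {xceiver sum, node count}
    for (NodeLoad n : nodes) {
      int[] t = totals.computeIfAbsent(n.storageType, k -> new int[2]);
      t[0] += n.xceiverCount;
      t[1]++;
    }
    Map<String, Double> averages = new HashMap<>();
    totals.forEach((type, t) -> averages.put(type, (double) t[0] / t[1]));
    return averages;
  }
}
{code}
With the numbers from the example in the description (10 nodes averaging 50 xceivers and 90 nodes averaging 10), the busy tier's threshold would then be derived from its own average of 50 (100 with a factor of 2) instead of the cluster-wide 28, so those nodes would no longer be rejected while the other tier sits mostly idle.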
[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.
[ https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215232#comment-17215232 ] Jinglun commented on HDFS-15636: Hi [~ayushtkn], seems related, I'll give it a check. > NameNode computes load by group when choosing datanodes. > > > Key: HDFS-15636 > URL: https://issues.apache.org/jira/browse/HDFS-15636 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd > datanodes. The HOT files are stored with ALL_SSD and cold ones are stored > with HOT. There is a big chance the NameNode couldn't choose enough nodes for > writing disk files(with storage policy HOT) because of 'NODE_TOO_BUSY'. A > temporary solution is to increase the > 'dfs.namenode.redundancy.considerLoad.factor'. But that may cause the > unbalance of load of all the datanodes. > We should let the NameNode compute load by group. The ssd nodes and hdd nodes > are computed separately and each group has its own average load. When the > NameNode chooses a hdd node it only compares the node's load with > the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.
[ https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215231#comment-17215231 ] Jinglun commented on HDFS-15636: We can implement this by 2 steps: # Let NameNode support computing load with group. The NameNode should resolve group for each datanode and count 'nodes in service' and 'xceiver count' for each group(in DatanodeStats). # Add a new BlockPlacementPolicy which considers load with group. > NameNode computes load by group when choosing datanodes. > > > Key: HDFS-15636 > URL: https://issues.apache.org/jira/browse/HDFS-15636 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd > datanodes. The HOT files are stored with ALL_SSD and cold ones are stored > with HOT. There is a big chance the NameNode couldn't choose enough nodes for > writing disk files(with storage policy HOT) because of 'NODE_TOO_BUSY'. A > temporary solution is to increase the > 'dfs.namenode.redundancy.considerLoad.factor'. But that may cause the > unbalance of load of all the datanodes. > We should let the NameNode compute load by group. The ssd nodes and hdd nodes > are computed separately and each group has its own average load. When the > NameNode chooses a hdd node it only compares the node's load with > the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
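To make step 2 of the two-step plan above concrete, the following sketch shows what a group-aware load check could look like. It is a hypothetical outline, not the eventual patch: the GroupStats interface, the group key, and the method names are placeholders standing in for the per-group counters that step 1 would add to DatanodeStats.
{code}
public class GroupLoadCheckSketch {

  // Hypothetical per-group statistics maintained by the NameNode (step 1).
  public interface GroupStats {
    int inServiceNodes(String group);          // e.g. group = "SSD" or "HDD"
    int inServiceXceiverCount(String group);
  }

  private final GroupStats stats;
  private final double considerLoadFactor;     // mirrors dfs.namenode.redundancy.considerLoad.factor

  public GroupLoadCheckSketch(GroupStats stats, double considerLoadFactor) {
    this.stats = stats;
    this.considerLoadFactor = considerLoadFactor;
  }

  // Step 2: a placement policy would reject a node only if it is busier than
  // its own group's average allows, never comparing across groups.
  public boolean isTooBusy(String group, int nodeXceiverCount) {
    int nodes = stats.inServiceNodes(group);
    if (nodes == 0) {
      return false;
    }
    double groupAverage = (double) stats.inServiceXceiverCount(group) / nodes;
    return nodeXceiverCount > groupAverage * considerLoadFactor;
  }
}
{code}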
[jira] [Commented] (HDFS-15636) NameNode computes load by group when choosing datanodes.
[ https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215228#comment-17215228 ] Ayush Saxena commented on HDFS-15636: - Similar to HDFS-14383? > NameNode computes load by group when choosing datanodes. > > > Key: HDFS-15636 > URL: https://issues.apache.org/jira/browse/HDFS-15636 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd > datanodes. The HOT files are stored with ALL_SSD and cold ones are stored > with HOT. There is a big chance the NameNode couldn't choose enough nodes for > writing disk files(with storage policy HOT) because of 'NODE_TOO_BUSY'. A > temporary solution is to increase the > 'dfs.namenode.redundancy.considerLoad.factor'. But that may cause the > unbalance of load of all the datanodes. > We should let the NameNode compute load by group. The ssd nodes and hdd nodes > are computed separately and each group has its own average load. When the > NameNode chooses a hdd node it only compares the node's load with > the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15636) NameNode computes load by group when choosing datanodes.
Jinglun created HDFS-15636: -- Summary: NameNode computes load by group when choosing datanodes. Key: HDFS-15636 URL: https://issues.apache.org/jira/browse/HDFS-15636 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jinglun We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd datanodes. Hot files are stored with the ALL_SSD storage policy and cold ones with the HOT policy. There is a big chance the NameNode can't choose enough nodes for writing disk files (with storage policy HOT) because of 'NODE_TOO_BUSY'. A temporary solution is to increase 'dfs.namenode.redundancy.considerLoad.factor', but that may unbalance the load across all the datanodes. We should let the NameNode compute load by group. The ssd nodes and hdd nodes are counted separately and each group has its own average load. When the NameNode chooses an hdd node it only compares the node's load with the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15636) NameNode computes load by group when choosing datanodes.
[ https://issues.apache.org/jira/browse/HDFS-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun reassigned HDFS-15636: -- Assignee: Jinglun > NameNode computes load by group when choosing datanodes. > > > Key: HDFS-15636 > URL: https://issues.apache.org/jira/browse/HDFS-15636 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > > We have an HDFS cluster used for HBase with 251 ssd datanodes and 30 hdd > datanodes. The HOT files are stored with ALL_SSD and cold ones are stored > with HOT. There is a big chance the NameNode couldn't choose enough nodes for > writing disk files(with storage policy HOT) because of 'NODE_TOO_BUSY'. A > temporary solution is to increase the > 'dfs.namenode.redundancy.considerLoad.factor'. But that may cause the > unbalance of load of all the datanodes. > We should let the NameNode compute load by group. The ssd nodes and hdd nodes > are computed separately and each group has its own average load. When the > NameNode chooses a hdd node it only compares the node's load with > the average load of the hdd group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215223#comment-17215223 ] Junfan Zhang commented on HDFS-15635: - Hi [~umamaheswararao] Please review it.Thanks~ > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junfan Zhang updated HDFS-15635: External issue ID: https://github.com/apache/hadoop/pull/2389 > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?focusedWorklogId=501439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501439 ] ASF GitHub Bot logged work on HDFS-15635: - Author: ASF GitHub Bot Created on: 16/Oct/20 06:54 Start Date: 16/Oct/20 06:54 Worklog Time Spent: 10m Work Description: zuston opened a new pull request #2389: URL: https://github.com/apache/hadoop/pull/2389 Link to [HDFS-15635](https://issues.apache.org/jira/browse/HDFS-15635) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 501439) Remaining Estimate: 0h Time Spent: 10m > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
[ https://issues.apache.org/jira/browse/HDFS-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-15635: -- Labels: pull-request-available (was: ) > ViewFileSystemOverloadScheme support specifying mount table loader imp > through conf > --- > > Key: HDFS-15635 > URL: https://issues.apache.org/jira/browse/HDFS-15635 > Project: Hadoop HDFS > Issue Type: Improvement > Components: viewfsOverloadScheme >Reporter: Junfan Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to HDFS-15289, the default mountable loader is > {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. > > In some scenarios, users want to implement the mount table loader by > themselves, so it is necessary to dynamically configure the loader. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15635) ViewFileSystemOverloadScheme support specifying mount table loader imp through conf
Junfan Zhang created HDFS-15635: --- Summary: ViewFileSystemOverloadScheme support specifying mount table loader imp through conf Key: HDFS-15635 URL: https://issues.apache.org/jira/browse/HDFS-15635 Project: Hadoop HDFS Issue Type: Improvement Components: viewfsOverloadScheme Reporter: Junfan Zhang According to HDFS-15289, the default mount table loader is {{[HCFSMountTableConfigLoader|https://github.com/apache/hadoop/blob/4734c77b4b64b7c6432da4cc32881aba85f94ea1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/HCFSMountTableConfigLoader.java#L35]}}. In some scenarios, users want to implement the mount table loader themselves, so it is necessary to be able to configure the loader dynamically. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
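One plausible shape for the conf-driven loader selection described here is sketched below. The configuration key name is invented for illustration (the real key would be defined by the eventual patch); the sketch assumes the MountTableConfigLoader interface and HCFSMountTableConfigLoader default introduced by HDFS-15289, and uses the standard Configuration.getClass / ReflectionUtils.newInstance pattern.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.viewfs.HCFSMountTableConfigLoader;
import org.apache.hadoop.fs.viewfs.MountTableConfigLoader;
import org.apache.hadoop.util.ReflectionUtils;

public class MountTableLoaderFactorySketch {

  // Hypothetical key; not an existing Hadoop configuration property.
  public static final String MOUNT_TABLE_LOADER_IMPL_KEY =
      "fs.viewfs.overload.scheme.mounttable.loader.impl";

  // Resolve the loader class from configuration, falling back to the current
  // default (HCFSMountTableConfigLoader) when nothing is configured.
  public static MountTableConfigLoader createLoader(Configuration conf) {
    Class<? extends MountTableConfigLoader> clazz = conf.getClass(
        MOUNT_TABLE_LOADER_IMPL_KEY,
        HCFSMountTableConfigLoader.class,
        MountTableConfigLoader.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}
A user-supplied loader would then only need to implement MountTableConfigLoader and set the key to its fully qualified class name.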
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501431 ] ASF GitHub Bot logged work on HDFS-15634: - Author: ASF GitHub Bot Created on: 16/Oct/20 06:05 Start Date: 16/Oct/20 06:05 Worklog Time Spent: 10m Work Description: fengnanli commented on pull request #2388: URL: https://github.com/apache/hadoop/pull/2388#issuecomment-709817743 Will put another patch with UT soon This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 501431) Time Spent: 1h (was: 50m) > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501429 ] ASF GitHub Bot logged work on HDFS-15634: - Author: ASF GitHub Bot Created on: 16/Oct/20 06:04 Start Date: 16/Oct/20 06:04 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2388: URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070581 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3512,7 +3512,11 @@ private Block addStoredBlock(final BlockInfo block, int numUsableReplicas = num.liveReplicas() + num.decommissioning() + num.liveEnteringMaintenanceReplicas(); -if(storedBlock.getBlockUCState() == BlockUCState.COMMITTED && + +// if block is still under construction, then done for now +if (!storedBlock.isCompleteOrCommitted()) { Review comment: I felt quite confused with the original structure since the early return was put after the statements it is trying to avoid.. I can make it a single return no big deal. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 501429) Time Spent: 40m (was: 0.5h) > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15634) Invalidate block on decommissioning DataNode after replication
[ https://issues.apache.org/jira/browse/HDFS-15634?focusedWorklogId=501430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-501430 ] ASF GitHub Bot logged work on HDFS-15634: - Author: ASF GitHub Bot Created on: 16/Oct/20 06:04 Start Date: 16/Oct/20 06:04 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2388: URL: https://github.com/apache/hadoop/pull/2388#discussion_r506070741 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3559,9 +3558,26 @@ private Block addStoredBlock(final BlockInfo block, if ((corruptReplicasCount > 0) && (numLiveReplicas >= fileRedundancy)) { invalidateCorruptReplicas(storedBlock, reportedBlock, num); } +if (shouldInvalidateDecommissionedRedundancy(num, fileRedundancy)) { + for (DatanodeStorageInfo storage : blocksMap.getStorages(block)) { +final DatanodeDescriptor datanode = storage.getDatanodeDescriptor(); +if (datanode.isDecommissioned() +|| datanode.isDecommissionInProgress()) { + addToInvalidates(storedBlock, datanode); +} + } +} return storedBlock; } + // If there are enough live replicas, start invalidating + // decommissioned + decommissioning replicas + private boolean shouldInvalidateDecommissionedRedundancy(NumberReplicas num, Review comment: Good idea. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 501430) Time Spent: 50m (was: 40m) > Invalidate block on decommissioning DataNode after replication > -- > > Key: HDFS-15634 > URL: https://issues.apache.org/jira/browse/HDFS-15634 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Right now when a DataNode starts decommission, Namenode will mark it as > decommissioning and its blocks will be replicated over to different > DataNodes, then marked as decommissioned. These blocks are not touched since > they are not counted as live replicas. > Proposal: Invalidate these blocks once they are replicated and there are > enough live replicas in the cluster. > Reason: A recent shutdown of decommissioned datanodes to finished the flow > caused Namenode latency spike since namenode needs to remove all of the > blocks from its memory and this step requires holding write lock. If we have > gradually invalidated these blocks the deletion will be much easier and > faster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
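The diff above cuts off before the body of the new helper; read purely from the surrounding discussion (and not taken from the actual PR), the check being described amounts to something like the standalone sketch below, where ReplicaCounts is a minimal stand-in for the real NumberReplicas class.
{code}
public class InvalidateDecommissionedSketch {

  // Minimal stand-in for NumberReplicas; only the counters used below.
  public static class ReplicaCounts {
    final int live;
    final int decommissioned;
    final int decommissioning;
    public ReplicaCounts(int live, int decommissioned, int decommissioning) {
      this.live = live;
      this.decommissioned = decommissioned;
      this.decommissioning = decommissioning;
    }
  }

  // Only invalidate replicas on decommissioned/decommissioning nodes once the
  // live replicas alone already satisfy the file's redundancy target.
  static boolean shouldInvalidateDecommissionedRedundancy(ReplicaCounts num,
      int fileRedundancy) {
    return num.live >= fileRedundancy
        && (num.decommissioned + num.decommissioning) > 0;
  }
}
{code}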