[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007258#comment-17007258 ]

Yiqun Lin commented on HDFS-15087:

{quote}
Snapshot: Using the snapshot diff maybe? I'm not sure.
{quote}
I think Inigo's proposal is that we first create an initial snapshot and do the SaveTree from it. For the incremental changes made to the source folder during the subsequent phases, we create a new snapshot, compute the snapshot diff, do the SaveTree for that diff, and then follow the same procedure as the first time. Once we find that only very little data has changed (maybe we will have a threshold value here), we block writes until the last SaveTree, ..., block transfer and hard-link steps have finished.
{quote}
The approach described in the doc requires hard linking. I think this is a good idea for the start but I would push to make it pluggable/abstract so in the future we can have other implementations.
{quote}
I am +1 for this; it will be better to make it pluggable.

Others look good to me. [~LiJinglun], feel free to attach your initial patch.

> RBF: Balance/Rename across federation namespaces
>
> Key: HDFS-15087
> URL: https://issues.apache.org/jira/browse/HDFS-15087
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jinglun
> Priority: Major
> Attachments: HFR_Rename Across Federation Namespaces.pdf
>
> The Xiaomi storage team has developed a new feature called HFR (HDFS
> Federation Rename) that enables us to do balance/rename across federation
> namespaces. The idea is to first move the metadata to the dst NameNode and then
> link all the replicas. It has been working in our largest production cluster
> for 2 months. We use it to balance the namespaces. It turns out HFR is fast
> and flexible. The details can be found in the design doc.
> Looking forward to a lively discussion.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007253#comment-17007253 ]

Jinglun commented on HDFS-15087:

Hi [~ayushtkn], the approach in the design doc doesn't cover non-shared DNs. If we let the DNs transfer the blocks, the process would be: *block writes -> saveTree -> graftTree -> transfer blocks -> update mount table*. Since we have a bandwidth limit, I'm afraid the process would take too long. In this case I think we can use option 3, "Incremental DistCp", to do the balance. We only need to block writes during the final round of distcp, so the write-blocking period should be shorter. For a cluster with non-shared DNs, I think we cannot support normal user rename operations because the data transfer costs too much time. So my initial patch should include both option 1 and option 3.
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007212#comment-17007212 ]

Ayush Saxena commented on HDFS-15087:

Thanks [~LiJinglun] for the updates. One last doubt I have: as stated, the datanodes need to be shared, but we have a couple of use cases where the federated clusters don't have shared DNs. Would that be a limitation of the approach in the design doc, or is there a cover for that, such as falling back to some other mechanism like directly copying the blocks to the other DN? If there is a cover for this too, in any way, I am +1 for the approach. This being pluggable shall not block us from upgrading to a better approach if we find one.
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007186#comment-17007186 ]

Jinglun commented on HDFS-15087:

I think we have 4 options for the balance/rename:
# The design doc way: block writes -> saveTree -> graftTree -> hardlink -> update router mount table
# FastCopy: block writes -> FastCopy -> update router mount table
# Incremental DistCp: DistCp many times -> block writes -> final distcp -> update router mount table
# Snapshot: Using the snapshot diff maybe? I'm not sure.

I'd prefer option 1 because it's fast and can be used for both balance and rename. FastCopy has not been maintained for a while, so I think option 2 would need much work to update FastCopy. The weak points of distcp were mentioned before: "too slow to support rename" + "doubles the space" + "distcp listing costs too much time when the src-path is big". The Scheduler model in HFR is pluggable, so choosing option 1 doesn't mean rejecting all the other options. So I think maybe we can start with option 1. If we all agree, I'll upload the initial patch.
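The option 1 pipeline and the remark about pluggability can be pictured as an ordered list of tasks in which the link step is swappable. The following is only a rough sketch of that idea: `BalanceTask`, `step`, `option1` and the task names are hypothetical and are not taken from the HFR patch or the design doc.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a balance job as an ordered list of swappable tasks. */
final class BalancePipelineSketch {

  /** One step of the job; the interface is illustrative, not the HFR API. */
  interface BalanceTask {
    String name();
    void run(List<String> log);
  }

  /** Builds a task that only records its name, standing in for real work. */
  static BalanceTask step(String name) {
    return new BalanceTask() {
      @Override public String name() { return name; }
      @Override public void run(List<String> log) { log.add(name); }
    };
  }

  /** Option 1 from the comment above; linkStep is the pluggable part. */
  static List<BalanceTask> option1(BalanceTask linkStep) {
    List<BalanceTask> tasks = new ArrayList<>();
    tasks.add(step("blockWrites"));
    tasks.add(step("saveTree"));
    tasks.add(step("graftTree"));
    tasks.add(linkStep);
    tasks.add(step("updateMountTable"));
    return tasks;
  }

  /** Runs the tasks in order and returns the names of the executed steps. */
  static List<String> execute(List<BalanceTask> tasks) {
    List<String> log = new ArrayList<>();
    for (BalanceTask task : tasks) {
      task.run(log);
    }
    return log;
  }

  public static void main(String[] args) {
    // Same pipeline, two different implementations of the link step.
    System.out.println(execute(option1(step("hardLink"))));
    System.out.println(execute(option1(step("copyReplica"))));
  }
}
```

Swapping `hardLink` for `copyReplica` changes one element of the list and leaves the rest of the sequence untouched, which is the property the pluggability discussion is after.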
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007176#comment-17007176 ]

Jinglun commented on HDFS-15087:

Hi [~ayushtkn], thanks for your comments. The meta (INodes, Blocks, tree structure) is serialized in the same way as in the FSImage, so everything is preserved. HFR can support EC files too; we are developing that now.
[jira] [Commented] (HDFS-8591) Remove support for deprecated configuration key dfs.namenode.decommission.nodes.per.interval
[ https://issues.apache.org/jira/browse/HDFS-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007166#comment-17007166 ]

Danny Becker commented on HDFS-8591:

There is an issue with the logic here which can cause the decommissioner to get stuck in a nearly infinite loop. The decommissioner checks a datanode which is in_maintenance and no blocks are checked. The decommissioner will continue to loop through this until the datanode is no longer in_maintenance or it reaches Integer.MAX_VALUE.

> Remove support for deprecated configuration key dfs.namenode.decommission.nodes.per.interval
>
> Key: HDFS-8591
> URL: https://issues.apache.org/jira/browse/HDFS-8591
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Priority: Minor
> Fix For: 3.0.0-alpha1
> Attachments: hdfs-8591.001.patch
>
> dfs.namenode.decommission.nodes.per.interval is deprecated in branch-2 and
> can be removed in trunk.
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007055#comment-17007055 ]

Ahmed Hussein commented on HDFS-14854:

[~sodonnell], I see that you commented out one of the checks in {{TestDecommissioningStatus.testDecommissionStatus()}}. Can you please share your experience with that test case and why you decided to remove the check?
There are some old Jiras suggesting that {{testDecommissionStatus}} is flaky.
* HDFS-12188
* HDFS-9599
* HDFS-9950
* HDFS-10755

> Create improved decommission monitor implementation
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
> Attachments: 012_to_013_changes.diff,
> Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch,
> HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch,
> HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch,
> HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch,
> HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch
>
> In HDFS-13157, we discovered a series of problems with the current
> decommission monitor implementation, such as:
> * Blocks are replicated sequentially disk by disk and node by node, and
> hence the load is not spread well across the cluster.
> * Adding a node for decommission can cause the namenode write lock to be
> held for a long time.
> * Decommissioning nodes floods the replication queue, and under-replicated
> blocks from a future node or disk failure may wait for a long time before they
> are replicated.
> * Blocks pending replication are checked many times under a write lock
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission
> monitor that resolves these issues. As it will be difficult to prove one
> implementation is better than another, the new implementation can be enabled
> or disabled, giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1
> patch shortly.
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006918#comment-17006918 ]

Ayush Saxena commented on HDFS-15087:

Does this preserve the EC policy, ACLs, etc.?
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006816#comment-17006816 ]

Jinglun commented on HDFS-15087:

At Xiaomi we have an incremental version of HFR that uses distcp. The idea is to keep submitting distcp round by round until a round can be done in a short time. Then we block all the writes and do the final round of distcp. But it still has weak points:
# It's slow and can't be used for a normal user rename.
# It doubles the space, so it can't be used on a big path.
# The distcp needs to list the src-path, which can cost a lot of time if the src-path is big. This limits the speed of the final distcp round.

Listing the src-path with multiple threads might resolve weak point 3.
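The round-by-round idea can be written down as a small control loop. This is a minimal sketch under stated assumptions, not the Xiaomi implementation: the `copyRound` callback stands in for submitting one distcp job and returning its duration, and `shortEnoughMs`, `maxRounds` and the log labels are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the round-by-round distcp loop described above. */
final class IncrementalCopySketch {

  /**
   * copyRound runs one copy round and returns how long it took, in ms.
   * Rounds repeat until one finishes within shortEnoughMs; only then are
   * writes blocked for the final round. Gives up after maxRounds rounds.
   */
  static List<String> balance(LongSupplier copyRound, long shortEnoughMs, int maxRounds) {
    List<String> log = new ArrayList<>();
    boolean converged = false;
    for (int round = 0; round < maxRounds && !converged; round++) {
      long tookMs = copyRound.getAsLong();
      log.add("round:" + tookMs);
      converged = tookMs <= shortEnoughMs;
    }
    if (!converged) {
      // Never block writes while a copy round is still expected to be long.
      log.add("giveUp");
      return log;
    }
    log.add("blockWrites");
    log.add("finalRound:" + copyRound.getAsLong());
    log.add("updateMountTable");
    return log;
  }

  public static void main(String[] args) {
    long[] simulatedMs = {900, 300, 40, 5}; // made-up round durations
    int[] next = {0};
    System.out.println(balance(() -> simulatedMs[next[0]++], 50, 10));
  }
}
```

The point of the loop is that writes are blocked only for the last, shortest round; the listing cost named in weak point 3 would show up inside every call to `copyRound`, including the final one.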
[jira] [Comment Edited] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006657#comment-17006657 ]

Jinglun edited comment on HDFS-15087 at 1/2/20 1:36 PM:

Hi [~elgoiri], thanks for your nice comments!
{quote}Would it be possible to leverage HDFS snapshots instead of blocking writes and having the new tree related calls? Intuitively, I would expect for snapshots to cover 90% of the features described in the doc. I would try to improve snapshots to cover 100%.
{quote}
-I'm not familiar with the snapshot. In my rough thought as long as the snapshot meta could be transferred and rebuilt the HFR could support it.-

-I'll try to write a demo to transfer and rebuild the snapshot across NameNodes-.

I had a quick look at snapshots and I'm not sure how to use them. Do you mean using the diff between snapshots so that we can do the balance in an incremental way?
{quote}The approach described in the doc requires hard linking. I think this is a good idea for the start but I would push to make it pluggable/abstract so in the future we can have other implementations.
{quote}
Good idea. The design of HFR has considered it. HFR is a combination of many tasks, and each task is pluggable. For example, if we want to use copy instead of hardlink, we can switch the HardLink task to a CopyReplica task.
{quote}Is hard linking available in Windows?
{quote}
Since HADOOP-11483 we use the JDK's Files.createLink() to create the hard links. I tested Files.createLink() on Windows and it works. See the Java tutorial [https://docs.oracle.com/javase/tutorial/essential/io/links.html]
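For reference, a minimal use of `java.nio.file.Files.createLink(link, existing)` looks roughly like the following. The temporary directory and the file name `blk_0001` are made up for the illustration and are not how a DataNode lays out its replicas; the call needs a file system that supports hard links and both paths on the same volume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: hard-link a replica file instead of copying its bytes. */
final class HardLinkSketch {

  /** Creates a hard link at link that refers to the existing file target. */
  static void hardLink(Path link, Path target) {
    try {
      Files.createLink(link, target);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Links a temporary file and reads its content back through the link. */
  static String demo() {
    try {
      Path dir = Files.createTempDirectory("hardlink-sketch");
      Path replica = dir.resolve("blk_0001"); // hypothetical block file name
      Files.write(replica, "replica-bytes".getBytes(StandardCharsets.UTF_8));
      Path linked = dir.resolve("blk_0001.link");
      hardLink(linked, replica);
      String content = new String(Files.readAllBytes(linked), StandardCharsets.UTF_8);
      // Clean up: removing one name does not affect the other link.
      Files.delete(linked);
      Files.delete(replica);
      Files.delete(dir);
      return content;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

No data is copied: both names refer to the same file content, which is why the hardlink step stays cheap even for large replicas.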
[jira] [Commented] (HDFS-15092) TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
[ https://issues.apache.org/jira/browse/HDFS-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006766#comment-17006766 ]

Íñigo Goiri commented on HDFS-15092:

Any chance we can use GenericTestUtils#waitFor?

> TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
>
> Key: HDFS-15092
> URL: https://issues.apache.org/jira/browse/HDFS-15092
> Project: Hadoop HDFS
> Issue Type: Test
> Components: test
> Affects Versions: 3.3.0
> Reporter: Fei Hui
> Assignee: Fei Hui
> Priority: Minor
> Attachments: HDFS-15092.001.patch
>
> TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
> {quote}
> java.lang.AssertionError:
> Expected :5
> Actual :4
>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock(TestRedudantBlocks.java:138)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {quote}
> Maybe we should increase sleep time
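The difference between a fixed sleep and a polling wait can be shown without any Hadoop dependency. The `waitFor` below is a self-contained stand-in written for this illustration; it only mimics the general shape of `GenericTestUtils#waitFor` (a condition, a check interval, a timeout) and is not the Hadoop method itself. The counter and the worker thread are likewise invented to simulate a value that settles asynchronously.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Sketch: poll for a condition instead of sleeping a fixed time. */
final class WaitForSketch {

  /** Polls check every intervalMs until it is true or timeoutMs has elapsed. */
  static void waitFor(Supplier<Boolean> check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  /** Returns true if a counter updated in the background reaches expected in time. */
  static boolean demo(int expected, long timeoutMs) {
    AtomicInteger count = new AtomicInteger();
    Thread worker = new Thread(() -> {
      for (int i = 0; i < 5; i++) {
        try {
          Thread.sleep(20); // simulates slow, asynchronous processing
        } catch (InterruptedException e) {
          return;
        }
        count.incrementAndGet();
      }
    });
    worker.start();
    try {
      waitFor(() -> count.get() == expected, 10, timeoutMs);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo(5, 5_000));
  }
}
```

A polling wait returns as soon as the expected value is reached on a fast machine and tolerates a slow one up to the timeout, whereas a longer fixed sleep slows every run and can still be too short.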
[jira] [Commented] (HDFS-15092) TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed
[ https://issues.apache.org/jira/browse/HDFS-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006757#comment-17006757 ]

Surendra Singh Lilhore commented on HDFS-15092:

+1 LGTM. I feel it is failing only on some slow machines.
[jira] [Commented] (HDFS-15091) Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock
[ https://issues.apache.org/jira/browse/HDFS-15091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006735#comment-17006735 ]

Xiaoqiao He commented on HDFS-15091:

v02 LGTM, +1 from my side.

> Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock
>
> Key: HDFS-15091
> URL: https://issues.apache.org/jira/browse/HDFS-15091
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-15091-01.patch, HDFS-15091-02.patch
>
> As of now, all APIs check for superuser before taking the lock. The same can be
> done for the cache commands and setQuota.
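The ordering this issue asks for, checking the privilege first and taking the lock only afterwards, can be illustrated with a toy class. This is a sketch under assumptions and not the NameNode code: `CheckBeforeLockSketch`, its boolean `callerIsSuperUser` argument and the use of `SecurityException` are stand-ins chosen to keep the example self-contained.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: reject unprivileged callers before touching the namespace lock. */
final class CheckBeforeLockSketch {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long quota = -1;
  private int writeLockAcquisitions = 0;

  /** Sets the quota; the privilege check happens outside the write lock. */
  void setQuota(boolean callerIsSuperUser, long newQuota) {
    if (!callerIsSuperUser) {
      // Fails fast: an unprivileged call never contends for the lock.
      throw new SecurityException("superuser privilege is required");
    }
    lock.writeLock().lock();
    try {
      writeLockAcquisitions++;
      quota = newQuota;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Number of times the write lock was actually taken. */
  int writeLockAcquisitions() {
    lock.readLock().lock();
    try {
      return writeLockAcquisitions;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Current quota value, read under the read lock. */
  long quota() {
    lock.readLock().lock();
    try {
      return quota;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    CheckBeforeLockSketch ns = new CheckBeforeLockSketch();
    try {
      ns.setQuota(false, 100L);
    } catch (SecurityException e) {
      System.out.println("rejected without locking: " + ns.writeLockAcquisitions());
    }
    ns.setQuota(true, 100L);
    System.out.println("quota=" + ns.quota() + ", lock taken " + ns.writeLockAcquisitions() + " time(s)");
  }
}
```

With this ordering, calls that are going to be rejected anyway do not add to contention on a lock that every namespace operation shares.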
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006662#comment-17006662 ]

Jinglun commented on HDFS-15087:

Hi [~linyiqun], thanks for your nice comments!
{quote}So how can we ensure that source directory not being changed during that time? Or we recommend use HFR only for small paths that won't have frequent change?
{quote}
A simple way to ensure the directory is not changed is to remove all permissions from the source directory and force recoverLease()/close on all open files. Normal users then can't change anything under the source directory, neither directories nor files. They can't read it either.

At Xiaomi we also developed a locking technique called INodeLock. We can set an xattr on an INode. The xattr records a set of prohibited operations and the scope. When an rpc arrives, the NameNode checks it and rejects the rpc if it attempts a prohibited operation on a path in scope. We want this INodeLock because we want only the write operations to be rejected.
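The check described for INodeLock, a scope plus a set of prohibited operations consulted on every rpc, might look roughly like this. It is a guess at the shape of the idea only: the `Op` enum, the class and method names and the plain string path matching are all hypothetical, and the real feature stores its state in an xattr on the INode rather than in a Java object.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a scope-plus-prohibited-operations lock. */
final class INodeLockSketch {

  /** Illustrative operation kinds; not an HDFS enum. */
  enum Op { CREATE, APPEND, DELETE, RENAME, READ }

  /** What the xattr is described as recording: a scope and prohibited ops. */
  static final class INodeLock {
    private final String scope;
    private final Set<Op> prohibited;

    INodeLock(String scope, EnumSet<Op> prohibited) {
      this.scope = scope;
      this.prohibited = EnumSet.copyOf(prohibited);
    }

    /** True if op on path must be rejected because path lies in scope. */
    boolean rejects(String path, Op op) {
      boolean inScope = scope.equals("/")
          || path.equals(scope)
          || path.startsWith(scope + "/");
      return inScope && prohibited.contains(op);
    }
  }

  /** A lock that rejects writes under scope but still allows reads. */
  static INodeLock writeLock(String scope) {
    return new INodeLock(scope, EnumSet.of(Op.CREATE, Op.APPEND, Op.DELETE, Op.RENAME));
  }

  public static void main(String[] args) {
    INodeLock lock = writeLock("/data/src");
    System.out.println(lock.rejects("/data/src/a", Op.CREATE)); // write in scope
    System.out.println(lock.rejects("/data/src/a", Op.READ));   // read in scope
    System.out.println(lock.rejects("/data/srcX", Op.DELETE));  // outside scope
  }
}
```

Compared with removing all permissions, a lock of this kind keeps the source readable while the balance is running, which is the stated reason for wanting it.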
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006660#comment-17006660 ]

Jinglun commented on HDFS-15087:

Hi [~ayushtkn], thanks for your nice comments! Yes, FastCopy is a very good tool. We researched it before we started HFR. It could be very effective when we do balance. But it's too heavyweight if we want to support a normal rename across namespaces. It depends on Yarn, hence the time cost is out of our control. The saveTree()+graftTree()+hardlink way is more lightweight. In our practice, even the rename of a TB-sized path can be kept within one minute, so the rpc won't time out.
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006657#comment-17006657 ]

Jinglun commented on HDFS-15087:

Hi [~elgoiri], thanks for your nice comments!
{quote}Would it be possible to leverage HDFS snapshots instead of blocking writes and having the new tree related calls? Intuitively, I would expect for snapshots to cover 90% of the features described in the doc. I would try to improve snapshots to cover 100%.
{quote}
I'm not familiar with snapshots. My rough thought is that as long as the snapshot meta can be transferred and rebuilt, HFR could support it.

I'll try to write a demo that transfers and rebuilds the snapshot across NameNodes.
{quote}The approach described in the doc requires hard linking. I think this is a good idea for the start but I would push to make it pluggable/abstract so in the future we can have other implementations.
{quote}
Good idea. The design of HFR has considered it. HFR is a combination of many tasks, and each task is pluggable. For example, if we want to use copy instead of hardlink, we can switch the HardLink task to a CopyReplica task.
{quote}Is hard linking available in Windows?
{quote}
Since HADOOP-11483 we use the JDK's Files.createLink() to create the hard links. I tested Files.createLink() on Windows and it works. See the Java tutorial [https://docs.oracle.com/javase/tutorial/essential/io/links.html]