[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118421#comment-17118421 ]

Hudson commented on HDFS-13183:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18304 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18304/])
HDFS-13183. Addendum: Standby NameNode process getBlocks request to (ayushsaxena: rev 9b38be43c6323077a7be14e1295ad484c4038372)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java

> Standby NameNode process getBlocks request to reduce Active load
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: balancer mover, namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch,
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch,
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch,
> HDFS-13183.addendum.patch
>
> The performance of the Active NameNode can be impacted when the {{Balancer}}
> requests #getBlocks, because querying the blocks of overly full DNs is
> currently extremely inefficient. The main reason is that
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the
> extreme case, all handlers of the Active NameNode RPC server are occupied by
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls,
> so the Active NameNode enters a state of false death for a number of seconds
> or even minutes.
> Similar performance concerns about the Balancer have been reported in
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could
> speed up balancing and reduce the performance impact on the Active NameNode.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118409#comment-17118409 ]

Ayush Saxena commented on HDFS-13183:

Committed the addendum to trunk and branch-3.3.
Thanks [~hexiaoqiao] for the contribution and [~Jim_Brennan] for the review!
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115072#comment-17115072 ]

Ayush Saxena commented on HDFS-13183:

Thanks [~hexiaoqiao] and [~Jim_Brennan] for the quick fix. +1 for the second addendum.
[~weichiu], could you take a look and wrap this up? You probably have a better understanding of this; I couldn't check the original patch, so...
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113467#comment-17113467 ]

Jim Brennan commented on HDFS-13183:

I am +1 (non-binding) on the second addendum patch.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113218#comment-17113218 ]

Jim Brennan commented on HDFS-13183:

[~hexiaoqiao] thanks for checking TestBalancer and fixing the problem causing TestBalancerWithNodeGroup to fail.
I think a separate Jira for the TestBalancerWithHANameNodes#testBalancerWithObserver failures is appropriate.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113049#comment-17113049 ]

Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 25s | trunk passed |
| +1 | compile | 1m 11s | trunk passed |
| +1 | checkstyle | 0m 50s | trunk passed |
| +1 | mvnsite | 1m 22s | trunk passed |
| +1 | shadedclient | 17m 20s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 43s | trunk passed |
| 0 | spotbugs | 3m 23s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 18s | the patch passed |
| +1 | compile | 1m 7s | the patch passed |
| +1 | javac | 1m 7s | the patch passed |
| +1 | checkstyle | 0m 44s | the patch passed |
| +1 | mvnsite | 1m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 24s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 43s | the patch passed |
| +1 | findbugs | 3m 23s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 117m 53s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 196m 5s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
| | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29344/artifact/out/Dockerfile |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13003566/HDFS-13183.addendum.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 7a8ee7089793 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 1a3c6bb33b6 |
| Default Java | Private
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112857#comment-17112857 ]

Xiaoqiao He commented on HDFS-13183:

After digging deeper into BalancerWithObserver, I found that the root cause of the failing unit test TestBalancerWithHANameNodes#testBalancerWithObserver is the check on how many times #getBlocks is invoked, shown in the following code segment. When the Observer Read feature is enabled, it seems that requests do not go to the first Observer NameNode every time. When two Observer NameNodes are alive, either one may be picked at random, so the test has about a 50% chance of failing. IMO it is not related to this change. I would like to file another JIRA to track it.
{code:java}
doTest(conf);
for (int i = 0; i < cluster.getNumNameNodes(); i++) {
  // First observer node is at idx 2, or 3 if 2 has been shut down
  // It should get both getBlocks calls, all other NNs should see 0 calls
  int expectedObserverIdx = withObserverFailure ? 3 : 2;
  int expectedCount = (i == expectedObserverIdx) ? 2 : 0;
  verify(namesystemSpies.get(i), times(expectedCount))
      .getBlocks(any(), anyLong(), anyLong());
}
{code}
I will try to trigger Yetus manually and check the result again.
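To make the failure mode above concrete, here is a minimal, self-contained sketch that contrasts the strict per-index check from the quoted test with a relaxed check that only requires the observers, taken together, to have served the expected number of getBlocks calls. It is an illustration only: the class and method names are hypothetical, the call counts are plain integers rather than Mockito spies, it assumes every NameNode at or after the first observer index is an observer, and it is not the fix adopted in Hadoop.

```java
public class ObserverCallCountSketch {

    // Strict check, mirroring the quoted test: exactly one NameNode
    // (expectedObserverIdx) must have seen all calls, every other one zero.
    static boolean strictCheck(int[] callsPerNameNode, int expectedObserverIdx,
                               int expectedCalls) {
        for (int i = 0; i < callsPerNameNode.length; i++) {
            int expected = (i == expectedObserverIdx) ? expectedCalls : 0;
            if (callsPerNameNode[i] != expected) {
                return false;
            }
        }
        return true;
    }

    // Relaxed check (hypothetical): NameNodes before firstObserverIdx must see
    // zero calls, and the observers together must see expectedCalls.
    static boolean relaxedCheck(int[] callsPerNameNode, int firstObserverIdx,
                                int expectedCalls) {
        int observerTotal = 0;
        for (int i = 0; i < callsPerNameNode.length; i++) {
            if (i < firstObserverIdx) {
                if (callsPerNameNode[i] != 0) {
                    return false;
                }
            } else {
                observerTotal += callsPerNameNode[i];
            }
        }
        return observerTotal == expectedCalls;
    }

    public static void main(String[] args) {
        // Both calls happened to land on the second observer (index 3).
        int[] calls = {0, 0, 0, 2};
        System.out.println("strict:  " + strictCheck(calls, 2, 2));
        System.out.println("relaxed: " + relaxedCheck(calls, 2, 2));
    }
}
```

With the counts in `main`, the strict check rejects a run in which the second observer served both calls, while the relaxed check accepts it. Both checks still reject any run in which a non-observer NameNode served getBlocks.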
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112463#comment-17112463 ]

Xiaoqiao He commented on HDFS-13183:

Thanks [~Jim_Brennan] for the information.
I think TestBalancer should not be affected even with this feature, because the configuration `dfs.ha.allow.stale.reads` is not set to true in TestBalancer, so all logic stays the same. I actually tried replacing `Balancer.run(namenodes, BalancerParameters.DEFAULT, conf);` with `Balancer.run(namenodes, nsIds, BalancerParameters.DEFAULT, conf);` offline, and all TestBalancer cases pass locally. The new addendum patch just improves the log output.
As for the failing unit test {{TestBalancerWithHANameNodes.testBalancerWithObserver}}, I ran it many times locally without the changes, and it also fails with low probability. I also traced the code execution path, and it looks no different with or without the patch.
[~xkrogen], would you mind taking another look?
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112277#comment-17112277 ]

Jim Brennan commented on HDFS-13183:

Thanks [~hexiaoqiao]. It looks like there are still some failures.
One other note: it's possible TestBalancer did not fail because it uses its own copy of doBalance() called runBalancer(). I don't know if it would have failed if it was using Balancer.run() instead. TestBalancerWithNodeGroup uses Balancer.run(), which is why it was affected.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112111#comment-17112111 ]

Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 42s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 3s | trunk passed |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 1m 12s | trunk passed |
| +1 | shadedclient | 15m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 46s | trunk passed |
| 0 | spotbugs | 2m 50s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 48s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 43s | the patch passed |
| +1 | findbugs | 2m 54s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 93m 0s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 157m 17s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29341/artifact/out/Dockerfile |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13003492/HDFS-13183.addendum.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 96d76f88501d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / cef07569294 |
| Default
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111744#comment-17111744 ]

Xiaoqiao He commented on HDFS-13183:

Hi [~weichiu], [~Jim_Brennan], I am sure the failing unit test TestBalancerWithNodeGroup is related to this change. Sorry, I did not consider NodeGroup carefully and have no experience with this feature.
[~weichiu], could you help revert this change? I would like to offer a fixed version ASAP. Thanks.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111707#comment-17111707 ]

Xiaoqiao He commented on HDFS-13183:

Thanks [~weichiu] and [~Jim_Brennan] for the reminder. I will check it today.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111628#comment-17111628 ]

Wei-Chiu Chuang commented on HDFS-13183:

Thanks Jim.
[~hexiaoqiao], what do you think? I can revert it now, or you can offer a fix quickly. Let me know. Thanks.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111586#comment-17111586 ] Jim Brennan commented on HDFS-13183: More importantly, because it will never return NO_MOVE_PROGRESS, it will loop forever returning IN_PROGRESS.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111578#comment-17111578 ] Jim Brennan commented on HDFS-13183: [~weichiu], [~hexiaoqiao], I believe this change is causing TestBalancerWithNodeGroup to fail: [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/146/testReport/junit/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerEndInNoMoveProgress/] The problem is that Balancer.doBalance() was changed to construct the NameNodeConnectors inside the iteration loop. The counter to track how many iterations we have gone without a move ({{notChangedIterations}}) is in the NameNodeConnector, but it is intended to work across iterations. Since we are now creating new connectors on each iteration, this will always be zero, so we will never exit a balancer with ExitStatus.NO_MOVE_PROGRESS.
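The failure mode described in the comment above can be illustrated with a minimal, self-contained sketch. It does not use the real Balancer or NameNodeConnector classes: `ConnectorSketch`, `BalancerLoopSketch`, the iteration cap, and the limit of 5 idle iterations are all hypothetical stand-ins chosen for illustration. It only models a per-connector idle counter and compares reusing one connector across iterations with creating a fresh one each time.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a connector; it only models the idle-iteration counter. */
class ConnectorSketch {
    // Assumed limit for illustration; the thread does not state the real value.
    static final int MAX_NOT_CHANGED_ITERATIONS = 5;

    private int notChangedIterations = 0;

    /** Returns false once too many consecutive iterations have moved no bytes. */
    boolean shouldContinue(long bytesMoved) {
        if (bytesMoved > 0) {
            notChangedIterations = 0;
            return true;
        }
        notChangedIterations++;
        return notChangedIterations < MAX_NOT_CHANGED_ITERATIONS;
    }
}

class BalancerLoopSketch {
    enum ExitStatus { IN_PROGRESS, NO_MOVE_PROGRESS }

    /**
     * Runs up to maxIterations iterations in which no bytes are moved.
     * The supplier decides whether each iteration sees the same connector or a new one.
     */
    static ExitStatus run(Supplier<ConnectorSketch> connectorForIteration, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            ConnectorSketch connector = connectorForIteration.get();
            if (!connector.shouldContinue(0L)) {
                return ExitStatus.NO_MOVE_PROGRESS;
            }
        }
        // The cap only keeps this sketch finite; a loop without it would keep spinning.
        return ExitStatus.IN_PROGRESS;
    }

    /** One connector shared across iterations: the counter accumulates. */
    static ExitStatus runWithSharedConnector(int maxIterations) {
        ConnectorSketch shared = new ConnectorSketch();
        return run(() -> shared, maxIterations);
    }

    /** A new connector per iteration: the counter restarts every time. */
    static ExitStatus runWithFreshConnectors(int maxIterations) {
        return run(ConnectorSketch::new, maxIterations);
    }

    public static void main(String[] args) {
        System.out.println("shared connector: " + runWithSharedConnector(100));
        System.out.println("fresh connectors: " + runWithFreshConnectors(100));
    }
}
```

In this model the shared connector reaches the idle limit and exits with NO_MOVE_PROGRESS, while the fresh-connector variant never does, which matches the behaviour reported for the test. Keeping the counter outside the per-iteration object, or reusing the connector, are two possible ways to restore the exit condition; the thread does not say which one the eventual fix uses.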
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110460#comment-17110460 ] Wei-Chiu Chuang commented on HDFS-13183: Done. Pushed both to branch-3.3. Thanks!
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110450#comment-17110450 ] Xiaoqiao He commented on HDFS-13183: Thanks [~weichiu] for the reminder. The branch-3.3 compile failure occurs because HDFS-15356 has not been backported. I tried cherry-picking HDFS-15356 and fixing HDFS-15202 manually, and it seems to work fine locally.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110442#comment-17110442 ] Wei-Chiu Chuang commented on HDFS-13183: Just realized the code doesn't compile in branch-3.3, so I reverted it from that branch.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110422#comment-17110422 ] Xiaoqiao He commented on HDFS-13183: Thanks [~weichiu] for your help in pushing this feature forward.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110327#comment-17110327 ] Hudson commented on HDFS-13183:

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18268 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18268/])
HDFS-13183. Standby NameNode process getBlocks request to reduce Active (weichiu: rev a3f44dacc1fa19acc4eefd1e2505e54f8629e603)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithHANameNodes.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110306#comment-17110306 ] Wei-Chiu Chuang commented on HDFS-13183:

+1

bq. BTW, I am not sure why the configuration key 'dfs.ha.allow.stale.reads' is not defined in DFSConfigKeys; I would like to file another JIRA to unify it.

Yeah, please go ahead.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110296#comment-17110296 ] Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 21m 41s | trunk passed |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 1m 14s | trunk passed |
| +1 | shadedclient | 17m 28s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | trunk passed |
| 0 | spotbugs | 3m 0s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 58s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 1m 4s | the patch passed |
| +1 | javac | 1m 4s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 16s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 37s | the patch passed |
| +1 | findbugs | 3m 9s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 111m 36s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 182m 40s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
| | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29318/artifact/out/Dockerfile |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13003241/HDFS-13183.007.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f1dfcee32614 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / b65815d6914 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| unit |
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109977#comment-17109977 ] Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| -1 | mvninstall | 22m 3s | root in trunk failed. |
| +1 | compile | 1m 9s | trunk passed |
| +1 | checkstyle | 0m 45s | trunk passed |
| +1 | mvnsite | 1m 14s | trunk passed |
| -1 | shadedclient | 18m 9s | branch has errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | trunk passed |
| 0 | spotbugs | 3m 0s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 58s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 8s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| +1 | checkstyle | 0m 41s | the patch passed |
| +1 | mvnsite | 1m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 15m 59s | patch has errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | the patch passed |
| +1 | findbugs | 3m 3s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 114m 14s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 187m 1s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.TestDFSInputStream |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29317/artifact/out/Dockerfile |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13003241/HDFS-13183.007.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux dce5a9c87862 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / a3809d20230 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| mvninstall |
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109826#comment-17109826 ] Xiaoqiao He commented on HDFS-13183: Rebased the code and triggered Yetus again.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108881#comment-17108881 ] Xiaoqiao He commented on HDFS-13183: I ran the failed unit tests locally; all of them passed except {{TestBalancerWithNodeGroup}}, which HDFS-14960 is tracking. [~weichiu], would you mind giving it another review? Thanks.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108251#comment-17108251 ] Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 54s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| -1 | mvninstall | 25m 49s | root in trunk failed. |
| +1 | compile | 1m 26s | trunk passed |
| +1 | checkstyle | 0m 54s | trunk passed |
| +1 | mvnsite | 1m 28s | trunk passed |
| +1 | shadedclient | 4m 37s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 45s | trunk passed |
| 0 | spotbugs | 3m 27s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 1m 11s | the patch passed |
| +1 | javac | 1m 11s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvnsite | 1m 22s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 17m 19s | patch has errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 42s | the patch passed |
| +1 | findbugs | 3m 34s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 122m 31s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 189m 54s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tracing.TestTracing |
| | hadoop.hdfs.server.namenode.TestDeleteRace |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29282/artifact/out/Dockerfile |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13003019/HDFS-13183.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 234e393c3e9c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / ac4a2e11d98 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| mvninstall |
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108132#comment-17108132 ]

Xiaoqiao He commented on HDFS-13183:

Thanks [~weichiu] for your comments.

{quote}does it work in federated cluster? IIRC you have a large federated cluster so I am assuming the answer is yes, but does work out of box or does it require extra configuration ? (Sorry, don't have much experience with HDFS federation){quote}
In our practice, we deploy multiple balancers, one per namespace, to keep monitoring simple. After checking the logic, I believe the current balancer solution also supports the federation architecture, and this patch does not change that core logic.

{quote}failover. if a failover happens, the balancer can't adapt and will then send the requests to ANN. That is fine as it shouldn't fail the balancer, but it increases the new ANN overhead.{quote}
v006 creates a new {{NameNodeConnector}} for each iteration, so the balancer keeps sending requests to the SBN even after a failover.

{quote}Also, just want to say that you don't actually need to UNCHECKED FSNamesystem#getBlocks(). If dfs.ha.allow.stale.reads is true, Standby NN accepts the request as well. That is an extra configuration so probably not ideal.{quote}
Yes, that is true. v006 does not introduce any extra configuration; it relies only on 'dfs.ha.allow.stale.reads'. Please take another look if you have time. Thanks.

BTW, I am not sure why the configuration key 'dfs.ha.allow.stale.reads' is not defined in DFSConfigKeys. I would like to file another JIRA to unify it.
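The failover handling described for v006 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the patch itself: `NameNodeInfo`, `HaState` and `pickGetBlocksTarget` are hypothetical names, and the real balancer goes through {{NameNodeConnector}} and RPC proxies rather than an in-memory list. It assumes the #getBlocks target is re-resolved at the start of every balancer iteration, preferring the first NameNode that reports standby and falling back to the active one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: re-resolve the getBlocks target on every iteration. */
class GetBlocksTargetSketch {
  enum HaState { ACTIVE, STANDBY }

  /** Illustrative stand-in for a NameNode and its current HA state. */
  static final class NameNodeInfo {
    final String id;
    HaState state;

    NameNodeInfo(String id, HaState state) {
      this.id = id;
      this.state = state;
    }
  }

  /**
   * Returns the first standby NameNode, or the first active one if no
   * standby is available. Returns null for an empty list.
   */
  static NameNodeInfo pickGetBlocksTarget(List<NameNodeInfo> nameNodes) {
    NameNodeInfo active = null;
    for (NameNodeInfo nn : nameNodes) {
      if (nn.state == HaState.STANDBY) {
        return nn;
      }
      if (active == null && nn.state == HaState.ACTIVE) {
        active = nn;
      }
    }
    return active;
  }

  public static void main(String[] args) {
    NameNodeInfo nn1 = new NameNodeInfo("nn1", HaState.ACTIVE);
    NameNodeInfo nn2 = new NameNodeInfo("nn2", HaState.STANDBY);
    List<NameNodeInfo> nameNodes = Arrays.asList(nn1, nn2);

    System.out.println("iteration 1 -> " + pickGetBlocksTarget(nameNodes).id);

    // Simulate a failover between two iterations: the roles swap.
    nn1.state = HaState.STANDBY;
    nn2.state = HaState.ACTIVE;

    System.out.println("iteration 2 -> " + pickGetBlocksTarget(nameNodes).id);
  }
}
```

Because the choice is made again on each iteration, a target picked before a failover is not reused afterwards, which is the property the reviewer asked about.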
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099434#comment-17099434 ]

Wei-Chiu Chuang commented on HDFS-13183:

I am really sorry, I meant to review but got distracted.

I would like to push this feature to the finish line, because CRFS is a big feature and will take time to stabilize. Plus, it requires an additional Observer NameNode, and the logistics of adding an extra master NameNode add complexity.

A few comments on the patch:
* Does it work in a federated cluster? IIRC you have a large federated cluster so I am assuming the answer is yes, but does it work out of the box or does it require extra configuration? (Sorry, I don't have much experience with HDFS federation.)
* It looks like the balancer determines which NN is the SBN at start, and then uses it until the end. There are two issues:
** Failover. If a failover happens, the balancer can't adapt and will then send the requests to the ANN. That is fine as it shouldn't fail the balancer, but it increases the new ANN's overhead.
** Multiple standby NameNode support. The balancer always chooses the first available standby NameNode. This is fine, since in any case there can be only one balancer running at a time.

Also, I just want to say that you don't actually need to mark FSNamesystem#getBlocks() as UNCHECKED. If dfs.ha.allow.stale.reads is true, the Standby NN accepts the request as well. That is an extra configuration so probably not ideal.
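The remark about dfs.ha.allow.stale.reads can be illustrated with a small model of the operation check. This is only a sketch of the behaviour as described in the comment above; `HaState`, `OpCategory` and `accepts` are hypothetical names and not Hadoop's API, and the real check lives inside the NameNode's HA state handling.

```java
/** Hypothetical model of which operations a NameNode serves in each HA state. */
class StaleReadCheckSketch {
  enum HaState { ACTIVE, STANDBY }

  enum OpCategory { UNCHECKED, READ, WRITE }

  /**
   * Returns true if a NameNode in the given state would serve the operation.
   * allowStaleReads stands in for dfs.ha.allow.stale.reads.
   */
  static boolean accepts(HaState state, OpCategory op, boolean allowStaleReads) {
    if (state == HaState.ACTIVE) {
      return true; // the active NameNode serves every category
    }
    if (op == OpCategory.UNCHECKED) {
      return true; // unchecked operations bypass the state check
    }
    // On a standby, reads are served only when stale reads are allowed;
    // writes are always rejected.
    return op == OpCategory.READ && allowStaleReads;
  }

  public static void main(String[] args) {
    System.out.println("READ, stale reads off: "
        + accepts(HaState.STANDBY, OpCategory.READ, false));
    System.out.println("READ, stale reads on:  "
        + accepts(HaState.STANDBY, OpCategory.READ, true));
    System.out.println("WRITE, stale reads on: "
        + accepts(HaState.STANDBY, OpCategory.WRITE, true));
  }
}
```

Under this model, #getBlocks can stay in the READ category and still reach the Standby when stale reads are enabled, which is why marking it UNCHECKED is not strictly needed.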
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070299#comment-17070299 ]

Xiaoqiao He commented on HDFS-13183:

[~ayushtkn] I totally agree that SBN read/Observer is a more common and interesting feature, and it is also effective in reducing the ANN load caused by #getBlocks. IMO, redirecting #getBlocks requests to the Standby goes a step further: if the SBN read feature is enabled, we can also reduce the load on the Observer, which plays a core role on the whole read/write access path. On the other hand, it is also an option for end users who have not enabled the SBN read feature. Thanks.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069333#comment-17069333 ]

Ayush Saxena commented on HDFS-13183:

Thanks [~hexiaoqiao] for the patch; I couldn't check the code yet. But now that SBN read has been released, isn't using the Observer a more feasible option? The Observer has edit log tailing and so on enabled and is optimized for reads, so it would be a better option than the Standby.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069272#comment-17069272 ]

Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 55s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 10s | trunk passed |
| +1 | compile | 1m 7s | trunk passed |
| +1 | checkstyle | 0m 53s | trunk passed |
| +1 | mvnsite | 1m 9s | trunk passed |
| +1 | shadedclient | 15m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 56s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 19s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 54s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 93m 1s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 156m 37s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.TestMultipleNNPortQOP |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998055/HDFS-13183.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 1b2e3b24e227 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f531a4a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/29037/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results |
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069207#comment-17069207 ]

Xiaoqiao He commented on HDFS-13183:

v005 tries to fix the findbugs and checkstyle warnings.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069006#comment-17069006 ]

Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 1s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 28m 32s | trunk passed |
| +1 | compile | 1m 4s | trunk passed |
| +1 | checkstyle | 0m 55s | trunk passed |
| +1 | mvnsite | 1m 15s | trunk passed |
| +1 | shadedclient | 16m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 52s | trunk passed |
| +1 | javadoc | 0m 41s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 11s | the patch passed |
| +1 | compile | 0m 59s | the patch passed |
| +1 | javac | 0m 59s | the patch passed |
| -0 | checkstyle | 0m 49s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 657 unchanged - 0 fixed = 658 total (was 657) |
| +1 | mvnsite | 1m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 16m 16s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 3m 28s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 97m 10s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 174m 25s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Nullcheck of namenodes at line 116 of value previously dereferenced in org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(Collection, Collection, String, Path, Configuration, int) At NameNodeConnector.java:116 of value previously dereferenced in org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(Collection, Collection, String, Path, Configuration, int) At NameNodeConnector.java:[line 114] |
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue | HDFS-13183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998008/HDFS-13183.004.patch |
| Optional Tests | dupname asflicense
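The FindBugs finding in this report ("Nullcheck of namenodes ... of value previously dereferenced") refers to a common shape: a reference is used first and null-checked afterwards, so the check can never help. The patch source is not shown in this thread, so the following is only a generic illustration of that shape and one way to fix it; `describeFlagged` and `describeFixed` are hypothetical methods, not code from {{NameNodeConnector}}.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Generic illustration of a null check placed after a dereference. */
class NullCheckOrderSketch {

  /** Flagged shape: namenodes is dereferenced before it is null-checked. */
  static List<String> describeFlagged(Collection<URI> namenodes) {
    // Dereference happens here; a null argument throws NullPointerException.
    List<String> out = new ArrayList<>(namenodes.size());
    // This check comes too late to protect the line above.
    if (namenodes != null) {
      for (URI nn : namenodes) {
        out.add("connector for " + nn);
      }
    }
    return out;
  }

  /** Fixed shape: check first, then dereference. */
  static List<String> describeFixed(Collection<URI> namenodes) {
    if (namenodes == null) {
      return Collections.emptyList();
    }
    List<String> out = new ArrayList<>(namenodes.size());
    for (URI nn : namenodes) {
      out.add("connector for " + nn);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(describeFixed(null));
    System.out.println(describeFixed(
        Collections.singletonList(URI.create("hdfs://nn1:8020"))));
  }
}
```

An alternative fix, when null is never a legal argument, is to drop the late check entirely instead of moving it.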
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068994#comment-17068994 ]

Hadoop QA commented on HDFS-13183:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 19s | trunk passed |
| +1 | compile | 1m 10s | trunk passed |
| +1 | checkstyle | 0m 53s | trunk passed |
| +1 | mvnsite | 1m 9s | trunk passed |
| +1 | shadedclient | 16m 43s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 59s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 9s | the patch passed |
| +1 | compile | 1m 1s | the patch passed |
| +1 | javac | 1m 1s | the patch passed |
| -0 | checkstyle | 0m 48s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 657 unchanged - 0 fixed = 658 total (was 657) |
| +1 | mvnsite | 1m 7s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 14m 12s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 3m 18s | hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | javadoc | 0m 41s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 96m 5s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 163m 35s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | Nullcheck of namenodes at line 116 of value previously dereferenced in org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(Collection, Collection, String, Path, Configuration, int) At NameNodeConnector.java:116 of value previously dereferenced in org.apache.hadoop.hdfs.server.balancer.NameNodeConnector.newNameNodeConnectors(Collection, Collection, String, Path, Configuration, int) At NameNodeConnector.java:[line 114] |
| Failed junit tests | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
| | hadoop.hdfs.server.datanode.TestBPOfferService |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.TestFileChecksumCompositeCrc |
| | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 |
| JIRA Issue |
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068809#comment-17068809 ]

Xiaoqiao He commented on HDFS-13183:

Considering that this feature is deployed and used by many users, I would like to pick this up again and submit a new patch based on trunk. I would like to state:
A. v004 offers a configuration to enable/disable this feature for users. It is disabled by default.
B. This feature is just one choice for end users when deciding where to send high-load requests: Active NN, Observer NN or Standby NN.
C. Based on over 2 years of practice on my internal cluster, it is helpful in reducing the load on the Active NN.

Hi [~weichiu], [~elgoiri] and others, would anyone like to review v004? Thanks.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005233#comment-17005233 ]

Max Xie commented on HDFS-13183:

(y)
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544463#comment-16544463 ]

xiaoli commented on HDFS-13183:

(y)
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503271#comment-16503271 ]

He Xiaoqiao commented on HDFS-13183:

Thanks [~elgoiri] and [~shv] for your comments.
{quote}
If we have that mechanism, we could make this generic and the ActiveDenyOfServiceException would only need to implement this failover exception.
{quote}
Actually, ActiveDenyOfServiceException can already trigger client failover. Even so, it still has some problems; for instance, the Balancer would not work well if the SBN is shut down. These cases can of course be resolved. If someone is interested, please feel free to continue the work; I am sorry that I have had no time to fix it recently.
{quote}
I see that you are coming to the same problems as we do, but in a more general case. getBlocks() was actually one of our initial use cases for the feature.
{quote}
HDFS-12943 is indeed a very compelling feature. However, as far as I know, many users on branch-2.7 or earlier are eager to resolve the ANN overhead of #getBlocks from the Balancer. This patch just puts forward one solution to this problem for reference.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501194#comment-16501194 ]

Konstantin Shvachko commented on HDFS-13183:

I think once reads from StandbyNode (HDFS-12943) are implemented, {{getBlocks()}} will be just one type of read request to the SBN.
[~hexiaoqiao] I see that you are coming to the same problems as we do, but in a more general case. {{getBlocks()}} was actually one of our initial use cases for the feature.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501003#comment-16501003 ]

Íñigo Goiri commented on HDFS-13183:

I came across this JIRA and I had a similar use case.
For HDFS-13488, I tried to add an exception to trigger the failover.
Initially, I tried extending {{StandbyException}} as I expected it to be identified by the retry policy and switch to the next NN.
However, for the current client, the exception must be StandbyException and not a subclass.
I think it would be better to add a new exception type (or probably interface) that could be subclassed and the RetryPolicy would identify it as a trigger for a failover.
If we have that mechanism, we could make this generic and the ActiveDenyOfServiceException would only need to implement this failover exception.
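The exact-class limitation and the proposed marker type described in the comment above can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in: {{FailoverTrigger}}, {{SketchStandbyException}}, {{SketchActiveDenyException}} and {{FailoverDecisionSketch}} are not Hadoop classes, and the exact-class comparison only models the behaviour the comment reports, not the actual retry policy code.

```java
import java.io.IOException;

/** Hypothetical marker: exceptions implementing it ask the client to fail over. */
interface FailoverTrigger {}

/** Stand-in for Hadoop's StandbyException; this is not the real class. */
class SketchStandbyException extends IOException implements FailoverTrigger {
  SketchStandbyException(String message) {
    super(message);
  }
}

/** Stand-in for the patch's ActiveDenyOfServiceException, modelled as a subclass. */
class SketchActiveDenyException extends SketchStandbyException {
  SketchActiveDenyException(String message) {
    super(message);
  }
}

final class FailoverDecisionSketch {
  private FailoverDecisionSketch() {}

  /** Exact-class match: a subclass is not recognised as a failover trigger. */
  static boolean failsOverByExactClass(Exception e) {
    return e.getClass() == SketchStandbyException.class;
  }

  /** Marker match: every exception type that opts in triggers failover. */
  static boolean failsOverByMarker(Exception e) {
    return e instanceof FailoverTrigger;
  }
}
```

With the exact-class check, a {{SketchActiveDenyException}} would be treated like any other error; with the marker check, it triggers failover without the retry logic having to know the concrete type.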
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383114#comment-16383114 ]

He Xiaoqiao commented on HDFS-13183:

[~xkrogen]
{quote}if the SbNN goes down, the ANN is not aware of this, but the balancer should start to read from the ANN instead of SbNN.{quote}
Indeed, v003 cannot handle this situation. I think it is better if the client is able to decide which NameNode to request, which may require refactoring {{NameNodeConnector}}. I reviewed the goal of HDFS-12976; maybe we need to wait for it to finish.
Thanks again for your detailed code review, [~xkrogen].
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382189#comment-16382189 ]

Erik Krogen commented on HDFS-13183:

I'm not sure that a specific new exception just for this situation is the right move. I think ideally, the client (in this case the Balancer) should be able to make the decision rather than the NN. For example, if the SbNN goes down, the ANN is not aware of this, but the balancer should start to read from the ANN instead of SbNN. The current approach is not able to handle such a situation. The current handling may work as an interim solution until we develop out HDFS-12976, but in that case I would rather reuse {{StandbyException}} and just update its comment rather than creating a new class of exception. This has better compatibility as well. Ping [~shv] for an opinion on this approach.

Additional comments on the patch:
* I realized that changing {{checkOperation}} to {{UNCHECKED}} in all cases is wrong as that will allow {{getBlocks}} to be performed against the SbNN even if the new config is disabled. For now the only thing that comes to mind is to do something like {{checkOperation(balancerShouldRequestStandby ? UNCHECKED : READ)}}, but I'm not too fond of it. Open to better ideas. It may be that we want to create a new {{OperationCategory.STANDBY_READ}} and then use {{checkOperation(balancerShouldRequestStandby ? STANDBY_READ : READ)}}; this could do away with the explicit check of the service state.
* In the test, we should confirm that the balancer actually fails over to the SbNN, and that it is able to appropriately get blocks and trigger data movement as a result.
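The {{STANDBY_READ}} idea from the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: {{StandbyReadSketch}}, its enums, {{StandbyRejectedException}} and the simplified {{checkOperation}} are stand-ins written for this example, not the NameNode's real state machine, and {{STANDBY_READ}} is only the hypothetical category floated in the discussion.

```java
import java.io.IOException;

final class StandbyReadSketch {
  private StandbyReadSketch() {}

  enum ServiceState { ACTIVE, STANDBY }

  /** STANDBY_READ is the hypothetical new category floated in the comment. */
  enum OperationCategory { UNCHECKED, READ, WRITE, STANDBY_READ }

  /** Stand-in for the rejection a standby sends for unsupported operations. */
  static final class StandbyRejectedException extends IOException {
    StandbyRejectedException(String message) {
      super(message);
    }
  }

  /** Simplified assumption: the active accepts everything, the standby only a subset. */
  static void checkOperation(ServiceState state, OperationCategory op)
      throws StandbyRejectedException {
    if (state == ServiceState.ACTIVE) {
      return;
    }
    if (op == OperationCategory.UNCHECKED || op == OperationCategory.STANDBY_READ) {
      return;
    }
    throw new StandbyRejectedException(
        "Operation category " + op + " is not supported in state standby");
  }

  /** The category getBlocks would pass, depending on the opt-in config flag. */
  static OperationCategory categoryForGetBlocks(boolean balancerShouldRequestStandby) {
    return balancerShouldRequestStandby
        ? OperationCategory.STANDBY_READ
        : OperationCategory.READ;
  }

  /** True if a standby would serve getBlocks under the given config flag. */
  static boolean standbyServesGetBlocks(boolean balancerShouldRequestStandby) {
    try {
      checkOperation(ServiceState.STANDBY, categoryForGetBlocks(balancerShouldRequestStandby));
      return true;
    } catch (StandbyRejectedException e) {
      return false;
    }
  }
}
```

With the flag disabled, {{getBlocks}} is checked as an ordinary {{READ}} and the standby rejects it, which is the behaviour the first bullet asks to preserve.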
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381922#comment-16381922 ]

He Xiaoqiao commented on HDFS-13183:

[~xkrogen] Thanks for your comments and for correcting me.
In patch v003, I open up {{NameNode#getBlock}} on the Standby NameNode by leaving the operation unchecked. To trigger failover immediately when getBlocks is requested from the Active NameNode, I added a new {{Exception}} named {{ActiveDenyOfServiceException}}. I also added a simple unit test of failover on {{ActiveDenyOfServiceException}}.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380554#comment-16380554 ]

Erik Krogen commented on HDFS-13183:

An {{IOException}} thrown on the ANN side will return to the client as a {{RemoteException}}, so I don't believe it will properly trigger failover. Regardless, the failover handling of generic {{IOException}} within {{FailoverOnNetworkExceptionRetry}} is intended for network issues, not purposeful failover, which is the reason for the existence of {{StandbyException}}.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380139#comment-16380139 ]

He Xiaoqiao commented on HDFS-13183:

[~xkrogen] Thanks for your detailed comments, and sorry for the slow response.
{quote}we should not remove checkOperation altogether, but rather it should be OperationCategory.UNCHECKED. We should have this feature be opt-in.{quote}
These are good suggestions, and I will update this issue following your advice.
{quote}Additionally the exception that should be thrown to purposefully trigger failover for a client is currently a StandbyException, not a generic IOException. {quote}
Will the client not fail over immediately when it meets an IOException, even though {{NamenodeProtocol#getBlocks}} is idempotent? Maybe I don't understand correctly; if so, please correct me.
Thanks again, [~xkrogen], for pushing this.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377660#comment-16377660 ]

Erik Krogen commented on HDFS-13183:

[~hexiaoqiao], thanks for raising this issue! I was actually just about to do the same today.

Regarding the existing patch, we should not remove {{checkOperation}} altogether, but rather it should be {{OperationCategory.UNCHECKED}}. We should have this feature be opt-in, and also only enabled if HA is enabled, i.e.:
{code:java}
if (balancerReadsFromStandby && haEnabled && haContext != null
    && haContext.getState().getServiceState() == HAServiceState.ACTIVE) {
  ...
}
{code}
Additionally the exception that should be thrown to purposefully trigger failover for a client is currently a {{StandbyException}}, not a generic {{IOException}}. However, this violates the description of a {{StandbyException}}:
{code:java}
/**
 * Thrown by a remote server when it is up, but is not the active server in a
 * set of servers in which only a subset may be active.
 */
{code}
I think the right approach here will depend on how exactly we tackle HDFS-12976 which is closely related.

[~ajayydv], as a member of the team pushing HDFS-12943, we think there is still value in enabling the balancer to read from the existing SbNN. A few notes:
* The work in HDFS-12975 is primarily necessary because reads from SbNN should not get blocked by e.g. checkpointing. If the balancer is temporarily (few minutes) delayed by such operations, there is no negative impact to the cluster.
* The work in HDFS-13150 is primarily necessary to decrease lag between the ANN and SbNN. Since the balancer will mostly read blocks that have not been recently modified (probabilistically), the lag time is not critical for balancer.
* The overall read-from-standby feature is a much larger feature, and would require setting up additional admin machines (for the ObserverNode). The balancer can benefit from the existing standby.

We think the Balancer is a good point to provide intermediate functionality between the current state of affairs and the full read-from-standby feature due to its high performance impact, narrow scope, and low consistency requirements.
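The opt-in guard quoted in the comment above can be exercised in isolation with a small sketch. It is only an illustration: {{BalancerStandbyGuardSketch}} and its {{HAServiceState}} enum are hypothetical stand-ins, a {{null}} state stands in for a missing {{haContext}}, and what happens inside the guarded branch is elided in the original comment and is not modelled here.

```java
final class BalancerStandbyGuardSketch {
  private BalancerStandbyGuardSketch() {}

  /** Stand-in for the HA service state; not the Hadoop enum. */
  enum HAServiceState { ACTIVE, STANDBY }

  /**
   * Mirrors the condition from the comment: the feature is switched on, HA is
   * enabled, and this node is currently active. A null state models a missing
   * haContext and makes the guard fail.
   */
  static boolean guardHolds(boolean balancerReadsFromStandby,
                            boolean haEnabled,
                            HAServiceState state) {
    return balancerReadsFromStandby
        && haEnabled
        && state == HAServiceState.ACTIVE;
  }
}
```

Because every term must be true, a cluster without HA, a cluster that has not opted in, or a node that is not active keeps its existing behaviour.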
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377367#comment-16377367 ]

Ajay Kumar commented on HDFS-13183:

[~hexiaoqiao], there are broader efforts to utilize the SNN in this kind of scenario (see [HDFS-12975]). An individual change to one function will be difficult to track and may break important functionality.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375520#comment-16375520 ]

He Xiaoqiao commented on HDFS-13183:

[~ajayydv], thanks for your comments.
{quote}It would be good if in HA mode, ANN redirect all calls to SNN or signals client to direct these calls to SNN through appropriate IOE.{quote}
It is a good suggestion, and I have just submitted patch V2, which redirects {{getBlocks}} requests from the ANN to the SBN in HA mode through an IOE. Do you mind having a look, [~ajayydv]?
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374932#comment-16374932 ]

Ajay Kumar commented on HDFS-13183:

[~hexiaoqiao], this is a good start, but it will not redirect {{getBlocks}} requests made to the ANN. It would be good if, in HA mode, the ANN redirected all calls to the SNN or signaled the client to direct these calls to the SNN through an appropriate IOE.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374325#comment-16374325 ]

He Xiaoqiao commented on HDFS-13183:

The WIP patch V1 demonstrates the approach for the Standby NameNode changes. It is not ready for a Jenkins run yet.
The key idea is to remove the operation check when the Standby NameNode receives a #getBlocks request.