[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293190#comment-16293190 ]

Hudson commented on HBASE-18946:

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4231 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4231/])
HBASE-18946 Stochastic load balancer assigns replica regions to the same (stack: rev 010012cbcb3064b78b9e184a2808bbd26ea80903)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/EnableTableProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RecoverMetaProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/CreateTableProcedure.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/AbstractTestDLS.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
* (edit) hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicasWithRestartScenarios.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/ZKNamespaceManager.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestAssignmentManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestRegionsOnMasterOptions.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenInitializing.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/CloneSnapshotProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/master/snapshot/TestAssignProcedure.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java

> Stochastic load balancer assigns replica regions to the same RS
> ---
>
> Key: HBASE-18946
> URL: https://issues.apache.org/jira/browse/HBASE-18946
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha-3
> Reporter: ramkrishna.s.vasudevan
> Assignee: stack
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-18946.master.001.patch, HBASE-18946.master.002.patch, HBASE-18946.master.003.patch, HBASE-18946.master.004.patch, HBASE-18946.master.005.patch, HBASE-18946.master.006.patch, HBASE-18946.master.007.patch, HBASE-18946.master.008.patch, HBASE-18946.master.009.patch, HBASE-18946.master.010.patch, HBASE-18946.master.011.patch, HBASE-18946.master.012.patch, HBASE-18946.patch, HBASE-18946.patch, HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java
>
> Trying out region replicas and their assignment, I can see that sometimes the
> default LB, the Stochastic load balancer, assigns replica regions to the same RS.
> This happens when we have 3 RS checked in and we have a table with 3
> replicas. When an RS goes down, assigning the replicas to the same RS is
> acceptable, but when we have enough RS to spread them across, this behaviour is
> undesirable and defeats the purpose of replicas.
> [~huaxiang] and [~enis].

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
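The condition the reporter describes (two replicas of one region landing on the same RS while other servers are available) can be checked mechanically. What follows is only a minimal sketch under stated assumptions: `ReplicaColocationCheck`, `Placement`, and `findColocated` are hypothetical names made up for illustration, not HBase classes, and the placements are plain strings rather than real cluster state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not HBase code): finds regions whose replicas share a server. */
final class ReplicaColocationCheck {

  /** Illustrative stand-in for one replica assignment; all field names are assumptions. */
  static final class Placement {
    final String primaryRegion; // the region this replica belongs to
    final int replicaId;        // 0 for the primary, 1..n for the secondaries
    final String server;        // the region server hosting this replica

    Placement(String primaryRegion, int replicaId, String server) {
      this.primaryRegion = primaryRegion;
      this.replicaId = replicaId;
      this.server = server;
    }
  }

  /** Returns each region that has at least two replicas on the same server, in first-seen order. */
  static List<String> findColocated(List<Placement> placements) {
    Map<String, Set<String>> serversByRegion = new HashMap<>();
    List<String> offenders = new ArrayList<>();
    for (Placement p : placements) {
      Set<String> servers =
          serversByRegion.computeIfAbsent(p.primaryRegion, k -> new HashSet<>());
      // add() returns false when this server already hosts a replica of the region.
      if (!servers.add(p.server) && !offenders.contains(p.primaryRegion)) {
        offenders.add(p.primaryRegion);
      }
    }
    return offenders;
  }
}
```

With 3 RS and a table with 3 replicas, a healthy layout yields an empty list; any region name in the result is one whose replicas do not give the redundancy they are meant to.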
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292173#comment-16292173 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 8s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 9 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 35s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| +1 | shadedjars | 5m 56s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 39s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 36s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| -1 | checkstyle | 1m 13s | hbase-server: The patch generated 10 new + 735 unchanged - 9 fixed = 745 total (was 744) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 30s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 19m 6s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 0s | hbase-procedure in the patch passed. |
| +1 | unit | 94m 26s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 137m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902218/HBASE-18946.master.012.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 1884365e82e3 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / deba43b156 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292062#comment-16292062 ]

stack commented on HBASE-18946:
---

The failure was TestMasterFailover. It failed to write its XML because it timed out twice. Looking at the test, it tries to set zk node states and move meta regions off master, neither of which makes sense in AMv2. I refactored the TestMasterFailover test that does the nonsense.

I see other timeouts, though it looks like most other tests just pass. Here is what I see in the console:

TestDLSFSHLog
TestStochasticLoadBalancer
TestReplicationZKNodeCleaner
TestLogsCleaner

These all pass locally w/o issue EXCEPT TestDLSFSHLog. It looks sick, stuck. Digging, indeed, it's the fault of this patch. We try to keep sending state change messages to the master for as long as we can, only the thread is not a daemon, so it keeps the RS up. Ugh! Fixed.
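The hang described in this comment is standard JVM behaviour: a process does not exit while any non-daemon thread is still running. The sketch below shows the usual remedy, a thread factory that marks its threads as daemon. It is an illustration only, not the code from the patch; `DaemonReporterSketch`, `daemonFactory`, `startReporter`, and the thread name are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch (not the patch): a background reporter that cannot keep the JVM alive. */
final class DaemonReporterSketch {

  /** Builds threads flagged as daemon, so they do not block JVM shutdown. */
  static ThreadFactory daemonFactory(String name) {
    return runnable -> {
      Thread t = new Thread(runnable, name);
      t.setDaemon(true); // must be set before the thread is started
      return t;
    };
  }

  /** Runs the given report task repeatedly on a single daemon thread. */
  static ScheduledExecutorService startReporter(Runnable report, long periodMillis) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(daemonFactory("state-change-reporter"));
    scheduler.scheduleWithFixedDelay(report, 0, periodMillis, TimeUnit.MILLISECONDS);
    return scheduler;
  }
}
```

The default factory used by `Executors` creates non-daemon threads, so a retry loop scheduled without a custom factory keeps the process up until the executor is shut down explicitly.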
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291928#comment-16291928 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 31s | master passed |
| +1 | compile | 0m 55s | master passed |
| +1 | checkstyle | 1m 26s | master passed |
| +1 | shadedjars | 5m 47s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 45s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 33s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| -1 | checkstyle | 1m 12s | hbase-server: The patch generated 9 new + 721 unchanged - 7 fixed = 730 total (was 728) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 19m 0s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 2s | hbase-procedure in the patch passed. |
| -1 | unit | 100m 12s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 143m 22s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902177/HBASE-18946.master.011.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 0678ddafa22b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 6ab8ce9829 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291801#comment-16291801 ]

stack commented on HBASE-18946:
---

Another interesting test failure. I can't get it to fail locally, which is a bummer. Looking at the test, for the case where all servers carry Regions -- i.e. Master too -- I see that we were not running the balancer because we had RIT (regions in transition). Let me fix that in the latest version of the patch.
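The comment above does not say how the patch addresses the skipped balancer run, so the following is only a guess at the general shape of the problem. It assumes the balancer declines to run while any region is in transition unless forced, and shows one way a test might wait for RIT to clear first. `BalancerRitSketch`, `balancerWouldRun`, and `waitForNoRit` are hypothetical names, and the RIT count is supplied by the caller rather than read from a real master.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch (not HBase code): balancer gating on regions in transition (RIT). */
final class BalancerRitSketch {

  /** Assumed gating rule: a balancer run proceeds only when forced or when nothing is in transition. */
  static boolean balancerWouldRun(int regionsInTransition, boolean force) {
    return force || regionsInTransition == 0;
  }

  /**
   * Polls the supplied RIT count until it reaches zero or the timeout elapses.
   * Returns true if RIT cleared in time, false on timeout or interruption.
   */
  static boolean waitForNoRit(IntSupplier ritCount, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (ritCount.getAsInt() > 0) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

A test that asserts on region placement after balancing would call something like `waitForNoRit` first; otherwise the balance request may be a silent no-op and the assertion checks a layout the balancer never touched.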
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291737#comment-16291737 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for branch |
| +1 | mvninstall | 5m 29s | master passed |
| +1 | compile | 1m 7s | master passed |
| +1 | checkstyle | 1m 48s | master passed |
| +1 | shadedjars | 6m 51s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 49s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 5m 16s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| -1 | checkstyle | 1m 14s | hbase-server: The patch generated 9 new + 704 unchanged - 7 fixed = 713 total (was 711) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 27s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 19m 13s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 1s | hbase-procedure in the patch passed. |
| -1 | unit | 107m 44s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 154m 19s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.balancer.TestRegionsOnMasterOptions |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902145/HBASE-18946.master.010.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f2560ea0a1ec 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 2c9ef8a471 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291475#comment-16291475 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 8s | master passed |
| +1 | compile | 0m 49s | master passed |
| +1 | checkstyle | 1m 12s | master passed |
| +1 | shadedjars | 5m 9s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 36s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| -1 | mvninstall | 1m 50s | root in the patch failed. |
| -1 | compile | 0m 37s | hbase-server in the patch failed. |
| -1 | javac | 0m 37s | hbase-server in the patch failed. |
| -1 | checkstyle | 0m 59s | hbase-server: The patch generated 9 new + 704 unchanged - 7 fixed = 713 total (was 711) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedjars | 2m 53s | patch has 15 errors when building our shaded downstream artifacts. |
| -1 | hadoopcheck | 4m 24s | The patch causes 16 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 5m 57s | The patch causes 16 errors with Hadoop v2.7.4. |
| -1 | hadoopcheck | 7m 33s | The patch causes 16 errors with Hadoop v3.0.0-beta1. |
| +1 | javadoc | 0m 37s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 54s | hbase-procedure in the patch passed. |
| -1 | unit | 0m 35s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 27m 29s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902133/HBASE-18946.master.009.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 88cfccd6cadc 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291400#comment-16291400 ]

stack commented on HBASE-18946:
---

.009 disables the failing TestRSKilledWhenInitializing (see HBASE-19515). It's a medium test. Let's see what other goodies this patch turns up.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291393#comment-16291393 ]

stack commented on HBASE-18946:
---

The test failure is a really good one. We've exposed a latent problem in assign that was supposedly fixed years ago in 0.96 with HBASE-9593. See HBASE-19515 for detail. For now I'm going to disable this failing test (it only fails sometimes, but the test needs to be beefed up, as it is a little sloppy in verifying the supposed fix).
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290511#comment-16290511 ]

Hadoop QA commented on HBASE-18946:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 8s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 1s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 28s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 44s | master passed |
| +1 | compile | 0m 54s | master passed |
| +1 | checkstyle | 1m 24s | master passed |
| +1 | shadedjars | 5m 53s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 39s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 33s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| -1 | checkstyle | 1m 10s | hbase-server: The patch generated 9 new + 569 unchanged - 6 fixed = 578 total (was 575) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 25s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 18m 53s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 3s | hbase-procedure in the patch passed. |
| -1 | unit | 96m 36s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 139m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRSKilledWhenInitializing |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902000/HBASE-18946.master.008.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 10bb8ecca696 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / ba5f9ac380 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290373#comment-16290373 ]

stack commented on HBASE-18946:
-------------------------------

.008 adds a bunch of doc around how retention works and when to use round-robin assignment creation vs. the old style. It also fixes some other minor findings. The new bulk comes from a port of the doc done over in the now-shattered HBASE-19501.

Here is the commit message.

Added a new bulk assign, createRoundRobinAssignProcedure, to complement the existing createAssignProcedure. The former asks the balancer for target servers to set into the created AssignProcedures. The latter sets no target server into the AssignProcedure. When no target server is specified, we make an effort at assign time to deploy the region to its old location, if there was one. The new round-robin assign procedure creator does not do this. Use the new round-robin method on table create or when re-enabling offline regions. Use the old assign in ServerCrashProcedure or in EnableTable so there is a chance we retain locality.

Bulk preassigning, that is, passing all to-be-assigned regions to the balancer in one go, is good for ensuring good distribution, especially when read replicas are in the mix. The old assign was single-assign scoped, so region replicas could end up on the same server.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
 Cleanup around forceNewPlan. It was confusing.
 Added a Comparator to sort AssignProcedures so meta and system tables come ahead of user-space tables.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Removed the forceNewPlan argument on createAssignProcedure. It didn't make sense given we were creating a new AssignProcedure; the arg had no effect.
 (createRoundRobinAssignProcedures) Recast to feed all regions to the balancer in bulk and to sort the return so meta and system tables take precedence.

Miscellaneous fixes, including keeping the Master around until all RegionServers are down, documentation on how assignment retention works, etc.
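The bulk round-robin idea in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: ReplicaSketch and BulkAssignSketch are hypothetical names, servers are plain strings, and a simple round-robin loop stands in for the balancer call. It shows why handing all regions over in one go keeps the replicas of a region apart (as long as there are at least as many servers as replicas), and how a Comparator can put system tables ahead of user-space tables.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one replica of a region; not an HBase class. */
final class ReplicaSketch {
  final String table;
  final String region;
  final int replicaId;
  final boolean systemTable;

  ReplicaSketch(String table, String region, int replicaId, boolean systemTable) {
    this.table = table;
    this.region = region;
    this.replicaId = replicaId;
    this.systemTable = systemTable;
  }
}

/** Hypothetical sketch of bulk assignment; a round-robin loop plays the balancer's role. */
final class BulkAssignSketch {

  /** System tables first, then ordered by table, region and replica id. */
  static final Comparator<ReplicaSketch> SYSTEM_FIRST =
      Comparator.comparing((ReplicaSketch r) -> !r.systemTable)
          .thenComparing(r -> r.table)
          .thenComparing(r -> r.region)
          .thenComparingInt(r -> r.replicaId);

  /** Picks a target server for every replica in one pass over the whole batch. */
  static Map<ReplicaSketch, String> roundRobin(List<ReplicaSketch> replicas, List<String> servers) {
    List<ReplicaSketch> sorted = new ArrayList<>(replicas);
    sorted.sort(SYSTEM_FIRST);
    Map<ReplicaSketch, String> plan = new LinkedHashMap<>();
    int next = 0;
    for (ReplicaSketch replica : sorted) {
      // Replicas of one region are adjacent after sorting, so they get consecutive servers.
      plan.put(replica, servers.get(next % servers.size()));
      next++;
    }
    return plan;
  }
}
```

A single-assign scoped approach would choose each target without seeing the other replicas, which is how two replicas of one region could land on the same server.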
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290263#comment-16290263 ]

Hadoop QA commented on HBASE-18946:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 8s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 32s | master passed |
| +1 | compile | 0m 53s | master passed |
| +1 | checkstyle | 1m 21s | master passed |
| +1 | shadedjars | 5m 45s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 40s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 30s | the patch passed |
| +1 | compile | 0m 53s | the patch passed |
| +1 | javac | 0m 53s | the patch passed |
| -1 | checkstyle | 1m 6s | hbase-server: The patch generated 4 new + 229 unchanged - 7 fixed = 233 total (was 236) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 23s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 19m 1s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 58s | hbase-procedure in the patch passed. |
| -1 | unit | 124m 41s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 167m 32s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRSKilledWhenInitializing |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901964/HBASE-18946.master.007.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux ad7f3aa297cc 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 104afd74a6 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290123#comment-16290123 ]

stack commented on HBASE-18946:
-------------------------------

TestRestartCluster#testRetainAssignmentOnRestart is broken because when we create AssignProcedures now, we populate them with target servers. Doing this blows away our retaining of old locations (if no target is specified, then when the AssignProcedure runs it will try to use the new form of the old location, which is how we get our retention). Over in HBASE-19501 we noticed that ServerCrashProcedure depends on this and made a version of assign procedures that works for SCP. Let me do the same here by reinstituting the old bulk assign method for SCP to use. This seems to have fixed all tests. Let me post a new patch to be sure.
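The retention behaviour described in the comment above can be sketched like this. It is a hedged illustration only: RetainAssignSketch and its methods are hypothetical names, and it assumes servers are identified by "host,port,startcode" strings, so that "the new form of the old location" means a live server with the same host and port but a newer start code. The fallback of taking the first live server is a placeholder for whatever the balancer would pick.

```java
import java.util.List;

/** Hypothetical sketch of assign-time retention; not the AssignProcedure code. */
final class RetainAssignSketch {

  /** Strips the trailing start code from a "host,port,startcode" server name. */
  static String hostAndPort(String serverName) {
    int lastComma = serverName.lastIndexOf(',');
    return lastComma < 0 ? serverName : serverName.substring(0, lastComma);
  }

  /**
   * @param target preset target server, or null if the procedure was created without one
   * @param lastLocation server that last hosted the region, or null if unknown
   * @param liveServers currently live servers, assumed non-empty
   */
  static String chooseServer(String target, String lastLocation, List<String> liveServers) {
    if (target != null) {
      // A preset target wins, which is why presetting targets defeats retention.
      return target;
    }
    if (lastLocation != null) {
      for (String server : liveServers) {
        if (hostAndPort(server).equals(hostAndPort(lastLocation))) {
          return server; // same host and port, possibly restarted with a new start code
        }
      }
    }
    return liveServers.get(0); // placeholder for a balancer decision
  }
}
```

In this sketch, a procedure created with a target never reaches the retention branch, which matches the reason given above for the TestRestartCluster#testRetainAssignmentOnRestart failure.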
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289840#comment-16289840 ]

Hadoop QA commented on HBASE-18946:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 31s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 31s | master passed |
| +1 | compile | 0m 54s | master passed |
| +1 | checkstyle | 1m 20s | master passed |
| +1 | shadedjars | 5m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 39s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 32s | the patch passed |
| +1 | compile | 0m 54s | the patch passed |
| +1 | javac | 0m 54s | the patch passed |
| -1 | checkstyle | 1m 5s | hbase-server: The patch generated 3 new + 149 unchanged - 6 fixed = 152 total (was 155) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 24s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 18m 53s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0-beta1. |
| +1 | javadoc | 0m 40s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 59s | hbase-procedure in the patch passed. |
| -1 | unit | 95m 35s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 141m 34s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.TestRestartCluster |
| | hadoop.hbase.regionserver.TestRSKilledWhenInitializing |
| | hadoop.hbase.regionserver.TestRegionReplicasAreDistributed |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901923/HBASE-18946.master.006.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f73947df98b8 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289668#comment-16289668 ]

stack commented on HBASE-18946:
-------------------------------

[~ram_krish] You +1 if tests pass? Thanks sir.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289628#comment-16289628 ]

stack commented on HBASE-18946:
-------------------------------

I included the async wal setting by mistake. Fixing. Need a +1 here please.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288977#comment-16288977 ]

Hadoop QA commented on HBASE-18946:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 48s | master passed |
| +1 | compile | 0m 57s | master passed |
| +1 | checkstyle | 1m 21s | master passed |
| +1 | shadedjars | 5m 58s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 44s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 54s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| -1 | checkstyle | 1m 7s | hbase-server: The patch generated 5 new + 155 unchanged - 6 fixed = 160 total (was 161) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 40s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 53m 40s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. |
| +1 | javadoc | 0m 42s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 5s | hbase-procedure in the patch passed. |
| -1 | unit | 129m 51s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 36s | The patch does not generate ASF License warnings. |
| | | 208m 47s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.TestRestartCluster |
| | hadoop.hbase.wal.TestWALOpenAfterDNRollingStart |
| | hadoop.hbase.regionserver.TestRSKilledWhenInitializing |
| | hadoop.hbase.regionserver.TestRegionReplicasAreDistributed |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901827/HBASE-18946.master.005.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8da561481ac1 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288970#comment-16288970 ]

Hadoop QA commented on HBASE-18946:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 5m 9s | master passed |
| +1 | compile | 1m 10s | master passed |
| +1 | checkstyle | 1m 30s | master passed |
| +1 | shadedjars | 6m 34s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 43s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 55s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -1 | checkstyle | 1m 5s | hbase-server: The patch generated 5 new + 155 unchanged - 6 fixed = 160 total (was 161) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 34s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 54m 59s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. |
| +1 | javadoc | 1m 53s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 4m 35s | hbase-procedure in the patch passed. |
| -1 | unit | 130m 55s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 214m 43s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestWalAndCompactingMemStoreFlush |
| | hadoop.hbase.wal.TestWALOpenAfterDNRollingStart |
| | hadoop.hbase.regionserver.TestRegionReplicasAreDistributed |
| | hadoop.hbase.master.TestRestartCluster |
| | hadoop.hbase.regionserver.wal.TestLogRolling |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901826/HBASE-18946.master.004.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b04190b07c38 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288937#comment-16288937 ] Hadoop QA commented on HBASE-18946: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 47s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 5s{color} | {color:red} hbase-server: The patch generated 4 new + 155 unchanged - 6 fixed = 159 total (was 161) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 53m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 0s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}129m 0s{color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}206m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.regionserver.TestHRegion | | | hadoop.hbase.master.TestRestartCluster | | | hadoop.hbase.wal.TestWALOpenAfterDNRollingStart | | | hadoop.hbase.regionserver.TestRegionReplicasAreDistributed | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-18946 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901825/HBASE-18946.master.003.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 52e30776008f 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288754#comment-16288754 ] stack commented on HBASE-18946: --- .004 fixes tests and adds in [~ram_krish] 's test. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.master.001.patch, > HBASE-18946.master.002.patch, HBASE-18946.master.003.patch, > HBASE-18946.master.004.patch, HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stocahstic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288183#comment-16288183 ] Hadoop QA commented on HBASE-18946: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 29s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 4s{color} | {color:red} hbase-server: The patch generated 8 new + 155 unchanged - 6 fixed = 163 total (was 161) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 23s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 50m 43s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 15s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}123m 21s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.coprocessor.TestCoprocessorShortCircuitRPC | | | hadoop.hbase.regionserver.TestHRegionFileSystem | | | hadoop.hbase.master.locking.TestLockManager | | | hadoop.hbase.master.locking.TestLockProcedure | | | hadoop.hbase.procedure.TestProcedureManager | | | hadoop.hbase.master.balancer.TestRegionLocationFinder | | | hadoop.hbase.ipc.TestNettyRpcServer | | | hadoop.hbase.TestCheckTestClasses | | | hadoop.hbase.client.TestClientClusterStatus | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-18946 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12901730/HBASE-18946.master.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 7552cc1b172a 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 11467ef111 | | maven | version: Apache Maven 3.5.2
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287968#comment-16287968 ] stack commented on HBASE-18946: --- .002 should be good to go. Trying it against qa. Changed core of AM#createAssignProcedure so we pass the list of Regions to assign to the balancer en masse, in one lump. Let the balancer figure out what to do with the fat assign. We get back a Map of servers to regions. We then transform that into an array of AssignProcedures to pass to the Assign executor. We sort the array so that meta and system tables are passed to the executor first (and so replicas are clumped together...). Internally the AM executor may divvy up the work into queues but all will be pre-assigned so we should have good distribution (round-robin) regardless of how the queue is processed. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java Cleanup around forceNewPlan. Was confusing. Added a Comparator to sort AssignProcedures so meta and system tables come ahead of user-space tables. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java Remove the forceNewPlan argument on createAssignProcedure. Didn't make sense given we were creating a new AssignProcedure; the arg had no effect. (createAssignProcedures) Recast to feed all regions to the balancer in bulk and to sort the return so meta and system tables take precedence. 
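The flow described in this comment (one bulk balancer call, a Map of servers to regions, then a sorted array of assigns) can be sketched roughly as below. This is a toy model, not the patch: `BulkAssignSketch`, `Region`, `Assign`, `balance`, `priority` and `createAssigns` are hypothetical names, the balancer is a plain round robin, and treating every table whose name starts with `hbase:` as a system table is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the flow described above. These are NOT the HBase classes. */
final class BulkAssignSketch {

  /** Hypothetical stand-in for a region: table, start key and replica id. */
  static final class Region {
    final String table;
    final String startKey;
    final int replicaId;

    Region(String table, String startKey, int replicaId) {
      this.table = table;
      this.startKey = startKey;
      this.replicaId = replicaId;
    }
  }

  /** Hypothetical stand-in for an AssignProcedure whose target is already chosen. */
  static final class Assign {
    final Region region;
    final String server;

    Assign(Region region, String server) {
      this.region = region;
      this.server = server;
    }
  }

  /** Stand-in balancer: round-robins the whole list in one call. */
  static Map<String, List<Region>> balance(List<Region> regions, List<String> servers) {
    Map<String, List<Region>> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      String server = servers.get(i % servers.size());
      plan.computeIfAbsent(server, s -> new ArrayList<>()).add(regions.get(i));
    }
    return plan;
  }

  /** 0 for meta, 1 for other system tables (assumed "hbase:" prefix), 2 for user tables. */
  static int priority(Region r) {
    if (r.table.equals("hbase:meta")) {
      return 0;
    }
    return r.table.startsWith("hbase:") ? 1 : 2;
  }

  /** Meta and system tables first; replicas of the same region end up adjacent. */
  static final Comparator<Assign> META_AND_SYSTEM_FIRST =
      Comparator.<Assign>comparingInt(a -> priority(a.region))
          .thenComparing(a -> a.region.table)
          .thenComparing(a -> a.region.startKey)
          .thenComparingInt(a -> a.region.replicaId);

  /** One balancer call for all regions, then flatten the plan and sort it. */
  static Assign[] createAssigns(List<Region> regions, List<String> servers) {
    List<Assign> assigns = new ArrayList<>();
    for (Map.Entry<String, List<Region>> e : balance(regions, servers).entrySet()) {
      for (Region r : e.getValue()) {
        assigns.add(new Assign(r, e.getKey()));
      }
    }
    Assign[] sorted = assigns.toArray(new Assign[0]);
    Arrays.sort(sorted, META_AND_SYSTEM_FIRST);
    return sorted;
  }

  public static void main(String[] args) {
    List<Region> regions = Arrays.asList(
        new Region("t1", "a", 0), new Region("t1", "a", 1), new Region("t1", "a", 2),
        new Region("hbase:meta", "", 0));
    for (Assign a : createAssigns(regions, Arrays.asList("rs1", "rs2", "rs3"))) {
      System.out.println(a.region.table + " replica " + a.region.replicaId + " -> " + a.server);
    }
  }
}
```

Because every assign carries its target server before it is queued, the order in which a queue later drains them should not change the placement, which is the point made above about pre-assignment.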
> Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.master.001.patch, > HBASE-18946.master.002.patch, HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stocahstic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287181#comment-16287181 ] stack commented on HBASE-18946: --- .001 is NOT done yet. Posting WIP. Builds on [~ram_krish] findings and explorations. Added method to AM to bulk create AssignProcedures. Passes bulk to balancer to get placement. Some cleanup of a dodgy force assign flag. Will doc better tomorrow. Just posting what I have at mo. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.master.001.patch, HBASE-18946.patch, > HBASE-18946.patch, HBASE-18946_2.patch, HBASE-18946_2.patch, > HBASE-18946_simple_7.patch, HBASE-18946_simple_8.patch, > TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stocahstic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280070#comment-16280070 ] Anoop Sam John commented on HBASE-18946: No no. I mean the LB round robin is called with all replicas passed together. Until then we have only one region as part of the procedure. So the RSs for the different replicas are selected together in one shot. Then come the sub procedures for each of the replica regions, and those will NOT repeat the previous steps such as contacting the LB. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280054#comment-16280054 ] ramkrishna.s.vasudevan commented on HBASE-18946: I get what you are saying. Adding to META is not an issue; we can do that any time. We can create regioninfos from the default region. Sub procs can create new sub procs, true. But my point is this: say the default region first creates a sub proc, which in turn creates a sub proc for replica 1. Now you call assign for replica 1. This assign will call round robin in the LB. The LB will just assign it to RS 1. Now the parent sub proc, i.e. the default region, runs again. Here again round robin in the LB will be called. It will again pick RS 1 because it does not know that replica 1 is on RS 1. I will spend some more time on this to see if I can come up with an alternative, in case I am wrong in the above statements. [~saint@gmail.com] You can have a go too. Sorry I missed replying to your earlier comment asking if you could try this out. No problem. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. 
> [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
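The scenario in the comment above can be reproduced with a small toy model. It assumes a stateless round robin that always starts from the first server and knows nothing about earlier calls; that is a simplification for illustration, not the actual HBase LoadBalancer code, and `RoundRobinSketch`, `roundRobin` and `assignOneByOne` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: not HBase's LoadBalancer API. All names are hypothetical. */
final class RoundRobinSketch {

  /** Stateless round robin: every call starts again from the first server. */
  static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      plan.put(regions.get(i), servers.get(i % servers.size()));
    }
    return plan;
  }

  /** One balancer call per replica, as when each assign asks the LB on its own. */
  static Map<String, String> assignOneByOne(List<String> regions, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (String region : regions) {
      // The balancer sees a single region and has no memory of the previous call.
      plan.putAll(roundRobin(Collections.singletonList(region), servers));
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
    List<String> replicas = Arrays.asList("region_replica0", "region_replica1", "region_replica2");
    System.out.println("one by one: " + assignOneByOne(replicas, servers));
    System.out.println("as a group: " + roundRobin(replicas, servers));
  }
}
```

In this model the per-replica calls put all three replicas on the same server, while a single call with the whole group spreads them out, even though the balancer itself has no replica awareness in either case.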
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280011#comment-16280011 ] Anoop Sam John commented on HBASE-18946: Ram, what I am saying is that in the first steps there won't be 3 regions at all (consider a replica count of 3 and a single region), not until the LB is contacted. Only at that point are the replicas generated and servers requested for them. Processing of this single region is itself a sub procedure. Now we have 3 regions under it, so we have to generate 3 sub procedures. My question was whether that is possible: a sub procedure under another (at the f/w level), and Stack says yes. The grouping issue arises because we create the replica regions at the very first step. What always comes to my mind is: can we NOT do that then, but later? I may be totally wrong as I have not studied the code fully. Just putting up an option/question. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278997#comment-16278997 ] ramkrishna.s.vasudevan commented on HBASE-18946: bq.Yes same thing I ask/suggest from long time. This is what I have been trying to do since patch 1, and saying the same: group the replicas. bq.Can we again make sub procedures for each of the replica regions under a region sub procedure? Not possible currently because the AM is independent of what it is processing, and the same is true of the LB. Whatever procs you create will just go to the AM as individual regions, and the AM will just process whatever is in its queue. So group them before handing them to the AM, either with procs or with some queue. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278988#comment-16278988 ] stack commented on HBASE-18946: --- bq. Can we again make sub procedures for each of the replica regions under a region sub procedure? A subprocedure of a subprocedure? Yes, you can do that. Let me try and take a look at this, informed by Ram's work above. See if I can help. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278985#comment-16278985 ] Anoop Sam John commented on HBASE-18946: bq.Can we give the replicas as a group to the LB and have it queue them as a group? I Yes same thing I ask/suggest from long time.. As discussed with Ram, we have some issue in there with the usage of Procedure way. But my call there would be , if this need a change in the Procedure f/w on the sub procedure ways or so, lets do that. Creating the assign individual procedures for each of the replicas at 1st step and then ask LB is the issue maker. In assign procedure, first we add entry to META. What if there we dont have the replica regions(objects) been created by that time? I think it may be ok as we dont have row per replica region in META. It is just one row only for a region and as per the replicas some new columns (cf:Q) comes in only. Then while contacting LB, we can create the replica regions and pass it as a List to get the servers for regions. Already the LB APIs allow to pass a list of regions so as to assign servers for each. Now the Q is already each of the region assign is a sub procedure within the Table create procedure. Can we again make sub procedures for each of the replica regions under a region sub procedure? [~stack].. Not knowing the internals of Procedure f/w that well ... But will this direction help? 
> Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stocahstic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278975#comment-16278975 ] ramkrishna.s.vasudevan commented on HBASE-18946: bq.and we do NOT want global coordination because we want the assignment queues to run w/o need of coordination so they run fast – seems to be root problem? Yes, that is the problem. And please note that none of the LBs is responsible for this. If you pass all the replica regions as a group it will work fine because the LB will just do a pure round robin, with nothing based on actual replica knowledge. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278969#comment-16278969 ] ramkrishna.s.vasudevan commented on HBASE-18946: bq.Can we give the replicas as a group to the LB and have it queue them as a group? I've not looked. But replicas being split over assignment queues w/o a global coordination – Yes. If you see the first patch that is what I try to do but that adds some state. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, HBASE-18946_simple_7.patch, > HBASE-18946_simple_8.patch, TestRegionReplicasWithRestartScenarios.java > > > Trying out region replica and its assignment I can see that some times the > default LB Stocahstic load balancer assigns replica regions to the same RS. > This happens when we have 3 RS checked in and we have a table with 3 > replicas. When a RS goes down then the replicas being assigned to same RS is > acceptable but the case when we have enough RS to assign this behaviour is > undesirable and does not solve the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278916#comment-16278916 ]

stack commented on HBASE-18946:
---

Thanks [~anoop.hbase] for the intervention. I was thinking this was progress and thought I understood what is happening here, but it is plain now that I did not. So back to square one.

Can we give the replicas as a group to the LB and have it queue them as a group? I've not looked. But replicas being split over assignment queues w/o a global coordination -- and we do NOT want global coordination because we want the assignment queues to run w/o need of coordination so they run fast -- seems to be root problem?

Will I have a go at it, [~ram_krish]?

Thanks boss.
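The suspected root problem above can be illustrated with a small, self-contained sketch. It is not HBase code: the class `BatchedRoundRobinDemo`, the method `roundRobin`, and the region names are hypothetical, and it assumes every batch starts its round robin at the first server. The point it shows is only that independent batches share no cursor, so replicas that arrive in separate batches can pile up on one server, while the same replicas passed as one group are spread out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the HBase AssignmentManager or LoadBalancer. */
final class BatchedRoundRobinDemo {

  /**
   * Plain round robin of one batch of regions over the servers.
   * Assumption: each call starts at index 0 and knows nothing about other batches.
   */
  static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < regions.size(); i++) {
      plan.put(regions.get(i), servers.get(i % servers.size()));
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
    List<String> replicas = Arrays.asList("r1_0", "r1_1", "r1_2");

    // Case 1: each replica is queued in its own batch, with no coordination.
    Map<String, String> split = new LinkedHashMap<>();
    for (String replica : replicas) {
      split.putAll(roundRobin(Arrays.asList(replica), servers));
    }
    System.out.println("split batches: " + split);

    // Case 2: the replicas are handed over as one group.
    System.out.println("one group:     " + roundRobin(replicas, servers));
  }
}
```

In this sketch the split case puts all three replicas on `rs1`, while the grouped case uses three distinct servers. A real balancer may pick a different starting offset per batch, in which case the collision is a matter of chance rather than a certainty.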
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278462#comment-16278462 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

That is what I was saying: we now make the round-robin decision earlier. One thing to note is that, since the flow now goes through retainAssignment, it will go through retainAssignment whatever LB is configured. So for the stochastic LB case this works, as per the title of the JIRA.

I think that even before this patch, with some other LB configured, replicas would not be placed as expected (and we already know for sure that FavouredNodeBalancer does not take care of replicas).

I don't claim this patch solves every case, which is why I am still hesitant to commit it: it does not ask the LB to pick the servers. Patch _2 is slightly better, but again I think the other region balancers will not understand replicas. Replica placement has only ever worked during actual balancing, not during initial assignment.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278364#comment-16278364 ]

Anoop Sam John commented on HBASE-18946:

If region replica placement does not work as expected (no two replicas on the same server) with the RS grouping balancer, that is a bug in RS grouping, which does not consider the replica feature. But placing the regions while creating the table WITHOUT contacting the LB would be another problem.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278359#comment-16278359 ]

Anoop Sam John commented on HBASE-18946:

Before the patch, the regions are created along with an AssignProcedure for each. There is no target server in the procedure, and we contact the LB (RoundRobin or RetainAssign calls). With the patch, when a table is created the target server for each region is selected right then, much earlier, by the AM and WITHOUT contacting the LB. So the LB is not asked at all.

What if the cluster has the RS group feature and uses that LB? There is every chance that the regions go to servers where the admin does not want this table to go! I think we cannot do it this way.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277768#comment-16277768 ]

stack commented on HBASE-18946:
---

Ok. Got it. +1 for commit. Thanks Ram.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277175#comment-16277175 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. The patch is still an advance because you've moved the problem of assign to AM which is where it belongs; that the implementation has problems is something we can work on ...

Yes, that was my concern. My early patch HBASE-18946_2 actually worked on a completely assigned set of regions. That is why I was not looking at moving the selection of servers to the create table proc stage.

bq. Say some more on this please.

I was saying the same thing as you stated above: the target server selection is advanced to the create table stage and is no longer internal to the AM.

bq. Should it be other way around?

Maybe I am missing your intent. The idea is that if the tableDescriptor has replication, we create Assign procs for the entire set of regions (including replicas) in round-robin mode over the available list of servers. If there are no replicas, we just go the existing way.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277107#comment-16277107 ]

stack commented on HBASE-18946:
---

So, looking at v8 again in light of your comment, won't we end up putting region replicas on the same servers each time we assign? The patch is still an advance because you've moved the problem of assign to AM which is where it belongs; that the implementation has problems is something we can work on ...

bq. I said this because now we set the target server so the round robin logic is not with the AM. Generally AM decides if the assignment shouldbe round robin or retain assignment. So here it is always retain assignment.

Say some more on this please.

Is this right?

    if (tableDescriptor.getRegionReplication() > 1) {
      addChildProcedure(
        env.getAssignmentManager().createRoundRobinAssignProcedures(newRegions));
    } else {
      addChildProcedure(env.getAssignmentManager().createAssignProcedures(newRegions));
    }

Should it be the other way around?

I still think this patch is really nice.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276483#comment-16276483 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. But the problem is no longer we call roundrobinAssignment from LB instead we will call retainAssignment() only. This happens only when replicas are created

I said this because we now set the target server, so the round-robin logic is not with the AM. Generally the AM decides whether the assignment should be round robin or retain assignment; here it is always retain assignment.

As I said, the LB knows about replicas and places them pretty well, but only when it has the entire cluster available. During the course of an assignment it cannot make that judgement. I don't see any other simple way to make the LB aware of replicas, unless somewhere we group the regions to be assigned.

So are you OK with the patch for commit, [~saint@gmail.com]?
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272220#comment-16272220 ]

stack commented on HBASE-18946:
---

This last patch is very nice. Clean.

What do you mean by this, sir: "But the problem is no longer we call roundrobinAssignment from LB instead we will call retainAssignment() only. This happens only when replicas are created."

Do we need to add more knowledge of replicas to the LB?
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266747#comment-16266747 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 42s | master passed |
| +1 | compile | 1m 9s | master passed |
| +1 | checkstyle | 1m 33s | master passed |
| +1 | shadedjars | 6m 33s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 49s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 51s | the patch passed |
| +1 | compile | 1m 7s | the patch passed |
| +1 | javac | 1m 7s | the patch passed |
| -1 | checkstyle | 1m 5s | hbase-server: The patch generated 4 new + 31 unchanged - 0 fixed = 35 total (was 31) |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedjars | 5m 0s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 54m 7s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 40s | hbase-client in the patch passed. |
| -1 | unit | 100m 12s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 180m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899364/HBASE-18946_simple_8.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux aa6b75705164 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / f521000d78 |
| maven | version: Apache Maven 3.5.2
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266740#comment-16266740 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 50s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 33s | master passed |
| +1 | compile | 1m 0s | master passed |
| +1 | checkstyle | 1m 32s | master passed |
| +1 | shadedjars | 6m 35s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 53s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 47s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| -1 | checkstyle | 1m 8s | hbase-server: The patch generated 2 new + 31 unchanged - 0 fixed = 33 total (was 31) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 2s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 54m 24s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. |
| +1 | javadoc | 0m 54s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 39s | hbase-client in the patch passed. |
| +1 | unit | 99m 43s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 181m 40s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899363/HBASE-18946_simple_7.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux d6cff5ddaba4 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / f521000d78 |
| maven | version:
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264225#comment-16264225 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. Man. We want round-robin but with caveats.. "Round-robin except..."

I think you are worried about this part, but I see it this way. We are still doing round robin:
1) Assign primary replicas (round robin).
2) Assign secondary replicas (again round robin, just avoiding the primary replicas' servers).
3) Assign tertiary replicas (again round robin, but avoiding the servers used in the first two steps).
and so on.

So you ask: why not do a round robin over the entire set of regions? That would be true round robin. But for that we need to hold the regions somewhere and bulk them up before going for the Assign procs.

Otherwise, another thing we can do is add an API in the AM that does a tentative round robin over the available servers and regions. Then, in the CreateTableProc, before creating the AssignProcs, we add the target server to each of them as well, so that we go ahead and assign accordingly.

bq. Could we pass the AM the new table regions and ask it to return us plans to use assigning?

Reading this comment of yours, I believe you suggest the same? Can we do that only for tables with replicas?
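The staged scheme in steps 1) to 3) can be sketched roughly as below. This is a minimal illustration and not the patch itself: `StagedReplicaAssigner`, `assign`, and the plain string server names are hypothetical, and the fallback for the case with fewer servers than replicas is an assumption made here so the sketch always returns a plan.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of "round robin, one replica id at a time"; not HBase code. */
final class StagedReplicaAssigner {

  /**
   * Returns plan[region][replicaId] = server name.
   * Stage 0 places all primaries, stage 1 all secondaries, and so on. In each stage
   * the shared cursor keeps moving round robin, skipping servers that already host
   * an earlier replica of the same region.
   */
  static String[][] assign(int regionCount, int replicaCount, List<String> servers) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("need at least one server");
    }
    String[][] plan = new String[regionCount][replicaCount];
    int cursor = 0;
    for (int replica = 0; replica < replicaCount; replica++) {
      for (int region = 0; region < regionCount; region++) {
        // Servers already used by earlier replicas of this region.
        Set<String> used = new HashSet<>();
        for (int earlier = 0; earlier < replica; earlier++) {
          used.add(plan[region][earlier]);
        }
        String chosen = null;
        // Try every server once, starting at the round-robin cursor.
        for (int attempt = 0; attempt < servers.size(); attempt++) {
          int index = (cursor + attempt) % servers.size();
          if (!used.contains(servers.get(index))) {
            chosen = servers.get(index);
            cursor = (index + 1) % servers.size();
            break;
          }
        }
        if (chosen == null) {
          // Assumed fallback: fewer servers than replicas, co-location is unavoidable.
          chosen = servers.get(cursor);
          cursor = (cursor + 1) % servers.size();
        }
        plan[region][replica] = chosen;
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    String[][] plan = assign(2, 3, Arrays.asList("rs1", "rs2", "rs3"));
    System.out.println(Arrays.deepToString(plan));
  }
}
```

With at least as many servers as replicas, every region in this sketch ends up with its replicas on distinct servers. Whether such a plan should be computed by the AM or by the configured LB is exactly the open question in this thread.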
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263832#comment-16263832 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. Looking in the patch, are we doing assign placement inside in CreateTableProcedure still?

No, we are not doing assign placement in the create table proc. We are just separating out the regions so that the primaries are assigned first and then the replicas. No state is maintained in the LB or in the Assign procs, unlike in the first patch.

bq. Could we pass the AM the new table regions and ask it to return us plans to use assigning?

I think that is what we are doing in this patch now, right? We pass the regions and get the right server for each of them, ensuring replicas don't sit together.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263317#comment-16263317 ]

stack commented on HBASE-18946:
---

Man. We want round-robin but with caveats.. "Round-robin except..."

Ok w/ the above points, yeah; not sure how it changes the patch.

Looking in the patch, are we still doing assign placement inside CreateTableProcedure? Could we pass the AM the new table regions and ask it to return us plans to use assigning?

I like your speculation on how region replicas will pile up on rolling restart. Yeah, we need to make sure we do the right thing.

Thanks.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262960#comment-16262960 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

[~saint@gmail.com] Before I update the patch, are you OK with the above points?
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261986#comment-16261986 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

Thanks for the detailed review.

bq. We only do this when it is a region with replicas or do we do it always (would be good if former, we want assignment to run fast).

Yes, only for the replica regions.

bq. Please remind me what is the rule for replica assign? Just that they need to be on different servers? Nothing about ordering? (Hmm... seems like replica has to go out first). How does the patch to the balancer ensure this ordering?

Our initial requirement is that replicas must be on different servers if there are enough servers. Ordering is not important. Coming to the balancer: in our code base only StochasticLB knows about replicas while actually balancing the cluster. We have tried FavoredStochasticLB; it does not know about replicas and in fact messes with the replica assignment itself (by corrupting the META entries for replicas). That is a big change which we need to do later. We confirmed this offline with [~enis] some time back. Also, as I said in the previous comment, the balancer does not come into the picture while doing round-robin assignment of a new table's regions. It just tries to do round robin based on the available servers.

bq. is there a hole where you can't see an ongoing Assignment? It has been queue'd and is being worked on but you have no means of querying where a region is being assigned

Yes, exactly. We don't know about it. This applies not only to replica regions; the regions of any newly created table have the same issue. The assignment queue just uses the regions currently in the queue to do the assignments. For ordinary regions that is OK, since we don't mind how they are distributed, but for replicas it is very important: when we have enough servers and the replicas are not distributed, we don't serve the purpose of replicas. If there are fewer servers than replicas, then it is OK to assign the replicas to the same RS. In future we are planning to avoid even this and fail the assignment itself.

bq. If round robin, are we not moving through the list of servers? Is the issue only when cluster is small -- three servers or so?

Hope you mean before this patch, right? We are moving through the list of servers, but all the regions (including replicas) do not go to the assignment queue together. So whatever set is being processed from the assignment queue gets round robin, but the next set of regions that is processed gets round robin again, and we end up on the same RS.

bq. On patch, don't renumber protobuf fields.

Oh yes. I did that so that the steps are in order. Will change it and will try to remove some duplicate code.

bq. If NOT isDefaultReplica and NOT replicaAvailable, we just fall through?

Yes. If it is a normal region we just go with the old code. If a replica does not find a suitable server, the existing code already has a way to assign all such regions to some servers randomly. That is fine for us too, because it means there are more replicas than available servers.

Actually there is more to do with AM and replicas. We know the issues but are not yet ready with patches. For example, in a rolling-restart-like case the AM will keep moving the replicas to the RSs that are still running. So finally, when the last one is closed, all the regions would have moved there and META will only have that entry. Now when new RSs are started it will try to do retain assignment, and again replica regions may get co-located and only a balancer can solve it. We need to see how best we can do in these cases. But all that later (out of scope here).
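To illustrate the batching problem described above, here is a toy sketch in plain Java. It is not HBase code: the class and method names are made up, and it assumes for simplicity that every round-robin pass starts at the first server (the real start index may differ, in which case the collision is a matter of chance rather than a certainty).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names come from HBase. */
final class BatchRoundRobinDemo {

  /** Round-robins one batch over the servers, always starting at index 0 (a simplification). */
  static Map<String, String> roundRobin(List<String> batch, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (int i = 0; i < batch.size(); i++) {
      plan.put(batch.get(i), servers.get(i % servers.size()));
    }
    return plan;
  }

  /** Processes each batch independently, the way separately dequeued sets would be. */
  static Map<String, String> assignInBatches(List<List<String>> batches, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    for (List<String> batch : batches) {
      plan.putAll(roundRobin(batch, servers));
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
    // Three replicas of one region dequeued one at a time: every pass restarts at rs1.
    System.out.println(assignInBatches(Arrays.asList(
        Arrays.asList("r_replica0"),
        Arrays.asList("r_replica1"),
        Arrays.asList("r_replica2")), servers));
    // The same replicas dequeued together are spread over rs1, rs2 and rs3.
    System.out.println(assignInBatches(Arrays.asList(
        Arrays.asList("r_replica0", "r_replica1", "r_replica2")), servers));
  }
}
```

In this model the only difference between the two calls is how the regions are grouped into batches, which is why ordinary regions are unaffected while replicas can end up co-located.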
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261768#comment-16261768 ]

stack commented on HBASE-18946:
---

bq. While doing roundrobinAssignment contact the AM to know the current state of replica regions and choose a server accordingly.

We only do this when it is a region with replicas or do we do it always (would be good if former, we want assignment to run fast). Yeah, if round robin, it's round robin (smile).

Please remind me what is the rule for replica assign? Just that they need to be on different servers? Nothing about ordering? (Hmm... seems like replica has to go out first). How does the patch to the balancer ensure this ordering?

is there a hole where you can't see an ongoing Assignment? It has been queue'd and is being worked on but you have no means of querying where a region is being assigned (i.e. we are about to assign a replica and we want to avoid assigning to the same location as where we just assigned?).

If round robin, are we not moving through the list of servers? Is the issue only when cluster is small -- three servers or so?

On patch, don't renumber protobuf fields.

What is happening here (BTW, repeats code):

{code}
List serverRegions =
    assignments.computeIfAbsent(serverName, k -> new ArrayList<>());
if (!RegionReplicaUtil.isDefaultReplica(region)) {
  if (!replicaAvailable(region, serverName)) {
    assignRegionToServer(cluster, serverName, serverRegions, region);
    serverIdx = (j + serverIdx + 1) % numServers;
    assigned = true;
    break;
  }
} else if (!cluster.wouldLowerAvailability(region, serverName)) {
  assignRegionToServer(cluster, serverName, serverRegions, region);
  serverIdx = (j + serverIdx + 1) % numServers; // remain from next server
...
{code}

If NOT isDefaultReplica and NOT replicaAvailable, we just fall through?

Good stuff.
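As a rough, standalone sketch of the idea in the snippet above (walk the servers round-robin, skip any server that already holds a sibling replica, and leave unplaceable regions for a fallback), something like the following could work. `Region`, `primaryName` and `assign` are hypothetical names for illustration; this is not the patch's code and it merges the two branches of the quoted snippet into a single check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified sketch; Region and the method names are hypothetical, not HBase classes. */
final class ReplicaAwareRoundRobin {

  static final class Region {
    final String primaryName; // shared by all replicas of one region
    final int replicaId;      // 0 is the default (primary) replica

    Region(String primaryName, int replicaId) {
      this.primaryName = primaryName;
      this.replicaId = replicaId;
    }
  }

  /**
   * Walks the server list round-robin, skipping any server that already hosts a
   * replica of the same region (including ones placed by earlier calls that are
   * recorded in assignments). Regions that find no such server are returned so the
   * caller can place them by some fallback, for example randomly.
   */
  static List<Region> assign(List<Region> regions, List<String> servers,
      Map<String, List<Region>> assignments) {
    // server -> names of the regions it already holds a replica of
    Map<String, Set<String>> hosted = new HashMap<>();
    for (Map.Entry<String, List<Region>> e : assignments.entrySet()) {
      for (Region r : e.getValue()) {
        hosted.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(r.primaryName);
      }
    }
    List<Region> unplaced = new ArrayList<>();
    int numServers = servers.size();
    int serverIdx = 0;
    for (Region region : regions) {
      boolean assigned = false;
      for (int j = 0; j < numServers; j++) {
        String server = servers.get((j + serverIdx) % numServers);
        Set<String> names = hosted.computeIfAbsent(server, k -> new HashSet<>());
        if (names.add(region.primaryName)) { // add() is false if a sibling replica is here
          assignments.computeIfAbsent(server, k -> new ArrayList<>()).add(region);
          serverIdx = (j + serverIdx + 1) % numServers; // resume from the next server
          assigned = true;
          break;
        }
      }
      if (!assigned) {
        unplaced.add(region);
      }
    }
    return unplaced;
  }
}
```

With three servers and four replicas of one region, the first three land on distinct servers and the fourth comes back in the returned list, which corresponds to the "fall through" case asked about above.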
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259586#comment-16259586 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

Initially I too wondered whether we should check all the replicas, up to tableDescriptor.getRegionReplication(), but I think with the current logic that is not needed, and also we really don't know at the AM level what the max replicas configured is.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259583#comment-16259583 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

Thanks for the review. Since CreateTable now assigns each replica as a batch, the purpose of the code is to check whether the previous replicas have been assigned and where each of those replicas is located.
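A minimal sketch of the two ideas in this reply, assuming a made-up `name_replicaId` key format and hypothetical method names (neither is taken from the patch): regions are batched per replica id so that primaries go out first, and for a replica with id `replicaId` the loop over `0..replicaId-1` collects the servers that already hold the lower-numbered replicas.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only; the "name_replicaId" key format and all method names are made up. */
final class ReplicaBatches {

  /** Builds one batch per replica id: all primaries (0) first, then replica 1, and so on. */
  static TreeMap<Integer, List<String>> batchesByReplicaId(List<String> regionNames,
      int replication) {
    TreeMap<Integer, List<String>> batches = new TreeMap<>();
    for (int replicaId = 0; replicaId < replication; replicaId++) {
      List<String> batch = new ArrayList<>();
      for (String name : regionNames) {
        batch.add(name + "_" + replicaId);
      }
      batches.put(replicaId, batch);
    }
    return batches;
  }

  /** Servers holding replicas 0..replicaId-1 of the region; a new replica should avoid them. */
  static Set<String> serversToAvoid(String regionName, int replicaId,
      Map<String, String> locations) {
    Set<String> avoid = new HashSet<>();
    for (int i = 0; i < replicaId; i++) {
      String server = locations.get(regionName + "_" + i);
      if (server != null) { // an earlier replica may not be assigned yet
        avoid.add(server);
      }
    }
    return avoid;
  }
}
```

Because the batches go out in replica-id order, only lower replica ids can already have a location, so the loop does not need the table's maximum replica count.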
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259496#comment-16259496 ]

Ted Yu commented on HBASE-18946:

{code}
+int replicaId = info.getReplicaId();
+for (int i = 0; i < replicaId; i++) {
{code}

replicaId may not be the max number of replicas (tableDescriptor.getRegionReplication()). What's the purpose of the for loop?
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259357#comment-16259357 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 33s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s{color} | {color:red} hbase-server: The patch generated 1 new + 61 unchanged - 0 fixed = 62 total (was 61) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 57s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 53m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 23s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 20s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898474/HBASE-18946_2.patch |
| Optional Tests | asflicense cc unit hbaseprotoc javac javadoc findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4204523348a1
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259356#comment-16259356 ]

Hadoop QA commented on HBASE-18946:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 2s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s{color} | {color:red} hbase-server: The patch generated 3 new + 61 unchanged - 0 fixed = 64 total (was 61) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 50s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 51m 5s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m 37s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898476/HBASE-18946_2.patch |
| Optional Tests | asflicense cc unit hbaseprotoc javac javadoc findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 069931356f1e 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259175#comment-16259175 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

BTW, let me check one flow: are we able to assign when the number of servers is less than the number of replicas after this patch? Will be back here.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259171#comment-16259171 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. Now I am very sure the same change has to be done in ServerCrashProcedure to enable the failed tests because there also we just go with round robin and the flow will not contact the LB.

I mean this issue: https://issues.apache.org/jira/browse/HBASE-19268
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256922#comment-16256922 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

Some updates here:
-> In roundRobinAssignment the LB does not even consider replicas and their assignment, so we need to make it aware of them. It knows that the region has a replica, but it does not check whether the same replica is being assigned; only the balancer knows about it.
-> Say we want to assign replica 3; then we have to ensure that the primary and the secondary replica are already assigned and determine a plan from that. To do that we may have to enable the cost functions, or we should add some new methods to make the LB's round robin aware of replicas.
-> Next, the approach in the current patch attached here actually solved the issue because the balancer got all the regions (including replicas) and just applied the round-robin part to them.

Will be back here.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216905#comment-16216905 ]

Anoop Sam John commented on HBASE-18946:

bq. When the Balancer is passed a List, could it look at the list first to find replicas and group?

I was also asking a similar thing above.

bq. Should we make it with all the replica regions go in one request?

This question came to me while I was reading the region replica code. Then the fat logic for not putting 2 replicas of the same region on one RS can be avoided. Not sure how easy/difficult it is. Just saying.

Right now we pass each region to the LB one by one and ask it to find an RS for the assign. Instead, would it be possible to pass all replicas of a given region together in a List and make the LB find an RS for each of them?
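One way to picture this suggestion, as a toy sketch rather than a proposal for the real LoadBalancer interface: group the incoming list by region first, then place each group so that its members land on consecutive servers. The `name_replicaId` string format and both method names are assumptions made for the example, and every entry is assumed to contain an underscore.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy sketch; the "name_replicaId" format and method names are hypothetical. */
final class ReplicaGroupPlacement {

  /** Groups "name_replicaId" entries by the region name before the last underscore. */
  static Map<String, List<String>> groupByRegion(List<String> replicas) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String replica : replicas) {
      String region = replica.substring(0, replica.lastIndexOf('_'));
      groups.computeIfAbsent(region, k -> new ArrayList<>()).add(replica);
    }
    return groups;
  }

  /** Places the k-th member of the g-th group on server (g + k) mod n. */
  static Map<String, String> place(Map<String, List<String>> groups, List<String> servers) {
    Map<String, String> plan = new LinkedHashMap<>();
    int g = 0;
    for (List<String> group : groups.values()) {
      for (int k = 0; k < group.size(); k++) {
        plan.put(group.get(k), servers.get((g + k) % servers.size()));
      }
      g++;
    }
    return plan;
  }
}
```

As long as a group has no more members than there are servers, its members get distinct servers, and the per-group offset keeps the primaries from all piling onto the first server.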
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216381#comment-16216381 ]

stack commented on HBASE-18946:
---

The batching at RPC level is our new 'bulk assign' mechanism. It is bulking per Server though, so I suppose this is no good to you.

Can the Balancer learn about Replicas? Seems like a good thing for it to know about. Could it have a plugin that did anti-affinity?
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216366#comment-16216366 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. When the Balancer is passed a List, could it look at the list first to find replicas and group?

It does. But I don't know if it is able to do it across two RPCs for the same table. Will check the balancer logic and see how it can be improved/fixed.

bq. The AssignProcedure is about assigning a single Procedure, nothing else. If we start bulking it up with other concerns, we'll be back to the fuzzy AMv1 story.

Yes. I am also skeptical about this change, and that is the reason why I did not go forward with this patch. I also suspect there are other issues.

bq. See in RPC where it is batching requests.

Will check. But what I see is that in AM waitOnAssignQueue() we collect the batched regions and go with the assignment. The balancer does know about region replicas, but I fear it is still assuming the earlier bulk assign logic. Will be back.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216359#comment-16216359 ]

stack commented on HBASE-18946:

bq. But still in these cases the bulking mechanism is not a logical bulking instead it depends on the timed wait and the size of the queue.
See in RPC where it is batching requests. But if you want discernment regarding where Regions go on a cluster, that's the Balancer's job. It has all the sources to pull on. Can't it tell members of a ReadReplica set? Can't it do a lookup to see where the other replicas are out on the cluster before it makes a plan for the current replica?

The AssignProcedure is about assigning a single Procedure, nothing else. If we start bulking it up with other concerns, we'll be back to the fuzzy AMv1 story.

When the Balancer is passed a List, could it look at the list first to find replicas and group?
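
The lookup stack asks about can be sketched roughly as follows. This is only an illustration under stated assumptions, not the balancer's actual code: `ReplicaAwarePicker`, `pickServer`, and the `hosted` map (server name to the regions whose replicas it already holds, cluster-wide) are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReplicaAwarePicker {

  /**
   * Picks a server for one replica of {@code region}.
   * {@code hosted} is a hypothetical cluster-wide view: each live server mapped
   * to the regions whose replicas it already hosts.
   */
  static String pickServer(Map<String, Set<String>> hosted, String region) {
    String best = null;     // least-loaded server that holds no sibling replica
    String fallback = null; // least-loaded server overall
    for (Map.Entry<String, Set<String>> e : hosted.entrySet()) {
      int load = e.getValue().size();
      if (fallback == null || load < hosted.get(fallback).size()) {
        fallback = e.getKey();
      }
      if (!e.getValue().contains(region)
          && (best == null || load < hosted.get(best).size())) {
        best = e.getKey();
      }
    }
    // Co-locate only when every server already hosts a sibling replica.
    return best != null ? best : fallback;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> hosted = new TreeMap<>();
    hosted.put("rs1", new TreeSet<>(Arrays.asList("A", "B")));
    hosted.put("rs2", new TreeSet<>(Arrays.asList("A")));
    hosted.put("rs3", new TreeSet<>(Arrays.asList("B", "C")));
    System.out.println("next replica of A goes to " + pickServer(hosted, "A"));
  }
}
```

The point of the sketch is that the decision depends on state that outlives a single batch; whether the real balancer can be given such a view is exactly the open question in this thread.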
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216347#comment-16216347 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

[~saint@gmail.com] Thanks for the comment. Yes, in a way I agree that fixing it in the Balancer is best. But in these cases the bulking mechanism is not a logical bulking; it depends on the timed wait and the size of the queue. So the balancer may not really know what has already been balanced by the time the next bulked set of regions comes in. Any suggestions? I can still check whether it is possible to make the balancer aware of this. But this mechanism solves some more issues in other related areas, since we know the set of regions to be balanced in one shot.
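
To make the "timed wait and size of the queue" point concrete, here is a small stand-alone sketch of a size-or-timeout batching queue. It is not the AssignmentManager code; `TimedBatchQueue` and its members are hypothetical, and regions are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class TimedBatchQueue {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private final List<String> pending = new ArrayList<>();
  private final int maxSize;
  private final long waitMillis;

  TimedBatchQueue(int maxSize, long waitMillis) {
    this.maxSize = maxSize;
    this.waitMillis = waitMillis;
  }

  void add(String region) {
    lock.lock();
    try {
      pending.add(region);
      // Wake the dispatcher on the first item and again once the batch is full.
      if (pending.size() == 1 || pending.size() >= maxSize) {
        changed.signal();
      }
    } finally {
      lock.unlock();
    }
  }

  /** Returns whatever is queued once the size cap is hit or the timer expires. */
  List<String> takeBatch() {
    List<String> batch = new ArrayList<>();
    lock.lock();
    try {
      while (pending.isEmpty()) {
        changed.await();
      }
      if (pending.size() < maxSize) {
        changed.await(waitMillis, TimeUnit.MILLISECONDS);
      }
      batch.addAll(pending);
      pending.clear();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      lock.unlock();
    }
    return batch;
  }
}
```

If two replicas of a region are queued, the batch is taken, and the third replica arrives afterwards, the third replica ends up in a batch of its own. Nothing in the queue knows that the three belong together.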
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216340#comment-16216340 ]

stack commented on HBASE-18946:

We already have a bulking mechanism below AssignProcedure in the RPC. Why ain't we making this change in the balancer? It knows all. Doing it in the AssignProcedure makes it carry state when we've been doing our best to make it a simple machine.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212288#comment-16212288 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. As with some of regions with known location, there are new ones which needs to be assigned, these new ones could be assigned to the same region server which hosts the primary or other replica region.
Yes, I agree with you. So I need to fix the issue with EnableTableHandler first.

bq. The previous logic is that when the first region is queued, it starts to wait assignDispatchWaitMillis to start the real work. With the patch, the whole batch is added at once, it skipped the addFirstOne logic.
I will read and understand your comment and will be back here.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211811#comment-16211811 ]

huaxiang sun commented on HBASE-18946:

For the case of an increased replica count, the same logic is needed even with HBASE-19017. Some regions have a known location, but there are also new ones which need to be assigned, and these new ones could be assigned to the same region server that hosts the primary or another replica region. wdyt? [~ram_krish]
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211449#comment-16211449 ]

huaxiang sun commented on HBASE-18946:

Thanks [~ram_krish]. One possible slowdown with this approach is that if queueAll() queues more than assignDispatchWaitQueueMaxSize regions, with the current logic it still needs to wait a bit; please see https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1639.

The previous logic is that when the first region is queued, it starts to wait assignDispatchWaitMillis before starting the real work. With the patch, the whole batch is added at once, so it skips the addFirstOne logic. I think it can be changed to avoid this case.

{code}
  private HashMap<RegionInfo, RegionStateNode> waitOnAssignQueue() {
    HashMap<RegionInfo, RegionStateNode> regions = null;

    assignQueueLock.lock();
    try {
      if (pendingAssignQueue.isEmpty() && isRunning()) {
        assignQueueFullCond.await();
      }

      if (!isRunning()) return null;
+     if (pendingAssignQueue.size() < assignDispatchWaitQueueMaxSize) {
+       assignQueueFullCond.await(assignDispatchWaitMillis, TimeUnit.MILLISECONDS);
+     }
-     assignQueueFullCond.await(assignDispatchWaitMillis, TimeUnit.MILLISECONDS);
      regions = new HashMap<RegionInfo, RegionStateNode>(pendingAssignQueue.size());
      for (RegionStateNode regionNode: pendingAssignQueue) {
        regions.put(regionNode.getRegionInfo(), regionNode);
      }
      pendingAssignQueue.clear();
    } catch (InterruptedException e) {
      LOG.warn("got interrupted ", e);
      Thread.currentThread().interrupt();
    } finally {
      assignQueueLock.unlock();
    }
    return regions;
  }
{code}
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210767#comment-16210767 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. Do we need to apply the same collection logic to EnableTableProcedure (to address the TODO in the comment)?
Yes, it could be. But with HBASE-19017 it may not be needed, because we know where the assignment would go. Before I commit this, though, I would like to spend some more time understanding whether there could be other issues with how the procedure pool executes: we could starve some threads from adding to the assignQueue while the other threads keep retrying, which I think is what happened when I tried doing the same for EnableTableProcedure. But with HBASE-19017 we are safe.

And thanks a lot for the review, [~huaxiang].
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208547#comment-16208547 ]

huaxiang sun commented on HBASE-18946:

Do we need to apply the same collection logic to EnableTableProcedure (to address the TODO in the comment)? Otherwise, +1.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207050#comment-16207050 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

Since HBASE-19017 is resolved, I think this issue is in better shape now. I will wait for reviews and meanwhile check whether the patch can cause other issues. [~huaxiang] - Thanks for your time.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206684#comment-16206684 ]

huaxiang sun commented on HBASE-18946:

Sorry, [~ramkrishna], I was busy with something else. I am going through the changes now and will post an update, thanks.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205746#comment-16205746 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

bq. Better assert that the RegionStateNode's added are replica of the same region.
OK, will add. I think the test case failures are because of a timeout issue. Will try again.

[~huaxiang] Can you have a look and provide some suggestions/feedback? I think if we solve HBASE-19017, then for enable we don't need this type of bulk assignment.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203706#comment-16203706 ]

Ted Yu commented on HBASE-18946:

{code}
public synchronized void addRegion(RegionStateNode node) {
{code}
Better to assert that the RegionStateNodes added are replicas of the same region.
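
One way such a check could look is sketched below. The types are stand-ins, not the patch's classes: `ReplicaBatch` and `Node` are hypothetical, and "same region" is assumed here to mean same table and same start key.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaBatch {

  /** Stand-in for RegionStateNode: only the fields needed to compare replicas. */
  static final class Node {
    final String table;
    final String startKey;
    final int replicaId;

    Node(String table, String startKey, int replicaId) {
      this.table = table;
      this.startKey = startKey;
      this.replicaId = replicaId;
    }
  }

  private final List<Node> nodes = new ArrayList<>();

  /** Rejects any node that is not a replica of the region already in the batch. */
  public synchronized void addRegion(Node node) {
    if (!nodes.isEmpty() && !sameRegion(nodes.get(0), node)) {
      throw new IllegalArgumentException(
          "replica " + node.replicaId + " of " + node.table + "/" + node.startKey
              + " does not belong to this batch");
    }
    nodes.add(node);
  }

  public synchronized int size() {
    return nodes.size();
  }

  // Assumption: replicas of one region share the table and the start key.
  static boolean sameRegion(Node a, Node b) {
    return a.table.equals(b.table) && a.startKey.equals(b.startKey);
  }
}
```

An exception is used instead of a Java `assert` so the check also fires when assertions are disabled; that is a design choice of this sketch, not something the review asked for.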
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203655#comment-16203655 ]

Hadoop QA commented on HBASE-18946:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 4m 6s | master passed |
| +1 | compile | 0m 52s | master passed |
| +1 | checkstyle | 0m 51s | master passed |
| +1 | mvneclipse | 0m 20s | master passed |
| +1 | shadedjars | 6m 15s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 58s | master passed |
| +1 | javadoc | 0m 40s | master passed |
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 50s | the patch passed |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 52s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedjars | 4m 59s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 47m 0s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 3m 1s | the patch passed |
| +1 | javadoc | 0m 37s | the patch passed |
| -1 | unit | 64m 28s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 134m 8s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestScanWithoutFetchingData |
| | org.apache.hadoop.hbase.regionserver.wal.TestSecureWALReplay |
| | org.apache.hadoop.hbase.master.TestMasterMetricsWrapper |
| | org.apache.hadoop.hbase.master.procedure.TestDisableTableProcedure |
| | org.apache.hadoop.hbase.regionserver.TestRowTooBig |
| | org.apache.hadoop.hbase.regionserver.wal.TestAsyncWALReplay |
| | org.apache.hadoop.hbase.regionserver.TestSplitLogWorker |
| | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestDeleteTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestEnableTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure |
| | org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence |
| | org.apache.hadoop.hbase.coprocessor.TestHTableWrapper |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203454#comment-16203454 ]

Hadoop QA commented on HBASE-18946:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 18s | Docker failed to build yetus/hbase:5d60123. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18946 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12892052/HBASE-18946.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/9098/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201852#comment-16201852 ]

Anoop Sam John commented on HBASE-18946:

Should we make all the replica regions go in one request? This question came to me back when I was reading the region replica code. Then the fat logic for not putting two replicas of the same region on one RS could be avoided. Not sure how easy/difficult it is. Just saying.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201809#comment-16201809 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

I checked the other branches (branch-1.x). There we have a bulk assigner in AM, so all the regions to be created during CreateTable are collected and passed to the balancer along with the available servers. That way we get a plan that truly distributes the replicas. We don't have a similar mechanism in trunk, because every AssignProcedure is individual and just adds to the 'pendingAssignQueue' one by one.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198281#comment-16198281 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

One way to fix this is that, at least for CreateTableProcedure, we should not split the assignments: we should ensure that all the regions are added as a unit before the assignment proceeds.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198278#comment-16198278 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

I have now found the real reason why this happens. I still need to check the other branches, which are not ProcV2.

Suppose we create a table with 20 regions and a replica count of 3. We then create 60 assign procedures, and all of them are added to the 'pendingAssignQueue' in AM. A thread executes the assignment of the regions in this queue. Whenever all 60 regions are in the queue when the assignment thread assigns them, we have no problem: the Stochastic LB uses the replica concept to ensure the regions are assigned properly, and the Cluster and cost functions created per 'roundRobinAssignment' call ensure that happens.

But when the multi-threaded model executes differently, for example the 60 regions are processed as two batches of 45 and 15 regions, we end up with this issue every time, because the Cluster created by roundRobinAssignment is not global; it is built per assignment call.
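
The effect described above can be reproduced with a toy model. The sketch below is not the StochasticLoadBalancer: it swaps in a simple greedy, replica-aware placer whose state is rebuilt for every batch. `BatchedAssignmentDemo` and its methods are hypothetical, and the queue order (all primaries first, then each region's secondaries) is an assumption, since the real order depends on thread scheduling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BatchedAssignmentDemo {

  /** One region replica; replicaId 0 is the primary. */
  static final class Replica {
    final int region;
    final int replicaId;

    Replica(int region, int replicaId) {
      this.region = region;
      this.replicaId = replicaId;
    }
  }

  /** Assumed queue order: all primaries first, then each region's secondaries. */
  static List<Replica> creationOrder(int regions, int replicas) {
    List<Replica> queue = new ArrayList<>();
    for (int r = 0; r < regions; r++) {
      queue.add(new Replica(r, 0));
    }
    for (int r = 0; r < regions; r++) {
      for (int k = 1; k < replicas; k++) {
        queue.add(new Replica(r, k));
      }
    }
    return queue;
  }

  /** Places queue[from, to) knowing only about the replicas inside that batch. */
  static void assignBatch(List<Replica> queue, int from, int to, int servers, int[] placement) {
    int[] load = new int[servers];                     // state is rebuilt per batch,
    Map<Integer, Set<Integer>> used = new HashMap<>(); // nothing carries over
    for (int i = from; i < to; i++) {
      Set<Integer> taken = used.computeIfAbsent(queue.get(i).region, k -> new HashSet<>());
      int best = -1;
      for (int s = 0; s < servers; s++) {
        if (!taken.contains(s) && (best < 0 || load[s] < load[best])) {
          best = s;
        }
      }
      if (best < 0) {
        best = 0; // every server already hosts a sibling; co-location is unavoidable
      }
      load[best]++;
      taken.add(best);
      placement[i] = best;
    }
  }

  static int[] assignInBatches(List<Replica> queue, int servers, int batchSize) {
    int[] placement = new int[queue.size()];
    for (int from = 0; from < queue.size(); from += batchSize) {
      assignBatch(queue, from, Math.min(from + batchSize, queue.size()), servers, placement);
    }
    return placement;
  }

  /** Counts replicas placed on a server that already holds a sibling replica. */
  static int countColocated(List<Replica> queue, int[] placement) {
    Map<Integer, Set<Integer>> serversByRegion = new HashMap<>();
    for (int i = 0; i < placement.length; i++) {
      serversByRegion.computeIfAbsent(queue.get(i).region, k -> new HashSet<>()).add(placement[i]);
    }
    int distinct = 0;
    for (Set<Integer> s : serversByRegion.values()) {
      distinct += s.size();
    }
    return placement.length - distinct;
  }

  public static void main(String[] args) {
    List<Replica> queue = creationOrder(20, 3);
    System.out.println("one batch of 60: "
        + countColocated(queue, assignInBatches(queue, 3, 60)));
    System.out.println("batches of 45 and 15: "
        + countColocated(queue, assignInBatches(queue, 3, 45)));
  }
}
```

With 3 servers and 3 replicas, a single batch of 60 never co-locates siblings in this model, because a free server always exists for each replica. Once the queue is cut at 45, the second batch starts from empty state and places replicas onto servers that already hold their primaries.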
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198177#comment-16198177 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

[~huaxiang] I tried your fix of adding the default regions to the beginning of the list passed to the AM, but the issue still seems to occur. I have a doubt about the LB and how region replica awareness is passed on to it. I am reading that code to understand it and will be back here.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194151#comment-16194151 ]

ramkrishna.s.vasudevan commented on HBASE-18946:

[~huaxiang] Thanks for the update. Great that you already think this change will solve the problem. I was checking this issue but got pulled into something else. I will come back here and check your suggestion. And yes, we need to check all related areas and the other related branches as well.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193404#comment-16193404 ]

huaxiang sun commented on HBASE-18946:

By the way, we need the fix to be backported to branch-1 as well. I checked branch-1.2, and the logic is the same as in the current master branch.
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193395#comment-16193395 ]

huaxiang sun commented on HBASE-18946:

Thanks [~ram_krish] for the finding! I checked the code, and I think it is caused by the fact that replica regions are added first and all default regions are then added at the end of the list.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionReplicaUtil.java#L187

For example, take 3 RSes and two regions with 3 replicas each.

{code}
regionId-replicaId
0-1  0-2  1-1  1-2  0-0  1-0

R0   R1   R2
0-1  0-2  1-1
1-2  0-0  1-0
{code}

We can see that on R1 and R2, replicas of the same region are assigned to the same RS.

If the logic is changed a bit as follows, it can fix this issue. Other places need to be checked as well.

{code}
public static List<RegionInfo> addReplicas(final TableDescriptor tableDescriptor,
    final List<RegionInfo> regions, int oldReplicaCount, int newReplicaCount) {
  if ((newReplicaCount - 1) <= 0) {
    return regions;
  }
  List<RegionInfo> hRegionInfos = new ArrayList<>((newReplicaCount) * regions.size());
  for (int i = 0; i < regions.size(); i++) {
    if (RegionReplicaUtil.isDefaultReplica(regions.get(i))) {
      // region level replica index starts from 0. So if oldReplicaCount was 2 then the max replicaId for
      // the existing regions would be 1
      // Add the default region first so that it sits next to its replicas in the list.
      hRegionInfos.add(regions.get(i));
      for (int j = oldReplicaCount; j < newReplicaCount; j++) {
        hRegionInfos.add(RegionReplicaUtil.getRegionInfoForReplica(regions.get(i), j));
      }
    }
  }
  // Default regions are no longer appended at the end of the list.
  // hRegionInfos.addAll(regions);
  return hRegionInfos;
}
{code}
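The ordering argument in the comment above can be checked with a small standalone sketch. It assumes regions are handed out strictly round-robin by list position, which is a simplification and not the actual balancer logic; the class and method names are hypothetical. It reproduces the 3 RS, 2 regions, 3 replicas example for both list orders.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: these helpers are not part of HBase.
final class ReplicaOrderingSketch {

  /** List order from the example: all non-default replicas first, default regions last. */
  static List<String> replicasFirst(int regions, int replicas) {
    List<String> out = new ArrayList<>();
    for (int r = 0; r < regions; r++) {
      for (int j = 1; j < replicas; j++) {
        out.add(r + "-" + j);
      }
    }
    for (int r = 0; r < regions; r++) {
      out.add(r + "-0");
    }
    return out;
  }

  /** Proposed order: each default region immediately followed by its replicas. */
  static List<String> interleaved(int regions, int replicas) {
    List<String> out = new ArrayList<>();
    for (int r = 0; r < regions; r++) {
      for (int j = 0; j < replicas; j++) {
        out.add(r + "-" + j);
      }
    }
    return out;
  }

  /** Assumed placement rule: the i-th entry of the list goes to server i % servers. */
  static List<List<String>> roundRobin(List<String> regions, int servers) {
    List<List<String>> placement = new ArrayList<>();
    for (int s = 0; s < servers; s++) {
      placement.add(new ArrayList<>());
    }
    for (int i = 0; i < regions.size(); i++) {
      placement.get(i % servers).add(regions.get(i));
    }
    return placement;
  }

  /** Counts replicas that share a server with an earlier replica of the same region. */
  static int collocated(List<List<String>> placement) {
    int count = 0;
    for (List<String> server : placement) {
      Set<String> seen = new HashSet<>();
      for (String name : server) {
        if (!seen.add(name.substring(0, name.indexOf('-')))) {
          count++;
        }
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // prints [[0-1, 1-2], [0-2, 0-0], [1-1, 1-0]]
    System.out.println(roundRobin(replicasFirst(2, 3), 3));
    // prints [[0-0, 1-0], [0-1, 1-1], [0-2, 1-2]]
    System.out.println(roundRobin(interleaved(2, 3), 3));
  }
}
```

Under this assumed placement rule, the replicas-first order collocates two replicas (on R1 and R2) while the interleaved order collocates none. It says nothing about what happens when the list is split across several assignment calls.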