[jira] [Updated] (HBASE-17289) Avoid adding a replication peer named "lock"
[ https://issues.apache.org/jira/browse/HBASE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17289: --- Attachment: HBASE-17289-branch-1.patch Attach patch for branch-1. > Avoid adding a replication peer named "lock" > > > Key: HBASE-17289 > URL: https://issues.apache.org/jira/browse/HBASE-17289 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17289-branch-1.1.patch, > HBASE-17289-branch-1.2.patch, HBASE-17289-branch-1.3.patch, > HBASE-17289-branch-1.patch > > > When the zk based replication queue is used and useMulti is false, the steps to > transfer replication queues are: first add a lock, then copy the nodes, and finally > clean up the old queue and the lock. The default lock znode's name is "lock", so > we should avoid adding a peer named "lock". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
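The guard described above can be sketched as a simple validation at peer-creation time. This is a hypothetical, standalone sketch — the class name, method name, and constant are assumptions, not HBase's actual API:

```java
// Hypothetical sketch: reject a replication peer id that collides with the
// default lock znode name ("lock") used while transferring replication queues.
public class PeerIdValidator {
    // Assumed constant; the issue says the default lock znode is named "lock".
    static final String LOCK_ZNODE_NAME = "lock";

    static void checkPeerId(String peerId) {
        if (peerId == null || peerId.isEmpty()) {
            throw new IllegalArgumentException("peer id must not be empty");
        }
        if (LOCK_ZNODE_NAME.equals(peerId)) {
            // Would collide with the lock znode created when useMulti is false.
            throw new IllegalArgumentException(
                "Cannot add a peer named \"" + LOCK_ZNODE_NAME + "\"");
        }
    }

    public static void main(String[] args) {
        checkPeerId("1"); // a normal peer id passes
        try {
            checkPeerId("lock"); // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```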
[jira] [Commented] (HBASE-17289) Avoid adding a replication peer named "lock"
[ https://issues.apache.org/jira/browse/HBASE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740679#comment-15740679 ] Guanghao Zhang commented on HBASE-17289: Thanks. This variable can be used in test, too. It is a different package org.apache.hadoop.hbase.client.replication. > Avoid adding a replication peer named "lock" > > > Key: HBASE-17289 > URL: https://issues.apache.org/jira/browse/HBASE-17289 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17289-branch-1.1.patch, > HBASE-17289-branch-1.2.patch, HBASE-17289-branch-1.3.patch, > HBASE-17289-branch-1.patch > > > When zk based replication queue is used and useMulti is false, the steps of > transfer replication queues are first add a lock, then copy nodes, finally > clean old queue and the lock. And the default lock znode's name is "lock". So > we should avoid adding a peer named "lock". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Description: We have moved the other replication requests to Admin and marked ReplicationAdmin as Deprecated, so the listReplicated/enableTableRep/disableTableRep methods need to move to Admin, too. Review board: https://reviews.apache.org/r/55534/ was:We have moved other replication requests to Admin and mark ReplicationAdmin as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need move to Admin, too. > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch, > HBASE-17443-v2.patch, HBASE-17443-v3.patch > > > We have moved the other replication requests to Admin and marked ReplicationAdmin > as Deprecated, so the listReplicated/enableTableRep/disableTableRep methods need > to move to Admin, too. > Review board: https://reviews.apache.org/r/55534/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17442) Move most of the replication related classes to hbase-server package
Guanghao Zhang created HBASE-17442: -- Summary: Move most of the replication related classes to hbase-server package Key: HBASE-17442 URL: https://issues.apache.org/jira/browse/HBASE-17442 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang After the replication requests are routed through master, the replication implementation details no longer need to be exposed to the client. We should move most of the replication related classes to the hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17337: --- Affects Version/s: 2.0.0 > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch, HBASE-17337-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17337: --- Resolution: Fixed Release Note: List replication peers request will be routed through master. Status: Resolved (was: Patch Available) Pushed to master. Thanks all for review. > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch, HBASE-17337-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17442) Move most of the replication related classes to hbase-server package
[ https://issues.apache.org/jira/browse/HBASE-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-17442: -- Assignee: Guanghao Zhang > Move most of the replication related classes to hbase-server package > > > Key: HBASE-17442 > URL: https://issues.apache.org/jira/browse/HBASE-17442 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > > After the replication requests are routed through master, replication > implementation details didn't need be exposed to client. We should move most > of the replication related classes to hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Attachment: HBASE-17443-v2.patch Failed ut not related. Trigger the Hadoop QA again. > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch, > HBASE-17443-v2.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17442) Move most of the replication related classes to hbase-server package
[ https://issues.apache.org/jira/browse/HBASE-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817025#comment-15817025 ] Guanghao Zhang commented on HBASE-17442: We don't have an hbase-replication module now, which means we would need to create a new module for hbase-replication. [~enis] What do you think about this? One question: does our Hadoop QA only run the hbase-client and hbase-server unit tests? If we add an hbase-replication module, I think its unit tests would also need to run every time? > Move most of the replication related classes to hbase-server package > > > Key: HBASE-17442 > URL: https://issues.apache.org/jira/browse/HBASE-17442 > Project: HBase > Issue Type: Sub-task > Components: build, Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > > After the replication requests are routed through master, the replication > implementation details no longer need to be exposed to the client. We should move most > of the replication related classes to the hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817224#comment-15817224 ] Guanghao Zhang commented on HBASE-17443: [~enis] [~ashish singhi] Can you help review this? Thanks. > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch, > HBASE-17443-v2.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14061) Support CF-level Storage Policy
[ https://issues.apache.org/jira/browse/HBASE-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817410#comment-15817410 ] Guanghao Zhang commented on HBASE-14061: [~carp84] Is this related to this failed ut? {code}
Unable to find suitable constructor for class org.apache.hadoop.hbase.mob.compactions.TestPartitionedMobCompactor$FaultyDistributedFileSystem

Stacktrace
java.lang.UnsupportedOperationException: Unable to find suitable constructor for class org.apache.hadoop.hbase.mob.compactions.TestPartitionedMobCompactor$FaultyDistributedFileSystem
  at org.apache.hadoop.hbase.util.ReflectionUtils.findConstructor(ReflectionUtils.java:103)
  at org.apache.hadoop.hbase.util.ReflectionUtils.newInstance(ReflectionUtils.java:73)
  at org.apache.hadoop.hbase.fs.HFileSystem.newInstanceFileSystem(HFileSystem.java:260)
  at org.apache.hadoop.hbase.fs.HFileSystem.<init>(HFileSystem.java:110)
  at org.apache.hadoop.hbase.fs.HFileSystem.get(HFileSystem.java:476)
  at org.apache.hadoop.hbase.HBaseTestingUtility.getTestFileSystem(HBaseTestingUtility.java:2951)
  at org.apache.hadoop.hbase.HBaseTestingUtility.getNewDataTestDirOnTestFS(HBaseTestingUtility.java:565)
  at org.apache.hadoop.hbase.HBaseTestingUtility.setupDataTestDirOnTestFS(HBaseTestingUtility.java:554)
  at org.apache.hadoop.hbase.HBaseTestingUtility.getDataTestDirOnTestFS(HBaseTestingUtility.java:527)
  at org.apache.hadoop.hbase.HBaseTestingUtility.getDefaultRootDirPath(HBaseTestingUtility.java:1228)
  at org.apache.hadoop.hbase.HBaseTestingUtility.createRootDir(HBaseTestingUtility.java:1259)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1085)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1057)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:929)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:911)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:898)
  at org.apache.hadoop.hbase.mob.compactions.TestPartitionedMobCompactor.setUpBeforeClass(TestPartitionedMobCompactor.java:87)
{code} > Support CF-level Storage Policy > --- > > Key: HBASE-14061 > URL: https://issues.apache.org/jira/browse/HBASE-14061 > Project: HBase > Issue Type: Sub-task > Components: HFile, regionserver > Environment: hadoop-2.6.0 >Reporter: Victor Xu >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14061-master-v1.patch, HBASE-14061.addendum.patch, > HBASE-14061.addendum.patch, HBASE-14061.v2.patch, HBASE-14061.v3.patch, > HBASE-14061.v4.patch > > > After reading [HBASE-12848|https://issues.apache.org/jira/browse/HBASE-12848] > and [HBASE-12934|https://issues.apache.org/jira/browse/HBASE-12934], I wrote > a patch to implement cf-level storage policy. > My main purpose is to improve random-read performance for some really hot > data, which usually locates in certain column family of a big table. > Usage: > $ hbase shell > > alter 'TABLE_NAME', METADATA => {'hbase.hstore.block.storage.policy' => > > 'POLICY_NAME'} > > alter 'TABLE_NAME', {NAME=>'CF_NAME', METADATA => > > {'hbase.hstore.block.storage.policy' => 'POLICY_NAME'}} > HDFS's setStoragePolicy can only take effect when new hfile is created in a > configured directory, so I had to make sub directories(for each cf) in > region's .tmp directory and set storage policy for them. > Besides, I had to upgrade hadoop version to 2.6.0 because > dfs.getStoragePolicy cannot be easily written in reflection, and I needed > this api to finish my unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
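For context on this kind of failure: a non-static inner class (such as a FaultyDistributedFileSystem declared inside a test class) compiles to a constructor that takes the enclosing instance as an implicit first parameter, so reflection that searches for a no-arg or single-argument constructor will not find a "suitable" one. A minimal standalone demonstration (class names here are illustrative only):

```java
import java.lang.reflect.Constructor;

// Demonstrates why reflection-based construction can miss the constructor of
// a non-static inner class: javac adds the enclosing instance as an implicit
// first constructor parameter. A static nested class has no such parameter.
public class InnerClassCtorDemo {
    class Inner {}           // non-static: ctor takes the outer instance
    static class Nested {}   // static: plain no-arg ctor

    public static void main(String[] args) {
        Constructor<?> inner = Inner.class.getDeclaredConstructors()[0];
        Constructor<?> nested = Nested.class.getDeclaredConstructors()[0];
        System.out.println(inner.getParameterCount());  // 1 (the outer instance)
        System.out.println(nested.getParameterCount()); // 0
    }
}
```

This is why declaring such a test filesystem class as `static` (or top-level) typically makes constructor lookup succeed.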
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Attachment: HBASE-17443-v2.patch Update a hbase-server ut to trigger Hadoop QA ut. > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17442) Move most of the replication related classes to hbase-server package
[ https://issues.apache.org/jira/browse/HBASE-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821053#comment-15821053 ] Guanghao Zhang commented on HBASE-17442: [~stack] I agree with this. Thanks for your help. :) [~enis] It's really good to know your thoughts about this. > Move most of the replication related classes to hbase-server package > > > Key: HBASE-17442 > URL: https://issues.apache.org/jira/browse/HBASE-17442 > Project: HBase > Issue Type: Sub-task > Components: build, Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > > After the replication requests are routed through master, replication > implementation details didn't need be exposed to client. We should move most > of the replication related classes to hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821060#comment-15821060 ] Guanghao Zhang commented on HBASE-17443: The TestPartitionedMobCompactor failure has been resolved by HBASE-14061. > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch, > HBASE-17443-v2.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Attachment: HBASE-17443-v3.patch > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch, HBASE-17443-v2.patch, > HBASE-17443-v2.patch, HBASE-17443-v3.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Attachment: HBASE-17396-v5.patch > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch, HBASE-17396-v2.patch, > HBASE-17396-v3.patch, HBASE-17396-v4.patch, HBASE-17396-v5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14061) Support CF-level Storage Policy
[ https://issues.apache.org/jira/browse/HBASE-14061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817922#comment-15817922 ] Guanghao Zhang commented on HBASE-14061: Test 2nd addendum patch locally and TestPartitionedMobCompactor passed. +1. > Support CF-level Storage Policy > --- > > Key: HBASE-14061 > URL: https://issues.apache.org/jira/browse/HBASE-14061 > Project: HBase > Issue Type: Sub-task > Components: HFile, regionserver > Environment: hadoop-2.6.0 >Reporter: Victor Xu >Assignee: Yu Li > Fix For: 2.0.0 > > Attachments: HBASE-14061-master-v1.patch, HBASE-14061.addendum.patch, > HBASE-14061.addendum.patch, HBASE-14061.addendum2.patch, > HBASE-14061.addendum2.patch, HBASE-14061.v2.patch, HBASE-14061.v3.patch, > HBASE-14061.v4.patch > > > After reading [HBASE-12848|https://issues.apache.org/jira/browse/HBASE-12848] > and [HBASE-12934|https://issues.apache.org/jira/browse/HBASE-12934], I wrote > a patch to implement cf-level storage policy. > My main purpose is to improve random-read performance for some really hot > data, which usually locates in certain column family of a big table. > Usage: > $ hbase shell > > alter 'TABLE_NAME', METADATA => {'hbase.hstore.block.storage.policy' => > > 'POLICY_NAME'} > > alter 'TABLE_NAME', {NAME=>'CF_NAME', METADATA => > > {'hbase.hstore.block.storage.policy' => 'POLICY_NAME'}} > HDFS's setStoragePolicy can only take effect when new hfile is created in a > configured directory, so I had to make sub directories(for each cf) in > region's .tmp directory and set storage policy for them. > Besides, I had to upgrade hadoop version to 2.6.0 because > dfs.getStoragePolicy cannot be easily written in reflection, and I needed > this api to finish my unit test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713594#comment-15713594 ] Guanghao Zhang commented on HBASE-17205: Thanks [~mbertozzi] for reviewing. > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, > HBASE-17205-v1.patch, HBASE-17205.patch > > > When working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > transformation of region state is PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. When transforming an old region state to a new > region state, it updates the time stamp to the current time. So we can't get the > overall duration of a region in transition. Add a rit duration > field to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
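The accumulation idea in the description above can be sketched as follows; the field and method names are assumptions, not the actual RegionState implementation:

```java
// Sketch of accumulating an overall "duration in transition": the per-state
// timestamp is still reset on every transition (as before), but the elapsed
// interval is folded into a running total first.
public class RegionStateSketch {
    private long stamp;        // reset on every state change, as before
    private long ritDuration;  // NEW: total time spent in transition so far

    public RegionStateSketch(long now) {
        this.stamp = now;
        this.ritDuration = 0;
    }

    /** Called on each state change, e.g. PENDING_CLOSE -> CLOSING. */
    public void updateTimestamp(long now) {
        ritDuration += now - stamp; // accumulate the elapsed interval first
        stamp = now;                // then reset the per-state timestamp
    }

    public long getRitDuration() {
        return ritDuration;
    }
}
```

With this, the overall duration survives across all intermediate state changes instead of being lost each time the timestamp is reset.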
[jira] [Commented] (HBASE-17140) Reduce meta request number by skipping table state check
[ https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707486#comment-15707486 ] Guanghao Zhang commented on HBASE-17140: Thanks for your reply. bq. In the case of (2), the HRI for parent region is saved with split=true, offline=true (similar for merge). If I am not wrong, when merging A and B into a new region, the region infos of A and B are deleted directly? So split=true, offline=true means a split parent region. And offline=true means a region of a disabled table. bq. When the table is re-enabled again, we do not want to bring back the old parents. When enabling a table, it needs to get the table regions first, and the split parent regions are filtered out in this step. So I think they can't be brought back? > Reduce meta request number by skipping table state check > > > Key: HBASE-17140 > URL: https://issues.apache.org/jira/browse/HBASE-17140 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, > HBASE-17140-v3.patch, HBASE-17140-v4.patch, HBASE-17140-v5.patch > > > Now when requesting a disabled table, it needs 3 rpc calls before failing. > 1. get region location > 2. send call to rs and get NotServeRegionException > 3. retry and check the table state, then throw TableNotEnabledException > The table state check is added for disabled tables. But now the prepare method > in RegionServerCallable shows that all retry requests will get the table state > first. 
> {code}
> public void prepare(final boolean reload) throws IOException {
>   // check table state if this is a retry
>   if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>       getConnection().isTableDisabled(tableName)) {
>     throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>   }
>   try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>     this.location = regionLocator.getRegionLocation(row);
>   }
>   if (this.location == null) {
>     throw new IOException("Failed to find location, tableName=" + tableName +
>         ", row=" + Bytes.toString(row) + ", reload=" + reload);
>   }
>   setStubByServiceName(this.location.getServerName());
> }
> {code}
> An improvement is to set the region offline in HRegionInfo and throw > RegionOfflineException when getting the region location. Then we don't need to check the > table state for any retry request. > Review board: https://reviews.apache.org/r/54071/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
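The proposed improvement — failing the location lookup directly when the cached region info is marked offline, instead of re-checking the table state on every retry — can be sketched with simplified stand-in types (none of these are the real HBase classes):

```java
// Sketch only: simplified stand-ins for HRegionInfo / the locator path.
// The idea: if the cached region info already says "offline", throw
// immediately and skip the extra table-state RPC on retries.
public class LocateSketch {
    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String msg) { super(msg); }
    }

    static class RegionInfo {
        final String name;
        final boolean offline; // would come from the meta row in real HBase
        RegionInfo(String name, boolean offline) {
            this.name = name;
            this.offline = offline;
        }
    }

    static RegionInfo locate(RegionInfo cached) {
        if (cached.offline) {
            // No need to ask master for the table state: the cached region
            // info already tells us this region cannot serve requests.
            throw new RegionOfflineException(cached.name + " is offline");
        }
        return cached;
    }
}
```

The saving is that the failure is decided from data the client already has, turning the 3-RPC failure path described above into a local check.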
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Attachment: HBASE-17178-v5.patch Attach a v5 patch that addresses the review comments. > Add region balance throttling > - > > Key: HBASE-17178 > URL: https://issues.apache.org/jira/browse/HBASE-17178 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17178-v1.patch, HBASE-17178-v2.patch, > HBASE-17178-v3.patch, HBASE-17178-v4.patch, HBASE-17178-v5.patch > > > Our online cluster serves dozens of tables and different tables serve > different services. If the balancer moves too many regions at the same time, > it will decrease the availability of some tables or services. So we added > region balance throttling on our online serving cluster. > We introduce a new config hbase.balancer.max.balancing.regions, which means > the max number of regions in transition when balancing. > If we config this to 1 and a table has 100 regions, then the table will have > 99 regions available at any time. It helps a lot for our use case and it has > been running for a long time on > our production cluster. > But for some use cases, we need the balancer to run faster. If a cluster has 100 > regionservers, it may add 50 new regionservers for peak requests. Then it > needs the balancer to run as soon as > possible and let the cluster reach a balanced state soon. Our idea is to compute the > max number of regions in transition from the max balancing time and the average > time of a region in transition. > Then the balancer uses the computed value for throttling. > Examples for understanding. > A cluster has 100 regionservers, each regionserver has 200 regions, the > average time of a region in transition is 1 second, and we config the max > balancing time to 10 * 60 seconds. > Case 1. One regionserver crashes; the cluster at most needs to balance 200 regions. 
> Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in > transition is 1 when balancing. Then the balancer can move regions one by one > and the cluster will have high availability when balancing. > Case 2. Add another 100 regionservers; the cluster at most needs to balance 10000 > regions. Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of > regions in transition is 17 when balancing. Then the cluster can reach a > balanced state within the max balancing time. > Any suggestions are welcomed. > Review board: https://reviews.apache.org/r/54191/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
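The arithmetic in the two cases above can be captured in a small helper. The method name and signature are assumptions, not the patch's actual code, and the Case 2 region count is taken as 10000 (consistent with the stated result 10000 / (10 * 60s / 1s) = 16.7):

```java
// Sketch of the throttling computation: derive the max number of regions in
// transition from the configured max balancing time and the average time a
// region spends in transition.
public class BalanceThrottleSketch {
    static int maxRegionsInTransition(int regionsToBalance,
                                      double avgRitSeconds,
                                      double maxBalancingSeconds) {
        // How many sequential moves fit into the balancing window.
        double slots = maxBalancingSeconds / avgRitSeconds;
        // Concurrent RITs needed to finish all moves within the window.
        int max = (int) Math.ceil(regionsToBalance / slots);
        return Math.max(1, max); // always allow at least one move
    }

    public static void main(String[] args) {
        // Case 1: 200 regions, 1s average RIT, 600s budget -> 1 (move one by one)
        System.out.println(maxRegionsInTransition(200, 1.0, 600.0));
        // Case 2: 10000 regions -> ceil(16.7) = 17 concurrent moves
        System.out.println(maxRegionsInTransition(10000, 1.0, 600.0));
    }
}
```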
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Attachment: HBASE-17178-branch-1.patch Attach patch for branch-1. > Add region balance throttling > - > > Key: HBASE-17178 > URL: https://issues.apache.org/jira/browse/HBASE-17178 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17178-branch-1.patch, HBASE-17178-v1.patch, > HBASE-17178-v2.patch, HBASE-17178-v3.patch, HBASE-17178-v4.patch, > HBASE-17178-v5.patch > > > Our online cluster serves dozens of tables and different tables serve for > different services. If the balancer moves too many regions in the same time, > it will decrease the availability for some table or some services. So we add > region balance throttling on our online serve cluster. > We introduce a new config hbase.balancer.max.balancing.regions, which means > the max number of regions in transition when balancing. > If we config this to 1 and a table have 100 regions, then the table will have > 99 regions available at any time. It helps a lot for our use case and it has > been running a long time > our production cluster. > But for some use case, we need the balancer run faster. If a cluster has 100 > regionservers, then it add 50 new regionservers for peak requests. Then it > need balancer run as soon as > possible and let the cluster reach a balance state soon. Our idea is compute > max number of regions in transition by the max balancing time and the average > time of region in transition. > Then the balancer use the computed value to throttling. > Examples for understanding. > A cluster has 100 regionservers, each regionserver has 200 regions and the > average time of region in transition is 1 seconds, we config the max > balancing time is 10 * 60 seconds. > Case 1. 
One regionserver crashes; the cluster at most needs to balance 200 regions. > Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in > transition is 1 when balancing. Then the balancer can move regions one by one > and the cluster will have high availability when balancing. > Case 2. Add another 100 regionservers; the cluster at most needs to balance 10000 > regions. Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of > regions in transition is 17 when balancing. Then the cluster can reach a > balanced state within the max balancing time. > Any suggestions are welcomed. > Review board: https://reviews.apache.org/r/54191/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Affects Version/s: 1.4.0 2.0.0 > Add region balance throttling > - > > Key: HBASE-17178 > URL: https://issues.apache.org/jira/browse/HBASE-17178 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17178-branch-1.patch, HBASE-17178-v1.patch, > HBASE-17178-v2.patch, HBASE-17178-v3.patch, HBASE-17178-v4.patch, > HBASE-17178-v5.patch > > > Our online cluster serves dozens of tables and different tables serve for > different services. If the balancer moves too many regions in the same time, > it will decrease the availability for some table or some services. So we add > region balance throttling on our online serve cluster. > We introduce a new config hbase.balancer.max.balancing.regions, which means > the max number of regions in transition when balancing. > If we config this to 1 and a table have 100 regions, then the table will have > 99 regions available at any time. It helps a lot for our use case and it has > been running a long time > our production cluster. > But for some use case, we need the balancer run faster. If a cluster has 100 > regionservers, then it add 50 new regionservers for peak requests. Then it > need balancer run as soon as > possible and let the cluster reach a balance state soon. Our idea is compute > max number of regions in transition by the max balancing time and the average > time of region in transition. > Then the balancer use the computed value to throttling. > Examples for understanding. > A cluster has 100 regionservers, each regionserver has 200 regions and the > average time of region in transition is 1 seconds, we config the max > balancing time is 10 * 60 seconds. > Case 1. One regionserver crash, the cluster at most need balance 200 regions. 
> Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in > transition is 1 when balancing. Then the balancer can move regions one by one > and the cluster will have high availability when balancing. > Case 2. Add another 100 regionservers; the cluster at most needs to balance 10000 > regions. Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of > regions in transition is 17 when balancing. Then the cluster can reach a > balanced state within the max balancing time. > Any suggestions are welcomed. > Review board: https://reviews.apache.org/r/54191/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17140) Reduce meta request number by skipping table state check
[ https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17140: --- Summary: Reduce meta request number by skipping table state check (was: Throw RegionOfflineException directly when request for a disabled table)
> Reduce meta request number by skipping table state check
>
> Key: HBASE-17140
> URL: https://issues.apache.org/jira/browse/HBASE-17140
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Attachments: HBASE-17140-v1.patch, HBASE-17140-v2.patch, HBASE-17140-v3.patch, HBASE-17140-v4.patch, HBASE-17140-v5.patch
>
> Currently a request to a disabled table needs 3 RPC calls before it fails:
> 1. get the region location
> 2. send the call to the regionserver and get a NotServingRegionException
> 3. retry, check the table state, and throw TableNotEnabledException
> The table state check was added for disabled tables, but the prepare method in RegionServerCallable shows that every retry request fetches the table state first:
> {code}
> public void prepare(final boolean reload) throws IOException {
>   // check table state if this is a retry
>   if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
>       getConnection().isTableDisabled(tableName)) {
>     throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
>   }
>   try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
>     this.location = regionLocator.getRegionLocation(row);
>   }
>   if (this.location == null) {
>     throw new IOException("Failed to find location, tableName=" + tableName +
>       ", row=" + Bytes.toString(row) + ", reload=" + reload);
>   }
>   setStubByServiceName(this.location.getServerName());
> }
> {code}
> An improvement is to mark the region offline in HRegionInfo and throw a RegionOfflineException when getting the region location.
> Review board: https://reviews.apache.org/r/54071/
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
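The proposed improvement can be sketched like this. It is a simplified illustration under assumed class shapes, not the actual patch, which modifies HRegionInfo and the client location-lookup path.

```java
// Sketch of the proposed fix: record an offline flag on the cached region
// info and fail the location lookup directly, instead of asking meta for the
// table state on every retry.
public class OfflineCheckSketch {
    static class RegionInfo {
        final String regionName;
        final boolean offline; // set when the table is disabled
        RegionInfo(String regionName, boolean offline) {
            this.regionName = regionName;
            this.offline = offline;
        }
    }

    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String msg) { super(msg); }
    }

    /** Throws immediately if the cached region is marked offline, saving the
     *  extra table-state RPC on each retry. */
    static RegionInfo getRegionLocation(RegionInfo cached) {
        if (cached.offline) {
            throw new RegionOfflineException("the region is offline: " + cached.regionName);
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(getRegionLocation(new RegionInfo("r1", false)).regionName); // r1
    }
}
```

The point of the design is that the offline state rides along with the region location the client already fetched, so a disabled table fails in one lookup instead of three RPCs.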
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Description: Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time, it decreases the availability of some tables or services, so we added region balance throttling to our online cluster. We introduce a new config, hbase.balancer.max.balancing.regions, which is the max number of regions in transition while balancing. If we set this to 1 and a table has 100 regions, the table will have 99 regions available at any time. It helps a lot for our use case and has been running for a long time on our production cluster. But for some use cases, we need the balancer to run faster. If a cluster has 100 regionservers and adds 50 new regionservers for peak requests, the balancer needs to run as soon as possible so the cluster reaches a balanced state quickly. Our idea is to compute the max number of regions in transition from the max balancing time and the average time a region spends in transition, and have the balancer use the computed value for throttling. Examples for understanding: A cluster has 100 regionservers, each regionserver has 200 regions, the average time of a region in transition is 1 second, and we configure the max balancing time as 10 * 60 seconds. Case 1. One regionserver crashes; the cluster needs to balance at most 200 regions. Then 200 / (10 * 60s / 1s) < 1, so the max number of regions in transition while balancing is 1. The balancer can move regions one by one, and the cluster keeps high availability while balancing. Case 2. Another 100 regionservers are added; the cluster needs to balance at most 10000 regions. Then 10000 / (10 * 60s / 1s) = 16.7, so the max number of regions in transition while balancing is 17, and the cluster can reach a balanced state within the max balancing time. Any suggestions are welcome. Review board: https://reviews.apache.org/r/54191/
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Attachment: HBASE-17178-v4.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707188#comment-15707188 ] Guanghao Zhang commented on HBASE-17178: Review board: https://reviews.apache.org/r/54191/
bq. Move this line out of synchronized
Fixed in the v4 patch.
bq. Shall the balancing be affected by other RIT? Assuming RS crash happened in middle of balancing, shall we wait?
Yes, balancing will be affected by other RITs; this is for availability. If a RS crashes in the middle of balancing, there will be more regions in transition, so the balancer can't finish all region plans and the cluster needs the next balance round to reach a balanced state.
bq. the code flow of balancer might block here and not controlled by the cutoffTime?
Fixed in the v4 patch; it needs to break out of the sleep when the cutoff time is exceeded.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707188#comment-15707188 ] Guanghao Zhang edited comment on HBASE-17178 at 11/30/16 1:38 AM: -- Review board: https://reviews.apache.org/r/54191/
bq. Move this line out of synchronized
Fixed in the v4 patch.
bq. Shall the balancing be affected by other RIT? Assuming RS crash happened in middle of balancing, shall we wait?
Yes, balancing will be affected by other RITs; this is for availability. If a RS crashes in the middle of balancing, there will be more regions in transition, so the balancer can't finish all region plans and the cluster needs the next balance round to reach a balanced state.
bq. the code flow of balancer might block here and not controlled by the cutoffTime?
Fixed in the v4 patch; it needs to break out of the sleep when the cutoff time is exceeded.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17140) Reduce meta request number by skipping table state check
[ https://issues.apache.org/jira/browse/HBASE-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17140: --- Affects Version/s: 2.0.0 Description: Currently a request to a disabled table needs 3 RPC calls before it fails: 1. get the region location 2. send the call to the regionserver and get a NotServingRegionException 3. retry, check the table state, and throw TableNotEnabledException. The table state check was added for disabled tables, but the prepare method in RegionServerCallable shows that every retry request fetches the table state first:
{code}
public void prepare(final boolean reload) throws IOException {
  // check table state if this is a retry
  if (reload && !tableName.equals(TableName.META_TABLE_NAME) &&
      getConnection().isTableDisabled(tableName)) {
    throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
  }
  try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
    this.location = regionLocator.getRegionLocation(row);
  }
  if (this.location == null) {
    throw new IOException("Failed to find location, tableName=" + tableName +
      ", row=" + Bytes.toString(row) + ", reload=" + reload);
  }
  setStubByServiceName(this.location.getServerName());
}
{code}
An improvement is to mark the region offline in HRegionInfo and throw a RegionOfflineException when getting the region location. Then we don't need to check the table state for any retry request. Review board: https://reviews.apache.org/r/54071/
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Release Note: Add region balance throttling. The master executes region balance plans one per balance interval, which equals the max balancing time divided by the number of region balance plans. It also introduces a new config, hbase.master.balancer.maxRitPercent, to protect availability: if it is set to 0.01, the max percent of regions in transition during balancing is 1%, so the cluster's availability is at least 99% while balancing.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
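The release note above describes two knobs: a per-plan balance interval derived from the max balancing time, and an availability cap on regions in transition. A minimal sketch of how they interact, with hypothetical names and shapes rather than the actual patch:

```java
// Sketch of the released behavior: the balancer derives a per-plan interval
// from the max balancing time, and withholds further moves while the fraction
// of regions in transition exceeds hbase.master.balancer.maxRitPercent.
public class MaxRitPercentSketch {
    /** Interval between executing two region plans. */
    static long balanceIntervalMs(long maxBalancingTimeMs, int planCount) {
        return planCount == 0 ? 0 : maxBalancingTimeMs / planCount;
    }

    /** Whether the balancer may submit another move right now. */
    static boolean mayBalance(int regionsInTransition, int totalRegions, double maxRitPercent) {
        return (double) regionsInTransition / totalRegions <= maxRitPercent;
    }

    public static void main(String[] args) {
        // 10-minute window, 100 plans -> execute one plan every 6 seconds.
        System.out.println(balanceIntervalMs(600000L, 100)); // 6000
        // maxRitPercent = 0.01 on a 20000-region cluster: at most 200 regions in RIT.
        System.out.println(mayBalance(200, 20000, 0.01));    // true
        System.out.println(mayBalance(201, 20000, 0.01));    // false
    }
}
```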
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Attachment: HBASE-17178-v6.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17178: --- Attachment: HBASE-17178-branch-1-v1.patch Update patch for branch-1.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Status: Patch Available (was: Open)
> Add a metric for the duration of region in transition
>
> Key: HBASE-17205
> URL: https://issues.apache.org/jira/browse/HBASE-17205
> Project: HBase
> Issue Type: Improvement
> Components: Region Assignment
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Minor
> Attachments: HBASE-17205.patch
>
> While working on HBASE-17178, I found there is no metric for the overall duration of a region in transition. When moving a region from A to B, the region state transitions are PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old state to the new state updates the timestamp to the current time, so the overall duration of the region in transition cannot be derived from it. Add a RIT duration to RegionState to accumulate this metric.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
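The accumulation described above can be sketched as follows. The field and method names are hypothetical; the actual patch modifies RegionState.

```java
// Sketch of accumulating the overall RIT duration: each state change resets
// the per-state timestamp, so the total must be accumulated separately rather
// than derived from the (always-refreshed) state timestamp.
public class RitDurationSketch {
    private long stateTimestamp;   // reset on every state transition
    private long ritDuration;      // accumulated across all transitions

    RitDurationSketch(long now) {
        this.stateTimestamp = now;
        this.ritDuration = 0;
    }

    /** Called on each transition, e.g. PENDING_CLOSE => CLOSING => ... => OPENED. */
    void transition(long now) {
        ritDuration += now - stateTimestamp; // accumulate before refreshing
        stateTimestamp = now;                // existing behavior: refresh stamp
    }

    long getRitDuration() {
        return ritDuration;
    }

    public static void main(String[] args) {
        RitDurationSketch state = new RitDurationSketch(0L);
        state.transition(100L); // e.g. reached CLOSED at t=100ms
        state.transition(250L); // e.g. reached OPENED at t=250ms
        System.out.println(state.getRitDuration()); // 250
    }
}
```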
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Attachment: HBASE-17205.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17178) Add region balance throttling
[ https://issues.apache.org/jira/browse/HBASE-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708146#comment-15708146 ] Guanghao Zhang commented on HBASE-17178: Thanks [~yangzhe1991] [~tedyu] [~carp84] [~ashish singhi] for reviewing. > Add region balance throttling > - > > Key: HBASE-17178 > URL: https://issues.apache.org/jira/browse/HBASE-17178 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17178-branch-1-v1.patch, > HBASE-17178-branch-1.patch, HBASE-17178-v1.patch, HBASE-17178-v2.patch, > HBASE-17178-v3.patch, HBASE-17178-v4.patch, HBASE-17178-v5.patch, > HBASE-17178-v6.patch > > > Our online cluster serves dozens of tables, and different tables serve > different services. If the balancer moves too many regions at the same time, > it will decrease availability for some tables or services. So we added > region balance throttling to our online serving cluster. > We introduce a new config hbase.balancer.max.balancing.regions, which is > the max number of regions in transition when balancing. > If we set this to 1 and a table has 100 regions, the table will have > 99 regions available at any time. It helps a lot for our use case, and it has > been running for a long time on our production cluster. > But in some use cases we need the balancer to run faster. Say a cluster has 100 > regionservers and adds 50 new regionservers for peak requests. Then it needs the > balancer to run as soon as > possible so the cluster reaches a balanced state quickly. Our idea is to compute > the max number of regions in transition from the max balancing time and the average > time of a region in transition. > The balancer then uses the computed value for throttling. > Examples for understanding:
> A cluster has 100 regionservers, each regionserver has 200 regions, and the > average time of a region in transition is 1 second; we configure the max > balancing time to be 10 * 60 seconds. > Case 1. One regionserver crashes, so the cluster needs to balance at most 200 regions. > Then 200 / (10 * 60s / 1s) < 1, which means the max number of regions in > transition is 1 when balancing. The balancer can then move regions one by one > and the cluster keeps high availability while balancing. > Case 2. Another 100 regionservers are added, so the cluster needs to balance at most 10000 > regions. Then 10000 / (10 * 60s / 1s) = 16.7, which means the max number of > regions in transition is 17 when balancing. The cluster can then reach a > balanced state within the max balancing time. > Any suggestions are welcome. > Review board: https://reviews.apache.org/r/54191/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
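The throttling computation described above can be sketched as follows. This is a minimal illustration, not the actual HBASE-17178 patch: the class and method names are ours, and the real implementation lives inside the balancer.

```java
// Sketch of the proposed throttle: cap the number of regions in transition
// by how many sequential moves fit into the configured balancing window.
public class BalanceThrottleSketch {

    /**
     * @param regionsToMove      regions the balancer plans to move
     * @param maxBalancingTimeMs configured max balancing time (e.g. 10 * 60 * 1000)
     * @param avgRitTimeMs       observed average time one region stays in transition
     * @return max number of regions allowed in transition at once (at least 1)
     */
    static int maxRegionsInTransition(int regionsToMove, long maxBalancingTimeMs,
                                      long avgRitTimeMs) {
        // How many sequential region moves fit into the balancing window.
        double movesPerWindow = (double) maxBalancingTimeMs / avgRitTimeMs;
        // Spread the planned moves across the window; round up, floor at 1.
        return Math.max(1, (int) Math.ceil(regionsToMove / movesPerWindow));
    }

    public static void main(String[] args) {
        // Case 1: one regionserver crashes, 200 regions to re-balance.
        System.out.println(maxRegionsInTransition(200, 10 * 60 * 1000L, 1000L));   // 1
        // Case 2: cluster doubles in size, ~10000 regions to move.
        System.out.println(maxRegionsInTransition(10000, 10 * 60 * 1000L, 1000L)); // 17
    }
}
```

Both cases reproduce the numbers in the comment: moving one region at a time when the workload is small, 17 at a time when the cluster doubles.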
[jira] [Created] (HBASE-17205) Add a metric for the duration of region in transition
Guanghao Zhang created HBASE-17205: -- Summary: Add a metric for the duration of region in transition Key: HBASE-17205 URL: https://issues.apache.org/jira/browse/HBASE-17205 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor While working on HBASE-17178, I found there is no metric for the overall duration of a region in transition. When moving a region from A to B, the region state transitions are PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to the current time, so we can't get the overall duration of a region in transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
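The idea in the description can be sketched as a small accumulator. Field and method names here are assumed for illustration, not the actual RegionState API: since every state transition resets the per-state timestamp, a separate counter has to bank the elapsed time before the reset.

```java
// Sketch: keep a running RIT duration that survives per-state timestamp resets.
public class RegionStateSketch {
    private long stamp;        // reset on every transition, as today
    private long ritDuration;  // accumulated total: the proposed new field

    RegionStateSketch(long now) {
        this.stamp = now;
        this.ritDuration = 0;
    }

    /** Called on each state transition, e.g. CLOSING -> CLOSED. */
    void transition(long now) {
        ritDuration += now - stamp; // bank the time spent in the previous state
        stamp = now;                // then reset the per-state timestamp
    }

    long getRitDuration() {
        return ritDuration;
    }

    public static void main(String[] args) {
        RegionStateSketch rs = new RegionStateSketch(0);
        rs.transition(100);  // first transition after 100 ms
        rs.transition(250);  // next transition after another 150 ms
        System.out.println(rs.getRitDuration()); // 250: total time in transition
    }
}
```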
[jira] [Commented] (HBASE-16336) Removing peers seem to be leaving spare queues
[ https://issues.apache.org/jira/browse/HBASE-16336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724132#comment-15724132 ] Guanghao Zhang commented on HBASE-16336: HBASE-12769 tries to fix this via hbck. A more automatic way is to add a replication znode checker on the master, which periodically checks for and deletes useless replication znodes. In our use case, we found dead RS znodes left behind, and a dead RS znode can only be transferred when another RS restarts. So the replication znode checker should check dead RS znodes too. I know the more proper solutions are HBASE-11392 and HBASE-12439, but for branch-1 we can resolve this with a replication znode checker. Any ideas? [~enis] > Removing peers seem to be leaving spare queues > -- > > Key: HBASE-16336 > URL: https://issues.apache.org/jira/browse/HBASE-16336 > Project: HBase > Issue Type: Sub-task > Components: Replication >Reporter: Joseph > > I have been running IntegrationTestReplication repeatedly with the backported > Replication Table changes. Every other iteration of the test fails with the error > below, but these queues should have been deleted when we removed the peers. I believe > this may be related to HBASE-16096, HBASE-16208, or HBASE-16081. 
> 16/08/02 08:36:07 ERROR util.AbstractHBaseTool: Error running command-line > tool > org.apache.hadoop.hbase.replication.ReplicationException: undeleted queue for > peerId: TestPeer, replicator: > hbase4124.ash2.facebook.com,16020,1470150251042, queueId: TestPeer > at > org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.checkQueuesDeleted(ReplicationPeersZKImpl.java:544) > at > org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl.addPeer(ReplicationPeersZKImpl.java:127) > at > org.apache.hadoop.hbase.client.replication.ReplicationAdmin.addPeer(ReplicationAdmin.java:200) > at > org.apache.hadoop.hbase.test.IntegrationTestReplication$VerifyReplicationLoop.setupTablesAndReplication(IntegrationTestReplication.java:239) > at > org.apache.hadoop.hbase.test.IntegrationTestReplication$VerifyReplicationLoop.run(IntegrationTestReplication.java:325) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.test.IntegrationTestReplication.runTestFromCommandLine(IntegrationTestReplication.java:418) > at > org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:134) > at > org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.test.IntegrationTestReplication.main(IntegrationTestReplication.java:424) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724002#comment-15724002 ] Guanghao Zhang commented on HBASE-17261: After HBASE-15529, the cluster needs balancing when (total cost / sum multiplier) > minCostNeedBalance. In the log above, 525.2547686174673 / 111087.0 is about 0.0047, which is less than the default minCostNeedBalance of 0.05. > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
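The check described in this comment can be sketched as below. This is a simplification for illustration; the real StochasticLoadBalancer computes many weighted cost functions before this comparison.

```java
// Sketch of the post-HBASE-15529 "needs balance" decision: the balancer only
// runs when the normalized cost exceeds the minCostNeedBalance threshold.
public class NeedsBalanceSketch {

    static boolean needsBalance(double totalCost, double sumMultiplier,
                                double minCostNeedBalance) {
        return (totalCost / sumMultiplier) > minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Numbers from the log line in the issue: 525.25 / 111087.0 ~= 0.0047,
        // below the 0.05 default, so the balancer skips balancing.
        System.out.println(needsBalance(525.2547686174673, 111087.0, 0.05)); // false
    }
}
```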
[jira] [Commented] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724030#comment-15724030 ] Guanghao Zhang commented on HBASE-17261: We can update the default value of hbase.master.balancer.stochastic.minCostNeedBalance to 0.0 and keep the default behavior the same as before HBASE-15529. Any ideas? [~stack] > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724403#comment-15724403 ] Guanghao Zhang commented on HBASE-17261: bq. sum multiplier is 111087.0 Did the cluster use all default configs in StochasticLoadBalancer? bq. What you think is up? We have been using this in our cluster, but I think the default value should be zero. This config should only be needed by power users. I will upload a patch for this. > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17261: --- Attachment: HBASE-17261.patch > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack > Attachments: HBASE-17261.patch > > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-17261: -- Assignee: Guanghao Zhang > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Guanghao Zhang > Attachments: HBASE-17261.patch > > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17261: --- Status: Patch Available (was: Open) > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Guanghao Zhang > Attachments: HBASE-17261.patch > > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724490#comment-15724490 ] Guanghao Zhang commented on HBASE-17261: bq. currently on 1.2 branch-1.2 ? But HBASE-15529 was only merged to branch-1 and master, so branch-1.2 should not have this problem. > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Guanghao Zhang > Attachments: HBASE-17261.patch > > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17261) Balancer makes no sense on tip of branch-1: says balanced when not
[ https://issues.apache.org/jira/browse/HBASE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15724470#comment-15724470 ] Guanghao Zhang commented on HBASE-17261: {code} private static final String REGION_REPLICA_HOST_COST_KEY = "hbase.master.balancer.stochastic.regionReplicaHostCostKey"; private static final float DEFAULT_REGION_REPLICA_HOST_COST_KEY = 100000; {code} The default region replica cost multiplier is too big, so it carries the most weight in the total cost. When the replica cost is small, the cluster can't balance. Uploaded a patch for this. > Balancer makes no sense on tip of branch-1: says balanced when not > -- > > Key: HBASE-17261 > URL: https://issues.apache.org/jira/browse/HBASE-17261 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: Guanghao Zhang > Attachments: HBASE-17261.patch > > > Running ITBLL on tip of branch-1, I see this in log when I try to balance: > {code} > 2016-12-05 16:42:21,031 INFO > [RpcServer.deafult.FPBQ.Fifo.handler=46,queue=1,port=16000] > balancer.StochasticLoadBalancer: Skipping load balancing because balanced > cluster; total cost is 525.2547686174673| > , sum multiplier is 111087.0 min cost which need balance is 0.05 > {code} > Its some old nonsense. > Does this every time I balance. Can't even force a balance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
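A rough illustration of why such a large multiplier can freeze the balancer. The multiplier values below are assumptions inferred from the log's sum of 111087.0, not read from a live cluster, and the two-bucket model is a deliberate simplification of the many cost functions.

```java
// Sketch: with the replica multiplier dominating the sum of multipliers, a
// cluster whose non-replica costs are clearly unbalanced can still fall
// under the minCostNeedBalance threshold.
public class MultiplierDominanceSketch {

    static double costRatio(double replicaCost, double otherCostFraction) {
        double replicaMultiplier = 100000.0; // dominant replica-cost weight (assumed)
        double otherMultipliers = 11087.0;   // roughly what the log's sum implies
        double totalCost = replicaCost * replicaMultiplier
                         + otherCostFraction * otherMultipliers;
        return totalCost / (replicaMultiplier + otherMultipliers);
    }

    public static void main(String[] args) {
        // Replicas placed fine (cost 0), but every other cost function at 50%
        // of its weight -- the ratio still lands just under the 0.05 default,
        // so the balancer declares the cluster balanced and does nothing.
        System.out.println(costRatio(0.0, 0.5) < 0.05); // true
    }
}
```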
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Affects Version/s: 1.4.0 2.0.0 > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, > HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Attachment: HBASE-17205-branch-1.patch > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, > HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Attachment: HBASE-17205-v1.patch Attached a v1 patch addressing review comments. > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-v1.patch, HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710497#comment-15710497 ] Guanghao Zhang commented on HBASE-17205: bq. with the new AM we have the actual time of assign and unassign operation for each region and the time of the region in failed open or those kind of states. Looking forward to the new AM in 2.0. :) > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-v1.patch, HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710926#comment-15710926 ] Guanghao Zhang commented on HBASE-17205: The failed UTs are related to HBASE-17212. > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, > HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17205) Add a metric for the duration of region in transition
[ https://issues.apache.org/jira/browse/HBASE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17205: --- Attachment: HBASE-17205-v1.patch There was no precommit job run for v1. Attaching again. > Add a metric for the duration of region in transition > - > > Key: HBASE-17205 > URL: https://issues.apache.org/jira/browse/HBASE-17205 > Project: HBase > Issue Type: Improvement > Components: Region Assignment >Affects Versions: 2.0.0, 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17205-branch-1.patch, HBASE-17205-v1.patch, > HBASE-17205-v1.patch, HBASE-17205.patch > > > While working on HBASE-17178, I found there is no metric for the overall > duration of a region in transition. When moving a region from A to B, the > region state transitions are PENDING_CLOSE => CLOSING => CLOSED => > PENDING_OPEN => OPENING => OPENED. Each transition updates the timestamp to > the current time, so we can't get the overall duration of a region in > transition. Add a RIT duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17337: --- Attachment: HBASE-17337-v1.patch > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804296#comment-15804296 ] Guanghao Zhang commented on HBASE-17337: Attached a v1 patch. Waiting for the Hadoop QA result. > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17337: --- Status: Patch Available (was: Open) > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Status: Patch Available (was: Open) > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803258#comment-15803258 ] Guanghao Zhang commented on HBASE-17388: Pushed to master. Thanks [~enis] for review. > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
Guanghao Zhang created HBASE-17388: -- Summary: Move ReplicationPeer and other replication related PB messages to the replication.proto Key: HBASE-17388 URL: https://issues.apache.org/jira/browse/HBASE-17388 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-17389) Convert all internal usages from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-17389: -- Assignee: Guanghao Zhang > Convert all internal usages from ReplicationAdmin to Admin > -- > > Key: HBASE-17389 > URL: https://issues.apache.org/jira/browse/HBASE-17389 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17336) get/update replication peer config requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784294#comment-15784294 ] Guanghao Zhang commented on HBASE-17336: Attached a v5 patch fixing the copy-paste error. bq. move ReplicationPeer and other replication related PB messages to the replication.proto from zookeeper.proto. bq. Maybe after all methods moved to Admin, we can do a refactor patch to convert internal usages from RA to Admin. Opened new issues HBASE-17388 and HBASE-17389 for these. > get/update replication peer config requests should be routed through master > --- > > Key: HBASE-17336 > URL: https://issues.apache.org/jira/browse/HBASE-17336 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17336-v1.patch, HBASE-17336-v2.patch, > HBASE-17336-v3.patch, HBASE-17336-v4.patch, HBASE-17336-v5.patch > > > As HBASE-11392 description says, we should move replication operations to be > routed through master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17336) get/update replication peer config requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784299#comment-15784299 ] Guanghao Zhang commented on HBASE-17336: Thanks for your suggestion. I will do it in HBASE-17389. > get/update replication peer config requests should be routed through master > --- > > Key: HBASE-17336 > URL: https://issues.apache.org/jira/browse/HBASE-17336 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17336-v1.patch, HBASE-17336-v2.patch, > HBASE-17336-v3.patch, HBASE-17336-v4.patch, HBASE-17336-v5.patch > > > As HBASE-11392 description says, we should move replication operations to be > routed through master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17389) Convert all internal usages from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17389: -- Summary: Convert all internal usages from ReplicationAdmin to Admin Key: HBASE-17389 URL: https://issues.apache.org/jira/browse/HBASE-17389 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17336) get/update replication peer config requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17336: --- Attachment: HBASE-17336-v5.patch > get/update replication peer config requests should be routed through master > --- > > Key: HBASE-17336 > URL: https://issues.apache.org/jira/browse/HBASE-17336 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17336-v1.patch, HBASE-17336-v2.patch, > HBASE-17336-v3.patch, HBASE-17336-v4.patch, HBASE-17336-v5.patch > > > As HBASE-11392 description says, we should move replication operations to be > routed through master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Attachment: HBASE-17388.patch Try to trigger Hadoop QA again. > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Attachment: HBASE-17388.patch > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Status: Open (was: Patch Available) > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Status: Patch Available (was: Open) > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch, HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Attachment: HBASE-17396-v4.patch > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch, HBASE-17396-v2.patch, > HBASE-17396-v3.patch, HBASE-17396-v4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17336) get/update replication peer config requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17336: --- Release Note: Get/update replication peer config requests will be routed through master. > get/update replication peer config requests should be routed through master > --- > > Key: HBASE-17336 > URL: https://issues.apache.org/jira/browse/HBASE-17336 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17336-v1.patch, HBASE-17336-v2.patch, > HBASE-17336-v3.patch, HBASE-17336-v4.patch, HBASE-17336-v5.patch > > > As HBASE-11392 description says, we should move replication operations to be > routed through master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17336) get/update replication peer config requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17336: --- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master. Thanks all for reviewing. > get/update replication peer config requests should be routed through master > --- > > Key: HBASE-17336 > URL: https://issues.apache.org/jira/browse/HBASE-17336 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17336-v1.patch, HBASE-17336-v2.patch, > HBASE-17336-v3.patch, HBASE-17336-v4.patch, HBASE-17336-v5.patch > > > As HBASE-11392 description says, we should move replication operations to be > routed through master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17396) Add first async admin impl and implement balance methods
Guanghao Zhang created HBASE-17396: -- Summary: Add first async admin impl and implement balance methods Key: HBASE-17396 URL: https://issues.apache.org/jira/browse/HBASE-17396 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Affects Version/s: 2.0.0 > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Status: Patch Available (was: Open) > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Attachment: HBASE-17396-v1.patch > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787455#comment-15787455 ] Guanghao Zhang commented on HBASE-17396: Attached an initial patch that only implements the balance methods. I used a MasterService stub directly in RpcRetryingCaller rather than MasterKeepAliveConnection. > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
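A minimal, self-contained sketch of the async pattern the comment describes (wrapping a blocking MasterService-style stub call in a CompletableFuture), using only the JDK. Class and method names here are hypothetical stand-ins, not the actual HBase AsyncAdmin code:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: an async admin wraps a blocking master RPC stub and
// exposes a CompletableFuture, which is the general shape of the AsyncAdmin
// balance() call discussed above.
public class AsyncAdminSketch {
    // Stand-in for a blocking MasterService stub call.
    interface MasterServiceStub {
        boolean balance();
    }

    private final MasterServiceStub stub;

    public AsyncAdminSketch(MasterServiceStub stub) {
        this.stub = stub;
    }

    // Returns immediately; the stub call runs on the common pool.
    public CompletableFuture<Boolean> balance() {
        return CompletableFuture.supplyAsync(stub::balance);
    }

    // Convenience for callers that want to block on the result.
    public static boolean balanceSync(AsyncAdminSketch admin) {
        try {
            return admin.balance().get();
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(balanceSync(new AsyncAdminSketch(() -> true))); // prints "true"
    }
}
```

The real implementation would dispatch the protobuf request on the RPC client's event loop rather than a thread pool; this only illustrates the future-returning call shape.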
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Attachment: HBASE-17396-v2.patch > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch, HBASE-17396-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17388: --- Attachment: HBASE-17388.patch > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
[ https://issues.apache.org/jira/browse/HBASE-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797836#comment-15797836 ] Guanghao Zhang commented on HBASE-17388: Move TableCF, ReplicationPeer, ReplicationState, ReplicationHLogPosition to Replication.proto. > Move ReplicationPeer and other replication related PB messages to the > replication.proto > --- > > Key: HBASE-17388 > URL: https://issues.apache.org/jira/browse/HBASE-17388 > Project: HBase > Issue Type: Sub-task > Components: Replication >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17388.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17396) Add first async admin impl and implement balance methods
[ https://issues.apache.org/jira/browse/HBASE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17396: --- Attachment: HBASE-17396-v3.patch > Add first async admin impl and implement balance methods > > > Key: HBASE-17396 > URL: https://issues.apache.org/jira/browse/HBASE-17396 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17396-v1.patch, HBASE-17396-v2.patch, > HBASE-17396-v3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17337: --- Attachment: HBASE-17337-v2.patch > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch, HBASE-17337-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17337) list replication peers request should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-17337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810601#comment-15810601 ] Guanghao Zhang commented on HBASE-17337: bq. Add javadoc and InterfaceAudience to ReplicationPeerDescription class. Added in v2. bq. If pattern is not null and once the pattern matches the peer id then we can break out of the for loop. This is a list operation and there may be many peers that match the pattern. > list replication peers request should be routed through master > -- > > Key: HBASE-17337 > URL: https://issues.apache.org/jira/browse/HBASE-17337 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17337-v1.patch, HBASE-17337-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
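The point in the reply above can be sketched in plain Java: a list operation must keep scanning after a match, since several peer ids can match the pattern. Names are illustrative, not the actual HBase master code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch: listing replication peers by pattern must not break
// out of the loop on the first match, because multiple peers can match.
public class ListPeersSketch {
    public static List<String> listPeers(Map<String, String> peers, Pattern pattern) {
        List<String> result = new ArrayList<>();
        for (String peerId : peers.keySet()) {
            // No early break: collect every matching peer id.
            if (pattern == null || pattern.matcher(peerId).matches()) {
                result.add(peerId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> peers = Map.of("peer1", "zk1", "peer2", "zk2", "other", "zk3");
        System.out.println(listPeers(peers, Pattern.compile("peer.*")).size()); // prints 2
    }
}
```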
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Assignee: Guanghao Zhang Status: Patch Available (was: Open) > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Attachment: HBASE-17443-v1.patch > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin
[ https://issues.apache.org/jira/browse/HBASE-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17443: --- Summary: Move listReplicated/enableTableRep/disableTableRep methods from ReplicationAdmin to Admin (was: Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin) > Move listReplicated/enableTableRep/disableTableRep methods from > ReplicationAdmin to Admin > - > > Key: HBASE-17443 > URL: https://issues.apache.org/jira/browse/HBASE-17443 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17443-v1.patch > > > We have moved other replication requests to Admin and mark ReplicationAdmin > as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need > move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17443: -- Summary: Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin Key: HBASE-17443 URL: https://issues.apache.org/jira/browse/HBASE-17443 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 We have moved other replication requests to Admin and mark ReplicationAdmin as Deprecated, so listReplicated/enableTableRep/disableTableRep methods need move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766012#comment-15766012 ] Guanghao Zhang commented on HBASE-17341: +1 on v2. We met this problem on our cluster, too. The region server shutdown hung while terminating the ReplicationSource. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
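The short-term fix the issue describes, bounding Future.get() with a timeout so a stuck endpoint stop() cannot hang the calling thread (for example the ZK event thread), can be sketched with plain java.util.concurrent. This is a generic illustration of the technique, not the committed HBase patch:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic sketch: wait for an endpoint's stop future with a bounded timeout
// instead of an unbounded Future.get().
public class TerminateWithTimeout {
    public static boolean awaitStop(Future<?> stopFuture, long timeoutMs) {
        try {
            stopFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;                 // endpoint stopped cleanly in time
        } catch (TimeoutException e) {
            stopFuture.cancel(true);     // give up and interrupt, rather than hang
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulate an endpoint whose stop() takes far too long.
        Future<?> stuck = pool.submit(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });
        System.out.println(awaitStop(stuck, 100)); // prints "false"
        pool.shutdown();
    }
}
```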
[jira] [Commented] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765810#comment-15765810 ] Guanghao Zhang commented on HBASE-11392: Failed ut is not related. [~enis] [~ashish singhi] Any more ideas about v6 patch? Thanks. > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, > HBASE-11392-v6.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-11392: --- Resolution: Fixed Status: Resolved (was: Patch Available) > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, > HBASE-11392-v6.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765804#comment-15765804 ] Guanghao Zhang commented on HBASE-17341: +1 on this. One minor comment. bq. LOG.warn("Got exception:", e); Can you add more info to this log? > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.master.v1.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766194#comment-15766194 ] Guanghao Zhang commented on HBASE-11392: Pushed to master branch. Thanks all for reviewing. > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, > HBASE-11392-v6.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17317) [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map
[ https://issues.apache.org/jira/browse/HBASE-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17317: --- Resolution: Fixed Status: Resolved (was: Patch Available) > [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't > update the table-cfs map > > > Key: HBASE-17317 > URL: https://issues.apache.org/jira/browse/HBASE-17317 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17317-branch-1.patch > > > The updatePeerConfig method in ReplicationPeersZKImpl.java > {code} > @Override > public void updatePeerConfig(String id, ReplicationPeerConfig newConfig) > throws ReplicationException { > ReplicationPeer peer = getPeer(id); > if (peer == null){ > throw new ReplicationException("Could not find peer Id " + id); > } > ReplicationPeerConfig existingConfig = peer.getPeerConfig(); > if (newConfig.getClusterKey() != null && > !newConfig.getClusterKey().isEmpty() && > !newConfig.getClusterKey().equals(existingConfig.getClusterKey())){ > throw new ReplicationException("Changing the cluster key on an existing > peer is not allowed." > + " Existing key '" + existingConfig.getClusterKey() + "' does not > match new key '" > + newConfig.getClusterKey() + > "'"); > } > String existingEndpointImpl = existingConfig.getReplicationEndpointImpl(); > if (newConfig.getReplicationEndpointImpl() != null && > !newConfig.getReplicationEndpointImpl().isEmpty() && > !newConfig.getReplicationEndpointImpl().equals(existingEndpointImpl)){ > throw new ReplicationException("Changing the replication endpoint > implementation class " + > "on an existing peer is not allowed. Existing class '" > + existingConfig.getReplicationEndpointImpl() > + "' does not match new class '" + > newConfig.getReplicationEndpointImpl() + "'"); > } > //Update existingConfig's peer config and peer data with the new values, > but don't touch config > // or data that weren't explicitly changed > existingConfig.getConfiguration().putAll(newConfig.getConfiguration()); > existingConfig.getPeerData().putAll(newConfig.getPeerData()); >// Bug. We should update table-cfs map, too. > try { > ZKUtil.setData(this.zookeeper, getPeerNode(id), > ReplicationSerDeHelper.toByteArray(existingConfig)); > } > catch(KeeperException ke){ > throw new ReplicationException("There was a problem trying to save > changes to the " + > "replication peer " + id, ke); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
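The bug flagged in the quoted code is a partial merge: the configuration and peer-data maps are carried over, but the table-cfs map is not. A minimal model of the fixed merge, with plain maps and illustrative names standing in for the real ReplicationPeerConfig fields:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the updatePeerConfig bug: the merge must also carry over
// the table-cfs map, not only configuration and peer data. Names are
// illustrative stand-ins, not the real ReplicationPeerConfig API.
public class PeerConfigMergeSketch {
    public static class PeerConfig {
        final Map<String, String> configuration = new HashMap<>();
        final Map<String, String> peerData = new HashMap<>();
        Map<String, String> tableCFsMap = new HashMap<>();
    }

    // The fixed merge: copy the table-cfs map when the update sets one.
    public static void merge(PeerConfig existing, PeerConfig update) {
        existing.configuration.putAll(update.configuration);
        existing.peerData.putAll(update.peerData);
        if (update.tableCFsMap != null && !update.tableCFsMap.isEmpty()) {
            existing.tableCFsMap = new HashMap<>(update.tableCFsMap); // the missing step
        }
    }

    public static void main(String[] args) {
        PeerConfig existing = new PeerConfig();
        PeerConfig update = new PeerConfig();
        update.tableCFsMap.put("t1", "cf1");
        merge(existing, update);
        System.out.println(existing.tableCFsMap.get("t1")); // prints "cf1"
    }
}
```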
[jira] [Commented] (HBASE-17317) [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map
[ https://issues.apache.org/jira/browse/HBASE-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763933#comment-15763933 ] Guanghao Zhang commented on HBASE-17317: Pushed to branch-1. Thanks [~tedyu] for reviewing. > [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't > update the table-cfs map > > > Key: HBASE-17317 > URL: https://issues.apache.org/jira/browse/HBASE-17317 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Attachments: HBASE-17317-branch-1.patch > > > The updatePeerConfig method in ReplicationPeersZKImpl.java > {code} > @Override > public void updatePeerConfig(String id, ReplicationPeerConfig newConfig) > throws ReplicationException { > ReplicationPeer peer = getPeer(id); > if (peer == null){ > throw new ReplicationException("Could not find peer Id " + id); > } > ReplicationPeerConfig existingConfig = peer.getPeerConfig(); > if (newConfig.getClusterKey() != null && > !newConfig.getClusterKey().isEmpty() && > !newConfig.getClusterKey().equals(existingConfig.getClusterKey())){ > throw new ReplicationException("Changing the cluster key on an existing > peer is not allowed." > + " Existing key '" + existingConfig.getClusterKey() + "' does not > match new key '" > + newConfig.getClusterKey() + > "'"); > } > String existingEndpointImpl = existingConfig.getReplicationEndpointImpl(); > if (newConfig.getReplicationEndpointImpl() != null && > !newConfig.getReplicationEndpointImpl().isEmpty() && > !newConfig.getReplicationEndpointImpl().equals(existingEndpointImpl)){ > throw new ReplicationException("Changing the replication endpoint > implementation class " + > "on an existing peer is not allowed. Existing class '" > + existingConfig.getReplicationEndpointImpl() > + "' does not match new class '" + > newConfig.getReplicationEndpointImpl() + "'"); > } > //Update existingConfig's peer config and peer data with the new values, > but don't touch config > // or data that weren't explicitly changed > existingConfig.getConfiguration().putAll(newConfig.getConfiguration()); > existingConfig.getPeerData().putAll(newConfig.getPeerData()); >// Bug. We should update table-cfs map, too. > try { > ZKUtil.setData(this.zookeeper, getPeerNode(id), > ReplicationSerDeHelper.toByteArray(existingConfig)); > } > catch(KeeperException ke){ > throw new ReplicationException("There was a problem trying to save > changes to the " + > "replication peer " + id, ke); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers
[ https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760485#comment-15760485 ] Guanghao Zhang commented on HBASE-17328: Seems the metrics will be cleared twice? > Properly dispose of looped replication peers > > > Key: HBASE-17328 > URL: https://issues.apache.org/jira/browse/HBASE-17328 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 1.4.0, 0.98.23 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9 > > Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, > HBASE-17328-master.v2.patch, HBASE-17328.branch-1.1.v2.patch, > HBASE-17328.master.v3.patch > > > When adding a looped replication peer (clusterId == peerClusterId), the > following code terminates the replication source thread, but since the source > manager still holds a reference, WALs continue to get enqueued, and never get > cleaned because they're stuck in the queue, leading to an unsustainable > buildup. Furthermore, the replication statistics thread will continue to > print statistics for the terminated source. > {code} > if (clusterId.equals(peerClusterId) && > !replicationEndpoint.canReplicateToSameCluster()) { > this.terminate("ClusterId " + clusterId + " is replicating to itself: > peerClusterId " > + peerClusterId + " which is not allowed by ReplicationEndpoint:" > + replicationEndpoint.getClass().getName(), null, false); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
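The disposal problem in the issue description can be reduced to a small model: terminating a looped replication source is not enough, the manager must also drop its reference, or WALs keep getting enqueued for a dead source. All names below are hypothetical, not the actual HBase fix:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of HBASE-17328: a looped peer (local cluster id equals
// peer cluster id) must be terminated AND not retained by the manager.
public class LoopedPeerSketch {
    public static class Source {
        boolean running = true;
        void terminate() { running = false; }
    }

    private final Map<String, Source> sources = new HashMap<>();

    public void addPeer(String localClusterId, String peerClusterId, Source source) {
        if (localClusterId.equals(peerClusterId)) {
            source.terminate();
            // The key point: do not retain the terminated source, so nothing
            // keeps enqueueing WALs or reporting statistics for it.
            return;
        }
        sources.put(peerClusterId, source);
    }

    public int sourceCount() { return sources.size(); }

    public static void main(String[] args) {
        LoopedPeerSketch mgr = new LoopedPeerSketch();
        mgr.addPeer("cluster-a", "cluster-a", new Source()); // loop: dropped
        mgr.addPeer("cluster-a", "cluster-b", new Source()); // normal: kept
        System.out.println(mgr.sourceCount()); // prints 1
    }
}
```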
[jira] [Commented] (HBASE-17288) Add warn log for huge Cell and huge row
[ https://issues.apache.org/jira/browse/HBASE-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764057#comment-15764057 ] Guanghao Zhang commented on HBASE-17288: bq. You init these new vars when parallel seek enabled. I believe simple mistake it is and not intended by you. Sorry for this mistake. I will upload a new patch later. bq. Any way 1st cell which causes the break in the row size check, will make into the log. Nice, but the row is still needed? We need it to find the huge row. bq. Better we can do the row size check and the end after considering all cells so that we can get exactly the size of the row? Or is that not possible as per loop here? This is not the real row size. When the scan sets a batch limit, it is only the size of the batched cells. Now our scan supports heartbeats and ScannerContext has size and time limits. Maybe the huge-row warning isn't needed... I will check the latest code in the master branch. Thanks for reviewing. :) > Add warn log for huge Cell and huge row > --- > > Key: HBASE-17288 > URL: https://issues.apache.org/jira/browse/HBASE-17288 > Project: HBase > Issue Type: Improvement > Components: scan >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-17288-v1.patch, HBASE-17288-v2.patch, > HBASE-17288.patch > > > Some log examples from our production cluster. > {code} > 2016-12-10,17:08:11,478 WARN > org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into > result list, kv size:1253360, > kv:10567114001-1-c/R:r1/1481360887152/Put/vlen=1253245/ts=923099, from > table X > 2016-12-10,17:08:16,724 WARN > org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into > result list, kv size:1048680, > kv:0220459/I:i_0/1481360889551/Put/vlen=1048576/ts=13642, from table XX > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
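The per-cell half of the warn-log idea in HBASE-17288 is straightforward to sketch: flag any cell whose serialized size crosses a threshold as it is added to the result list. The 1 MB threshold and method name below are assumptions for illustration, not the committed implementation:

```java
// Sketch of the huge-cell warning from HBASE-17288. Threshold and names are
// illustrative; the sample logs above show ~1 MB values triggering the warn.
public class HugeCellCheckSketch {
    static final long HUGE_CELL_SIZE_BYTES = 1024 * 1024; // assumed 1 MB threshold

    // Returns true (and would log at WARN in real code) for a "huge" cell.
    public static boolean isHugeCell(long cellSizeBytes) {
        if (cellSizeBytes > HUGE_CELL_SIZE_BYTES) {
            System.err.println("WARN adding a HUGE cell into result list, size=" + cellSizeBytes);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isHugeCell(1_253_360)); // prints "true" (matches the first sample log)
        System.out.println(isHugeCell(512));       // prints "false"
    }
}
```

As the comment notes, a huge-row check is harder: with a batch limit the scanner only sees a slice of the row, so the accumulated size is not the true row size.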
[jira] [Updated] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-11392: --- Attachment: HBASE-11392-v6.patch > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, > HBASE-11392-v6.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764040#comment-15764040 ] Guanghao Zhang commented on HBASE-11392: Attached a v5 patch addressing the review comments. > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
[ https://issues.apache.org/jira/browse/HBASE-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17348: --- Status: Patch Available (was: Open) > Remove the unused hbase.replication from javadoc/comment completely > --- > > Key: HBASE-17348 > URL: https://issues.apache.org/jira/browse/HBASE-17348 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Trivial > Attachments: HBASE-17348.patch > > > Configuration hbase.replication has been removed by HBASE-16040. But there > are still some hbase.replication left in javadoc of ReplicationAdmin, > Admin.proto and shell.rb. Let's remove it completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-11392: --- Attachment: HBASE-11392-v5.patch > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. We should move these operations to be routed through master for two > reasons: > - Replication implementation details are exposed to client. We should move > most of the replication related classes to hbase-server package. > - Routing the requests through master is the standard practice for all other > operations. It allows for decoupling implementation details from the client > and code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
[ https://issues.apache.org/jira/browse/HBASE-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17348: --- Attachment: HBASE-17348-v1.patch Updated the generated Java files too. > Remove the unused hbase.replication from javadoc/comment completely > --- > > Key: HBASE-17348 > URL: https://issues.apache.org/jira/browse/HBASE-17348 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Trivial > Attachments: HBASE-17348-v1.patch, HBASE-17348.patch > > > Configuration hbase.replication has been removed by HBASE-16040. But there > are still some hbase.replication left in javadoc of ReplicationAdmin, > Admin.proto and shell.rb. Let's remove it completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
Guanghao Zhang created HBASE-17348: -- Summary: Remove the unused hbase.replication from javadoc/comment completely Key: HBASE-17348 URL: https://issues.apache.org/jira/browse/HBASE-17348 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Trivial Configuration hbase.replication has been removed by HBASE-16040. But there are still some hbase.replication left in javadoc of ReplicationAdmin, Admin.proto and shell.rb. Let's remove it completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
[ https://issues.apache.org/jira/browse/HBASE-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-17348: --- Attachment: HBASE-17348.patch > Remove the unused hbase.replication from javadoc/comment completely > --- > > Key: HBASE-17348 > URL: https://issues.apache.org/jira/browse/HBASE-17348 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Trivial > Attachments: HBASE-17348.patch > > > Configuration hbase.replication has been removed by HBASE-16040. But there > are still some hbase.replication left in javadoc of ReplicationAdmin, > Admin.proto and shell.rb. Let's remove it completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)