[jira] [Created] (HBASE-27414) Search order for locations in HFileLink

2022-10-05 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27414:


 Summary: Search order for locations in  HFileLink
 Key: HBASE-27414
 URL: https://issues.apache.org/jira/browse/HBASE-27414
 Project: HBase
  Issue Type: Improvement
  Components: Performance
Reporter: Huaxiang Sun


The search order for locations follows the order in which the locations were 
added to the HFileLink object:

setLocations(originPath, tempPath, mobPath, archivePath);

archivePath is the last location to be searched. In most cases the hfile exists 
in archivePath, so we can move archivePath to the first parameter to avoid 
unnecessary NameNode (NN) queries.
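
To illustrate why the order matters, here is a minimal sketch of an ordered location search. It is a toy model, not the HFileLink implementation: the class LocationSearchSketch, the method findFirstExisting, the lookup counter, and the example paths are all hypothetical, and each probe stands in for one NameNode query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of an ordered location search; hypothetical, not the real HFileLink code. */
final class LocationSearchSketch {
    /** Number of existence checks issued so far (stands in for NameNode round trips). */
    static int lookups = 0;

    /** Returns the first candidate that exists, probing candidates in the given order. */
    static String findFirstExisting(List<String> candidates, Set<String> existing) {
        for (String path : candidates) {
            lookups++; // each probe models one NameNode query
            if (existing.contains(path)) {
                return path;
            }
        }
        return null; // the file was not found at any location
    }

    public static void main(String[] args) {
        // Assumption for the example: the hfile lives only in the archive location.
        Set<String> existing = Collections.singleton("/archive/f1");

        lookups = 0;
        findFirstExisting(Arrays.asList("/data/f1", "/tmp/f1", "/mob/f1", "/archive/f1"), existing);
        System.out.println("archive searched last:  " + lookups + " lookups");

        lookups = 0;
        findFirstExisting(Arrays.asList("/archive/f1", "/data/f1", "/tmp/f1", "/mob/f1"), existing);
        System.out.println("archive searched first: " + lookups + " lookups");
    }
}
```

In this model, a file that exists only in the archive costs four probes when the archive is searched last and one probe when it is searched first.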



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27366) split or merge removed region under snapshot

2022-09-12 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27366:


 Summary: split or merge removed region under snapshot
 Key: HBASE-27366
 URL: https://issues.apache.org/jira/browse/HBASE-27366
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.4.10
Reporter: Huaxiang Sun


We ran into snapshot failures for one table with a large number of regions. The 
event sequence is as follows:

 
 # The snapshot process lists all regions for the table.
 # The normalizer kicks in and splits some regions of the table under snapshot.
 # The split finishes and major compaction finishes. The parent region is moved to 
the archive.
 # When the snapshot process reaches the parent region, the region no longer 
exists and the snapshot fails.

The snapshot process acquires the table lock, but the split and merge processes 
do not acquire it, so they collide with each other.
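
The missing mutual exclusion can be pictured with a small model. This is only a sketch of one possible direction, not HBase's locking code and not a fix proposed in this issue: TableLockSketch, beginSnapshot, endSnapshot and trySplit are hypothetical names, and a single-permit Semaphore stands in for the table lock.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model: a split that honors the same table lock the snapshot holds. */
final class TableLockSketch {
    /** One permit models an exclusive table lock. */
    private final Semaphore tableLock = new Semaphore(1);

    /** The snapshot takes the table lock; returns false if it is already held. */
    boolean beginSnapshot() {
        return tableLock.tryAcquire();
    }

    /** The snapshot releases the table lock when it finishes. */
    void endSnapshot() {
        tableLock.release();
    }

    /** Runs the split only if the table lock is free; returns false if it was deferred. */
    boolean trySplit(Runnable split) {
        if (!tableLock.tryAcquire()) {
            return false; // a snapshot is in progress, so leave the region alone for now
        }
        try {
            split.run();
            return true;
        } finally {
            tableLock.release();
        }
    }
}
```

With this model, a split requested between beginSnapshot() and endSnapshot() is deferred instead of archiving a parent region that the snapshot still expects to find.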





[jira] [Resolved] (HBASE-27345) Add 2.4.14 to the downloads page

2022-08-29 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-27345.
--
Fix Version/s: 3.0.0-alpha-4
 Assignee: Huaxiang Sun
   Resolution: Fixed

> Add 2.4.14 to the downloads page
> 
>
> Key: HBASE-27345
> URL: https://issues.apache.org/jira/browse/HBASE-27345
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 2.4.14
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 3.0.0-alpha-4
>
>






[jira] [Created] (HBASE-27345) Add 2.4.14 to the downloads page

2022-08-29 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27345:


 Summary: Add 2.4.14 to the downloads page
 Key: HBASE-27345
 URL: https://issues.apache.org/jira/browse/HBASE-27345
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 2.4.14
Reporter: Huaxiang Sun








[jira] [Resolved] (HBASE-27296) Some Cell's implementation of toString() such as IndividualBytesFieldCell prints out value and tags which is too verbose

2022-08-15 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-27296.
--
Fix Version/s: 2.5.0
   3.0.0-alpha-4
   2.4.14
   Resolution: Fixed

> Some Cell's implementation of toString() such as IndividualBytesFieldCell 
> prints out value and tags which is too verbose
> 
>
> Key: HBASE-27296
> URL: https://issues.apache.org/jira/browse/HBASE-27296
> Project: HBase
>  Issue Type: Improvement
>  Components: logging
>Affects Versions: 2.4.12
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> One of our users sees cells larger than 10 MB logged in their client log when 
> the size limit is exceeded. Checking the code, toString() behavior is not 
> consistent across Cell implementations; most do not include values and tags. 
> Change toString() to exclude tags and value.





[jira] [Created] (HBASE-27296) IndividualBytesFieldCell#toString() prints out value and tags which is too verbose.

2022-08-11 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27296:


 Summary: IndividualBytesFieldCell#toString() prints out value and 
tags which is too verbose.
 Key: HBASE-27296
 URL: https://issues.apache.org/jira/browse/HBASE-27296
 Project: HBase
  Issue Type: Improvement
  Components: logging
Affects Versions: 2.4.12
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


One of our users sees cells larger than 10 MB logged when the size limit is 
exceeded. 

Checking the code, toString() behavior is not consistent across Cell 
implementations; most do not include values and tags. Change toString() to 
exclude tags and value.
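
As an illustration of the intended behavior, the sketch below prints only the lengths of the value and tags. CompactCellSketch and its fields are hypothetical and much simpler than HBase's Cell implementations; the vlen/tagslen labels are an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical cell-like class; field names are illustrative, not HBase's Cell API. */
final class CompactCellSketch {
    private final byte[] row;
    private final long timestamp;
    private final byte[] value;
    private final byte[] tags;

    CompactCellSketch(byte[] row, long timestamp, byte[] value, byte[] tags) {
        this.row = row;
        this.timestamp = timestamp;
        this.value = value;
        this.tags = tags;
    }

    /** Prints the row, the timestamp, and only the lengths of value and tags. */
    @Override
    public String toString() {
        return new String(row, StandardCharsets.UTF_8)
            + "/" + timestamp
            + "/vlen=" + value.length
            + "/tagslen=" + tags.length;
    }
}
```

With this shape, logging a cell that carries a 10 MB value produces a short line such as `r1/42/vlen=10485760/tagslen=0` rather than the payload itself.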





[jira] [Created] (HBASE-27250) MasterRpcService#setRegionStateInMeta does not support replica region encodedNames or region names

2022-07-27 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27250:


 Summary: MasterRpcService#setRegionStateInMeta does not support 
replica region encodedNames or region names
 Key: HBASE-27250
 URL: https://issues.apache.org/jira/browse/HBASE-27250
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.13
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


MasterRpcServices#setRegionStateInMeta does not support replica region names; 
it assumes the primary region only. This makes HBCK2's setRegionState fail for 
replica regions. 





[jira] [Created] (HBASE-27181) Replica region support in HBCK2 setRegionState option

2022-07-05 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27181:


 Summary: Replica region support in HBCK2 setRegionState option
 Key: HBASE-27181
 URL: https://issues.apache.org/jira/browse/HBASE-27181
 Project: HBase
  Issue Type: Improvement
  Components: hbck2
Affects Versions: 2.4.13
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


A replica region id is not recognized by hbck2's setRegionState because it does 
not show up in meta. We ran into cases where the region state in meta had to be 
set for replica regions in order to fix an inconsistency. We ended up writing the 
state manually into the meta table and doing a master failover to sync the state 
from the meta table. 

 

hbck2's setRegionState needs to support replica region ids and handle them gracefully.





[jira] [Resolved] (HBASE-27025) Change Hbase book's description for "74.7.3. Load Balancing META table load"

2022-06-24 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-27025.
--
Fix Version/s: 3.0.0-alpha-4
   Resolution: Fixed

Merged into the master branch.

> Change Hbase book's description for "74.7.3. Load Balancing META table load"
> 
>
> Key: HBASE-27025
> URL: https://issues.apache.org/jira/browse/HBASE-27025
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.4.12
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 3.0.0-alpha-4
>
>
> HBASE-26618 involves the primary meta region in meta scans, so the description 
> in the HBase book is inaccurate. Update it accordingly.





[jira] [Resolved] (HBASE-26649) Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()

2022-06-03 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26649.
--
Fix Version/s: 2.5.0
   3.0.0-alpha-3
   2.4.13
 Release Note: When 'hbase.locator.meta.replicas.mode' is set to 
"LoadBalance" on the HBase client, RegionLocator#getAllRegionLocations() now load 
balances across all meta replica regions. Note that results from 
non-primary meta replica regions may contain stale data. 
   Resolution: Fixed

> Support meta replica LoadBalance mode for 
> RegionLocator#getAllRegionLocations()
> ---
>
> Key: HBASE-26649
> URL: https://issues.apache.org/jira/browse/HBASE-26649
> Project: HBase
>  Issue Type: Improvement
>  Components: meta replicas
>Affects Versions: 2.4.9
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.13
>
>
> When an HBase application restarts, its meta cache is empty. Normally, it 
> fills the meta cache one region at a time by scanning the meta region. This 
> puts heavy pressure on the region server hosting meta during application 
> restart. 
> The application can prefetch all region locations by calling 
> RegionLocator#getAllRegionLocations(). Meta replica LoadBalance mode is 
> supported in 2.4, so it would be nice to load balance 
> RegionLocator#getAllRegionLocations() across all meta replica regions so that 
> batch scans are spread across all of them.





[jira] [Created] (HBASE-27087) TestQuotaThrottle times out in branch-2.5.

2022-06-03 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27087:


 Summary: TestQuotaThrottle times out in branch-2.5.
 Key: HBASE-27087
 URL: https://issues.apache.org/jira/browse/HBASE-27087
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.0
Reporter: Huaxiang Sun


With branch-2.5, TestQuotaThrottle times out. Need to investigate.

 
h3. Error Message

Failed after attempts=7, exceptions:
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master
2022-06-03T11:26:33.418Z, RpcRetryingCaller{globalStartTime=2022-06-03T11:26:33.418Z, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master





[jira] [Reopened] (HBASE-26962) Add mob info in web UI

2022-06-02 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun reopened HBASE-26962:
--

The commit caused branch-2 build failure. Can you fix the build error and 
resubmit a patch? Thanks.

> Add mob info in web UI
> --
>
> Key: HBASE-26962
> URL: https://issues.apache.org/jira/browse/HBASE-26962
> Project: HBase
>  Issue Type: Improvement
>  Components: UI
>Reporter: Xuesen Liang
>Assignee: Xuesen Liang
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-3
>
>
> Add mob store info in web UI.





[jira] [Created] (HBASE-27025) Change Hbase book's description for "74.7.3. Load Balancing META table load"

2022-05-11 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-27025:


 Summary: Change Hbase book's description for "74.7.3. Load 
Balancing META table load"
 Key: HBASE-27025
 URL: https://issues.apache.org/jira/browse/HBASE-27025
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.12
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


HBASE-26618 involves the primary meta region in meta scans, so the description 
in the HBase book is inaccurate. Update it accordingly.





[jira] [Resolved] (HBASE-26984) Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction

2022-05-05 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26984.
--
Fix Version/s: 2.5.0
   3.0.0-alpha-3
   Resolution: Fixed

> Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction 
> ---
>
> Key: HBASE-26984
> URL: https://issues.apache.org/jira/browse/HBASE-26984
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Affects Versions: 2.4.11
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-3
>
>
> When running the ITBLL chaos monkey in a k8s cluster, we found that the chaos 
> monkey thread died in GracefulRollingRestartRsAction. 





[jira] [Created] (HBASE-26984) Chaos Monkey thread dies in ITBLL Chaos GracefulRollingRestartRsAction

2022-04-27 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26984:


 Summary: Chaos Monkey thread dies in ITBLL Chaos 
GracefulRollingRestartRsAction 
 Key: HBASE-26984
 URL: https://issues.apache.org/jira/browse/HBASE-26984
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 2.4.11
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


When running the ITBLL chaos monkey in a k8s cluster, we found that the chaos 
monkey thread died in GracefulRollingRestartRsAction. 





[jira] [Resolved] (HBASE-26618) Involving primary meta region in meta scan with CatalogReplicaLoadBalanceSimpleSelector

2022-04-11 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26618.
--
Fix Version/s: 2.5.0
   3.0.0-alpha-3
   2.4.12
 Release Note: When META replica LoadBalance mode is enabled on the 
client side, clients will try to read from one META region first. If that META 
location is a non-primary META region and an error occurs, the client falls 
back to the primary META region.
   Resolution: Fixed

> Involving primary meta region in meta scan with 
> CatalogReplicaLoadBalanceSimpleSelector
> ---
>
> Key: HBASE-26618
> URL: https://issues.apache.org/jira/browse/HBASE-26618
> Project: HBase
>  Issue Type: Improvement
>  Components: meta replicas
>Affects Versions: 2.4.9
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.12
>
>
> In the current release with Meta replica LoadBalance mode, the primary meta 
> region does not serve meta scans (only meta replica regions serve reads). 
> When the result from a meta replica region is stale, the client goes to the 
> primary meta region for the up-to-date location. 
> From our experience, the primary meta region serves very little read traffic, 
> so it would be better to load balance read traffic across the primary meta 
> region as well.





[jira] [Created] (HBASE-26864) Region Server does not send Ack back to master after receiving an OpenRegionReq for open regions, causing OpenRegionProcedure stuck forever.

2022-03-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26864:


 Summary: Region Server does not send Ack back to master after 
receiving an OpenRegionReq for open regions, causing OpenRegionProcedure stuck 
forever.
 Key: HBASE-26864
 URL: https://issues.apache.org/jira/browse/HBASE-26864
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 2.4.10
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


In some upgrade cases, we found that the master issues a RegionOpen for an already 
open region and the Region Server simply logs 
{code:java}
2022-03-17 22:16:55,595 WARN org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler: Received OPEN for foo,b2875fcb-7bc0-4fa9-a980-e902faf7f151,1631771037620.def199cc7208615b783b285f582ddfa4. which is already online
{code}
and does not ack or nack the master. The OpenRegionProcedure is then stuck forever.

In this specific case, the Region Server needs to ack to the master that the 
region is open. 

 

The reason why the master sent an OpenRegion request for an already open region 
will be followed up in another issue.
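
The requested behavior can be sketched as follows. This is a hypothetical model, not AssignRegionHandler: OpenRegionAckSketch, handleOpen, and the reportsToMaster list are invented for illustration, and a string appended to the list stands in for the report sent back to the master.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a region server that always answers an OPEN request. */
final class OpenRegionAckSketch {
    /** Encoded names of the regions this server currently has online. */
    final Set<String> onlineRegions = new HashSet<>();
    /** Reports sent back to the master (stand-in for the real RPC). */
    final List<String> reportsToMaster = new ArrayList<>();

    void handleOpen(String encodedName) {
        if (onlineRegions.contains(encodedName)) {
            // Already online: still report OPENED so the waiting procedure can finish.
            reportsToMaster.add("OPENED:" + encodedName);
            return;
        }
        onlineRegions.add(encodedName); // stands in for the actual open work
        reportsToMaster.add("OPENED:" + encodedName);
    }
}
```

In this model a second OPEN for the same region produces a second report instead of only a log line, so the caller is never left waiting.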





[jira] [Created] (HBASE-26649) Support meta replica LoadBalance mode for RegionLocator#getAllRegionLocations()

2022-01-06 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26649:


 Summary: Support meta replica LoadBalance mode for 
RegionLocator#getAllRegionLocations()
 Key: HBASE-26649
 URL: https://issues.apache.org/jira/browse/HBASE-26649
 Project: HBase
  Issue Type: Improvement
  Components: meta replicas
Affects Versions: 2.4.9
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


When an HBase application restarts, its meta cache is empty. Normally, it 
fills the meta cache one region at a time by scanning the meta region. This 
puts heavy pressure on the region server hosting meta during application 
restart. 

The application can prefetch all region locations by calling 
RegionLocator#getAllRegionLocations(). Meta replica LoadBalance mode is supported 
in 2.4, so it would be nice to load balance RegionLocator#getAllRegionLocations() 
across all meta replica regions so that batch scans are spread across all of 
them.





[jira] [Resolved] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2022-01-06 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26590.
--
Fix Version/s: 2.5.0
   2.4.10
   Resolution: Fixed

Resolved it for now, will reopen if there is new finding.

> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---
>
> Key: HBASE-26590
> URL: https://issues.apache.org/jira/browse/HBASE-26590
> Project: HBase
>  Issue Type: Improvement
>  Components: meta
>Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.5.0, 2.4.10
>
>
> One of our users complained about higher latency during app restart after the 
> application was upgraded from the hbase-1.2 client (CDH-5.16.2) to the 
> hbase-2.4.5 client with meta replica Load Balance mode. I reproduced the 
> regression with a test for meta lookup. 
> On my test cluster, there are 160k regions for the test table, so there are 
> 160k entries in the meta region. I used one thread to do 1 million meta lookups 
> against the meta region server.
>  
> ||Version||Meta Replica Load Balance Enabled||Time||
> |2.4.5-with-fixed|Yes|336458ms|
> |2.4.5-with-fixed|No|333253ms|
> |2.4.5|Yes|469980ms|
> |2.4.5|No|470515ms|
> |*cdh-5.16.2*|*No*|*323412ms*|
>  





[jira] [Created] (HBASE-26618) Involving primary meta region in meta scan with Meta Replica Mode

2021-12-22 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26618:


 Summary: Involving primary meta region in meta scan with Meta 
Replica Mode
 Key: HBASE-26618
 URL: https://issues.apache.org/jira/browse/HBASE-26618
 Project: HBase
  Issue Type: Improvement
  Components: meta replicas
Affects Versions: 2.4.9
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


In the current release with Meta replica LoadBalance mode, the primary meta 
region does not serve meta scans (only meta replica regions serve reads). 
When the result from a meta replica region is stale, the client goes to the 
primary meta region for the up-to-date location. 

From our experience, the primary meta region serves very little read traffic, so 
it would be better to load balance read traffic across the primary meta region 
as well.
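
A simplified model of the proposal: pick the replica for a meta read from all replica ids, including the primary, and go back to the primary when a non-primary result turns out to be stale. This is a sketch under stated assumptions, not the actual selector implementation: MetaReplicaSelectorSketch, selectReplica and replicaForRetry are hypothetical names, replica id 0 is assumed to be the primary, and uniform random choice is an assumption made for the example.

```java
import java.util.Random;

/** Hypothetical replica chooser that lets the primary share meta read traffic. */
final class MetaReplicaSelectorSketch {
    /** Assumption for this sketch: replica id 0 is the primary meta region. */
    static final int PRIMARY_REPLICA_ID = 0;

    /** Picks any replica id in [0, numReplicas), so the primary is a candidate too. */
    static int selectReplica(int numReplicas, Random rng) {
        return rng.nextInt(numReplicas);
    }

    /** After a stale result or an error from the previous replica, retry on the primary. */
    static int replicaForRetry(int previousReplicaId) {
        return PRIMARY_REPLICA_ID;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int first = selectReplica(3, rng);
        System.out.println("first attempt on replica " + first
            + ", retry on replica " + replicaForRetry(first));
    }
}
```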





[jira] [Created] (HBASE-26590) Hbase-client Meta lookup performance regression between hbase-1 and hbase-2

2021-12-16 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26590:


 Summary: Hbase-client Meta lookup performance regression between 
hbase-1 and hbase-2
 Key: HBASE-26590
 URL: https://issues.apache.org/jira/browse/HBASE-26590
 Project: HBase
  Issue Type: Improvement
  Components: meta
Affects Versions: 2.3.7, 3.0.0-alpha-1
 Environment: 
||Version||Meta Replica Load Balance Enabled||Time||
|2.4.5-with-fixed|Yes|336458ms|
|2.4.5-with-fixed|No|333253ms|
|2.4.5|Yes|469980ms|
|2.4.5|No|470515ms|
|*cdh-5.16.2*|*No*|*323412ms*|
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


One of our users complained about higher latency during app restart after the 
application was upgraded from the hbase-1.2 client (CDH-5.16.2) to the 
hbase-2.4.5 client with meta replica Load Balance mode. I reproduced the 
regression with a test for meta lookup. 

On my test cluster, there are 160k regions for the test table, so there are 
160k entries in the meta region. I used one thread to do 1 million meta lookups 
against the meta region server.

 





[jira] [Resolved] (HBASE-26338) hbck2 setRegionState cannot set replica region state

2021-10-11 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26338.
--
Fix Version/s: hbase-operator-tools-1.2.0
 Release Note: 
To set the replica region's state, it needs the primary region's 
encoded regionname and replica id, the command will be "setRegionState 
, ".
   Resolution: Fixed

> hbck2 setRegionState cannot set replica region state
> 
>
> Key: HBASE-26338
> URL: https://issues.apache.org/jira/browse/HBASE-26338
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2
>Affects Versions: hbase-operator-tools-1.1.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: hbase-operator-tools-1.2.0
>
>
> Currently, there is no way to use hbck2 setRegionState to set a replica 
> region's state, which makes it hard to fix inconsistencies related to replica 
> regions.





[jira] [Created] (HBASE-26338) hbck2 setRegionState cannot set replica region state

2021-10-07 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26338:


 Summary: hbck2 setRegionState cannot set replica region state
 Key: HBASE-26338
 URL: https://issues.apache.org/jira/browse/HBASE-26338
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Affects Versions: hbase-operator-tools-1.1.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Currently, there is no way to use hbck2 setRegionState to set a replica 
region's state, which makes it hard to fix inconsistencies related to replica 
regions.





[jira] [Resolved] (HBASE-26255) Add an option to use region location from meta table in TableSnapshotInputFormat

2021-09-09 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26255.
--
Fix Version/s: 2.4.7
   2.3.7
   3.0.0-alpha-2
   Resolution: Fixed

> Add an option to use region location from meta table in 
> TableSnapshotInputFormat
> 
>
> Key: HBASE-26255
> URL: https://issues.apache.org/jira/browse/HBASE-26255
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.3.6
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-2, 2.3.7, 2.4.7
>
>
> TableSnapshotInputFormat currently calculates block locality of a region to 
> decide the best location to run the task. While this works for a small scale 
> table snapshot, we found that for a table snapshot with many regions, the 
> locality calculation takes too much time. 
> In the case of a table with high locality, we can use region location from 
> meta table to decide a snapshot region's location. Add an option to support 
> it.





[jira] [Created] (HBASE-26272) TestTableMapReduceUtil failure in branch-2

2021-09-09 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26272:


 Summary: TestTableMapReduceUtil failure in branch-2
 Key: HBASE-26272
 URL: https://issues.apache.org/jira/browse/HBASE-26272
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Huaxiang Sun


 
{code:java}
[ERROR] org.apache.hadoop.hbase.mapreduce.TestTableMapReduceUtil.testInitCredentialsForCluster3  Time elapsed: 8.122 s  <<< ERROR!
org.apache.hadoop.security.KerberosAuthException: Login failure for user: hsun/localh...@example.com from keytab /Users/hsun/work/hbase-hs/hbase-1/hbase-mapreduce/target/test-data/b12f4926-d8ec-1129-0101-1ba76e65f3c2/keytab javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1104)
 at org.apache.hadoop.hbase.mapreduce.TestTableMapReduceUtil.testInitCredentialsForCluster3(TestTableMapReduceUtil.java:233)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
 at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.lang.Thread.run(Thread.java:748)
Caused by: javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com
 at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:224)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:588)
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1095)
 ... 25 more
Caused by: java.lang.IllegalArgumentException: Illegal principal name hsun/localh...@example.com: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com
 at org.apache.hadoop.security.User.<init>(User.java:50)
 at org.apache.hadoop.security.User.<init>(User.java:43)
 at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:222)
 ... 37 more
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hsun/localh...@example.com
 at 

[jira] [Created] (HBASE-26255) Add an option to use region location from meta table in TableSnapshotInputFormat

2021-09-03 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26255:


 Summary: Add an option to use region location from meta table in 
TableSnapshotInputFormat
 Key: HBASE-26255
 URL: https://issues.apache.org/jira/browse/HBASE-26255
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.3.6
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


TableSnapshotInputFormat currently calculates block locality of a region to 
decide the best location to run the task. While this works for a small scale 
table snapshot, we found that for a table snapshot with many regions, the 
locality calculation takes too much time. 

In the case of a table with high locality, we can use region location from meta 
table to decide a snapshot region's location. Add an option to support it.
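
The proposed option can be pictured as a simple switch between the two ways of picking a task location. This sketch is hypothetical and does not use TableSnapshotInputFormat's code or configuration keys: SnapshotSplitLocationSketch, chooseLocations, and the useRegionLocationFromMeta flag are invented names, and the Supplier stands in for the expensive block locality calculation.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of choosing a split location for a snapshot region. */
final class SnapshotSplitLocationSketch {
    /**
     * Returns the preferred hosts for a split. When the option is enabled and meta
     * knows a host, the locality calculation is skipped entirely.
     */
    static List<String> chooseLocations(boolean useRegionLocationFromMeta,
                                        String hostFromMeta,
                                        Supplier<List<String>> localityCalculation) {
        if (useRegionLocationFromMeta && hostFromMeta != null) {
            return Collections.singletonList(hostFromMeta); // cheap: one lookup result
        }
        return localityCalculation.get(); // expensive: computed per region
    }
}
```

The design choice in the sketch is to fall back to the locality calculation when meta has no host for the region, so enabling the option never leaves a split without a location.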





[jira] [Resolved] (HBASE-26108) add option to disable scanMetrics in TableSnapshotInputFormat

2021-07-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-26108.
--
Fix Version/s: 2.4.5
   3.0.0-alpha-2
   2.3.6
   Resolution: Fixed

> add option to disable scanMetrics in TableSnapshotInputFormat
> -
>
> Key: HBASE-26108
> URL: https://issues.apache.org/jira/browse/HBASE-26108
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.3.5
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.3.6, 3.0.0-alpha-2, 2.4.5
>
>
> When running a Spark job with TableSnapshotInputFormat, we found that scans are 
> very slow. scanMetrics is hardcoded as enabled, and Spark's 
> newAPIHadoopRDD uses the DummyReporter in Hadoop, which causes the following 
> exception; 80% of CPU time is spent handling it. 
> We need to provide an option to disable scanMetrics.
> java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Native Method)
> java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Throwable.java:787) => 
> holding Monitor(java.util.MissingResourceException@258206255})
> java.base@11.0.5/java.lang.Throwable.<init>(Throwable.java:292)
> java.base@11.0.5/java.lang.Exception.<init>(Exception.java:84)
> java.base@11.0.5/java.lang.RuntimeException.<init>(RuntimeException.java:80)
> java.base@11.0.5/java.util.MissingResourceException.<init>(MissingResourceException.java:85)
> java.base@11.0.5/java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:2055)
> java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1689)
> java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1593)
> java.base@11.0.5/java.util.ResourceBundle.getBundle(ResourceBundle.java:1284)
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
>  => holding Monitor(java.lang.Class@545605549})
> app//org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77)
> app//org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94)
> app//org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:227)
> app//org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
> app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110)
> app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:311)
> org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:167)





[jira] [Created] (HBASE-26108) add option to disable scanMetrics in TableSnapshotInputFormat

2021-07-21 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26108:


 Summary: add option to disable scanMetrics in 
TableSnapshotInputFormat
 Key: HBASE-26108
 URL: https://issues.apache.org/jira/browse/HBASE-26108
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.3.5
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


When running a Spark job with TableSnapshotInputFormat, we found that scans are 
very slow. scanMetrics is hardcoded as enabled, and Spark's 
newAPIHadoopRDD uses the DummyReporter in Hadoop, which causes the following 
exception; 80% of CPU time is spent handling it. 

We need to provide an option to disable scanMetrics.
java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Native Method)
java.base@11.0.5/java.lang.Throwable.fillInStackTrace(Throwable.java:787) => 
holding Monitor(java.util.MissingResourceException@258206255})
java.base@11.0.5/java.lang.Throwable.<init>(Throwable.java:292)
java.base@11.0.5/java.lang.Exception.<init>(Exception.java:84)
java.base@11.0.5/java.lang.RuntimeException.<init>(RuntimeException.java:80)
java.base@11.0.5/java.util.MissingResourceException.<init>(MissingResourceException.java:85)
java.base@11.0.5/java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:2055)
java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1689)
java.base@11.0.5/java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1593)
java.base@11.0.5/java.util.ResourceBundle.getBundle(ResourceBundle.java:1284)
app//org.apache.hadoop.mapreduce.util.ResourceBundles.getBundle(ResourceBundles.java:37)
app//org.apache.hadoop.mapreduce.util.ResourceBundles.getValue(ResourceBundles.java:56)
 => holding Monitor(java.lang.Class@545605549})
app//org.apache.hadoop.mapreduce.util.ResourceBundles.getCounterGroupName(ResourceBundles.java:77)
app//org.apache.hadoop.mapreduce.counters.CounterGroupFactory.newGroup(CounterGroupFactory.java:94)
app//org.apache.hadoop.mapreduce.counters.AbstractCounters.getGroup(AbstractCounters.java:227)
app//org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl$DummyReporter.getCounter(TaskAttemptContextImpl.java:110)
app//org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.updateCounters(TableRecordReaderImpl.java:311)
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.nextKeyValue(TableSnapshotInputFormat.java:167)
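
A rough sketch of the kind of guard the requested option implies: the counter lookup (the call that ends in the exception above) is skipped entirely when scan metrics are disabled. `MetricsAwareReader`, `Counter`, and the `scanMetricsEnabled` flag are hypothetical stand-ins, not the actual HBase or Hadoop classes.

```java
import java.util.function.Supplier;

// Hypothetical record reader (not the HBase class) showing the guard only.
final class MetricsAwareReader {
  /** Stand-in for a MapReduce counter; hypothetical. */
  interface Counter {
    void increment(long delta);
  }

  private final boolean scanMetricsEnabled;       // the proposed option (name assumed)
  private final Supplier<Counter> counterLookup;  // stands in for the per-row counter lookup
  private long rowsRead;

  MetricsAwareReader(boolean scanMetricsEnabled, Supplier<Counter> counterLookup) {
    this.scanMetricsEnabled = scanMetricsEnabled;
    this.counterLookup = counterLookup;
  }

  boolean nextKeyValue() {
    rowsRead++;
    if (scanMetricsEnabled) {
      // Only touch the counter machinery when metrics were requested.
      counterLookup.get().increment(1);
    }
    return true;
  }

  long rowsRead() {
    return rowsRead;
  }
}
```

With the flag off, the lookup is never invoked, so a reporter that throws internally on every lookup costs nothing.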





[jira] [Created] (HBASE-26092) JVM core dump in the replication path

2021-07-15 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-26092:


 Summary: JVM core dump in the replication path
 Key: HBASE-26092
 URL: https://issues.apache.org/jira/browse/HBASE-26092
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.3.5
Reporter: Huaxiang Sun


When replication is turned on, we found the following core dump in the region 
server. 

I checked the core dump and have a theory. For replication, when an RS receives 
WAL edits from a remote cluster, it needs to send them on to the final RS. 
NettyRpcConnection is used for this, and calls are queued while they still refer 
to a ByteBuffer owned by the replication handler's context, which is returned to 
the pool once the handler returns. The core dump happens because the ByteBuffer 
has been reused by then. This asynchronous processing needs reference counting.

 

Feel free to take it, otherwise, I will try to work on a patch later.

 

 
{code:java}
Stack: [0x7fb1bf039000,0x7fb1bf13a000],  sp=0x7fb1bf138560,  free 
space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 28175 C2 
org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 
bytes) @ 0x7fd2663c [0x7fd263c0+0x27c]
J 14912 C2 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
 (370 bytes) @ 0x7fdbbb94b590 [0x7fdbbb949c00+0x1990]
J 14911 C2 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
 (30 bytes) @ 0x7fdbb972d1d4 [0x7fdbb972d1a0+0x34]
J 30476 C2 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
 (149 bytes) @ 0x7fdbbd4e7084 [0x7fdbbd4e6900+0x784]
J 14914 C2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 bytes) 
@ 0x7fdbbb9344ec [0x7fdbbb934280+0x26c]
J 23528 C2 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z
 (106 bytes) @ 0x7fdbbcbb0efc [0x7fdbbcbb0c40+0x2bc]
J 15987% C2 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 
bytes) @ 0x7fdbbbaf1580 [0x7fdbbbaf1360+0x220]
j  
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
j  
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
j  
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
{code}
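
The reference counting suggested in the description could look roughly like the sketch below. It is a generic illustration, not HBase's or Netty's implementation; `RefCountedBuffer` and the `recycler` callback are hypothetical names. The buffer goes back to the pool only after both the handler and the queued asynchronous write have released it.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper (not an HBase or Netty class): a pooled buffer with a reference count.
final class RefCountedBuffer {
  private final ByteBuffer buffer;
  private final Runnable recycler;                               // returns the buffer to its pool
  private final AtomicInteger refCount = new AtomicInteger(1);   // the creator holds one reference

  RefCountedBuffer(ByteBuffer buffer, Runnable recycler) {
    this.buffer = buffer;
    this.recycler = recycler;
  }

  /** Called before handing the buffer to another owner, e.g. a queued call. */
  RefCountedBuffer retain() {
    for (;;) {
      int current = refCount.get();
      if (current <= 0) {
        throw new IllegalStateException("buffer already released");
      }
      if (refCount.compareAndSet(current, current + 1)) {
        return this;
      }
    }
  }

  /** Returns true if this call dropped the last reference and recycled the buffer. */
  boolean release() {
    int remaining = refCount.decrementAndGet();
    if (remaining < 0) {
      throw new IllegalStateException("released more often than retained");
    }
    if (remaining == 0) {
      recycler.run();
      return true;
    }
    return false;
  }

  ByteBuffer buffer() {
    if (refCount.get() <= 0) {
      throw new IllegalStateException("buffer already released");
    }
    return buffer;
  }
}
```

In this scheme the handler would call `retain()` before queuing the call and `release()` when it returns; the write path calls `release()` once the bytes have been written.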
 





[jira] [Created] (HBASE-25724) update download area for 2.3.5 as new stable build

2021-04-01 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25724:


 Summary: update download area for 2.3.5 as new stable build
 Key: HBASE-25724
 URL: https://issues.apache.org/jira/browse/HBASE-25724
 Project: HBase
  Issue Type: Sub-task
  Components: community
Reporter: Sean Busbey
Assignee: Sean Busbey



* update the stable symlink to point to 2.3.4
* Remove the 2.3.3 release from downloads.a.o





[jira] [Resolved] (HBASE-25721) Add 2.3.5 to the downloads page

2021-03-31 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25721.
--
Resolution: Fixed

> Add 2.3.5 to the downloads page
> ---
>
> Key: HBASE-25721
> URL: https://issues.apache.org/jira/browse/HBASE-25721
> Project: HBase
>  Issue Type: Task
>  Components: community
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Created] (HBASE-25722) Update reporter tool with new release, 2.3.5

2021-03-31 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25722:


 Summary: Update reporter tool with new release, 2.3.5
 Key: HBASE-25722
 URL: https://issues.apache.org/jira/browse/HBASE-25722
 Project: HBase
  Issue Type: Sub-task
Reporter: Huaxiang Sun
Assignee: Viraj Jasani


Reporter tool: [https://reporter.apache.org/addrelease.html?hbase]





[jira] [Created] (HBASE-25721) Add 2.3.5 to the downloads page

2021-03-31 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25721:


 Summary: Add 2.3.5 to the downloads page
 Key: HBASE-25721
 URL: https://issues.apache.org/jira/browse/HBASE-25721
 Project: HBase
  Issue Type: Task
  Components: community
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun








[jira] [Resolved] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs

2021-03-25 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25590.
--
Resolution: Fixed

Resolving it for 2.3.5 release. Please reopen when landing the 2.2 patch.

> Bulkload replication HFileRefs cannot be cleared in some cases where set 
> exclude-namespace/exclude-table-cfs
> 
>
> Key: HBASE-25590
> URL: https://issues.apache.org/jira/browse/HBASE-25590
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5
>
>
> In 
> [ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264],
>  we may add unwanted hfiles to the _HFileRefs_ if a peer has 
> _replicate_all_ set to true and also sets _exclude-namespace/exclude-table-cfs_.
> These unwanted _HFileRefs_ are neither replicated to the remote cluster nor 
> cleared.
> Two problems are caused by this bug:
>  # The metric sizeOfHFileRefsQueue cannot be zeroed.
>  # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._
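
A simplified sketch of the check the description calls for: before recording HFile references for a peer with replicate_all set to true, skip tables the peer excludes. `HFileRefFilter` and its parameters are hypothetical, and the sketch models exclusions at table granularity only (the real exclude-table-cfs setting also covers column families).

```java
import java.util.Set;

// Hypothetical filter (not the HBase API); models only the replicate_all = true case.
final class HFileRefFilter {
  /**
   * Returns true if HFile references for the given table should be recorded for a peer
   * that replicates everything except the listed namespaces and tables.
   */
  static boolean shouldTrackWhenReplicateAll(Set<String> excludedNamespaces,
      Set<String> excludedTables, String namespace, String table) {
    if (excludedNamespaces.contains(namespace)) {
      return false; // whole namespace excluded
    }
    // Excluded tables are keyed as "namespace:table" in this sketch (an assumption).
    return !excludedTables.contains(namespace + ":" + table);
  }
}
```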





[jira] [Resolved] (HBASE-25691) Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile

2021-03-25 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25691.
--
Resolution: Fixed

> Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile
> 
>
> Key: HBASE-25691
> URL: https://issues.apache.org/jira/browse/HBASE-25691
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 2.3.4
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5, 2.4.3
>
>
> Saw this test failure from 2.3 nightly.
> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/
> h1. Regression
> health checks / yetus jdk8 hadoop2 checks / 
> org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile[1:
>  blockSize=16,384, bucketSizes=[I@371a67ec]
> Failing for the past 1 build (Since 
> [!https://ci-hadoop.apache.org/static/e247241e/images/16x16/red.png! 
> #190|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/]
>  )
> [Took 0.32 
> sec.|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/history]
>  
> h3. Stacktrace
> java.lang.AssertionError at 
> org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:136)





[jira] [Created] (HBASE-25691) Test failure: TestVerifyBucketCacheFile.testRetrieveFromFile

2021-03-23 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25691:


 Summary: Test failure: 
TestVerifyBucketCacheFile.testRetrieveFromFile
 Key: HBASE-25691
 URL: https://issues.apache.org/jira/browse/HBASE-25691
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 2.3.4
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Saw this test failure from 2.3 nightly.

https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/
h1. Regression

health checks / yetus jdk8 hadoop2 checks / 
org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile[1:
 blockSize=16,384, bucketSizes=[I@371a67ec]
Failing for the past 1 build (Since 
[!https://ci-hadoop.apache.org/static/e247241e/images/16x16/red.png! 
#190|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/]
 )
[Took 0.32 
sec.|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/190/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestVerifyBucketCacheFile/health_checks___yetus_jdk8_hadoop2_checks___testRetrieveFromFile_1__blockSize_16_384__bucketSizes__I_371a67ec_/history]
 
h3. Stacktrace

java.lang.AssertionError at 
org.apache.hadoop.hbase.io.hfile.bucket.TestVerifyBucketCacheFile.testRetrieveFromFile(TestVerifyBucketCacheFile.java:136)





[jira] [Resolved] (HBASE-25639) meta replica state is not respected during active master switch

2021-03-18 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25639.
--
Fix Version/s: 2.3.5
   Resolution: Fixed

> meta replica state is not respected during active master switch
> ---
>
> Key: HBASE-25639
> URL: https://issues.apache.org/jira/browse/HBASE-25639
> Project: HBase
>  Issue Type: Bug
>  Components: meta replicas
>Affects Versions: 2.0.6, 2.1.9, 2.2.6, 2.3.4
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 2.3.5
>
>
> We saw this warning in master log.
> WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
> RegionStateNode for hbase:meta,,1_0003 but reported as up on 
> server1.example.com,16020,1614958467735; closing...
>  
> The root cause is that meta replica region states are kept in ZooKeeper, and the 
> new active master does not iterate over them, so it loses track of the replicas.





[jira] [Created] (HBASE-25640) Support hbase rpc compression for remote rpc only

2021-03-05 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25640:


 Summary: Support hbase rpc compression for remote rpc only
 Key: HBASE-25640
 URL: https://issues.apache.org/jira/browse/HBASE-25640
 Project: HBase
  Issue Type: Improvement
  Components: rpc
Affects Versions: 2.3.4
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


The purpose of RPC compression is to save network bandwidth. For local 
communication (HBase client and RS on the same node), RPC compression 
is unnecessary because local communication is a memory copy only and does not go 
through the NIC. Compressing local RPCs wastes CPU, since compression and 
decompression are CPU intensive. 
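
One way the "remote only" decision could be made is sketched below, using only standard `java.net` calls. `RpcCompressionPolicy` and its method names are hypothetical; how HBase would actually wire such a check into its RPC client is not covered here.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical policy class (not the HBase API): compress only when the peer is on another host.
final class RpcCompressionPolicy {
  /** True if the address is loopback, wildcard, or bound to one of this host's interfaces. */
  static boolean isLocal(InetAddress address) {
    if (address.isLoopbackAddress() || address.isAnyLocalAddress()) {
      return true;
    }
    try {
      return NetworkInterface.getByInetAddress(address) != null;
    } catch (SocketException e) {
      return false; // cannot tell; treat as remote so compression still applies
    }
  }

  static boolean shouldCompress(boolean compressionConfigured, InetAddress remote) {
    return compressionConfigured && !isLocal(remote);
  }
}
```

The interface lookup is not free, so a real client would probably evaluate this once per connection rather than once per call.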





[jira] [Created] (HBASE-25639) meta replica state is not respected during active master switch

2021-03-05 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25639:


 Summary: meta replica state is not respected during active master 
switch
 Key: HBASE-25639
 URL: https://issues.apache.org/jira/browse/HBASE-25639
 Project: HBase
  Issue Type: Bug
  Components: meta replicas
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


We saw this warning in master log.

WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No 
RegionStateNode for hbase:meta,,1_0003 but reported as up on 
server1.example.com,16020,1614958467735; closing...

 

The root cause is that meta replica region states are kept in ZooKeeper, and the 
new active master does not iterate over them, so it loses track of the replicas.





[jira] [Created] (HBASE-25537) Misleading Range metrics

2021-01-28 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25537:


 Summary: Misleading Range metrics 
 Key: HBASE-25537
 URL: https://issues.apache.org/jira/browse/HBASE-25537
 Project: HBase
  Issue Type: Bug
  Components: metrics
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun
 Fix For: 2.3.4
 Attachments: Screen Shot 2021-01-27 at 1.09.32 PM.png

Found some cases where the max value is reported in a smaller range, which is 
confusing. 





[jira] [Resolved] (HBASE-25417) Send announce email

2021-01-24 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25417.
--
Resolution: Fixed

> Send announce email
> ---
>
> Key: HBASE-25417
> URL: https://issues.apache.org/jira/browse/HBASE-25417
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25409) Release 2.3.4

2021-01-24 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25409.
--
Resolution: Fixed

> Release 2.3.4
> -
>
> Key: HBASE-25409
> URL: https://issues.apache.org/jira/browse/HBASE-25409
> Project: HBase
>  Issue Type: Task
>  Components: community
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25416) Add 2.3.4 to the downloads page

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25416.
--
Fix Version/s: (was: 3.0.0-alpha-1)
   Resolution: Fixed

> Add 2.3.4 to the downloads page
> ---
>
> Key: HBASE-25416
> URL: https://issues.apache.org/jira/browse/HBASE-25416
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25412) Release version 2.3.4 in Jira

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25412.
--
Resolution: Fixed

> Release version 2.3.4 in Jira
> -
>
> Key: HBASE-25412
> URL: https://issues.apache.org/jira/browse/HBASE-25412
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25411) "Release" staged nexus repository

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25411.
--
Resolution: Fixed

> "Release" staged nexus repository
> -
>
> Key: HBASE-25411
> URL: https://issues.apache.org/jira/browse/HBASE-25411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25410) Spin RCs

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25410.
--
Resolution: Fixed

> Spin RCs
> 
>
> Key: HBASE-25410
> URL: https://issues.apache.org/jira/browse/HBASE-25410
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25413) Promote 2.3.4 RC artifacts in svn

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25413.
--
Resolution: Fixed

> Promote 2.3.4 RC artifacts in svn
> -
>
> Key: HBASE-25413
> URL: https://issues.apache.org/jira/browse/HBASE-25413
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25415) Push signed release tag

2021-01-22 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25415.
--
Resolution: Fixed

> Push signed release tag
> ---
>
> Key: HBASE-25415
> URL: https://issues.apache.org/jira/browse/HBASE-25415
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Resolved] (HBASE-25368) Filter out more invalid encoded name in isEncodedRegionName(byte[] regionName)

2021-01-20 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25368.
--
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

> Filter out more invalid encoded name in isEncodedRegionName(byte[] 
> regionName) 
> ---
>
> Key: HBASE-25368
> URL: https://issues.apache.org/jira/browse/HBASE-25368
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> {code:java}
> public static boolean isEncodedRegionName(byte[] regionName) {
>   // If not parseable as region name, presume encoded. TODO: add stringency; e.g. if hex.
>   return parseRegionNameOrReturnNull(regionName) == null && regionName.length <= MD5_HEX_LENGTH;
> }
> {code}
> Right now, if a table name is passed in, the method still treats it as an encoded 
> region name, which results in an unnecessary registry query for meta regions. 
> This can be avoided if table names are filtered out early in this method.
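
As an illustration of the "add stringency; e.g. if hex" TODO quoted above, the sketch below accepts only input that is exactly 32 lowercase hex characters. `EncodedRegionNames` and `looksLikeMd5EncodedName` are hypothetical names, and the check is an assumption about what "stricter" could mean: it deliberately ignores any other encoded-name formats (for example older numeric-style names), which a real fix would have to handle separately.

```java
// Hypothetical helper (not the HBase API): a stricter test for MD5-style encoded region names.
final class EncodedRegionNames {
  static final int MD5_HEX_LENGTH = 32;

  /** True only if the input is exactly 32 characters, all of them 0-9 or a-f. */
  static boolean looksLikeMd5EncodedName(byte[] name) {
    if (name == null || name.length != MD5_HEX_LENGTH) {
      return false;
    }
    for (byte b : name) {
      boolean digit = b >= '0' && b <= '9';
      boolean lowerHex = b >= 'a' && b <= 'f';
      if (!digit && !lowerHex) {
        return false; // table names with other characters are rejected here
      }
    }
    return true;
  }
}
```

A short table name such as `testtable4` fails the length test immediately, so no registry or meta lookup would be attempted for it.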





[jira] [Resolved] (HBASE-25418) Run a correctness test with ITBLL

2021-01-16 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25418.
--
Resolution: Fixed

Ran ITBLL with chaos monkey against 2.3.4RC4: inserted 3 billion rows, and 
verification succeeded.

> Run a correctness test with ITBLL
> -
>
> Key: HBASE-25418
> URL: https://issues.apache.org/jira/browse/HBASE-25418
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Reopened] (HBASE-25371) When openRegion fails during initial verification(before initializing and setting seq num), exception is observed during region close.

2021-01-14 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun reopened HBASE-25371:
--

I just found that this Jira has not been merged to branch-2, branch-2.3, or 
branch-2.4 yet.

Can you backport it and set the release version correctly? Thanks.

> When openRegion fails during initial verification(before initializing and 
> setting seq num), exception is observed during region close.
> --
>
> Key: HBASE-25371
> URL: https://issues.apache.org/jira/browse/HBASE-25371
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1, 2.3.5
>
>
> When openRegion fails during initial verification(before initializing and 
> setting seq num), exception is observed during region close:
>  
> 2020-12-03 16:34:47,133 ERROR [RS_OPEN_REGION-regionserver/AA:16040-0] handler.OpenRegionHandler: Failed open of region=ns2:testtable4,15,1606912406234.cd386135276b7d3c57416df3666e4aea.
> 2020-12-03 16:34:47,133 ERROR [RS_OPEN_REGION-regionserver/blrphispra01054:16040-0] handler.OpenRegionHandler: Failed open of region=ns2:testtable4,15,1606912406234.cd386135276b7d3c57416df3666e4aea.
> java.io.IOException: The new max sequence id 1 is less than the old max sequence id 7134
>  at org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:418)
>  at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1253)
>  at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1793)
>  at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1606)
>  at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1552)
>  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7522)
>  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7467)
>  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7439)
>  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7397)
>  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7348)
>  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:286)
>  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:111)
>  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  





[jira] [Resolved] (HBASE-25083) make sure the next hbase 1.y release has Hadoop 2.10 as a minimum version

2021-01-14 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25083.
--
Resolution: Fixed

> make sure the next hbase 1.y release has Hadoop 2.10 as a minimum version
> -
>
> Key: HBASE-25083
> URL: https://issues.apache.org/jira/browse/HBASE-25083
> Project: HBase
>  Issue Type: Task
>  Components: documentation, hadoop2
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.3.4, 2.5.0, 2.4.1
>
>
> Our reference guide list of prerequisites still has Hadoop 2.8 and 2.9 listed 
> for HBase 1 releases.
> * [hadoop 2.8 is 
> EOM|https://lists.apache.org/thread.html/r348f7bc93a522f05b7cce78a911854d128a6b1b8bd8124bad4d06ce6%40%3Cuser.hadoop.apache.org%3E]
> * [hadoop 2.9 is 
> EOM|https://lists.apache.org/thread.html/r16b14cce9504f7a9d228612c6b808e72d8dd20863c78be51a7e04ed5%40%3Cuser.hadoop.apache.org%3E]
> The current list in the reference guide for HBase 1.6 is just the 1.5 list 
> copied. we should update it to remove 2.8 and 2.9 and make sure we're no 
> longer doing build/test based on those versions for branch-1.





[jira] [Resolved] (HBASE-25356) HBaseAdmin#getRegion() needs to filter out non-regionName and non-encodedRegionName

2021-01-14 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25356.
--
Fix Version/s: 2.4.1
   2.5.0
   2.3.4
   Resolution: Fixed

> HBaseAdmin#getRegion() needs to filter out non-regionName and 
> non-encodedRegionName
> ---
>
> Key: HBASE-25356
> URL: https://issues.apache.org/jira/browse/HBASE-25356
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.3.3, 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.3.4, 2.5.0, 2.4.1
>
>
> I was running the shell command to major compact the meta table. The implementation 
> is wrong because it searches the meta table using the meta table's own name, which 
> also results in an unnecessary scan of the meta table. 
>  
> majorCompactRegion() calls HBaseAdmin#getRegion(), which scans the meta table 
> itself.
> Operators use this command quite often, so we need to correct it.
>  
> This applies to the split and flush commands as well, which call getRegion() with 
> a table name as input.
>  
> The solution is for getRegion() to filter out input that is neither a region name 
> nor an encoded region name. This saves a query and a heavy scan of the meta table; 
> if the meta table is large, the overhead is huge.





[jira] [Created] (HBASE-25470) Add unit test for HBASE-25445 - SplitWALRemoteProcedure failed to archive split WAL

2021-01-06 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25470:


 Summary: Add unit test for HBASE-25445 - SplitWALRemoteProcedure 
failed to archive split WAL
 Key: HBASE-25470
 URL: https://issues.apache.org/jira/browse/HBASE-25470
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.2
Reporter: mokai
Assignee: Anjan Das
 Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.4, 2.5.0, 2.4.1


If 'hbase.wal.dir' and 'hbase.rootdir' are configured on different filesystems, 
SplitWALRemoteProcedure fails to archive the split WAL because SplitWALManager uses 
the wrong fs instance. SplitWALManager should use the fs instance that corresponds 
to the WAL.

Steps to Reproduce:
 * Configure 'hbase.wal.dir' and 'hbase.rootdir' so that they point to 
different fs instances.
 * Start HBase with multiple RS. 
 * Create a couple of tables and some rows in them so that the RSs get assigned 
with some regions. 
 * Take any RS with non-zero number of regions offline. 
 * Check master logs for "Wrong FS" error as shown in the screenshot attached. 

 





[jira] [Resolved] (HBASE-25293) Followup jira to address the client handling issue when chaning from meta replica to non-meta-replica at the server side.

2021-01-05 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25293.
--
Fix Version/s: 2.4.1
   Resolution: Fixed

> Followup jira to address the client handling issue when chaning from meta 
> replica to non-meta-replica at the server side.
> -
>
> Key: HBASE-25293
> URL: https://issues.apache.org/jira/browse/HBASE-25293
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 2.4.1
>
>
> [https://github.com/apache/hbase/pull/2643]
>  
> {quote}
> With my operator hat on, I'd assume that LOAD_BALANCE with 1 replica count 
> works like no read replicas configured (logic wise at-least, even though the 
> code paths are different).
> {quote}If the server side does not support meta replica, the client side 
> cannot be configured to support this mode
> {quote}
> Since clients are usually long running (meaning we may not be able to restart 
> client or they using cached HBase connection) and meta replica count can be 
> altered on the service side on the fly, I'd expect client to work across 
> these changes without any configuration changes. WDYT?
> {quote}





[jira] [Created] (HBASE-25418) Run a correctness test with ITBLL

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25418:


 Summary: Run a correctness test with ITBLL
 Key: HBASE-25418
 URL: https://issues.apache.org/jira/browse/HBASE-25418
 Project: HBase
  Issue Type: Task
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun








[jira] [Created] (HBASE-25411) CLONE - "Release" staged nexus repository

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25411:


 Summary: CLONE - "Release" staged nexus repository
 Key: HBASE-25411
 URL: https://issues.apache.org/jira/browse/HBASE-25411
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Created] (HBASE-25414) CLONE - Update reporter tool with new release

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25414:


 Summary: CLONE - Update reporter tool with new release
 Key: HBASE-25414
 URL: https://issues.apache.org/jira/browse/HBASE-25414
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Nick Dimiduk


Reporter tool: [https://reporter.apache.org/addrelease.html?hbase]





[jira] [Created] (HBASE-25415) CLONE - Push signed release tag

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25415:


 Summary: CLONE - Push signed release tag
 Key: HBASE-25415
 URL: https://issues.apache.org/jira/browse/HBASE-25415
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Created] (HBASE-25410) CLONE - Spin RCs

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25410:


 Summary: CLONE - Spin RCs
 Key: HBASE-25410
 URL: https://issues.apache.org/jira/browse/HBASE-25410
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Created] (HBASE-25417) CLONE - Send announce email

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25417:


 Summary: CLONE - Send announce email
 Key: HBASE-25417
 URL: https://issues.apache.org/jira/browse/HBASE-25417
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Created] (HBASE-25412) CLONE - Release version 2.3.2 in Jira

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25412:


 Summary: CLONE - Release version 2.3.2 in Jira
 Key: HBASE-25412
 URL: https://issues.apache.org/jira/browse/HBASE-25412
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Created] (HBASE-25416) CLONE - Add 2.3.2 to the downloads page

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25416:


 Summary: CLONE - Add 2.3.2 to the downloads page
 Key: HBASE-25416
 URL: https://issues.apache.org/jira/browse/HBASE-25416
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-25413) CLONE - Promote 2.3.2 RC artifacts in svn

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25413:


 Summary: CLONE - Promote 2.3.2 RC artifacts in svn
 Key: HBASE-25413
 URL: https://issues.apache.org/jira/browse/HBASE-25413
 Project: HBase
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Nick Dimiduk








[jira] [Created] (HBASE-25409) Release 2.3.4

2020-12-18 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25409:


 Summary: Release 2.3.4
 Key: HBASE-25409
 URL: https://issues.apache.org/jira/browse/HBASE-25409
 Project: HBase
  Issue Type: Task
  Components: community
Reporter: Viraj Jasani
Assignee: Viraj Jasani








[jira] [Resolved] (HBASE-25358) meta replica regions are assigned to the same region server during SCP.

2020-12-11 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25358.
--
Resolution: Invalid

I checked the code, and there is a guard that avoids assigning two 
replica regions to the same region server. That left me puzzled, so I went 
back, reran ITBLL, and was able to reproduce it.

It is the ITBLL actions that move regions around, and in some cases they move 
meta replica regions to the same region server. This is not a bug, so I am 
resolving it.

> meta replica regions are assigned to the same region server during SCP.
> ---
>
> Key: HBASE-25358
> URL: https://issues.apache.org/jira/browse/HBASE-25358
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>
> When running 2.4.0 RC1 with meta replica enabled, during SCP, meta replica 
> regions are assigned to the same region server. I think the reason is that 
> SCP uses a round-robin algorithm to assign meta replicas and does not exclude 
> region servers already hosting replica regions. This is not a new issue.





[jira] [Created] (HBASE-25368) Filter out more invalid encoded name in isEncodedRegionName(byte[] regionName)

2020-12-07 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25368:


 Summary: Filter out more invalid encoded name in 
isEncodedRegionName(byte[] regionName) 
 Key: HBASE-25368
 URL: https://issues.apache.org/jira/browse/HBASE-25368
 Project: HBase
  Issue Type: Improvement
  Components: Client
Reporter: Huaxiang Sun


{code:java}
public static boolean isEncodedRegionName(byte[] regionName) {
  // If not parseable as region name, presume encoded. TODO: add stringency; e.g. if hex.
  return parseRegionNameOrReturnNull(regionName) == null
    && regionName.length <= MD5_HEX_LENGTH;
}
{code}

Right now, if a table name is passed in, the method still treats it as an 
encoded region name, which results in an unnecessary registry query for meta 
regions. This can be avoided if table names are filtered out early in this 
method.
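
One possible way to add the stringency mentioned in the TODO is to accept only names that look like an MD5 hex digest. The following is a minimal sketch, not the actual HBase change: the class `EncodedNameCheck` and the method `looksLikeMd5Hex` are hypothetical names, and it assumes an encoded name is exactly 32 lowercase hex characters. Older encoded names that are not MD5-based would need separate handling.

```java
class EncodedNameCheck {
  // Assumption: new-style encoded region names are 32-character MD5 hex strings.
  static final int MD5_HEX_LENGTH = 32;

  /** Returns true only if the bytes are exactly 32 lowercase hex characters. */
  static boolean looksLikeMd5Hex(byte[] name) {
    if (name == null || name.length != MD5_HEX_LENGTH) {
      return false;
    }
    for (byte b : name) {
      boolean digit = b >= '0' && b <= '9';
      boolean hexLetter = b >= 'a' && b <= 'f';
      if (!digit && !hexLetter) {
        return false;
      }
    }
    return true;
  }
}
```

With a check like this, a table name such as hbase:meta is rejected on length and on the ':' character before any registry lookup is attempted.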





[jira] [Created] (HBASE-25358) meta replica regions are assigned to

2020-12-03 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25358:


 Summary: meta replica regions are assigned to 
 Key: HBASE-25358
 URL: https://issues.apache.org/jira/browse/HBASE-25358
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Reporter: Huaxiang Sun








[jira] [Created] (HBASE-25356) shell command major_compact misbehave for hbase:meta

2020-12-03 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25356:


 Summary: shell command major_compact misbehave for hbase:meta
 Key: HBASE-25356
 URL: https://issues.apache.org/jira/browse/HBASE-25356
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 1.6.0, 2.4.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


I was running the shell command to major compact the meta table. The 
implementation is wrong because it tries to search the meta table using the 
meta table's own name. This also results in an unnecessary scan of the meta table. 

 

majorCompactRegion() calls getRegion(), which basically scans the meta table 
itself.

Operators use this command quite often, so we need to correct it.





[jira] [Reopened] (HBASE-25343) Avoid the failed meta replica region temporarily in Load Balance mode

2020-12-01 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun reopened HBASE-25343:
--

Reopen to reflect the new scope.

> Avoid the failed meta replica region temporarily in Load Balance mode
> -
>
> Key: HBASE-25343
> URL: https://issues.apache.org/jira/browse/HBASE-25343
> Project: HBase
>  Issue Type: Sub-task
>  Components: meta replicas
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.4.1
>
>
> This is a follow-up enhancement discussed with Stack and Duo. With the newly 
> introduced meta replica LoadBalance mode, if something goes wrong with one of 
> the meta replica regions, the current logic keeps retrying until that meta 
> replica region is back online or an error is reported, i.e., there is no HA in 
> LoadBalance mode. HA can be implemented by treating a timeout on one meta 
> replica region as a signal to try another meta replica region.





[jira] [Resolved] (HBASE-25343) Add HA support on top of Load Balance mode

2020-12-01 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-25343.
--
Resolution: Won't Do

> Add HA support on top of Load Balance mode
> --
>
> Key: HBASE-25343
> URL: https://issues.apache.org/jira/browse/HBASE-25343
> Project: HBase
>  Issue Type: Sub-task
>  Components: meta replicas
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 2.4.1
>
>
> This is a follow-up enhancement discussed with Stack and Duo. With the newly 
> introduced meta replica LoadBalance mode, if something goes wrong with one of 
> the meta replica regions, the current logic keeps retrying until that meta 
> replica region is back online or an error is reported, i.e., there is no HA in 
> LoadBalance mode. HA can be implemented by treating a timeout on one meta 
> replica region as a signal to try another meta replica region.





[jira] [Created] (HBASE-25343) Add HA support on top of Load Balance mode

2020-11-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25343:


 Summary: Add HA support on top of Load Balance mode
 Key: HBASE-25343
 URL: https://issues.apache.org/jira/browse/HBASE-25343
 Project: HBase
  Issue Type: Sub-task
  Components: meta replicas
Affects Versions: 2.4.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun
 Fix For: 2.4.1


This is a follow-up enhancement discussed with Stack and Duo. With the newly 
introduced meta replica LoadBalance mode, if something goes wrong with one of 
the meta replica regions, the current logic keeps retrying until that meta 
replica region is back online or an error is reported, i.e., there is no HA in 
LoadBalance mode. HA can be implemented by treating a timeout on one meta 
replica region as a signal to try another meta replica region.
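
The failover idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HBase client code: `MetaReplicaFailover`, `ReplicaTimeoutException`, and `readWithFailover` are hypothetical names, and the real client would issue an RPC with a timeout where this sketch calls a plain function.

```java
import java.util.function.IntFunction;

class MetaReplicaFailover {
  /** Hypothetical stand-in for whatever timeout error the client surfaces. */
  static class ReplicaTimeoutException extends RuntimeException {
    ReplicaTimeoutException(String message) {
      super(message);
    }
  }

  /**
   * Tries the preferred replica first. On a timeout it moves to the next
   * replica id, wrapping around, until every replica has been tried once.
   */
  static <T> T readWithFailover(int preferredReplica, int replicaCount, IntFunction<T> read) {
    if (replicaCount <= 0) {
      throw new IllegalArgumentException("replicaCount must be positive");
    }
    ReplicaTimeoutException last = null;
    for (int attempt = 0; attempt < replicaCount; attempt++) {
      int replicaId = Math.floorMod(preferredReplica + attempt, replicaCount);
      try {
        return read.apply(replicaId);
      } catch (ReplicaTimeoutException e) {
        last = e; // this replica timed out, so fall through to the next one
      }
    }
    throw last; // every replica timed out; surface the most recent timeout
  }
}
```

The sketch bounds the work to one attempt per replica, so a caller still sees an error when all replicas are unavailable instead of retrying forever.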





[jira] [Created] (HBASE-25293) Followup jira to address the client handling issue when chaning from meta replica to non-meta-replica at the server side.

2020-11-16 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25293:


 Summary: Followup jira to address the client handling issue when 
chaning from meta replica to non-meta-replica at the server side.
 Key: HBASE-25293
 URL: https://issues.apache.org/jira/browse/HBASE-25293
 Project: HBase
  Issue Type: Sub-task
Reporter: Huaxiang Sun


[https://github.com/apache/hbase/pull/2643]

 

{quote}

With my operator hat on, I'd assume that LOAD_BALANCE with 1 replica count 
works like no read replicas configured (logic wise at-least, even though the 
code paths are different).
{quote}If the server side does not support meta replica, the client side cannot 
be configured to support this mode
{quote}
Since clients are usually long running (meaning we may not be able to restart 
client or they using cached HBase connection) and meta replica count can be 
altered on the service side on the fly, I'd expect client to work across these 
changes without any configuration changes. WDYT?

{quote}





[jira] [Created] (HBASE-25291) Document how to enable the meta replica load balance mode for the client

2020-11-16 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25291:


 Summary: Document how to enable the meta replica load balance mode 
for the client
 Key: HBASE-25291
 URL: https://issues.apache.org/jira/browse/HBASE-25291
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Created] (HBASE-25248) Followup jira to create single thread ScheduledExecutorService in AsyncConnImpl, and schedule all these periodic tasks

2020-11-04 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25248:


 Summary: Followup jira to create single thread 
ScheduledExecutorService in AsyncConnImpl, and schedule all these periodic tasks
 Key: HBASE-25248
 URL: https://issues.apache.org/jira/browse/HBASE-25248
 Project: HBase
  Issue Type: Sub-task
Reporter: Huaxiang Sun


This is a follow-up Jira for the comments in 
[https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0].
 

{quote}
h4. *[saintstack|https://github.com/saintstack]* [18 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517040579]
So, implements Stoppable rather than do what the likes of AuthUtil does where 
it does createDummyStoppable and then has an internal do-nothing Stoppable? 
Makes sense.

Perhaps add comment that it is a do-nothing stop required by ScheduledChore 
impls. s/isStopped/stopped/
 
h4. *[huaxiangsun|https://github.com/huaxiangsun]* [18 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517042290]
Will do.
 
h4. *[ndimiduk|https://github.com/ndimiduk]* [17 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057141]
Maybe in the future we can put a default empty implementation on the interface, 
and then implementers who don't need it can ignore it.
 
h4. *[Apache9|https://github.com/Apache9]* [17 hours ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057999]
Maybe we could just use a ScheduledExecutorService at client side, the 
ChoreService is designed to be used at server side I believe. Anyway, not a 
blocker for now.

{quote}





[jira] [Created] (HBASE-25247) Followup jira to encap all meta replica mode/selector processing into CatalogReplicaModeManager

2020-11-04 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25247:


 Summary: Followup jira to encap all meta replica mode/selector 
processing into CatalogReplicaModeManager
 Key: HBASE-25247
 URL: https://issues.apache.org/jira/browse/HBASE-25247
 Project: HBase
  Issue Type: Sub-task
  Components: meta
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


This is a follow-up to Stack's comments in 
[https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0].

{quote}
h4. *[saintstack|https://github.com/saintstack]* [6 days ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514558880]
Yeah, said this before but in follow-on, would be good to shove all this stuff 
into a CatalogReplicaMode class. Internally this class would figure which 
policy to run. It would have a method that took a Scan that allowed decorating 
the Scan w/ whatever the mode needed to implement its policy. Later.
 
h4. *[huaxiangsun|https://github.com/huaxiangsun]* [6 days ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514587250]
Now I thought about it, it makes sense. Maybe a CatalogReplicaModeManager class 
which encaps mode and selector?
Let me create a followup jira after this is merged.
 
{quote}





[jira] [Created] (HBASE-25241) Add integration test for meta replica load balance mode

2020-11-03 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25241:


 Summary: Add integration test for meta replica load balance mode
 Key: HBASE-25241
 URL: https://issues.apache.org/jira/browse/HBASE-25241
 Project: HBase
  Issue Type: Sub-task
  Components: integration tests
Reporter: Huaxiang Sun


We need to create an integration test that runs with meta replica load balance 
mode enabled and verifies its correctness.





[jira] [Created] (HBASE-25158) Enhance balancer to make sure no meta primary/replica regions are going to be assigned to one same region server.

2020-10-06 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25158:


 Summary: Enhance balancer to make sure no meta primary/replica 
regions are going to be assigned to one same region server.
 Key: HBASE-25158
 URL: https://issues.apache.org/jira/browse/HBASE-25158
 Project: HBase
  Issue Type: Sub-task
Reporter: Huaxiang Sun


The balancer already has an enhancement for region replicas: a primary region 
and its replicas are not assigned to the same region server. Today there is 
only one meta region, so this enhancement is still enough. With split meta 
coming, the balancer needs to make sure that no meta primary/replica regions 
end up on the same region server, in order to avoid hotspots.





[jira] [Created] (HBASE-25129) serial replication, addReplicationBarrier is writing to rep_barrier family even there is no serial replication peer.

2020-09-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25129:


 Summary: serial replication, addReplicationBarrier is writing to 
rep_barrier family even there is no serial replication peer.
 Key: HBASE-25129
 URL: https://issues.apache.org/jira/browse/HBASE-25129
 Project: HBase
  Issue Type: Bug
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


We found quite a lot of data in the rep_barrier family even though serial 
replication is not enabled. Looking at the code, it only checks whether the 
table has replication enabled. I think another check is needed (i.e., whether 
any serial replication peer is configured).

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L215]
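
The extra check could look roughly like the sketch below. It is a simplified illustration, not the RegionStateStore code: `ReplicationBarrierGuard`, `Peer`, and `shouldWriteBarrier` are hypothetical names, and a real check would probably also need to confirm that the serial peer actually covers the table in question.

```java
import java.util.List;

class ReplicationBarrierGuard {
  /** Hypothetical, simplified view of a replication peer. */
  static class Peer {
    final String id;
    final boolean serial;

    Peer(String id, boolean serial) {
      this.id = id;
      this.serial = serial;
    }
  }

  /** Write barriers only if the table replicates AND at least one peer is serial. */
  static boolean shouldWriteBarrier(boolean tableReplicationEnabled, List<Peer> peers) {
    if (!tableReplicationEnabled) {
      return false;
    }
    for (Peer peer : peers) {
      if (peer.serial) {
        return true;
      }
    }
    return false;
  }
}
```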

 





[jira] [Created] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-09-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25127:


 Summary: Enhance PerformanceEvaluation to profile meta replica 
performance.
 Key: HBASE-25127
 URL: https://issues.apache.org/jira/browse/HBASE-25127
 Project: HBase
  Issue Type: Sub-task
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun








[jira] [Created] (HBASE-25126) Add load balance logic in hbase-client to distribute read load over meta replica regions.

2020-09-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25126:


 Summary: Add load balance logic in hbase-client to distribute read 
load over meta replica regions.
 Key: HBASE-25126
 URL: https://issues.apache.org/jira/browse/HBASE-25126
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun








[jira] [Created] (HBASE-25125) Create a ReplicationEndPoint for meta/look table.

2020-09-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-25125:


 Summary: Create a ReplicationEndPoint for meta/look table.
 Key: HBASE-25125
 URL: https://issues.apache.org/jira/browse/HBASE-25125
 Project: HBase
  Issue Type: Sub-task
  Components: read replicas
Affects Versions: 3.0.0-alpha-1
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun








[jira] [Resolved] (HBASE-24563) Make hbck chore aware of replica region and check/fix replica region consistency

2020-08-18 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-24563.
--
Resolution: Duplicate

It is covered by other jiras, no need for this one.

> Make hbck chore aware of replica region and check/fix replica region 
> consistency
> 
>
> Key: HBASE-24563
> URL: https://issues.apache.org/jira/browse/HBASE-24563
> Project: HBase
>  Issue Type: Improvement
>  Components: read replicas
>Affects Versions: 2.3.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>
> Hbck1 checks/fixes only primary region consistency and ignores replica regions. 
> In hbase 2, the hbck chore needs to be aware of replica regions and check their 
> consistency as well. Hbck2 needs to fix replica region inconsistencies.





[jira] [Resolved] (HBASE-24824) Add more stats in PE for read replica

2020-08-10 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-24824.
--
Fix Version/s: 2.4.0
   2.3.1
   3.0.0-alpha-1
   Resolution: Fixed

> Add more stats in PE for read replica
> -
>
> Key: HBASE-24824
> URL: https://issues.apache.org/jira/browse/HBASE-24824
> Project: HBase
>  Issue Type: Improvement
>  Components: PE, read replicas
>Affects Versions: 2.3.1
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0
>
> Attachments: Screen Shot 2020-08-05 at 5.04.56 PM.png
>
>
> Add more stats for the read replica PE test. Currently, there are read replica 
> tests in PE, but they do not report how many requests go to replica 
> regions or how many replica results win.
> We also want to add a latency histogram for replica reads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24824) Add more stats in PE for read replica

2020-08-05 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24824:


 Summary: Add more stats in PE for read replica
 Key: HBASE-24824
 URL: https://issues.apache.org/jira/browse/HBASE-24824
 Project: HBase
  Issue Type: Improvement
  Components: PE, read replicas
Affects Versions: 2.3.1
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Add more stats for the read replica PE test. Currently, there are read replica 
tests in PE, but they do not report how many requests go to replica 
regions or how many replica results win.

We also want to add a latency histogram for replica reads.





[jira] [Created] (HBASE-24804) Follow up work add client side scan metrics for read replica

2020-07-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24804:


 Summary: Follow up work add client side scan metrics for read 
replica
 Key: HBASE-24804
 URL: https://issues.apache.org/jira/browse/HBASE-24804
 Project: HBase
  Issue Type: New Feature
  Components: read replicas
Affects Versions: 2.4.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


This is follow-up work for HBASE-18436, which added client metrics for read 
replica gets. This issue adds metrics for scans as well. These metrics will be 
used in PE and by any interested applications.





[jira] [Resolved] (HBASE-24705) MetaFixer#fixHoles() does not include the case for read replicas (i.e, replica regions are not created)

2020-07-14 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-24705.
--
Fix Version/s: 2.4.0
   2.3.1
   3.0.0-alpha-1
   Resolution: Fixed

> MetaFixer#fixHoles() does not include the case for read replicas (i.e, 
> replica regions are not created)
> ---
>
> Key: HBASE-24705
> URL: https://issues.apache.org/jira/browse/HBASE-24705
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.3.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0
>
>






[jira] [Created] (HBASE-24708) Flaky Test TestRegionReplicas#testVerifySecondaryAbilityToReadWithOnFiles

2020-07-09 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24708:


 Summary: Flaky Test 
TestRegionReplicas#testVerifySecondaryAbilityToReadWithOnFiles
 Key: HBASE-24708
 URL: https://issues.apache.org/jira/browse/HBASE-24708
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 2.3.0
Reporter: Huaxiang Sun








[jira] [Created] (HBASE-24707) Fix Empty region in meta with read replica

2020-07-09 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24707:


 Summary: Fix Empty region in meta with read replica
 Key: HBASE-24707
 URL: https://issues.apache.org/jira/browse/HBASE-24707
 Project: HBase
  Issue Type: Improvement
  Components: hbck2
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Currently, CatalogJanitor has a case that checks whether the default region 
info is missing from a meta row, and reports such rows as EmptyRegionInfoList. 

 

For read replicas, this entry needs to be dealt with. In hbase-1, it was 
caused by a region server opening an orphan replica region. In hbase-2, this 
will not happen, since checks were added to guard against it. The hbck2 fix is 
still needed for upgrades: such entries can be carried into hbase-2 by an 
upgrade, and hbck2 needs to handle them.





[jira] [Created] (HBASE-24688) AssignRegionHandler uses EventType.M_RS_CLOSE_META instead of EventType.M_RS_OPEN_META

2020-07-06 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24688:


 Summary: AssignRegionHandler uses EventType.M_RS_CLOSE_META 
instead of EventType.M_RS_OPEN_META
 Key: HBASE-24688
 URL: https://issues.apache.org/jira/browse/HBASE-24688
 Project: HBase
  Issue Type: Bug
Reporter: Huaxiang Sun


This results in openMetaRegion always being executed in closeMetaExecutor.





[jira] [Created] (HBASE-24661) TestHeapSize.testSizes failure

2020-06-30 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24661:


 Summary: TestHeapSize.testSizes  failure
 Key: HBASE-24661
 URL: https://issues.apache.org/jira/browse/HBASE-24661
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0-alpha-1
Reporter: Huaxiang Sun


{code:java}
[INFO] 
[INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ hbase-server ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hbase.io.TestHeapSize
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.884 s <<< FAILURE! - in org.apache.hadoop.hbase.io.TestHeapSize
[ERROR] org.apache.hadoop.hbase.io.TestHeapSize.testSizes  Time elapsed: 0.308 s  <<< FAILURE!
java.lang.AssertionError: expected:<368> but was:<360>
at org.apache.hadoop.hbase.io.TestHeapSize.testSizes(TestHeapSize.java:493)


[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestHeapSize.testSizes:493 expected:<368> but was:<360>
[INFO] 
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0
[INFO] 
{code}





[jira] [Resolved] (HBASE-24552) Replica region needs to check if primary region directory exists at file system in TransitRegionStateProcedure

2020-06-29 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-24552.
--
Fix Version/s: 2.3.0
   3.0.0-alpha-1
   Resolution: Fixed

> Replica region needs to check if primary region directory exists at file 
> system  in TransitRegionStateProcedure 
> 
>
> Key: HBASE-24552
> URL: https://issues.apache.org/jira/browse/HBASE-24552
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Affects Versions: 2.3.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> In hbase-1, we often run into the situation where the primary region has been 
> closed/removed while a replica region still stays in the master's in-memory db 
> and is open on one of the region servers. The balancer can move this replica 
> region to a new region server. During region open, the replica region does not 
> check whether the primary region has been removed, and moves forward. During 
> store open, it recreates the primary region directory on hdfs, causing 
> inconsistency.
>  
> In hbase-2, things are much better. To prevent the above inconsistency from 
> happening, more checks are added for a replica region, i.e., that the primary 
> region's directory exists and that there is a .regioninfo under it. 
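
The shape of the check described above can be sketched as follows. This is a minimal illustration that uses java.nio.file on a local file system as a stand-in for HDFS; HBase itself would go through the Hadoop FileSystem API. `PrimaryRegionDirCheck` and `primaryRegionDirLooksValid` are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class PrimaryRegionDirCheck {
  // File name taken from the issue description.
  static final String REGION_INFO_FILE = ".regioninfo";

  /** True only if the primary region directory exists and contains a .regioninfo file. */
  static boolean primaryRegionDirLooksValid(Path primaryRegionDir) {
    return Files.isDirectory(primaryRegionDir)
        && Files.isRegularFile(primaryRegionDir.resolve(REGION_INFO_FILE));
  }
}
```

A replica open that fails this check would be rejected instead of proceeding to store open, which is the step that recreated the primary region directory.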



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap

2020-06-26 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24643:


 Summary: Replace Cluster#primariesOfRegionsPerServer from int 
array to treemap
 Key: HBASE-24643
 URL: https://issues.apache.org/jira/browse/HBASE-24643
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Currently, primariesOfRegionsPerServer is an int array. moveRegion does heavy 
work by searching the array linearly, and inserting or removing an element 
requires allocating and copying the whole array.
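
To illustrate the proposal, here is a minimal sketch of a per-server structure backed by a TreeMap instead of a sorted int array. The class name PrimaryIndexCounts and its methods are hypothetical and are not taken from the HBase code base; the sketch only shows that add and remove become O(log n) map updates with no array reallocation.

```java
import java.util.TreeMap;

// Hypothetical stand-in for one server's entry in primariesOfRegionsPerServer.
final class PrimaryIndexCounts {
  // key: index of a primary region; value: number of replicas of it on this server
  private final TreeMap<Integer, Integer> counts = new TreeMap<>();

  // O(log n) insert; no array copy is needed.
  void add(int primary) {
    counts.merge(primary, 1, Integer::sum);
  }

  // O(log n) removal; returns false if the primary index was not present.
  boolean remove(int primary) {
    Integer c = counts.get(primary);
    if (c == null) {
      return false;
    }
    if (c == 1) {
      counts.remove(primary);
    } else {
      counts.put(primary, c - 1);
    }
    return true;
  }

  int count(int primary) {
    return counts.getOrDefault(primary, 0);
  }

  // Number of replicas that share this server with another replica of the same primary.
  int colocatedReplicas() {
    int extra = 0;
    for (int c : counts.values()) {
      extra += c - 1;
    }
    return extra;
  }
}
```

Keeping a count per primary index preserves the information a sorted array with duplicates would carry, which is an assumption about how the array is used.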





[jira] [Created] (HBASE-24633) Remove data locality and StoreFileCostFunction for replica regions out of balancer's cost calculation

2020-06-24 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24633:


 Summary: Remove data locality and StoreFileCostFunction for 
replica regions out of balancer's cost calculation
 Key: HBASE-24633
 URL: https://issues.apache.org/jira/browse/HBASE-24633
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


We found that one of the clusters with read replicas enabled always balances 
lots of replica regions. Going through the balancer's cost functions, we found 
that data locality and StoreFileCost have the same multiplier for both primary 
and replica regions. That is something we can improve. Data locality for 
replica regions should not be a dominant factor for the balancer. We can either 
remove it from the balancer's picture for now or give it a small multiplier.
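
A rough sketch of the idea follows. The class ReplicaAwareLocalityCost, its parameters, and the normalization are assumptions made for illustration only; this is not the balancer's actual cost function. It shows how a separate (smaller or zero) weight for replica regions keeps their locality from dominating the cost.

```java
// Hypothetical illustration; not the HBase balancer implementation.
final class ReplicaAwareLocalityCost {
  /**
   * locality[i] is in [0, 1] for region i; isReplica[i] marks non-primary replicas.
   * Returns a normalized cost in [0, 1], where 0 means perfect weighted locality.
   */
  static double cost(double[] locality, boolean[] isReplica,
      double primaryWeight, double replicaWeight) {
    double weighted = 0.0;
    double total = 0.0;
    for (int i = 0; i < locality.length; i++) {
      double w = isReplica[i] ? replicaWeight : primaryWeight;
      weighted += w * (1.0 - locality[i]);
      total += w;
    }
    return total == 0.0 ? 0.0 : weighted / total;
  }
}
```

With replicaWeight set to 0, a replica with poor locality contributes nothing to the cost, which corresponds to removing it from the balancer's picture.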





[jira] [Created] (HBASE-24582) The current implementation of assignMetaReplicas() may assign replica meta regions to the same server hosting primary meta region.

2020-06-17 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24582:


 Summary: The current implementation of assignMetaReplicas() may 
assign replica meta regions to the same server hosting primary meta region.
 Key: HBASE-24582
 URL: https://issues.apache.org/jira/browse/HBASE-24582
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


We need to take an approach similar to SplitTableRegionProcedure, which uses a 
round-robin algorithm to assign replica regions and excludes the primary server.

 

'''return AssignmentManagerUtil.createAssignProceduresForOpeningNewRegions(env, 
hris,
 getRegionReplication(env), getParentRegionServerName(env));'''
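
The round-robin idea can be sketched in isolation as below. ReplicaRoundRobin and its assign method are hypothetical names, servers are represented as plain strings, and the fallback to the primary's server when no other server is available is an assumption; the real logic lives behind AssignmentManagerUtil and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of round-robin placement that skips the primary's server.
final class ReplicaRoundRobin {
  static List<String> assign(List<String> servers, String primaryServer, int replicaCount) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("no servers available");
    }
    List<String> candidates = new ArrayList<>(servers);
    candidates.remove(primaryServer);
    if (candidates.isEmpty()) {
      // Assumed fallback: only the primary's server is available.
      candidates.add(primaryServer);
    }
    List<String> targets = new ArrayList<>();
    for (int i = 0; i < replicaCount; i++) {
      targets.add(candidates.get(i % candidates.size()));
    }
    return targets;
  }
}
```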





[jira] [Created] (HBASE-24581) Replica regions should not trigger any compaction

2020-06-17 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24581:


 Summary: Replica regions should not trigger any compaction
 Key: HBASE-24581
 URL: https://issues.apache.org/jira/browse/HBASE-24581
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


I found that in certain cases replica regions can trigger compaction, such as 
{code:java}
@Override
public void postOpenDeployTasks(final PostOpenDeployContext context) throws IOException {
  HRegion r = context.getRegion();
  long openProcId = context.getOpenProcId();
  long masterSystemTime = context.getMasterSystemTime();
  rpcServices.checkOpen();
  LOG.info("Post open deploy tasks for {}, openProcId={}, masterSystemTime={}",
    r.getRegionInfo().getRegionNameAsString(), openProcId, masterSystemTime);
  // Do checks to see if we need to compact (references or too many files)
  // TODO: SHX, do not do this for replica regions? Otherwise, it is going to
  // lose data locality for primary regions.
  for (HStore s : r.stores.values()) {
    if (s.hasReferences() || s.needsCompaction()) {
      this.compactSplitThread.requestSystemCompaction(r, s, "Opening Region");
    }
  }
  // ...
}
 {code}
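
One possible shape of the guard suggested by the TODO above is sketched here as a standalone predicate. ReplicaCompactionGuard is a hypothetical helper and not the committed fix; in HBase the replica id would come from RegionInfo.getReplicaId(), with RegionInfo.DEFAULT_REPLICA_ID (0) identifying the primary.

```java
// Hypothetical helper; shows only the decision, not the HBase wiring.
final class ReplicaCompactionGuard {
  // Mirrors the convention that replica id 0 is the primary (default) replica.
  static final int DEFAULT_REPLICA_ID = 0;

  static boolean shouldRequestCompactionOnOpen(int replicaId, boolean hasReferences,
      boolean needsCompaction) {
    if (replicaId != DEFAULT_REPLICA_ID) {
      // Replica regions never request a compaction on open.
      return false;
    }
    return hasReferences || needsCompaction;
  }
}
```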





[jira] [Created] (HBASE-24563) Make hbck chore aware of replica region and check/fix replica region consistency

2020-06-15 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24563:


 Summary: Make hbck chore aware of replica region and check/fix 
replica region consistency
 Key: HBASE-24563
 URL: https://issues.apache.org/jira/browse/HBASE-24563
 Project: HBase
  Issue Type: Improvement
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


Hbck1 checks/fixes only primary region consistency and ignores replica regions. 
In hbase 2, the hbck chore needs to be aware of replica regions and check their 
consistency as well. Hbck2 needs to fix replica region inconsistencies.





[jira] [Created] (HBASE-24554) Improve/stable read replica

2020-06-12 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24554:


 Summary: Improve/stable read replica
 Key: HBASE-24554
 URL: https://issues.apache.org/jira/browse/HBASE-24554
 Project: HBase
  Issue Type: Task
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


We have been tracing some read replica issues recently; this is the umbrella 
Jira to track this effort. A few observations so far:
 # The balancer balances replica regions too often; we need to spend time on 
this. A replica region does not serve writes and rarely serves reads (unless 
the client specifically selects the replica region), so data locality should 
be a very minor factor for replica regions. 
 # We need to study split/merge for regions with replicas and make them more 
robust. With proc-v2, they are probably already robust. 

 

 





[jira] [Created] (HBASE-24552) Replica region needs to do check if primary region exists in hdfs during createRegionOnFileSystem().

2020-06-12 Thread Huaxiang Sun (Jira)
Huaxiang Sun created HBASE-24552:


 Summary: Replica region needs to do check if primary region exists 
in hdfs during createRegionOnFileSystem().
 Key: HBASE-24552
 URL: https://issues.apache.org/jira/browse/HBASE-24552
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Affects Versions: 2.3.0
Reporter: Huaxiang Sun
Assignee: Huaxiang Sun


When a replica region is opened, it does not check whether the primary region's 
directory exists or whether .regioninfo exists in that directory. The region 
server will bring this replica region online even if the primary region does 
not exist. 

 

It needs to do more checks and fail the region open if a check does not pass.

Maybe we can do this check in the master; we will see.
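
The check described above could look roughly like the following. This sketch uses java.nio.file on a local filesystem as a stand-in for HDFS, and PrimaryRegionDirCheck is a hypothetical name; the real implementation would use the Hadoop FileSystem API and HBase's own region directory helpers.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration on a local filesystem instead of HDFS.
final class PrimaryRegionDirCheck {
  static final String REGION_INFO_FILE = ".regioninfo";

  // True only if the primary region directory exists and contains a .regioninfo file.
  static boolean primaryRegionExists(Path primaryRegionDir) {
    return Files.isDirectory(primaryRegionDir)
        && Files.isRegularFile(primaryRegionDir.resolve(REGION_INFO_FILE));
  }
}
```

A replica open would be failed when this returns false, instead of recreating the primary region directory.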

 

 





[jira] [Resolved] (HBASE-23202) ExportSnapshot (import) will fail if copying files to root directory takes longer than cleaner TTL

2020-06-08 Thread Huaxiang Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxiang Sun resolved HBASE-23202.
--
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

I updated the patch based on review comments (addressed the missed case where 
the snapshot working dir is different from the hbase root dir, and added a new 
test case for it) and pushed the patch to branch-2.3, branch-2 and master. 
Resolving it.

> ExportSnapshot (import) will fail if copying files to root directory takes 
> longer than cleaner TTL
> --
>
> Key: HBASE-23202
> URL: https://issues.apache.org/jira/browse/HBASE-23202
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 1.5.0, 2.2.1, 1.4.11, 2.1.7
>Reporter: Zach York
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> HBASE-17330 removed the checking of the snapshot .tmp directory when 
> determining which files are candidates for deletes. It appears that in the 
> latest branches, this isn't an issue for taking a snapshot as it checks 
> whether a snapshot is in progress via the SnapshotManager.
> However, when using the ExportSnapshot tool to import a snapshot into a 
> cluster, it will first copy the snapshot manifest into 
> /.snapshot/.tmp/ [1], copies the files, and then renames the 
> snapshot manifest to the final snapshot directory. If the copyFiles job takes 
> longer than the cleaner TTL, the ExportSnapshot job will fail because HFiles 
> will get deleted before the snapshot is committed to the final directory. 
> The ExportSnapshot tool already has a functionality to skipTmp and write the 
> manifest directly to the final location. However, this has unintended 
> consequences such as the snapshot appearing to the user before it is usable. 
> So it looks like we will have to bring back the tmp directory check to avoid 
> this situation.
> [1] 
> https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L1029




