[jira] [Commented] (HBASE-27571) Get supports RAW

2023-01-17 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677684#comment-17677684
 ] 

Bo Cui commented on HBASE-27571:


Hi [~zhangduo], could we push this restriction down to RawScanQueryMatcher, or remove it?

> Get supports RAW
> 
>
> Key: HBASE-27571
> URL: https://issues.apache.org/jira/browse/HBASE-27571
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234]
>  
> I used a `Get` to query all puts and deletes in a column, but I got this error:
> *Cannot specify any column for a raw scan.*
> Why add this restriction?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27571) Get supports RAW

2023-01-17 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677680#comment-17677680
 ] 

Bo Cui commented on HBASE-27571:


I then removed the `if (columns != null && scan.isRaw())` check and got the 
correct result.

My guess is that the check exists because it cannot be sure whether the family 
has a delete operation (`delete 't1','r1','f1',1673945640117`).

But if the caller has determined that the family has no delete operation, we 
should allow such a `Get`.
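
A minimal sketch of the check discussed above, written in plain Java rather than against the HBase code base. `strictCheck` mirrors the quoted condition and error message. `relaxedCheck` and its `callerAssertsNoFamilyDelete` flag are hypothetical names introduced only to illustrate the proposal; nothing like them exists in HBase as far as this thread shows.

```java
import java.util.Set;

/** Plain-Java model of the guard discussed in this issue; not HBase code. */
class RawScanGuardSketch {

  /** Mirrors the quoted condition: a raw scan may not name explicit columns. */
  static void strictCheck(Set<String> columns, boolean raw) {
    if (columns != null && raw) {
      throw new IllegalArgumentException("Cannot specify any column for a raw scan");
    }
  }

  /**
   * Hypothetical relaxed variant: the caller opts in by asserting that the
   * family holds no family-level delete. The flag is an assumption made for
   * this sketch only.
   */
  static void relaxedCheck(Set<String> columns, boolean raw,
      boolean callerAssertsNoFamilyDelete) {
    if (columns != null && raw && !callerAssertsNoFamilyDelete) {
      throw new IllegalArgumentException("Cannot specify any column for a raw scan");
    }
  }

  public static void main(String[] args) {
    Set<String> columns = Set.of("q1");
    try {
      strictCheck(columns, true);
    } catch (IllegalArgumentException e) {
      System.out.println("strict check rejected the request: " + e.getMessage());
    }
    // The relaxed check accepts the same request once the caller vouches for the family.
    relaxedCheck(columns, true, true);
    System.out.println("relaxed check accepted the request");
  }
}
```

Whether such an opt-in is safe depends on why the restriction was added, which is the open question in this issue.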

> Get supports RAW
> 
>
> Key: HBASE-27571
> URL: https://issues.apache.org/jira/browse/HBASE-27571
> Project: HBase
>  Issue Type: Improvement
>Reporter: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234]
>  
> I used a `Get` to query all puts and deletes in a column, but I got this error:
> *Cannot specify any column for a raw scan.*
> Why add this restriction?





[jira] [Created] (HBASE-27571) Get supports RAW

2023-01-16 Thread Bo Cui (Jira)
Bo Cui created HBASE-27571:
--

 Summary: Get supports RAW
 Key: HBASE-27571
 URL: https://issues.apache.org/jira/browse/HBASE-27571
 Project: HBase
  Issue Type: Improvement
Reporter: Bo Cui


[https://github.com/apache/hbase/blob/da261344cc55e7812dfe22d86d5fa88c93ed79b9/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L234]

 

I used a `Get` to query all puts and deletes in a column, but I got this error:

*Cannot specify any column for a raw scan.*

Why add this restriction?





[jira] [Commented] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart

2021-01-15 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-20727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266480#comment-17266480
 ] 

Bo Cui commented on HBASE-20727:


[~allan163] Hi, can we optimize this feature?
1) Write the new file to the tmp directory first. Delete the old file only 
after the new file has been written and moved successfully, because writing 
the new file may fail.
2) Can we write the new file in batches? If HBase has too many regions, the 
FlushedSequenceIdFlusher occupies the master ChoreService for a long time, as 
in HBASE-25506.
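
The first suggestion can be sketched as follows. This is an illustration on the local file system using `java.nio.file`, not HDFS and not the HBase implementation; the class name, the `.tmp` suffix, and the file name in `main` are assumptions. Here the move replaces the old file in a single step, so a failed write leaves the old file untouched.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative write-to-tmp-then-move pattern on the local file system. */
class TmpThenMoveSketch {

  /**
   * Writes content to a temporary sibling of target, then moves it over
   * target. If the write fails, target still holds its previous content.
   */
  static void replaceFile(Path target, String content) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try {
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
      // The old file is only replaced once the new one is fully written.
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    } finally {
      // Cleans up a leftover temporary file if the write or the move failed.
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("seqid-sketch");
    Path target = dir.resolve("flushed.seqid");
    replaceFile(target, "first snapshot");
    replaceFile(target, "second snapshot");
    System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
  }
}
```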

> Persist FlushedSequenceId to speed up WAL split after cluster restart
> -
>
> Key: HBASE-20727
> URL: https://issues.apache.org/jira/browse/HBASE-20727
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.0.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: HBASE-20727.002.patch, HBASE-20727.003.patch, 
> HBASE-20727.004.patch, HBASE-20727.005.patch, HBASE-20727.patch
>
>
> We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in 
> ServerManager to record the latest flushed seqids of regions and stores. So 
> during log split, we can use seqids stored in those maps to filter out the 
> edits which do not need to be replayed. But, those maps are not persisted. 
> After cluster restart or master restart, info of flushed seqids are all lost. 
> Here I offer a way to persist those info to HDFS, even if master restart, we 
> can still use those info to filter WAL edits and then to speed up replay.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25506) ServerManager affects MTTR of HMaster

2021-01-14 Thread Bo Cui (Jira)
Bo Cui created HBASE-25506:
--

 Summary: ServerManager affects MTTR of HMaster
 Key: HBASE-25506
 URL: https://issues.apache.org/jira/browse/HBASE-25506
 Project: HBase
  Issue Type: Improvement
  Components: MTTR
Reporter: Bo Cui
 Attachments: image-2021-01-14-17-44-16-091.png, 
image-2021-01-14-17-44-42-181.png

https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925

If a cluster has N+W regions, this 
removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time...
 !image-2021-01-14-17-44-16-091.png! 
 !image-2021-01-14-17-44-42-181.png! 





[jira] [Assigned] (HBASE-25506) ServerManager affects MTTR of HMaster

2021-01-14 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25506:
--

Assignee: Bo Cui

> ServerManager affects MTTR of HMaster
> -
>
> Key: HBASE-25506
> URL: https://issues.apache.org/jira/browse/HBASE-25506
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2021-01-14-17-44-16-091.png, 
> image-2021-01-14-17-44-42-181.png
>
>
> https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925
> If a cluster has N+W regions, this 
> removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time...
>  !image-2021-01-14-17-44-16-091.png! 
>  !image-2021-01-14-17-44-42-181.png! 





[jira] [Updated] (HBASE-25506) ServerManager affects MTTR of HMaster

2021-01-14 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25506:
---
Affects Version/s: 3.0.0-alpha-1

> ServerManager affects MTTR of HMaster
> -
>
> Key: HBASE-25506
> URL: https://issues.apache.org/jira/browse/HBASE-25506
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2021-01-14-17-44-16-091.png, 
> image-2021-01-14-17-44-42-181.png
>
>
> https://github.com/apache/hbase/blob/3488c44a21612aae1835fc3e91a4a12ed2abb8b7/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L925
> If a cluster has N+W regions, this 
> removeDeletedRegionFromLoadedFlushedSequenceIds takes a long time...
>  !image-2021-01-14-17-44-16-091.png! 
>  !image-2021-01-14-17-44-42-181.png! 





[jira] [Assigned] (HBASE-25483) set the loadMeta log level to debug.

2021-01-07 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25483:
--

Assignee: Bo Cui

> set the loadMeta log level to debug.
> 
>
> Key: HBASE-25483
> URL: https://issues.apache.org/jira/browse/HBASE-25483
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/2444d268901644d90def3fca39505627ff956b40/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L167
> In a test with 100w (1,000,000) regions, loading metadata takes more than 250 
> seconds when this message is logged at info level. When it is logged at debug 
> level, loading takes more than 100 seconds.





[jira] [Created] (HBASE-25483) set the loadMeta log level to debug.

2021-01-07 Thread Bo Cui (Jira)
Bo Cui created HBASE-25483:
--

 Summary: set the loadMeta log level to debug.
 Key: HBASE-25483
 URL: https://issues.apache.org/jira/browse/HBASE-25483
 Project: HBase
  Issue Type: Improvement
  Components: MTTR, Region Assignment
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


https://github.com/apache/hbase/blob/2444d268901644d90def3fca39505627ff956b40/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L167
In a test with 100w (1,000,000) regions, loading metadata takes more than 250 
seconds when this message is logged at info level. When it is logged at debug 
level, loading takes more than 100 seconds.
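
A rough sketch of the idea, using `java.util.logging` as a stand-in for the logger HBase actually uses. The class name, the message text, and the region names are made up for illustration. With the default configuration, `Level.FINE` (roughly "debug") is disabled, so the per-region message is neither built nor written.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrates per-item logging at a level that is normally disabled. */
class LoadMetaLoggingSketch {
  private static final Logger LOG = Logger.getLogger("LoadMetaLoggingSketch");

  /** Visits regionCount hypothetical regions and returns how many were visited. */
  static int loadRegions(int regionCount, Level perRegionLevel) {
    int loaded = 0;
    for (int i = 0; i < regionCount; i++) {
      // Guard so the message string is only built when the level is enabled.
      if (LOG.isLoggable(perRegionLevel)) {
        LOG.log(perRegionLevel, "loaded region-" + i);
      }
      loaded++;
    }
    return loaded;
  }

  public static void main(String[] args) {
    // FINE is off by default, so this loop produces no log output.
    int loaded = loadRegions(100_000, Level.FINE);
    System.out.println("loaded " + loaded + " regions");
  }
}
```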





[jira] [Assigned] (HBASE-25461) when the cluster has many tables, UI can be opened quickly

2021-01-04 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25461:
--

Assignee: Bo Cui

> when the cluster has many tables, UI can be opened quickly
> --
>
> Key: HBASE-25461
> URL: https://issues.apache.org/jira/browse/HBASE-25461
> Project: HBase
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> My cluster has 60K+ tables, and the UI opens slowly.
> Based on the following code, we can reduce the number of class conversion steps:
> rsgroup.jsp
> https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/rsgroup.jsp#L439
> snapshot.jsp
> https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp#L42
> RegionStates.java
> https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L538
> RSStatusTmpl.jamon
> https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon#L47





[jira] [Created] (HBASE-25461) when the cluster has many tables, UI can be opened quickly

2021-01-04 Thread Bo Cui (Jira)
Bo Cui created HBASE-25461:
--

 Summary: when the cluster has many tables, UI can be opened quickly
 Key: HBASE-25461
 URL: https://issues.apache.org/jira/browse/HBASE-25461
 Project: HBase
  Issue Type: Improvement
  Components: UI
Affects Versions: 3.0.0-alpha-1
Reporter: Bo Cui


My cluster has 60K+ tables, and the UI opens slowly.
Based on the following code, we can reduce the number of class conversion steps:

rsgroup.jsp
https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/rsgroup.jsp#L439
snapshot.jsp
https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/resources/hbase-webapps/master/snapshot.jsp#L42
RegionStates.java
https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L538
RSStatusTmpl.jamon
https://github.com/apache/hbase/blob/600be60a4bd4d3b3e9652027a0cb8bdd32016c6b/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon#L47





[jira] [Commented] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-28 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255772#comment-17255772
 ] 

Bo Cui commented on HBASE-25447:


1. The timeoutThread rarely encounters exceptions. If it does throw one, the 
master node probably has a serious problem, such as a resource leak, so 
stopping the master is better than having the timeoutThread retry.
2. In a production environment we have two masters, one active and one 
standby. The standby is likely to be healthy, so HBase can recover quickly.

So I think aborting the master is better than having the timeoutThread retry.
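
A minimal sketch of the "abort instead of silently exiting" option, in plain Java. It is not the RemoteProcedureDispatcher code: `runLoop`, the task list, and the `abortHandler` callback are hypothetical stand-ins. The point it illustrates is that a failure inside the loop is handed to an abort hook rather than being lost with the dead thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Illustrative worker loop that reports a fatal failure instead of dying quietly. */
class TimeoutLoopSketch {

  /**
   * Runs tasks in order until one fails. On failure the loop stops and the
   * throwable is passed to abortHandler. Returns the number of completed tasks.
   */
  static int runLoop(Iterable<Runnable> tasks, Consumer<Throwable> abortHandler) {
    AtomicBoolean running = new AtomicBoolean(true);
    Iterator<Runnable> it = tasks.iterator();
    int done = 0;
    while (running.get() && it.hasNext()) {
      try {
        it.next().run();
        done++;
      } catch (Throwable t) {
        // Errors such as OutOfMemoryError land here as well as exceptions.
        running.set(false);
        abortHandler.accept(t);
      }
    }
    return done;
  }

  public static void main(String[] args) {
    List<Throwable> reported = new ArrayList<>();
    List<Runnable> tasks = Arrays.asList(
        () -> { },
        () -> { throw new OutOfMemoryError("simulated"); },
        () -> { });
    int done = runLoop(tasks, reported::add);
    System.out.println("completed=" + done + ", reported=" + reported.size());
  }
}
```

In a real master the handler would trigger the abort so that the standby can take over; here it only records the failure.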

> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If resource leakage occurs due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Updated] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25447:
---
Description: 
https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
If resource leakage occurs due to other components or reasons, 
BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
while (running.get()), and some procs will stuck...
 !image-2020-12-26-11-49-38-018.png! 



  was:
https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
If a node leaks resources due to other components or reasons, 
BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
while (running.get()), and some procs will stuck...
 !image-2020-12-26-11-49-38-018.png! 




> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If resource leakage occurs due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Issue Comment Deleted] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25447:
---
Comment: was deleted

(was: 2 solution :
1. hmaster abort 
2. ProcedureDispatcherTimeoutThread does not exit.)

> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a node leaks resources due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Commented] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254942#comment-17254942
 ] 

Bo Cui commented on HBASE-25447:


Two possible solutions:
1. Abort the HMaster.
2. Keep ProcedureDispatcherTimeoutThread from exiting.

> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a node leaks resources due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Updated] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25447:
---
Description: 
https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
If a node leaks resources due to other components or reasons, 
BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
while (running.get()), and some procs will stuck...
 !image-2020-12-26-11-49-38-018.png! 



  was:
https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
If a node leaks resources due to other components or reasons, 
BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
while (running.get()), and some procs will stuck...
 !image-2020-12-26-11-49-38-018.png! 


> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a node leaks resources due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Assigned] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25447:
--

Assignee: Bo Cui

> remoteProc is suspended due to OOM ERROR
> 
>
> Key: HBASE-25447
> URL: https://issues.apache.org/jira/browse/HBASE-25447
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-12-26-11-49-38-018.png
>
>
> https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
> If a node leaks resources due to other components or reasons, 
> BufferNode#dispatch() may fail. and then TimeoutExecutorThread will exit the 
> while (running.get()), and some procs will stuck...
>  !image-2020-12-26-11-49-38-018.png! 





[jira] [Created] (HBASE-25447) remoteProc is suspended due to OOM ERROR

2020-12-25 Thread Bo Cui (Jira)
Bo Cui created HBASE-25447:
--

 Summary: remoteProc is suspended due to OOM ERROR
 Key: HBASE-25447
 URL: https://issues.apache.org/jira/browse/HBASE-25447
 Project: HBase
  Issue Type: Bug
  Components: proc-v2
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui
 Attachments: image-2020-12-26-11-49-38-018.png

https://github.com/apache/hbase/blob/0f868da05d7ffabe4512a0cae110ed097b033ebf/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L317
If a node leaks resources because of other components or for other reasons, 
BufferNode#dispatch() may fail. TimeoutExecutorThread then exits the 
while (running.get()) loop, and some procedures get stuck.
 !image-2020-12-26-11-49-38-018.png! 





[jira] [Commented] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in old

2020-12-12 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248504#comment-17248504
 ] 

Bo Cui commented on HBASE-23340:


[~zhangduo] [~vjasani] Thanks for the review.
[~zhangduo] I have submitted a new PR against branch-2.

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> resulits in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: jackylau
>Assignee: Bo Cui
>Priority: Major
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> hmaster /hbase/replication/rs session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> resulits in oldWALs too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I do not think this is a good idea, 
> because we do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either when users rely 
> on this feature.
> 3) Add a retry limit: for example, after it has happened three times, stop 
> the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas. I do not know whether they are suitable. If they 
> are, could I submit a PR?
> Does anyone have a good idea?





[jira] [Assigned] (HBASE-24395) ServerName#getHostname() is case sensitive

2020-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24395:
--

Assignee: Bo Cui

> ServerName#getHostname() is case sensitive
> --
>
> Key: HBASE-24395
> URL: https://issues.apache.org/jira/browse/HBASE-24395
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer
>Affects Versions: 1.3.1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: HBase-24395.patch, image-2020-05-18-17-42-57-119.png
>
>
> In the ServerName class, getServerName(String hostName, int port, long 
> startcode), equals, and compareTo are case insensitive, but getHostname() is 
> case sensitive.
> If hostName is HOSTNAME1, the ServerName is hostname1,1,1589615319931, but 
> getHostname() returns HOSTNAME1.
> BaseLoadBalancer#retainAssignment() then uses ServerName#getHostname(). 
> All keys of serversByHostname are upper case 
> (HOSTNAME1, HOSTNAME2, HOSTNAME3, HOSTNAME4...) from 
> ServerManager#createDestinationServersList, but oldServerName.getHostname() 
> is lower case (hostname1, hostname2, hostname3...) from the walLog dir.
> !image-2020-05-18-17-42-57-119.png!
> As a result, all regions of the old ServerName are assigned to random hosts.
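
The mismatch described in the quoted issue can be reproduced with an ordinary map. The sketch below is plain Java, not the balancer code; `index` and `normalize` are hypothetical helpers, and lower-casing with `Locale.ROOT` is one possible normalization, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Shows how a case mismatch between map keys and lookups loses entries. */
class HostnameLookupSketch {

  /** One possible normalization: locale-independent lower case. */
  static String normalize(String hostname) {
    return hostname.toLowerCase(Locale.ROOT);
  }

  /** Builds hostname -> position; keys are normalized only when requested. */
  static Map<String, Integer> index(List<String> hostnames, boolean normalizeKeys) {
    Map<String, Integer> byHost = new HashMap<>();
    for (int i = 0; i < hostnames.size(); i++) {
      String host = hostnames.get(i);
      byHost.put(normalizeKeys ? normalize(host) : host, i);
    }
    return byHost;
  }

  public static void main(String[] args) {
    List<String> liveServers = List.of("HOSTNAME1", "HOSTNAME2");
    String oldHost = "hostname1"; // lower case, as read back from the old server name

    // Raw keys: the lookup misses, so the old assignment cannot be retained.
    System.out.println(index(liveServers, false).get(oldHost));
    // Normalized keys and lookup: the old host is found again.
    System.out.println(index(liveServers, true).get(normalize(oldHost)));
  }
}
```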





[jira] [Commented] (HBASE-24395) ServerName#getHostname() is case sensitive

2020-11-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236622#comment-17236622
 ] 

Bo Cui commented on HBASE-24395:


[https://github.com/apache/hbase/pull/2690]

update

> ServerName#getHostname() is case sensitive
> --
>
> Key: HBASE-24395
> URL: https://issues.apache.org/jira/browse/HBASE-24395
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer
>Affects Versions: 1.3.1
>Reporter: Bo Cui
>Priority: Major
> Attachments: HBase-24395.patch, image-2020-05-18-17-42-57-119.png
>
>
> In the ServerName class, getServerName(String hostName, int port, long 
> startcode), equals, and compareTo are case insensitive, but getHostname() is 
> case sensitive.
> If hostName is HOSTNAME1, the ServerName is hostname1,1,1589615319931, but 
> getHostname() returns HOSTNAME1.
> BaseLoadBalancer#retainAssignment() then uses ServerName#getHostname(). 
> All keys of serversByHostname are upper case 
> (HOSTNAME1, HOSTNAME2, HOSTNAME3, HOSTNAME4...) from 
> ServerManager#createDestinationServersList, but oldServerName.getHostname() 
> is lower case (hostname1, hostname2, hostname3...) from the walLog dir.
> !image-2020-05-18-17-42-57-119.png!
> As a result, all regions of the old ServerName are assigned to random hosts.





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-11-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236614#comment-17236614
 ] 

Bo Cui commented on HBASE-24924:


{quote}
You still do not answer my question...
How could this happen without data corruption from outside?
{quote}
If the data had not been corrupted, the problem would not have occurred.

But the data corruption exists, and we don't know how the data was damaged.

We have a lot of clusters, and should consider how to recover automatically.

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished, and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck, 
> because during startup an existing InitMetaProcedure recreates a new 
> CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!
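
A simplified model of the hang described in the quoted issue, in plain Java. `ReloadedProcedure` and `awaitInit` are hypothetical stand-ins and do not reflect the real InitMetaProcedure code. The assumption modelled here is that a reloaded procedure gets a fresh latch, while only a procedure that still has work to do ever counts it down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Models a waiter blocked on a latch that a finished procedure never releases. */
class RecreatedLatchSketch {

  /** Stand-in for a procedure reloaded at startup: each instance gets a new latch. */
  static final class ReloadedProcedure {
    final CountDownLatch latch = new CountDownLatch(1);
    final boolean finished;

    ReloadedProcedure(boolean finished) {
      this.finished = finished;
    }

    /** Only a procedure that is not yet finished runs and releases the latch. */
    void executeIfNeeded() {
      if (!finished) {
        latch.countDown();
      }
    }
  }

  /** Returns true if the waiter was released before the timeout expired. */
  static boolean awaitInit(ReloadedProcedure proc, long timeoutMillis)
      throws InterruptedException {
    proc.executeIfNeeded();
    return proc.latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("unfinished procedure released waiter: "
        + awaitInit(new ReloadedProcedure(false), 100));
    System.out.println("finished procedure released waiter: "
        + awaitInit(new ReloadedProcedure(true), 100));
  }
}
```

The sketch uses a timed wait so that it terminates; a waiter that awaits without a timeout would block indefinitely in the second case.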





[jira] [Updated] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWA

2020-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-23340:
---
Status: Patch Available  (was: Open)

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> resulits in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.2.3, 3.0.0-alpha-1
>Reporter: jackylau
>Assignee: Bo Cui
>Priority: Major
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> hmaster /hbase/replication/rs session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> resulits in oldWALs too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I do not think this is a good idea, 
> because we do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either when users rely 
> on this feature.
> 3) Add a retry limit: for example, after it has happened three times, stop 
> the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas. I do not know whether they are suitable. If they 
> are, could I submit a PR?
> Does anyone have a good idea?





[jira] [Updated] (HBASE-25311) ui throws NPE

2020-11-21 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25311:
---
Status: Patch Available  (was: Open)

> ui throws NPE
> -
>
> Key: HBASE-25311
> URL: https://issues.apache.org/jira/browse/HBASE-25311
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.3, 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624
>  If a RegionServer has an invalid znode and the master is restarted, the UI throws an NPE.
> I encountered this problem during an upgrade.
> Workaround: restart HBase.





[jira] [Updated] (HBASE-25317) [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf

2020-11-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25317:
---
Description: 
use git pull to obtain an exception, because filename has ':'

!image-2020-11-21-14-48-23-794.png!

workaround:git config core.protectNTFS false

[~stack] i think we can ranme filename...

.

  was:
use git pull to obtain an exception

!image-2020-11-21-14-48-23-794.png!

workaround:git config core.protectNTFS false

[~stack] i think we can ranme filename...

.


> [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf
> --
>
> Key: HBASE-25317
> URL: https://issues.apache.org/jira/browse/HBASE-25317
> Project: HBase
>  Issue Type: Bug
>Reporter: Bo Cui
>Priority: Minor
> Attachments: image-2020-11-21-14-48-23-794.png
>
>
> git pull fails with an exception because the filename contains ':'
> !image-2020-11-21-14-48-23-794.png!
> Workaround: git config core.protectNTFS false
> [~stack] I think we can rename the file...
> .





[jira] [Created] (HBASE-25317) [github]rename HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf

2020-11-20 Thread Bo Cui (Jira)
Bo Cui created HBASE-25317:
--

 Summary: [github]rename 
HBASE-18070-ROOT_hbase:meta_Region_Replicas.pdf
 Key: HBASE-25317
 URL: https://issues.apache.org/jira/browse/HBASE-25317
 Project: HBase
  Issue Type: Bug
Reporter: Bo Cui
 Attachments: image-2020-11-21-14-48-23-794.png

Running git pull fails with an exception

!image-2020-11-21-14-48-23-794.png!

Workaround: git config core.protectNTFS false

[~stack] I think we can rename the file...

.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldW

2020-11-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-23340:
--

Assignee: Bo Cui  (was: jackylau)

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> results in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: jackylau
>Assignee: Bo Cui
>Priority: Major
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> The HMaster session for /hbase/replication/rs expired (HBase replication is enabled 
> by default, although we do not use it), so the log cleaner cannot clean oldWALs, and 
> oldWALs grows too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I do not think this is a good idea, because we 
> do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either when our users use 
> this feature.
> 3) Add a retry limit: for example, once it has happened three 
> times, stop the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas. I do not know whether they are suitable. If they are, could 
> I submit a PR?
> Does anyone have a better idea?
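
Option 3 above can be sketched roughly as follows. This is only an illustration of a failure counter with a limit, not the actual cleaner code: the class FailureLimitSketch and its methods are hypothetical names, and counting only consecutive failures is an assumption that the description does not state.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of HBase: stops a periodic task after too many failures.
public class FailureLimitSketch {
  private final int maxFailures;
  private final AtomicInteger consecutiveFailures = new AtomicInteger();
  private volatile boolean stopped;

  FailureLimitSketch(int maxFailures) {
    this.maxFailures = maxFailures;
  }

  // Called when a cleaner run fails, for example because the session expired.
  void recordFailure() {
    if (consecutiveFailures.incrementAndGet() >= maxFailures) {
      stopped = true;
    }
  }

  // Assumption: a successful run resets the counter, so only consecutive failures count.
  void recordSuccess() {
    consecutiveFailures.set(0);
  }

  boolean isStopped() {
    return stopped;
  }

  // Demo helper: reports whether the limit is reached after the given number of failures.
  static boolean stoppedAfter(int failures, int maxFailures) {
    FailureLimitSketch limit = new FailureLimitSketch(maxFailures);
    for (int i = 0; i < failures; i++) {
      limit.recordFailure();
    }
    return limit.isStopped();
  }

  public static void main(String[] args) {
    System.out.println(stoppedAfter(2, 3));
    System.out.println(stoppedAfter(3, 3));
  }
}
```

With a limit of three, the flag stays unset after two failures and is set by the third. What a stopped cleaner should then do with oldWALs is left open here, as it is in the description.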



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in oldWA

2020-11-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-23340:
---
Affects Version/s: 3.0.0-alpha-1
   2.2.3

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> results in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: jackylau
>Assignee: jackylau
>Priority: Major
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> The HMaster session for /hbase/replication/rs expired (HBase replication is enabled 
> by default, although we do not use it), so the log cleaner cannot clean oldWALs, and 
> oldWALs grows too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I do not think this is a good idea, because we 
> do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either when our users use 
> this feature.
> 3) Add a retry limit: for example, once it has happened three 
> times, stop the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas. I do not know whether they are suitable. If they are, could 
> I submit a PR?
> Does anyone have a better idea?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25311) ui throws NPE

2020-11-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25311:
---
Summary: ui throws NPE  (was: hbase ui throws NPE)

> ui throws NPE
> -
>
> Key: HBASE-25311
> URL: https://issues.apache.org/jira/browse/HBASE-25311
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624
> If a RegionServer has an invalid znode and the master is restarted, the UI throws an NPE.
> I encountered this problem during the upgrade.
> Workaround: restart HBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25311) hbase ui throws NPE

2020-11-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25311:
--

Assignee: Bo Cui

> hbase ui throws NPE
> ---
>
> Key: HBASE-25311
> URL: https://issues.apache.org/jira/browse/HBASE-25311
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624
> If a RegionServer has an invalid znode and the master is restarted, the UI throws an NPE.
> I encountered this problem during the upgrade.
> Workaround: restart HBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25311) hbase ui throws NPE

2020-11-20 Thread Bo Cui (Jira)
Bo Cui created HBASE-25311:
--

 Summary: hbase ui throws NPE
 Key: HBASE-25311
 URL: https://issues.apache.org/jira/browse/HBASE-25311
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


https://github.com/apache/hbase/blob/eca904e0fb438461a8da3f37cea3eaf496988be9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3624

If a RegionServer has an invalid znode and the master is restarted, the UI throws an NPE.
I encountered this problem during the upgrade.

Workaround: restart HBase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23340) hmaster /hbase/replication/rs session expired (hbase replication default value is true, we don't use ) causes logcleaner can not clean oldWALs, which resulits in old

2020-11-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236011#comment-17236011
 ] 

Bo Cui commented on HBASE-23340:


I'll submit the PR.

> hmaster  /hbase/replication/rs  session expired (hbase replication default 
> value is true, we don't use ) causes logcleaner can not clean oldWALs, which 
> results in oldWALs too large (more than 2TB)
> -
>
> Key: HBASE-23340
> URL: https://issues.apache.org/jira/browse/HBASE-23340
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: jackylau
>Assignee: jackylau
>Priority: Major
> Attachments: Snipaste_2019-11-21_10-39-25.png, 
> Snipaste_2019-11-21_14-10-36.png
>
>
> The HMaster session for /hbase/replication/rs expired (HBase replication is enabled 
> by default, although we do not use it), so the log cleaner cannot clean oldWALs, and 
> oldWALs grows too large (more than 2TB).
> !Snipaste_2019-11-21_10-39-25.png!
>  
> !Snipaste_2019-11-21_14-10-36.png!
>  
> We can solve it in one of the following ways:
> 1) Increase the session timeout (but I do not think this is a good idea, because we 
> do not know what value is suitable).
> 2) Disable HBase replication. This is not a good idea either when our users use 
> this feature.
> 3) Add a retry limit: for example, once it has happened three 
> times, stop the ReplicationLogCleaner and SnapShotCleaner.
> Those are all my ideas. I do not know whether they are suitable. If they are, could 
> I submit a PR?
> Does anyone have a better idea?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans

2020-10-07 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25092:
---
Status: Patch Available  (was: Open)

> RSGroupBalancer#assignments lost some regionPlans
> -
>
> Key: HBASE-25092
> URL: https://issues.apache.org/jira/browse/HBASE-25092
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.2.3, 2.3.1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216
> When fallback is enabled and servers contains no RegionServer of the current group but 
> does contain RegionServers of another group, the region is assigned to the other group. 
> But assignments already contains targetRS, so assignments.putAll overwrites the 
> old entry.
> {code:java}
> this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
>   .forEach((serverName, regionInfos) -> {
>     assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
>       .addAll(regionInfos);
>   });
> {code}
> The issue exists only in branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25111) Admin supports multi region merge

2020-09-28 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25111:
--

Assignee: Bo Cui

> Admin supports multi region merge
> -
>
> Key: HBASE-25111
> URL: https://issues.apache.org/jira/browse/HBASE-25111
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/68b56beab744e983df0877eec9f576ef884a2807/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L889
> Judging from MasterRpcServices and mergeProc, the master supports merging multiple regions,
> but Admin does not support it.
> We can enhance Admin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25111) Admin supports multi region merge

2020-09-28 Thread Bo Cui (Jira)
Bo Cui created HBASE-25111:
--

 Summary: Admin supports multi region merge
 Key: HBASE-25111
 URL: https://issues.apache.org/jira/browse/HBASE-25111
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


https://github.com/apache/hbase/blob/68b56beab744e983df0877eec9f576ef884a2807/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java#L889
Judging from MasterRpcServices and mergeProc, the master supports merging multiple regions,
but Admin does not support it.
We can enhance Admin.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans

2020-09-23 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-25092:
--

Assignee: Bo Cui

> RSGroupBalancer#assignments lost some regionPlans
> -
>
> Key: HBASE-25092
> URL: https://issues.apache.org/jira/browse/HBASE-25092
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.3.1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216
> When fallback is enabled and servers contains no RegionServer of the current group but 
> does contain RegionServers of another group, the region is assigned to the other group. 
> But assignments already contains targetRS, so assignments.putAll overwrites the 
> old entry.
> {code:java}
> this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
>   .forEach((serverName, regionInfos) -> {
>     assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
>       .addAll(regionInfos);
>   });
> {code}
> The issue exists only in branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25093) the RSGroupBasedLoadBalancer#retainAssignment throws NPE

2020-09-23 Thread Bo Cui (Jira)
Bo Cui created HBASE-25093:
--

 Summary: the RSGroupBasedLoadBalancer#retainAssignment throws NPE
 Key: HBASE-25093
 URL: https://issues.apache.org/jira/browse/HBASE-25093
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3, 2.3.1, 3.0.0-alpha-1
Reporter: Bo Cui


when BaseLoadBalancer#
https://github.com/apache/hbase/blob/8bfa2cb2eedcf050b26a28961e1b77dbf3cd8c95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java#L1433

If the result of the BaseLoadBalancer#retainAssignment is null, the 
RSGroupBasedLoadBalancer#retainAssignment will throw NPE.

https://github.com/apache/hbase/blob/8bfa2cb2eedcf050b26a28961e1b77dbf3cd8c95/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L206
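
A minimal sketch of the kind of null guard that would avoid this NPE, assuming the inner call may return null. NullSafeRetainSketch, orEmpty and mergeInto are hypothetical names used only for illustration, and String stands in for the HBase ServerName and RegionInfo types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the HBase balancer code.
public class NullSafeRetainSketch {
  // Treats a null result from the inner balancer as "no assignments".
  static Map<String, List<String>> orEmpty(Map<String, List<String>> result) {
    return result == null ? Collections.<String, List<String>>emptyMap() : result;
  }

  // Merges the inner result into assignments and returns the number of servers present.
  static int mergeInto(Map<String, List<String>> assignments, Map<String, List<String>> inner) {
    orEmpty(inner).forEach((server, regions) ->
        assignments.computeIfAbsent(server, s -> new ArrayList<>()).addAll(regions));
    return assignments.size();
  }

  public static void main(String[] args) {
    Map<String, List<String>> assignments = new HashMap<>();
    // A null inner result is ignored instead of being dereferenced.
    System.out.println(mergeInto(assignments, null));
    Map<String, List<String>> inner = new HashMap<>();
    inner.put("rs1", Arrays.asList("regionA"));
    System.out.println(mergeInto(assignments, inner));
  }
}
```

Whether the caller should guard against null or the inner method should never return null is a design choice that this sketch does not settle.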



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans

2020-09-23 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-25092:
---
Description: 
https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216

When fallback is enabled and servers contains no RegionServer of the current group but 
does contain RegionServers of another group, the region is assigned to the other group. 
But assignments already contains targetRS, so assignments.putAll overwrites the 
old entry.

{code:java}
this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
  .forEach((serverName, regionInfos) -> {
    assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
      .addAll(regionInfos);
  });
{code}

The issue exists only in branch-2.

  was:
https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216

when fallbak is enabled, servers does not contain the current group's rs, and 
contains the rs of other group, region will be assigend to other group, but 
assignments already contains targetRS, and then assignments.putAll overwrites 
old entry

{code:java}
this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
  .forEach((serverName, regionInfos) -> {
assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
.addAll(regionInfos);
});
{code}

the issue exists only in the branch-2.


> RSGroupBalancer#assignments lost some regionPlans
> -
>
> Key: HBASE-25092
> URL: https://issues.apache.org/jira/browse/HBASE-25092
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 2.3.1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216
> When fallback is enabled and servers contains no RegionServer of the current group but 
> does contain RegionServers of another group, the region is assigned to the other group. 
> But assignments already contains targetRS, so assignments.putAll overwrites the 
> old entry.
> {code:java}
> this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
>   .forEach((serverName, regionInfos) -> {
>     assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
>       .addAll(regionInfos);
>   });
> {code}
> The issue exists only in branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25092) RSGroupBalancer#assignments lost some regionPlans

2020-09-23 Thread Bo Cui (Jira)
Bo Cui created HBASE-25092:
--

 Summary: RSGroupBalancer#assignments lost some regionPlans
 Key: HBASE-25092
 URL: https://issues.apache.org/jira/browse/HBASE-25092
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 2.2.3, 2.3.1
Reporter: Bo Cui


https://github.com/apache/hbase/blob/b2f2c79d8fa18fb691e669419004cc5168b0838d/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java#L216

When fallback is enabled and servers contains no RegionServer of the current group but 
does contain RegionServers of another group, the region is assigned to the other group. 
But assignments already contains targetRS, so assignments.putAll overwrites the 
old entry.

{code:java}
this.internalBalancer.retainAssignment(currentAssignmentMap, candidateList)
  .forEach((serverName, regionInfos) -> {
    assignments.computeIfAbsent(serverName, s -> new ArrayList<>())
      .addAll(regionInfos);
  });
{code}

The issue exists only in branch-2.
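
The difference between putAll and the computeIfAbsent/addAll pattern can be reproduced with plain Java maps. The sketch below is self-contained and does not use HBase types: AssignmentMergeSketch and its methods are hypothetical, and String stands in for ServerName and RegionInfo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the HBase balancer code.
public class AssignmentMergeSketch {
  // Lossy variant: putAll replaces the whole region list of a server that is already present.
  static Map<String, List<String>> overwrite(Map<String, List<String>> assignments,
      Map<String, List<String>> retained) {
    assignments.putAll(retained);
    return assignments;
  }

  // Merging variant: regions are appended to the list already stored for the server.
  static Map<String, List<String>> merge(Map<String, List<String>> assignments,
      Map<String, List<String>> retained) {
    retained.forEach((server, regions) ->
        assignments.computeIfAbsent(server, s -> new ArrayList<>()).addAll(regions));
    return assignments;
  }

  // Builds a one-entry map with a mutable region list.
  static Map<String, List<String>> sample(String server, String... regions) {
    Map<String, List<String>> map = new HashMap<>();
    map.put(server, new ArrayList<>(Arrays.asList(regions)));
    return map;
  }

  public static void main(String[] args) {
    // regionA is dropped: only regionB remains for rs1.
    System.out.println(overwrite(sample("rs1", "regionA"), sample("rs1", "regionB")));
    // Both regionA and regionB remain for rs1.
    System.out.println(merge(sample("rs1", "regionA"), sample("rs1", "regionB")));
  }
}
```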



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24962) Optimize BufferNode Lock

2020-09-02 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24962:
---
Status: Patch Available  (was: Open)

> Optimize BufferNode Lock
> 
>
> Key: HBASE-24962
> URL: https://issues.apache.org/jira/browse/HBASE-24962
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 2.2.3, 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373]
> During startup, a large number of OpenRegionProcedures are generated and 
> added to the BufferNode. However, the BufferNode has some "synchronized" 
> methods, and these methods may affect MTTR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24962) Optimize BufferNode Lock

2020-09-02 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24962:
--

Assignee: Bo Cui

> Optimize BufferNode Lock
> 
>
> Key: HBASE-24962
> URL: https://issues.apache.org/jira/browse/HBASE-24962
> Project: HBase
>  Issue Type: Bug
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373]
> During startup, a large number of OpenRegionProcedures are generated and 
> added to the BufferNode. However, the BufferNode has some "synchronized" 
> methods, and these methods may affect MTTR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task

2020-09-02 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24960:
---
Status: Patch Available  (was: Open)

> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.2.3, 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]
>  
> If involvedRegions is null or empty, the RegionServer should skip the subprocedure.
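
The proposed check could look roughly like the following. This is a sketch under assumptions: SkipEmptySubprocedureSketch and shouldSkip are hypothetical names, and a list of region names stands in for the real region list type.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not the RegionServer procedure manager code.
public class SkipEmptySubprocedureSketch {
  // Returns true when there is no region to act on, so no subprocedure needs to be built.
  static boolean shouldSkip(List<String> involvedRegions) {
    return involvedRegions == null || involvedRegions.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(shouldSkip(null));
    System.out.println(shouldSkip(Collections.<String>emptyList()));
    System.out.println(shouldSkip(Collections.singletonList("region-1")));
  }
}
```

Only the last call, with one involved region, would go on to build a subprocedure.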



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24960) reduce invalid subprocedure task

2020-09-02 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24960:
--

Assignee: Bo Cui

> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]
>  
> If involvedRegions is null or empty, the RegionServer should skip the subprocedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task

2020-09-02 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24960:
---
Priority: Major  (was: Minor)

> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]
>  
> If involvedRegions is null or empty, the RegionServer should skip the subprocedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24680) Refactor the checkAndMutate code on the server side

2020-09-01 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24680:
---
Description: # Refactor the checkAndMutate code on the server side by using 
the CheckAndMutate class (introduced in HBASE-8458) and the 
CheckAndMutateResult class (introduced in HBASE-24650).  (was: Refactor the 
checkAndMutate code on the server side by using the CheckAndMutate class 
(introduced in HBASE-8458) and the CheckAndMutateResult class (introduced in 
HBASE-24650).)

> Refactor the checkAndMutate code on the server side
> ---
>
> Key: HBASE-24680
> URL: https://issues.apache.org/jira/browse/HBASE-24680
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> # Refactor the checkAndMutate code on the server side by using the 
> CheckAndMutate class (introduced in HBASE-8458) and the CheckAndMutateResult 
> class (introduced in HBASE-24650).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-29 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186924#comment-17186924
 ] 

Bo Cui edited comment on HBASE-24925 at 8/29/20, 10:50 AM:
---

!image-2020-08-29-17-46-00-900.png!
If the thread pool is not used, loading the table state of 10K tables takes more than 170 s.


was (Author: bo cui):
!image-2020-08-29-17-46-00-900.png!
If the thread pool is not used, loading the table state takes more than 170 s.

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-29-17-46-00-900.png
>
>
> SCP should reduce unnecessary Get requests.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> During startup, the tableName2State map of the TableStateManager has not yet loaded 
> the table state data from the meta table. If there are 50 procedure threads and HBase has 
> 10K tables, then in the worst case the master needs to query the meta table 500K 
> times (50 * 10K, when the regions whose table state all SCPs check at the same time 
> belong to the same table).
>  
> I think the master can reduce the Get requests, and AM#loadMeta can load regions and 
> all tables through asynchronous threads.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  
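
To make the arithmetic concrete: if every caller looks up every table without a shared cache, the number of lookups is callers * tables. The sketch below uses simple memoization to show how a shared map brings this down to one lookup per table. It is not the asynchronous loading proposed above, and TableStateCacheSketch, loadFromMeta and simulate are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not the TableStateManager code.
public class TableStateCacheSketch {
  private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger metaLookups = new AtomicInteger();

  // Stand-in for a Get against the meta table; counts how often it is called.
  private String loadFromMeta(String table) {
    metaLookups.incrementAndGet();
    return "ENABLED";
  }

  // Returns the cached state, loading it at most once per table.
  String getState(String table) {
    return cache.computeIfAbsent(table, this::loadFromMeta);
  }

  int lookups() {
    return metaLookups.get();
  }

  // Simulates the given number of callers each checking the state of every table.
  static int simulate(int callers, int tables) {
    TableStateCacheSketch states = new TableStateCacheSketch();
    for (int c = 0; c < callers; c++) {
      for (int t = 0; t < tables; t++) {
        states.getState("table-" + t);
      }
    }
    return states.lookups();
  }

  public static void main(String[] args) {
    // 50 callers and 10K tables: 500K state checks, but only one lookup per table.
    System.out.println(simulate(50, 10_000));
  }
}
```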



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-29 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186924#comment-17186924
 ] 

Bo Cui commented on HBASE-24925:


!image-2020-08-29-17-46-00-900.png!
If the thread pool is not used, loading the table state takes more than 170 s.

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-29-17-46-00-900.png
>
>
> SCP should reduce unnecessary Get requests.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> During startup, the tableName2State map of the TableStateManager has not yet loaded 
> the table state data from the meta table. If there are 50 procedure threads and HBase has 
> 10K tables, then in the worst case the master needs to query the meta table 500K 
> times (50 * 10K, when the regions whose table state all SCPs check at the same time 
> belong to the same table).
>  
> I think the master can reduce the Get requests, and AM#loadMeta can load regions and 
> all tables through asynchronous threads.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-29 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24925:
---
Attachment: image-2020-08-29-17-46-00-900.png

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-29-17-46-00-900.png
>
>
> SCP should reduce unnecessary Get requests.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> During startup, the tableName2State map of the TableStateManager has not yet loaded 
> the table state data from the meta table. If there are 50 procedure threads and HBase has 
> 10K tables, then in the worst case the master needs to query the meta table 500K 
> times (50 * 10K, when the regions whose table state all SCPs check at the same time 
> belong to the same table).
>  
> I think the master can reduce the Get requests, and AM#loadMeta can load regions and 
> all tables through asynchronous threads.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant

2020-08-28 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24937:
---
Status: Patch Available  (was: Open)

> table.rb use LocalDateTime to replace Instant
> -
>
> Key: HBASE-24937
> URL: https://issues.apache.org/jira/browse/HBASE-24937
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 2.2.3, 3.0.0-alpha-1
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Minor
>
> https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754
> we can use timeZone to improve readability.
> {code:java}
> return java.time.LocalDateTime.ofInstant(instant, 
> java.time.ZoneId.systemDefault()).toString
> {code}
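
For comparison, a standalone Java sketch of the same conversion. TimestampFormatSketch and format are hypothetical names; the zone is passed in explicitly so that the result does not depend on the machine's default time zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration, not the table.rb code.
public class TimestampFormatSketch {
  // Converts an epoch-millisecond timestamp to local date-time text in the given zone.
  static String format(long epochMillis, ZoneId zone) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone).toString();
  }

  public static void main(String[] args) {
    long timestamp = 1598572800000L; // midnight UTC on 2020-08-28
    // Instant.toString always renders in UTC.
    System.out.println(Instant.ofEpochMilli(timestamp));
    // LocalDateTime renders the wall-clock time of the chosen zone.
    System.out.println(format(timestamp, ZoneOffset.UTC));
    System.out.println(format(timestamp, ZoneId.systemDefault()));
  }
}
```

Note that LocalDateTime carries no zone information in its text form, so the output is easier to read but no longer states which zone it refers to.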



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24962) Optimize BufferNode Lock

2020-08-27 Thread Bo Cui (Jira)
Bo Cui created HBASE-24962:
--

 Summary: Optimize BufferNode Lock
 Key: HBASE-24962
 URL: https://issues.apache.org/jira/browse/HBASE-24962
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


[https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java#L373]

During startup, a large number of OpenRegionProcedures are generated and 
added to the BufferNode. However, the BufferNode has some "synchronized" 
methods, and these methods may affect MTTR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24937) table.rb use LocalDateTime to replace Instant

2020-08-27 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24937:
--

Assignee: Bo Cui

> table.rb use LocalDateTime to replace Instant
> -
>
> Key: HBASE-24937
> URL: https://issues.apache.org/jira/browse/HBASE-24937
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Minor
>
> https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754
> we can use timeZone to improve readability.
> {code:java}
> return java.time.LocalDateTime.ofInstant(instant, 
> java.time.ZoneId.systemDefault()).toString
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-27 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24925:
--

Assignee: Bo Cui

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> SCP should reduce unnecessary Get request
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> during startup, the tableNam2State of tableStateManager is not loading 
> tableState data form metaTable yet.  if procThread num is 50 and hbase has 
> 10K tables, in the worst case, the master needs to query meta table 500K 
> times(50*10K. and the regions that all SCPs simultaneously check tableState 
> belong to the same table )
>  
> i think master can reduce Get request, and AM#loadMeta can load regions and 
> all tables through asynchronous threads.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  





[jira] [Commented] (HBASE-24960) reduce invalid subprocedure task

2020-08-27 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185731#comment-17185731
 ] 

Bo Cui commented on HBASE-24960:


[~wenfeiyi666] Hi, do you want to fix this issue? I am already working on it 
and will raise a PR soon.

> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: wenfeiyi666
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]
>  
> if involvedRegions is null or empty, rs should skip subprocedure.





[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task

2020-08-27 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24960:
---
Description: 
[https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]

[https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]

 

If involvedRegions is null or empty, the region server should skip the subprocedure.

  was:
[https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]

 

If involvedRegions is null or empty, the region server should skip the subprocedure.


> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/flush/RegionServerFlushTableProcedureManager.java#L146]
>  
> if involvedRegions is null or empty, rs should skip subprocedure.





[jira] [Updated] (HBASE-24960) reduce invalid subprocedure task

2020-08-27 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24960:
---
Summary: reduce invalid subprocedure task  (was: reduce invalid snapshot 
task)

> reduce invalid subprocedure task
> 
>
> Key: HBASE-24960
> URL: https://issues.apache.org/jira/browse/HBASE-24960
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Minor
>
> [https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]
>  
> if involvedRegions is null or empty, rs should skip subprocedure.





[jira] [Created] (HBASE-24960) reduce invalid snapshot task

2020-08-27 Thread Bo Cui (Jira)
Bo Cui created HBASE-24960:
--

 Summary: reduce invalid snapshot task
 Key: HBASE-24960
 URL: https://issues.apache.org/jira/browse/HBASE-24960
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


[https://github.com/apache/hbase/blob/047e0618d290a09a4a269b00548fe17691e31787/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/snapshot/RegionServerSnapshotManager.java#L165]

 

If involvedRegions is null or empty, the region server should skip the subprocedure.
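
The guard described above can be sketched roughly as follows. This is only an illustration: SubprocedureGuard and its methods are hypothetical stand-ins, not the actual code in RegionServerSnapshotManager, and regions are represented as plain strings.

```java
import java.util.List;

// Hypothetical stand-in for the region server's subprocedure builder;
// this is NOT the real HBase API, only an illustration of the guard.
final class SubprocedureGuard {

  /** Returns true when there are no involved regions, so there is no work to do. */
  static boolean shouldSkip(List<String> involvedRegions) {
    return involvedRegions == null || involvedRegions.isEmpty();
  }

  /** Returns a description of the subprocedure, or null when it is skipped. */
  static String buildSubprocedure(String table, List<String> involvedRegions) {
    if (shouldSkip(involvedRegions)) {
      // Nothing to snapshot or flush on this region server: skip the task.
      return null;
    }
    return "subprocedure for " + table + " on " + involvedRegions.size() + " region(s)";
  }
}
```

Returning early keeps the region server from creating and scheduling a task that has nothing to do.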





[jira] [Created] (HBASE-24939) Invalid addFsRegionsMissingInMeta -d

2020-08-23 Thread Bo Cui (Jira)
Bo Cui created HBASE-24939:
--

 Summary: Invalid addFsRegionsMissingInMeta -d
 Key: HBASE-24939
 URL: https://issues.apache.org/jira/browse/HBASE-24939
 Project: HBase
  Issue Type: Bug
  Components: hbck2
Affects Versions: 2.2.3, 3.0.0-alpha-1
Reporter: Bo Cui


[https://github.com/apache/hbase-operator-tools/blob/87878aada3354514050f5a2df11f27b317efd42d/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L436]

-d is invalid: addFsRegionsMissingInMeta does not use -d.





[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-23 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24925:
---
Description: 
SCP should reduce unnecessary Get requests.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]

During startup, the tableName2State map of tableStateManager has not yet loaded 
tableState data from the meta table. If the procThread number is 50 and HBase has 
10K tables, then in the worst case the master needs to query the meta table 500K 
times (50 * 10K, when the regions whose tableState all SCPs check simultaneously 
belong to the same table).

 

I think the master can reduce these Get requests, and AM#loadMeta can load regions 
and all tables through asynchronous threads.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]

 

 

  was:
SCP should reduce unnecessary Get requests.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]

During startup, the tableName2State map of tableStateManager has not yet loaded 
tableState data from the meta table. If the procThread number is 50 and HBase has 
10K tables, then in the worst case the master needs to query the meta table 500K 
times (50 * 10K, when the regions whose tableState all SCPs check simultaneously 
belong to the same table).

 

I think the master can reduce these Get requests, and AM#loadMeta can load regions 
and all tables

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]

 

 


> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> SCP should reduce unnecessary Get request
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> during startup, the tableNam2State of tableStateManager is not loading 
> tableState data form metaTable yet.  if procThread num is 50 and hbase has 
> 10K tables, in the worst case, the master needs to query meta table 500K 
> times(50*10K. and the regions that all SCPs simultaneously check tableState 
> belong to the same table )
>  
> i think master can reduce Get request, and AM#loadMeta can load regions and 
> all tables through asynchronous threads.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  
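
To make the arithmetic in the description concrete, here is a minimal sketch that counts meta lookups with and without a preloaded table-state map. TableStateCacheSketch, its fields, and its methods are hypothetical names for illustration; the "uncached" path models only the worst case described above, where every procedure thread has to query meta for every table. It does not reproduce the real TableStateManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not HBase code: counts how many Gets reach the meta table.
final class TableStateCacheSketch {
  final AtomicLong metaGets = new AtomicLong();
  // Stand-in for the meta table: table name -> state.
  final Map<String, String> meta = new HashMap<>();
  // Stand-in for the in-memory table name -> state map.
  final Map<String, String> cache = new ConcurrentHashMap<>();

  /** One simulated Get against the meta table. */
  String getFromMeta(String table) {
    metaGets.incrementAndGet();
    return meta.get(table);
  }

  /** Worst case: every check costs one Get. */
  String stateUncached(String table) {
    return getFromMeta(table);
  }

  /** Load all table states up front (conceptually one scan, no per-table Get). */
  void preload() {
    cache.putAll(meta);
  }

  /** Serve from memory; fall back to a Get only on a miss. */
  String stateCached(String table) {
    return cache.computeIfAbsent(table, this::getFromMeta);
  }
}
```

With 50 threads and 10K tables, the uncached path issues 50 * 10K = 500K Gets, while the preloaded path issues none for tables that were already loaded.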





[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant

2020-08-23 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24937:
---
Affects Version/s: 3.0.0-alpha-1
   2.2.3

> table.rb use LocalDateTime to replace Instant
> -
>
> Key: HBASE-24937
> URL: https://issues.apache.org/jira/browse/HBASE-24937
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Minor
>
> https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754
> we can use timeZone to improve readability.
> {code:java}
> return java.time.LocalDateTime.ofInstant(instant, 
> java.time.ZoneId.systemDefault()).toString
> {code}





[jira] [Updated] (HBASE-24937) table.rb use LocalDateTime to replace Instant

2020-08-23 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24937:
---
Component/s: shell

> table.rb use LocalDateTime to replace Instant
> -
>
> Key: HBASE-24937
> URL: https://issues.apache.org/jira/browse/HBASE-24937
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Minor
>
> https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754
> we can use timeZone to improve readability.
> {code:java}
> return java.time.LocalDateTime.ofInstant(instant, 
> java.time.ZoneId.systemDefault()).toString
> {code}





[jira] [Created] (HBASE-24937) table.rb use LocalDateTime to replace Instant

2020-08-23 Thread Bo Cui (Jira)
Bo Cui created HBASE-24937:
--

 Summary: table.rb use LocalDateTime to replace Instant
 Key: HBASE-24937
 URL: https://issues.apache.org/jira/browse/HBASE-24937
 Project: HBase
  Issue Type: Improvement
Reporter: Bo Cui


https://github.com/apache/hbase/blob/9f62a82334574b135f8e220b024981df64fab811/hbase-shell/src/main/ruby/hbase/table.rb#L754

We can use the system time zone to improve readability.

{code:java}
return java.time.LocalDateTime.ofInstant(instant, 
java.time.ZoneId.systemDefault()).toString
{code}
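
For comparison, a small plain-Java sketch of the two renderings is shown below. TimestampFormat and its method names are made up for this example; only the java.time calls are real. The zone is passed in explicitly so the result does not depend on the machine's default.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Illustrative helper, not part of table.rb: renders one timestamp two ways.
final class TimestampFormat {

  /** Renders the timestamp as an Instant, which is always printed in UTC. */
  static String asInstant(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).toString();
  }

  /** Renders the same timestamp as a local date-time in the given zone. */
  static String asLocal(long epochMillis, ZoneId zone) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone).toString();
  }
}
```

Calling asLocal(ts, ZoneId.systemDefault()) corresponds to the snippet above: the output carries no zone suffix and reads as wall-clock time for the user running the shell.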






[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181776#comment-17181776
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 10:00 AM:
---

Yeah, the root cause is the same: the meta znode does not exist.

But after 
{quote}next startup, if znode does not exist, hbase will recreate new 
InitMetaProcedure
{quote}
HBase can be started.

After
{quote}if procWAL has completed InitMetaProcedure and znode does not exist
{quote}
HBase gets stuck.
  


was (Author: bo cui):
Yeah, the root cause is the same: the meta znode does not exist.

But after 
{quote}next startup, if znode does not exist, hbase will recreate new 
InitMetaProcedure
{quote}
HBase is OK.

After
{quote}if procWAL has completed InitMetaProcedure and znode does not exist
{quote}
HBase gets stuck.
  

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181776#comment-17181776
 ] 

Bo Cui commented on HBASE-24924:


Yeah, the root cause is the same: the meta znode does not exist.

But after 
{quote}next startup, if znode does not exist, hbase will recreate new 
InitMetaProcedure
{quote}
HBase is OK.

After
{quote}if procWAL has completed InitMetaProcedure and znode does not exist
{quote}
HBase gets stuck.
  

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181757#comment-17181757
 ] 

Bo Cui commented on HBASE-24924:


Thanks [~zhangduo]. My question and the PR are a little different.

In the normal scenario:
1. first startup, hbase will create InitMetaProcedure
2. next startup, if znode does not exist, hbase will recreate new 
InitMetaProcedure

My question:
if procWAL has completed InitMetaProcedure and znode does not exist, hbase 
should start up as in case 2; hbase should not get stuck
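
The hang reported in the issue description (a CountDownLatch is recreated on startup, but a procedure that is already finished never runs again to release it) can be reproduced in miniature. LatchStuckSketch is a hypothetical demo class, not HMaster code, and it waits with a timeout so that the demo terminates instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo, not HBase code: a waiter is released only if something
// actually counts the freshly created latch down.
final class LatchStuckSketch {

  /** Returns true if the waiter was released within the timeout. */
  static boolean awaitInit(boolean procedureRuns, long timeoutMillis) {
    // A new latch is created on every startup.
    CountDownLatch latch = new CountDownLatch(1);
    if (procedureRuns) {
      // A procedure that really executes releases the waiter.
      latch.countDown();
    }
    // An already finished procedure does not execute again, so nothing
    // counts down and the wait can only end by timing out.
    try {
      return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

An untimed await() in the second case would never return, which matches the stuck master described in the issue.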

 

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181738#comment-17181738
 ] 

Bo Cui commented on HBASE-24924:


{quote}
 The problem is who deletes the meta znode? Why?
{quote}
In an actual production environment, ZK data may be corrupted or lost, the znode 
may be deleted manually, ZK may be reinstalled, etc.

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181711#comment-17181711
 ] 

Bo Cui commented on HBASE-24924:


But by the time we knew the znode had been deleted, HBase had already started, 
submitted the InitMetaProcedure, and recreated the znode.
So I think, since we do not learn about it in time, we should minimize its 
impact. After an InitMetaProcedure is resubmitted, how do we ensure that the meta 
region is not deleted and the master does not get stuck?

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181698#comment-17181698
 ] 

Bo Cui commented on HBASE-24924:


{quote}Anyway, this is not a normal operation, we should deal with this through 
HBCK2, not in the normal code path.{quote}
On startup, how does the master avoid submitting InitMetaProcedure if meta is 
already there?

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181671#comment-17181671
 ] 

Bo Cui commented on HBASE-24924:


{quote}I think the design here is to not submit InitMetaProcedure if meta is 
already there...{quote}
If the znode does not exist, how is meta assigned?

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181661#comment-17181661
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 7:04 AM:
--

{quote}I assume in a normal deployment, we should not delete the meta znode? 
The guys from AWS has a scenario where they restart a cluster with nothing on 
zookeeper, then the problem is the InitMetaProcedure will delete the meta 
region...
{quote}

Yeah, the 2.2.3 and branch-master InitMetaProcedure are different. The 2.2.3 
InitMetaProcedure does not create the meta table, so the master can resubmit 
InitMetaProcedure. But in branch-master, the master executes only 
INIT_META_ASSIGN_META (if the meta dir already exists, InitMetaProcedure should 
skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES).


was (Author: bo cui):
{quote}I assume in a normal deployment, we should not delete the meta znode? 
The guys from AWS has a scenario where they restart a cluster with nothing on 
zookeeper, then the problem is the InitMetaProcedure will delete the meta 
region...
{quote}

Yeah, the 2.2.3 and master InitMetaProcedure are different. The 2.2.3 
InitMetaProcedure does not create the meta table, so the master can resubmit 
InitMetaProcedure. But in branch-master, the master executes only 
INIT_META_ASSIGN_META (if the meta dir already exists, InitMetaProcedure should 
skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES).

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181661#comment-17181661
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 7:03 AM:
--

{quote}I assume in a normal deployment, we should not delete the meta znode? 
The guys from AWS has a scenario where they restart a cluster with nothing on 
zookeeper, then the problem is the InitMetaProcedure will delete the meta 
region...
{quote}

Yeah, the 2.2.3 and master InitMetaProcedure are different. The 2.2.3 
InitMetaProcedure does not create the meta table, so the master can resubmit 
InitMetaProcedure. But in branch-master, the master executes only 
INIT_META_ASSIGN_META (if the meta dir already exists, InitMetaProcedure should 
skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES).


was (Author: bo cui):
{{{quote}}}I assume in a normal deployment, we should not delete the meta 
znode? The guys from AWS has a scenario where they restart a cluster with 
nothing on zookeeper, then the problem is the InitMetaProcedure will delete the 
meta region...

{{{quote}}}

Yeah, the 2.2.3 and master InitMetaProcedure are different. The 2.2.3 
InitMetaProcedure does not create the meta table, so the master can resubmit 
InitMetaProcedure. But in branch-master, the master executes only 
INIT_META_ASSIGN_META (if the meta dir already exists, InitMetaProcedure should 
skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES).

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-21 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181661#comment-17181661
 ] 

Bo Cui commented on HBASE-24924:


{{{quote}}}I assume in a normal deployment, we should not delete the meta 
znode? The guys from AWS has a scenario where they restart a cluster with 
nothing on zookeeper, then the problem is the InitMetaProcedure will delete the 
meta region...

{{{quote}}}

Yeah, the 2.2.3 and master InitMetaProcedure are different. The 2.2.3 
InitMetaProcedure does not create the meta table, so the master can resubmit 
InitMetaProcedure. But in branch-master, the master executes only 
INIT_META_ASSIGN_META (if the meta dir already exists, InitMetaProcedure should 
skip INIT_META_WRITE_FS_LAYOUT and INIT_META_CREATE_NAMESPACES).

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181518#comment-17181518
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 3:03 AM:
--

I think we can enhance it. 

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497]
{code:java}
message InitMetaStateData {
   required int32 latch = 1;
}
{code}
If a completed InitMetaProcedure exists and the meta znode does not exist, the 
master should create a new InitMetaProcedure.
 
[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046]
{code:java}
filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
{code}
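
One reading of this proposal, as a rough sketch: only an unfinished InitMetaProcedure counts as "already present", so a finished one combined with a missing meta znode leads to a new submission. InitMetaDecision, Proc, and shouldSubmitNew are hypothetical names for illustration and do not reflect the actual HMaster code path.

```java
import java.util.List;

// Hypothetical model of the startup decision, not HBase code.
final class InitMetaDecision {

  /** Simplified view of a procedure loaded from the procedure WAL. */
  static final class Proc {
    final boolean isInitMeta;
    final boolean finished;

    Proc(boolean isInitMeta, boolean finished) {
      this.isInitMeta = isInitMeta;
      this.finished = finished;
    }
  }

  /** Mirrors the filter above: only unfinished InitMetaProcedures are reused. */
  static boolean hasUnfinishedInitMeta(List<Proc> procs) {
    return procs.stream().anyMatch(p -> p.isInitMeta && !p.finished);
  }

  /** Submit a new InitMetaProcedure when none is pending and the meta znode is missing. */
  static boolean shouldSubmitNew(List<Proc> procs, boolean metaZnodeExists) {
    return !hasUnfinishedInitMeta(procs) && !metaZnodeExists;
  }
}
```

Under this rule a finished InitMetaProcedure left in the procWAL no longer prevents the master from submitting a fresh one when the znode is gone.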


was (Author: bo cui):
i think we should strengthen it. 

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497]
{code:java}
message InitMetaStateData {
   required int32 latch = 1;
}
{code}

If a completed InitMetaProcedure exists and the meta znode does not exist, the 
master should create a new InitMetaProcedure.
https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046

{code:java}
filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
{code}


> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> if procWAL has InitMetaProcedure, and InitMetaProcedure state is finished, 
> and meta znode does not exist, finishActiveMasterInitialization will stuck
> because during startup,If InitMetaProcedure exists, InitMetaProcedure 
> recreates a new CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!





[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24925:
---
Component/s: MTTR

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> SCP should reduce unnecessary Get request
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> during startup, the tableNam2State of tableStateManager is not loading 
> tableState data form metaTable yet.  if procThread num is 50 and hbase has 
> 10K tables, in the worst case, the master needs to query meta table 500K 
> times(50*10K. and the regions that all SCPs simultaneously check tableState 
> belong to the same table )
>  
> i think master can reduce Get request, and AM#loadMeta can load regions and 
> all tables
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24925:
---
Affects Version/s: 3.0.0-alpha-1
   2.2.3

> SCP reduce unnecessary get requests
> ---
>
> Key: HBASE-24925
> URL: https://issues.apache.org/jira/browse/HBASE-24925
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> SCP should reduce unnecessary Get requests.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]
> During startup, the tableName2State map of tableStateManager has not yet loaded 
> the tableState data from the meta table. If there are 50 procedure threads and 
> HBase has 10K tables, then in the worst case the master needs to query the meta 
> table 500K times (50 * 10K, when the regions that all SCPs check simultaneously 
> belong to the same table).
>  
> I think the master can reduce these Get requests if AM#loadMeta loads the 
> states of all tables along with the regions.
> [https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24924:
---
Component/s: master

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck, 
> because during startup, if an InitMetaProcedure exists, it creates a new 
> CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24925) SCP reduce unnecessary get requests

2020-08-20 Thread Bo Cui (Jira)
Bo Cui created HBASE-24925:
--

 Summary: SCP reduce unnecessary get requests
 Key: HBASE-24925
 URL: https://issues.apache.org/jira/browse/HBASE-24925
 Project: HBase
  Issue Type: Improvement
Reporter: Bo Cui


SCP should reduce unnecessary Get requests.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L520]

During startup, the tableName2State map of tableStateManager has not yet loaded 
the tableState data from the meta table. If there are 50 procedure threads and 
HBase has 10K tables, then in the worst case the master needs to query the meta 
table 500K times (50 * 10K, when the regions that all SCPs check simultaneously 
belong to the same table).

 

I think the master can reduce these Get requests if AM#loadMeta loads the 
states of all tables along with the regions.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1532]
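
The arithmetic above can be illustrated with a small sketch. It does not use the real HBase classes: FakeMeta and worstCaseGets are hypothetical names, and the "all workers check the same table at the same moment" behaviour is modeled by letting every worker in a round see the cache as it was before the round started. It only compares per-lookup Gets against a single bulk load done up front.

```java
import java.util.HashMap;
import java.util.Map;

public class TableStateLookupSketch {

    /** Hypothetical stand-in for the meta table; counts point reads and full scans. */
    static final class FakeMeta {
        private final Map<String, String> states = new HashMap<>();
        long gets = 0;
        long scans = 0;

        FakeMeta(int tables) {
            for (int i = 0; i < tables; i++) {
                states.put("t" + i, "ENABLED");
            }
        }

        String get(String table) {
            gets++;
            return states.get(table);
        }

        Map<String, String> scanAll() {
            scans++;
            return new HashMap<>(states);
        }
    }

    /** Returns the number of point reads issued against the fake meta table. */
    static long worstCaseGets(int workers, int tables, boolean preload) {
        FakeMeta meta = new FakeMeta(tables);
        Map<String, String> cache = new HashMap<>();
        if (preload) {
            // One bulk load at startup instead of one Get per lookup.
            cache.putAll(meta.scanAll());
        }
        for (int t = 0; t < tables; t++) {
            String table = "t" + t;
            // Worst case: every worker checks the same table simultaneously,
            // so all of them see the cache contents from before this round.
            boolean cachedBeforeRound = cache.containsKey(table);
            String state = null;
            for (int w = 0; w < workers; w++) {
                state = cachedBeforeRound ? cache.get(table) : meta.get(table);
            }
            cache.put(table, state);
        }
        return meta.gets;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseGets(50, 10_000, false)); // 500000
        System.out.println(worstCaseGets(50, 10_000, true));  // 0
    }
}
```

With 50 workers and 10K tables the model issues 500K point reads without the preload and none with it, at the cost of one full scan.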

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181518#comment-17181518
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 2:28 AM:
--

I think we should strengthen it.

[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497]
{code:java}
message InitMetaStateData {
   required int32 latch = 1;
}
{code}

If a completed InitMetaProcedure exists and the meta znode does not exist, the 
master should create a new InitMetaProcedure:
https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046

{code:java}
filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
{code}
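
A minimal sketch of the decision proposed above, assuming simplified stand-in classes rather than the real HBase types. The Procedure and InitMetaProcedure classes here are hypothetical placeholders, and the decide method and its return strings exist only for illustration. Only the filter expression mirrors the snippet quoted from HMaster.

```java
import java.util.List;
import java.util.Optional;

public class InitMetaSelectionSketch {

    /** Hypothetical placeholder for a procedure loaded from the procWAL. */
    static class Procedure {
        private final boolean finished;

        Procedure(boolean finished) {
            this.finished = finished;
        }

        boolean isFinished() {
            return finished;
        }
    }

    /** Hypothetical placeholder for the real InitMetaProcedure. */
    static final class InitMetaProcedure extends Procedure {
        InitMetaProcedure(boolean finished) {
            super(finished);
        }
    }

    static String decide(List<? extends Procedure> loaded, boolean metaZnodeExists) {
        // Same predicate as the quoted filter: only unfinished InitMetaProcedures count.
        Optional<? extends Procedure> pending = loaded.stream()
            .filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
            .findAny();
        if (pending.isPresent()) {
            return "WAIT_FOR_EXISTING";
        }
        // A finished InitMetaProcedure is ignored, so a missing meta znode
        // leads to a new procedure instead of waiting on the old one.
        return metaZnodeExists ? "NOTHING_TO_DO" : "SUBMIT_NEW";
    }
}
```

With this check, a procWAL that only holds a finished InitMetaProcedure and a missing meta znode yields SUBMIT_NEW instead of a wait on the old procedure.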



was (Author: bo cui):
https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497

i think we should strengthen it.
{code:java}
message InitMetaStateData {
   required int32 latch = 1;
}
{code}

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck, 
> because during startup, if an InitMetaProcedure exists, it creates a new 
> CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24924:
---
Description: 
If the procWAL contains an InitMetaProcedure whose state is finished and the 
meta znode does not exist, finishActiveMasterInitialization gets stuck.

This is because, during startup, if an InitMetaProcedure exists, it creates 
a new CountDownLatch.

master jstack
 !image-2020-08-21-09-12-33-894.png!

master log 
 !masterLog.gif!

  was:
if procWAL has InitMetaProcedure,  and InitMetaProcedure  state is finished, 
and meta znode   does not exist, finishActiveMasterInitialization will stuck

master jstack
 !image-2020-08-21-09-12-33-894.png! 

master log 
!masterLog.gif!


> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck, 
> because during startup, if an InitMetaProcedure exists, it creates a new 
> CountDownLatch.
> master jstack
>  !image-2020-08-21-09-12-33-894.png!
> master log 
>  !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181518#comment-17181518
 ] 

Bo Cui edited comment on HBASE-24924 at 8/21/20, 2:09 AM:
--

https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto#L497

I think we should strengthen it.
{code:java}
message InitMetaStateData {
   required int32 latch = 1;
}
{code}


was (Author: bo cui):
[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046]

i think we should strengthen it.
{code:java}
filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
{code}

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181521#comment-17181521
 ] 

Bo Cui commented on HBASE-24924:


Reproduction steps:

1. Start a new HBase cluster; the master creates the meta table and submits an 
InitMetaProcedure to the procedureExecutor.
2. Wait until the InitMetaProcedure completes, then kill the master and all 
RegionServers.
3. Delete the meta znode.
4. Start HBase again; the master loads the procWAL, including the completed 
InitMetaProcedure, and gets stuck.
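
The hang in step 4 can be modeled with a small sketch. It assumes, as an illustration only, that a procedure object gets a fresh CountDownLatch each time it is constructed, and that only a procedure which actually executes counts the latch down. InitMetaLike and masterProceeds are hypothetical names, and a short timeout stands in for the indefinite wait in the real master.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReloadedLatchSketch {

    /** Hypothetical stand-in for a procedure reloaded from the procWAL. */
    static final class InitMetaLike {
        private final boolean finished;
        // A new latch is created on every construction, including reloads.
        private final CountDownLatch latch = new CountDownLatch(1);

        InitMetaLike(boolean finished) {
            this.finished = finished;
        }

        /** Only a procedure that still has work to do runs and releases the latch. */
        void executeIfPending() {
            if (!finished) {
                latch.countDown();
            }
        }

        boolean await(long timeoutMs) {
            try {
                return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Returns true if the simulated master gets past the wait on the latch. */
    static boolean masterProceeds(boolean finishedBeforeRestart) {
        InitMetaLike reloaded = new InitMetaLike(finishedBeforeRestart);
        reloaded.executeIfPending();
        return reloaded.await(100);
    }

    public static void main(String[] args) {
        System.out.println(masterProceeds(false)); // true
        System.out.println(masterProceeds(true));  // false
    }
}
```

In this model the already-finished procedure never touches its new latch, so the wait only ends because of the timeout.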

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24924:
--

Assignee: Bo Cui

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181518#comment-17181518
 ] 

Bo Cui commented on HBASE-24924:


[https://github.com/apache/hbase/blob/65d28da7c22382e040363c607840d5ab6e6b45da/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1046]

I think we should strengthen it.
{code:java}
filter(p -> (p instanceof InitMetaProcedure && !p.isFinished()))
{code}

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24924:
---
Affects Version/s: 3.0.0-alpha-1
   2.2.3

> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24924:
---
Description: 
If the procWAL contains an InitMetaProcedure whose state is finished and the 
meta znode does not exist, finishActiveMasterInitialization gets stuck.

master jstack
 !image-2020-08-21-09-12-33-894.png! 

master log 
!masterLog.gif!

  was:
if procWAL has InitMetaProcedure,  and InitMetaProcedure  state is finished, 
and meta znode   does not exist, finishActiveMasterInitialization will stuck

master jstack
 !image-2020-08-21-09-12-33-894.png! 

master log 



> stuck InitMetaProcedure in finishActiveMasterInitialization
> ---
>
> Key: HBASE-24924
> URL: https://issues.apache.org/jira/browse/HBASE-24924
> Project: HBase
>  Issue Type: Bug
>Reporter: Bo Cui
>Priority: Major
> Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif
>
>
> If the procWAL contains an InitMetaProcedure whose state is finished and the 
> meta znode does not exist, finishActiveMasterInitialization gets stuck.
> master jstack
>  !image-2020-08-21-09-12-33-894.png! 
> master log 
> !masterLog.gif!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24924) stuck InitMetaProcedure in finishActiveMasterInitialization

2020-08-20 Thread Bo Cui (Jira)
Bo Cui created HBASE-24924:
--

 Summary: stuck InitMetaProcedure in 
finishActiveMasterInitialization
 Key: HBASE-24924
 URL: https://issues.apache.org/jira/browse/HBASE-24924
 Project: HBase
  Issue Type: Bug
Reporter: Bo Cui
 Attachments: image-2020-08-21-09-12-33-894.png, masterLog.gif

If the procWAL contains an InitMetaProcedure whose state is finished and the 
meta znode does not exist, finishActiveMasterInitialization gets stuck.

master jstack
 !image-2020-08-21-09-12-33-894.png! 

master log 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-17 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179303#comment-17179303
 ] 

Bo Cui commented on HBASE-24885:


{quote}Want me to take this [~Bo Cui]. I had seen the second case occur dealing 
w/ a Replica that failed assign and had added investigation to my todo list. 
Thanks for digging in here.{quote}
[~stack] Thanks, I have assigned it to you.

> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Michael Stack
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
> server=rs1 but state has otherwise
> {quote}
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
>  rslog:
> {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN 
> - ignoring this new request for this region.
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-17 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24885:
--

Assignee: Michael Stack  (was: Bo Cui)

> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Michael Stack
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
> server=rs1 but state has otherwise
> {quote}
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
>  rslog:
> {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN 
> - ignoring this new request for this region.
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-17 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178833#comment-17178833
 ] 

Bo Cui commented on HBASE-24885:


bq. Is this only a problem of HBCK2? Or we could meet the same problem when 
calling admin.assign?

Yes. TRSP could check the current regionState in 
TRSP#executeFromState(REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE/REGION_STATE_TRANSITION_CLOSE):
https://github.com/apache/hbase/blob/7335dbc8345298c57b8da4ccba640d5432b3fde9/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L340
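
As a rough sketch of the kind of state check suggested here: the RegionState enum and shouldStartAssign method below are hypothetical and far simpler than the real TransitRegionStateProcedure state machine. The assumption is only that an assign request for a region that is already OPEN or OPENING should be rejected before any open request is sent to a RegionServer.

```java
public class AssignGuardSketch {

    /** Hypothetical, reduced set of region states for illustration. */
    enum RegionState { OFFLINE, OPENING, OPEN, CLOSING, CLOSED }

    /** Returns true only if starting a new assign is safe for the current state. */
    static boolean shouldStartAssign(RegionState current) {
        switch (current) {
            case OPEN:
            case OPENING:
                // The region is already open, or being opened, somewhere:
                // a second assign could double-open it or wait forever.
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        for (RegionState s : RegionState.values()) {
            System.out.println(s + " -> " + shouldStartAssign(s));
        }
    }
}
```

A guard like this would cover both cases in the issue description, since each begins with a region that is already open on rs1.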

> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
> server=rs1 but state has otherwise
> {quote}
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
>  rslog:
> {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN 
> - ignoring this new request for this region.
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-17 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui reassigned HBASE-24885:
--

Assignee: Bo Cui

> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
> server=rs1 but state has otherwise
> {quote}
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
>  rslog:
> {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN 
> - ignoring this new request for this region.
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-16 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178685#comment-17178685
 ] 

Bo Cui commented on HBASE-23035:


[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]

[~anoop.hbase] Is this what you are referring to?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashed, the regions will try to use the old location for the 
> region deploy. But one RS only have 3 threads to open region by default. If a 
> RS have hundreds of regions, the failover is very slower. Assign to same RS 
> may have good locality if the Datanode is deploied on same host. But slower 
> failover make the availability worse. And the locality is not big deal when 
> deploy HBase on cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-14 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24885:
---
Description: 
If a region has already been assigned to rs1 and the client then assigns it 
again with "hbck2 assigns":

1. If the regionPlan assigns the region to rs2, the region will be opened on 
both rs1 and rs2.

master log:
{quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
server=rs1 but state has otherwise
{quote}
2. If the regionPlan assigns the region to rs1, the TransitRegionStateProcedure 
and OpenRegionProcedure will get stuck, because rs1 does not respond to the master.
 rslog:
{quote}Receiving OPEN for the region:{}, which we are already trying to OPEN - 
ignoring this new request for this region.
{quote}
 

  was:
If a region has been assign to rs1 and then client assigns region again by 
"hbck2 assigns"

1、if  regionPlan is region to be assign to rs2,the region will be opened on rs1 
and rs2.

master log:
bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN 
on server=rs1 but state has otherwise

2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure 
and OpenRegionProcedure will stuck. because rs1 is not responding to master
rslog:
bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
ignoring this new request for this region.

 


> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> {quote}WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=tableName, region=reionName reported OPEN on 
> server=rs1 but state has otherwise
> {quote}
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
>  rslog:
> {quote}Receiving OPEN for the region:{}, which we are already trying to OPEN 
> - ignoring this new request for this region.
> {quote}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-14 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24885:
---
Affects Version/s: 3.0.0-alpha-1

> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 3.0.0-alpha-1, 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 
> reported OPEN on server=rs1 but state has otherwise
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
> rslog:
> bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
> ignoring this new request for this region.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-14 Thread Bo Cui (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Cui updated HBASE-24885:
---
Description: 
If a region has already been assigned to rs1 and the client then assigns it 
again with "hbck2 assigns":

1. If the regionPlan assigns the region to rs2, the region will be opened on 
both rs1 and rs2.

master log:
bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 reported OPEN 
on server=rs1 but state has otherwise

2. If the regionPlan assigns the region to rs1, the TransitRegionStateProcedure 
and OpenRegionProcedure will get stuck, because rs1 does not respond to the master.
rslog:
bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
ignoring this new request for this region.

 

  was:
If a region has been assign to rs1 and then client assigns region again by 
"hbck2 assigns"

1、if  regionPlan is region to be assign to rs2,the region will be opened on rs1 
and rs2.

master log:
 bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 
reported OPEN on server=rs1 but state has otherwise

2、if regionPlan is region to be assign to rs1, the TransitRegionStateProcedure 
and OpenRegionProcedure will stuck. because rs1 is not responding to master
rslog:
bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
ignoring this new request for this region.

 


> STUCK RIT by hbck2 assigns
> --
>
> Key: HBASE-24885
> URL: https://issues.apache.org/jira/browse/HBASE-24885
> Project: HBase
>  Issue Type: Bug
>  Components: hbck2, Region Assignment
>Affects Versions: 2.2.3
>Reporter: Bo Cui
>Priority: Major
>
> If a region has already been assigned to rs1 and the client then assigns it 
> again with "hbck2 assigns":
> 1. If the regionPlan assigns the region to rs2, the region will be opened on 
> both rs1 and rs2.
> master log:
> bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
> rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 
> reported OPEN on server=rs1 but state has otherwise
> 2. If the regionPlan assigns the region to rs1, the 
> TransitRegionStateProcedure and OpenRegionProcedure will get stuck, because 
> rs1 does not respond to the master.
> rslog:
> bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
> ignoring this new request for this region.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24885) STUCK RIT by hbck2 assigns

2020-08-14 Thread Bo Cui (Jira)
Bo Cui created HBASE-24885:
--

 Summary: STUCK RIT by hbck2 assigns
 Key: HBASE-24885
 URL: https://issues.apache.org/jira/browse/HBASE-24885
 Project: HBase
  Issue Type: Bug
  Components: hbck2, Region Assignment
Affects Versions: 2.2.3
Reporter: Bo Cui


If a region has already been assigned to rs1 and the client then assigns it 
again with "hbck2 assigns":

1. If the regionPlan assigns the region to rs2, the region will be opened on 
both rs1 and rs2.

master log:
 bq. WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: 
rit=OPEN, location=rs2, table=t1, region=16e485198b448131fd012e6ba3327040 
reported OPEN on server=rs1 but state has otherwise

2. If the regionPlan assigns the region to rs1, the TransitRegionStateProcedure 
and OpenRegionProcedure will get stuck, because rs1 does not respond to the master.
rslog:
bq. Receiving OPEN for the region:{}, which we are already trying to OPEN  - 
ignoring this new request for this region.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-12 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176744#comment-17176744
 ] 

Bo Cui commented on HBASE-23035:


[~zghao]

During startup, HBase needs to assign regions to their previous RegionServers 
so that scan performance is not affected, so we can add a configuration option 
to solve this problem.
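
One way such a switch could look is sketched below. The key name example.assignment.retain is hypothetical and is not a real HBase configuration key, and java.util.Properties stands in for the HBase Configuration object. When retention is off, a round-robin choice is used purely as an illustrative alternative placement.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class RetainAssignmentSketch {

    // Hypothetical key, used only for this illustration.
    static final String RETAIN_KEY = "example.assignment.retain";

    /** Picks a target server for one region, honoring the retain switch. */
    static String pickTarget(Properties conf, String lastServer,
                             List<String> liveServers, int regionIndex) {
        if (liveServers.isEmpty()) {
            throw new IllegalArgumentException("no live servers");
        }
        boolean retain = Boolean.parseBoolean(conf.getProperty(RETAIN_KEY, "false"));
        if (retain && liveServers.contains(lastServer)) {
            // Keep the previous location to preserve data locality.
            return lastServer;
        }
        // Otherwise spread regions across servers so more open threads are used.
        return liveServers.get(Math.floorMod(regionIndex, liveServers.size()));
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("rs1", "rs2", "rs3");
        Properties conf = new Properties();
        conf.setProperty(RETAIN_KEY, "true");
        System.out.println(pickTarget(conf, "rs2", live, 0)); // rs2
        conf.setProperty(RETAIN_KEY, "false");
        System.out.println(pickTarget(conf, "rs2", live, 0)); // rs1
    }
}
```

On-premise clusters that care about locality could enable the switch, while cloud deployments could leave it off to favor faster failover.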

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Now if one RS crashes, its regions will try to use the old location when they 
> are redeployed. But one RS only has 3 threads to open regions by default. If 
> an RS has hundreds of regions, the failover is very slow. Assigning to the 
> same RS may give good locality if the Datanode is deployed on the same host. 
> But a slower failover makes availability worse. And locality is not a big 
> deal when deploying HBase on cloud.
> This was introduced by HBASE-18946.





[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2020-08-10 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174189#comment-17174189
 ] 

Bo Cui commented on HBASE-23035:


{quote}And the locality is not big deal when deploy HBase on cloud.
{quote}
[~zghao]

Hi, but some HBase clusters are not on the cloud.





[jira] [Commented] (HBASE-21721) reduce write#syncs() times

2020-08-07 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173534#comment-17173534
 ] 

Bo Cui commented on HBASE-21721:


[~anoop.hbase] master has been updated

> reduce write#syncs() times
> --
>
> Key: HBASE-21721
> URL: https://issues.apache.org/jira/browse/HBASE-21721
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.1, 2.1.1, master, 2.2.3
>Reporter: Bo Cui
>Assignee: Bo Cui
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> The number of writer#sync() calls can be reduced by tracking 
> highestUnsyncedSequence:
> before writer#sync(), read the current highestUnsyncedSequence; 
> after writer#sync(), set highestSyncedSequence to the value that was read.
>  
> {code:title=FSHLog.java|borderStyle=solid}
> // Some comments here
> public void run() {
>   long currentSequence;
>   while (!isInterrupted()) {
>     int syncCount = 0;
>     try {
>       while (true) {
>         ...
>       try {
>         Trace.addTimelineAnnotation("syncing writer");
>         long unSyncedFlushSeq = highestUnsyncedSequence;
>         writer.sync();
>         Trace.addTimelineAnnotation("writer synced");
>         if (unSyncedFlushSeq > currentSequence) currentSequence = unSyncedFlushSeq;
>         currentSequence = updateHighestSyncedSequence(currentSequence);
>       } catch (IOException e) {
>         LOG.error("Error syncing, request close of WAL", e);
>         lastException = e;
>       } catch (Exception e) {
>         ...
>   }
> }
> {code}
> Add code
>  long unSyncedFlushSeq = highestUnsyncedSequence;
>  if (unSyncedFlushSeq > currentSequence) currentSequence = unSyncedFlushSeq;

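The idea in the quoted description can be illustrated with a small stand-alone sketch. It is not the FSHLog code: the class `SyncSequenceSketch`, its methods, and the early-return check are hypothetical and only model the assumption that a sync request whose sequence is already covered by an earlier sync does not need another `writer.sync()`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the proposal; names do not come from HBase.
public class SyncSequenceSketch {
    // Highest sequence id appended to the writer so far.
    private final AtomicLong highestUnsyncedSequence = new AtomicLong();
    // Highest sequence id known to be covered by a completed sync.
    private final AtomicLong highestSyncedSequence = new AtomicLong();
    // Counts how often the (simulated) writer.sync() was invoked.
    private int syncCalls = 0;

    public void append(long sequence) {
        highestUnsyncedSequence.accumulateAndGet(sequence, Math::max);
    }

    // Stand-in for writer.sync(): assumed to persist everything appended before the call.
    private void writerSync() {
        syncCalls++;
    }

    // Serves a sync request and returns the highest synced sequence afterwards.
    public long sync(long requestedSequence) {
        long alreadySynced = highestSyncedSequence.get();
        if (requestedSequence <= alreadySynced) {
            return alreadySynced; // assumption: already covered, no extra sync needed
        }
        // Read the unsynced high-water mark BEFORE syncing, as the description proposes.
        long unsyncedBeforeSync = highestUnsyncedSequence.get();
        writerSync();
        long covered = Math.max(requestedSequence, unsyncedBeforeSync);
        return highestSyncedSequence.accumulateAndGet(covered, Math::max);
    }

    public int syncCalls() {
        return syncCalls;
    }

    public static void main(String[] args) {
        SyncSequenceSketch wal = new SyncSequenceSketch();
        wal.append(1);
        wal.append(2);
        wal.append(3);
        wal.sync(1);
        wal.sync(2);
        wal.sync(3);
        System.out.println("sync calls: " + wal.syncCalls()); // prints "sync calls: 1"
    }
}
```

In this model the first request advances the synced sequence to 3, so the two later requests return without touching the writer. Reading the value after the sync instead would be unsafe, because it could include appends that the sync did not cover.
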




[jira] [Commented] (HBASE-22263) Master creates duplicate ServerCrashProcedure on initialization, leading to assignment hanging in region-dense clusters

2020-08-04 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170653#comment-17170653
 ] 

Bo Cui commented on HBASE-22263:


[~busbey] Yes, the issue affects only branch-1.

I will open a new PR against branch-1 and close the branch-1.4 PR.

> Master creates duplicate ServerCrashProcedure on initialization, leading to 
> assignment hanging in region-dense clusters
> ---
>
> Key: HBASE-22263
> URL: https://issues.apache.org/jira/browse/HBASE-22263
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, Region Assignment
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>Reporter: Sean Busbey
>Assignee: Bo Cui
>Priority: Critical
> Attachments: HBASE-22263-branch-1.v0.add.patch, 
> HBASE-22263-branch-1.v0.patch
>
>
> h3. Problem:
> During Master initialization we
>  # restore existing procedures that still need to run from prior active 
> Master instances
>  # look for signs that Region Servers have died and need to be recovered 
> while we were out and schedule a ServerCrashProcedure (SCP) for each of them
>  # turn on the assignment manager
> The normal turn of events for a ServerCrashProcedure will attempt to use a 
> bulk assignment to maintain the set of regions on a RS if possible. However, 
> we wait around and retry a bit later if the assignment manager isn’t ready 
> yet.
> Note that currently #2 has no notion of whether or not a previous active 
> Master instance has already done a check. This means we might schedule an 
> SCP for a ServerName (host, port, start code) that already has an SCP 
> scheduled. Ideally, such a duplicate should be a no-op.
> However, before step #2 schedules the SCP it first marks the region server as 
> dead and not yet processed, with the expectation that the SCP it just created 
> will check whether there is log splitting work and then mark the server as 
> ready for region assignment. At the same time, any restored SCPs that are past the step 
> of log splitting will be waiting for the AssignmentManager still. As a part 
> of restoring themselves, they do not update with the current master instance 
> to show that they are past the point of WAL processing.
> Once the AssignmentManager starts in #3 the restored SCP continues; it will 
> eventually get to the assignment phase and find that its server is marked as 
> dead and in need of wal processing. Such assignments are skipped with a log 
> message. Thus as we iterate over the regions to assign we’ll skip all of 
> them. This non-intuitively shifts the “no-op” status from the newer SCP we 
> scheduled at #2 to the older SCP that was restored in #1.
> Bulk assignment works by sending the assign calls via a pool to allow more 
> parallelism. Once we’ve set up the pool we just wait to see if the region 
> state updates to online. Unfortunately, since all of the assigns got skipped, 
> we’ll never change the state for any of these regions. That means the bulk 
> assign, and the older SCP that started it, will wait until it hits a timeout.
> By default the timeout for a bulk assignment is the smaller of {{(# Regions 
> in the plan * 10s)}} or {{(# Regions in the most loaded RS in the plan * 1s + 
> 60s + # of RegionServers in the cluster * 30s)}}. For even modest clusters 
> with several hundreds of regions per region server, this means the “no-op” 
> SCP will end up waiting ~tens-of-minutes (e.g. ~50 minutes for an average 
> region density of 300 regions per region server on a 100 node cluster. ~11 
> minutes for 300 regions per region server on a 10 node cluster). During this 
> time, the SCP will hold one of the available procedure execution slots for 
> both the overall pool and for the specific server queue.
> As previously mentioned, restored SCPs will retry their submission if the 
> assignment manager has not yet been activated (done in #3), this can cause 
> them to be scheduled after the newer SCPs (created in #2). Thus the order of 
> execution of no-op and usable SCPs can vary from run-to-run of master 
> initialization.
> This means that unless you get lucky with SCP ordering, impacted regions will 
> remain as RIT for an extended period of time. If you get particularly unlucky 
> and a critical system table is included in the regions that are being 
> recovered, then master initialization itself will end up blocked on this 
> sequence of SCP timeouts. If there are enough of them to exceed the master 
> initialization timeouts, then the situation can be self-sustaining as 
> additional master failovers cause even more duplicate SCPs to be scheduled.
> h3. Indicators:
>  * Master appears to hang; failing to assign regions to available region 
> servers.
>  * Master appears to hang during initialization; 

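The bulk-assignment timeout figures quoted above can be checked with a short calculation. This is only a sketch of the arithmetic in the issue text, not HBase code: the class and method names are hypothetical, and it assumes the plan contains exactly the 300 regions of one crashed server, all targeted at a single region server.

```java
// Hypothetical helper that reproduces the timeout arithmetic from the issue description.
public class BulkAssignTimeout {
    static final long SECONDS_PER_REGION_IN_PLAN = 10;
    static final long SECONDS_PER_REGION_ON_BUSIEST_SERVER = 1;
    static final long BASE_SECONDS = 60;
    static final long SECONDS_PER_REGION_SERVER = 30;

    // Returns the smaller of the two bounds described in the issue, in seconds.
    public static long timeoutSeconds(long regionsInPlan,
                                      long regionsOnBusiestServer,
                                      long regionServers) {
        long perRegionBound = regionsInPlan * SECONDS_PER_REGION_IN_PLAN;
        long perServerBound = regionsOnBusiestServer * SECONDS_PER_REGION_ON_BUSIEST_SERVER
            + BASE_SECONDS
            + regionServers * SECONDS_PER_REGION_SERVER;
        return Math.min(perRegionBound, perServerBound);
    }

    public static void main(String[] args) {
        // Assumption: 300 regions in the plan, all on the most loaded server.
        long largeCluster = timeoutSeconds(300, 300, 100); // min(3000, 3360)
        long smallCluster = timeoutSeconds(300, 300, 10);  // min(3000, 660)
        System.out.println(largeCluster / 60 + " minutes"); // prints "50 minutes"
        System.out.println(smallCluster / 60 + " minutes"); // prints "11 minutes"
    }
}
```

Under these assumptions the 100-node case is limited by the per-region bound (300 * 10s) and the 10-node case by the per-server bound (300s + 60s + 300s), which matches the ~50 and ~11 minute figures in the description.
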
[jira] [Comment Edited] (HBASE-21721) reduce write#syncs() times

2020-07-25 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165159#comment-17165159
 ] 

Bo Cui edited comment on HBASE-21721 at 7/26/20, 4:46 AM:
--

# Yes, AsyncFSWAL does not exist in HBase 1.3 and older.
 # In 2.x, the default is the async WAL, but FSHLog still exists.
 # The patch does not apply to AsyncFSWAL.


was (Author: bo cui):
# yes, AsyncFsWAL is not in the 1.3 and older hbase
 # in 2.x version, default is async WAL, but FSWAL exists
 # the patch does not apply with AsyncFSWAL





[jira] [Commented] (HBASE-21721) reduce write#syncs() times

2020-07-25 Thread Bo Cui (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165159#comment-17165159
 ] 

Bo Cui commented on HBASE-21721:


# Yes, AsyncFSWAL does not exist in HBase 1.3 and older.
 # In 2.x, the default is the async WAL, but FSHLog still exists.
 # The patch does not apply to AsyncFSWAL.




