[jira] [Commented] (HBASE-25321) The sort icons not shown after Upgrade JQuery to 3.5.1

2020-11-23 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237562#comment-17237562
 ] 

Michael Stack commented on HBASE-25321:
---

Any chance of a PR, [~AkshayTSudheer]? Thank you.

> The sort icons not shown after Upgrade JQuery to 3.5.1
> --
>
> Key: HBASE-25321
> URL: https://issues.apache.org/jira/browse/HBASE-25321
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.2.3
>Reporter: Akshay Sudheer
>Priority: Major
>
> The sort icons are not shown after upgrading JQuery to 3.5.1.
> The upgrade changes the tablesorter class names, so hbase.css needs to change 
> accordingly:
> header changed to tablesorter-header
> headerSortUp changed to tablesorter-headerAsc
> headerSortDown changed to tablesorter-headerDesc
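
The three renames listed above could be applied mechanically to a stylesheet. What follows is only a minimal sketch: the class `TablesorterClassRename` and the sample selectors are hypothetical, and it makes no claim about which selectors hbase.css actually contains. It rewrites CSS class selectors by whole name, so a selector that was already renamed is left alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: rewrites old tablesorter class
// selectors to the new names listed in the issue description.
final class TablesorterClassRename {
  // Old -> new class names, exactly as given in HBASE-25321.
  private static final Map<String, String> RENAMES = new LinkedHashMap<>();
  static {
    RENAMES.put("header", "tablesorter-header");
    RENAMES.put("headerSortUp", "tablesorter-headerAsc");
    RENAMES.put("headerSortDown", "tablesorter-headerDesc");
  }

  // Matches a whole CSS class selector such as ".headerSortUp".
  private static final Pattern CLASS_SELECTOR =
      Pattern.compile("\\.([A-Za-z_][A-Za-z0-9_-]*)");

  static String rename(String css) {
    Matcher m = CLASS_SELECTOR.matcher(css);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      // Names that are not in the table are copied through unchanged.
      String name = RENAMES.getOrDefault(m.group(1), m.group(1));
      m.appendReplacement(out, Matcher.quoteReplacement("." + name));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

Because the lookup is by whole class name, running the rewrite twice gives the same result as running it once.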



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-18070) Enable memstore replication for meta replica

2020-11-21 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-18070.
---
Fix Version/s: (was: HBASE-18070.branch-2)
   (was: HBASE-18070)
   Resolution: Fixed

Resolving. All subtasks done. Thanks to the army that helped dev and land this 
feature.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
>  Labels: read-replicas
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-21 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236729#comment-17236729
 ] 

Michael Stack commented on HBASE-18070:
---

The last subtask has just been committed – updating the checked in design on 
master and branch-2.

Last night's nightly on master failed but all the backup tests passed 
[https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/master/133/testReport/org.apache.hadoop.hbase.backup/]
 So, I'll let go of my concern that the merge of this feature to master broke 
backup.

 

Last night's branch-2 #106 is not yet complete; it looks like there is a failure in 
the jdk11 runs, but it is unrelated.

[Test Result|https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/branch-2/106/testReport/] (1 failure / +1)
 * [org.apache.hadoop.hbase.master.TestSplitRegionWhileRSCrash.|https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/branch-2/106/testReport/junit/org.apache.hadoop.hbase.master/TestSplitRegionWhileRSCrash//]

Let me queue up a new nightly now.

I think we are done here. Let me resolve.

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
>  Labels: read-replicas
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Resolved] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-21 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25284.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Re-resolving after updating the already checked-in HBASE-18070 design doc. 
Applied to branch-2+. Resolving.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Commented] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-21 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236725#comment-17236725
 ] 

Michael Stack commented on HBASE-25284:
---

I updated again to pick up the new fixes and then pushed. Thanks for the review, 
[~zhangduo].

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236550#comment-17236550
 ] 

Michael Stack commented on HBASE-18070:
---

Just waiting on HBASE-25284 to resolve, [~apurtell]. Otherwise, done.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
>  Labels: read-replicas
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236548#comment-17236548
 ] 

Michael Stack commented on HBASE-25284:
---

I updated the checked-in PDF to be a copy of the design as of now. +1 please. 
This is the last issue on HBASE-18070.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Resolved] (HBASE-25126) Add load balance logic in hbase-client to distribute read load over meta replica regions.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25126.
---
Hadoop Flags: Reviewed
Release Note: See parent issue, HBASE-18070, release notes for how to 
enable.
  Resolution: Fixed

Resolving, [~huaxiangsun]. I see this patch in master and 2.4, so I presume it is 
done. Reopen if I have it wrong.

> Add load balance logic in hbase-client to distribute read load over meta 
> replica regions.
> -
>
> Key: HBASE-25126
> URL: https://issues.apache.org/jira/browse/HBASE-25126
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>






[jira] [Updated] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25127:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Updated] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25127:
--
Fix Version/s: 2.4.0

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Commented] (HBASE-25126) Add load balance logic in hbase-client to distribute read load over meta replica regions.

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236529#comment-17236529
 ] 

Michael Stack commented on HBASE-25126:
---

Can this be resolved, [~huaxiangsun]?

> Add load balance logic in hbase-client to distribute read load over meta 
> replica regions.
> -
>
> Key: HBASE-25126
> URL: https://issues.apache.org/jira/browse/HBASE-25126
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>






[jira] [Updated] (HBASE-25151) warmupRegion frustrates registering WALs on the catalog replicationsource

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25151:
--
Fix Version/s: (was: HBASE-18070)
   2.4.0
   3.0.0-alpha-1

> warmupRegion frustrates registering WALs on the catalog replicationsource
> -
>
> Key: HBASE-25151
> URL: https://issues.apache.org/jira/browse/HBASE-25151
> Project: HBase
>  Issue Type: Sub-task
>  Components: read replicas
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> While writing a test for HBASE-25145, I noticed that the warmupRegion call 
> triggered by the Master on Region move messes up registering the hbase:meta 
> ReplicationSource. Add accommodation.





[jira] [Updated] (HBASE-25126) Add load balance logic in hbase-client to distribute read load over meta replica regions.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25126:
--
Fix Version/s: 2.4.0
   3.0.0-alpha-1

> Add load balance logic in hbase-client to distribute read load over meta 
> replica regions.
> -
>
> Key: HBASE-25126
> URL: https://issues.apache.org/jira/browse/HBASE-25126
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>






[jira] [Updated] (HBASE-25068) Pass WALFactory to Replication so it knows of all WALProviders, not just default/user-space

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25068:
--
Fix Version/s: (was: HBASE-18070)
   2.4.0
   3.0.0-alpha-1

> Pass WALFactory to Replication so it knows of all WALProviders, not just 
> default/user-space
> ---
>
> Key: HBASE-25068
> URL: https://issues.apache.org/jira/browse/HBASE-25068
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: 
> 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.2, 
> 0001-HBASE-25068-Pass-WALFactory-to-Replication-so-it-kno.patch.master
>
>
> Small change that passes all WALProviders to ReplicationService rather than 
> just the default/user-space WALProvider. It does this using the WALFactory 
> vessel since it holds all Providers. This change is to be exploited by 
> adjacent sub-task HBASE-25055 in follow-on. This sub-task also exists to make 
> the HBASE-25055 patch smaller and more focused, easier to review.





[jira] [Updated] (HBASE-25055) Add ReplicationSource for meta WALs; add enable/disable when hbase:meta assigned to RS

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25055:
--
Fix Version/s: (was: HBASE-18070)
   2.4.0
   3.0.0-alpha-1

> Add ReplicationSource for meta WALs; add enable/disable when hbase:meta 
> assigned to RS
> --
>
> Key: HBASE-25055
> URL: https://issues.apache.org/jira/browse/HBASE-25055
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add ReplicationSource that feeds on hbase:meta WAL files. Add enabling this 
> source when hbase:meta is opened and hbase:meta region replicas are 
> configured ON. Disable the source when the hbase:meta Region moves away.





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236527#comment-17236527
 ] 

Michael Stack commented on HBASE-18070:
---

The combined branch-2 patch passed all green: 
[https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2683/]
  I merged the feature branch HBASE-18070.branch-2 to branch-2.

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
>  Labels: read-replicas
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Resolved] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25291.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Backported to branch-2 after merge of HBASE-18070.branch-2 feature branch into 
branch-2.

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Updated] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25291:
--
Fix Version/s: (was: HBASE-18070.branch-2)
   (was: HBASE-18070)
   2.4.0
   3.0.0-alpha-1

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Commented] (HBASE-25310) HBASE-18070 makes for NPEs in some hbase-backup tests on master

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236522#comment-17236522
 ] 

Michael Stack commented on HBASE-25310:
---

In nightly build #312, all balancer tests passed. My current thinking is that 
the backup test failures are not related to HBASE-18070. Keeping an eye on it. 
Started up nightly build #313.

> HBASE-18070 makes for NPEs in some hbase-backup tests on master
> ---
>
> Key: HBASE-25310
> URL: https://issues.apache.org/jira/browse/HBASE-25310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Priority: Major
>
> Let me work on this as a distinct issue. The hbase-backup tests are 
> complex/massive so I need to do some study. It's the jdk8 runs that fail: 
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2643/]
> jdk11 passes. All pass locally too.





[jira] [Updated] (HBASE-25310) HBASE-18070 makes for NPEs in some hbase-backup tests on master

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25310:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> HBASE-18070 makes for NPEs in some hbase-backup tests on master
> ---
>
> Key: HBASE-25310
> URL: https://issues.apache.org/jira/browse/HBASE-25310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Priority: Major
>
> Let me work on this as a distinct issue. The hbase-backup tests are 
> complex/massive so I need to do some study. It's the jdk8 runs that fail: 
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2643/]
> jdk11 passes. All pass locally too.





[jira] [Updated] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-18070:
--
Fix Version/s: (was: 2.5.0)
   2.4.0
 Hadoop Flags: Reviewed
 Release Note: 
"Async WAL Replication" [1] was added by HBASE-11183 "Timeline Consistent 
region replicas - Phase 2 design" but only for user-space tables. This feature 
adds "Async WAL Replication" for the hbase:meta table.  It also adds a client 
'LoadBalance' mode that has reads go to replicas first and to the primary only 
on fail so as to shed read load from the primary to alleviate *hotspotting* on 
the hbase:meta Region.

Configuration is as it was for the user-space 'Async WAL Replication'. See [2] 
and [3] for details on how to enable.

1. http://hbase.apache.org/book.html#async.wal.replication
2. http://hbase.apache.org/book.html#async.wal.replication.meta
3. 
http://hbase.apache.org/book.html#_async_wal_replication_for_meta_table_as_of_hbase_2_4_0
   Labels: read-replicas  (was: )
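
The 'LoadBalance' behaviour described in the release note (reads go to a replica first and to the primary only on failure) can be sketched in plain Java. This is an illustration under stated assumptions, not the hbase-client implementation: the class `ReplicaFirstRead`, the convention that replica id 0 is the primary, and the `lookup` callback are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntFunction;

// Hypothetical sketch of "replicas first, primary only on failure".
// Assumption: replica id 0 is the primary, ids 1..replicaCount-1 are replicas.
final class ReplicaFirstRead {

  // Pick one of the read replicas uniformly at random (needs replicaCount > 1).
  static int pickReplica(int replicaCount) {
    return 1 + ThreadLocalRandom.current().nextInt(replicaCount - 1);
  }

  // Try the chosen replica; if it fails, or if there are no replicas at all,
  // read from the primary instead.
  static <T> T read(int replicaCount, int chosenReplica, IntFunction<T> lookup) {
    if (replicaCount > 1) {
      try {
        return lookup.apply(chosenReplica);
      } catch (RuntimeException e) {
        // Replica read failed: fall through to the primary.
      }
    }
    return lookup.apply(0);
  }
}
```

With a replica count of 1 the sketch never touches a replica id other than 0, so it behaves as if no read replicas were configured.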

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
>  Labels: read-replicas
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Updated] (HBASE-25125) Create a ReplicationEndPoint for meta/root replica replication.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25125:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Create a ReplicationEndPoint for meta/root replica replication.
> ---
>
> Key: HBASE-25125
> URL: https://issues.apache.org/jira/browse/HBASE-25125
> Project: HBase
>  Issue Type: Sub-task
>  Components: read replicas
>Affects Versions: 3.0.0-alpha-1
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>






[jira] [Updated] (HBASE-25158) Enhance balancer to make sure no meta primary/replica regions are going to be assigned to one same region server.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25158:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Enhance balancer to make sure no meta primary/replica regions are going to be 
> assigned to one same region server.
> -
>
> Key: HBASE-25158
> URL: https://issues.apache.org/jira/browse/HBASE-25158
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
>
> Region replica has an enhancement in the balancer so that a primary region and 
> its replicas are not assigned to the same region server. Today, there is only 
> one meta region, so this enhancement is still enough. With split meta coming 
> in, the balancer needs to make sure that no meta primary/replica regions are 
> assigned to the same region server, in order to avoid hotspotting.
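
The constraint described above (a region and its replicas never share a region server) can be expressed as a simple check over a proposed assignment. This is a hedged sketch only: `ReplicaPlacementCheck` and the map-of-lists representation are hypothetical and are not the balancer's data structures.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical check, not HBase balancer code.
// Input: server name -> base region name of each replica hosted on that
// server (primary and read replicas of one region share the same base name).
final class ReplicaPlacementCheck {

  // Returns true if any server hosts two or more copies of the same region.
  static boolean hasColocatedReplicas(Map<String, List<String>> regionsByServer) {
    for (List<String> hosted : regionsByServer.values()) {
      Set<String> seen = new HashSet<>();
      for (String region : hosted) {
        if (!seen.add(region)) {
          return true; // second copy of this region on the same server
        }
      }
    }
    return false;
  }
}
```

A balancer could run a check of this shape on a candidate plan and reject or penalise plans for which it returns true.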





[jira] [Updated] (HBASE-25241) Add integration test for meta replica load balance mode

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25241:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Add integration test for meta replica load balance mode
> ---
>
> Key: HBASE-25241
> URL: https://issues.apache.org/jira/browse/HBASE-25241
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Reporter: Huaxiang Sun
>Priority: Major
>
> We need to create an integration test that enables meta replica load balance 
> mode and verifies its correctness.





[jira] [Updated] (HBASE-25247) Followup jira to encap all meta replica mode/selector processing into CatalogReplicaModeManager

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25247:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Followup jira to encap all meta replica mode/selector processing into 
> CatalogReplicaModeManager
> ---
>
> Key: HBASE-25247
> URL: https://issues.apache.org/jira/browse/HBASE-25247
> Project: HBase
>  Issue Type: Sub-task
>  Components: meta
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Minor
>
> This follows up on Stack's comments in 
> [https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0.]
> {quote}
> h4. *[saintstack|https://github.com/saintstack]* [6 days 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514558880]
>  
> Member
> Yeah, said this before but in follow-on, would be good to shove all this 
> stuff into a CatalogReplicaMode class. Internally this class would figure 
> which policy to run. It would have a method that took a Scan that allowed 
> decorating the Scan w/ whatever the mode needed to implement its policy. 
> Later.
>  
> h4. *[huaxiangsun|https://github.com/huaxiangsun]* [6 days 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r514587250]
>  
> Author Member
> Now that I think about it, it makes sense. Maybe a CatalogReplicaModeManager 
> class which encapsulates mode and selector?
> Let me create a followup jira after this is merged.
>  
> {quote}
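
The idea in the quoted review thread (one class that decides which policy to run and decorates a Scan accordingly) might look roughly like the sketch below. Every type here is a stand-in: `SketchScan` is not the HBase `Scan` class, and `SketchReplicaMode` and `SketchCatalogReplicaModeManager` are hypothetical names that say nothing about how the follow-up was eventually implemented.

```java
import java.util.concurrent.ThreadLocalRandom;

// Stand-in for a scan request; NOT org.apache.hadoop.hbase.client.Scan.
final class SketchScan {
  int replicaId = 0; // assumption: 0 means "read from the primary"
}

// Hypothetical modes for this sketch.
enum SketchReplicaMode { NONE, LOAD_BALANCE }

// Hypothetical manager that encapsulates the mode and the replica selector,
// so callers only ever ask it to decorate a scan.
final class SketchCatalogReplicaModeManager {
  private final SketchReplicaMode mode;
  private final int replicaCount;

  SketchCatalogReplicaModeManager(SketchReplicaMode mode, int replicaCount) {
    this.mode = mode;
    this.replicaCount = replicaCount;
  }

  // Decorate the scan with whatever the configured mode needs.
  SketchScan decorate(SketchScan scan) {
    if (mode == SketchReplicaMode.LOAD_BALANCE && replicaCount > 1) {
      // Selector: a random read replica in 1..replicaCount-1.
      scan.replicaId = 1 + ThreadLocalRandom.current().nextInt(replicaCount - 1);
    }
    return scan;
  }
}
```

Keeping the mode and the selector behind one `decorate` call means call sites do not need to know which policy is active.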





[jira] [Updated] (HBASE-25248) Followup jira to create single thread ScheduledExecutorService in AsyncConnImpl, and schedule all these periodic tasks

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25248:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Followup jira to create single thread ScheduledExecutorService in 
> AsyncConnImpl, and schedule all these periodic tasks
> --
>
> Key: HBASE-25248
> URL: https://issues.apache.org/jira/browse/HBASE-25248
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Priority: Minor
>
> This is a followup Jira for comments in 
> [https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0.]
>  
> {quote}
> h4. *[saintstack|https://github.com/saintstack]* [18 hours 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517040579]
>  Member
> So, implements Stoppable rather than do what the likes of AuthUtil does where 
> it does createDummyStoppable and then has an internal do-nothing Stoppable? 
> Makes sense.
> Perhaps add comment that it is a do-nothing stop required by ScheduledChore 
> impls. s/isStopped/stopped/
>  
> h4. *[huaxiangsun|https://github.com/huaxiangsun]* [18 hours 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517042290]
>  Author Member
> Will do.
>  
> h4. *[ndimiduk|https://github.com/ndimiduk]* [17 hours 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057141]
>  Member
> Maybe in the future we can put a default empty implementation on the 
> interface, and then implementers who don't need it can ignore it.
>  
> h4. *[Apache9|https://github.com/Apache9]* [17 hours 
> ago|https://github.com/apache/hbase/pull/2584/commits/d99c2b0ccfd2a57150e984742d097d1e1fcc47b0#r517057999]
>  Member
> Maybe we could just use a ScheduledExecutorService at client side, the 
> ChoreService is designed to be used at server side I believe. Anyway, not a 
> blocker for now.
> {quote}
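
The last suggestion in the quoted thread, using a single-threaded ScheduledExecutorService on the client side for periodic tasks, can be sketched with the JDK alone. The wrapper class `ClientPeriodicTasks` and its thread name are hypothetical; only the `java.util.concurrent` calls are real.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side holder for periodic tasks; not AsyncConnectionImpl.
final class ClientPeriodicTasks implements AutoCloseable {
  // One daemon thread runs every periodic task for this client.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "client-periodic-tasks");
        t.setDaemon(true);
        return t;
      });

  // Run the task now and then again periodMillis after each run finishes.
  void schedule(Runnable task, long periodMillis) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        // An uncaught exception would cancel future runs, so swallow it here.
      }
    }, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Closing the holder stops all scheduled work at once, which removes the need for a separate do-nothing stop hook on each task.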





[jira] [Updated] (HBASE-25293) Followup jira to address the client handling issue when changing from meta replica to non-meta-replica at the server side.

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25293:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Followup jira to address the client handling issue when changing from meta 
> replica to non-meta-replica at the server side.
> -
>
> Key: HBASE-25293
> URL: https://issues.apache.org/jira/browse/HBASE-25293
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Priority: Minor
>
> [https://github.com/apache/hbase/pull/2643]
>  
> {quote}
> With my operator hat on, I'd assume that LOAD_BALANCE with 1 replica count 
> works like no read replicas configured (logic wise at-least, even though the 
> code paths are different).
> {quote}If the server side does not support meta replica, the client side 
> cannot be configured to support this mode
> {quote}
> Since clients are usually long running (meaning we may not be able to restart 
> the client, or it may be using a cached HBase connection) and the meta replica 
> count can be altered on the service side on the fly, I'd expect the client to 
> work across these changes without any configuration changes. WDYT?
> {quote}





[jira] [Updated] (HBASE-25294) Follow-on: defend against read replicas being enabled for server-side clients

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25294:
--
Parent Issue: HBASE-25315  (was: HBASE-18070)

> Follow-on: defend against read replicas being enabled for server-side clients 
> --
>
> Key: HBASE-25294
> URL: https://issues.apache.org/jira/browse/HBASE-25294
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> A nice aid for operators would be to spend some time on ensuring that at 
> least 'LoadBalance' is not set for the clients that run on the server side, 
> inside the Master in particular. Currently our only defense is documentation. An 
> operator might set the mode for client-side and server-side by mistake. Defend 
> server-side clients against this possibility to avoid the Master making decisions 
> based off stale state.
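
One possible shape for such a guard is to sanitise the configuration before a server-side client is built from it. The sketch below is an assumption throughout: the class `ServerSideClientGuard`, the key `sketch.meta.replicas.mode` and the value `None` are hypothetical placeholders, not HBase configuration names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard, not HBase code: returns a copy of the configuration in
// which the load-balance mode is switched off for server-side clients.
final class ServerSideClientGuard {
  // Placeholder key and values for this sketch only.
  static final String MODE_KEY = "sketch.meta.replicas.mode";
  static final String LOAD_BALANCE = "LoadBalance";
  static final String NONE = "None";

  static Map<String, String> forServerSide(Map<String, String> conf) {
    Map<String, String> copy = new HashMap<>(conf);
    if (LOAD_BALANCE.equalsIgnoreCase(copy.get(MODE_KEY))) {
      // Server-side clients must read the primary to avoid stale state.
      copy.put(MODE_KEY, NONE);
    }
    return copy;
  }
}
```

Working on a copy keeps the operator's original settings intact for ordinary clients while the server-side client always reads from the primary.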





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236501#comment-17236501
 ] 

Michael Stack commented on HBASE-18070:
---

Created HBASE-25315 as an umbrella under which to hang the follow-ons we have 
accumulated here. Moving them over now so I can resolve this Jira against the 
version that first carries it.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Created] (HBASE-25315) Follow-on tasks that came of "HBASE-18070 Enable memstore replication for meta replica"

2020-11-20 Thread Michael Stack (Jira)
Michael Stack created HBASE-25315:
-

 Summary: Follow-on tasks that came of "HBASE-18070 Enable memstore 
replication for meta replica"
 Key: HBASE-25315
 URL: https://issues.apache.org/jira/browse/HBASE-25315
 Project: HBase
  Issue Type: Umbrella
  Components: meta replicas
Reporter: Michael Stack


The HBASE-18070 _Enable memstore replication for meta replica_ 
turned up _follow-ons:_ tests, doc, guardrails and enhancements. Let me give 
them their own issue so they do not crowd the original and so I can resolve the 
original against the version that carries the first implementation of the 
enhancement (want to avoid the alternative of an issue that stays open 
forever).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236471#comment-17236471
 ] 

Michael Stack commented on HBASE-18070:
---

On master merge, the nightly #132 just finished. One unit test failed. It was 
NOT a backup test.  The backup tests all passed: 
[https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/master/132/testReport/org.apache.hadoop.hbase.backup/]
 It looks like the backup tests are flaky. I will start a new nightly now, #133, 
to be sure.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236387#comment-17236387
 ] 

Michael Stack commented on HBASE-18070:
---

The run of the aggregated patch for branch-2 failed because of HBASE-24877 
which came in since our last run of the aggregated patch. HBASE-18070.branch-2 
was missing HBASE-25126 backport for branch-2, the client-side changes. Adding 
this to our aggregated set should fix the HBASE-24877 failure (which [~huaxiangsun] 
ran into when tidying HBASE-25126 for branch-2). I've set a new hadoopqa run 
against the aggregated branch-2 merge patch here 
[https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2683/]
 


On master branch, still waiting on nightly #132 to see what damage I have 
wrought.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-25290) Remove table on master related code

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236255#comment-17236255
 ] 

Michael Stack commented on HBASE-25290:
---

{quote}Or back to the root requirements here, do you think it is worth to undo 
HMaster extends HRegionServer? I want to do it but after facing the maintenance 
mode problem, I'm a bit uncertain about whether it is still worth to do.
{quote}
I have not spent time on the issue as you have. The startup complexity is a 
fountain of bugs and prevents cleanup refactoring. Startup is about to become 
much more involved with more system tables in the mix so the complexity will 
hurt us more as we go forward (higher development time, persistent bugs).

Currently 'maintenance mode' is not much more than a 'stage' in startup – 
HBASE-21073. It needs lots of specification as to what it can do and 
development. It is not used that I know of (and not usable going by my notes at 
the end of the issue).

A simpler startup would make maintenance mode more straightforward to 
implement.

I am wary of offering any more than these high-level observations at this time.

> Remove table on master related code
> ---
>
> Key: HBASE-25290
> URL: https://issues.apache.org/jira/browse/HBASE-25290
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> This could be a start of the HBASE-15549 feature branch.





[jira] [Commented] (HBASE-25290) Remove table on master related code

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235998#comment-17235998
 ] 

Michael Stack commented on HBASE-25290:
---

bq. In HBCKSCP we will scan meta to find 'Unknown Region Server', and I asked 
that what if we want to fix meta assignment? Since meta is not online we can 
not scan meta.

 

Yes. This is an issue to fix.
{quote}Maybe we will be in trouble when kerberos is enabled?
{quote}
I've not tried it on minicluster (there are the kerberos tests with the dummy 
kerberos implementation? Do they help?)

Could we say no kerberos when 'single-user'/'maintenance-mode'?

> Remove table on master related code
> ---
>
> Key: HBASE-25290
> URL: https://issues.apache.org/jira/browse/HBASE-25290
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> This could be a start of the HBASE-15549 feature branch.





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235995#comment-17235995
 ] 

Michael Stack commented on HBASE-18070:
---

Merged HBASE-18070 to master branch. Queued #132 on nightly 
[https://ci-hadoop.apache.org/view/HBase/job/HBase/job/HBase%20Nightly/job/master/].
Let's see what it turns up. Will work on backup failures in the meantime. Rebased 
HBASE-18070.branch-2 against branch-2. Put up an amalgamated PR against 
branch-2. Let's see how it builds.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Updated] (HBASE-25310) HBASE-18070 makes for NPEs in some hbase-backup tests on master

2020-11-20 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25310:
--
Description: Let me work on this as a distinct issue. The hbase-backup 
tests are complex/massive so need to do some study. It's the jdk8 runs that 
fail. 
[https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2643/]
 jdk11 passes.  All pass locally too.  (was: Let me work on this as a distinct 
issue. The hbase-backup tests are complex/massive so need to do some study.)

> HBASE-18070 makes for NPEs in some hbase-backup tests on master
> ---
>
> Key: HBASE-25310
> URL: https://issues.apache.org/jira/browse/HBASE-25310
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Michael Stack
>Priority: Major
>
> Let me work on this as a distinct issue. The hbase-backup tests are 
> complex/massive so need to do some study. It's the jdk8 runs that fail. 
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2643/]
>  jdk11 passes.  All pass locally too.





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235987#comment-17235987
 ] 

Michael Stack commented on HBASE-18070:
---

The first merge patch passed. Runs #2 and #3 turn up NPEs in hbase-backup tests 
(the first run 18 failed, the second run 11 failed). The failures are opaque 
(these backup tests launch multiple hdfs and yarn and then do god knows what). 
Filed HBASE-25310 to work on it. In the meantime, merging the master patch. Will start a 
nightly build after it is in so we can see how it does overnight.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Created] (HBASE-25310) HBASE-18070 makes for NPEs in some hbase-backup tests on master

2020-11-20 Thread Michael Stack (Jira)
Michael Stack created HBASE-25310:
-

 Summary: HBASE-18070 makes for NPEs in some hbase-backup tests on 
master
 Key: HBASE-25310
 URL: https://issues.apache.org/jira/browse/HBASE-25310
 Project: HBase
  Issue Type: Sub-task
  Components: test
Reporter: Michael Stack


Let me work on this as a distinct issue. The hbase-backup tests are 
complex/massive so need to do some study.





[jira] [Commented] (HBASE-25237) 'hbase master stop' shuts down the cluster, not the master only

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235928#comment-17235928
 ] 

Michael Stack commented on HBASE-25237:
---

Thank you [~lokiore]  for taking this one on.

> 'hbase master stop' shuts down the cluster, not the master only
> ---
>
> Key: HBASE-25237
> URL: https://issues.apache.org/jira/browse/HBASE-25237
> Project: HBase
>  Issue Type: Improvement
>Reporter: Michael Stack
>Assignee: Lokesh Khurana
>Priority: Major
>
> This is confusing (an operator at place-of-employment shut down a massive 
> cluster mistakenly thinking he was shutting down the master only).
> If I run 'hbase master --help', it says unsupported and then dumps out:
> {code:java}
> org.apache.hbase.thirdparty.org.apache.commons.cli.UnrecognizedOptionException:
>  Unrecognized option: --help
>  at 
> org.apache.hbase.thirdparty.org.apache.commons.cli.Parser.processOption(Parser.java:383)
>  at 
> org.apache.hbase.thirdparty.org.apache.commons.cli.Parser.parse(Parser.java:210)
>  at 
> org.apache.hbase.thirdparty.org.apache.commons.cli.Parser.parse(Parser.java:88)
>  at 
> org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:89)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at 
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
>  at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2945)
> Usage: Master [opts] start|stop|clear
>  start Start Master. If local mode, start Master and RegionServer in same JVM
>  stop Start cluster shutdown; Master signals RegionServer shutdown
>  clear Delete the master znode in ZooKeeper after a master crashes
>  where [opts] are:
>  --minRegionServers= Minimum RegionServers needed to host user 
> tables.
>  --localRegionServers= RegionServers to start in master process when 
> in standalone mode.
>  --masters= Masters to start in this process.
>  --backup Master should start in backup mode{code}
>  
> ... so the 'help' is clear that it's the cluster that goes down, not just the 
> master – but that hardly compensates for the unexpected behavior.
>  
> 'hbase master stop' is actually what is used internally when you run stop-hbase.sh.
>  
> Stopping the cluster when you run 'hbase master stop' is actually very old 
> behavior. It's still confusing though. We could change this.
> If I run 'hbase regionserver stop', it does this:
> {code:java}
> System.err.println(
>   "To shutdown the regionserver run " +
>   "hbase-daemon.sh stop regionserver or send a kill signal to " +
>   "the regionserver pid");{code}
> I'd think we could make an improvement here. We could do something like the RS 
> output if the user types 'hbase master stop', at a minimum requiring the operator add 
> a '--force' flag if they want the cluster to go down, or point them to a new 
> hbase cluster stop/hbase cluster start, or point them at stop-hbase.sh.  
> Should also change mode so 'hbase master stop' stops the master and not the 
> cluster (there was no stopMaster when the 'hbase master stop' was added 
> originally – but there is one now).
>  
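
The proposal in the description above could be sketched roughly as follows. This is not the HMasterCommandLine code; the class, the enum, and the messages are hypothetical, and the force flag (spelled '--force' here) is only the suggestion from the description, not an existing option. The sketch refuses a bare 'stop' and points the operator elsewhere, in the same spirit as the 'hbase regionserver stop' output.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical argument check for 'hbase master stop'; not part of HBase. */
public class MasterStopGuard {
    enum Action { PASS_THROUGH, REFUSE, STOP_CLUSTER }

    /** Decides what to do with the command-line arguments given to the master command. */
    static Action decide(String... args) {
        List<String> argList = Arrays.asList(args);
        if (!argList.contains("stop")) {
            return Action.PASS_THROUGH; // start, clear, etc. are left unchanged
        }
        // 'stop' takes the whole cluster down, so require an explicit force flag.
        return argList.contains("--force") ? Action.STOP_CLUSTER : Action.REFUSE;
    }

    /** Returns the message to show the operator for the given action (empty if none). */
    static String explain(Action action) {
        switch (action) {
            case REFUSE:
                return "'hbase master stop' shuts down the whole cluster. "
                    + "Add --force to proceed, or run stop-hbase.sh.";
            case STOP_CLUSTER:
                return "Starting cluster shutdown.";
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        String message = explain(decide(args));
        if (!message.isEmpty()) {
            System.err.println(message);
        }
    }
}
```

Keeping the decision in a small pure function makes the behavior easy to unit test before any change to the real command line.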





[jira] [Commented] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235875#comment-17235875
 ] 

Michael Stack commented on HBASE-25127:
---

The new master PR is https://github.com/apache/hbase/pull/2682

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Comment Edited] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235873#comment-17235873
 ] 

Michael Stack edited comment on HBASE-25127 at 11/20/20, 3:40 AM:
--

[~clarax98007] I'm reopening this because I have reverted the change. I made a 
mistake when I committed it. I added [~zhangduo]  as a 'Signed-off-by' when he 
had not. He had a 'requested changes' mark still in place. I pushed the issue 
thinking the 'requested changes' had been addressed since all commentary had been 
'resolved', but this was a mistake (my mistake). So, let me put up a new PR. Do 
you know what the 'requested changes' are? If not, let's figure them out. I can 
help. Once addressed we can ask [~zhangduo] to take a look. I'm around to help 
on this one. Sorry for the inconvenience.


was (Author: stack):
[~clarax98007] I'm reopening this. I made a mistake when I committed it. I 
added [~zhangduo]  as a 'Signed-off-by' when he had a 'requested changes' mark 
still in place. I pushed the issue thinking the 'requested changes' addressed 
but this was the mistake apparently (my mistake). So, let me put up a new PR. 
Do you know what the 'requested changes' are? If not, lets figure them out. I 
can help. Once addressed we can ask [~zhangduo] to take a look. I'm around to 
help on this one. Sorry for the inconvenience.

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Reopened] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-19 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reopened HBASE-25127:
---

[~clarax98007] I'm reopening this. I made a mistake when I committed it. I 
added [~zhangduo]  as a 'Signed-off-by' when he had a 'requested changes' mark 
still in place. I pushed the issue thinking the 'requested changes' had been addressed, 
but this was apparently the mistake (my mistake). So, let me put up a new PR. 
Do you know what the 'requested changes' are? If not, let's figure them out. I 
can help. Once addressed we can ask [~zhangduo] to take a look. I'm around to 
help on this one. Sorry for the inconvenience.

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Commented] (HBASE-25290) Remove table on master related code

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235714#comment-17235714
 ] 

Michael Stack commented on HBASE-25290:
---

I forgot about the need to bring up master in maintenance mode. Thanks for the 
recollection.

Standalone master mode is still in need of development so I think there is lots 
of leeway here on how it might be implemented.

We could have a RegionServer host meta in a standalone mode that allowed 
meta-editing but what if it is a procedure that needs fixing or an alignment of 
meta and procedure state that needs reconciliation? These would require the 
Master be involved?

Minicluster starts a Master and a RegionServer in same process. Could we base 
master maintenance mode ('single-user') on a version of minicluster? Then you 
could proceed with your disentangling Master and RS project?

 

 

> Remove table on master related code
> ---
>
> Key: HBASE-25290
> URL: https://issues.apache.org/jira/browse/HBASE-25290
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, master
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> This could be a start of the HBASE-15549 feature branch.





[jira] [Commented] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235684#comment-17235684
 ] 

Michael Stack commented on HBASE-25284:
---

I put up a PR which removes the doc currently checked in and replaces it with 
pointers to the editable document. Suggest we use this as a stopgap while the 
design doc is again in flux? Doing this we release the RM so he can proceed 
with the 2.4.0RC. When the design doc stabilizes again, we can re-commit its 
state at that time as a pdf replacing the pointer.  This can happen before or 
after the RC IMO.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235661#comment-17235661
 ] 

Michael Stack commented on HBASE-18070:
---

Merge VOTE passed; see _[RESULT] VOTE: Merge HBASE-18070 "Enable memstore 
replication for meta replica" to master and then back to branch-2 (Was 
"HEAD-UP: Merging HBASE-18070 "Enable memstore replication for meta replica" 
to master and then back to branch-2")_ on the dev mailing list. Updated the 
attached aggregated PR, made from the current state of the HBASE-18070 feature 
branch, to run it by hadoopqa.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-19 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235571#comment-17235571
 ] 

Michael Stack commented on HBASE-25284:
---

I granted [~zhangduo]  edit rights.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234964#comment-17234964
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}I think this is another story? Let's focus on the current problem here.
{quote}
I think differently. The 'current problem' is the interaction between you and me 
and not to be found 'here' in this JIRA. This is my last comment here in my 
attempt at moving what I see as the problem out of this JIRA.

On your questions, I'd have referred you to the design doc but I see 
[~huaxiangsun] has answered in-line (Thanks [~huaxiangsun] ).

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Comment Edited] (HBASE-18070) Enable memstore replication for meta replica

2020-11-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233790#comment-17233790
 ] 

Michael Stack edited comment on HBASE-18070 at 11/18/20, 4:50 AM:
--

{quote}If you think you have something that can not be spoken out publicly then 
I'm fine with a zoom call.
{quote}
Suggested it because I thought we'd agreed to go to face-to-face if a problem 
communicating arose post Yu Li's help. That seems to be what is going on here 
(and you put a -1 on top of it to boot).

On [https://github.com/apache/hbase/pull/2644], when I went there, I was late 
to the game, trying to help clean up some crossed-wires around resolved Jira 
but open PRs (or other way round). All conversations were 'resolved'. Your 
'request changes' I presumed a vestige of a resolved conversation. Seemed like 
minor stuff. No intentional dissing on my part. I can reopen if you want. Sorry.

Suggest you not get hung up on my representation. There is a design here with a 
long-standing description of what the work here is about and you have helped 
review the patches that comprise this work. Just use these instead.


was (Author: stack):
{quote}If you think you have something that can not be spoken out publicly then 
I'm fine with a zoom call.
{quote}
Suggested it because I thought we'd agreed to go to face-to-face if a problem 
communicating arose post Yu Li's help. That seems to be what is going on here 
(and you put a -1 on top of it to boot).

On [https://github.com/apache/hbase/pull/2644], when I went there, I was late 
to the game, trying to help clean up some crossed-wires around resolved Jira 
but open PRs (or other way round). All conversations were 'resolved'. Your 
'request changes' I presumed a vestige of a resolved conversation. Seemed like 
minor stuff. No intentional dissing on my part. I can reopen if you want.

Suggest you not get hung up on my representation. There is a design here with a 
long-standing description of what the work here is about and you have helped 
review the patches that comprise this work. Just use these instead.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-17 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233790#comment-17233790
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}If you think you have something that can not be spoken out publicly then 
I'm fine with a zoom call.
{quote}
Suggested it because I thought we'd agreed to go to face-to-face if a problem 
communicating arose post Yu Li's help. That seems to be what is going on here 
(and you put a -1 on top of it to boot).

On [https://github.com/apache/hbase/pull/2644], when I went there, I was late 
to the game, trying to help clean up some crossed-wires around resolved Jira 
but open PRs (or other way round). All conversations were 'resolved'. Your 
'request changes' I presumed a vestige of a resolved conversation. Seemed like 
minor stuff. No intentional dissing on my part. I can reopen if you want.

Suggest you not get hung up on my representation. There is a design here with a 
long-standing description of what the work here is about and you have helped 
review the patches that comprise this work. Just use these instead.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Updated] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25291:
--
Fix Version/s: HBASE-18070.branch-2

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070, HBASE-18070.branch-2
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Commented] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233216#comment-17233216
 ] 

Michael Stack commented on HBASE-25291:
---

Merged to master only for now, until HBASE-25126 goes in on the backported branch. 
Leaving open.

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Updated] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25291:
--
Fix Version/s: (was: HBASE-18070.branch-2)

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Updated] (HBASE-25294) Follow-on: defend against read replicas being enabled for server-side clients

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25294:
--
Description: A nice aid for operators would be to spend some time on 
ensuring that at least 'LoadBalance' is not set for the clients that run on the 
server side, inside Master in particular. Currently our only defense is 
documentation. An operator might set them for client-side and server-side by 
mistake. Defend server-side clients against this possibility to avoid Master 
making decisions based off stale state.

> Follow-on: defend against read replicas being enabled for server-side clients 
> --
>
> Key: HBASE-25294
> URL: https://issues.apache.org/jira/browse/HBASE-25294
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>
> A nice aid for operators would be to spend some time on ensuring that at 
> least 'LoadBalance' is not set for the clients that run on the server side, 
> inside Master in particular. Currently our only defense is documentation. An 
> operator might set them for client-side and server-side by mistake. Defend 
> server-side clients against this possibility to avoid Master making decisions 
> based off stale state.
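
The defense described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the HBase implementation: the class and method names are hypothetical, the configuration is modeled as a plain map rather than an HBase Configuration, and the key "hbase.locator.meta.replicas.mode" and the fallback value "None" are assumptions that should be checked against the reference guide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the HBase code base.
class ServerSideModeGuardSketch {

    // Assumed name of the client-side meta replica mode setting.
    static final String MODE_KEY = "hbase.locator.meta.replicas.mode";

    // Assumed value meaning "always read hbase:meta from the primary".
    static final String PRIMARY_ONLY = "None";

    // Returns a copy of the site configuration that is safe to hand to a
    // client running inside a server process. If an operator has set
    // 'LoadBalance' everywhere by mistake, the copy falls back to
    // primary-only reads so decisions are not made from stale replica state.
    static Map<String, String> forServerSideClient(Map<String, String> siteConf) {
        Map<String, String> copy = new HashMap<>(siteConf);
        String mode = copy.get(MODE_KEY);
        if (mode != null && mode.equalsIgnoreCase("LoadBalance")) {
            copy.put(MODE_KEY, PRIMARY_ONLY);
        }
        return copy;
    }
}
```

Working on a copy leaves the operator's settings untouched for ordinary clients; only the server-side view is overridden.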





[jira] [Updated] (HBASE-25294) Follow-on: defend against read replicas being enabled for server-side clients

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25294:
--
Environment: (was: Spend some time on ensuring that at least 
'LoadBalance' is not set for the clients that run serverside. Currently our 
only defense is documentation. An operator might set them for client-side and 
server-side by mistake. Defend server-side clients against this possibility to 
avoid Master possibly making decisions based off stale state.)

> Follow-on: defend against read replicas being enabled for server-side clients 
> --
>
> Key: HBASE-25294
> URL: https://issues.apache.org/jira/browse/HBASE-25294
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Priority: Major
>






[jira] [Created] (HBASE-25294) Follow-on: defend against read replicas being enabled for server-side clients

2020-11-16 Thread Michael Stack (Jira)
Michael Stack created HBASE-25294:
-

 Summary: Follow-on: defend against read replicas being enabled for 
server-side clients 
 Key: HBASE-25294
 URL: https://issues.apache.org/jira/browse/HBASE-25294
 Project: HBase
  Issue Type: Sub-task
 Environment: Spend some time on ensuring that at least 'LoadBalance' 
is not set for the clients that run serverside. Currently our only defense is 
documentation. An operator might set them for client-side and server-side by 
mistake. Defend server-side clients against this possibility to avoid Master 
possibly making decisions based off stale state.
Reporter: Michael Stack








[jira] [Updated] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25291:
--
Fix Version/s: HBASE-18070.branch-2
   HBASE-18070

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070, HBASE-18070.branch-2
>
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Commented] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233144#comment-17233144
 ] 

Michael Stack commented on HBASE-25291:
---

I took this, [~huaxiangsun]. I was looking around the hbase:meta read replica 
stuff. It needs an edit and it needs to include the warnings [~apurtell] 
suggested for the new addition. I also rolled in here the documentation nits 
suggested by [~bharathv] in his review of the rolled up HBASE-18070 patch.

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Updated] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25291:
--
Summary: Document how to enable the meta replica load balance mode for the 
client and clean up around hbase:meta read replicas  (was: Document how to 
enable the meta replica load balance mode for the client)

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Assigned] (HBASE-25291) Document how to enable the meta replica load balance mode for the client and clean up around hbase:meta read replicas

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reassigned HBASE-25291:
-

Assignee: Michael Stack  (was: Huaxiang Sun)

> Document how to enable the meta replica load balance mode for the client and 
> clean up around hbase:meta read replicas
> -
>
> Key: HBASE-25291
> URL: https://issues.apache.org/jira/browse/HBASE-25291
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Huaxiang Sun
>Assignee: Michael Stack
>Priority: Major
>
> Need to document how to enable meta replica Load Balance mode for clients.





[jira] [Updated] (HBASE-25280) [meta replicas] ArrayIndexOutOfBoundsException in ZKConnectionRegistry

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25280:
--
Fix Version/s: (was: 2.4.0)
   (was: 3.0.0-alpha-1)
   HBASE-18070.branch-2
   HBASE-18070

> [meta replicas] ArrayIndexOutOfBoundsException in ZKConnectionRegistry
> --
>
> Key: HBASE-25280
> URL: https://issues.apache.org/jira/browse/HBASE-25280
> Project: HBase
>  Issue Type: Bug
>  Components: meta replicas
>Affects Versions: HBASE-18070
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070, HBASE-18070.branch-2
>
>
> ITBLL Testing HBASE-18070 feature, [~huaxiangsun] found this:
> {code:java}
> 2020-11-12 19:48:12,358 ERROR org.apache.hadoop.hbase.util.FutureUtils: Unexpected error caught when processing CompletableFuture
> java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
>   at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getMetaRegionLocation$2(ZKConnectionRegistry.java:180)
>   at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
>   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>   at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getAndConvert$0(ZKConnectionRegistry.java:78)
>   at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
>   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>   at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:174)
>   at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> The code has been this way a long time, but his run with four replicas 
> seems to have revealed a race, exposed by using the replica id as an array index.
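
One way an error of this shape can arise is sketched below. It is a simplified illustration, not the ZKConnectionRegistry code: the class and method names are hypothetical, and it assumes the result array is sized from the number of replica entries observed at one moment while the replica id is used as the index. If four replicas are configured but only three entries are visible when the array is sized, replica id 3 falls outside an array of length 3.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names come from HBase.
class ReplicaIndexSketch {

    // Fragile variant: the array length comes from how many entries were
    // seen, but the slot is chosen by replica id. A missing entry (for
    // example ids 0, 1 and 3 present, 2 not yet visible) makes the largest
    // id equal to or greater than the array length.
    static String[] collectByIndex(Map<Integer, String> locationsByReplicaId) {
        String[] out = new String[locationsByReplicaId.size()];
        for (Map.Entry<Integer, String> e : locationsByReplicaId.entrySet()) {
            out[e.getKey()] = e.getValue(); // may throw ArrayIndexOutOfBoundsException
        }
        return out;
    }

    // Defensive variant: keep results keyed by replica id, so a gap in the
    // observed ids cannot push a write outside the container.
    static Map<Integer, String> collectByKey(Map<Integer, String> locationsByReplicaId) {
        return new TreeMap<>(locationsByReplicaId);
    }
}
```

Calling collectByIndex with entries for replica ids 0, 1 and 3 fails on the last write, while collectByKey keeps all three locations. Whether the actual fix took this form is not stated in the issue.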





[jira] [Resolved] (HBASE-25280) [meta replicas] ArrayIndexOutOfBoundsException in ZKConnectionRegistry

2020-11-16 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25280.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to feature branches, HBASE-18070 and HBASE-18070.branch-2. Thanks for 
reviews [~huaxiangsun]  and [~zhangduo]

> [meta replicas] ArrayIndexOutOfBoundsException in ZKConnectionRegistry
> --
>
> Key: HBASE-25280
> URL: https://issues.apache.org/jira/browse/HBASE-25280
> Project: HBase
>  Issue Type: Bug
>  Components: meta replicas
>Affects Versions: HBASE-18070
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: HBASE-18070, HBASE-18070.branch-2
>
>
> ITBLL Testing HBASE-18070 feature, [~huaxiangsun] found this:
> {code:java}
> 2020-11-12 19:48:12,358 ERROR org.apache.hadoop.hbase.util.FutureUtils: Unexpected error caught when processing CompletableFuture
> java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
>   at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getMetaRegionLocation$2(ZKConnectionRegistry.java:180)
>   at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
>   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>   at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getAndConvert$0(ZKConnectionRegistry.java:78)
>   at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
>   at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>   at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>   at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>   at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>   at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:174)
>   at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> The code has been this way a long time, but his run with four replicas 
> seems to have revealed a race, exposed by using the replica id as an array index.





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232894#comment-17232894
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}bq. ... my 'changes requested' flag is just ignored without confirmation 
and the commit message even said it was signed off by me.
{quote}
Oh. Are you referring to the commit of HBASE-25280 on master? Sorry about that. 
That was premature on my part. It went in ahead of [~huaxiangsun]'s +1 and my 
seeing your nits. Let me fix. Sorry about that. Mistake.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232877#comment-17232877
 ] 

Michael Stack commented on HBASE-18070:
---

[~zhangduo] please remove your veto. Then let's try a Zoom call to work 
through our misunderstanding here.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Comment Edited] (HBASE-18070) Enable memstore replication for meta replica

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232869#comment-17232869
 ] 

Michael Stack edited comment on HBASE-18070 at 11/16/20, 4:20 PM:
--

{quote}I completely agree with [~zhangduo] on this point. It would be good to 
choose between an "any replica" or "primary first" access pattern. In my 
opinion we could have made that a follow up issue for improvement. I.e. this 
feature is the main work; this alternate mode is a modest increment.
{quote}
I didn't disagree. I encouraged it.

 
{quote}I wish this discussion had taken place closer to when we agreed to hold 
the 2.4 RC.
{quote}
Me too.

Only, there is no discussion here though, just a repeat of the failed 
communication pattern (even same phrasings).

I've agreed to work on Duo's asks. I just wanted to do it after the merge so 
this feature could make 2.4 and so I could shed the burden of keeping two 
feature branches in sync.


was (Author: stack):
{quote}I completely agree with [~zhangduo] on this point. It would be good to 
choose between an "any replica" or "primary first" access pattern. In my 
opinion we could have made that a follow up issue for improvement. I.e. this 
feature is the main work; this alternate mode is a modest increment.
{quote}
I didn't disagree. I encouraged it.

 
{quote}I wish this discussion had taken place closer to when we agreed to hold 
the 2.4 RC.
{quote}
Me too.

Only, there is no discussion here though, just a repeat of failed the 
communication pattern (even same phrasings).

I've agreed to work on Duo's asks. I just wanted to do it after the merge so 
this feature could make 2.4 and so I could shed the burden of keeping two 
feature branches in sync.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-16 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232869#comment-17232869
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}I completely agree with [~zhangduo] on this point. It would be good to 
choose between an "any replica" or "primary first" access pattern. In my 
opinion we could have made that a follow up issue for improvement. I.e. this 
feature is the main work; this alternate mode is a modest increment.
{quote}
I didn't disagree. I encouraged it.

 
{quote}I wish this discussion had taken place closer to when we agreed to hold 
the 2.4 RC.
{quote}
Me too.

Only, there is no discussion here though, just a repeat of the failed 
communication pattern (even same phrasings).

I've agreed to work on Duo's asks. I just wanted to do it after the merge so 
this feature could make 2.4 and so I could shed the burden of keeping two 
feature branches in sync.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, HBASE-18070, HBASE-18070.branch-2, 2.5.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Comment Edited] (HBASE-18070) Enable memstore replication for meta replica

2020-11-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232550#comment-17232550
 ] 

Michael Stack edited comment on HBASE-18070 at 11/16/20, 6:03 AM:
--

{quote}Where is the ‘HedgeRead’ word in my comments above? I was also talking 
about the newly added load balance mode of meta location look up.
{quote}
 

I use 'hedged read' as a shorthand for what you describe as: "The general read 
replica feature is not designed for performance either, it is designed for 
availability. And for meta replica, I think what we need is to verify that, our 
cluster could still be functional under heavy load on location lookup while the 
cluster without this feature on will be dead."; i.e. a feature that has been in 
the code base for years.

As I read it, you want a revisit of the original 'general read replica' 
feature; proof it improves availability when the work here takes it as a basis, 
building on "HBASE-10070 _HBase read high-availability using 
timeline-consistent region replicas_" though this project is about adding 
liveness on existing hbase:meta read replica with a concern for load 
distribution.

Regards tests for the 'general read replica', there are unit tests as you know 
and it is enabled at my place of work apparently effective regards HA (as 
described in an old hbasecon talk here 
[https://www.youtube.com/watch?v=l6S-Vbs9WsU]).

I'll be happy to work on the test you suggest but do not think it should hold 
up our committing this work. -1 if you disagree.

 
{quote}{quote}And I think here we could do much better than ‘HedgeRead’ 
solution.
{quote}{quote}
Sounds good. Let's make a follow-on issue?

 

 


was (Author: stack):
{quote}Where is the ‘HedgeRead’ word in my comments above? I was also talking 
about the newly added load balance mode of meta location look up.
{quote}
 

I use 'hedged read' as a shorthand for what you describe as: "The general read 
replica feature is not designed for performance either, it is designed for 
availability. And for meta replica, I think what we need is to verify that, our 
cluster could still be functional under heavy load on location lookup while the 
cluster without this feature on will be dead."; i.e. a feature that has been in 
the code base for years.

As I read it, you want a revisit of the original 'general read replica' 
feature; proof it improves availability when the work here takes it as a basis, 
building on "HBASE-10070 _HBase read high-availability using 
timeline-consistent region replicas_" adding liveness on existing hbase:meta 
read replica with a concern for load distribution.

Regards tests, there are unit tests as you know and it is enabled at my place 
of work apparently effective regards HA (as described in an old hbasecon talk 
here [https://www.youtube.com/watch?v=l6S-Vbs9WsU]).

I'll be happy to work on the test you suggest but do not think it should hold 
up our committing this work. -1 if you disagree.

 
{quote}bq.And I think here we could do much better than ‘HedgeRead’ solution.
{quote}
Sounds good. Lets make a follow-on issue?

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232550#comment-17232550
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}Where is the ‘HedgeRead’ word in my comments above? I was also talking 
about the newly added load balance mode of meta location look up.
{quote}
 

I use 'hedged read' as a shorthand for what you describe as: "The general read 
replica feature is not designed for performance either, it is designed for 
availability. And for meta replica, I think what we need is to verify that, our 
cluster could still be functional under heavy load on location lookup while the 
cluster without this feature on will be dead."; i.e. a feature that has been in 
the code base for years.

As I read it, you want a revisit of the original 'general read replica' 
feature; proof it improves availability when the work here takes it as a basis, 
building on "HBASE-10070 _HBase read high-availability using 
timeline-consistent region replicas_" adding liveness on existing hbase:meta 
read replica with a concern for load distribution.

Regards tests, there are unit tests as you know and it is enabled at my place 
of work apparently effective regards HA (as described in an old hbasecon talk 
here [https://www.youtube.com/watch?v=l6S-Vbs9WsU]).

I'll be happy to work on the test you suggest but do not think it should hold 
up our committing this work. -1 if you disagree.

 
{quote}bq.And I think here we could do much better than ‘HedgeRead’ solution.
{quote}
Sounds good. Let's make a follow-on issue?

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232521#comment-17232521
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}This meta replica Load Balance mode should increase throughput as well. 
The previous design is for High availability, so it is not for throughput. In 
this new mode, load balance is introduced so read is offloaded mostly from 
primary meta and distributed among multiple meta replica regions. In case of 
the newly PE test, all location data is static, so if there are enough clients 
does location lookup, it could exhaust the non-meta-replica region server, with 
meta replica LB, it can support n(replica region #) * read requests/second.
{quote}
[~huaxiangsun] yes... if enough load but we don't seem to be driving enough 
load.. we seem bottlenecked on client, but improving perf is secondary?

 
{quote}With that said, the PE test should not increase primary read numbers at 
all (there is no fall-back-to-primary path in the test), I will take a look 
tomorrow to see why there are so many reads going through the primary meta 
region, could be something wrong in the path.
{quote}
Yes, would be good to figure where the primary load is coming from. TODO.

 

[~zhangduo]
{quote}First you said you do not need to prove that the meta lookup request has 
been distributed to meta replicas, as it has been done by the Meta Read 
Replicas feature which has been done years ago.
{quote}
The 'hedged read' scenario has been in place forever, yes.

 
{quote}Then you said the feature is about distributing load of meta, which 
directly objects your first argument?
{quote}
 

Distributing load is the 'LoadBalance' configuration, which is new and not the 
'HedgedRead'.

 
{quote}Is it because my English is too bad? Could anyone help me to understand 
better? Thanks a lot.
{quote}
Keep asking questions. I'll try and explain. My impression is that you want 
us to make reports around HA, which would be good to have but is not the thrust 
of this issue. I think it would be good to do but don't want it to be an 
obstacle to commit. Thanks.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Comment Edited] (HBASE-18070) Enable memstore replication for meta replica

2020-11-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232521#comment-17232521
 ] 

Michael Stack edited comment on HBASE-18070 at 11/16/20, 4:49 AM:
--

{quote}This meta replica Load Balance mode should increase throughput as well. 
The previous design is for High availability, so it is not for throughput. In 
this new mode, load balance is introduced so read is offloaded mostly from 
primary meta and distributed among multiple meta replica regions. In case of 
the newly PE test, all location data is static, so if there are enough clients 
does location lookup, it could exhaust the non-meta-replica region server, with 
meta replica LB, it can support n(replica region #) * read requests/second.
{quote}
[~huaxiangsun] yes... if enough load but we don't seem to be driving enough 
load.. we seem bottlenecked on client, but improving perf is secondary?

 
{quote}With that said, the PE test should not increase primary read numbers at 
all (there is no fall-back-to-primary path in the test), I will take a look 
tomorrow to see why there are so many reads going through the primary meta 
region, could be something wrong in the path.
{quote}
Yes, would be good to figure where the primary load is coming from. TODO.

 

[~zhangduo]
{quote}First you said you do not need to prove that the meta lookup request has 
been distributed to meta replicas, as it has been done by the Meta Read 
Replicas feature which has been done years ago.
{quote}
The 'hedged read' scenario has been in place forever, yes.

 
{quote}Then you said the feature is about distributing load of meta, which 
directly objects your first argument?
{quote}
 

Distributing load is the 'LoadBalance' configuration, which is new and not the 
'HedgedRead'.

 
{quote}Is it because my English is too bad? Could anyone help me to understand 
better? Thanks a lot.
{quote}
Keep asking questions. I'll try and explain. I may not be doing a good job of 
it. My impression is that you want us to make reports around HA, which would 
be good to have but is not the thrust of this issue. I think it would be good 
to do but don't want it to be an obstacle to commit. Thanks.


was (Author: stack):
{quote}This meta replica Load Balance mode should increase throughput as well. 
The previous design is for High availability, so it is not for throughput. In 
this new mode, load balance is introduced so read is offloaded mostly from 
primary meta and distributed among multiple meta replica regions. In case of 
the newly PE test, all location data is static, so if there are enough clients 
does location lookup, it could exhaust the non-meta-replica region server, with 
meta replica LB, it can support n(replica region #) * read requests/second.
{quote}
[~huaxiangsun] yes... if enough load but we don't seem to be driving enough 
load.. we seem bottlenecked on client, but improving perf is secondary?

 
{quote}With that said, the PE test should not increase primary read numbers at 
all (there is no fall-back-to-primary path in the test), I will take a look 
tomorrow to see why there are so many reads going through the primary meta 
region, could be something wrong in the path.
{quote}
Yes, would be good to figure where the primary load is coming from. TODO.

 

[~zhangduo]
{quote}First you said you do not need to prove that the meta lookup request has 
been distributed to meta replicas, as it has been done by the Meta Read 
Replicas feature which has been done years ago.
{quote}
The 'hedged read' scenario has been in place forever, yes.

 
{quote}Then you said the feature is about distributing load of meta, which 
directly objects your first argument?
{quote}
 

Distributing load is the 'LoadBalance' configuration, which is new and not the 
'HedgedRead'.

 
{quote}Is it because my English is too bad? Could anyone help me to understand 
better? Thanks a lot.
{quote}
Keep asking questions. I'll try and explain. My impression is that you are want 
us to make reports around HA which would be good to have but not the thrust of 
this issue. I think it would be good to do but don't want it to be obstacle to 
commit. Thanks.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-15 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232501#comment-17232501
 ] 

Michael Stack commented on HBASE-18070:
---

{quote}I do not think the approach for meta replica is for performance?
{quote}
That is right.

 

(I just observe that there is neither a perf improvement nor a regression).

 
{quote}The general read replica feature is not designed for performance either, 
it is designed for availability.
{quote}
That is right.

 
{quote}And for meta replica, I think what we need is to verify that, our 
cluster could still be functional under heavy load on location lookup while the 
cluster without this feature on will be dead.
{quote}
I do not think we need to prove that here. Meta Read Replicas have been in 
place for years. Committing this feature does not require revisiting the 
justification for the original read replica feature.

This feature is about distributing load on hbase:meta. See the tail of 
HBASE-25127 for an illustration that it is effective on this front.

 
{quote}So I think what we need to test here, is to write a special test which 
will clearRegionLocationCache everytime before issuing the actual read/write 
scan request, to simulate bad client behavior. The cluster without this feature 
should be dead soon and the cluster with this feature should perform much 
better. Of course if you put too many loads then no cluster could be alive, but 
we only need to prove that, we can perform better.
{quote}
Wasn't this done when justifying the original submission of the read replica 
feature? Why repeat it?
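
For reference, the 'bad client' behavior described in the quote can be illustrated without a cluster. The sketch below is a stand-alone simulation with hypothetical names (it does not use the HBase client API); it only counts how many hbase:meta lookups a client would issue when it clears its location cache before every request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation; not the HBase client.
final class LocationCacheSketch {
  private final Map<String, String> cache = new HashMap<>();
  private int metaLookups = 0;

  // Stand-in for a read against hbase:meta; every call is counted.
  private String lookupInMeta(String row) {
    metaLookups++;
    return "server-for-" + row;
  }

  // Serve from the cache when possible, otherwise go to "meta".
  String locate(String row) {
    return cache.computeIfAbsent(row, this::lookupInMeta);
  }

  void clearRegionLocationCache() {
    cache.clear();
  }

  int metaLookups() {
    return metaLookups;
  }

  // Issue the given number of requests for one row and report meta lookups.
  static int run(int requests, boolean clearBeforeEachRequest) {
    LocationCacheSketch client = new LocationCacheSketch();
    for (int i = 0; i < requests; i++) {
      if (clearBeforeEachRequest) {
        client.clearRegionLocationCache();
      }
      client.locate("row-0");
    }
    return client.metaLookups();
  }

  public static void main(String[] args) {
    System.out.println("well-behaved client: " + run(1000, false) + " meta lookups");
    System.out.println("cache-clearing client: " + run(1000, true) + " meta lookups");
  }
}
```

A well-behaved client hits meta once per region; the cache-clearing client hits it once per request, which is the load the proposed test would put on hbase:meta.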

 

 

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232160#comment-17232160
 ] 

Michael Stack commented on HBASE-18070:
---

Sorry for being unclear [~apurtell] . Was thinking of merging to branch-2 just 
after merging to master (Monday night I hope). I put up notice on dev list this 
morning. That work sir?

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Updated] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-14 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25127:
--
Fix Version/s: (was: 2.4.0)

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Commented] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232120#comment-17232120
 ] 

Michael Stack commented on HBASE-25127:
---

Oh, reverted from branch-2 for now. It fails to compile. We can deal w/ 
backport later. Removing 2.4.0 from fix version for now.

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Commented] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232112#comment-17232112
 ] 

Michael Stack commented on HBASE-25127:
---

I merged the associated PR to master and branch-2.

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Commented] (HBASE-25127) Enhance PerformanceEvaluation to profile meta replica performance.

2020-11-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232088#comment-17232088
 ] 

Michael Stack commented on HBASE-25127:
---

Do you have a picture of the UI read requests showing how the reads are 
distributed across replicas and primary [~clarax98007] ?

(Generally we don't resolve until the PR lands, FYI – let me merge it)

> Enhance PerformanceEvaluation to profile meta replica performance.
> --
>
> Key: HBASE-25127
> URL: https://issues.apache.org/jira/browse/HBASE-25127
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Huaxiang Sun
>Assignee: Clara Xiong
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: Screen Shot 2020-11-13 at 5.30.11 PM.png
>
>






[jira] [Resolved] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-14 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25284.
---
Fix Version/s: 2.4.0
   3.0.0-alpha-1
 Assignee: Michael Stack
   Resolution: Fixed

Pushed to master and branch-2.

> Check-in "Enable memstore replication..." design
> 
>
> Key: HBASE-25284
> URL: https://issues.apache.org/jira/browse/HBASE-25284
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Add the design doc under dev-support/design-docs





[jira] [Created] (HBASE-25284) Check-in "Enable memstore replication..." design

2020-11-14 Thread Michael Stack (Jira)
Michael Stack created HBASE-25284:
-

 Summary: Check-in "Enable memstore replication..." design
 Key: HBASE-25284
 URL: https://issues.apache.org/jira/browse/HBASE-25284
 Project: HBase
  Issue Type: Sub-task
Reporter: Michael Stack


Add the design doc under dev-support/design-docs





[jira] [Created] (HBASE-25283) Undo meta replica specialization inside of the region replica framework

2020-11-14 Thread Michael Stack (Jira)
Michael Stack created HBASE-25283:
-

 Summary: Undo meta replica specialization inside of the region 
replica framework
 Key: HBASE-25283
 URL: https://issues.apache.org/jira/browse/HBASE-25283
 Project: HBase
  Issue Type: Improvement
  Components: read replicas
Reporter: Michael Stack


Filing an issue to capture a suggestion [~zhangduo] made in the PR, which was 
then carried over to a comment in the design document attached to HBASE-18070. 
It deserves its own issue.
{quote} 

I do not like that we have a specialized implementation for meta replica inside 
the region replica framework. They are almost the same so they should share the 
same code base. I think this enhancement could also be applied to normal 
RegionReplicaReplicationEndpoint, what we need is the config for the max 
distance between different secondary replicas. For normal region replicas, the 
value will be small, which means we will soon block the replication if any 
replicas are slow, to save memory. For meta replicas, we could have a large 
default value to let the good replicas catch up fast.
{quote}
 

Let's use this issue to fill out more on what [~zhangduo] suggests above.
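
As a starting point for discussion, the 'max distance' idea in the quote might look something like the sketch below. Everything here is hypothetical (the class, the method names, and the way progress is tracked by sequence id); it is not how RegionReplicaReplicationEndpoint works today. The gate blocks shipping once the next edit would be more than the configured distance ahead of the slowest secondary.

```java
import java.util.Arrays;

// Hypothetical sketch; not the RegionReplicaReplicationEndpoint implementation.
final class ReplicaLagGateSketch {
  private final long[] ackedSeqId; // highest sequence id acked by each secondary
  private final long maxDistance;  // the proposed "max distance" config value

  ReplicaLagGateSketch(int numSecondaries, long maxDistance) {
    this.ackedSeqId = new long[numSecondaries];
    this.maxDistance = maxDistance;
  }

  // Record that a secondary has applied edits up to seqId.
  void ack(int secondaryIndex, long seqId) {
    ackedSeqId[secondaryIndex] = Math.max(ackedSeqId[secondaryIndex], seqId);
  }

  // Shipping seqId is allowed only while it stays within maxDistance of the
  // slowest secondary; otherwise replication blocks until that replica catches up.
  boolean canShip(long seqId) {
    long slowest = Arrays.stream(ackedSeqId).min().orElse(seqId);
    return seqId - slowest <= maxDistance;
  }

  public static void main(String[] args) {
    ReplicaLagGateSketch userRegions = new ReplicaLagGateSketch(2, 10);
    ReplicaLagGateSketch metaRegions = new ReplicaLagGateSketch(2, 1000);
    for (ReplicaLagGateSketch gate : new ReplicaLagGateSketch[] {userRegions, metaRegions}) {
      gate.ack(0, 100);
      gate.ack(1, 95);
    }
    System.out.println("small distance, ship 106: " + userRegions.canShip(106));
    System.out.println("large distance, ship 106: " + metaRegions.canShip(106));
  }
}
```

A small value blocks early when one replica is slow (saving memory); a large value, as suggested for meta, lets the healthy replicas keep advancing.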





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-13 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231901#comment-17231901
 ] 

Michael Stack commented on HBASE-18070:
---

A small ITBLL run (600M, two loops of generate and verify) on a 10-node cluster 
completed and verified. A performance comparison that shows hbase:meta read load 
distributed over replicas with no regression, but no real gain in perf 
(interesting), will be added to HBASE-25127 in a while. After HBASE-25255 
lands (a bug found in testing), I'll update the merge patch and put up a notice 
for merge to master on the dev list, hopefully tomorrow (Sat). I will work on 
the merge patch for branch-2 concurrently (the ITBLL was run against the 
branch-2 backported merge patch) and hope to merge it just after the master 
patch lands. I will continue with bigger ITBLL runs in the meantime and will 
spend more time studying how hbase:meta load is currently distributed. At the 
moment it is even across primary and replicas, when we'd expect the replicas to 
get more (system reads seem to make up a good portion of the primary load; TBD).

We still seem to be on for our Monday target of merge to master and branch-2.

[~huaxiangsun] a doc/improvement Jira for how to configure client/and 
not-server-side clients is needed?

 

 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Created] (HBASE-25280) [meta replicas] ArrayIndexOutOfBoundsException in ZKConnectionRegistry

2020-11-12 Thread Michael Stack (Jira)
Michael Stack created HBASE-25280:
-

 Summary: [meta replicas] ArrayIndexOutOfBoundsException in 
ZKConnectionRegistry
 Key: HBASE-25280
 URL: https://issues.apache.org/jira/browse/HBASE-25280
 Project: HBase
  Issue Type: Bug
  Components: meta replicas
Affects Versions: HBASE-18070
Reporter: Michael Stack
Assignee: Michael Stack
 Fix For: 3.0.0-alpha-1, 2.4.0


ITBLL Testing HBASE-18070 feature, [~huaxiangsun] found this:
{code:java}
2020-11-12 19:48:12,358 ERROR org.apache.hadoop.hbase.util.FutureUtils: Unexpected error caught when processing CompletableFuture
java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
  at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getMetaRegionLocation$2(ZKConnectionRegistry.java:180)
  at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
  at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
  at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
  at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  at org.apache.hadoop.hbase.client.ZKConnectionRegistry.lambda$getAndConvert$0(ZKConnectionRegistry.java:78)
  at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
  at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
  at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
  at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:174)
  at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
  at java.base/java.lang.Thread.run(Thread.java:834)
{code}
The code has been this way for a long time, but his run with four replicas 
seems to have revealed a race, exposed by using the replica id as an array index.
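
The failure mode can be reproduced in isolation. The sketch below is not the ZKConnectionRegistry code and does not claim to be the eventual fix; the names are hypothetical. It only shows what happens when an array sized for three replicas is indexed by replica id 3, alongside one possible defensive check.

```java
// Hypothetical illustration of indexing an array by replica id.
final class ReplicaIndexSketch {
  // Stores the location at index replicaId, trusting replicaId < locations.length.
  static void storeUnchecked(String[] locations, int replicaId, String location) {
    locations[replicaId] = location;
  }

  // Defensive variant: refuse replica ids that fall outside the array.
  static boolean storeChecked(String[] locations, int replicaId, String location) {
    if (replicaId < 0 || replicaId >= locations.length) {
      return false;
    }
    locations[replicaId] = location;
    return true;
  }

  public static void main(String[] args) {
    // Sized when only three replicas were known; a fourth shows up afterwards.
    String[] locations = new String[3];
    System.out.println("checked store accepted: " + storeChecked(locations, 3, "rs-4"));
    try {
      storeUnchecked(locations, 3, "rs-4");
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("unchecked store failed: " + e.getMessage());
    }
  }
}
```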





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230245#comment-17230245
 ] 

Michael Stack commented on HBASE-18070:
---

[~apurtell] yes. Status on merge/no-merge by end of the week.

 

Sorry this is taking so long. The merge to master is ready but is waiting on PE 
and ITBLL reports before I ask on the dev list. ITBLL needs everything 
backported to branch-2, as we are unsure of the state of master when it comes 
to ITBLL, so there is a bit of juggling going on, not to mention that branch-2 
differs from master in the client. [~zhangduo] is helping with this part, which 
should speed up the result.

 

Plan:

 * PE Report on effectiveness of new feature (currently underway)

 * ITBLL Report on our retaining correctness w/ this feature enabled (gated on 
finishing HBASE-25272, but that should be tomorrow morning; the backport to 
branch-2 is done... just need to get a CI run in)

 * If ITBLL and PE reports are good (Friday), will put up notice of merge to 
master (don't think we have to wait for a vote to run) on Friday/Saturday.

 * After 24 hours to allow for objections, will merge to master and at the same 
time merge to branch-2 (Monday?).

 

Thanks [~apurtell]

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-25266) [hbase-operator-tools] Add a repair tool for moving stale regions dir not present in meta away from table dir

2020-11-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230160#comment-17230160
 ] 

Michael Stack commented on HBASE-25266:
---

[~wchevreuil] I've not started work but would be game. It keeps coming up here. 
I am thinking of an hbck2 command, 'adoptOrphans', that takes a list of one or 
more encoded region names. Recently I messed w/ the WALPlayer to make it load a 
bunch of recovered.edits files under a bunch of regions, and can say that 
works. And bulk load is well known. 'adoptOrphans' would just bulk load 
all hfiles, then any recovered.edits, then delete the orphan dir. It probably 
should first check that the passed region is indeed an orphan, requiring a 
'force' if the operator wants to override. Something like that?
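
Roughly, the control flow I have in mind is sketched below. No such command exists yet, so every name here ('adoptOrphans', the RegionOps interface, RecordingOps) is hypothetical, and the real work (bulk load, WAL replay, directory delete) is hidden behind an interface so the sketch runs stand-alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed 'adoptOrphans' flow; not HBCK2 code.
final class AdoptOrphansSketch {
  // The real operations, abstracted so the flow can be shown on its own.
  interface RegionOps {
    boolean isOrphan(String encodedRegionName);
    void bulkLoadHFiles(String encodedRegionName);
    void replayRecoveredEdits(String encodedRegionName);
    void deleteRegionDir(String encodedRegionName);
  }

  // Returns the regions actually adopted. Non-orphans are skipped unless forced.
  static List<String> adoptOrphans(List<String> encodedRegionNames, boolean force, RegionOps ops) {
    List<String> adopted = new ArrayList<>();
    for (String region : encodedRegionNames) {
      if (!force && !ops.isOrphan(region)) {
        continue; // refuse to touch a region that meta still references
      }
      ops.bulkLoadHFiles(region);       // 1. bulk load all hfiles
      ops.replayRecoveredEdits(region); // 2. then any recovered.edits
      ops.deleteRegionDir(region);      // 3. then delete the orphan dir
      adopted.add(region);
    }
    return adopted;
  }

  // In-memory stand-in that only records which steps were invoked.
  static final class RecordingOps implements RegionOps {
    final Set<String> orphans;
    final List<String> calls = new ArrayList<>();

    RecordingOps(Set<String> orphans) {
      this.orphans = orphans;
    }

    public boolean isOrphan(String r) { return orphans.contains(r); }
    public void bulkLoadHFiles(String r) { calls.add("bulkload:" + r); }
    public void replayRecoveredEdits(String r) { calls.add("replay:" + r); }
    public void deleteRegionDir(String r) { calls.add("delete:" + r); }
  }

  public static void main(String[] args) {
    RecordingOps ops = new RecordingOps(Set.of("aaa"));
    System.out.println("adopted: " + adoptOrphans(List.of("aaa", "bbb"), false, ops));
    System.out.println("steps: " + ops.calls);
  }
}
```

Deleting last means a failure in either load step leaves the orphan dir in place for a retry.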

 

> [hbase-operator-tools] Add a repair tool for moving stale regions dir not 
> present in meta away from table dir
> -
>
> Key: HBASE-25266
> URL: https://issues.apache.org/jira/browse/HBASE-25266
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> This adds a new tool under the *hbase-tools* module that allows for moving away 
> region dirs existing under a table's hdfs dir but not in meta. This is useful 
> in cases where the region is not present in meta, but still has data on hdfs, 
> yet no holes in the table region chain are detected. 
> In such cases, the existing *HBCK2 addFsRegionsMissingInMeta* command isn't 
> ideal, as it would bring the region back in meta and cause overlaps. 
> This tool performs the following actions:
> 1) Identifies regions in hdfs but not in meta using 
> *HBCK2.reportTablesWithMissingRegionsInMeta*;
> 2) For each of these regions, sidelines the related dir to a temp folder;
> 3) Bulk loads hfiles from each sidelined region to the related table;
> Sidelined regions are never removed from the temp folder. Operators should 
> remove those manually, after they have verified data integrity. 





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230095#comment-17230095
 ] 

Michael Stack commented on HBASE-18070:
---

Yesterday, I put up a PR that is an amalgamation of all of the patches that 
make the HBASE-18070 feature. The unit tests all passed (see attached PR). 
Waiting now on reports back on PE and ITBLL evaluations before putting up 
notice on dev list on merge-to-master.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-24632) Enable procedure-based log splitting as default in hbase3

2020-11-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229682#comment-17229682
 ] 

Michael Stack commented on HBASE-24632:
---

{quote}[~stack] [~anoop.hbase] The zk based log splitting is only an internal 
implementation. Can we purge them out in master branch and no need wait to 4.0.0?
{quote}
Procedure-based log splitting is on by default in 2.4.0. I was thinking the 
zk-based splitting should stay in place for a while in case we find a problem 
in procedure-based log splitting. Perhaps we figure out whether procedure-based 
log splitting is stable in 2.4 + 2.5. So, purging from trunk/branch-3 would be 
ok?
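
While both implementations remain in the code base, flipping the default still leaves operators an explicit way back. The sketch below illustrates that with a plain map standing in for the HBase Configuration. The constant name comes from this issue's description; the key string is given as I recall it from HConstants and should be treated as an assumption, as should the helper method.

```java
import java.util.Map;

// Hypothetical stand-alone sketch; a Map stands in for the HBase Configuration.
final class SplitCoordinationDefaultSketch {
  // Key as recalled from HConstants (assumption; verify against the source).
  static final String HBASE_SPLIT_WAL_COORDINATED_BY_ZK = "hbase.split.wal.zk.coordinated";

  // The issue changes this default from true to false.
  static final boolean DEFAULT_HBASE_SPLIT_COORDINATED_BY_ZK = false;

  // An explicit setting wins; otherwise the (new) default applies.
  static boolean useZkCoordination(Map<String, String> conf) {
    String value = conf.get(HBASE_SPLIT_WAL_COORDINATED_BY_ZK);
    return value == null ? DEFAULT_HBASE_SPLIT_COORDINATED_BY_ZK : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    System.out.println("no setting: " + useZkCoordination(Map.of()));
    System.out.println("explicit opt-in: "
        + useZkCoordination(Map.of(HBASE_SPLIT_WAL_COORDINATED_BY_ZK, "true")));
  }
}
```

Purging the zk-based classes from trunk would remove that fallback there, which is why stability on 2.4 and 2.5 matters first.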

> Enable procedure-based log splitting as default in hbase3
> -
>
> Key: HBASE-24632
> URL: https://issues.apache.org/jira/browse/HBASE-24632
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Means changing this value in HConstants to false:
>public static final boolean DEFAULT_HBASE_SPLIT_COORDINATED_BY_ZK = true;
> Should probably also deprecate the current zk distributed split so we can 
> clear out those classes too.





[jira] [Commented] (HBASE-25266) [hbase-operator-tools] Add a repair tool for moving stale regions dir not present in meta away from table dir

2020-11-10 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229421#comment-17229421
 ] 

Michael Stack commented on HBASE-25266:
---

I was thinking HBASE-25159 [~wchevreuil] ? It'd bulk load any hfiles and 
walplay any recovered.edits and then delete the dir?

> [hbase-operator-tools] Add a repair tool for moving stale regions dir not 
> present in meta away from table dir
> -
>
> Key: HBASE-25266
> URL: https://issues.apache.org/jira/browse/HBASE-25266
> Project: HBase
>  Issue Type: New Feature
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> This adds a new tool under the *hbase-tools* module that allows for moving away 
> region dirs existing under a table's hdfs dir but not in meta. This is useful 
> in cases where the region is not present in meta, but still has data on hdfs, 
> yet no holes in the table region chain are detected. 
> In such cases, the existing *HBCK2 addFsRegionsMissingInMeta* command isn't 
> ideal, as it would bring the region back in meta and cause overlaps. 
> This tool performs the following actions:
> 1) Identifies regions in hdfs but not in meta using 
> *HBCK2.reportTablesWithMissingRegionsInMeta*;
> 2) For each of these regions, sidelines the related dir to a temp folder;
> 3) Bulk loads hfiles from each sidelined region to the related table;
> Sidelined regions are never removed from the temp folder. Operators should 
> remove those manually, after they have verified data integrity. 





[jira] [Commented] (HBASE-25260) upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because it cannot find hbase:namespace table

2020-11-09 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228995#comment-17228995
 ] 

Michael Stack commented on HBASE-25260:
---

Would be good to see all of the log since startup, not just a snippet.

It looks like meta might be online given we are able to migrate table state... 
but odd that we can't find the hbase:namespace table in hbase:meta – the data 
was in good health pre-upgrade?

Can you upgrade to hbase-2.3.x instead of 2.1.x? It is our stable offering. The 
tooling to fix issues is also much better than it was back on 2.1.1 (2.1.1 is 
no longer maintained by the community). Thanks.

> upgrading hbase from 2.0.6 to 2.1.1, HMaster failed to become active because 
> it cannot find hbase:namespace table
> -
>
> Key: HBASE-25260
> URL: https://issues.apache.org/jira/browse/HBASE-25260
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.1, 2.0.6
>Reporter: Yongle Zhang
>Priority: Major
> Attachments: hmaster.log
>
>
> When we upgraded HBASE cluster from 2.0.6 to 2.1.1, the HMaster on upgraded 
> node failed to start.
> Some stack trace in the error log:
> {code:java}
> 2020-11-06 02:01:26,420 WARN  [PEWorker-12] assignment.RegionTransitionProcedure: Failed transition, suspend 1secs pid=12, ppid=9, state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure table=TestTable, region=37d62d2c1934da269a592e0e5cbca82a; rit=OFFLINE, location=null; waiting on rectified condition fixed by other Procedure or operator intervention
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: TestTable
>   at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at org.apache.hadoop.hbase.master.assignment.AssignProcedure.assign(AssignProcedure.java:194)
>   at org.apache.hadoop.hbase.master.assignment.AssignProcedure.startTransition(AssignProcedure.java:205)
>   at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:355)
>   at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:957)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1835)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1595)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:80)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2140)
> {code}
> Seems it's caused by not able to find hbase:namespace table after upgrade: 
> {code:java}
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] master.HMaster: Master server abort: loaded coprocessors are: []
> 2020-11-06 02:01:26,791 ERROR [master/399fd6ca0c6d:16000:becomeActiveMaster] master.HMaster: * ABORTING master 399fd6ca0c6d,16000,1604628075265: Unhandled exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>   at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
>   at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
>   at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1253)
>   at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1031)
>   at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2254)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.TableNotFoundException: hbase:namespace
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:864)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:759)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:745)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
>   at 
> 

[jira] [Updated] (HBASE-18070) Enable memstore replication for meta replica

2020-11-06 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-18070:
--
Fix Version/s: HBASE-18070
   HBASE-18070.branch-2

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, HBASE-18070, HBASE-18070.branch-2
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-11-06 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227594#comment-17227594
 ] 

Michael Stack commented on HBASE-18070:
---

[~huaxiangsun] is about to land the client-side changes on branch HBASE-18070. 
That makes the feature 'complete' for now (the new endpoint will be done as a 
follow-on). [~huaxiangsun] wants to do more testing: PE tests to make a report 
on effectiveness and some IT runs to prove correctness. The issue is that the 
ITs on the master branch are in an unknown state. Given we want to backport 
this feature to branch-2 anyway (hopefully in time for 2.4 and failing that, 
2.5 <= [~apurtell] ), we're going to make a new feature branch based on 
branch-2, HBASE-18070.branch-2, and backport HBASE-18070 there; the PE and IT 
tests will be run out of that branch.

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-05 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227117#comment-17227117
 ] 

Michael Stack commented on HBASE-25238:
---

[~Zhuqi1108] There is no release w/ this fix in it yet. If you need it now, try 
making a build from the tip of branch-2.3.

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Assignee: Michael Stack
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master 2c4006997f99,16000,1604358237412: 
> Unhandled exception. Starting shutdown. *
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>  

[jira] [Resolved] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25238.
---
Fix Version/s: 2.3.4
   2.2.7
   2.4.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: Fixes master procedure store migration issues going from 
2.0.x to 2.2.x and/or 2.3.x. Also fixes failed heartbeat parse during rolling 
upgrade from 2.0.x to 2.3.x.
 Assignee: Michael Stack
   Resolution: Fixed

Merged to 2.2+ (only half of the patch went into 2.2; the full patch went in elsewhere). 
Thanks for the review [~vjasani]

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Assignee: Michael Stack
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master 2c4006997f99,16000,1604358237412: 
> Unhandled exception. Starting shutdown. *
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> 

[jira] [Comment Edited] (HBASE-25234) [Upgrade]Incompatibility in reading RS report from 2.1 RS when Master is upgraded to a version containing HBASE-21406

2020-11-05 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226829#comment-17226829
 ] 

Michael Stack edited comment on HBASE-25234 at 11/5/20, 5:41 PM:
-

Fixed by HBASE-25238


was (Author: stack):
Pushed on branch-2.3+. Applied half of the patch to branch-2.2 (the change in 
clusterreport wasn't added till 2.3).

> [Upgrade]Incompatibility in reading RS report from 2.1 RS when Master is 
> upgraded to a version containing HBASE-21406
> -
>
> Key: HBASE-25234
> URL: https://issues.apache.org/jira/browse/HBASE-25234
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Sanjeet Nishad
>Assignee: Sanjeet Nishad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> While upgrading to a version having HBASE-21406 and following the upgrade 
> process suggested in HBASE-21075, after Master is upgraded, the following 
> exception is observed while reading the rs report from old region servers :
> {code:java}
> 2020-11-02 18:25:30,303 WARN [RS-EventLoopGroup-1-2] ipc.RpcServer: 
> /x.x.x.x:16000 is unable to read call parameter from client x.x.x.x
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException:
>  Message missing required fields: load.replLoadSink.timestampStarted, 
> load.replLoadSink.totalOpsProcessed
>  at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:477)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerReportRequest$Builder.build(RegionServerStatusProtos.java:2411)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerReportRequest$Builder.build(RegionServerStatusProtos.java:2349)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.processRequest(ServerRpcConnection.java:654)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.processOneRpc(ServerRpcConnection.java:458)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.saslReadAndProcess(ServerRpcConnection.java:351)
>  at 
> org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:92)
>  at 
> org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:68)
>  at 
> org.apache.hadoop.hbase.ipc.NettyRpcServerRequestDecoder.channelRead(NettyRpcServerRequestDecoder.java:62)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
>  at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>  at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  at 
> 

[jira] [Commented] (HBASE-25116) RegionMonitor support RegionTask count normalize

2020-11-05 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226881#comment-17226881
 ] 

Michael Stack commented on HBASE-25116:
---

What is it about the canary access that slows user requests?

 

Should these options be made into command-line options for the Canary tool 
rather than internal configs?

 

We are playing w/ 'task' counts in the patch. A task maps to a Region?  Would 
it help if we talked of sampling rather than task counts? A command-line option 
that took a sample float? If you passed --sample=0.1 or --table_sample=0.1 (or 
-Dcanary.sample.. ) on the command-line, would that be easier on the 
operator? It would make the feature easier to find if it showed in the canary 
--help usage?
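
To make the sampling suggestion concrete, here is a minimal sketch of what a sample fraction could mean, assuming the canary holds a list of region tasks per table. The class and method names (RegionSampler, sample) are hypothetical and are not part of the Canary tool; rounding up so that a non-empty table always gets at least one probe is also an assumption, not something the patch does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of the HBase Canary tool. */
final class RegionSampler {
  private RegionSampler() {}

  /**
   * Returns a random subset of the given regions. The subset size is
   * ceil(sample * regions.size()), so any sample > 0 keeps at least one
   * region of a non-empty table (an assumption made for this sketch).
   */
  static <T> List<T> sample(List<T> regions, double sample, Random rng) {
    if (sample < 0.0 || sample > 1.0) {
      throw new IllegalArgumentException("sample must be in [0, 1]: " + sample);
    }
    int keep = (int) Math.ceil(sample * regions.size());
    // Shuffle a copy so the caller's list is left untouched.
    List<T> shuffled = new ArrayList<>(regions);
    Collections.shuffle(shuffled, rng);
    return new ArrayList<>(shuffled.subList(0, keep));
  }

  public static void main(String[] args) {
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < 50; i++) {
      regions.add("region-" + i);
    }
    // Mirrors the idea of passing --sample=0.1 on the command line.
    List<String> picked = sample(regions, 0.1, new Random(42L));
    System.out.println(picked.size() + " of " + regions.size() + " regions probed");
  }
}
```

A single fraction like this is one knob the operator can reason about directly, whereas a task count has to be re-tuned whenever the number of regions changes.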

> RegionMonitor support RegionTask count normalize
> 
>
> Key: HBASE-25116
> URL: https://issues.apache.org/jira/browse/HBASE-25116
> Project: HBase
>  Issue Type: Improvement
>Reporter: niuyulin
>Assignee: niuyulin
>Priority: Minor
>
> A large count of region tasks from the canary may affect normal user requests; 
> meanwhile, if region tasks are few, the availability monitoring may fluctuate on an 
> occasional exception.
> So, if the task count is large, we will randomly trim tasks for each table, 
> according to the ratio of the table's region count to the whole task region count. 
> If the task count is small, we will repeat tasks.
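
The normalization described above could be sketched roughly as follows. This is only an illustration of the arithmetic, not the code in the patch: the class name TaskQuota, the targetTasks parameter, the rounding mode, and the floor of one task per non-empty table are all assumptions. A quota below a table's region count means its tasks are trimmed; a quota above it means tasks are repeated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of proportional task quotas; not the patch's code. */
final class TaskQuota {
  private TaskQuota() {}

  /**
   * Splits targetTasks across tables in proportion to each table's share of
   * the total region count. Rounding to the nearest integer and keeping at
   * least one task per non-empty table are assumptions of this sketch.
   */
  static Map<String, Integer> quotas(Map<String, Integer> regionsPerTable, int targetTasks) {
    long total = 0;
    for (int regions : regionsPerTable.values()) {
      total += regions;
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> e : regionsPerTable.entrySet()) {
      int regions = e.getValue();
      int quota = total == 0 ? 0 : (int) Math.round((double) targetTasks * regions / total);
      if (regions > 0) {
        quota = Math.max(1, quota);
      }
      result.put(e.getKey(), quota);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> regions = new LinkedHashMap<>();
    regions.put("big_table", 900);
    regions.put("small_table", 100);
    // Fewer target tasks than regions: every table is trimmed proportionally.
    System.out.println("trim:   " + quotas(regions, 100));
    // More target tasks than regions: every table's tasks are repeated.
    System.out.println("repeat: " + quotas(regions, 2000));
  }
}
```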





[jira] [Resolved] (HBASE-25234) [Upgrade]Incompatibility in reading RS report from 2.1 RS when Master is upgraded to a version containing HBASE-21406

2020-11-05 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25234.
---
Fix Version/s: 2.3.4
   2.2.7
   2.4.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
 Release Note: Fixes auto-migration of the master procedure store so it works 
again going from 2.0.x => 2.2+. Also makes heartbeats work during a rolling 
upgrade from 2.0.x => 2.3+.
   Resolution: Fixed

Pushed on branch-2.3+. Applied half of the patch to branch-2.2 (the change in 
clusterreport wasn't added till 2.3).

> [Upgrade]Incompatibility in reading RS report from 2.1 RS when Master is 
> upgraded to a version containing HBASE-21406
> -
>
> Key: HBASE-25234
> URL: https://issues.apache.org/jira/browse/HBASE-25234
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Sanjeet Nishad
>Assignee: Sanjeet Nishad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
>
> While upgrading to a version having HBASE-21406 and following the upgrade 
> process suggested in HBASE-21075, after Master is upgraded, the following 
> exception is observed while reading the rs report from old region servers :
> {code:java}
> 2020-11-02 18:25:30,303 WARN [RS-EventLoopGroup-1-2] ipc.RpcServer: 
> /x.x.x.x:16000 is unable to read call parameter from client x.x.x.x
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException:
>  Message missing required fields: load.replLoadSink.timestampStarted, 
> load.replLoadSink.totalOpsProcessed
>  at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:477)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerReportRequest$Builder.build(RegionServerStatusProtos.java:2411)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerReportRequest$Builder.build(RegionServerStatusProtos.java:2349)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.processRequest(ServerRpcConnection.java:654)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.processOneRpc(ServerRpcConnection.java:458)
>  at 
> org.apache.hadoop.hbase.ipc.ServerRpcConnection.saslReadAndProcess(ServerRpcConnection.java:351)
>  at 
> org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:92)
>  at 
> org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:68)
>  at 
> org.apache.hadoop.hbase.ipc.NettyRpcServerRequestDecoder.channelRead(NettyRpcServerRequestDecoder.java:62)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321)
>  at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
>  at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>  at 
> 

[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-05 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226811#comment-17226811
 ] 

Michael Stack commented on HBASE-25238:
---

Here is the log from a manual migration. I ran a 2.0.x cluster and loaded it w/ some 
data. I then stopped the 2.0.x Master and started a 2.4.x Master. Below you see the 
successful migration of the store from the old format to the new. All the while, the old 
RegionServer kept heartbeating though it was using the old ClusterReport format.

 
{code:java}

2020-11-05 16:27:04,684 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
region.RegionProcedureStore: Starting Region Procedure Store lease recovery...
2020-11-05 16:27:04,685 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
region.RegionProcedureStore: The old WALProcedureStore wal directory 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs exists, migrating...
2020-11-05 16:27:04,694 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
util.RecoverLeaseFSUtils: Recover lease on dfs file 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
2020-11-05 16:27:04,697 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
util.RecoverLeaseFSUtils: Recovered lease, attempt=0 on 
file=hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
 after 3ms
2020-11-05 16:27:04,704 WARN  [master/hbasedn020:16000:becomeActiveMaster] 
wal.WALProcedureStore: Unable to read tracker for 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat$InvalidWALDataException:
 Missing trailer: size=18 startPos=18
at 
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readTrailer(ProcedureWALFormat.java:182)
at 
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.readTrailer(ProcedureWALFile.java:93)
at 
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile.readTracker(ProcedureWALFile.java:100)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLog(WALProcedureStore.java:1389)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.initOldLogs(WALProcedureStore.java:1338)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:416)
at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:180)
at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1560)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:925)
at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2182)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:603)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-11-05 16:27:04,710 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.WALProcedureStore: Rolled new Procedure Store WAL, id=21
2020-11-05 16:27:04,714 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.ProcedureWALFormatReader: Rebuilding tracker for 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
2020-11-05 16:27:04,716 WARN  [master/hbasedn020:16000:becomeActiveMaster] 
wal.ProcedureWALFormatReader: Nothing left to decode. Exiting with missing EOF, 
log=hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
2020-11-05 16:27:04,716 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.ProcedureWALFormatReader: Read 0 entries in 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log
2020-11-05 16:27:04,737 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.WALProcedureStore: Rolled new Procedure Store WAL, id=22
2020-11-05 16:27:04,737 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.WALProcedureStore: Remove all state logs with ID less than 21, since no 
active procedures
2020-11-05 16:27:04,737 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.ProcedureWALFile: Archiving 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0020.log 
to hdfs://nameservice1/tmp/stack.wal/oldWALs/pv2-0020.log
2020-11-05 16:27:04,738 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
wal.ProcedureWALFile: Archiving 
hdfs://nameservice1/tmp/stack.wal/MasterProcWALs/pv2-0021.log 
to hdfs://nameservice1/tmp/stack.wal/oldWALs/pv2-0021.log
2020-11-05 16:27:04,743 INFO  [master/hbasedn020:16000:becomeActiveMaster] 
region.RegionProcedureStore: Migrated 0 existing procedures from the old 
storage format.
2020-11-05 16:27:04,743 INFO  

[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226508#comment-17226508
 ] 

Michael Stack commented on HBASE-25238:
---

Added suggested PR. Manual testing of the upgrade is taking a bit of time...

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Critical
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master 2c4006997f99,16000,1604358237412: 
> Unhandled exception. Starting shutdown. *
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> 

[jira] [Comment Edited] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226288#comment-17226288
 ] 

Michael Stack edited comment on HBASE-25238 at 11/4/20, 10:53 PM:
--

Marking this issue critical.

Can change the proto fields to be optional so upgrades work. Let me make a 
patch. Thanks for linking HBASE-25234 [~pankajkumar] . Let me fix that too.
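
As a rough illustration of why relaxing the fields to optional lets upgrades work, the plain-Java model below mimics a required-field check on data written by an older version. It is not the protobuf API and not HBase's actual message definition: the class RequiredFieldModel and the field names "region" and "targetServer" are made up for the sketch, and only "state" comes from the error message in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of a required-field check; this is NOT the protobuf API. */
final class RequiredFieldModel {
  private RequiredFieldModel() {}

  /**
   * Returns the names of fields that the schema marks as required
   * (value true) but that are absent from the persisted message.
   */
  static List<String> missingRequired(Map<String, Boolean> schema, Set<String> present) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, Boolean> field : schema.entrySet()) {
      boolean required = field.getValue();
      if (required && !present.contains(field.getKey())) {
        missing.add(field.getKey());
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Fields an older writer persisted; it never knew about "state".
    Set<String> persisted = new HashSet<>(Arrays.asList("region", "targetServer"));

    // Newer schema with "state" marked required: old data fails the check.
    Map<String, Boolean> strict = new LinkedHashMap<>();
    strict.put("region", true);
    strict.put("targetServer", true);
    strict.put("state", true);

    // Same schema with "state" relaxed to optional: old data passes.
    Map<String, Boolean> relaxed = new LinkedHashMap<>(strict);
    relaxed.put("state", false);

    System.out.println("required: missing " + missingRequired(strict, persisted));
    System.out.println("optional: missing " + missingRequired(relaxed, persisted));
  }
}
```

In this model the first check reports state as missing, which corresponds to the "Message missing required fields: state" failure, while the second reports nothing missing.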

 

 


was (Author: stack):
Marking this issue critical.

Can change the proto fields to be optional so upgrades. Let me make a patch. 
Thanks for linking HBASE-25234 [~pankajkumar] . Let me fix that too.

 

 

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Critical
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>    at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)  
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>    at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>    at 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>    at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>    at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>    at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>    at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>    at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>    at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>    at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>    at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622) 
>   at java.lang.Thread.run(Thread.java:748)2020-11-02 23:04:01,998 ERROR 
> [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: * ABORTING 
> master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting 
> shutdown. 
> *org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: state   at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>    at 
> 

[jira] [Commented] (HBASE-24186) RegionMover ignores replicationId

2020-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226293#comment-17226293
 ] 

Michael Stack commented on HBASE-24186:
---

Not important but just a note to say I reverted this patch from branch-2.0 
too... It broke its build (I'm testing migration so was trying to build 
branch-2.0 and found this). This matches the observation above by [~ram_krish]

> RegionMover ignores replicationId
> -
>
> Key: HBASE-24186
> URL: https://issues.apache.org/jira/browse/HBASE-24186
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5
>
>
> When RegionMover looks up which rs hosts a region, it does this based on 
> startRowKey. When read replication is enabled this might not return the 
> expected region's data and this can prevent the moving of these regions.
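
The ambiguity described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `ReplicaLookupSketch`, `Location`, `byStartKey`, and `byStartKeyAndReplica` are hypothetical names rather than the HBase client API, and replica id 0 is assumed to denote the primary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of region locations; NOT the real HBase client API.
final class ReplicaLookupSketch {
    static final class Location {
        final String startKey;
        final int replicaId; // assumption: 0 denotes the primary replica
        final String server;

        Location(String startKey, int replicaId, String server) {
            this.startKey = startKey;
            this.replicaId = replicaId;
            this.server = server;
        }
    }

    // Lookup by start key alone: with read replicas, several locations can match.
    static List<Location> byStartKey(List<Location> all, String startKey) {
        List<Location> matches = new ArrayList<>();
        for (Location l : all) {
            if (l.startKey.equals(startKey)) {
                matches.add(l);
            }
        }
        return matches;
    }

    // Lookup by start key and replica id: returns the first match, or null if none.
    static Location byStartKeyAndReplica(List<Location> all, String startKey, int replicaId) {
        for (Location l : all) {
            if (l.startKey.equals(startKey) && l.replicaId == replicaId) {
                return l;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Location> all = Arrays.asList(
            new Location("row-a", 0, "rs1"),
            new Location("row-a", 1, "rs2"));
        System.out.println("matches by start key: " + byStartKey(all, "row-a").size());
        System.out.println("replica 1 hosted on: " + byStartKeyAndReplica(all, "row-a", 1).server);
    }
}
```

In this model the start key identifies a key range, not a single hosted copy, so a caller that wants one specific replica has to pass the replica id as well.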




[jira] [Commented] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226288#comment-17226288
 ] 

Michael Stack commented on HBASE-25238:
---

Marking this issue critical.

Can change the proto fields to be optional so upgrades work. Let me make a 
patch. Thanks for linking HBASE-25234 [~pankajkumar] . Let me fix that too.

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Critical
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: state
>     at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>     at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>     at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>     at org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>     at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>     at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>     at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)
>     at java.lang.Thread.run(Thread.java:748)
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: * ABORTING master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting shutdown. *
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: state
>     at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>    at 
> 

[jira] [Updated] (HBASE-25238) Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing required fields: state”

2020-11-04 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-25238:
--
Priority: Critical  (was: Major)

> Upgrading HBase from 2.2.0 to 2.3.x fails because of “Message missing 
> required fields: state”
> -
>
> Key: HBASE-25238
> URL: https://issues.apache.org/jira/browse/HBASE-25238
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Zhuqi Jin
>Priority: Critical
>
> When we upgraded HBASE cluster from 2.0.0-RC0 to 2.3.0 or 2.3.3, the HMaster 
> on upgraded node failed to start.
> The error message is shown below: 
> {code:java}
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: state
>     at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
>     at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:228)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:124)
>     at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.deserializeStateData(RegionRemoteProcedureBase.java:352)
>     at org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure.deserializeStateData(OpenRegionProcedure.java:72)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>     at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>     at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore$1.load(RegionProcedureStore.java:194)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:474)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:151)
>     at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:103)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:465)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.tryMigrate(RegionProcedureStore.java:184)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.recoverLease(RegionProcedureStore.java:257)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:587)
>     at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1572)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:950)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)
>     at java.lang.Thread.run(Thread.java:748)
> 2020-11-02 23:04:01,998 ERROR [master/2c4006997f99:16000:becomeActiveMaster] master.HMaster: * ABORTING master 2c4006997f99,16000,1604358237412: Unhandled exception. Starting shutdown. *
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: state
>     at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:120)
>     at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:125)
>    at 
> 

[jira] [Resolved] (HBASE-25053) WAL replay should ignore 0-length files

2020-11-04 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HBASE-25053.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-2 and master. Thanks for patch [~niuyulin] . Thanks for 
reviews [~zhangduo]  and [~vjasani]

> WAL replay should ignore 0-length files
> ---
>
> Key: HBASE-25053
> URL: https://issues.apache.org/jira/browse/HBASE-25053
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Affects Versions: 2.3.1
>Reporter: Nick Dimiduk
>Assignee: niuyulin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> I overdrove a small testing cluster, filling HDFS. After cleaning up data to 
> bring HBase back up, I noticed that all masters abort. Logs complain of 
> seeking past EOF. Indeed, the last WAL file name logged is a 0-length file. 
> WAL replay should gracefully skip and clean up such an empty file.
> {noformat}
> 2020-09-16 19:51:30,297 ERROR org.apache.hadoop.hbase.master.HMaster: Failed to become active master
> java.io.EOFException: Cannot seek after EOF
>     at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1448)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:66)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initInternal(ProtobufLogReader.java:211)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initReader(ProtobufLogReader.java:173)
>     at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:64)
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:168)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:323)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:305)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
>     at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:429)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4859)
>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4765)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1014)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:956)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7496)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7454)
>     at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:269)
>     at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:309)
>     at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:949)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2240)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:622)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
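
The behavior the description asks for (skip an empty file and clean it up instead of failing on read) can be sketched as follows. This is a local-filesystem illustration using `java.nio.file`, not the merged patch: real WAL replay reads from HDFS through Hadoop APIs, and `EmptyFileSkipSketch` and `replayableFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local-filesystem sketch; real WAL replay reads from HDFS through Hadoop APIs.
final class EmptyFileSkipSketch {
    // Returns the non-empty regular files in dir, sorted by path,
    // and deletes the zero-length ones so they are never opened for reading.
    static List<Path> replayableFiles(Path dir) throws IOException {
        List<Path> keep = new ArrayList<>();
        List<Path> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                if (Files.size(p) == 0L) {
                    empty.add(p); // nothing to replay; seeking in it would hit EOF
                } else {
                    keep.add(p);
                }
            }
        }
        // Delete after the directory scan has finished.
        for (Path p : empty) {
            Files.deleteIfExists(p);
        }
        Collections.sort(keep);
        return keep;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Files.createFile(dir.resolve("edits-1"));                 // zero-length file
        Files.write(dir.resolve("edits-2"), new byte[] {1, 2, 3}); // has content
        System.out.println("files to replay: " + replayableFiles(dir));
    }
}
```

Checking the length before opening a reader keeps the empty-file case out of the error path entirely, which is the simplest way to avoid the `EOFException` shown in the log above.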



