[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524607#comment-16524607
 ] 

Hadoop QA commented on HBASE-20791:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 25s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} hbase-rsgroup generated 0 new + 106 unchanged - 1 fixed = 106 total (was 107) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} hbase-rsgroup: The patch generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 28s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 5s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 55s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20791 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929325/20791-master-v2.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux af279cdc1f0b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 99d54246ee |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/13409/testReport/ |
| Max. process+thread count | 2719 (vs. ulimit of 1) |
| modules | C: hbase-rsgroup U: hbase-rsgroup |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13409/console |
| Powered by | 

[jira] [Commented] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-06-26 Thread huaxiang sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524605#comment-16524605
 ] 

huaxiang sun commented on HBASE-20651:
--

The issue is that hbck sends an offline command to offline a split parent 
region, which changes the region's in-memory state from SPLIT to OFFLINE; a 
following assign() then succeeds and brings the SPLIT parent region back to 
life. I added a simple check. It won't address the issue completely (there is 
no lock protecting changes to the region's in-memory state), but the sanity 
check will hopefully filter out most of the corner cases.
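
The kind of sanity check described above might look like the following minimal sketch. It is a toy model and not the attached patch: the State enum, the offline method and the canAssign method are hypothetical names chosen for illustration, and no locking is modeled, which matches the caveat in the comment.

```java
/** Toy model of the sanity check; every name here is hypothetical. */
final class SplitParentGuard {
    enum State { OPEN, OFFLINE, SPLIT }

    /** Returns the state after an offline request, refusing to touch split parents. */
    static State offline(State current) {
        if (current == State.SPLIT) {
            // Reject the request: a split parent must not become assignable again.
            return State.SPLIT;
        }
        return State.OFFLINE;
    }

    /** In this toy model only an OFFLINE region may be assigned. */
    static boolean canAssign(State current) {
        return current == State.OFFLINE;
    }
}
```

With this guard, an offline request followed by an assign no longer revives a region whose state is SPLIT, while regions in other states behave as before.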

> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch
>
>
> We are seeing hbck bring back a split parent region, which causes region 
> inconsistency. More details will be filled in, as reproduction is still 
> ongoing. We might need to do something in hbck or the master to prevent this 
> from happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-06-26 Thread huaxiang sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-20651:
-
Status: Patch Available  (was: Open)

> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch
>
>
> We are seeing hbck bring back a split parent region, which causes region 
> inconsistency. More details will be filled in, as reproduction is still 
> ongoing. We might need to do something in hbck or the master to prevent this 
> from happening.





[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524604#comment-16524604
 ] 

Hudson commented on HBASE-20795:


Results for branch branch-2
[build #912 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/912/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/912//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/912//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/912//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id when the ignoreSequenceid 
> parameter is true, but BBKVComparator.compare, which is used internally for 
> cells of type ByteBufferKeyValue, doesn't take this into account.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}
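
One way to read the request is that the fast path needs its own opt-out for the sequence id. The sketch below illustrates that idea on a toy type, assuming the fast path folds the sequence id into its result. ToyCell, ToyCellComparator and compareFast are hypothetical names; this is not the HBase code or the attached patch.

```java
/** Hypothetical, simplified cell: a row key plus a sequence id. */
final class ToyCell {
    final String row;
    final long sequenceId;

    ToyCell(String row, long sequenceId) {
        this.row = row;
        this.sequenceId = sequenceId;
    }
}

final class ToyCellComparator {
    /** Fast path as callers used it before: the sequence id is always compared. */
    static int compareFast(ToyCell a, ToyCell b) {
        return compareFast(a, b, false);
    }

    /** Fast path with an opt-out for the sequence id comparison. */
    static int compareFast(ToyCell a, ToyCell b, boolean ignoreSequenceId) {
        int diff = a.row.compareTo(b.row);
        if (diff != 0 || ignoreSequenceId) {
            return diff;
        }
        // Later edits sort first, so the operands are swapped.
        return Long.compare(b.sequenceId, a.sequenceId);
    }
}
```

Two cells with the same row and different sequence ids then compare as equal when the flag is set, and by sequence id otherwise.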





[jira] [Commented] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524602#comment-16524602
 ] 

Hadoop QA commented on HBASE-20732:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 29s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s{color} | {color:red} hbase-server: The patch generated 1 new + 159 unchanged - 0 fixed = 160 total (was 159) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 24s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 53s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 118m 37s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20732 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929317/HBASE-20732.master.008.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux cdd699c66212 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 99d54246ee |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/13406/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/13406/testReport/ |
| Max. process+thread count | 4448 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Updated] (HBASE-20193) Basic Replication Web UI - Regionserver

2018-06-26 Thread Jingyun Tian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian updated HBASE-20193:
-
Attachment: HBASE-20193.master.009.patch

> Basic Replication Web UI - Regionserver 
> 
>
> Key: HBASE-20193
> URL: https://issues.apache.org/jira/browse/HBASE-20193
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HBASE-20193.master.001.patch, 
> HBASE-20193.master.002.patch, HBASE-20193.master.003.patch, 
> HBASE-20193.master.004.patch, HBASE-20193.master.004.patch, 
> HBASE-20193.master.005.patch, HBASE-20193.master.006.patch, 
> HBASE-20193.master.006.patch, HBASE-20193.master.007.patch, 
> HBASE-20193.master.008.patch, HBASE-20193.master.009.patch
>
>
> Subtask of HBASE-15809. Implementation of the replication UI on the 
> Regionserver web page.





[jira] [Updated] (HBASE-20651) Master, prevents hbck or shell command to reassign the split parent region

2018-06-26 Thread huaxiang sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-20651:
-
Attachment: HBASE-20651-branch-1-v001.patch

> Master, prevents hbck or shell command to reassign the split parent region
> --
>
> Key: HBASE-20651
> URL: https://issues.apache.org/jira/browse/HBASE-20651
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.2.6
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-20651-branch-1-v001.patch
>
>
> We are seeing hbck bring back a split parent region, which causes region 
> inconsistency. More details will be filled in, as reproduction is still 
> ongoing. We might need to do something in hbck or the master to prevent this 
> from happening.





[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524593#comment-16524593
 ] 

Reid Chan commented on HBASE-20791:
---

Still reviewing, please wait.

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: 20791-master-v2.patch, HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.
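
The delegation the issue asks for can be sketched on toy types as follows. This is a hedged illustration only: ToyClusterMetrics, ToyBalancer, ToyInternalBalancer and ToyGroupBalancer are hypothetical stand-ins, not the real HBase interfaces, and the attached patches may differ.

```java
import java.util.Map;

/** Hypothetical stand-in for cluster metrics: region name -> load figure. */
final class ToyClusterMetrics {
    final Map<String, Long> regionLoads;

    ToyClusterMetrics(Map<String, Long> regionLoads) {
        this.regionLoads = regionLoads;
    }
}

/** Hypothetical, minimal balancer contract (not the real HBase LoadBalancer). */
interface ToyBalancer {
    void setClusterMetrics(ToyClusterMetrics metrics);
}

/** Plays the role of the internal balancer that needs fresh region loads. */
final class ToyInternalBalancer implements ToyBalancer {
    private ToyClusterMetrics metrics;

    @Override
    public void setClusterMetrics(ToyClusterMetrics metrics) {
        this.metrics = metrics;
    }

    boolean hasFreshLoads() {
        return metrics != null && !metrics.regionLoads.isEmpty();
    }
}

/** Plays the role of the group-based wrapper around the internal balancer. */
final class ToyGroupBalancer implements ToyBalancer {
    private final ToyBalancer internalBalancer;
    private ToyClusterMetrics metrics;

    ToyGroupBalancer(ToyBalancer internalBalancer) {
        this.internalBalancer = internalBalancer;
    }

    @Override
    public void setClusterMetrics(ToyClusterMetrics metrics) {
        this.metrics = metrics;
        // The step the issue describes: forward the metrics to the wrapped balancer too.
        internalBalancer.setClusterMetrics(metrics);
    }
}
```

Without the forwarding call, the wrapped balancer in this model would keep seeing no loads at all, which mirrors the symptom in the description.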





[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524592#comment-16524592
 ] 

Ted Yu commented on HBASE-20791:


[~reidchan]:
Do you have other comments w.r.t. patch v2 ?

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: 20791-master-v2.patch, HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.





[jira] [Updated] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20791:
---
Attachment: 20791-master-v2.patch

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: 20791-master-v2.patch, HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.





[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524579#comment-16524579
 ] 

Reid Chan commented on HBASE-20791:
---

Makes sense.

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.





[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-06-26 Thread Sahil Aggarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524578#comment-16524578
 ] 

Sahil Aggarwal commented on HBASE-20716:


Here's a small benchmark for Bytes.toShort:

@Benchmark
public void testGetShortCheckAndDispatch() {
  if (UNSAFE_UNALIGNED) {
    UnsafeAccess.toShort(bytes, 0);
  } else {
    getShort(bytes, 0);
  }
}

@Benchmark
public void testGetShortDispatch() {
  UnsafeAccess.toShort(bytes, 0);
}

private short getShort(byte[] bytes, int offset) {
  short n = 0;
  n = (short) ((n ^ bytes[offset]) & 0xFF);
  n = (short) (n << 8);
  n = (short) ((n ^ bytes[offset+1]) & 0xFF);
  return n;
}

Benchmark                                            Mode   Cnt  Score           Error  Units
UnsafeAccessBenchmark.testGetShortCheckAndDispatch   thrpt    4  2195811074.302         ops/s
UnsafeAccessBenchmark.testGetShortDispatch           thrpt    4  2196669275.206         ops/s

I was curious about the effect of inlining too. Disabling inlining on 
UnsafeAccess.toShort:

Benchmark                                            Mode   Cnt  Score           Error  Units
UnsafeAccessBenchmark.testGetShortCheckAndDispatch   thrpt    2  474847927.534          ops/s
UnsafeAccessBenchmark.testGetShortDispatch           thrpt    2  486521295.364          ops/s

~78% reduction in throughput on disabling inlining.

And the size difference between these methods:

org.sample.UnsafeAccessBenchmark::testGetShortCheckAndDispatch (27 bytes)   force inline by CompileOracle
org.sample.UnsafeAccessBenchmark::testGetShortDispatch (9 bytes)   force inline by CompileOracle
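
A side note on the getShort fallback in the benchmark: as posted, the final mask with 0xFF is applied after the XOR, so only the low byte survives. The sketch below contrasts that with a decode that keeps both bytes, assuming big-endian byte order is intended. ShortDecodeSketch and its method names are hypothetical, and the thread does not establish whether this affects the throughput figures.

```java
final class ShortDecodeSketch {
    /** Mirrors the fallback as posted: the final mask keeps only the low byte. */
    static short getShortAsPosted(byte[] bytes, int offset) {
        short n = 0;
        n = (short) ((n ^ bytes[offset]) & 0xFF);
        n = (short) (n << 8);
        n = (short) ((n ^ bytes[offset + 1]) & 0xFF);
        return n;
    }

    /** Big-endian decode that keeps both bytes (assumed to be the intent). */
    static short getShortBigEndian(byte[] bytes, int offset) {
        return (short) (((bytes[offset] & 0xFF) << 8) | (bytes[offset + 1] & 0xFF));
    }
}
```

For the bytes 0x12 0x34, the posted version yields 0x34 while the big-endian version yields 0x1234.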

 

 

> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Priority: Critical
>  Labels: beginner
> Attachments: Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at Unsafe: UnsafeAccess and code internal to the 
> Bytes class. They are effectively doing the same thing. We should have one 
> avenue to Unsafe only.
> Many of our paths to Unsafe via UnsafeAccess traverse flags to check if 
> access is available, if it is aligned, and the order in which words are 
> written on the machine. Each check costs -- especially if done millions of 
> times a second -- and on occasion adds bloat in hot code paths. The unsafe 
> access inside Bytes checks on startup what the machine is capable of and 
> then does a static assign of the appropriate class-to-use from there on out. 
> UnsafeAccess does not do this; it runs the checks every time. It would be good 
> to have the Bytes behavior pervasive.
> The benefit of one access to Unsafe only is plain. The benefits we gain from 
> removing checks will be harder to measure, though they should be plain when 
> you disassemble a hot path; in a (very) rare case, the saved bytecodes could 
> be the difference between inlining or not.
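
The "check once at startup, then statically assign the class to use" pattern from the description can be sketched roughly as below. This is an illustration under stated assumptions, not the Bytes or UnsafeAccess code: ShortReaderDispatch, its nested types and the sketch.fastpath system property are hypothetical, and a ByteBuffer-based reader stands in for an Unsafe-backed one.

```java
import java.nio.ByteBuffer;

final class ShortReaderDispatch {
    /** Strategy interface for decoding a big-endian short. */
    interface ShortReader {
        short toShort(byte[] bytes, int offset);
    }

    /** Portable implementation, always available. */
    static final class PureJavaReader implements ShortReader {
        @Override
        public short toShort(byte[] bytes, int offset) {
            return (short) (((bytes[offset] & 0xFF) << 8) | (bytes[offset + 1] & 0xFF));
        }
    }

    /** Placeholder for a platform-specific fast reader; uses ByteBuffer here. */
    static final class BufferReader implements ShortReader {
        @Override
        public short toShort(byte[] bytes, int offset) {
            // ByteBuffer defaults to big-endian; getShort(int) is an absolute read.
            return ByteBuffer.wrap(bytes).getShort(offset);
        }
    }

    /** Hypothetical capability probe, evaluated once during class initialization. */
    static boolean fastPathAvailable() {
        return Boolean.parseBoolean(System.getProperty("sketch.fastpath", "true"));
    }

    /** Chosen once; callers never re-check the capability flag. */
    static final ShortReader READER =
        fastPathAvailable() ? new BufferReader() : new PureJavaReader();

    static short toShort(byte[] bytes, int offset) {
        return READER.toShort(bytes, offset);
    }
}
```

The design point is that the branch on machine capability is paid once at class initialization, so the hot path is a single call through a final field.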





[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524577#comment-16524577
 ] 

Hudson commented on HBASE-20795:


Results for branch branch-2.0
[build #478 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/478/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/478//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/478//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/478//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id when the ignoreSequenceid 
> parameter is true, but BBKVComparator.compare, which is used internally for 
> cells of type ByteBufferKeyValue, doesn't take this into account.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}





[jira] [Updated] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20792:
---
Attachment: TestRegionMoveAndAbandon.java

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
> Attachments: TestRegionMoveAndAbandon.java, 
> hbase-hbase-master-ctr-e138-1518143905142-380753-01-04.hwx.site.log
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708.
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc, should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.
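
The inconsistency in the meta dump above can be expressed as a small check. The sketch below works on a plain map of column name to value and assumes the formats visible in the dump: host:port for info:server and host,port,startcode for info:sn. MetaRowCheck and its methods are hypothetical names; no HBase API is used.

```java
import java.util.Map;

final class MetaRowCheck {
    /**
     * Returns true when info:sn (host,port,startcode) names the same server as
     * info:server (host:port) combined with info:serverstartcode.
     */
    static boolean locationConsistent(Map<String, String> row) {
        String server = row.get("info:server");
        String startCode = row.get("info:serverstartcode");
        String sn = row.get("info:sn");
        if (server == null || startCode == null || sn == null) {
            return false; // a missing column means consistency cannot be confirmed
        }
        int colon = server.lastIndexOf(':');
        if (colon < 0) {
            return false; // unexpected format for info:server
        }
        String expectedSn =
            server.substring(0, colon) + "," + server.substring(colon + 1) + "," + startCode;
        return expectedSn.equals(sn);
    }

    /** Flags OPEN rows whose two location columns disagree. */
    static boolean suspiciousOpenRow(Map<String, String> row) {
        return "OPEN".equals(row.get("info:state")) && !locationConsistent(row);
    }
}
```

Fed the values from the dump, the row is flagged: info:server points at the 02-12 host with start code 1529966776248, while info:sn names the 02-06 host with start code 1529966755170.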





[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524567#comment-16524567
 ] 

Josh Elser commented on HBASE-20792:


Ok, I have a test. (Very) initial signs suggest that HBASE-20708 might 
have inadvertently removed a critical path. Running my test on branch-2.0, it 
passes, but I see us getting into the same meta state:
{noformat}
2018-06-27 00:27:31,828 INFO  [M:0;hw13390:60070] 
assignment.RegionStateStore(122): Load hbase:meta entry 
region=fc00bdddfdec0ebe72132de4197c3247, regionState=OPEN, 
lastHost=hw13390.local,60048,1530073629090, 
regionLocation=hw13390.local,60046,1530073629030, openSeqNum=14
2018-06-27 00:27:31,828 INFO  [M:0;hw13390:60070] 
assignment.AssignmentManager(1211): Number of RegionServers=2
2018-06-27 00:27:31,828 INFO  [M:0;hw13390:60070] 
assignment.AssignmentManager(1331): KILL 
RegionServer=hw13390.local,60046,1530073629030 hosting regions but not online.
{noformat}
That last {{KILL}} log message is what saves our bacon. This triggers the 
region to get reassigned via an SCP that otherwise doesn't happen. Haven't 
looked further other than to say that this logic was shuffled around as a part 
of that. I'll throw up my test case if anyone would like to mess around with it.
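
For anyone who wants to check a meta row for this state by hand, here is a minimal sketch. It assumes the column values are available as plain strings, as printed by a meta scan, and that {{info:sn}} is expected to equal host,port,startcode built from {{info:server}} and {{info:serverstartcode}}. The class and method names are hypothetical and are not part of HBase.

```java
// Hypothetical helper, not an HBase class: it works on the string values
// of the meta columns only.
final class MetaLocationCheck {
  /** Builds "host,port,startcode" from info:server ("host:port") and info:serverstartcode. */
  static String toServerName(String server, long startCode) {
    int colon = server.lastIndexOf(':');
    return server.substring(0, colon) + "," + server.substring(colon + 1) + "," + startCode;
  }

  /** True when info:sn names the same server instance as info:server plus info:serverstartcode. */
  static boolean isConsistent(String server, long startCode, String sn) {
    return toServerName(server, startCode).equals(sn);
  }

  public static void main(String[] args) {
    // Values copied from the meta scan in the issue description.
    String server = "ctr-e138-1518143905142-378097-02-12.hwx.site:16020";
    long startCode = 1529966776248L;
    String sn = "ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170";
    System.out.println("consistent=" + isConsistent(server, startCode, sn)); // prints consistent=false
  }
}
```

The row from the description fails the check because both the host (02-12 vs 02-06) and the start code differ.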

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
> Attachments: 
> hbase-hbase-master-ctr-e138-1518143905142-380753-01-04.hwx.site.log
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc, should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.





[jira] [Commented] (HBASE-18201) add UT and docs for DataBlockEncodingTool

2018-06-26 Thread Kuan-Po Tseng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524566#comment-16524566
 ] 

Kuan-Po Tseng commented on HBASE-18201:
---

[~chia7712] I didn't change the logic of the code; what I modified is to make 
sure the last kv isn't null. So yes, this only checks the last kv, and I am also 
confused about this segment...

> add UT and docs for DataBlockEncodingTool
> -
>
> Key: HBASE-18201
> URL: https://issues.apache.org/jira/browse/HBASE-18201
> Project: HBase
>  Issue Type: Task
>  Components: tooling
>Reporter: Chia-Ping Tsai
>Assignee: Kuan-Po Tseng
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-18201.master.001.patch, 
> HBASE-18201.master.002.patch, HBASE-18201.master.002.patch, 
> HBASE-18201.master.003.patch
>
>
> There are no examples, documents, or tests for DataBlockEncodingTool. We should 
> make it friendly if any use case exists. Otherwise, we should just get rid of 
> it because DataBlockEncodingTool presumes that the implementation of cell 
> returned from DataBlockEncoder is KeyValue. That presumption may obstruct the 
> cleanup of KeyValue references in the code base of the read/write path.





[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread chenxu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524563#comment-16524563
 ] 

chenxu commented on HBASE-20791:


ClusterMetrics is updated periodically by ClusterStatusChore, so it should be 
passed to internalBalancer.
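
A minimal sketch of the delegation being proposed, under stated assumptions: the types below are hypothetical stand-ins for ClusterMetrics, StochasticLoadBalancer, and RSGroupBasedLoadBalancer, and the null guard assumes the internal balancer is only created in initialize(). It is not the attached patch.

```java
final class DelegationSketch {
  /** Hypothetical stand-in for ClusterMetrics. */
  interface Metrics { int regionCount(); }

  /** Hypothetical stand-in for the wrapped (internal) balancer. */
  static final class InternalBalancer {
    private Metrics metrics;
    void setClusterMetrics(Metrics m) { this.metrics = m; }
    Metrics current() { return metrics; }
  }

  /** Hypothetical stand-in for the group-based balancer that wraps another balancer. */
  static final class GroupBalancer {
    private Metrics clusterStatus;
    private InternalBalancer internalBalancer; // assumed to be created in initialize()

    void setClusterMetrics(Metrics m) {
      this.clusterStatus = m;
      // Forward every update so the wrapped balancer does not keep a stale snapshot.
      if (internalBalancer != null) {
        internalBalancer.setClusterMetrics(m);
      }
    }

    void initialize() {
      internalBalancer = new InternalBalancer();
      internalBalancer.setClusterMetrics(clusterStatus);
    }

    InternalBalancer internal() { return internalBalancer; }
  }

  public static void main(String[] args) {
    GroupBalancer balancer = new GroupBalancer();
    balancer.setClusterMetrics(() -> 10); // set before initialize(), as at master startup
    balancer.initialize();
    balancer.setClusterMetrics(() -> 42); // a later periodic refresh
    System.out.println(balancer.internal().current().regionCount()); // prints 42
  }
}
```

Without the forwarding call in setClusterMetrics, the internal balancer in this sketch would keep reporting 10 after the refresh.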

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.





[jira] [Commented] (HBASE-19997) [rolling upgrade] 1.x => 2.x

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524562#comment-16524562
 ] 

stack commented on HBASE-19997:
---

[~Xiaolin Ha] Yes. 0.98 won't work against a 2.0. Anything before 1.2 will give 
wonky results (not able to locate regions, etc.). You've seen HBASE-20788 
(perhaps you work w/ [~Apache9] ?)

> [rolling upgrade] 1.x => 2.x
> 
>
> Key: HBASE-19997
> URL: https://issues.apache.org/jira/browse/HBASE-19997
> Project: HBase
>  Issue Type: Umbrella
>Reporter: stack
>Priority: Blocker
> Fix For: 2.1.0
>
> Attachments: Screenshot from 2018-05-03 14-43-46.png
>
>
> An umbrella issue of issues needed so folks can do a rolling upgrade from 
> hbase-1.x to hbase-2.x.
> (Recent) Notables:
>  * hbase-1.x can't read hbase-2.x WALs -- hbase-1.x doesn't know the 
> AsyncProtobufLogWriter class used writing the WAL -- see 
> https://issues.apache.org/jira/browse/HBASE-19166?focusedCommentId=16362897=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16362897
>  for exception.
>  ** Might be ok... means WAL split fails on an hbase1 RS... must wait till an 
> hbase-2.x RS picks up the WAL for it to be split.
>  * hbase-1 can't open regions from tables created by hbase-2; it can't find 
> the Table descriptor. See 
> https://issues.apache.org/jira/browse/HBASE-19116?focusedCommentId=16363276=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16363276
>  ** This might be ok if the tables we are doing rolling upgrade over were 
> written with hbase-1.





[jira] [Updated] (HBASE-20745) Log when master proc wal rolls

2018-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20745:
--
Release Note: Log Master WAL Proc at INFO level so can tell where we 
transition; will help debugging/figuring accounting. Also change 
DEFAULT_RIT_CHORE_INTERVAL_MSEC from 5 to 60 seconds; makes it so we emit STUCK 
RIT notice once a minute only rather than 12 times a minute (latter was causing 
us to quickly roll-away the logging around problem 'events').

> Log when master proc wal rolls
> --
>
> Key: HBASE-20745
> URL: https://issues.apache.org/jira/browse/HBASE-20745
> Project: HBase
>  Issue Type: Sub-task
>  Components: debugging
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20745.master.001.patch
>
>
> Emit when we roll master proc WAL so can see when they happen. Want to 
> correlate instances of corruption w/ events on Master. Currently hard to do 
> on  a server where log-level is INFO (default for many deploys).
> Also, we log STUCK regions every 5 seconds. If a bundle of regions get stuck, 
> we can log so frequently, we roll away where the problem happened so lose the 
> chance to debug. Let me fix that too
> Need both debugging instances of parent issue.





[jira] [Commented] (HBASE-20745) Log when master proc wal rolls

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524559#comment-16524559
 ] 

stack commented on HBASE-20745:
---

[~apurtell] Sorry about that. Should have made mention. Yeah, the logging every 
5 seconds was worse than useless, causing roll-away of actual problems.

> Log when master proc wal rolls
> --
>
> Key: HBASE-20745
> URL: https://issues.apache.org/jira/browse/HBASE-20745
> Project: HBase
>  Issue Type: Sub-task
>  Components: debugging
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20745.master.001.patch
>
>
> Emit when we roll master proc WAL so can see when they happen. Want to 
> correlate instances of corruption w/ events on Master. Currently hard to do 
> on  a server where log-level is INFO (default for many deploys).
> Also, we log STUCK regions every 5 seconds. If a bundle of regions get stuck, 
> we can log so frequently, we roll away where the problem happened so lose the 
> chance to debug. Let me fix that too
> Need both debugging instances of parent issue.





[jira] [Commented] (HBASE-18201) add UT and docs for DataBlockEncodingTool

2018-06-26 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524547#comment-16524547
 ] 

Chia-Ping Tsai commented on HBASE-18201:


{code:java}
+boolean useTag = (prevKV.getTagsLength() > 0);{code}
Does this only check the last kv?
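
If the intent is to detect tags on any cell rather than on the last one only, the check could look something like the sketch below. This is an illustration under assumptions, not the patch: {{Kv}} is a hypothetical stand-in for KeyValue, and {{anyTags}} is a made-up helper name.

```java
import java.util.List;

final class UseTagSketch {
  /** Hypothetical stand-in for a KeyValue; only the tags length matters here. */
  static final class Kv {
    private final int tagsLength;
    Kv(int tagsLength) { this.tagsLength = tagsLength; }
    int getTagsLength() { return tagsLength; }
  }

  /** True if any cell carries tags, not just the last one scanned. */
  static boolean anyTags(List<Kv> kvs) {
    for (Kv kv : kvs) {
      if (kv.getTagsLength() > 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Kv> kvs = List.of(new Kv(4), new Kv(0)); // tags on the first cell only
    Kv last = kvs.get(kvs.size() - 1);
    System.out.println("last only: " + (last.getTagsLength() > 0)); // prints last only: false
    System.out.println("any cell: " + anyTags(kvs));                // prints any cell: true
  }
}
```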

> add UT and docs for DataBlockEncodingTool
> -
>
> Key: HBASE-18201
> URL: https://issues.apache.org/jira/browse/HBASE-18201
> Project: HBase
>  Issue Type: Task
>  Components: tooling
>Reporter: Chia-Ping Tsai
>Assignee: Kuan-Po Tseng
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-18201.master.001.patch, 
> HBASE-18201.master.002.patch, HBASE-18201.master.002.patch, 
> HBASE-18201.master.003.patch
>
>
> There are no examples, documents, or tests for DataBlockEncodingTool. We should 
> make it friendly if any use case exists. Otherwise, we should just get rid of 
> it because DataBlockEncodingTool presumes that the implementation of cell 
> returned from DataBlockEncoder is KeyValue. That presumption may obstruct the 
> cleanup of KeyValue references in the code base of the read/write path.





[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524542#comment-16524542
 ] 

Reid Chan commented on HBASE-20791:
---

{code:title=HMaster#finishActiveMasterInitialization, this.balancer is RSGroupBasedLoadBalancer}
// initialize load balancer
this.balancer.setMasterServices(this);
this.balancer.setClusterMetrics(getClusterMetricsWithoutCoprocessor());
this.balancer.initialize();
{code}
internalBalancer (StochasticLoadBalancer) is initialized in method 
{{initialize()}}, in the following code:
{code:title=RSGroupBasedLoadBalancer#initialize}
internalBalancer.setMasterServices(masterServices);
internalBalancer.setClusterMetrics(clusterStatus);
internalBalancer.setConf(config);
internalBalancer.initialize();
{code}
StochasticLoadBalancer must accept the {{clusterStatus}} which is set by 
RSGroupBasedLoadBalancer#setClusterMetrics.

I don't think it is an issue; nothing needs to be fixed.

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.





[jira] [Updated] (HBASE-19164) Avoid UUID.randomUUID in tests

2018-06-26 Thread Sahil Aggarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Aggarwal updated HBASE-19164:
---
Attachment: HBASE-19164.master.006.patch

> Avoid UUID.randomUUID in tests
> --
>
> Key: HBASE-19164
> URL: https://issues.apache.org/jira/browse/HBASE-19164
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Mike Drob
>Assignee: Sahil Aggarwal
>Priority: Major
>  Labels: beginner
> Attachments: HBASE-19164.master.001.patch, 
> HBASE-19164.master.002.patch, HBASE-19164.master.003.patch, 
> HBASE-19164.master.004.patch, HBASE-19164.master.005.patch, 
> HBASE-19164.master.006.patch
>
>
> We have a lot of places in our test code where we use {{UUID.randomUUID}} to 
> generate table names or paths for uniqueness. Unfortunately, this uses up a 
> good chunk of system entropy, since Sun chose that random UUIDs should use 
> the NativePRNGBlocking implementation.
> We don't need to block on entropy for random bits to pick a random table name 
> in a test, so we can use something that doesn't strain the system too much - 
> secure random can be a source of problems on some VMs or containers.
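
One possible replacement, sketched under assumptions: the helper name {{randomName}} and the prefix are hypothetical, and this is not necessarily what the attached patches do. {{ThreadLocalRandom}} is a non-secure generator, which should be acceptable when the only goal is a unique-looking name inside a test.

```java
import java.util.concurrent.ThreadLocalRandom;

final class TestNames {
  /** Returns the prefix followed by 16 hex digits drawn from a non-secure PRNG. */
  static String randomName(String prefix) {
    long bits = ThreadLocalRandom.current().nextLong();
    // %016x renders the long as an unsigned, zero-padded 16-digit hex string.
    return prefix + String.format("%016x", bits);
  }

  public static void main(String[] args) {
    System.out.println(randomName("testTable_"));
  }
}
```

The name carries 64 random bits rather than the 122 of a random UUID, so collisions are more likely in principle; for per-test table names or paths that trade-off is probably fine.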





[jira] [Commented] (HBASE-18201) add UT and docs for DataBlockEncodingTool

2018-06-26 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524535#comment-16524535
 ] 

Chia-Ping Tsai commented on HBASE-18201:


[~brandboat] Thanks for making this stale tool reborn!!! I will review the 
patch today. Could you please briefly describe the patch? thanks!

> add UT and docs for DataBlockEncodingTool
> -
>
> Key: HBASE-18201
> URL: https://issues.apache.org/jira/browse/HBASE-18201
> Project: HBase
>  Issue Type: Task
>  Components: tooling
>Reporter: Chia-Ping Tsai
>Assignee: Kuan-Po Tseng
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-18201.master.001.patch, 
> HBASE-18201.master.002.patch, HBASE-18201.master.002.patch, 
> HBASE-18201.master.003.patch
>
>
> There are no examples, documents, or tests for DataBlockEncodingTool. We should 
> make it friendly if any use case exists. Otherwise, we should just get rid of 
> it because DataBlockEncodingTool presumes that the implementation of cell 
> returned from DataBlockEncoder is KeyValue. That presumption may obstruct the 
> cleanup of KeyValue references in the code base of the read/write path.





[jira] [Updated] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20792:
---
Attachment: 
hbase-hbase-master-ctr-e138-1518143905142-380753-01-04.hwx.site.log

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
> Attachments: 
> hbase-hbase-master-ctr-e138-1518143905142-380753-01-04.hwx.site.log
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc, should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.





[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524534#comment-16524534
 ] 

Josh Elser commented on HBASE-20792:


{quote}The procedure with pid 17 is a MRP? Or SCP? 
{quote}
An SCP, yup, but not the one we're interested in. We want the SCP which is 
pid=19, see at {{2018-06-27 02:57:19,945}}
{quote}Since for opening a region the regionLocation can never be null, this 
means the new region location is the same with the previous one.
{quote}
The reason we didn't hit this branch is that the region was OFFLINE :). Let me 
try to give you a full log snippet. That should help un-vex you a little (or 
not, haha).

{{2018-06-27 02:57:21,190}} has the step where we mark the region as OFFLINE 
which makes the subsequent OPENING not update {{info:sn}} (like I think it 
should) at {{2018-06-27 02:57:21,342}}.
{quote}Let me take a look at other issues and then come back with a fresh brain 
and dig it...
{quote}
No worries. Been staring at this stuff all day. I'm just about spent. 
Appreciate your help already!

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc, should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.





[jira] [Updated] (HBASE-20790) Fix the style issues on branch HBASE-19064 before merging back to master

2018-06-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20790:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Fix the style issues on branch HBASE-19064 before merging back to master
> 
>
> Key: HBASE-20790
> URL: https://issues.apache.org/jira/browse/HBASE-20790
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20790-HBASE-19064.patch
>
>






[jira] [Commented] (HBASE-20783) NPE encountered when rolling update from master with an async peer to branch HBASE-19064

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524530#comment-16524530
 ] 

Duo Zhang commented on HBASE-20783:
---

Pushed to branch HBASE-19064.

> NPE encountered when rolling update from master with an async peer to branch 
> HBASE-19064
> 
>
> Key: HBASE-20783
> URL: https://issues.apache.org/jira/browse/HBASE-20783
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20783-HBASE-19064-addendum.patch, 
> HBASE-20783-HBASE-19064-v1.patch, HBASE-20783-HBASE-19064-v1.patch, 
> HBASE-20783-addendum.patch
>
>
> {code}
> 2018-06-25 16:25:04,261 ERROR [Thread-14] master.HMaster: Failed to become 
> active master
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.replication.SyncReplicationState.parseFrom(SyncReplicationState.java:72)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getSyncReplicationState(ZKReplicationPeerStorage.java:224)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getPeerSyncReplicationState(ZKReplicationPeerStorage.java:240)
> at 
> org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:479)
> at 
> org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:755)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:895)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2126)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:571)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Updated] (HBASE-20783) NPE encountered when rolling update from master with an async peer to branch HBASE-19064

2018-06-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20783:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> NPE encountered when rolling update from master with an async peer to branch 
> HBASE-19064
> 
>
> Key: HBASE-20783
> URL: https://issues.apache.org/jira/browse/HBASE-20783
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20783-HBASE-19064-addendum.patch, 
> HBASE-20783-HBASE-19064-v1.patch, HBASE-20783-HBASE-19064-v1.patch, 
> HBASE-20783-addendum.patch
>
>
> {code}
> 2018-06-25 16:25:04,261 ERROR [Thread-14] master.HMaster: Failed to become 
> active master
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.replication.SyncReplicationState.parseFrom(SyncReplicationState.java:72)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getSyncReplicationState(ZKReplicationPeerStorage.java:224)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getPeerSyncReplicationState(ZKReplicationPeerStorage.java:240)
> at 
> org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:479)
> at 
> org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:755)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:895)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2126)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:571)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Updated] (HBASE-20790) Fix the style issues on branch HBASE-19064 before merging back to master

2018-06-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20790:
--
Attachment: HBASE-20790-HBASE-19064.patch

> Fix the style issues on branch HBASE-19064 before merging back to master
> 
>
> Key: HBASE-20790
> URL: https://issues.apache.org/jira/browse/HBASE-20790
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20790-HBASE-19064.patch
>
>






[jira] [Commented] (HBASE-20783) NPE encountered when rolling update from master with an async peer to branch HBASE-19064

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524525#comment-16524525
 ] 

Duo Zhang commented on HBASE-20783:
---

Let me commit.

> NPE encountered when rolling update from master with an async peer to branch 
> HBASE-19064
> 
>
> Key: HBASE-20783
> URL: https://issues.apache.org/jira/browse/HBASE-20783
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20783-HBASE-19064-addendum.patch, 
> HBASE-20783-HBASE-19064-v1.patch, HBASE-20783-HBASE-19064-v1.patch, 
> HBASE-20783-addendum.patch
>
>
> {code}
> 2018-06-25 16:25:04,261 ERROR [Thread-14] master.HMaster: Failed to become 
> active master
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.replication.SyncReplicationState.parseFrom(SyncReplicationState.java:72)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getSyncReplicationState(ZKReplicationPeerStorage.java:224)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getPeerSyncReplicationState(ZKReplicationPeerStorage.java:240)
> at 
> org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:479)
> at 
> org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:755)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:895)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2126)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:571)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524522#comment-16524522
 ] 

Duo Zhang commented on HBASE-20792:
---

Reading the code, we set the lastHost for a region node in two places: one is 
when loading meta at startup, the other is in markRegionAsClosed, which is only 
called in UnassignProcedure. So I guess the problem is that we do not update 
the lastHost field for a region node in SCP, and this causes some 
inconsistency. But I haven't figured out how to reproduce the problem yet; 
my brain is stuck...

Let me take a look at other issues and then come back with a fresh brain and 
dig it...

Thanks.

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test   column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 0297f680df6dc0166a44f9536346268e, NAME => 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY => '', ENDKEY => ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:seqnumDuringOpen, timestamp=1529967103390, value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:server, timestamp=1529967103390, value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, timestamp=1529967096482, value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc, should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524520#comment-16524520
 ] 

Zheng Hu commented on HBASE-20789:
--

Checked it again. I would suggest not depending on the comparison result that 
is calculated by comparing the serialized ByteBuffers, because the result 
differs depending on whether the buffer is big-endian or not, which is a 
potential risk.
{code}
//HeapByteBuffer#putInt
static void putInt(ByteBuffer bb, int bi, int x, boolean bigEndian) {
    if (bigEndian)
        putIntB(bb, bi, x);
    else
        putIntL(bb, bi, x);
}
{code}
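
To illustrate the byte-order concern, here is a minimal, self-contained sketch. It is not HBase code; the class and method names are made up for illustration. It serializes the same two ints in both byte orders and compares the raw bytes the way a plain byte-wise comparator would.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: shows that a byte-wise comparison of serialized ints
// depends on the byte order used to write them.
class ByteOrderComparison {
  /** Serializes an int into a 4-byte buffer using the given byte order. */
  static ByteBuffer serialize(int value, ByteOrder order) {
    ByteBuffer bb = ByteBuffer.allocate(Integer.BYTES).order(order);
    bb.putInt(0, value);
    return bb;
  }

  /** Compares two 4-byte buffers byte by byte, treating bytes as unsigned. */
  static int compareBytes(ByteBuffer a, ByteBuffer b) {
    for (int i = 0; i < Integer.BYTES; i++) {
      int diff = (a.get(i) & 0xff) - (b.get(i) & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    int big = compareBytes(serialize(1, ByteOrder.BIG_ENDIAN),
        serialize(256, ByteOrder.BIG_ENDIAN));
    int little = compareBytes(serialize(1, ByteOrder.LITTLE_ENDIAN),
        serialize(256, ByteOrder.LITTLE_ENDIAN));
    System.out.println("big-endian sign: " + Integer.signum(big));
    System.out.println("little-endian sign: " + Integer.signum(little));
  }
}
```

With big-endian serialization 1 sorts before 256, while with little-endian serialization the same pair sorts the other way around, so any logic keyed on the sign of such a comparison is fragile.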

In BlockCacheUtil#validateBlockAddition, I think we can first compare the 
blocks without nextBlockMetadata; if they are not equal, throw the RE, 
otherwise just compare the nextBlockOnDiskSize. The trouble is that it's not 
easy to access the nextBlockOnDiskSize member through the abstract Cacheable 
interface. I think we can just define a new method in Cacheable.
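
A rough sketch of the two-step comparison proposed above, under stated assumptions: SizedBlock and its two accessors are hypothetical stand-ins for whatever new method Cacheable might get, "RE" is read as RuntimeException, and -1 is used as the "unset" value of nextBlockOnDiskSize. None of this is the actual HBase API.

```java
import java.util.Arrays;

// Hypothetical sketch, not HBase code: compare content first, then the
// next-block size as a plain int, so byte order never enters the result.
class BlockComparisonSketch {
  /** Stand-in for a cacheable block that exposes its next-block size. */
  interface SizedBlock {
    byte[] contentWithoutNextBlockMetadata();
    int nextBlockOnDiskSize(); // -1 means "unset" in this sketch
  }

  /** Small factory so the sketch can be exercised without real blocks. */
  static SizedBlock block(byte[] content, int nextBlockOnDiskSize) {
    return new SizedBlock() {
      @Override public byte[] contentWithoutNextBlockMetadata() { return content; }
      @Override public int nextBlockOnDiskSize() { return nextBlockOnDiskSize; }
    };
  }

  /**
   * Throws if the block contents differ; otherwise returns a positive value
   * when the existing block carries a larger (i.e. set) next-block size,
   * negative when the candidate does, and 0 when they match.
   */
  static int validateBlockAddition(SizedBlock existing, SizedBlock candidate) {
    if (!Arrays.equals(existing.contentWithoutNextBlockMetadata(),
        candidate.contentWithoutNextBlockMetadata())) {
      throw new RuntimeException("Cached block contents differ");
    }
    return Integer.compare(existing.nextBlockOnDiskSize(),
        candidate.nextBlockOnDiskSize());
  }
}
```

Because the size is compared as an int rather than as serialized bytes, the sign of the result does not depend on how the block was written into a buffer.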

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: bucket-33718.out
>
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.





[jira] [Comment Edited] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524513#comment-16524513
 ] 

Duo Zhang edited comment on HBASE-20792 at 6/27/18 3:18 AM:


{code}
else if (regionLocation != null && !regionLocation.equals(lastHost)) {
  // Ideally, if no regionLocation, write null to the hbase:meta but this will confuse clients
  // currently; they want a server to hit. TODO: Make clients wait if no location.
  put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY)
      .setRow(put.getRow())
      .setFamily(HConstants.CATALOG_FAMILY)
      .setQualifier(getServerNameColumn(replicaId))
      .setTimestamp(put.getTimestamp())
      .setType(Cell.Type.Put)
      .setValue(Bytes.toBytes(regionLocation.getServerName()))
      .build());
  info.append(", regionLocation=").append(regionLocation);
}
{code}

So here, if we do not log the regionLocation when OPENING, it means that the 
regionLocation is null or is the same as the previous location. Since the 
regionLocation can never be null when opening a region, this means the new 
region location is the same as the previous one.

And in your case, I guess this is not true, right? Is the procedure with pid 
17 an MRP or an SCP? Either way, I do not think the region location should be 
the same as the previous one. Maybe you can add more logs in 
updateUserRegionLocation to see what's going on with the regionLocation and 
lastHost?


was (Author: apache9):
{code}
else if (regionLocation != null && !regionLocation.equals(lastHost)) {
  // Ideally, if no regionLocation, write null to the hbase:meta but this will confuse clients
  // currently; they want a server to hit. TODO: Make clients wait if no location.
  put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY)
      .setRow(put.getRow())
      .setFamily(HConstants.CATALOG_FAMILY)
      .setQualifier(getServerNameColumn(replicaId))
      .setTimestamp(put.getTimestamp())
      .setType(Cell.Type.Put)
      .setValue(Bytes.toBytes(regionLocation.getServerName()))
      .build());
  info.append(", regionLocation=").append(regionLocation);
}
{code}

So here, if we do not log the regionLocation when OPENING, it means that the 
regionLocation is null or is the same as the previous location. Since the 
regionLocation can never be null when opening a region, this means the new 
region location is the same as the previous one.

And in your case, I guess this is not true, right? Is the procedure with pid 
17 an MRP or an SCP? Either way, I do not think the region location should be 
the same as the previous one. Maybe you can add more logs in 
updateUserRegionLocation to see what's going on with the regionLocation and 
lastHost?


[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524513#comment-16524513
 ] 

Duo Zhang commented on HBASE-20792:
---

{code}
else if (regionLocation != null && !regionLocation.equals(lastHost)) {
  // Ideally, if no regionLocation, write null to the hbase:meta but this will confuse clients
  // currently; they want a server to hit. TODO: Make clients wait if no location.
  put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY)
      .setRow(put.getRow())
      .setFamily(HConstants.CATALOG_FAMILY)
      .setQualifier(getServerNameColumn(replicaId))
      .setTimestamp(put.getTimestamp())
      .setType(Cell.Type.Put)
      .setValue(Bytes.toBytes(regionLocation.getServerName()))
      .build());
  info.append(", regionLocation=").append(regionLocation);
}
{code}

So here, if we do not log the regionLocation when OPENING, it means that the 
regionLocation is null or is the same as the previous location. Since the 
regionLocation can never be null when opening a region, this means the new 
region location is the same as the previous one.

And in your case, I guess this is not true, right? Is the procedure with pid 
17 an MRP or an SCP? Either way, I do not think the region location should be 
the same as the previous one. Maybe you can add more logs in 
updateUserRegionLocation to see what's going on with the regionLocation and 
lastHost?


[jira] [Commented] (HBASE-20783) NPE encountered when rolling update from master with an async peer to branch HBASE-19064

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524508#comment-16524508
 ] 

Hadoop QA commented on HBASE-20783:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} HBASE-19064 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 26s{color} | {color:green} HBASE-19064 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 24s{color} | {color:green} HBASE-19064 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 35s{color} | {color:green} HBASE-19064 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 53s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 58s{color} | {color:green} HBASE-19064 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} HBASE-19064 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 33s{color} | {color:green} hbase-client generated 0 new + 103 unchanged - 1 fixed = 103 total (was 104) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 15s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 32s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 58s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  8m 57s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  2s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 22s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}112m 26s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m  2s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 19s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20783 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929297/HBASE-20783-HBASE-19064-addendum.patch |
| Optional Tests |  

[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524503#comment-16524503
 ] 

Zheng Hu commented on HBASE-20789:
--

bq. The existing block has nextBlockOnDiskSize set so we will get performance gains by keeping that version.
If the existingBlock has nextBlockOnDiskSize set while the cachedItem has 
nextBlockOnDiskSize unset (default = -1), shouldn't the comparison be a 
positive number? 
So is there a typo? 

{code}
 if (cb != null) {
   int comparison = BlockCacheUtil.validateBlockAddition(cb.getBuffer(), buf, cacheKey);
   if (comparison != 0) {
-    if (comparison < 0) {
+    if (comparison > 0) {
       LOG.warn("Cached block contents differ by nextBlockOnDiskSize. Keeping cached block.");
       return;
     } else {
{code}



[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524504#comment-16524504
 ] 

Josh Elser commented on HBASE-20792:


Embarrassingly simple change:
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
index 7d041381a1..94480c997a 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
@@ -176,7 +176,7 @@ public class RegionStateStore {
       MetaTableAccessor.addLocation(put, regionLocation, openSeqNum, replicaId);
       info.append(", openSeqNum=").append(openSeqNum);
       info.append(", regionLocation=").append(regionLocation);
-    } else if (regionLocation != null && !regionLocation.equals(lastHost)) {
+    } else if (regionLocation != null && state == State.OPENING) {
       LOG.debug("Region location is opening: last={}, current={}, setting info:sn to {}",
           lastHost, regionLocation, regionLocation.getServerName());
       // Ideally, if no regionLocation, write null to the hbase:meta but this will confuse clients{code}
My local setup was fixed when I deployed this. Let me work on an upstream fix 
with a test.



[jira] [Commented] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524498#comment-16524498
 ] 

Reid Chan commented on HBASE-20732:
---

Thanks chia-ping, v8 addressed your comment and modified related method 
descriptions.

> Shutdown scan pool when master is stopped.
> --
>
> Key: HBASE-20732
> URL: https://issues.apache.org/jira/browse/HBASE-20732
> Project: HBase
>  Issue Type: Bug
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
> Attachments: HBASE-20732.master.001.patch, 
> HBASE-20732.master.002.patch, HBASE-20732.master.003.patch, 
> HBASE-20732.master.004.patch, HBASE-20732.master.005.patch, 
> HBASE-20732.master.006.patch, HBASE-20732.master.007.patch, 
> HBASE-20732.master.008.patch
>
>
> If master is stopped, {{DirScanPool}} is kept open. This is found by 
> [~chia7712] when reviewing HBASE-20352.





[jira] [Updated] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Reid Chan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-20732:
--
Attachment: HBASE-20732.master.008.patch



[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524493#comment-16524493
 ] 

Josh Elser commented on HBASE-20792:


Hey [~Apache9] – this is a few-days-old branch-2.0 with your HBASE-20708 pulled 
back onto it. I didn't realize this one hadn't already landed on branch-2.0 
(was there a reason for that?). It seems to have helped, as best I could tell :)
{quote}Could you please find the log in 
RegionStateStore.updateUserRegionLocation for the broken region? It is 
something like this:
{quote}
Hah, funny you should ask. This is where I'm currently poking. What I believe 
to be happening is that when the SCP->AssignProc for this region goes to update 
the OPENING state, we don't actually update {{info:sn}} like the code implies 
it should. Note, there is some more logging below that is from me hacking on 
things.
{noformat}
2018-06-27 02:14:34,803 TRACE [PEWorker-15] assignment.AssignProcedure: Update pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:namespace, region=837a84143c4cd17952282464dfdcfd55; rit=OFFLINE, location=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163
2018-06-27 02:14:34,804 INFO  [PEWorker-15] assignment.RegionStateStore: pid=18 updating hbase:meta row=837a84143c4cd17952282464dfdcfd55, regionState=OPENING
2018-06-27 02:14:34,912 INFO  [PEWorker-15] assignment.RegionTransitionProcedure: Dispatch pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:namespace, region=837a84143c4cd17952282464dfdcfd55; rit=OPENING, location=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163
2018-06-27 02:14:35,148 TRACE [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] assignment.AssignmentManager: Update region transition serverName=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163 region=rit=OPENING, location=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163, table=hbase:namespace, region=837a84143c4cd17952282464dfdcfd55 regionState=OPENED
2018-06-27 02:14:35,149 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] assignment.RegionTransitionProcedure: Received report OPENED seqId=16, pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:namespace, region=837a84143c4cd17952282464dfdcfd55; rit=OPENING, location=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163
2018-06-27 02:14:35,149 DEBUG [PEWorker-1] assignment.RegionTransitionProcedure: Finishing pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=hbase:namespace, region=837a84143c4cd17952282464dfdcfd55; rit=OPENING, location=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163
2018-06-27 02:14:35,149 DEBUG [PEWorker-1] assignment.RegionStateStore: openSeqNum=16, adding location of ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163 for 837a84143c4cd17952282464dfdcfd55
2018-06-27 02:14:35,150 INFO  [PEWorker-1] assignment.RegionStateStore: pid=18 updating hbase:meta row=837a84143c4cd17952282464dfdcfd55, regionState=OPEN, openSeqNum=16, regionLocation=ctr-e138-1518143905142-380753-01-08.hwx.site,16020,1530065530163
{noformat}
Let me try to summarize what I think is happening for you (everyone). Consider 
one region "A" and two RS "rs1" and "rs2". The final result is that "A" is left 
unassigned by HBase but marked as OPEN in meta:
 * "A" is on "rs2"
 * move "A", "rs1"
 * kill "rs1"
 * SCP runs for "rs1"
 ** AP/RegionTransitionProcedure runs for "A", OFFLINE'ing and then assigning 
it to "rs2"
 ** {{info:sn}} is never updated with the OPENING state, but this is OK since 
the region does actually OPEN on "rs2"
 * kill "rs2"
 * restart master
 * Master doesn't assign "A" because it sees {{info:state=OPEN}}, 
{{info:sn=rs1}}, {{info:server=rs2}}.
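
The sequence above can be replayed in a toy model. This is only a sketch and not HBase code: the class, its fields, and the method names are invented for illustration. It assumes that lastHost is set only on the unassign path and never by SCP, and that {{info:sn}} is written during OPENING only when the new location differs from lastHost. A flag switches to a variant that writes {{info:sn}} on every OPENING transition.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the info:sn / info:server bookkeeping for one region "A".
// All names here are hypothetical; this is not the RegionStateStore code.
class StaleSnModel {
  /** Stand-in for the hbase:meta row of the region. */
  final Map<String, String> meta = new HashMap<>();
  /** Stand-in for the region node's lastHost field. */
  String lastHost;
  /** When true, info:sn is written on every OPENING transition. */
  final boolean alwaysWriteOnOpening;

  StaleSnModel(boolean alwaysWriteOnOpening) {
    this.alwaysWriteOnOpening = alwaysWriteOnOpening;
  }

  /** The region goes to OPENING on the given server. */
  void opening(String regionLocation) {
    meta.put("info:state", "OPENING");
    boolean write = alwaysWriteOnOpening
        ? regionLocation != null
        : regionLocation != null && !regionLocation.equals(lastHost);
    if (write) {
      meta.put("info:sn", regionLocation);
    }
  }

  /** The server reports OPENED; the location column is written. */
  void opened(String regionLocation) {
    meta.put("info:state", "OPEN");
    meta.put("info:server", regionLocation);
  }

  /** Unassign path: assumed to be the only transition that sets lastHost. */
  void closedByUnassign(String server) {
    lastHost = server;
    meta.put("info:state", "CLOSED");
  }

  /** Replays the summarized sequence and returns the final meta row. */
  static Map<String, String> replay(boolean alwaysWriteOnOpening) {
    StaleSnModel m = new StaleSnModel(alwaysWriteOnOpening);
    m.opening("rs2");          // "A" is on rs2
    m.opened("rs2");
    m.closedByUnassign("rs2"); // move "A" to rs1: unassign, then assign
    m.opening("rs1");
    m.opened("rs1");
    m.opening("rs2");          // kill rs1: SCP reassigns, lastHost untouched
    m.opened("rs2");
    return m.meta;
  }

  public static void main(String[] args) {
    System.out.println("lastHost check:   " + replay(false));
    System.out.println("always on OPENING: " + replay(true));
  }
}
```

Under these assumptions the first replay ends with info:state=OPEN, info:sn=rs1 and info:server=rs2, which is the inconsistent row described above; the second replay ends with info:sn=rs2.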


[jira] [Commented] (HBASE-20745) Log when master proc wal rolls

2018-06-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524482#comment-16524482
 ] 

Andrew Purtell commented on HBASE-20745:


The commit log talks about logging, not the config default change, is all. 

> Log when master proc wal rolls
> --
>
> Key: HBASE-20745
> URL: https://issues.apache.org/jira/browse/HBASE-20745
> Project: HBase
>  Issue Type: Sub-task
>  Components: debugging
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20745.master.001.patch
>
>
> Emit when we roll master proc WAL so can see when they happen. Want to 
> correlate instances of corruption w/ events on Master. Currently hard to do 
> on  a server where log-level is INFO (default for many deploys).
> Also, we log STUCK regions every 5 seconds. If a bundle of regions get stuck, 
> we can log so frequently, we roll away where the problem happened so lose the 
> chance to debug. Let me fix that too
> Need both debugging instances of parent issue.





[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524468#comment-16524468
 ] 

Hadoop QA commented on HBASE-15320:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  3s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 15s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  5m 16s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 31s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hbase-assembly . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  7m 19s{color} | {color:red} root generated 104 new + 1295 unchanged - 0 fixed = 1399 total (was 1295) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 22s{color} | {color:red} root: The patch generated 517 new + 15 unchanged - 0 fixed = 532 total (was 15) {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  4m 
58s{color} | {color:blue} patch has no errors when building the reference 
guide. See footer for rendered docs, which you should manually inspect. {color} 
|
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red}  0m 
10s{color} | {color:red} patch has 7 errors when building our shaded downstream 
artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hbase-kafka-model hbase-assembly . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
23s{color} | {color:green} the patch 

[jira] [Commented] (HBASE-20790) Fix the style issues on branch HBASE-19064 before merging back to master

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524467#comment-16524467
 ] 

Duo Zhang commented on HBASE-20790:
---

For the FutureReturnValueIgnored warning, there is currently no way to fix it. 
The problem is that a chained CompletableFuture cannot handle checked 
exceptions, so the only way to transform a checked exception is to create a 
new CompletableFuture manually and complete it in whenComplete. That means 
the return value of the chained CompletableFuture is ultimately ignored, 
because every method in CompletableFuture returns a new CompletableFuture, and 
that is what triggers the error-prone warning...
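
The pattern described above can be sketched with nothing but java.util.concurrent. This is an illustration, not code from the HBASE-19064 branch: the class `CheckedFutureSketch` and the methods `parse` and `transform` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

final class CheckedFutureSketch {

  // Hypothetical step that throws a checked exception.
  static int parse(String s) throws IOException {
    try {
      return Integer.parseInt(s);
    } catch (NumberFormatException e) {
      throw new IOException("not a number: " + s, e);
    }
  }

  // thenApply cannot call parse() directly because its lambda may not throw
  // a checked exception, so a second future is created and completed by hand.
  static CompletableFuture<Integer> transform(CompletableFuture<String> source) {
    CompletableFuture<Integer> result = new CompletableFuture<>();
    // whenComplete returns yet another CompletableFuture, which is dropped
    // here on purpose. That dropped value is what FutureReturnValueIgnored flags.
    source.whenComplete((value, error) -> {
      if (error != null) {
        result.completeExceptionally(error);
        return;
      }
      try {
        result.complete(parse(value));
      } catch (IOException e) {
        result.completeExceptionally(e);
      }
    });
    return result;
  }

  public static void main(String[] args) {
    System.out.println(transform(CompletableFuture.completedFuture("42")).join());
  }
}
```

The caller only ever sees `result`, so the future returned by `whenComplete` has no consumer, which is why the check cannot be satisfied without suppressing it.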

> Fix the style issues on branch HBASE-19064 before merging back to master
> 
>
> Key: HBASE-20790
> URL: https://issues.apache.org/jira/browse/HBASE-20790
> Project: HBase
>  Issue Type: Sub-task
>  Components: build
>Reporter: Duo Zhang
>Priority: Major
> Fix For: HBASE-19064
>
>






[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524465#comment-16524465
 ] 

stack commented on HBASE-6028:
--

[~mogoel] let's address this nice feedback!

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel an 
> in-progress compaction.  This would allow a system to recover when too many 
> regions became eligible for compactions at once





[jira] [Commented] (HBASE-20745) Log when master proc wal rolls

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524464#comment-16524464
 ] 

stack commented on HBASE-20745:
---

Intentional unless I changed wrong thing?  In description I talk of how the 5 
seconds is too frequent.  Logs fill quickly.  Issue that caused prob rolls 
away.  What you thinking [~apurtell]?

> Log when master proc wal rolls
> --
>
> Key: HBASE-20745
> URL: https://issues.apache.org/jira/browse/HBASE-20745
> Project: HBase
>  Issue Type: Sub-task
>  Components: debugging
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20745.master.001.patch
>
>
> Emit when we roll master proc WAL so can see when they happen. Want to 
> correlate instances of corruption w/ events on Master. Currently hard to do 
> on  a server where log-level is INFO (default for many deploys).
> Also, we log STUCK regions every 5 seconds. If a bundle of regions get stuck, 
> we can log so frequently, we roll away where the problem happened so lose the 
> chance to debug. Let me fix that too
> Need both debugging instances of parent issue.





[jira] [Commented] (HBASE-20745) Log when master proc wal rolls

2018-06-26 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524450#comment-16524450
 ] 

Andrew Purtell commented on HBASE-20745:


Intentional?
{noformat}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
index 3412c82f2d..d1e6e85f3c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
@@ -136,7 +136,7 @@ public class AssignmentManager implements ServerListener {
 
   public static final String RIT_CHORE_INTERVAL_MSEC_CONF_KEY =
   "hbase.assignment.rit.chore.interval.msec";
-  private static final int DEFAULT_RIT_CHORE_INTERVAL_MSEC = 5 * 1000;
+  private static final int DEFAULT_RIT_CHORE_INTERVAL_MSEC = 60 * 1000;
 
   public static final String ASSIGN_MAX_ATTEMPTS =
   "hbase.assignment.maximum.attempts";
{noformat}

Can we revert this bit and put it in with a separate JIRA? Or was this part of 
a fix for something already logged?
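
For context, the diff touches only the default. Assuming the key is read with the constant as its fallback (the usual pattern, but an assumption here), a deployment that sets the key keeps its own value. A minimal sketch, with a plain `Map` standing in for the site configuration and a hypothetical class name:

```java
import java.util.HashMap;
import java.util.Map;

final class RitChoreIntervalSketch {
  static final String RIT_CHORE_INTERVAL_MSEC_CONF_KEY =
      "hbase.assignment.rit.chore.interval.msec";
  // Default after the diff above; it was 5 * 1000 before.
  static final int DEFAULT_RIT_CHORE_INTERVAL_MSEC = 60 * 1000;

  // A plain Map stands in for the site configuration (an assumption of this sketch).
  static int ritChoreIntervalMsec(Map<String, String> conf) {
    String value = conf.get(RIT_CHORE_INTERVAL_MSEC_CONF_KEY);
    return value == null ? DEFAULT_RIT_CHORE_INTERVAL_MSEC : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(ritChoreIntervalMsec(conf)); // prints 60000
    conf.put(RIT_CHORE_INTERVAL_MSEC_CONF_KEY, "5000");
    System.out.println(ritChoreIntervalMsec(conf)); // prints 5000
  }
}
```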

> Log when master proc wal rolls
> --
>
> Key: HBASE-20745
> URL: https://issues.apache.org/jira/browse/HBASE-20745
> Project: HBase
>  Issue Type: Sub-task
>  Components: debugging
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20745.master.001.patch
>
>
> Emit when we roll master proc WAL so can see when they happen. Want to 
> correlate instances of corruption w/ events on Master. Currently hard to do 
> on  a server where log-level is INFO (default for many deploys).
> Also, we log STUCK regions every 5 seconds. If a bundle of regions get stuck, 
> we can log so frequently, we roll away where the problem happened so lose the 
> chance to debug. Let me fix that too
> Need both debugging instances of parent issue.





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524446#comment-16524446
 ] 

Ted Yu commented on HBASE-6028:
---

{code}
2018-06-26 13:37:22,042 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16022-shortCompactions-1530045419380] regionserver.CompactSplit: Compaction failed
{code}
When compaction is interrupted, it is not really an error.
Within CompactSplit, whether compaction is enabled or not is known to the 
CompactionRunner. In the above case, can we replace the error log with an INFO 
log saying the compaction was interrupted?
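
A rough sketch of the suggestion, using java.util.logging so it runs standalone. The class `CompactionLogSketch`, the method `runAndLog` and the `compactionsEnabled` supplier are hypothetical and do not mirror the real CompactSplit or CompactionRunner code:

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

final class CompactionLogSketch {
  private static final Logger LOG = Logger.getLogger(CompactionLogSketch.class.getName());

  // Runs a compaction-like task and returns the level it logged at,
  // or null when the task succeeded and nothing was logged.
  static Level runAndLog(Callable<?> compaction, BooleanSupplier compactionsEnabled) {
    try {
      compaction.call();
      return null;
    } catch (Exception e) {
      if (!compactionsEnabled.getAsBoolean()) {
        // Compactions were switched off while the task ran: expected, not an error.
        LOG.log(Level.INFO, "Compaction interrupted: {0}", e.getMessage());
        return Level.INFO;
      }
      // java.util.logging has no ERROR level; SEVERE plays that role here.
      LOG.log(Level.SEVERE, "Compaction failed", e);
      return Level.SEVERE;
    }
  }
}
```

The decision hinges on state the runner already has, so no new exception type is needed to tell the two cases apart.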

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel an 
> in-progress compaction.  This would allow a system to recover when too many 
> regions became eligible for compactions at once





[jira] [Updated] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20732:
---
Fix Version/s: 2.0.2
   2.1.0
   3.0.0

> Shutdown scan pool when master is stopped.
> --
>
> Key: HBASE-20732
> URL: https://issues.apache.org/jira/browse/HBASE-20732
> Project: HBase
>  Issue Type: Bug
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
> Attachments: HBASE-20732.master.001.patch, 
> HBASE-20732.master.002.patch, HBASE-20732.master.003.patch, 
> HBASE-20732.master.004.patch, HBASE-20732.master.005.patch, 
> HBASE-20732.master.006.patch, HBASE-20732.master.007.patch
>
>
> If master is stopped, {{DirScanPool}} is kept open. This is found by 
> [~chia7712] when reviewing HBASE-20352.





[jira] [Commented] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524434#comment-16524434
 ] 

Chia-Ping Tsai commented on HBASE-20732:


{code:java}
+@Nullable
 private List<FileStatus> getFilteredStatus(Predicate<FileStatus> function) throws IOException {
   return FSUtils.listStatusWithStatusFilter(fs, dir, status -> function.test(status));
 }{code}
Would it be better to return an empty list instead of null? Returning an empty 
list would simplify the following code.
{code:java}
+subDirs = Optional.ofNullable(getFilteredStatus(FileStatus::isDirectory))
+  .orElseGet(Collections::emptyList);
+files = Optional.ofNullable(getFilteredStatus(FileStatus::isFile))
+.orElseGet(Collections::emptyList);{code}
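
To illustrate the suggestion, here is a small standalone sketch in which the null is normalized once inside the helper. `EmptyListSketch`, `listOrNull` and `getFiltered` are hypothetical names, and strings stand in for `FileStatus` objects:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class EmptyListSketch {
  // Hypothetical stand-in for a lister that returns null when nothing matches.
  static List<String> listOrNull(List<String> names, Predicate<String> filter) {
    List<String> matches = names.stream().filter(filter).collect(Collectors.toList());
    return matches.isEmpty() ? null : matches;
  }

  // Normalizes null to an empty list in one place so callers need no Optional wrapper.
  static List<String> getFiltered(List<String> names, Predicate<String> filter) {
    List<String> matches = listOrNull(names, filter);
    return matches == null ? Collections.emptyList() : matches;
  }
}
```

Callers can then iterate over the result directly, and the `Optional.ofNullable(...).orElseGet(...)` wrapping at each call site becomes unnecessary.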
 

> Shutdown scan pool when master is stopped.
> --
>
> Key: HBASE-20732
> URL: https://issues.apache.org/jira/browse/HBASE-20732
> Project: HBase
>  Issue Type: Bug
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
> Attachments: HBASE-20732.master.001.patch, 
> HBASE-20732.master.002.patch, HBASE-20732.master.003.patch, 
> HBASE-20732.master.004.patch, HBASE-20732.master.005.patch, 
> HBASE-20732.master.006.patch, HBASE-20732.master.007.patch
>
>
> If master is stopped, {{DirScanPool}} is kept open. This is found by 
> [~chia7712] when reviewing HBASE-20352.





[jira] [Updated] (HBASE-20732) Shutdown scan pool when master is stopped.

2018-06-26 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20732:
---
Fix Version/s: 1.4.6
   1.5.0

> Shutdown scan pool when master is stopped.
> --
>
> Key: HBASE-20732
> URL: https://issues.apache.org/jira/browse/HBASE-20732
> Project: HBase
>  Issue Type: Bug
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 1.5.0, 1.4.6
>
> Attachments: HBASE-20732.master.001.patch, 
> HBASE-20732.master.002.patch, HBASE-20732.master.003.patch, 
> HBASE-20732.master.004.patch, HBASE-20732.master.005.patch, 
> HBASE-20732.master.006.patch, HBASE-20732.master.007.patch
>
>
> If master is stopped, {{DirScanPool}} is kept open. This is found by 
> [~chia7712] when reviewing HBASE-20352.





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524412#comment-16524412
 ] 

Hadoop QA commented on HBASE-6028:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
33s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
35s{color} | {color:red} hbase-client: The patch generated 7 new + 304 
unchanged - 0 fixed = 311 total (was 304) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
15s{color} | {color:red} hbase-server: The patch generated 15 new + 338 
unchanged - 5 fixed = 353 total (was 343) {color} |
| {color:red}-1{color} | {color:red} rubocop {color} | {color:red}  0m 
11s{color} | {color:red} The patch generated 18 new + 411 unchanged - 0 fixed = 
429 total (was 411) {color} |
| {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange}  
0m  5s{color} | {color:orange} The patch generated 8 new + 725 unchanged - 0 
fixed = 733 total (was 725) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 57s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  
1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
0s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 11s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit 

[jira] [Commented] (HBASE-20791) RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s internalBalancer

2018-06-26 Thread chenxu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524400#comment-16524400
 ] 

chenxu commented on HBASE-20791:


bq. Can you put the patch on review board after fixing failing tests ?
https://reviews.apache.org/r/67752/

> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to it’s 
> internalBalancer
> --
>
> Key: HBASE-20791
> URL: https://issues.apache.org/jira/browse/HBASE-20791
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Affects Versions: 3.0.0, 2.0.0
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-20791-master-v1.patch
>
>
> RSGroupBasedLoadBalancer#setClusterMetrics should pass ClusterMetrics to its 
> internalBalancer; otherwise the StochasticLoadBalancer (the internal balancer) 
> will lose its up-to-date RegionLoads info, which affects balancing.
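
The forwarding the issue asks for can be sketched as follows. Everything here is hypothetical and simplified: the `Balancer` interface, both classes, and the use of a plain `String` in place of ClusterMetrics are assumptions made for illustration, not the actual patch.

```java
final class DelegatingBalancerSketch {
  // Hypothetical, simplified balancer contract; the metrics are a plain String here.
  interface Balancer {
    void setClusterMetrics(String metrics);
  }

  // Stand-in for the internal balancer that needs fresh load information.
  static final class InternalBalancer implements Balancer {
    String lastMetrics;

    @Override
    public void setClusterMetrics(String metrics) {
      this.lastMetrics = metrics;
    }
  }

  // Stand-in for the group-based wrapper.
  static final class GroupBalancer implements Balancer {
    private final Balancer internalBalancer;
    private String metrics;

    GroupBalancer(Balancer internalBalancer) {
      this.internalBalancer = internalBalancer;
    }

    @Override
    public void setClusterMetrics(String metrics) {
      this.metrics = metrics;
      // Forwarding step: without it the internal balancer keeps stale data.
      internalBalancer.setClusterMetrics(metrics);
    }
  }
}
```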





[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524397#comment-16524397
 ] 

Duo Zhang commented on HBASE-20792:
---

Could you please find the log in RegionStateStore.updateUserRegionLocation for 
the broken region? It is something like this:

{code}
final StringBuilder info =
  new StringBuilder("pid=").append(pid).append(" updating hbase:meta row=")
    .append(regionInfo.getEncodedName()).append(", regionState=").append(state);
{code}
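
Given that message prefix, the relevant lines could be pulled out of a master log with a filter along these lines. This is only a sketch: `MetaUpdateLogFilter` and `updatesFor` are hypothetical names, and the real log line may carry more fields after `regionState`.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MetaUpdateLogFilter {
  // Keeps only the lines that record a meta update for the given encoded region name.
  static List<String> updatesFor(List<String> logLines, String encodedRegionName) {
    String needle = " updating hbase:meta row=" + encodedRegionName + ",";
    return logLines.stream()
        .filter(line -> line.contains(needle))
        .collect(Collectors.toList());
  }
}
```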



> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.
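
The inconsistency in the scan output above can be checked mechanically. The sketch below assumes the formats visible in that output, `host:port` for `info:server` and `host,port,startcode` for `info:sn`; the class and method names are hypothetical.

```java
final class MetaServerConsistencySketch {
  // Returns true when info:server plus info:serverstartcode describe the same
  // server instance as info:sn.
  static boolean consistent(String infoServer, String infoServerStartCode, String infoSn) {
    String[] sn = infoSn.split(",");
    if (sn.length != 3) {
      return false;
    }
    String expectedServer = sn[0] + ":" + sn[1];
    return expectedServer.equals(infoServer) && sn[2].equals(infoServerStartCode);
  }

  public static void main(String[] args) {
    // Values copied from the meta row quoted in the issue description.
    System.out.println(consistent(
        "ctr-e138-1518143905142-378097-02-12.hwx.site:16020",
        "1529966776248",
        "ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170")); // prints false
  }
}
```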





[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524382#comment-16524382
 ] 

Duo Zhang commented on HBASE-20792:
---

Is this on branch-2.1 or branch-2.0? HBASE-20708 is not in branch-2.0. But if 
the problem is in ProcWAL replay, then I think we will have the same problem on 
branch-2.1.

But I do not think we write to meta when replaying ProcWALs? Do you mean 
that the MRP has not finished yet, so after we construct the PE it will execute 
the MRP and then mess up the data in meta?

Thanks.

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.





[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524380#comment-16524380
 ] 

stack commented on HBASE-15320:
---

It's a bit smaller (smile). 

Took a quick look.

Some licenses are missing.

We should fix it so Table has defaults for most of its methods so you don't 
have to supply a bunch of unimplementeds. Patch would be smaller still.
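
What "defaults for most of its methods" could look like, as a sketch only. `SimpleTable` is a hypothetical interface far smaller than the real Table, and having the defaults throw `UnsupportedOperationException` is an assumption of this example:

```java
final class DefaultMethodSketch {
  // Hypothetical, much smaller cousin of a table interface.
  interface SimpleTable {
    String get(String row);

    default void put(String row, String value) {
      throw new UnsupportedOperationException("put");
    }

    default void delete(String row) {
      throw new UnsupportedOperationException("delete");
    }
  }

  // A read-only implementation overrides only what it supports.
  static final class ReadOnlyTable implements SimpleTable {
    @Override
    public String get(String row) {
      return "value-of-" + row;
    }
  }
}
```

With defaults in the interface, an implementation such as `ReadOnlyTable` needs no hand-written stubs for the operations it does not support.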

Adds two new modules, kafka-model and kafka-proxy. hbase-server depends on the 
kafka modules.

hbase-common/src/main/resources/hbase-default.xml changes intended? Same for 
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcServer.java

Is this adding source only?

You need this for sure?

  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>

Thanks.

> HBase connector for Kafka Connect
> -
>
> Key: HBASE-15320
> URL: https://issues.apache.org/jira/browse/HBASE-15320
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Andrew Purtell
>Assignee: Mike Wingert
>Priority: Major
>  Labels: beginner
> Fix For: 3.0.0
>
> Attachments: HBASE-15320.master.1.patch, HBASE-15320.master.10.patch, 
> HBASE-15320.master.11.patch, HBASE-15320.master.2.patch, 
> HBASE-15320.master.3.patch, HBASE-15320.master.4.patch, 
> HBASE-15320.master.5.patch, HBASE-15320.master.6.patch, 
> HBASE-15320.master.7.patch, HBASE-15320.master.8.patch, 
> HBASE-15320.master.8.patch, HBASE-15320.master.9.patch, HBASE-15320.pdf, 
> HBASE-15320.pdf
>
>
> Implement an HBase connector with source and sink tasks for the Connect 
> framework (http://docs.confluent.io/2.0.0/connect/index.html) available in 
> Kafka 0.9 and later.
> See also: 
> http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
> An HBase source 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task)
>  could be implemented as a replication endpoint or WALObserver, publishing 
> cluster wide change streams from the WAL to one or more topics, with 
> configurable mapping and partitioning of table changes to topics.  
> An HBase sink task 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would 
> persist, with optional transformation (JSON? Avro?, map fields to native 
> schema?), Kafka SinkRecords into HBase tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20783) NPE encountered when rolling update from master with an async peer to branch HBASE-19064

2018-06-26 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20783:
--
Attachment: HBASE-20783-HBASE-19064-addendum.patch

> NPE encountered when rolling update from master with an async peer to branch 
> HBASE-19064
> 
>
> Key: HBASE-20783
> URL: https://issues.apache.org/jira/browse/HBASE-20783
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20783-HBASE-19064-addendum.patch, 
> HBASE-20783-HBASE-19064-v1.patch, HBASE-20783-HBASE-19064-v1.patch, 
> HBASE-20783-addendum.patch
>
>
> {code}
> 2018-06-25 16:25:04,261 ERROR [Thread-14] master.HMaster: Failed to become 
> active master
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.replication.SyncReplicationState.parseFrom(SyncReplicationState.java:72)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getSyncReplicationState(ZKReplicationPeerStorage.java:224)
> at 
> org.apache.hadoop.hbase.replication.ZKReplicationPeerStorage.getPeerSyncReplicationState(ZKReplicationPeerStorage.java:240)
> at 
> org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:479)
> at 
> org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:755)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:895)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2126)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:571)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524372#comment-16524372
 ] 

Sean Busbey commented on HBASE-6028:


possible places:

http://hbase.apache.org/book.html#ops.regionmgt.majorcompact

http://hbase.apache.org/book.html#compaction

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Sean Busbey (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524364#comment-16524364
 ] 

Sean Busbey commented on HBASE-6028:


please add docs.

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.





[jira] [Commented] (HBASE-20701) too much logging when balancer runs from BaseLoadBalancer

2018-06-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524357#comment-16524357
 ] 

Hudson commented on HBASE-20701:


Results for branch branch-1.3
[build #373 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/373/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/373//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/373//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/373//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> too much logging when balancer runs from BaseLoadBalancer
> -
>
> Key: HBASE-20701
> URL: https://issues.apache.org/jira/browse/HBASE-20701
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Monani Mihir
>Assignee: Monani Mihir
>Priority: Trivial
> Fix For: 1.5.0, 1.3.3, 1.4.6
>
> Attachments: HBASE-20701-branch-1.3.patch, 
> HBASE-20701-branch-1.3.patch, HBASE-20701-branch-1.3.patch, 
> HBASE-20701-branch-1.4.patch, HBASE-20701-branch-1.4.patch, 
> HBASE-20701.branch-1.001.patch
>
>
> When the balancer runs, it tries to find the least loaded server with better 
> locality for the current region. While doing so, it logs at debug level for 
> each of those regions. This produces too much debug-level logging; we should 
> move it to trace level.
> {code:java}
> int getLeastLoadedTopServerForRegion (int region, int currentServer) {
>   ...
>   if (leastLoadedServerIndex != -1) {
>     LOG.debug("Pick the least loaded server " + servers[leastLoadedServerIndex].getHostname()
>       + " with better locality for region " + regions[region]);
>   }
>   ...
> }
> {code}
> This was fixed in branch-2.0 as part of -HBASE-14614-  
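
The change proposed in the description can be sketched as follows. This is a minimal, self-contained illustration and not the actual HBase patch: it uses java.util.logging (where FINEST is the closest analogue of TRACE) rather than the logging facade HBase uses, and the class name LeastLoadedLogSketch, the messagesBuilt counter and the setLevel helper are hypothetical names introduced only for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: log the per-region message at trace level and guard it,
// so the message string is not even built unless trace logging is enabled.
final class LeastLoadedLogSketch {
  private static final Logger LOG = Logger.getLogger(LeastLoadedLogSketch.class.getName());

  // Counts how often the message string was actually constructed (for illustration only).
  static int messagesBuilt = 0;

  // Test/illustration helper: set the logger's level explicitly.
  static void setLevel(Level level) {
    LOG.setLevel(level);
  }

  static String describe(String hostname, String region) {
    messagesBuilt++;
    return "Pick the least loaded server " + hostname
        + " with better locality for region " + region;
  }

  static void logPick(String hostname, String region) {
    // FINEST stands in for TRACE here; the guard skips string concatenation
    // entirely when the level is disabled.
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest(describe(hostname, region));
    }
  }
}
```

With the logger at a debug-like level (FINE), logPick does no work at all; only when the level is lowered to FINEST is the message built and emitted.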





[jira] [Commented] (HBASE-20194) Basic Replication WebUI - Master

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524356#comment-16524356
 ] 

stack commented on HBASE-20194:
---

I tried to backport, but it depends on the new replication setup, so I passed 
on it. Pity.

> Basic Replication WebUI - Master
> 
>
> Key: HBASE-20194
> URL: https://issues.apache.org/jira/browse/HBASE-20194
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 3.0.0, 2.1.0
>
> Attachments: HBASE-20194.master.001.patch, 
> HBASE-20194.master.002.patch, HBASE-20194.master.003.patch, 
> HBASE-20194.master.004.patch, HBASE-20194.master.005.patch, 
> HBASE-20194.master.005.patch, HBASE-20194.master.006.patch
>
>
> subtask of HBASE-15809. Implementation of Replication WebUI on Master webpage.





[jira] [Updated] (HBASE-20194) Basic Replication WebUI - Master

2018-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20194:
--
Fix Version/s: (was: 2.0.2)

> Basic Replication WebUI - Master
> 
>
> Key: HBASE-20194
> URL: https://issues.apache.org/jira/browse/HBASE-20194
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 3.0.0, 2.1.0
>
> Attachments: HBASE-20194.master.001.patch, 
> HBASE-20194.master.002.patch, HBASE-20194.master.003.patch, 
> HBASE-20194.master.004.patch, HBASE-20194.master.005.patch, 
> HBASE-20194.master.005.patch, HBASE-20194.master.006.patch
>
>
> subtask of HBASE-15809. Implementation of Replication WebUI on Master webpage.





[jira] [Comment Edited] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524355#comment-16524355
 ] 

stack edited comment on HBASE-20795 at 6/27/18 12:00 AM:
-

Pushed to branch-2.0+ after fixing checkstyle. Thanks [~an...@apache.org]


was (Author: stack):
Pushed to branch-2.0+

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id in the comparison if the 
> ignoreSequenceid parameter is set to true, but BBKVComparator.compare, which 
> is used internally for cells of type ByteBufferKeyValue, doesn't consider this.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}
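
The behavior the description asks for can be illustrated with a small stand-alone sketch. It is not the HBASE-20795 patch and does not use HBase types: SketchCell and SketchCellComparator are hypothetical stand-ins, and the cell key is reduced to a single row string. The point it shows is that the key comparison itself must never look at the sequence id, so that the flag alone decides whether the sequence id takes part.

```java
// Hypothetical stand-in for a cell: a key (reduced to the row) plus a sequence id.
final class SketchCell {
  final String row;
  final long sequenceId;

  SketchCell(String row, long sequenceId) {
    this.row = row;
    this.sequenceId = sequenceId;
  }
}

// Hypothetical comparator showing the intended contract of the ignoreSequenceId flag.
final class SketchCellComparator {
  static int compare(SketchCell a, SketchCell b, boolean ignoreSequenceId) {
    // Key-only comparison: this step must not consult the sequence id.
    int diff = a.row.compareTo(b.row);
    if (diff != 0 || ignoreSequenceId) {
      return diff;
    }
    // Operands are swapped so that the later (larger) sequence id sorts first.
    return Long.compare(b.sequenceId, a.sequenceId);
  }
}
```

Two cells with the same key and different sequence ids compare as equal when the flag is true, and the one with the larger sequence id sorts first when the flag is false.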





[jira] [Updated] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20795:
--
   Resolution: Fixed
Fix Version/s: 2.0.2
   2.1.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id in the comparison if the 
> ignoreSequenceid parameter is set to true, but BBKVComparator.compare, which 
> is used internally for cells of type ByteBufferKeyValue, doesn't consider this.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}





[jira] [Updated] (HBASE-20194) Basic Replication WebUI - Master

2018-06-26 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20194:
--
Fix Version/s: (was: 2.2.0)
   2.0.2

> Basic Replication WebUI - Master
> 
>
> Key: HBASE-20194
> URL: https://issues.apache.org/jira/browse/HBASE-20194
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication, Usability
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20194.master.001.patch, 
> HBASE-20194.master.002.patch, HBASE-20194.master.003.patch, 
> HBASE-20194.master.004.patch, HBASE-20194.master.005.patch, 
> HBASE-20194.master.005.patch, HBASE-20194.master.006.patch
>
>
> subtask of HBASE-15809. Implementation of Replication WebUI on Master webpage.





[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524334#comment-16524334
 ] 

stack commented on HBASE-20795:
---

bq. and for that we Delegate compare to CellComparatorImpl#compare(final 
Cell a, final Cell b, boolean ignoreSequenceid). 

We should figure this out. CCI is @InterfaceAudience.Private for hbase internal 
use only. We should work on this.

bq. I observed a regression in our tests after HBASE-20564.

Good one. Thanks.

Let me push.

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id in the comparison if the 
> ignoreSequenceid parameter is set to true, but BBKVComparator.compare, which 
> is used internally for cells of type ByteBufferKeyValue, doesn't consider this.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}





[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Mohit Goel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524332#comment-16524332
 ] 

Mohit Goel commented on HBASE-6028:
---

[~stack] Added release notes. Please review

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.





[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Mohit Goel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Goel updated HBASE-6028:
--
Release Note: 
Added a new feature to switch compactions on and off dynamically at the region 
server. Disabling compactions also interrupts any compactions currently in 
progress. This setting is lost when the server restarts. 
Added the configuration hbase.regionserver.compaction.enabled, which lets users 
enable or disable compactions statically.
Added both synchronous and asynchronous Admin client APIs for the switch. 
Compactions can also be turned on or off with the "compaction_switch" command 
in the hbase shell.
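
The switch semantics described in the release note (a runtime flag that, once turned off, also stops work already in progress) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patch: CompactionSwitchSketch and its methods are hypothetical, and the "compaction" is a simple loop over block indices that checks the flag before each block.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-memory model of a compaction on/off switch.
final class CompactionSwitchSketch {
  // Starts enabled; like the dynamic switch in the release note, this state
  // lives only in memory and would be lost on restart.
  private final AtomicBoolean enabled = new AtomicBoolean(true);

  /** Flips the switch and returns the previous state. */
  boolean setEnabled(boolean on) {
    return enabled.getAndSet(on);
  }

  boolean isEnabled() {
    return enabled.get();
  }

  /**
   * Simulates a compaction over totalBlocks blocks. The switch is checked before
   * every block, so turning it off interrupts a compaction that is already running.
   * disableAtBlock simulates an operator turning the switch off just before that
   * block index is processed (use -1 for "never").
   * Returns the number of blocks actually processed.
   */
  int compact(int totalBlocks, int disableAtBlock) {
    int done = 0;
    for (int i = 0; i < totalBlocks; i++) {
      if (i == disableAtBlock) {
        setEnabled(false);
      }
      if (!enabled.get()) {
        break; // interrupted: switch was turned off
      }
      done++;
    }
    return done;
  }
}
```

Returning the previous state from setEnabled is a design choice made for this sketch: it lets a caller tell whether the call actually changed anything.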

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.





[jira] [Created] (HBASE-20796) STUCK RIT though region successfully assigned

2018-06-26 Thread stack (JIRA)
stack created HBASE-20796:
-

 Summary: STUCK RIT though region successfully assigned
 Key: HBASE-20796
 URL: https://issues.apache.org/jira/browse/HBASE-20796
 Project: HBase
  Issue Type: Bug
  Components: amv2
Reporter: stack
 Fix For: 2.0.0


This is a good one. We keep logging messages like this:

{code}
2018-06-26 12:32:24,859 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK 
Region-In-Transition rit=OPENING, location=vd0410.X.Y.com,22101,1529611445046, 
table=IntegrationTestBigLinkedList_20180525080406, 
region=e10b35d49528e2453a04c7038e3393d7
{code}

...though the region is successfully assigned.

Story:

 * Dispatch an assign 2018-06-26 12:31:27,390 INFO 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch 
pid=370829, ppid=370391, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
AssignProcedure table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2; rit=OPENING, 
location=vd0410.X.Y.Z,22101,1529611445046
 * It gets stuck 2018-06-26 12:32:29,860 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK 
Region-In-Transition rit=OPENING, location=vd0410.X.Y.Z,22101,1529611445046, 
table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2 (Because the server was killed)
 * We stay STUCK for a while.
 * The Master notices the server as crashed and starts a SCP.
 * SCP kills ongoing assign: 2018-06-26 12:32:54,809 INFO 
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=371105 found 
RIT pid=370829, ppid=370391, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
AssignProcedure table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2; rit=OPENING, 
location=vd0410.X.Y.Z,22101,1529611445046
 * The kill brings on a retry ... 2018-06-26 12:32:54,810 WARN 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
call failed pid=370829, ppid=370391, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
AssignProcedure table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2; rit=OPENING, 
location=vd0410.X.Y.Z,22101,1529611445046; exception=ServerCrashProcedure 
pid=371105, server=vd0410.X.Y.Z,22101,1529611445046
 * Which eventually succeeds. Successfully deployed to new server 
2018-06-26 12:32:55,429 INFO 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=370829, 
ppid=370391, state=SUCCESS; AssignProcedure 
table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2 in 1mins, 35.379sec
 * But then, it looks like the RPC was ongoing and it broke in following way 
2018-06-26 12:33:06,378 WARN 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote 
call failed pid=370829, ppid=370391, state=SUCCESS; AssignProcedure 
table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2; rit=OPEN, 
location=vc0614.halxg.cloudera.com,22101,1529611443424; exception=Call to 
vd0410.X.Y.Z/10.10.10.10:22101 failed on local exception: 
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: 
syscall:read(..) failed: Connection reset by peer (Notice how state for region 
is OPEN and 'SUCCESS').
 * Then says 2018-06-26 12:33:06,380 INFO 
org.apache.hadoop.hbase.master.assignment.AssignProcedure: Retry=1 of max=10; 
pid=370829, ppid=370391, state=SUCCESS; AssignProcedure 
table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2; rit=OPEN, 
location=vc0614.X.Y.Z,22101,1529611443424
 * And finally...  2018-06-26 12:34:10,727 WARN 
org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK 
Region-In-Transition rit=OFFLINE, location=null, 
table=IntegrationTestBigLinkedList_20180612114844, 
region=f69ccf7d9178ce166b515e0e2ef019d2

Restart of Master got rid of the STUCK complaints.

This is interesting because the stuck rpc and the successful reassign are all 
riding on the same pid.








[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524329#comment-16524329
 ] 

Ankit Singhal commented on HBASE-20795:
---

bq. How'd you arrive at the need for this fix?
[~stack], Actually Phoenix has its own internal memstore for concurrent index 
writes where we need to compare cells without sequence id and for that we 
Delegate compare to CellComparatorImpl#compare(final Cell a, final Cell b, 
boolean ignoreSequenceid). 
I observed a regression in our tests after HBASE-20564.

bq. "This is a tricked-out Comparator at heart of hbase read and write. It is 
in the HOT path so we try all sorts of ugly stuff so we can go faster. See 
below in this javadoc comment for the list."
bq. I did a bunch of work trying to minimize the bytecode these methods 
generate as they are the hottest code paths in our code base – small 
differences here show at the macro scale. This fix adds code. At one stage I'd 
duplicated the code here into CellComparatorImpl trying to make it so we 
minimize the Cell types that traverse this code path. Perhaps we should do that 
again? I'm still profiling. Let me add this in for now and if it causes 
slowdown, I'll dupe the method.
Thanks [~stack] for checking. 





> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id in the comparison if the 
> ignoreSequenceid parameter is set to true, but BBKVComparator.compare, which 
> is used internally for cells of type ByteBufferKeyValue, doesn't consider this.
> {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid ? diff : Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}





[jira] [Updated] (HBASE-20770) WAL cleaner logs way too much; gets clogged when lots of work to do

2018-06-26 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20770:
---
Fix Version/s: 1.4.6
   1.5.0

> WAL cleaner logs way too much; gets clogged when lots of work to do
> ---
>
> Key: HBASE-20770
> URL: https://issues.apache.org/jira/browse/HBASE-20770
> Project: HBase
>  Issue Type: Bug
>  Components: logging
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 1.5.0, 1.4.6, 2.0.2
>
> Attachments: HBASE-20770.branch-2.0.001.patch
>
>
> Been here before (HBASE-7214 and HBASE-19652). Testing on a large cluster, 
> the Master log is a continuous spew of logging output, filling disks. It is 
> stuck making no progress, but that is hard to tell because it is logging 
> minutiae rather than a call on what's actually wrong.
> Log is full of this:
> {code}
> 2018-06-21 01:19:12,761 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/bd49572de3914b66985fff5ea2ca7403
> 2018-06-21 01:19:12,761 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/meta/fad01294c6ca421db209e89b5b97d364
> 2018-06-21 01:19:12,823 WARN 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 6 ms 
> for deleting 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/d3f759d0495257fc1d33ae780b634455/tiny/b72bac4036444dcf9265c7b5664fd403,
>  exit...
> 2018-06-21 01:19:12,823 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9
> 2018-06-21 01:19:12,824 WARN 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Wait more than 6 ms 
> for deleting 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/2425053ad86823081b368e00bc471e56/tiny/6ea3cb1174434aecbc448e322e2a062c,
>  exit...
> 2018-06-21 01:19:12,824 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/big
> 2018-06-21 01:19:12,824 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/e98cdb817bb3af5fa26e2b885a0b2ec6/tiny
> 2018-06-21 01:19:12,827 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta
> 2018-06-21 01:19:12,844 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0
> 2018-06-21 01:19:12,844 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c
> 2018-06-21 01:19:12,844 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/90f21dec28d140cda48d37eeb44d37e8
> 2018-06-21 01:19:12,844 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/665bfa38c86a28d641ce08f8fea0a7f9/meta/8a4cf6410d5a4201963bc1415945f877
> 2018-06-21 01:19:12,848 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta
> 2018-06-21 01:19:12,849 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore: Cleaning under 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta
> 2018-06-21 01:19:12,927 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/6043fce5761e4479819b15405183f193
> 2018-06-21 01:19:12,927 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/c98e276423813aaa74d848983c47d93c/meta/69e6bf4650124859b2bc7ddf134be642
> 2018-06-21 01:19:13,011 DEBUG 
> org.apache.hadoop.hbase.master.cleaner.HFileCleaner: Removing 
> hdfs://ns1/hbase/archive/data/default/IntegrationTestBigLinkedList/17f85c98389104b19358f6751da577d0/meta/1a46700fbc434574a005c0b55879d5ed
> 

[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524289#comment-16524289
 ] 

stack commented on HBASE-6028:
--

Looks good [~mogoel] Want to fill out the release note with summary of what the 
patch adds? Thanks.

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.





[jira] [Commented] (HBASE-20716) Unsafe access cleanup

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524287#comment-16524287
 ] 

stack commented on HBASE-20716:
---

bq. Is polymorphic dispatch really faster than a final boolean conditional?

Not sure how the question relates.

We currently have a boolean check and then a static dispatch ("invokestatic") 
to do unsafe all over our code. The idea is to see if we can do away with 
checking the boolean everywhere and instead do it once in a static block on 
startup and based off the findings, insert into a static the class to use doing 
unsafe going forward; thereafter we could just static dispatch w/o boolean 
check. We actually do this already in one of our paths to unsafe (We have more 
than one route to unsafe. I'd think that there should be one only -- where this 
issue began).

I think we may see savings here on hot paths; in size and so perf. Would be 
sweet if [~awked06]'s jmh'ing bore out the supposition.
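
The "check once at startup, then dispatch without a boolean" idea can be sketched as below. This is an assumption-laden illustration, not HBase code: UnsafeDispatchSketch, IntReader, SafeReader and FastReader are hypothetical names, the "fast" path uses java.nio.ByteBuffer as a stand-in for real Unsafe access, and the system property sketch.fast is invented purely to drive the one-time choice.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: pick the implementation once in a static initializer,
// so hot-path callers never re-check a capability flag.
final class UnsafeDispatchSketch {
  interface IntReader {
    int readInt(byte[] buf, int offset);
  }

  // Portable fallback: assembles a big-endian int byte by byte.
  static final class SafeReader implements IntReader {
    @Override
    public int readInt(byte[] buf, int offset) {
      return ((buf[offset] & 0xff) << 24)
          | ((buf[offset + 1] & 0xff) << 16)
          | ((buf[offset + 2] & 0xff) << 8)
          | (buf[offset + 3] & 0xff);
    }
  }

  // Stand-in for the "unsafe" path; ByteBuffer reads big-endian by default.
  static final class FastReader implements IntReader {
    @Override
    public int readInt(byte[] buf, int offset) {
      return ByteBuffer.wrap(buf).getInt(offset);
    }
  }

  // Chosen exactly once, at class initialization.
  static final IntReader READER;

  static {
    // Hypothetical capability probe; real code would test platform support instead.
    boolean fastAvailable = Boolean.parseBoolean(System.getProperty("sketch.fast", "true"));
    READER = fastAvailable ? new FastReader() : new SafeReader();
  }

  // Hot-path entry point: no availability check here, just a call through READER.
  static int readInt(byte[] buf, int offset) {
    return READER.readInt(buf, offset);
  }
}
```

Whether a call through a static final field of interface type is really cheaper than a static call behind a final boolean is exactly the open question in this thread, so the sketch shows the shape of the approach and makes no performance claim.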



> Unsafe access cleanup
> -
>
> Key: HBASE-20716
> URL: https://issues.apache.org/jira/browse/HBASE-20716
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: stack
>Priority: Critical
>  Labels: beginner
> Attachments: Screen Shot 2018-06-26 at 11.37.49 AM.png
>
>
> We have two means of getting at unsafe; UnsafeAccess and then internal to the 
> Bytes class. They are effectively doing the same thing. We should have one 
> avenue to Unsafe only.
> Many of our paths to Unsafe via UnsafeAccess traverse flags to check if 
> access is available, if it is aligned and the order in which words are 
> written on the machine. Each check costs -- especially if done millions of 
> times a second -- and on occasion adds bloat in hot code paths. The unsafe 
> access inside Bytes checks on startup what the machine is capable of and 
> then does a static assign of the appropriate class-to-use from there on out. 
> UnsafeAccess does not do this; it runs the checks every time. It would be 
> good to have the Bytes behavior pervasive.
> The benefit of one access to Unsafe only is plain. The benefits we gain 
> removing checks will be harder to measure though should be plain when you 
> disassemble a hot-path; in a (very) rare case, the saved byte codes could be 
> the difference between inlining or not.





[jira] [Updated] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20789:
---
Attachment: bucket-33718.out

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: bucket-33718.out
>
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.





[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524277#comment-16524277
 ] 

Ted Yu commented on HBASE-20789:


https://builds.apache.org/job/HBASE-Flaky-Tests/33718/testReport/junit/org.apache.hadoop.hbase.io.hfile.bucket/TestBucketCache/testCacheBlockNextBlockMetadataMissing_0__blockSize_8_192__bucketSizes_null_/

I will attach the above test output.

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524272#comment-16524272
 ] 

Josh Elser commented on HBASE-20792:


Best as I can see, when the Master is re-reading the last master's pv2 WAL, it 
does some kind of replay on the MRP that is executed in the step above. This 
modifies the state in meta from the correct state to the state where 
{{info:sn}} is set to the old RS. This same replay isn't done for SCP (or its 
child AssignProcess) which gives us this invalid state in meta which later 
screws up the AM.

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.
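
The inconsistency in the meta dump above can be checked mechanically. The following is a minimal sketch, assuming the formats shown in the dump (info:server as host:port, info:sn as host,port,startcode); the class and method names are hypothetical and it works on plain strings rather than the HBase client API.

```java
/** Hypothetical sketch: compare info:server + info:serverstartcode against info:sn. */
final class MetaServerConsistency {
  /**
   * @param server value of info:server, formatted host:port
   * @param startCode value of info:serverstartcode
   * @param sn value of info:sn, formatted host,port,startcode
   * @return true when both columns name the same server instance
   */
  static boolean isConsistent(String server, long startCode, String sn) {
    int colon = server.lastIndexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("expected host:port but got " + server);
    }
    // Rebuild the host,port,startcode form and compare it with info:sn.
    String expected =
        server.substring(0, colon) + "," + server.substring(colon + 1) + "," + startCode;
    return expected.equals(sn);
  }
}
```

Fed the values from the dump, the check fails because info:server points at the -12 host while info:sn still names the -06 host with an older start code.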



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Mohit Goel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Goel updated HBASE-6028:
--
Status: Patch Available  (was: In Progress)

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Mohit Goel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Goel updated HBASE-6028:
--
Attachment: HBASE-6028.master.006.patch

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Mohit Goel
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-6028.master.006.patch
>
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel 
> in-progress compactions.  This would allow a system to recover when too many 
> regions become eligible for compaction at once.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2018-06-26 Thread Mohit Goel (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524252#comment-16524252
 ] 

Mohit Goel commented on HBASE-6028:
---

Proposed command-line usage:
{code}
hbase(main):023:0> help "compaction_switch"
Turn the compaction on or off on regionservers. Disabling compactions will also interrupt any
currently ongoing compactions. It is ephemeral. This setting will be lost on restart of the
server. Compaction can also be enabled/disabled by modifying configuration
hbase.regionserver.compaction.enabled in hbase-site.xml.
Examples:
  To enable compactions on all region servers
  hbase> compaction_switch true
  To disable compactions on all region servers
  hbase> compaction_switch false
  To enable compactions on specific region servers
  hbase> compaction_switch true 'server2','server1'
  To disable compactions on specific region servers
  hbase> compaction_switch false 'server2','server1'
NOTE: A server name is its host, port plus startcode. For example:
host187.example.com,60020,1289493121758

hbase(main):020:0> compaction_switch "false"
SERVER                                                PREV_STATE
 dhcp-10-16-2-111.pa.cloudera.com,16022,1530042902092 true
 dhcp-10-16-2-111.pa.cloudera.com,16023,1530042903203 true
2 row(s)
Took 0.0214 seconds
hbase(main):021:0> compaction_switch "true"
SERVER                                                PREV_STATE
 dhcp-10-16-2-111.pa.cloudera.com,16022,1530042902092 false
 dhcp-10-16-2-111.pa.cloudera.com,16023,1530042903203 false
2 row(s)
Took 0.0217 seconds 
{code}
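
For readers skimming the thread, here is a rough sketch of the behavior the help text describes: an in-memory switch that returns the previous state and causes a running compaction loop to abort. It is only an illustration under assumed names (CompactionSwitchSketch, switchCompaction, compact), not the code in the attached patch.

```java
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of an ephemeral compaction switch; not the HBase implementation. */
final class CompactionSwitchSketch {
  // Held in memory only, so the setting is lost on restart.
  private final AtomicBoolean enabled = new AtomicBoolean(true);

  /** Flips the switch and returns the previous state, like the PREV_STATE column. */
  boolean switchCompaction(boolean on) {
    return enabled.getAndSet(on);
  }

  /**
   * Stand-in for a compaction loop: checks the switch before each cell and aborts when it is off.
   *
   * @return the number of cells processed
   */
  long compact(long totalCells) {
    long done = 0;
    while (done < totalCells) {
      if (!enabled.get()) {
        throw new UncheckedIOException(
            new InterruptedIOException("compactions switched off after " + done + " cells"));
      }
      done++;
    }
    return done;
  }
}
```

The unchecked wrapper is a convenience of this sketch; the RS logs below show the real code surfacing an IOException from the scanner instead.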

Logs from the RS when compaction is switched off:
{code}
2018-06-26 13:37:20,000 WARN  [regionserver/dhcp-10-16-2-111:16022] 
compactions.CompactionProgress: totalCompactingKVs=2277787 less than 
currentCompactedKVs=3146342
2018-06-26 13:37:22,003 INFO  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16022] 
regionserver.CompactSplit: Interrupting running compactions because user 
switched off compactions
2018-06-26 13:37:22,005 INFO  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16022-shortCompactions-1530045419380]
 throttle.PressureAwareThroughputController: 
10fffe4a4c251917d4422e917a3f114d#info0#compaction#26 average throughput is 
175.87 MB/second, slept 0 time(s) and total slept time is 0 ms. 0 active 
operations remaining, total limit is unlimited
2018-06-26 13:37:22,005 INFO  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16022-longCompactions-1530045363453]
 throttle.PressureAwareThroughputController: 
10fffe4a4c251917d4422e917a3f114d#info0#compaction#24 average throughput is 
176.70 MB/second, slept 0 time(s) and total slept time is 0 ms. 0 active 
operations remaining, total limit is unlimited
2018-06-26 13:37:22,042 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16022-shortCompactions-1530045419380]
 regionserver.CompactSplit: Compaction failed 
region=TestTable,299441,1529703743138.10fffe4a4c251917d4422e917a3f114d.,
 storeName=info0, priority=0, startTime=1530045430730
java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader 
reader=file:/Users/mohit.goel/srcdir/distributed/data/data/default/TestTable/10fffe4a4c251917d4422e917a3f114d/info0/1eef927ebb0d47d59196229e3ce566bd,
 compression=none, cacheConf=blockCache=LruBlockCache{blockCount=7, 
currentSize=1.21 MB, freeSize=1.57 GB, maxSize=1.57 GB, heapSize=1.21 MB, 
minSize=1.50 GB, minFactor=0.95, multiSize=765.61 MB, multiFactor=0.5, 
singleSize=382.80 MB, singleFactor=0.25}, cacheDataOnRead=true, 
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
firstKey=Optional[0001048576/info0:0/1530045423841/Put/seqid=0],
 
lastKey=Optional[0003145727/info0:0/1530045423622/Put/seqid=0], 
avgKeyLen=44, avgValueLen=1000, entries=477045, length=504687508, 
cur=0002937125/info0:0/1530045421477/Put/vlen=1000/seqid=12377]
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:216)
at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:639)
at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:385)
at 
org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:325)
at 
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
at 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1403)
at 
org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2111)
at 

[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524241#comment-16524241
 ] 

stack commented on HBASE-20795:
---

[~an...@apache.org] Thanks.

How'd you arrive at the need for this fix?

Was it code reading?

Or did you see a perf difference?

You saw the comment on the BBKVC class which says: "This is a tricked-out 
Comparator at heart of hbase read and write. It is in the HOT path so we try 
all sorts of ugly stuff so we can go faster. See below in this javadoc comment 
for the list."

I did a bunch of work trying to minimize the bytecode these methods generate as 
they are the hottest code paths in our code base -- small differences here show 
at the macro scale. This fix adds code. At one stage I'd duplicated the code 
here into CellComparatorImpl trying to make it so we minimize the Cell types 
that traverse this code path. Perhaps we should do that again? I'm still 
profiling. Let me add this in for now and if it causes slowdown, I'll dupe the 
method. Thanks.



> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) needs to ignore the sequence id in the comparison when the 
> ignoreSequenceid parameter is set to true, but BBKVComparator.compare, which is 
> used internally for cells of type ByteBufferKeyValue, doesn't take this into account.
>  {code}
> @Override
> public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>   int diff = 0;
>   // "Peel off" the most common path.
>   if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>     diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>     if (diff != 0) {
>       return diff;
>     }
>   } else {
>     diff = compareRows(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>     diff = compareWithoutRow(a, b);
>     if (diff != 0) {
>       return diff;
>     }
>   }
>   // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>   return ignoreSequenceid? diff: Long.compare(b.getSequenceId(), a.getSequenceId());
> }
> {code}
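
To make the reported behavior concrete, here is a minimal sketch using simplified stand-in types (SimpleCell is hypothetical, not Cell or ByteBufferKeyValue). It assumes, as the description implies, that the fast-path comparator folds the sequence id into its result, so a non-zero diff is returned before the ignoreSequenceid flag is ever consulted. The three-argument fastCompare shows one possible shape of a fix and is not taken from the attached patch.

```java
/** Simplified stand-ins for the comparator code paths; not the HBase classes. */
final class SeqIdCompareSketch {
  static final class SimpleCell {
    final String key;       // stands in for row/family/qualifier/timestamp/type
    final long sequenceId;

    SimpleCell(String key, long sequenceId) {
      this.key = key;
      this.sequenceId = sequenceId;
    }
  }

  /** Stand-in for the fast path: always includes the sequence id (later sorts first). */
  static int fastCompare(SimpleCell a, SimpleCell b) {
    int diff = a.key.compareTo(b.key);
    if (diff != 0) {
      return diff;
    }
    return Long.compare(b.sequenceId, a.sequenceId);
  }

  /** Mirrors the reported behavior: the flag is not honored on the fast path. */
  static int compareAsReported(SimpleCell a, SimpleCell b, boolean ignoreSequenceId) {
    int diff = fastCompare(a, b);
    if (diff != 0) {
      return diff;
    }
    return ignoreSequenceId ? diff : Long.compare(b.sequenceId, a.sequenceId);
  }

  /** One possible fix: pass the flag down so the fast path can skip the sequence id. */
  static int fastCompare(SimpleCell a, SimpleCell b, boolean ignoreSequenceId) {
    int diff = a.key.compareTo(b.key);
    if (diff != 0 || ignoreSequenceId) {
      return diff;
    }
    return Long.compare(b.sequenceId, a.sequenceId);
  }
}
```

With two cells that share a key but differ in sequence id, compareAsReported returns a non-zero value even when asked to ignore the sequence id, while the flag-aware variant returns 0.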



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524238#comment-16524238
 ] 

Josh Elser commented on HBASE-20792:


Oh, one more important detail it seems:
 * Put hbase:namespace region on the last regionserver (lexicographically 
sorted)
 * Move the region to the second to last RS (not sure if the RS itself is 
important, or just the MRP)
 * Execute the restart steps as above

That seems to be the secret sauce that causes this. Not sure why – running with 
more debugging now.

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> Next problem we've run into after HBASE-20752 and HBASE-20708
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart patterns our tests do internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test
> column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 
> 0297f680df6dc0166a44f9536346268e, NAME => 
> 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY
>  => '', ENDKEY => 
> ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:seqnumDuringOpen, timestamp=1529967103390, 
> value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:server, timestamp=1529967103390, 
> value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   column=info:sn, 
> timestamp=1529967096482, 
> value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e.   
> column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc should not be possible for an 
> {{OPEN}} region).
> This doesn't happen every time, but I caught it yesterday on the 2nd or 3rd 
> attempt, so I'm hopeful it's not a bear to repro.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20705) Having RPC Quota on a table prevents Space quota to be recreated/removed

2018-06-26 Thread Sai Nukavarapu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523791#comment-16523791
 ] 

Sai Nukavarapu edited comment on HBASE-20705 at 6/26/18 9:43 PM:
-

 

I am able to reproduce this issue in HDP-2.6.2.0/hbase 1.1.2. If you remove 
both SPACE and THROTTLE for the table then the exception goes away. 

 
{noformat}
hbase(main):015:0> list_quotas 
OWNER QUOTAS 
0 row(s) in 0.1520 seconds 

hbase(main):016:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
'1G', POLICY => NO_WRITES 

hbase(main):017:0> set_quota TYPE => THROTTLE, TABLE => 'default:t1', LIMIT => 
'10M/sec' 

hbase(main):018:0> list_quotas 
OWNER QUOTAS 
TABLE => t1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, 
SCOPE => MACHINE 
TABLE => t1 TYPE => SPACE, TABLE => t1, LIMIT => 1073741824, VIOLATION_POLICY 
=> NO_WRITES 
2 row(s) in 0.1600 seconds 

hbase(main):019:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
NONE 

hbase(main):020:0> list_quotas 
OWNER QUOTAS 

ERROR: Cannot handle SpaceQuota without a soft limit

 

hbase(main):027:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
NONE 
hbase(main):029:0> set_quota TYPE => THROTTLE, TABLE => 'default:t1', LIMIT => 
NONE 

hbase(main):030:0> list_quotas 
OWNER QUOTAS 
0 row(s) in 0.1400 seconds 

 

{noformat}


was (Author: srikanth1184):
 

I am able to reproduce this issue. If you remove both SPACE and THROTTLE for 
the table then the exception goes away. 

 

{noformat}
hbase(main):015:0> list_quotas 
OWNER QUOTAS 
0 row(s) in 0.1520 seconds 

hbase(main):016:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
'1G', POLICY => NO_WRITES 

hbase(main):017:0> set_quota TYPE => THROTTLE, TABLE => 'default:t1', LIMIT => 
'10M/sec' 

hbase(main):018:0> list_quotas 
OWNER QUOTAS 
TABLE => t1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, 
SCOPE => MACHINE 
TABLE => t1 TYPE => SPACE, TABLE => t1, LIMIT => 1073741824, VIOLATION_POLICY 
=> NO_WRITES 
2 row(s) in 0.1600 seconds 

hbase(main):019:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
NONE 

hbase(main):020:0> list_quotas 
OWNER QUOTAS 

ERROR: Cannot handle SpaceQuota without a soft limit

 

hbase(main):027:0> set_quota TYPE => SPACE, TABLE => 'default:t1', LIMIT => 
NONE 
hbase(main):029:0> set_quota TYPE => THROTTLE, TABLE => 'default:t1', LIMIT => 
NONE 

hbase(main):030:0> list_quotas 
OWNER QUOTAS 
0 row(s) in 0.1400 seconds 

 

{noformat}

> Having RPC Quota on a table prevents Space quota to be recreated/removed
> 
>
> Key: HBASE-20705
> URL: https://issues.apache.org/jira/browse/HBASE-20705
> Project: HBase
>  Issue Type: Bug
>Reporter: Biju Nair
>Assignee: Sakthi
>Priority: Major
>
> * Property {{hbase.quota.remove.on.table.delete}} is set to {{true}} by 
> default
>  * Create a table and set RPC and Space quota
> {noformat}
> hbase(main):022:0> create 't2','cf1'
> Created table t2
> Took 0.7420 seconds
> => Hbase::Table - t2
> hbase(main):023:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
> POLICY => NO_WRITES
> Took 0.0105 seconds
> hbase(main):024:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => 
> '10M/sec'
> Took 0.0186 seconds
> hbase(main):025:0> list_quotas
> TABLE => t2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 
> 10M/sec, SCOPE => MACHINE
> TABLE => t2 TYPE => SPACE, TABLE => t2, LIMIT => 1073741824, VIOLATION_POLICY 
> => NO_WRITES{noformat}
>  * Drop the table and the Space quota is set to {{REMOVE => true}}
> {noformat}
> hbase(main):026:0> disable 't2'
> Took 0.4363 seconds
> hbase(main):027:0> drop 't2'
> Took 0.2344 seconds
> hbase(main):028:0> list_quotas
> TABLE => t2 TYPE => SPACE, TABLE => t2, REMOVE => true
> USER => u1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10M/sec, 
> SCOPE => MACHINE{noformat}
>  * Recreate the table and set Space quota back. The Space quota on the table 
> is still set to {{REMOVE => true}}
> {noformat}
> hbase(main):029:0> create 't2','cf1'
> Created table t2
> Took 0.7348 seconds
> => Hbase::Table - t2
> hbase(main):031:0> set_quota TYPE => SPACE, TABLE => 't2', LIMIT => '1G', 
> POLICY => NO_WRITES
> Took 0.0088 seconds
> hbase(main):032:0> list_quotas
> OWNER QUOTAS
> TABLE => t2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 
> 10M/sec, SCOPE => MACHINE
> TABLE => t2 TYPE => SPACE, TABLE => t2, REMOVE => true{noformat}
>  * Remove RPC quota and drop the table, the Space Quota is not removed
> {noformat}
> hbase(main):033:0> set_quota TYPE => THROTTLE, TABLE => 't2', LIMIT => NONE
> Took 0.0193 seconds
> hbase(main):036:0> disable 't2'
> Took 0.4305 seconds
> hbase(main):037:0> drop 't2'
> Took 0.2353 seconds
> hbase(main):038:0> 

[jira] [Commented] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524231#comment-16524231
 ] 

Hadoop QA commented on HBASE-20795:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
1s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
37s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
23s{color} | {color:red} hbase-common: The patch generated 1 new + 2 unchanged 
- 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
28s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
31s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929259/HBASE-20795.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 8eb1fe202b9e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 6a0c67344a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13402/artifact/patchprocess/diff-checkstyle-hbase-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13402/testReport/ |
| Max. process+thread count | 

[jira] [Commented] (HBASE-18840) Add functionality to refresh meta table at master startup

2018-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524207#comment-16524207
 ] 

Hadoop QA commented on HBASE-18840:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HBASE-18840 does not apply to HBASE-18477. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-18840 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929264/HBASE-18840.HBASE-18477.004.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/13403/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Add functionality to refresh meta table at master startup
> -
>
> Key: HBASE-18840
> URL: https://issues.apache.org/jira/browse/HBASE-18840
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: HBASE-18477
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBASE-18840.HBASE-18477.001.patch, 
> HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch, 
> HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch, 
> HBASE-18840.HBASE-18477.004.patch
>
>
> If an HBase cluster’s hbase:meta table is deleted or a cluster is started with 
> a new meta table, HBase needs the functionality to synchronize its metadata 
> from Storage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20701) too much logging when balancer runs from BaseLoadBalancer

2018-06-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524204#comment-16524204
 ] 

Hudson commented on HBASE-20701:


Results for branch branch-1.4
[build #366 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/366/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/366//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/366//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/366//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> too much logging when balancer runs from BaseLoadBalancer
> -
>
> Key: HBASE-20701
> URL: https://issues.apache.org/jira/browse/HBASE-20701
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Monani Mihir
>Assignee: Monani Mihir
>Priority: Trivial
> Fix For: 1.5.0, 1.3.3, 1.4.6
>
> Attachments: HBASE-20701-branch-1.3.patch, 
> HBASE-20701-branch-1.3.patch, HBASE-20701-branch-1.3.patch, 
> HBASE-20701-branch-1.4.patch, HBASE-20701-branch-1.4.patch, 
> HBASE-20701.branch-1.001.patch
>
>
> When the balancer runs, it tries to find the least loaded server with better 
> locality for the current region. While doing so, we log at debug level for each 
> of those regions. This produces too much logging at debug level; we 
> should move this to trace level logging.
> {code:java}
> int getLeastLoadedTopServerForRegion(int region, int currentServer) {
>   ...
>   if (leastLoadedServerIndex != -1) {
>     LOG.debug("Pick the least loaded server " + servers[leastLoadedServerIndex].getHostname()
>       + " with better locality for region " + regions[region]);
>   }
>   ...
> }{code}
> This was fixed in branch-2.0 as part of -HBASE-14614-  
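
The cost being discussed is both the log volume and the string concatenation done per region. A minimal sketch of the idea, assuming java.util.logging as a stand-in (Level.FINEST plays the role of trace here; HBase itself uses a different logging facade) and hypothetical names (TraceLoggingSketch, logPick, messagesBuilt):

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: log per-region picks at the finest level and build the message lazily. */
final class TraceLoggingSketch {
  static final Logger LOG = Logger.getLogger("TraceLoggingSketch");

  /** Counts how many times the message string was actually built (for demonstration only). */
  static int messagesBuilt = 0;

  static void logPick(String hostname, String region) {
    Supplier<String> message = () -> {
      messagesBuilt++;
      return "Pick the least loaded server " + hostname
          + " with better locality for region " + region;
    };
    // The supplier is only invoked when FINEST is enabled for this logger.
    LOG.log(Level.FINEST, message);
  }
}
```

At the usual INFO or DEBUG-equivalent levels the message is never constructed, so the balancer loop pays only for the level check.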



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18840) Add functionality to refresh meta table at master startup

2018-06-26 Thread Zach York (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-18840:
--
Attachment: HBASE-18840.HBASE-18477.004.patch

> Add functionality to refresh meta table at master startup
> -
>
> Key: HBASE-18840
> URL: https://issues.apache.org/jira/browse/HBASE-18840
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: HBASE-18477
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBASE-18840.HBASE-18477.001.patch, 
> HBASE-18840.HBASE-18477.002.patch, HBASE-18840.HBASE-18477.003 (2) (1).patch, 
> HBASE-18840.HBASE-18477.003 (2).patch, HBASE-18840.HBASE-18477.003.patch, 
> HBASE-18840.HBASE-18477.004.patch
>
>
> If an HBase cluster's hbase:meta table is deleted, or a cluster is started 
> with a new meta table, HBase needs functionality to synchronize its metadata 
> from storage.





[jira] [Commented] (HBASE-18477) Umbrella JIRA for HBase Read Replica clusters

2018-06-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524199#comment-16524199
 ] 

Zach York commented on HBASE-18477:
---

[~busbey] I'm going to pick up this work again, as I'd like to avoid long-term 
code maintenance.

What are the remaining functional/conceptual issues to be addressed?

Also, I'm starting to think that it doesn't make sense for these features to 
live in a feature branch: none of them are turned on by default, and keeping 
them in a feature branch increases the maintenance burden (I'd like to spend 
more time actually improving the feature rather than rebasing :) ).

Thanks for everyone's reviews so far!

> Umbrella JIRA for HBase Read Replica clusters
> -
>
> Key: HBASE-18477
> URL: https://issues.apache.org/jira/browse/HBASE-18477
> Project: HBase
>  Issue Type: New Feature
>Reporter: Zach York
>Assignee: Zach York
>Priority: Major
> Attachments: HBase Read-Replica Clusters Scope doc.docx, HBase 
> Read-Replica Clusters Scope doc.pdf, HBase Read-Replica Clusters Scope 
> doc_v2.docx, HBase Read-Replica Clusters Scope doc_v2.pdf
>
>
> Recently, changes (such as HBASE-17437) have unblocked HBase to run with a 
> root directory external to the cluster (such as in Amazon S3). This means 
> that the data is stored outside of the cluster and can be accessible after 
> the cluster has been terminated. One use case that is often asked about is 
> pointing multiple clusters to one root directory (sharing the data) to have 
> read resiliency in the case of a cluster failure.
>  
> This JIRA is an umbrella JIRA to contain all the tasks necessary to create a 
> read-replica HBase cluster that is pointed at the same root directory.
>  
> This requires:
> * Making the read-replica cluster read-only (no metadata or data operations).
> * Separating the hbase:meta table for each cluster (otherwise HBase gets 
> confused when multiple clusters try to update the meta table with their IP 
> addresses).
> * Adding refresh functionality for the meta table to ensure new metadata is 
> picked up on the read-replica cluster.
> * Adding refresh functionality for HFiles for a given table to ensure new 
> data is picked up on the read-replica cluster.
>  
> This can be used with any existing cluster that is backed by an external 
> filesystem.
>  
> Please note that this feature is still quite manual (with the potential for 
> automation later).
>  
> More information on this particular feature can be found here: 
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/





[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524195#comment-16524195
 ] 

Zach York commented on HBASE-20789:
---

[~yuzhih...@gmail.com] None of those build links actually load for me...

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.





[jira] [Resolved] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev

2018-06-26 Thread Zach York (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York resolved HBASE-20787.
---
Resolution: Fixed

> Rebase the HBASE-18477 onto the current master to continue dev
> --
>
> Key: HBASE-20787
> URL: https://issues.apache.org/jira/browse/HBASE-20787
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: HBASE-18477
>Reporter: Zach York
>Assignee: Zach York
>Priority: Minor
> Fix For: HBASE-18477
>
>






[jira] [Commented] (HBASE-20792) info:servername and info:sn inconsistent for OPEN region

2018-06-26 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524188#comment-16524188
 ] 

Josh Elser commented on HBASE-20792:


Repro'ed it again. Was worried there was something more complicated that I was 
missing.

I modified my restart logic to wait a bit before stopping the last RS. This 
doesn't cause it every time, but I now have back-to-back "good", then "bad" 
logs which is helpful.
{code:java}
for rs in regionservers[:-1]:
    stop(server, rs, RS)

# make sure the master has time to get everything on that last RS
time.sleep(15)
stop(server, regionservers[-1], RS)

for master in masters:
    stop(server, master, MASTER)
for master in masters:
    start(server, master, MASTER)
for rs in regionservers:
    start(server, rs, RS){code}
{noformat}
hbase(main):007:0> scan 'hbase:meta', {STARTROW=>'hbase:namespace', STOPROW=>'hbase:o'}
ROW  COLUMN+CELL
 hbase:namespace column=table:state, timestamp=1530043805582, value=\x08\x00
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:regioninfo, timestamp=1530044902910, value={ENCODED => 11724acb879200aa8ff0aaeef8c624e5, NAME => 'hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:seqnumDuringOpen, timestamp=1530044902910, value=\x00\x00\x00\x00\x00\x00\x00\x16
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:server, timestamp=1530044902910, value=ctr-e138-1518143905142-380753-01-08.hwx.site:16020
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:serverstartcode, timestamp=1530044902910, value=1530044412656
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:sn, timestamp=1530044827438, value=ctr-e138-1518143905142-380753-01-07.hwx.site,16020,1530044401376
 hbase:namespace,,1530043803815.11724acb879200aa8ff0aaeef8c624e5. column=info:state, timestamp=1530044902910, value=OPEN
2 row(s)
Took 0.1913 seconds{noformat}
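
To make the inconsistency in the scan above easier to spot, here is a minimal sketch of a check. It assumes that info:sn holds a server name of the form host,port,startcode, and that info:server (host:port) combined with info:serverstartcode should yield the same string for an OPEN region. The class and method names (MetaConsistencySketch, toServerName, isConsistent) are hypothetical and not part of HBase.

```java
// Hypothetical helper, not HBase code: compares the two ways meta records a region's location.
final class MetaConsistencySketch {

  /** Builds "host,port,startcode" from an info:server value ("host:port") and a start code. */
  static String toServerName(String hostAndPort, long startCode) {
    int sep = hostAndPort.lastIndexOf(':');
    if (sep < 0) {
      throw new IllegalArgumentException("expected host:port, got " + hostAndPort);
    }
    String host = hostAndPort.substring(0, sep);
    String port = hostAndPort.substring(sep + 1);
    return host + "," + port + "," + startCode;
  }

  /** True when info:server plus info:serverstartcode name the same server as info:sn. */
  static boolean isConsistent(String infoServer, long infoServerStartCode, String infoSn) {
    return toServerName(infoServer, infoServerStartCode).equals(infoSn);
  }
}
```

Fed the values from the scan above, isConsistent returns false, because both the host (01-08 vs 01-07) and the start code differ.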

> info:servername and info:sn inconsistent for OPEN region
> 
>
> Key: HBASE-20792
> URL: https://issues.apache.org/jira/browse/HBASE-20792
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 2.0.2
>
>
> This is the next problem we've run into after HBASE-20752 and HBASE-20708.
> After a rolling restart of a cluster, we'll see situations where a collection 
> of regions will simply not be assigned out to the RS. I was able to reproduce 
> this by mimicking the restart pattern our tests use internally (ignore whether 
> this is the best way to restart nodes for now :)). The general pattern is 
> this:
> {code:java}
> for rs in regionservers:
>   stop(server, rs, RS)
> for master in masters:
>   stop(server, master, MASTER)
> sleep(15)
> for master in masters:
>   start(server, master, MASTER)
> for rs in regionservers:
>   start(server, rs, RS){code}
> Looking at meta, we can see why the Master is ignoring some regions:
> {noformat}
>  test column=table:state, timestamp=1529871718998, value=\x08\x00
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:regioninfo, timestamp=1529967103390, value={ENCODED => 0297f680df6dc0166a44f9536346268e, NAME => 'test,,1529871718122.0297f680df6dc0166a44f9536346268e.', STARTKEY => '', ENDKEY => ''}
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:seqnumDuringOpen, timestamp=1529967103390, value=\x00\x00\x00\x00\x00\x00\x00*
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:server, timestamp=1529967103390, value=ctr-e138-1518143905142-378097-02-12.hwx.site:16020
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:serverstartcode, timestamp=1529967103390, value=1529966776248
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:sn, timestamp=1529967096482, value=ctr-e138-1518143905142-378097-02-06.hwx.site,16020,1529966755170
>  test,,1529871718122.0297f680df6dc0166a44f9536346268e. column=info:state, timestamp=1529967103390, value=OPEN{noformat}
> The region is marked as {{OPEN}}. The master doesn't know any better. 
> However, the interesting bit is that {{info:server}} and {{info:sn}} are 
> inconsistent (which, according to the javadoc should not be possible for an 
> {{OPEN}} 

[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect

2018-06-26 Thread Mike Wingert (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524189#comment-16524189
 ] 

Mike Wingert commented on HBASE-15320:
--

I've attached an unpolished patch, for feedback, that uses the public APIs 
instead of the internal ones.

 

> HBase connector for Kafka Connect
> -
>
> Key: HBASE-15320
> URL: https://issues.apache.org/jira/browse/HBASE-15320
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Andrew Purtell
>Assignee: Mike Wingert
>Priority: Major
>  Labels: beginner
> Fix For: 3.0.0
>
> Attachments: HBASE-15320.master.1.patch, HBASE-15320.master.10.patch, 
> HBASE-15320.master.11.patch, HBASE-15320.master.2.patch, 
> HBASE-15320.master.3.patch, HBASE-15320.master.4.patch, 
> HBASE-15320.master.5.patch, HBASE-15320.master.6.patch, 
> HBASE-15320.master.7.patch, HBASE-15320.master.8.patch, 
> HBASE-15320.master.8.patch, HBASE-15320.master.9.patch, HBASE-15320.pdf, 
> HBASE-15320.pdf
>
>
> Implement an HBase connector with source and sink tasks for the Connect 
> framework (http://docs.confluent.io/2.0.0/connect/index.html) available in 
> Kafka 0.9 and later.
> See also: 
> http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
> An HBase source 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task)
>  could be implemented as a replication endpoint or WALObserver, publishing 
> cluster wide change streams from the WAL to one or more topics, with 
> configurable mapping and partitioning of table changes to topics.  
> An HBase sink task 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would 
> persist, with optional transformation (JSON? Avro?, map fields to native 
> schema?), Kafka SinkRecords into HBase tables.





[jira] [Commented] (HBASE-20787) Rebase the HBASE-18477 onto the current master to continue dev

2018-06-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524187#comment-16524187
 ] 

Zach York commented on HBASE-20787:
---

Did a force push to clean the branch up.

> Rebase the HBASE-18477 onto the current master to continue dev
> --
>
> Key: HBASE-20787
> URL: https://issues.apache.org/jira/browse/HBASE-20787
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: HBASE-18477
>Reporter: Zach York
>Assignee: Zach York
>Priority: Minor
> Fix For: HBASE-18477
>
>






[jira] [Updated] (HBASE-15320) HBase connector for Kafka Connect

2018-06-26 Thread Mike Wingert (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Wingert updated HBASE-15320:
-
Attachment: HBASE-15320.master.11.patch

> HBase connector for Kafka Connect
> -
>
> Key: HBASE-15320
> URL: https://issues.apache.org/jira/browse/HBASE-15320
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Andrew Purtell
>Assignee: Mike Wingert
>Priority: Major
>  Labels: beginner
> Fix For: 3.0.0
>
> Attachments: HBASE-15320.master.1.patch, HBASE-15320.master.10.patch, 
> HBASE-15320.master.11.patch, HBASE-15320.master.2.patch, 
> HBASE-15320.master.3.patch, HBASE-15320.master.4.patch, 
> HBASE-15320.master.5.patch, HBASE-15320.master.6.patch, 
> HBASE-15320.master.7.patch, HBASE-15320.master.8.patch, 
> HBASE-15320.master.8.patch, HBASE-15320.master.9.patch, HBASE-15320.pdf, 
> HBASE-15320.pdf
>
>
> Implement an HBase connector with source and sink tasks for the Connect 
> framework (http://docs.confluent.io/2.0.0/connect/index.html) available in 
> Kafka 0.9 and later.
> See also: 
> http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines
> An HBase source 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task)
>  could be implemented as a replication endpoint or WALObserver, publishing 
> cluster wide change streams from the WAL to one or more topics, with 
> configurable mapping and partitioning of table changes to topics.  
> An HBase sink task 
> (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would 
> persist, with optional transformation (JSON? Avro?, map fields to native 
> schema?), Kafka SinkRecords into HBase tables.





[jira] [Updated] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20795:
--
Status: Patch Available  (was: Open)

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) should ignore the sequence id when the ignoreSequenceid 
> parameter is true, but BBKVComparator.compare, which it uses internally for 
> cells of type ByteBufferKeyValue, does not take this into account.
>  {code}
>   @Override
>   public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>     int diff = 0;
>     // "Peel off" the most common path.
>     if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>       diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>       if (diff != 0) {
>         return diff;
>       }
>     } else {
>       diff = compareRows(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>       diff = compareWithoutRow(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>     }
>     // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>     return ignoreSequenceid? diff: Long.compare(b.getSequenceId(), a.getSequenceId());
>   }
> {code}
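
The problem described above can be illustrated with a simplified, self-contained sketch. It is not the attached patch. It assumes, as the report implies, that the ByteBufferKeyValue fast path folds the sequence id into its result, and it uses a hypothetical SimpleCell (a key plus a sequence id) in place of the real cell types. One possible shape of a fix is to pass the flag down so the sequence id is compared only when requested.

```java
// Simplified illustration with hypothetical types; not the HBase implementation.
final class SeqIdCompareSketch {

  /** Stand-in for a cell: everything except the sequence id is collapsed into one key. */
  static final class SimpleCell {
    final String key;
    final long sequenceId;

    SimpleCell(String key, long sequenceId) {
      this.key = key;
      this.sequenceId = sequenceId;
    }
  }

  /** Stand-in for the fast path: compares the key, then the sequence id (later edits first). */
  static int fastPathCompare(SimpleCell a, SimpleCell b) {
    int diff = a.key.compareTo(b.key);
    if (diff != 0) {
      return diff;
    }
    return Long.compare(b.sequenceId, a.sequenceId);
  }

  /** Mirrors the reported behavior: the fast path has already used the sequence id. */
  static int compareAsReported(SimpleCell a, SimpleCell b, boolean ignoreSequenceId) {
    int diff = fastPathCompare(a, b);
    if (diff != 0) {
      return diff; // returned even when ignoreSequenceId is true
    }
    return ignoreSequenceId ? diff : Long.compare(b.sequenceId, a.sequenceId);
  }

  /** One possible fix: compare the sequence id only when the caller has not opted out. */
  static int compareHonoringFlag(SimpleCell a, SimpleCell b, boolean ignoreSequenceId) {
    int diff = a.key.compareTo(b.key);
    if (diff != 0 || ignoreSequenceId) {
      return diff;
    }
    return Long.compare(b.sequenceId, a.sequenceId);
  }
}
```

For two cells with the same key but different sequence ids, compareAsReported with the flag set returns a non-zero value, while compareHonoringFlag returns 0.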





[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524186#comment-16524186
 ] 

Ted Yu commented on HBASE-20789:


Zach:
You can find the test failures at:
https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.





[jira] [Updated] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20795:
--
Affects Version/s: (was: 2.0.0)
   2.0.1

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) should ignore the sequence id when the ignoreSequenceid 
> parameter is true, but BBKVComparator.compare, which it uses internally for 
> cells of type ByteBufferKeyValue, does not take this into account.
>  {code}
>   @Override
>   public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>     int diff = 0;
>     // "Peel off" the most common path.
>     if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>       diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>       if (diff != 0) {
>         return diff;
>       }
>     } else {
>       diff = compareRows(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>       diff = compareWithoutRow(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>     }
>     // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>     return ignoreSequenceid? diff: Long.compare(b.getSequenceId(), a.getSequenceId());
>   }
> {code}





[jira] [Commented] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

2018-06-26 Thread Zach York (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524184#comment-16524184
 ] 

Zach York commented on HBASE-20789:
---

Sorry the comment wasn't updated. I think I had updated it locally, but the 
change must not have been pushed out.

 

Basically there are 3 cases here:

equality (0) -> these blocks are exactly the same, no issue.

(-1) -> The existing block has nextBlockOnDiskSize set so we will get 
performance gains by keeping that version.

(1) -> The new block has nextBlockOnDiskSize set so it makes sense to cache the 
new version
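
The three cases above can be sketched as a small decision function. This is only an illustration, not the BucketCache code: the names (CacheBlockDecisionSketch, compareNextBlockMetadata, decide) and the UNSET marker are hypothetical, and the sketch assumes the two blocks are otherwise identical and differ at most in whether nextBlockOnDiskSize is set.

```java
// Hypothetical illustration of the three cases; not the actual cache implementation.
final class CacheBlockDecisionSketch {
  enum Action { KEEP_EXISTING, CACHE_NEW }

  /** Hypothetical marker meaning "nextBlockOnDiskSize is not set". */
  static final int UNSET = -1;

  /**
   * Returns 0 when both blocks agree on whether the size is set,
   * -1 when only the existing block has it, and 1 when only the new block has it.
   */
  static int compareNextBlockMetadata(int existingNextSize, int newNextSize) {
    boolean existingSet = existingNextSize != UNSET;
    boolean newSet = newNextSize != UNSET;
    if (existingSet == newSet) {
      return 0;
    }
    return existingSet ? -1 : 1;
  }

  /** Only a positive comparison (the new block carries more metadata) replaces the cached block. */
  static Action decide(int comparison) {
    return comparison > 0 ? Action.CACHE_NEW : Action.KEEP_EXISTING;
  }
}
```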

 

Please let me know if anything is unclear; I can try to clarify it and 
improve the logging.

Where is the test failing? AFAIK there shouldn't be much flakiness in this 
test, but let's fix it if there is.

Thanks for digging in!

> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---
>
> Key: HBASE-20789
> URL: https://issues.apache.org/jira/browse/HBASE-20789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> The UT failed frequently in our internal branch-2... Will dig into the UT.





[jira] [Updated] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20795:
--
Description: 
CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
ignoreSequenceid) should ignore the sequence id when the ignoreSequenceid 
parameter is true, but BBKVComparator.compare, which it uses internally for 
cells of type ByteBufferKeyValue, does not take this into account.

 {code}
  @Override
  public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {

    int diff = 0;
    // "Peel off" the most common path.
    if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
      diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
      if (diff != 0) {
        return diff;
      }
    } else {
      diff = compareRows(a, b);
      if (diff != 0) {
        return diff;
      }

      diff = compareWithoutRow(a, b);
      if (diff != 0) {
        return diff;
      }
    }

    // Negate following comparisons so later edits show up first mvccVersion: later sorts first
    return ignoreSequenceid? diff: Long.compare(b.getSequenceId(), a.getSequenceId());
  }
{code}

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>
> CellComparatorImpl#compare(final Cell a, final Cell b, boolean 
> ignoreSequenceid) should ignore the sequence id when the ignoreSequenceid 
> parameter is true, but BBKVComparator.compare, which it uses internally for 
> cells of type ByteBufferKeyValue, does not take this into account.
>  {code}
>   @Override
>   public int compare(final Cell a, final Cell b, boolean ignoreSequenceid) {
>     int diff = 0;
>     // "Peel off" the most common path.
>     if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) {
>       diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b);
>       if (diff != 0) {
>         return diff;
>       }
>     } else {
>       diff = compareRows(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>       diff = compareWithoutRow(a, b);
>       if (diff != 0) {
>         return diff;
>       }
>     }
>     // Negate following comparisons so later edits show up first mvccVersion: later sorts first
>     return ignoreSequenceid? diff: Long.compare(b.getSequenceId(), a.getSequenceId());
>   }
> {code}





[jira] [Created] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)
Ankit Singhal created HBASE-20795:
-

 Summary: Allow option in BBKVComparator.compare to do comparison 
without sequence id
 Key: HBASE-20795
 URL: https://issues.apache.org/jira/browse/HBASE-20795
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Ankit Singhal
Assignee: Ankit Singhal
 Attachments: HBASE-20795.patch







[jira] [Updated] (HBASE-20795) Allow option in BBKVComparator.compare to do comparison without sequence id

2018-06-26 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20795:
--
Attachment: HBASE-20795.patch

> Allow option in BBKVComparator.compare to do comparison without sequence id
> ---
>
> Key: HBASE-20795
> URL: https://issues.apache.org/jira/browse/HBASE-20795
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-20795.patch
>
>





