[jira] [Commented] (HBASE-19808) Reenable TestMultiParallel

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328481#comment-16328481
 ] 

Hadoop QA commented on HBASE-19808:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
4s{color} | {color:red} hbase-server: The patch generated 1 new + 10 unchanged 
- 0 fixed = 11 total (was 10) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
39s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m 58s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 98m  
0s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19808 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906345/0001-HBASE-19808-Reenable-TestMultiParallel.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 5eacca954fb7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8b6b2b0b22 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11078/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11078/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11078/testReport/ |
| modules | C: hbase-server U: hbase-server |

[jira] [Created] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)
Peter Somogyi created HBASE-19809:
-

 Summary: Fix findbugs and error-prone warnings in hbase-procedure 
(branch-2)
 Key: HBASE-19809
 URL: https://issues.apache.org/jira/browse/HBASE-19809
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0-beta-1
Reporter: Peter Somogyi
 Fix For: 2.0.0-beta-2








[jira] [Assigned] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi reassigned HBASE-19809:
-

Assignee: Peter Somogyi

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
>






[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19809:
--
Status: Patch Available  (was: Open)

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch
>
>






[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19809:
--
Attachment: HBASE-19809.master.001.patch

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch
>
>






[jira] [Created] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)
Peter Somogyi created HBASE-19810:
-

 Summary: Fix findbugs and error-prone warnings in hbase-metrics 
(branch-2)
 Key: HBASE-19810
 URL: https://issues.apache.org/jira/browse/HBASE-19810
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0-beta-1
Reporter: Peter Somogyi
Assignee: Peter Somogyi
 Fix For: 2.0.0-beta-2








[jira] [Updated] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19810:
--
Status: Patch Available  (was: Open)

> Fix findbugs and error-prone warnings in hbase-metrics (branch-2)
> -
>
> Key: HBASE-19810
> URL: https://issues.apache.org/jira/browse/HBASE-19810
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19810.master.001.patch
>
>






[jira] [Updated] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19810:
--
Attachment: HBASE-19810.master.001.patch

> Fix findbugs and error-prone warnings in hbase-metrics (branch-2)
> -
>
> Key: HBASE-19810
> URL: https://issues.apache.org/jira/browse/HBASE-19810
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19810.master.001.patch
>
>






[jira] [Commented] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328557#comment-16328557
 ] 

Hadoop QA commented on HBASE-19810:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
55s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
37s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hbase-metrics in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19810 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906369/HBASE-19810.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3c787794bc51 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d8d6ecdad1 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11080/testReport/ |
| modules | C: hbase-metrics U: hbase-metrics |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11080/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328558#comment-16328558
 ] 

Hadoop QA commented on HBASE-19809:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
 3s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
20s{color} | {color:red} hbase-procedure: The patch generated 1 new + 38 
unchanged - 2 fixed = 39 total (was 40) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
16s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
23m 11s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
3s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19809 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906365/HBASE-19809.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux de2761f8bbce 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d8d6ecdad1 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11079/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11079/testReport/ |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11079/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19809:
--
Status: In Progress  (was: Patch Available)

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch
>
>






[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19809:
--
Status: Patch Available  (was: In Progress)

Fixed import order in second patch.

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch, 
> HBASE-19809.master.002.patch
>
>






[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi updated HBASE-19809:
--
Attachment: HBASE-19809.master.002.patch

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch, 
> HBASE-19809.master.002.patch
>
>






[jira] [Commented] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328606#comment-16328606
 ] 

Hadoop QA commented on HBASE-19809:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
14s{color} | {color:red} hbase-procedure: The patch generated 1 new + 38 
unchanged - 2 fixed = 39 total (was 40) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
38s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
1s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19809 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906365/HBASE-19809.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9de7e18ab3b7 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d8d6ecdad1 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11081/artifact/patchprocess/diff-checkstyle-hbase-procedure.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11081/testReport/ |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11081/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328608#comment-16328608
 ] 

Peter Somogyi commented on HBASE-19809:
---

This result is faulty since precommit ran against patch 1 again so it found the 
same checkstyle warning that was fixed in patch 2.

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch, 
> HBASE-19809.master.002.patch
>
>






[jira] [Commented] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job

2018-01-17 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328707#comment-16328707
 ] 

Duo Zhang commented on HBASE-19803:
---

I've registered a SecurityManager in HConstants. It needs some hacks so that 
the surefire framework can still call System.exit.

First, surefire uses System.exit to exit even after a successful test run. 
Second, ForkedBooter.kill also uses System.exit; I believe it is used to kill 
timed-out tests.

I've already added hacks for these two cases and started the third try.
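A minimal, illustrative sketch of that idea follows, assuming we trap
System.exit from test code by scanning the call stack and whitelisting
surefire's booter package; the class name, package prefix, and registration
point are assumptions for illustration, not the committed code.

{code}
// Illustrative sketch only: forbid stray System.exit calls from test code
// while still letting the surefire fork (ForkedBooter) exit normally and
// kill timed-out forks.
public class NoExitSecurityManager extends SecurityManager {
  @Override
  public void checkExit(int status) {
    for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
      // Assumed package prefix for surefire's ForkedBooter and friends.
      if (e.getClassName().startsWith("org.apache.maven.surefire.booter")) {
        return; // let the surefire framework call System.exit as usual
      }
    }
    throw new SecurityException("System.exit(" + status + ") called from test code");
  }

  @Override
  public void checkPermission(java.security.Permission perm) {
    // Only exit is restricted; everything else is allowed.
  }
}

// Registered once, e.g. from a static initializer:
//   System.setSecurityManager(new NoExitSecurityManager());
{code}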

> False positive for the HBASE-Find-Flaky-Tests job
> -
>
> Key: HBASE-19803
> URL: https://issues.apache.org/jira/browse/HBASE-19803
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>
> It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the 
> surefire output:
> https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was likely killed in the middle of the run, within 20 seconds.
> https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was also killed, within about 1 minute.
> The test is declared as LargeTests, so the time limit should be 10 minutes. It 
> seems the JVM may crash during the mvn test run; we then kill all the running 
> tests and may mark some of them as hung, which leads to the false positive.





[jira] [Commented] (HBASE-19792) TestReplicationSmallTests.testDisableEnable fails

2018-01-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328708#comment-16328708
 ] 

Hudson commented on HBASE-19792:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4417 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4417/])
HBASE-19792 TestReplicationSmallTests.testDisableEnable fails (zhangduo: rev 
d8d6ecdad1b360f877b290e462cf8bf9b717753b)
* (delete) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (add) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/replication/TestVerifyReplication.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSmallTests.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEmptyWALRecovery.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java


> TestReplicationSmallTests.testDisableEnable fails
> -
>
> Key: HBASE-19792
> URL: https://issues.apache.org/jira/browse/HBASE-19792
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19792.patch, HBASE-19792.patch, HBASE-19792.patch, 
> org.apache.hadoop.hbase.replication.TestReplicationSmallTests-output.txt
>
>






[jira] [Commented] (HBASE-19799) Add web UI to rsgroup

2018-01-17 Thread Balazs Meszaros (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328722#comment-16328722
 ] 

Balazs Meszaros commented on HBASE-19799:
-

+1 I tested the Web UI, it is great!

> Add web UI to rsgroup
> -
>
> Key: HBASE-19799
> URL: https://issues.apache.org/jira/browse/HBASE-19799
> Project: HBase
>  Issue Type: New Feature
>  Components: rsgroup, UI
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-19799.master.001.patch, master_rsgroup.png, 
> rsgroup_detail.png
>
>
> When the RSGroup feature is enabled, there isn't a web UI to show the details 
> of an rsgroup; we can only view the details of an rsgroup via shell commands, 
> which is inconvenient.
> This issue will add a web UI for rsgroups, to show the statistics and details 
> of each rsgroup.





[jira] [Updated] (HBASE-19779) The chunk encountering the OOM will store in ChunkCreator forever

2018-01-17 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-19779:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> The chunk encountering the OOM will store in ChunkCreator forever
> -
>
> Key: HBASE-19779
> URL: https://issues.apache.org/jira/browse/HBASE-19779
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19779.v0.patch, HBASE-19779.v1.patch
>
>
> If Chunk#init fails on OOM, the MSLAB impl won't store the chunk's id. We 
> then have no chance to remove the chunk from {{ChunkCreator}}, since the 
> MSLAB impl has lost the id.
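A generic sketch of the leak pattern being described, with made-up class names 
(not HBase's actual Chunk/ChunkCreator code): if the tracker registers a chunk 
before init and init throws, the caller never learns the id, so the entry can 
never be removed.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker illustrating the leak: the id is registered before
// "init" runs, so if init throws (e.g. OutOfMemoryError) and we don't clean
// up here, the entry stays in the map forever because the caller has no id.
class ChunkTracker {
  private final Map<Integer, byte[]> chunks = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger();

  int allocate(int size) {
    int id = nextId.getAndIncrement();
    chunks.put(id, new byte[0]);       // registered before init succeeds
    try {
      chunks.put(id, new byte[size]);  // "init": the allocation that can OOM
    } catch (OutOfMemoryError oom) {
      chunks.remove(id);               // without this, the chunk id leaks
      throw oom;
    }
    return id;
  }

  void release(int id) {
    chunks.remove(id);
  }
}
{code}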





[jira] [Updated] (HBASE-19736) Remove BaseLogCleanerDelegate deprecated #isLogDeletable(FileStatus) and use #isFileDeletable(FileStatus) instead

2018-01-17 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai updated HBASE-19736:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the patch. [~reidchan]

> Remove BaseLogCleanerDelegate deprecated #isLogDeletable(FileStatus) and use 
> #isFileDeletable(FileStatus) instead
> -
>
> Key: HBASE-19736
> URL: https://issues.apache.org/jira/browse/HBASE-19736
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Reid Chan
>Priority: Minor
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19736.master.001.patch, 
> HBASE-19736.master.002.patch, HBASE-19736.master.003.patch
>
>
> Mark #isLogDeletable(FileStatus) deprecated, and update server-side code to 
> use #isFileDeletable(FileStatus).





[jira] [Created] (HBASE-19811) findbugs and error-prone warnings in hbase-server (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)
Peter Somogyi created HBASE-19811:
-

 Summary:  findbugs and error-prone warnings in hbase-server 
(branch-2)
 Key: HBASE-19811
 URL: https://issues.apache.org/jira/browse/HBASE-19811
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0-beta-1
Reporter: Peter Somogyi
Assignee: Peter Somogyi
 Fix For: 2.0.0-beta-2








[jira] [Work started] (HBASE-19811) findbugs and error-prone warnings in hbase-server (branch-2)

2018-01-17 Thread Peter Somogyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-19811 started by Peter Somogyi.
-
>  findbugs and error-prone warnings in hbase-server (branch-2)
> -
>
> Key: HBASE-19811
> URL: https://issues.apache.org/jira/browse/HBASE-19811
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
>






[jira] [Updated] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-19812:
--
Description: 
{noformat}
2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
Flushing 1/1 column families, memstore=549.25 KB
2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
2018-01-17 06:43:48,406 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
 regionserver.CompactionPipeline(206): Compaction pipeline segment 
Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
totalHeapSize=1828120, min timestamp=1516171428258, max 
timestamp=1516171428258Num uniques -1;  flattened
2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
segement=null
2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
NOT flushing memstore for region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
writesEnabled=true
{noformat}

You can see that we start a background flush first and then decide to do an 
in-memory compaction; at the same time the test calls region.flush, which 
finds that the region is already flushing and so gives up.

This test is a bit awkward: we create the table with 6 regions whose start 
keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only one 
region has data. In the scenario above, that one region gives up flushing, so 
there is no data, and the test fails.

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first and then decide to do an 
> in-memory compaction; at the same time the test calls region.flush, which 
> finds that the region is already flushing and so gives up.
> This test is a bit awkward: we create the table with 6 regions whose start 
> keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only 
> one region has data. In the scenario above, that one region gives up 
> flushing, so there is no data, and the test fails.





[jira] [Updated] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-19812:
--
Environment: (was: {noformat}
2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
Flushing 1/1 column families, memstore=549.25 KB
2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
2018-01-17 06:43:48,406 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
 regionserver.CompactionPipeline(206): Compaction pipeline segment 
Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
totalHeapSize=1828120, min timestamp=1516171428258, max 
timestamp=1516171428258Num uniques -1;  flattened
2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
segement=null
2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
NOT flushing memstore for region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
writesEnabled=true
{noformat}

You can see that we start a background flush first and then decide to do an 
in-memory compaction; at the same time the test calls region.flush, which 
finds that the region is already flushing and so gives up.

This test is a bit awkward: we create the table with 6 regions whose start 
keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only one 
region has data. In the scenario above, that one region gives up flushing, so 
there is no data, and the test fails.)

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>






[jira] [Created] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-19812:
-

 Summary: TestFlushSnapshotFromClient fails because of failing 
region.flush
 Key: HBASE-19812
 URL: https://issues.apache.org/jira/browse/HBASE-19812
 Project: HBase
  Issue Type: Bug
 Environment: {noformat}
2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
Flushing 1/1 column families, memstore=549.25 KB
2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
2018-01-17 06:43:48,406 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
 regionserver.CompactionPipeline(206): Compaction pipeline segment 
Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
totalHeapSize=1828120, min timestamp=1516171428258, max 
timestamp=1516171428258Num uniques -1;  flattened
2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
segement=null
2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
NOT flushing memstore for region 
test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
writesEnabled=true
{noformat}

You can see that we start a background flush first and then decide to do an 
in-memory compaction; at the same time the test calls region.flush, which 
finds that the region is already flushing and so gives up.

This test is a bit awkward: we create the table with 6 regions whose start 
keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only one 
region has data. In the scenario above, that one region gives up flushing, so 
there is no data, and the test fails.
Reporter: Duo Zhang








[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328771#comment-16328771
 ] 

Duo Zhang commented on HBASE-19812:
---

I think disabling in-memory compaction here can solve the problem, but I want 
to see if there are better solutions.

What do you think? [~stack] [~ram_krish] [~anastas].

Thanks.
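For reference, a hedged sketch of the "disable in-memory compaction" option 
using the 2.0-era descriptor API; the table and family names are placeholders 
taken from the log above, not a committed fix.

{code}
import org.apache.hadoop.hbase.MemoryCompactionPolicy;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class NoInMemoryCompaction {
  // Builds a descriptor whose memstore never enters the compacting pipeline,
  // so a test-driven region.flush cannot race an in-memory compaction.
  static TableDescriptor descriptor() {
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("fam"))
            .setInMemoryCompaction(MemoryCompactionPolicy.NONE)
            .build())
        .build();
  }
}
{code}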

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first and then decide to do an 
> in-memory compaction; at the same time the test calls region.flush, which 
> finds that the region is already flushing and so gives up.
> This test is a bit awkward: we create the table with 6 regions whose start 
> keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only 
> one region has data. In the scenario above, that one region gives up 
> flushing, so there is no data, and the test fails.





[jira] [Updated] (HBASE-19799) Add web UI to rsgroup

2018-01-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-19799:
---
Status: Open  (was: Patch Available)

> Add web UI to rsgroup
> -
>
> Key: HBASE-19799
> URL: https://issues.apache.org/jira/browse/HBASE-19799
> Project: HBase
>  Issue Type: New Feature
>  Components: rsgroup, UI
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-19799.master.001.patch, master_rsgroup.png, 
> rsgroup_detail.png
>
>
> When the RSGroup feature is enabled, there isn't a web UI to show the details 
> of an rsgroup; we can only view the details of an rsgroup via shell commands, 
> which is inconvenient.
> This issue will add a web UI for rsgroups, to show the statistics and details 
> of each rsgroup.





[jira] [Commented] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328897#comment-16328897
 ] 

Josh Elser commented on HBASE-19770:


bq. V3 still didn't address this I think?

Hah, and that's why I put up another patch :)

> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].





[jira] [Updated] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled

2018-01-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-19813:
---
Description: 
The following scenario came from a support case.
In cluster 1, create RS group rsg and move a table to the rsg group.
Take a snapshot of the table and copy the snapshot to cluster 2, where there is 
no group called rsg.
Cloning the snapshot to table new_t4 on cluster 2 fails:
{code}
2018-01-09 11:45:30,468 INFO  [RestoreSnapshot-pool68-t1] regionserver.HRegion: 
Closed new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
2018-01-09 11:45:30,468 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 completed!
2018-01-09 11:45:30,492 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
hbase.MetaTableAccessor: Added 1
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some 
regions might be unassigned.
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on 
null, set to FAILED_OPEN
{code}
Here is related code from RSGroupBasedLoadBalancer:
{code}
List<ServerName> candidateList = filterOfflineServers(info, servers);
for (RegionInfo region : regionList) {
  currentAssignmentMap.put(region, regions.get(region));
}
if(candidateList.size() > 0) {
  assignments.putAll(this.internalBalancer.retainAssignment(
  currentAssignmentMap, candidateList));
{code}
candidateList is empty for table new_t4, leaving the table's region in 
FAILED_OPEN state.

> clone_snapshot fails with region failing to open when RS group feature is 
> enabled
> -
>
> Key: HBASE-19813
> URL: https://issues.apache.org/jira/browse/HBASE-19813
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> The following scenario came from a support case.
> In cluster 1, create RS group rsg and move a table to the rsg group.
> Take a snapshot of the table and copy the snapshot to cluster 2, where there 
> is no group called rsg.
> Cloning the snapshot to table new_t4 on cluster 2 fails:
> {code}
> 2018-01-09 11:45:30,468 INFO  [RestoreSnapshot-pool68-t1] 
> regionserver.HRegion: Closed 
> new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
> 2018-01-09 11:45:30,468 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 
> completed!
> 2018-01-09 11:45:30,492 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> hbase.MetaTableAccessor: Added 1
> 2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
> 2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some 
> regions might be unassigned.
> 2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on 
> null, set to FAILED_OPEN
> {code}
> Here is related code from RSGroupBasedLoadBalancer:
> {code}
> List<ServerName> candidateList = filterOfflineServers(info, servers);
> for (RegionInfo region : regionList) {
>   currentAssignmentMap.put(region, regions.get(region));
> }
> if(candidateList.size() > 0) {
>   assignments.putAll(this.internalBalancer.retainAssignment(
>   currentAssignmentMap, candidateList));
> {code}
> candidateList is empty for table new_t4, leaving the table's region in 
> FAILED_OPEN state.





[jira] [Created] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled

2018-01-17 Thread Ted Yu (JIRA)
Ted Yu created HBASE-19813:
--

 Summary: clone_snapshot fails with region failing to open when RS 
group feature is enabled
 Key: HBASE-19813
 URL: https://issues.apache.org/jira/browse/HBASE-19813
 Project: HBase
  Issue Type: Bug
 Environment: The following scenario came from a support case.
In cluster 1, create RS group rsg and move a table to the rsg group.
Take a snapshot of the table and copy the snapshot to cluster 2, where there is 
no group called rsg.
Cloning the snapshot to table new_t4 on cluster 2 fails:
{code}
2018-01-09 11:45:30,468 INFO  [RestoreSnapshot-pool68-t1] regionserver.HRegion: 
Closed new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
2018-01-09 11:45:30,468 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 completed!
2018-01-09 11:45:30,492 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
hbase.MetaTableAccessor: Added 1
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some 
regions might be unassigned.
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on 
null, set to FAILED_OPEN
{code}
Here is related code from RSGroupBasedLoadBalancer:
{code}
List<ServerName> candidateList = filterOfflineServers(info, servers);
for (RegionInfo region : regionList) {
  currentAssignmentMap.put(region, regions.get(region));
}
if(candidateList.size() > 0) {
  assignments.putAll(this.internalBalancer.retainAssignment(
  currentAssignmentMap, candidateList));
{code}
candidateList is empty for table new_t4, leaving the table's region in 
FAILED_OPEN state.
Reporter: Ted Yu








[jira] [Updated] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled

2018-01-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-19813:
---
Environment: (was: The following scenario came from a support case.
In cluster 1, create RS group rsg and move a table to the rsg group.
Take a snapshot of the table and copy the snapshot to cluster 2, where there is 
no group called rsg.
Cloning the snapshot to table new_t4 on cluster 2 fails:
{code}
2018-01-09 11:45:30,468 INFO  [RestoreSnapshot-pool68-t1] regionserver.HRegion: 
Closed new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
2018-01-09 11:45:30,468 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 completed!
2018-01-09 11:45:30,492 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
hbase.MetaTableAccessor: Added 1
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some 
regions might be unassigned.
2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on 
null, set to FAILED_OPEN
{code}
Here is related code from RSGroupBasedLoadBalancer:
{code}
List<ServerName> candidateList = filterOfflineServers(info, servers);
for (RegionInfo region : regionList) {
  currentAssignmentMap.put(region, regions.get(region));
}
if(candidateList.size() > 0) {
  assignments.putAll(this.internalBalancer.retainAssignment(
  currentAssignmentMap, candidateList));
{code}
candidateList is empty for table new_t4, leaving region for the table in 
FAILED_OPEN state.)

> clone_snapshot fails with region failing to open when RS group feature is 
> enabled
> -
>
> Key: HBASE-19813
> URL: https://issues.apache.org/jira/browse/HBASE-19813
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19813) clone_snapshot fails with region failing to open when RS group feature is enabled

2018-01-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328937#comment-16328937
 ] 

Ted Yu commented on HBASE-19813:


Currently snapshot cloning has no concept of assigning RS group:
{code}
  hbase> clone_snapshot 'snapshotName', 'namespace:tableName'
{code}
One option is to allow specification of RS group:
{code}
  hbase> clone_snapshot 'snapshotName', 'namespace:tableName', 'rsgroupName'
{code}
Master can assign the new table to the RS group first so that 
RSGroupBasedLoadBalancer has servers for the table regions.
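A sketch of that master-side step, assuming the rsgroup endpoint is loaded: 
getRSGroupInfo and moveTables are the existing RSGroupAdmin calls, while the 
{{rsGroupAdmin}} handle and the surrounding wiring are hypothetical:
{code}
// Before assigning regions of the cloned table: validate the requested group
// and move the new table into it, so the group-based balancer sees servers.
RSGroupInfo groupInfo = rsGroupAdmin.getRSGroupInfo(rsgroupName);
if (groupInfo == null) {
  throw new ConstraintException("RSGroup " + rsgroupName + " does not exist");
}
rsGroupAdmin.moveTables(Collections.singleton(tableName), rsgroupName);
{code}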

> clone_snapshot fails with region failing to open when RS group feature is 
> enabled
> -
>
> Key: HBASE-19813
> URL: https://issues.apache.org/jira/browse/HBASE-19813
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> The following scenario came from support case.
> In cluster 1, create RS group rsg. Move table to rsg group.
> Take snapshot of the table and copy the snapshot to cluster 2 where there is 
> no group called rsg.
> Cloning snapshot to table new_t4 on cluster 2 fails :
> {code}
> 2018-01-09 11:45:30,468 INFO  [RestoreSnapshot-pool68-t1] 
> regionserver.HRegion: Closed 
> new_t4,,1514454789243.a6173d2955182ac5bde208301681c6af.
> 2018-01-09 11:45:30,468 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> snapshot.CloneSnapshotHandler: Clone snapshot=snap_t3 on table=new_t4 
> completed!
> 2018-01-09 11:45:30,492 INFO  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> hbase.MetaTableAccessor: Added 1
> 2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> rsgroup.RSGroupBasedLoadBalancer: Group for table new_t4 is null
> 2018-01-09 11:45:30,492 DEBUG [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> rsgroup.RSGroupBasedLoadBalancer: Group Information found to be null. Some 
> regions might be unassigned.
> 2018-01-09 11:45:30,492 WARN  [MASTER_TABLE_OPERATIONS-shubh1-1:16000-0] 
> master.RegionStates: Failed to open/close a6173d2955182ac5bde208301681c6af on 
> null, set to FAILED_OPEN
> {code}
> Here is related code from RSGroupBasedLoadBalancer:
> {code}
> List<ServerName> candidateList = filterOfflineServers(info, servers);
> for (RegionInfo region : regionList) {
>   currentAssignmentMap.put(region, regions.get(region));
> }
> if (candidateList.size() > 0) {
>   assignments.putAll(this.internalBalancer.retainAssignment(
>       currentAssignmentMap, candidateList));
> }
> {code}
> candidateList is empty for table new_t4, leaving region for the table in 
> FAILED_OPEN state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19598:
--
Status: Patch Available  (was: Open)

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19808) Reenable TestMultiParallel

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19808:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master and branch-2.

> Reenable TestMultiParallel
> --
>
> Key: HBASE-19808
> URL: https://issues.apache.org/jira/browse/HBASE-19808
> Project: HBase
>  Issue Type: Bug
>  Components: test
> Environment: Reenable TestMultiParallel and half of 
> TestRegionServerReadRequestMetrics. They depended on Master being able to 
> carry the system tables exclusively. With that requirement disabled, they 
> work, so just enable them again.
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: 0001-HBASE-19808-Reenable-TestMultiParallel.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19598:
--
Attachment: HBASE-19598.master.002.patch

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, 
> HBASE-19598.master.002.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329103#comment-16329103
 ] 

stack commented on HBASE-19598:
---

.002 Addresses [~Apache9]'s review comments up on RB.

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, 
> HBASE-19598.master.002.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-19785) System Regions on the Master is broken

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328309#comment-16328309
 ] 

stack edited comment on HBASE-19785 at 1/17/18 6:00 PM:


I spent more time on this. The Master-as-RegionServer needs more work. The 
complication is special-handling assigning hbase:meta ahead of all other 
regions. Master would need to have checked in as an 'ordinary' RegionServer way 
early in Master startup. This complicates assign. In the past we had Master 
start up a background thread that took care of background check-in by 
RegionServers, but it 'escaped' our control of the startup sequence, so we 
could have cases as in the parent issue where a mis-sequencing of events could 
have the Master kill itself; e.g. the Master Connection failing to read the 
clusterid on setup because Master had not yet set it, and so on.

What sort-of-works is that the Master can act as any other RegionServer. It'll 
be late to check in so will probably miss the initial assignments but should 
pick up regions the next time the balancer runs.

TODO: backup Masters carrying regions.

For Master to be a true RegionServer needs more work/refactor/thought. 
Meantime, I can reenable a bunch of the disabled tests above: all of 
TestMultiParallel if I don't stipulate system tables on Master only, and half 
of TestRegionServerReadRequestMetrics (too lazy to figure the counts in the 
remainder). TestRegionsOnMasterOptions has the three possible combinations. 
The system-tables-on-Master-only case is what does not work and is disabled.


was (Author: stack):
I spent more time on this. The Master-as-RegionServer needs more work. The 
complication is special-handling assigning hbase:meta ahead of all other 
regions. Master would need to have checked-in as an 'ordinary' RegionServer way 
early in Master startup. This complicates assign. In past we had Master start 
up a background thread that took care of background check-in by RegionServers 
but it 'escaped' our control of startup sequence.

What sort-of-works is that the Master can act as any other RegionServer. It'll 
be late to check in so will probably miss the initial assignments but should 
pick up regions the next time the balancer runs.

TODO: backup Masters carrying regions.

For Master to be true RegionServer, needs more work/refactor/thought. Meantime, 
I can reenable a bunch of the disabled tests above: all of TestMultiParallel if 
I don't stipulate system tables on Master only and half of 
TestRegionServerReadRequestMetrics (too lazy to figure the counts in the 
remainder).  The TestRegionsOnMasterOptions has the three possible 
combinations. The system-tables-on-master only is what does not work and is 
disabled.

> System Regions on the Master is broken
> --
>
> Key: HBASE-19785
> URL: https://issues.apache.org/jira/browse/HBASE-19785
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0
>
>
> The parent issue broke our being able to host system regions only on the 
> Master.
> This broke a few tests that depend on this ability. Two of the below actually 
> enable system regions on the Master for the test run. The remainder is the 
> test that make sure this works.
> TestMultiParallel
> TestRegionsOnMasterOptions
> TestRegionServerReadRequestMetrics
> Parent changed the startup order. System regions and Master-as-a-RegionServer 
> are having issues because we wait for regionservers to check in before 
> completing Master startup, which gets interesting when Master is supposed to 
> act like a RegionServer. Previously, Master startup was off in a background 
> thread.
> Needs more thought but not required for beta-1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19774) incorrect behavior of locateRegionInMeta

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19774:
--
Fix Version/s: (was: 2.0.0-beta-1)
   2.0.0-beta-2

> incorrect behavior of locateRegionInMeta
> 
>
> Key: HBASE-19774
> URL: https://issues.apache.org/jira/browse/HBASE-19774
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Romil Choksi
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19774-wip-branch-2.patch
>
>
> When we try to operate on a non-existent table, in some circumstances we get 
> an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 
> 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is 
> expected in the table of 'nonExistentUsertable', but hbase:meta says it is in 
> the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-19196) Release hbase-2.0.0-beta-1; the "Finish-line" release

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19196.
---
Resolution: Fixed
  Assignee: stack

Resolving. beta-1 was pushed yesterday.

> Release hbase-2.0.0-beta-1; the "Finish-line" release
> -
>
> Key: HBASE-19196
> URL: https://issues.apache.org/jira/browse/HBASE-19196
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.0.0-beta-1
>
>
> APIs done, both external-facing and Coprocessors. Done w/ features. Bug fixes 
> only from here on out. There'll be a beta-2 but that is about rolling upgrade 
> and bug fixes only. Then our first 2.0.0 Release Candidate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-19814) Release hbase-2.0.0-beta-2; "rolling upgrade" release

2018-01-17 Thread stack (JIRA)
stack created HBASE-19814:
-

 Summary: Release hbase-2.0.0-beta-2; "rolling upgrade" release
 Key: HBASE-19814
 URL: https://issues.apache.org/jira/browse/HBASE-19814
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 2.0.0-beta-2






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329141#comment-16329141
 ] 

Hadoop QA commented on HBASE-19598:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  7m 
12s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
11s{color} | {color:red} hbase-server: The patch generated 2 new + 472 
unchanged - 0 fixed = 474 total (was 472) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
44s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 37s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
0s{color} | {color:green} hbase-zookeeper in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 23m 23s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestClientClusterMetrics |
|   | hadoop.hbase.TestClientClusterStatus |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19598 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906340/HBASE-19598.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c5cd7d19f611 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / a3c98b2dd8 |
| maven | version: Apache Maven 3.5.2 
(138edd

[jira] [Commented] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329140#comment-16329140
 ] 

Josh Elser commented on HBASE-19770:


bq. gave +1 earlier, feel free to submit whenever you're done with minor 
changes afa i'm concerned. (just stating it explicitly )

Ok, thanks again, Appy! I have Mike's changes done locally. Will push them up 
assuming the shell tests pass on master (branch-2 are good).

[~andrew.purt...@gmail.com], you OK with this change for branch-1.4?

> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-19770:
---
Hadoop Flags: Reviewed
Release Note: Introduces a new option to the HBase shell: -r, 
--return-values. When the shell is in "interactive" mode (default), the return 
values of shell commands are not returned to the user, as they dirty the 
console output. For those who desire this functionality, the "--return-values" 
option restores the old behavior of commands passing their return value to 
the user.
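An illustrative transcript (output abbreviated; it assumes the option behaves 
as described above):
{code}
$ bin/hbase shell --return-values
hbase(main):001:0> tables = list
TABLE
a
1 row(s)
hbase(main):002:0> p tables
["a"]
{code}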

> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329171#comment-16329171
 ] 

stack commented on HBASE-19812:
---

Yeah, disable the in-memory compaction. The test is old. It has been around 
forever. Probably needs an overhaul. In the meantime, yeah, if it's a timing 
issue, just disable the in-memory compaction for now.

I was going to try and do it but I can't get the test to fail running on two 
machines. It passed in the nightly but I do see it on the branch-2 flakies 
greatest hits here so it's a problem: 
[https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html]
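One way to pin the old behaviour down in the test setup, sketched under the 
assumption that the 2.0 names apply (CompactingMemStore.COMPACTING_MEMSTORE_TYPE_KEY 
and MemoryCompactionPolicy are the existing constants; the {{UTIL}} 
test-utility handle is hypothetical):
{code}
// Disable in-memory compaction for the minicluster before it starts, so the
// background in-memory flatten/compact cannot race the test's explicit flush.
UTIL.getConfiguration().set(CompactingMemStore.COMPACTING_MEMSTORE_TYPE_KEY,
    String.valueOf(MemoryCompactionPolicy.NONE));
{code}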
 

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do 
> an in-memory compaction; at the same time we call region.flush from the test, 
> and it finds that the region is already flushing so it gives up.
> This test is a bit awkward in that we create the table with 6 regions whose 
> start keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so 
> only one region has data. And in the above scenario that one region gives up 
> flushing, then there is no data, and then our test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec

2018-01-17 Thread stack (JIRA)
stack created HBASE-19815:
-

 Summary: Flakey TestAssignmentManager.testAssignWithRandExec
 Key: HBASE-19815
 URL: https://issues.apache.org/jira/browse/HBASE-19815
 Project: HBase
  Issue Type: Bug
  Components: flakey, test
Reporter: stack
Assignee: stack


Saw the below in the flakies failures: 
https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
It seems to have the highest failure incidence in branch-2.

{code}
2018-01-17 15:43:52,872 ERROR [ProcExecWrkr-12] 
procedure2.ProcedureExecutor(1481): CODE-BUG: Uncaught runtime exception: 
pid=5, ppid=4, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure 
failedMetaServer=localhost,104,1, splitWal=false
java.lang.ClassCastException: 
org.apache.hadoop.hbase.master.assignment.MockMasterServices cannot be cast to 
org.apache.hadoop.hbase.master.HMaster
at 
org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.prepare(RecoverMetaProcedure.java:253)
at 
org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:96)
at 
org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:51)
at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182)
at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19815:
--
Attachment: HBASE-19815.branch-2.001.patch

> Flakey TestAssignmentManager.testAssignWithRandExec
> ---
>
> Key: HBASE-19815
> URL: https://issues.apache.org/jira/browse/HBASE-19815
> Project: HBase
>  Issue Type: Bug
>  Components: flakey, test
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-19815.branch-2.001.patch
>
>
> Saw the below in flakies failures 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
>  Seems to be highest failing incidence in branch-2.
> {code}
> 2018-01-17 15:43:52,872 ERROR [ProcExecWrkr-12] 
> procedure2.ProcedureExecutor(1481): CODE-BUG: Uncaught runtime exception: 
> pid=5, ppid=4, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure 
> failedMetaServer=localhost,104,1, splitWal=false
> java.lang.ClassCastException: 
> org.apache.hadoop.hbase.master.assignment.MockMasterServices cannot be cast 
> to org.apache.hadoop.hbase.master.HMaster
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.prepare(RecoverMetaProcedure.java:253)
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:96)
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:51)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19815:
--
Fix Version/s: 2.0.0-beta-2
   Status: Patch Available  (was: Open)

.001 Fix cast failure.
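The attached patch is authoritative; purely as a sketch of the shape such a fix 
can take (the variable names and the guard below are illustrative, not the 
patch itself):
{code}
// MockMasterServices implements MasterServices but is not an HMaster, so a
// blind (HMaster) cast in RecoverMetaProcedure.prepare() can throw
// ClassCastException under test. Guarding the downcast keeps mocks usable.
MasterServices services = env.getMasterServices();
if (services instanceof HMaster) {
  HMaster master = (HMaster) services;
  // HMaster-specific preparation would go here.
}
{code}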

> Flakey TestAssignmentManager.testAssignWithRandExec
> ---
>
> Key: HBASE-19815
> URL: https://issues.apache.org/jira/browse/HBASE-19815
> Project: HBase
>  Issue Type: Bug
>  Components: flakey, test
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19815.branch-2.001.patch
>
>
> Saw the below in flakies failures 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
>  Seems to be highest failing incidence in branch-2.
> {code}
> 2018-01-17 15:43:52,872 ERROR [ProcExecWrkr-12] 
> procedure2.ProcedureExecutor(1481): CODE-BUG: Uncaught runtime exception: 
> pid=5, ppid=4, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure 
> failedMetaServer=localhost,104,1, splitWal=false
> java.lang.ClassCastException: 
> org.apache.hadoop.hbase.master.assignment.MockMasterServices cannot be cast 
> to org.apache.hadoop.hbase.master.HMaster
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.prepare(RecoverMetaProcedure.java:253)
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:96)
>   at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.executeFromState(RecoverMetaProcedure.java:51)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:182)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19780) Change execution phase of checkstyle plugin back to default 'verify'

2018-01-17 Thread Jan Hentschel (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329245#comment-16329245
 ] 

Jan Hentschel commented on HBASE-19780:
---

The goal of HBASE-12521 is to have Checkstyle run as part of the actual build, 
and I personally prefer it that way, but I see your point. If we do it that way 
we should add a short hint in the documentation that {{mvn 
checkstyle:checkstyle}} needs to work locally before submitting a patch, and 
that committers should pay attention to that point in the pre-commit job. In 
that case it also isn't necessary to define Checkstyle in the sub-modules, as 
long as the plugin is configured in the top-level POM.

> Change execution phase of checkstyle plugin back to default 'verify'
> 
>
> Key: HBASE-19780
> URL: https://issues.apache.org/jira/browse/HBASE-19780
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>Priority: Major
> Attachments: HBASE-19780.master.001.patch, 
> HBASE-19780.master.002.patch
>
>
> Not able to run the following command successfully:
> {{mvn -DskipTests install site 
> -Dmaven.repo.local=/Users/appy/Desktop/temp_repo}}
> Use a clean separate repo so that existing packages don't pollute the build.
> Error is following.
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project 
> hbase: failed to get report for 
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on 
> project hbase-error-prone: Execution checkstyle of goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check failed: Plugin 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of its 
> dependencies could not be resolved: Failure to find 
> org.apache.hbase:hbase-checkstyle:jar:2.0.0-beta-1 in 
> http://repository.apache.org/snapshots/ was cached in the local repository, 
> resolution will not be reattempted until the update interval of 
> apache.snapshots has elapsed or updates are forced -> [Help 1]
> {noformat}
> Note that the master build goes past this point.
> Need to figure out what the difference is and fix the overall build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19598:
--
Attachment: HBASE-19598.master.003.patch

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, 
> HBASE-19598.master.002.patch, HBASE-19598.master.003.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329250#comment-16329250
 ] 

Hadoop QA commented on HBASE-19598:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
55s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
 9s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
13s{color} | {color:red} hbase-server: The patch generated 2 new + 472 
unchanged - 0 fixed = 474 total (was 472) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
40s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hbase-zookeeper in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 35m 13s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestClientClusterStatus |
|   | hadoop.hbase.TestClientClusterMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19598 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906340/HBASE-19598.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9b4dff3cbbda 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / a3c98b2dd8 |
| maven | version: Apache Maven 3.5.2 
(138edd6

[jira] [Commented] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329249#comment-16329249
 ] 

stack commented on HBASE-19598:
---

.003 Fix checkstyle and failing test

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, 
> HBASE-19598.master.002.patch, HBASE-19598.master.003.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19809) Fix findbugs and error-prone warnings in hbase-procedure (branch-2)

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19809:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2 and master. Thanks [~psomogyi]

> Fix findbugs and error-prone warnings in hbase-procedure (branch-2)
> ---
>
> Key: HBASE-19809
> URL: https://issues.apache.org/jira/browse/HBASE-19809
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19809.master.001.patch, 
> HBASE-19809.master.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19810) Fix findbugs and error-prone warnings in hbase-metrics (branch-2)

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19810:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2. Thanks for the patch [~psomogyi]

> Fix findbugs and error-prone warnings in hbase-metrics (branch-2)
> -
>
> Key: HBASE-19810
> URL: https://issues.apache.org/jira/browse/HBASE-19810
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19810.master.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19735) Create a minimal "client" tarball installation

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329268#comment-16329268
 ] 

stack commented on HBASE-19735:
---

{quote}Was able to run a basic test against a cluster as well as use the hbase 
shell.
{quote}
Ain't you fancy!

MR?

> Create a minimal "client" tarball installation
> --
>
> Key: HBASE-19735
> URL: https://issues.apache.org/jira/browse/HBASE-19735
> Project: HBase
>  Issue Type: New Feature
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-19735.001.branch-2.patch, 
> HBASE-19735.002.branch-2.patch
>
>
> We're moving ourselves towards more controlled dependencies. A logical next 
> step is to try to do the same for our "binary" artifacts that we create 
> during releases.
> There is code (ours and our dependencies') which the HMaster and RegionServer 
> require which, obviously, clients do not need.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19280) Move HFileWriterImpl.compressionByName(String name) to some utility class

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19280:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> Move HFileWriterImpl.compressionByName(String name) to some utility class
> -
>
> Key: HBASE-19280
> URL: https://issues.apache.org/jira/browse/HBASE-19280
> Project: HBase
>  Issue Type: Bug
>Reporter: Ankit Singhal
>Priority: Trivial
> Fix For: 2.0.0
>
>
> This method can be moved to some utility (related jira PHOENIX-4368).
> {code}
> public static Compression.Algorithm compressionByName(String algoName) {
>   if (algoName == null) {
>     return HFile.DEFAULT_COMPRESSION_ALGORITHM;
>   }
>   return Compression.getCompressionAlgorithmByName(algoName);
> }
> {code}
> FYI, [~elserj]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19678) HBase Admin security capabilities should be represented as a Set

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329286#comment-16329286
 ] 

stack commented on HBASE-19678:
---

From HBASE-19679: "OK, so I made a mistake here. I posted the same patch under 
two different tickets. The ticket HBASE-19678 should be re-opened as that 
ticket points out a larger structural issue. This ticket should be closed 
because the patch that was submitted as part of HBASE-19678 is a duplicate of 
the one provided here and was already applied."

> HBase Admin security capabilities should be represented as a Set
> 
>
> Key: HBASE-19678
> URL: https://issues.apache.org/jira/browse/HBASE-19678
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 2.0.0-beta-2
>
>
> {code:title=org.apache.hadoop.hbase.client.Admin}
>   /**
>    * Return the set of supported security capabilities.
>    * @throws IOException
>    * @throws UnsupportedOperationException
>    */
>   List<SecurityCapability> getSecurityCapabilities() throws IOException;
> {code}
> The comment says a "set" but it returns a List.  A Set would be the most 
> appropriate data structure here, an immutable one perhaps, because the code 
> that interacts with it looks up information using the _contains_ method which 
> would be served well by a Set.  Please change this interface to return a Set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19678) HBase Admin security capabilities should be represented as a Set

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19678:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> HBase Admin security capabilities should be represented as a Set
> 
>
> Key: HBASE-19678
> URL: https://issues.apache.org/jira/browse/HBASE-19678
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 2.0.0
>
>
> {code:title=org.apache.hadoop.hbase.client.Admin}
>   /**
>    * Return the set of supported security capabilities.
>    * @throws IOException
>    * @throws UnsupportedOperationException
>    */
>   List<SecurityCapability> getSecurityCapabilities() throws IOException;
> {code}
> The comment says a "set" but it returns a List.  A Set would be the most 
> appropriate data structure here, an immutable one perhaps, because the code 
> that interacts with it looks up information using the _contains_ method which 
> would be served well by a Set.  Please change this interface to return a Set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19678) HBase Admin security capabilities should be represented as a Set

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329297#comment-16329297
 ] 

stack commented on HBASE-19678:
---

[~yuzhih...@gmail.com] I have to do my own investigation as to what is going on 
in this issue. Why no clarifying note on the end here that explains the state? 
(The Author field is wrong in the commits, as is adding BELUGA BEHR as a suffix 
in parentheses.)

The committed patch looks like it got applied and then reverted:

{noformat}
commit c2ca90f0fb5177372e9f72917d67b49014a54b5b
Author: tedyu 
Date: Sun Dec 31 11:32:14 2017 -0800

    HBASE-19678 HBase Admin security capabilities should be represented as a Set (BELUGA BEHR)

commit c394f3919e7981247c60a3d3b075ee554cee826b
Author: tedyu 
Date: Mon Jan 1 14:16:46 2018 -0800

    HBASE-19678 HBase Admin security capabilities should be represented as a Set - revert due to wrong issue
{noformat}

> HBase Admin security capabilities should be represented as a Set
> 
>
> Key: HBASE-19678
> URL: https://issues.apache.org/jira/browse/HBASE-19678
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 2.0.0
>
>
> {code:title=org.apache.hadoop.hbase.client.Admin}
>   /**
>    * Return the set of supported security capabilities.
>    * @throws IOException
>    * @throws UnsupportedOperationException
>    */
>   List<SecurityCapability> getSecurityCapabilities() throws IOException;
> {code}
> The comment says a "set" but it returns a List.  A Set would be the most 
> appropriate data structure here, an immutable one perhaps, because the code 
> that interacts with it looks up information using the _contains_ method which 
> would be served well by a Set.  Please change this interface to return a Set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19143) Add has(Option) to ClusterStatus

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19143:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> Add has(Option) to ClusterStatus
> 
>
> Key: HBASE-19143
> URL: https://issues.apache.org/jira/browse/HBASE-19143
> Project: HBase
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.0.0
>
>
> It helps the user to distinguish between nothing and you-do-not-ask.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19151) hbase shell from bin tarball warns of missing gems

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19151:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> hbase shell from bin tarball warns of missing gems
> --
>
> Key: HBASE-19151
> URL: https://issues.apache.org/jira/browse/HBASE-19151
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0-alpha-4
>Reporter: Mike Drob
>Priority: Minor
> Fix For: 2.0.0
>
>
> {noformat}
> mdrob@mdrob-MBP:/tmp/hb2a4/hbase-2.0.0-alpha4$ bin/hbase shell
> 2017-11-01 14:39:31,637 WARN  [main] util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HBase Shell
> Use "help" to get list of supported commands.
> Use "exit" to quit this interactive shell.
> Version 2.0.0-alpha4, r5c4b985f89c99cc8b0f8515a4097c811a0848835, Tue Oct 31 
> 16:00:33 PDT 2017
> Took 0.0029 seconds
> Ignoring executable-hooks-1.3.2 because its extensions are not built. Try: 
> gem pristine executable-hooks --version 1.3.2
> Ignoring gem-wrappers-1.2.7 because its extensions are not built. Try: gem 
> pristine gem-wrappers --version 1.2.7
> Ignoring rainbow-2.2.2 because its extensions are not built. Try: gem 
> pristine rainbow --version 2.2.2
> {noformat}
> We should not have warnings like that - either bundle the gems or figure out 
> how to remove them from the execution path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18800) [Propaganda] Push shaded hbase-client as gateway to an hbase cluster going forward

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18800:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> [Propaganda] Push shaded hbase-client as gateway to an hbase cluster going 
> forward
> --
>
> Key: HBASE-18800
> URL: https://issues.apache.org/jira/browse/HBASE-18800
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Minor
> Fix For: 2.0.0
>
>
> We've bandied this about for a while now that folks should consume hbase via 
> the shaded hbase-client; it should work if their needs are minimal (and if it 
> doesn't work, would be good to hear why). This issue is about evangelizing 
> the shaded hbase-client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster and taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened most of the 
region servers in cluster 1 showed this message in their logs repeatedly:

{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

 

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 was 
elevated. Some region servers reported ageOfLastShippedOperation to be close to 
an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement to make would be to force a refresh of the sink 
cluster region server list when an UnknownHostException is found. This is 
already done if there is a ConnectException in 
HBaseInterClusterReplicationEndpoint:

{code}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
}
{code}

I propose that this be extended to cover UnknownHostException.
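A minimal sketch of that change against the branch shown above (only the extra 
instanceof check is new):
{code}
} else if (ioe instanceof ConnectException || ioe instanceof UnknownHostException) {
  // An unknown host means the cached sink list is stale; re-pick sinks just
  // as is already done for connection failures.
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
}
{code}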

We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster and taking all live traffic which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened most of the 
regions servers in cluster 1 showed this message in their logs repeatedly. 

 

{{2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:}}
{{java.net.UnknownHostException: data-017b.hbase-2.prod}}
{{ at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)}}
{{ at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)}}
{{ at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)}}
{{ at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)}}
{{ at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)}}
{{ at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)}}
{{ at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicatio

[jira] [Created] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)
Scott Wilson created HBASE-19816:


 Summary: Replication sink list is not updated on 
UnknownHostException
 Key: HBASE-19816
 URL: https://issues.apache.org/jira/browse/HBASE-19816
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 1.2.0, 2.0.0
 Environment: We have two clusters set up with bi-directional 
replication. The clusters are around 400 nodes each and hosted in AWS.
Reporter: Scott Wilson


We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.

 

{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

 

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an UnknownHostException is encountered. This is 
already done when there is a ConnectException in 
HBaseInterClusterReplicationEndpoint:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover UnknownHostException.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.

 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
 

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an UnknownHostException is encountered. This is 
already done when there is a ConnectException in 
HBaseInterClusterReplicationEndpoint:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover UnknownHostException.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.

 

{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{code}

[jira] [Commented] (HBASE-19780) Change execution phase of checkstyle plugin back to default 'verify'

2018-01-17 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329314#comment-16329314
 ] 

Mike Drob commented on HBASE-19780:
---

It still runs as part of {{mvn verify}}, right? I'm confused why it doesn't run 
as part of {{mvn install}} then, and I disagree with that. Yes, it should be 
run; if you really want to squeeze out an extra few seconds there's always 
{{-Dmaven.checkstyle.skip}}, although I might be slightly off on the property.
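
(A sketch of that invocation, assuming the plugin's standard skip user 
property is {{checkstyle.skip}}:)

{noformat}
mvn install -Dcheckstyle.skip=true
{noformat}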

> Change execution phase of checkstyle plugin back to default 'verify'
> 
>
> Key: HBASE-19780
> URL: https://issues.apache.org/jira/browse/HBASE-19780
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>Priority: Major
> Attachments: HBASE-19780.master.001.patch, 
> HBASE-19780.master.002.patch
>
>
> Not able to run the following command successfully:
> {{mvn -DskipTests install site 
> -Dmaven.repo.local=/Users/appy/Desktop/temp_repo}}
> Use a clean separate repo so that existing packages don't pollute the build.
> The error is the following.
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project 
> hbase: failed to get report for 
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on 
> project hbase-error-prone: Execution checkstyle of goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check failed: Plugin 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of its 
> dependencies could not be resolved: Failure to find 
> org.apache.hbase:hbase-checkstyle:jar:2.0.0-beta-1 in 
> http://repository.apache.org/snapshots/ was cached in the local repository, 
> resolution will not be reattempted until the update interval of 
> apache.snapshots has elapsed or updates are forced -> [Help 1]
> {noformat}
> Note that master build goes past this point.
> Need to figure out what's the difference and fix the overall build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an UnknownHostException is encountered. This is 
already done when there is a ConnectException in 
HBaseInterClusterReplicationEndpoint:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover UnknownHostException.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.

 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concur

[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2, which involves deleting the 
instance and its DNS record. After this happened, most of the region servers 
in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an UnknownHostException is encountered. This is 
already done when there is a ConnectException in 
HBaseInterClusterReplicationEndpoint:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover UnknownHostException.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2. After this happened, most of 
the region servers in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$Runn

[jira] [Resolved] (HBASE-19716) Fix flaky test master.assignment.TestAssignmentManager

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19716.
---
Resolution: Duplicate

Resolving as dupe of HBASE-19815? Seems like this test mostly passes now at 
least on branch-2 [~uagashe]

> Fix flaky test master.assignment.TestAssignmentManager
> --
>
> Key: HBASE-19716
> URL: https://issues.apache.org/jira/browse/HBASE-19716
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.0.0-beta-1
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0-beta-2
>
>
> Fix flaky test:
> master.assignment.TestAssignmentManager   89.7% (70 / 78).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2, which involves deleting the 
instance and its DNS record. After this happened, most of the region servers 
in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an {{UnknownHostException}} is encountered. This is 
already done when there is a {{ConnectException}} in 
{{HBaseInterClusterReplicationEndpoint.java}}:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover UnknownHostException.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2, which involves deleting the 
instance and its DNS record. After this happened, most of the region servers 
in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.

[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Description: 
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2, which involves deleting the 
instance and its DNS record. After this happened, most of the region servers 
in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The host data-017b.hbase-2.prod was one of those that had been removed from 
cluster 2. Next, we observed that our replication lag from cluster 1 to 
cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
to be close to an hour.

The only way we found to clear the message was to restart the region servers 
that showed this message in the log. Once we did, replication returned to 
normal. Restarting the affected region servers in cluster 1 took several days 
because we could not bring the cluster down.

From reading the code, it appears the cause was the zookeeper watch not being 
triggered for the region server list change in cluster 2. We verified the list 
in zookeeper for cluster 2 was correct and did not include the removed nodes.

One concrete improvement would be to force a refresh of the sink cluster 
region server list when an {{UnknownHostException}} is encountered. This is 
already done when there is a {{ConnectException}} in 
{{HBaseInterClusterReplicationEndpoint.java}}:

{code:java}
} else if (ioe instanceof ConnectException) {
  LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
  replicationSinkMgr.chooseSinks();
{code}

I propose that this handling be extended to cover {{UnknownHostException}}.

We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
still exists on the current master branch.

 

  was:
We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
cluster, taking all live traffic, which is replicated to cluster 2. We 
decommissioned several instances in cluster 2, which involves deleting the 
instance and its DNS record. After this happened, most of the region servers 
in cluster 1 showed this message in their logs repeatedly.
 
{code}
2018-01-12 23:49:36,507 WARN 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
 Can't replicate because of a local or network error:
java.net.UnknownHostException: data-017b.hbase-2.prod
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
at 
org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
at java.util.concurr

[jira] [Commented] (HBASE-19780) Change execution phase of checkstyle plugin back to default 'verify'

2018-01-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329333#comment-16329333
 ] 

Appy commented on HBASE-19780:
--

bq. If we do it that way we should add a short hint in the documentation that 
mvn checkstyle:checkstyle needs to work locally before submitting a patch ...
Can add "Submitting Patches" section in documentation. Was trying it locally 
first, added a checkstyle error to ForeignExceptionUtil and ran {{mvn 
checkstyle:checkstyle}}, but the command didn't fail (with patch 003 above). 

bq. ...and that committers should pay attention to that point in the pre-commit 
job.
New checkstyle warnings will show up as a +1/-1 in the Hadoop QA result, so 
that's covered.

bq. If we do it that way it isn't necessary to define Checkstyle in the 
sub-modules, as long as the plugin is configured in the top-level POM.
If we remove checkstyle definitions from sub-modules, we'll lose the nice 
failOnViolation settings that you have added to prevent regressions. They are 
great! I'd like to keep them.

Here's a suggestion building on your previous one:
# Let's add a recommendation in the documentation to run {{mvn 
checkstyle:check}} before submitting patches, since it'll catch checkstyle 
violations in modules which are perfectly clean (see the sketch below).
# Add {{checkstyle:check}} to the main pre-commit build. If there is any 
violation in these clean modules (towards which you have put great effort), 
then the pre-commit will also fail at the mvn install step, which is an 
important one. Thus, clean checkstyle in these modules becomes a hard 
pre-commit requirement *indirectly*.
If you agree, let's put a note on dev@ proposing these changes.
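
For illustration, the pre-submit step in point 1 would be (both are stock 
maven-checkstyle-plugin goals; {{checkstyle:check}} fails the build on 
violations, while {{checkstyle:checkstyle}} only generates the report, which 
would explain why the latter didn't fail above):

{noformat}
mvn checkstyle:check
{noformat}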

In the meantime, does 003 (002 + fixing the bad pom) seem good to you for 
committing?

> Change execution phase of checkstyle plugin back to default 'verify'
> 
>
> Key: HBASE-19780
> URL: https://issues.apache.org/jira/browse/HBASE-19780
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>Priority: Major
> Attachments: HBASE-19780.master.001.patch, 
> HBASE-19780.master.002.patch
>
>
> Not able to run the following command successfully:
> {{mvn -DskipTests install site 
> -Dmaven.repo.local=/Users/appy/Desktop/temp_repo}}
> Use a clean separate repo so that existing packages don't pollute the build.
> The error is the following.
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project 
> hbase: failed to get report for 
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on 
> project hbase-error-prone: Execution checkstyle of goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check failed: Plugin 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of its 
> dependencies could not be resolved: Failure to find 
> org.apache.hbase:hbase-checkstyle:jar:2.0.0-beta-1 in 
> http://repository.apache.org/snapshots/ was cached in the local repository, 
> resolution will not be reattempted until the update interval of 
> apache.snapshots has elapsed or updates are forced -> [Help 1]
> {noformat}
> Note that master build goes past this point.
> Need to figure out what's the difference and fix the overall build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-19657) Reenable TestAssignmentManagerMetrics for beta-2

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19657.
---
Resolution: Invalid

HBASE-19598 does the reenable, so this issue is not needed.

> Reenable TestAssignmentManagerMetrics for beta-2
> 
>
> Key: HBASE-19657
> URL: https://issues.apache.org/jira/browse/HBASE-19657
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: Balazs Meszaros
>Priority: Major
> Fix For: 2.0.0-beta-2
>
>
> Was disabled by HBASE-19656. Reenable for beta-2 after HBASE-19598 is done.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19586) Figure how to enable compression by default (fallbacks if native is missing, etc.)

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19586:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> Figure how to enable compression by default (fallbacks if native is missing, 
> etc.)
> --
>
> Key: HBASE-19586
> URL: https://issues.apache.org/jira/browse/HBASE-19586
> Project: HBase
>  Issue Type: Sub-task
>  Components: defaults
>Reporter: stack
>Priority: Major
> Fix For: 2.0.0
>
>
> See parent issue where the benefits of enabling compression are brought up 
> (again!). Figure how we can make it work out of the box rather than expect 
> the user to set it up. Parking this issue to look at it before we release 
> 2.0.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19256) [hbase-thirdparty] shade jetty

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19256:
--
Fix Version/s: (was: thirdparty-2.0.0)
   (was: 2.0.0-beta-2)
   2.0.0

> [hbase-thirdparty] shade jetty
> --
>
> Key: HBASE-19256
> URL: https://issues.apache.org/jira/browse/HBASE-19256
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, thirdparty
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19005) Mutation batch should not accept operations with different durabilities

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19005:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> Mutation batch should not accept operations with different durabilities
> ---
>
> Key: HBASE-19005
> URL: https://issues.apache.org/jira/browse/HBASE-19005
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-3
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
>
> Javadoc and change client side API to not accept operations with different 
> durabilities in a mutation batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19005) Mutation batch should not accept operations with different durabilities

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329351#comment-16329351
 ] 

stack commented on HBASE-19005:
---

Seems critical. Not being worked on though (correct me if I am wrong). Moving 
out of beta-2.

> Mutation batch should not accept operations with different durabilities
> ---
>
> Key: HBASE-19005
> URL: https://issues.apache.org/jira/browse/HBASE-19005
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-3
>Reporter: Umesh Agashe
>Assignee: Umesh Agashe
>Priority: Major
> Fix For: 2.0.0
>
>
> Javadoc and change client side API to not accept operations with different 
> durabilities in a mutation batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18730) [pom cleanup] Purge junit from parent pom; not all modules have tests

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18730:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> [pom cleanup] Purge junit from parent pom; not all modules have tests
> -
>
> Key: HBASE-18730
> URL: https://issues.apache.org/jira/browse/HBASE-18730
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, pom
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Fix For: 2.0.0
>
>
> Just removing is not enough (tried it in parent task).
> Our build calls test in each module, which trips a complaint/failure if no 
> junit dependency is present.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18442) Speed up Memstore chunk pool ByteBuffer allocations

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329354#comment-16329354
 ] 

stack commented on HBASE-18442:
---

Any progress here [~ram_krish] ?

> Speed up Memstore chunk pool ByteBuffer allocations
> ---
>
> Key: HBASE-18442
> URL: https://issues.apache.org/jira/browse/HBASE-18442
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-18442_1.patch, HBASE-18442_2.patch, 
> HBASE-18442_3.patch, HBASE-18442_4.patch
>
>
> Like in HBASE-17738 we can speed up the allocation of memstore chunk pool's 
> ByteBuffers.
> {code}
> 2017-07-24 17:51:09,726 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.ChunkCreator: Allocating MemStoreChunkPool with chunk size 2 MB, 
> max count 12288, initial count 12288
> 2017-07-24 17:51:19,642 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.HRegionServer: Serving as stobdtserver6,16020,1500898858958, 
> RpcServer on stobdtserver6/10.66.254.41:16020, sessionid=0x15d748a9ccc0002
> {code}
> Allocating 12288 buffers of 2MB size takes around 10 secs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-18442) Speed up Memstore chunk pool ByteBuffer allocations

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-18442:
--
Fix Version/s: (was: 2.0.0-beta-2)
   2.0.0

> Speed up Memstore chunk pool ByteBuffer allocations
> ---
>
> Key: HBASE-18442
> URL: https://issues.apache.org/jira/browse/HBASE-18442
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-18442_1.patch, HBASE-18442_2.patch, 
> HBASE-18442_3.patch, HBASE-18442_4.patch
>
>
> Like in HBASE-17738 we can speed up the allocation of memstore chunk pool's 
> ByteBuffers.
> {code}
> 2017-07-24 17:51:09,726 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.ChunkCreator: Allocating MemStoreChunkPool with chunk size 2 MB, 
> max count 12288, initial count 12288
> 2017-07-24 17:51:19,642 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.HRegionServer: Serving as stobdtserver6,16020,1500898858958, 
> RpcServer on stobdtserver6/10.66.254.41:16020, sessionid=0x15d748a9ccc0002
> {code}
> Allocating 12288 buffers of 2MB size takes around 10 secs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18442) Speed up Memstore chunk pool ByteBuffer allocations

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329355#comment-16329355
 ] 

stack commented on HBASE-18442:
---

Moving this out; it's a nice-to-have.

> Speed up Memstore chunk pool ByteBuffer allocations
> ---
>
> Key: HBASE-18442
> URL: https://issues.apache.org/jira/browse/HBASE-18442
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-18442_1.patch, HBASE-18442_2.patch, 
> HBASE-18442_3.patch, HBASE-18442_4.patch
>
>
> Like in HBASE-17738 we can speed up the allocation of memstore chunk pool's 
> ByteBuffers.
> {code}
> 2017-07-24 17:51:09,726 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.ChunkCreator: Allocating MemStoreChunkPool with chunk size 2 MB, 
> max count 12288, initial count 12288
> 2017-07-24 17:51:19,642 INFO  [regionserver/stobdtserver6/10.66.254.41:16020] 
> regionserver.HRegionServer: Serving as stobdtserver6,16020,1500898858958, 
> RpcServer on stobdtserver6/10.66.254.41:16020, sessionid=0x15d748a9ccc0002
> {code}
> Allocating 12288 buffers of 2MB size takes around 10 secs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19780) Change execution phase of checkstyle plugin back to default 'verify'

2018-01-17 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy updated HBASE-19780:
-
Attachment: HBASE-19780.master.003.patch

> Change execution phase of checkstyle plugin back to default 'verify'
> 
>
> Key: HBASE-19780
> URL: https://issues.apache.org/jira/browse/HBASE-19780
> Project: HBase
>  Issue Type: Bug
>Reporter: Appy
>Assignee: Appy
>Priority: Major
> Attachments: HBASE-19780.master.001.patch, 
> HBASE-19780.master.002.patch, HBASE-19780.master.003.patch
>
>
> Not able to run the following command successfully:
> {{mvn -DskipTests install site 
> -Dmaven.repo.local=/Users/appy/Desktop/temp_repo}}
> Use a clean separate repo so that existing packages don't pollute the build.
> The error is the following.
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on project 
> hbase: failed to get report for 
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (checkstyle) on 
> project hbase-error-prone: Execution checkstyle of goal 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check failed: Plugin 
> org.apache.maven.plugins:maven-checkstyle-plugin:2.17 or one of its 
> dependencies could not be resolved: Failure to find 
> org.apache.hbase:hbase-checkstyle:jar:2.0.0-beta-1 in 
> http://repository.apache.org/snapshots/ was cached in the local repository, 
> resolution will not be reattempted until the update interval of 
> apache.snapshots has elapsed or updates are forced -> [Help 1]
> {noformat}
> Note that master build goes past this point.
> Need to figure out what's the difference and fix the overall build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329378#comment-16329378
 ] 

Ted Yu commented on HBASE-19816:


Do you want to attach a patch with your proposal?

> Replication sink list is not updated on UnknownHostException
> 
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional 
> replication. The clusters are around 400 nodes each and hosted in AWS.
>Reporter: Scott Wilson
>Priority: Major
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
> cluster, taking all live traffic, which is replicated to cluster 2. We 
> decommissioned several instances in cluster 2, which involves deleting the 
> instance and its DNS record. After this happened, most of the region servers 
> in cluster 1 showed this message in their logs repeatedly. 
>  
> {code}
> 2018-01-12 23:49:36,507 WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Can't replicate because of a local or network error:
> java.net.UnknownHostException: data-017b.hbase-2.prod
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(AbstractRpcClient.java:315)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The host data-017b.hbase-2.prod was one of those that had been removed from 
> cluster 2. Next, we observed that our replication lag from cluster 1 to 
> cluster 2 was elevated. Some region servers reported ageOfLastShippedOperation 
> to be close to an hour.
> The only way we found to clear the message was to restart the region servers 
> that showed this message in the log. Once we did, replication returned to 
> normal. Restarting the affected region servers in cluster 1 took several days 
> because we could not bring the cluster down.
> From reading the code, it appears the cause was the zookeeper watch not being 
> triggered for the region server list change in cluster 2. We verified the 
> list in zookeeper for cluster 2 was correct and did not include the removed 
> nodes.
> One concrete improvement would be to force a refresh of the sink cluster 
> region server list when an {{UnknownHostException}} is encountered. This is 
> already done when there is a {{ConnectException}} in 
> {{HBaseInterClusterReplicationEndpoint.java}}:
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that this handling be extended to cover {{UnknownHostException}}.
> We observed this behavior on 1.2.0-cdh-5.11.1, but it appears the same code 
> still exists on the current master branch.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-19770:
---
Attachment: HBASE-19770.004.branch-2.patch

> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch, 
> HBASE-19770.004.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329386#comment-16329386
 ] 

Josh Elser commented on HBASE-19770:


.004 is the patch I had committed to branch-2 and master, but Mike pointed out 
that I should have used {{unless}} instead of the {{if not}} I did use.

> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch, 
> HBASE-19770.004.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19770) Add '--return-values' option to Shell to print return values of commands in interactive mode

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329399#comment-16329399
 ] 

Hadoop QA commented on HBASE-19770:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-19770 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.6.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-19770 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12905736/HBASE-19770.001.branch-2.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11087/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> Add '--return-values' option to Shell to print return values of commands in 
> interactive mode
> 
>
> Key: HBASE-19770
> URL: https://issues.apache.org/jira/browse/HBASE-19770
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Romil Choksi
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19770.001.branch-2.patch, 
> HBASE-19770.002.branch-2.patch, HBASE-19770.003.branch-2.patch, 
> HBASE-19770.004.branch-2.patch
>
>
> Another good find by our Romil.
> {code}
> hbase(main):001:0> list
> TABLE
> a
> 1 row(s)
> Took 0.8385 seconds
> hbase(main):002:0> tables=list
> TABLE
> a
> 1 row(s)
> Took 0.0267 seconds
> hbase(main):003:0> puts tables
> hbase(main):004:0> p tables
> nil
> {code}
> The {{list}} command should be returning {{\['a'\]}} but is not.
> The command class itself appears to be doing the right thing -- maybe the 
> retval is getting lost somewhere else?
> FYI [~stack].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19812:
--
Attachment: 19812.patch

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 19812.patch
>
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do 
> an in-memory compaction; at the same time we call region.flush from the test, 
> and it finds that the region is already flushing, so it gives up.
> This test is a bit awkward: we create the table with 6 regions whose start 
> keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only 
> one region has data. In the above scenario that one region gives up flushing, 
> then there is no data, and our test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329405#comment-16329405
 ] 

stack commented on HBASE-19812:
---

Here is a patch to disable in-memory compaction in this test.
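
Not the attached patch itself, but the shape of the idea using the HBase 2 
descriptor builders (the wrapper class/method names are illustrative):

{code:java}
import org.apache.hadoop.hbase.MemoryCompactionPolicy;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

final class TestTableDescriptors {
  // Sketch: describe the test table with in-memory compaction disabled for
  // 'fam', so the test's explicit region flush is not raced by the
  // CompactingMemStore's in-memory compaction.
  static TableDescriptor withoutInMemoryCompaction() {
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("fam"))
            .setInMemoryCompaction(MemoryCompactionPolicy.NONE)
            .build())
        .build();
  }
}
{code}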

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 19812.patch
>
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do 
> an in-memory compaction; at the same time we call region.flush from the test, 
> and it finds that the region is already flushing, so it gives up.
> This test is a bit awkward: we create the table with 6 regions whose start 
> keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only 
> one region has data. In the above scenario that one region gives up flushing, 
> then there is no data, and our test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19812) TestFlushSnapshotFromClient fails because of failing region.flush

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19812:
--
Status: Patch Available  (was: Open)

> TestFlushSnapshotFromClient fails because of failing region.flush
> -
>
> Key: HBASE-19812
> URL: https://issues.apache.org/jira/browse/HBASE-19812
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 19812.patch
>
>
> {noformat}
> 2018-01-17 06:43:48,390 INFO  [MemStoreFlusher.1] regionserver.HRegion(2516): 
> Flushing 1/1 column families, memstore=549.25 KB
> 2018-01-17 06:43:48,390 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactingMemStore(205): FLUSHING TO DISK: region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a.store: fam
> 2018-01-17 06:43:48,406 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=42601-inmemoryCompactions-1516171428312]
>  regionserver.CompactionPipeline(206): Compaction pipeline segment 
> Type=CSLMImmutableSegment, empty=no, cellCount=17576, cellSize=562432, 
> totalHeapSize=1828120, min timestamp=1516171428258, max 
> timestamp=1516171428258Num uniques -1;  flattened
> 2018-01-17 06:43:48,406 DEBUG [MemStoreFlusher.1] 
> regionserver.CompactionPipeline(128): Swapping pipeline suffix; before=1, new 
> segement=null
> 2018-01-17 06:43:48,455 DEBUG [Time-limited test] regionserver.HRegion(2201): 
> NOT flushing memstore for region 
> test,5,1516171425662.acafc22e1f8132285eae5362d0df536a., flushing=true, 
> writesEnabled=true
> {noformat}
> You can see that we start a background flush first, and then we decide to do 
> an in-memory compaction; at the same time we call region.flush from the test, 
> and it finds that the region is already flushing, so it gives up.
> This test is a bit awkward: we create the table with 6 regions whose start 
> keys are 0,1,2,3,4,5, but when loading data we use 'aaa' to 'zzz', so only 
> one region has data. In the above scenario that one region gives up flushing, 
> then there is no data, and our test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19735) Create a minimal "client" tarball installation

2018-01-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329411#comment-16329411
 ] 

Josh Elser commented on HBASE-19735:


bq. Ain't you fancy!

;)

bq. MR?

Didn't get there yet: I got stuck trying to use the hbase-shaded-client with 
that same basic client. For some reason that still eludes me, the client got 
stuck waiting to pull hbaseid out of ZK.

I think I know the right way to do this all in one Maven module, so I'm 
consolidating that now. Will resume testing after I make sure the approach is 
kosher.
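
For anyone chasing the same hang: the znode the client waits on can be checked 
directly with the plain ZooKeeper client. A minimal sketch, assuming the default 
zookeeper.znode.parent of /hbase and a local quorum:

{code:java}
import org.apache.zookeeper.ZooKeeper;

// Sketch: read the cluster-id znode a client blocks on. The connect string and
// znode path are assumptions (defaults), not taken from the failing setup.
public class HbaseIdCheck {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
    try {
      byte[] data = zk.getData("/hbase/hbaseid", false, null);
      System.out.println("hbaseid znode holds " + data.length + " bytes");
    } finally {
      zk.close();
    }
  }
}
{code}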

> Create a minimal "client" tarball installation
> --
>
> Key: HBASE-19735
> URL: https://issues.apache.org/jira/browse/HBASE-19735
> Project: HBase
>  Issue Type: New Feature
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-19735.001.branch-2.patch, 
> HBASE-19735.002.branch-2.patch
>
>
> We're moving ourselves towards more controlled dependencies. A logical next 
> step is to try to do the same for our "binary" artifacts that we create 
> during releases.
> There is code (our's and our dependency's) which the HMaster and RegionServer 
> require which, obviously, clients do not need.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329425#comment-16329425
 ] 

Hadoop QA commented on HBASE-19598:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
34s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
9s{color} | {color:red} hbase-server: The patch generated 2 new + 472 unchanged 
- 0 fixed = 474 total (was 472) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
13s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
22m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
12s{color} | {color:green} hbase-zookeeper in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 32m 59s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestClientClusterMetrics |
|   | hadoop.hbase.TestClientClusterStatus |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19598 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906340/HBASE-19598.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b98224059562 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 7224546b1e |
| maven | version: Apache Maven 3.5.2 
(138edd61f

[jira] [Updated] (HBASE-19598) Fix TestAssignmentManagerMetrics flaky test

2018-01-17 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19598:
--
Attachment: HBASE-19598.master.003.patch

> Fix TestAssignmentManagerMetrics flaky test
> ---
>
> Key: HBASE-19598
> URL: https://issues.apache.org/jira/browse/HBASE-19598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-19598.master.001.patch, 
> HBASE-19598.master.002.patch, HBASE-19598.master.003.patch, 
> HBASE-19598.master.003.patch, TestUtil.java
>
>
> TestAssignmentManagerMetrics fails constantly. After bisecting, it seems that 
> commit 010012cbcb broke it (HBASE-18946).
> The test method runs successfully, but it cannot shut the minicluster down, 
> and hangs forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing

2018-01-17 Thread huaxiang sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329455#comment-16329455
 ] 

huaxiang sun commented on HBASE-19163:
--

Hi [~saint@gmail.com], pinging for the commit to branch-1. I got a +1 from 
[~uagashe]; I need your +1 to move forward, thanks.

> "Maximum lock count exceeded" from region server's batch processing
> ---
>
> Key: HBASE-19163
> URL: https://issues.apache.org/jira/browse/HBASE-19163
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Major
> Attachments: HBASE-19163-branch-1-v001.patch, 
> HBASE-19163-branch-1-v001.patch, HBASE-19163-master-v001.patch, 
> HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, 
> HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, 
> HBASE-19163.master.006.patch, HBASE-19163.master.007.patch, 
> HBASE-19163.master.008.patch, HBASE-19163.master.009.patch, 
> HBASE-19163.master.009.patch, HBASE-19163.master.010.patch, unittest-case.diff
>
>
> In one of our use cases, we found the following exception, and replication 
> was stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN  [hconnection-0x28db294f-shared--pool4-t936] 
> client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last 
> exception: java.io.IOException: java.io.IOException: Maximum lock count 
> exceeded
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
> ... 3 more
> {code}
> While we are still examining the data pattern, it is clear that there are too 
> many mutations in the batch against the same row; this exceeds the maximum of 
> 64k shared locks, which throws an error and fails the whole batch.
> There are two approaches to solve this issue:
> 1) If there are multiple mutations against the same row in the batch, acquire 
> the row lock once per row instead of once per mutation (see the sketch after 
> this description).
> 2) Catch the error, process whatever has been locked so far, and loop back.
> With HBASE-17924, approach 1 seems easy to implement now.
> Filing this JIRA; will post updates/patches as the investigation moves 
> forward.
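
A sketch of approach 1 (illustrative names, not HRegion's actual code): 
deduplicate the batch's row keys first, then take each row's shared lock 
exactly once:

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.function.Function;

// Sketch: ReentrantReadWriteLock caps shared holds at 64k, so a batch with
// ~262k mutations against one row must not acquire the lock once per mutation.
public final class BatchRowLocks {
  // rows: one entry per mutation; lockForRow: looks up that row's lock.
  public static void runLocked(List<byte[]> rows,
      Function<byte[], ReadWriteLock> lockForRow, Runnable applyBatch) {
    // A TreeMap keyed on the row bytes collapses duplicate rows to one entry.
    Map<byte[], ReadWriteLock> distinct = new TreeMap<>(Arrays::compare); // Java 9+
    for (byte[] row : rows) {
      distinct.computeIfAbsent(row, lockForRow);
    }
    distinct.values().forEach(l -> l.readLock().lock()); // one hold per row
    try {
      applyBatch.run();
    } finally {
      distinct.values().forEach(l -> l.readLock().unlock());
    }
  }
}
{code}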



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329466#comment-16329466
 ] 

Vladimir Rodionov commented on HBASE-17852:
---

So, we are going back to procV2 and tight integration with hbase-server? 
[~appy], we used to have this before, but had to move everything out of 
hbase-server more than a year ago at [~stack]'s request. Therefore, I need 
[~stack]'s +1 on this plan before I start working on the refactoring again.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> The rollback-via-snapshot design approach implemented in this ticket (a 
> sketch follows this list):
> # Before a backup create/delete/merge starts, we take a snapshot of the 
> backup meta-table (the backup system table). This procedure is lightweight 
> because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by 
> cleaning up partial data in the backup destination, followed by restoring 
> the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for 
> example), the next time the user tries a create/merge/delete they will see 
> an error message that the system is in an inconsistent state and repair is 
> required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client 
> and the BackupObservers), we introduce a small table ONLY to keep the 
> listing of bulk loaded files. All backup observers will work only with this 
> new table. The reason: in case of a failure during backup 
> create/delete/merge/restore, when the system performs an automatic rollback, 
> some data written by backup observers during the failed operation may be 
> lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care 
> about the consistency of this table, because bulk load is an idempotent 
> operation and can be repeated after a failure. Partially written data in the 
> second table does not affect the BackupHFileCleaner plugin, because this 
> data (the list of bulk loaded files) corresponds to files which have not yet 
> been loaded successfully and, hence, are not visible to the system.
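
A minimal sketch of the rollback-via-snapshot flow above, using only the public 
Admin API; the table name, snapshot name, and wrapper method are hypothetical, 
not the actual backup implementation:

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

final class RollbackViaSnapshot {
  // Sketch: snapshot the (small) backup meta-table before the operation,
  // restore it on failure, drop the snapshot on success.
  static void runWithRollback(Connection conn, Runnable backupOperation)
      throws Exception {
    TableName meta = TableName.valueOf("backup:system");  // hypothetical name
    String snap = "backup-meta-" + System.currentTimeMillis();
    try (Admin admin = conn.getAdmin()) {
      admin.snapshot(snap, meta);       // cheap: meta table fits one region
      try {
        backupOperation.run();          // create/delete/merge
        admin.deleteSnapshot(snap);     // success: safety copy no longer needed
      } catch (Exception e) {
        admin.disableTable(meta);       // failure: roll the meta table back
        admin.restoreSnapshot(snap);
        admin.enableTable(meta);
        throw e;
      }
    }
  }
}
{code}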



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19815) Flakey TestAssignmentManager.testAssignWithRandExec

2018-01-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329477#comment-16329477
 ] 

Hadoop QA commented on HBASE-19815:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
39s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
28s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
14m  3s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 92m 
35s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}127m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db |
| JIRA Issue | HBASE-19815 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12906448/HBASE-19815.branch-2.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d740c8f10303 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2 / b4f6ae86b6 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11084/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/11084/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> Flakey TestAssignmentManager.testAssignWithRandExec
> ---
>
> Key: HBASE-19815
> URL: https://issues.apache.org/ji

[jira] [Commented] (HBASE-19794) TestZooKeeper hangs

2018-01-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329492#comment-16329492
 ] 

stack commented on HBASE-19794:
---

I can't make this hang locally or on a test machine. I see it failing 16% of 
the time according to 
[https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html].
It's a timeout.

The log has loads of threads hanging out. Some Proc workers are blocked:

{noformat}
Thread 2268 (RS_CLOSE_REGION-asf903:58756-1):
  State: BLOCKED  Blocked count: 12  Waited count: 17
  Blocked on org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode@1c0991d8
  Blocked by 2083 (ProcExecWrkr-6)
  Stack:
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportTransition(AssignmentManager.java:869)
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.updateRegionTransition(AssignmentManager.java:857)
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportRegionStateTransition(AssignmentManager.java:801)
    org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1561)
    org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionStateTransition(HRegionServer.java:2263)
    org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:121)
    org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)
Thread 2267 (RS_CLOSE_REGION-asf903:58756-0):
  State: BLOCKED  Blocked count: 14  Waited count: 17
  Blocked on org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode@75cdbae3
  Blocked by 2086 (ProcExecWrkr-9)
  Stack:
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportTransition(AssignmentManager.java:869)
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.updateRegionTransition(AssignmentManager.java:857)
    org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportRegionStateTransition(AssignmentManager.java:801)
    org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1561)
    org.apache.hadoop.hbase.regionserver.HRegionServer.reportRegionStateTransition(HRegionServer.java:2263)
    org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:121)
    org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)
{noformat}

 

The Proc Workers are not daemon threads. Let me change that so at least we stop 
timing out.
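
Illustration only (not the committed fix): the usual shape of that change is a 
ThreadFactory that marks each worker thread as a daemon before handing it back:

{code:java}
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch, not the actual patch: a factory that creates daemon worker threads,
// so parked/stuck workers can no longer keep the JVM (or a test) alive.
final class DaemonWorkerFactory implements ThreadFactory {
  private final AtomicInteger id = new AtomicInteger();

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, "ProcExecWrkr-" + id.incrementAndGet());
    t.setDaemon(true);  // daemon threads do not block JVM shutdown
    return t;
  }
}
{code}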

 

> TestZooKeeper hangs
> ---
>
> Key: HBASE-19794
> URL: https://issues.apache.org/jira/browse/HBASE-19794
> Project: HBase
>  Issue Type: Bug
>Reporter: Duo Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Seems like the TestZKAsyncRegistry that hangs in shutdown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Attachment: HBASE-19816.master.001.patch

> Replication sink list is not updated on UnknownHostException
> 
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional 
> replication. The clusters are around 400 nodes each and hosted in AWS.
>Reporter: Scott Wilson
>Priority: Major
> Attachments: HBASE-19816.master.001.patch
>
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
> cluster, taking all live traffic, which is replicated to cluster 2. We 
> decommissioned several instances in cluster 2, which involves deleting the 
> instance and its DNS record. After this happened, most of the region servers 
> in cluster 1 showed this message in their logs repeatedly.
>  
> {code}
> 2018-01-12 23:49:36,507 WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Can't replicate because of a local or network error:
> java.net.UnknownHostException: data-017b.hbase-2.prod
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.(AbstractRpcClient.java:315)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The host data-017b.hbase-2.prod was one of those that had been removed from 
> cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 
> was elevated. Some region servers reported ageOfLastShippedOperation to be 
> close to an hour.
> The only way we found to clear the message was to restart the region servers 
> that showed this message in the log. Once we did, replication returned to 
> normal. Restarting the affected region servers in cluster 1 took several days 
> because we could not bring the cluster down.
> From reading the code it appears the cause was the zookeeper watch not being 
> triggered for the region server list change in cluster 2. We verified the 
> list in zookeeper for cluster 2 was correct and did not include the removed 
> nodes.
> One concrete improvement to make would be to force a refresh of the sink 
> cluster region server list when an {{UnknownHostException}} is found. This is 
> already done when there is a {{ConnectException}} in 
> {{HBaseInterClusterReplicationEndpoint.java}}
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that this be extended to cover {{UnknownHostException}}, as 
> sketched below.
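> A sketch of that extension (illustrative, not the attached patch):
> {code:java}
> // UnknownHostException, like ConnectException, lives in java.net.
> } else if (ioe instanceof ConnectException || ioe instanceof UnknownHostException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}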
> We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code 
> still exists on the current master branch.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Attachment: HBASE-19816.master.002.patch

> Replication sink list is not updated on UnknownHostException
> 
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional 
> replication. The clusters are around 400 nodes each and hosted in AWS.
>Reporter: Scott Wilson
>Priority: Major
> Attachments: HBASE-19816.master.001.patch
>
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
> cluster, taking all live traffic, which is replicated to cluster 2. We 
> decommissioned several instances in cluster 2, which involves deleting the 
> instance and its DNS record. After this happened, most of the region servers 
> in cluster 1 showed this message in their logs repeatedly.
>  
> {code}
> 2018-01-12 23:49:36,507 WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Can't replicate because of a local or network error:
> java.net.UnknownHostException: data-017b.hbase-2.prod
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.(AbstractRpcClient.java:315)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The host data-017b.hbase-2.prod was one of those that had been removed from 
> cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 
> was elevated. Some region servers reported ageOfLastShippedOperation to be 
> close to an hour.
> The only way we found to clear the message was to restart the region servers 
> that showed this message in the log. Once we did, replication returned to 
> normal. Restarting the affected region servers in cluster 1 took several days 
> because we could not bring the cluster down.
> From reading the code it appears the cause was the zookeeper watch not being 
> triggered for the region server list change in cluster 2. We verified the 
> list in zookeeper for cluster 2 was correct and did not include the removed 
> nodes.
> One concrete improvement to make would be to force a refresh of the sink 
> cluster region server list when an {{UnknownHostException}} is found. This is 
> already done when there is a {{ConnectException}} in 
> {{HBaseInterClusterReplicationEndpoint.java}}
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that this be extended to cover {{UnknownHostException}}.
> We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code 
> still exists on the current master branch.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Wilson updated HBASE-19816:
-
Attachment: (was: HBASE-19816.master.002.patch)

> Replication sink list is not updated on UnknownHostException
> 
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional 
> replication. The clusters are around 400 nodes each and hosted in AWS.
>Reporter: Scott Wilson
>Priority: Major
> Attachments: HBASE-19816.master.001.patch
>
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
> cluster, taking all live traffic, which is replicated to cluster 2. We 
> decommissioned several instances in cluster 2, which involves deleting the 
> instance and its DNS record. After this happened, most of the region servers 
> in cluster 1 showed this message in their logs repeatedly.
>  
> {code}
> 2018-01-12 23:49:36,507 WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Can't replicate because of a local or network error:
> java.net.UnknownHostException: data-017b.hbase-2.prod
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.(AbstractRpcClient.java:315)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The host data-017b.hbase-2.prod was one of those that had been removed from 
> cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 
> was elevated. Some region servers reported ageOfLastShippedOperation to be 
> close to an hour.
> The only way we found to clear the message was to restart the region servers 
> that showed this message in the log. Once we did, replication returned to 
> normal. Restarting the affected region servers in cluster 1 took several days 
> because we could not bring the cluster down.
> From reading the code it appears the cause was the zookeeper watch not being 
> triggered for the region server list change in cluster 2. We verified the 
> list in zookeeper for cluster 2 was correct and did not include the removed 
> nodes.
> One concrete improvement to make would be to force a refresh of the sink 
> cluster region server list when an {{UnknownHostException}} is found. This is 
> already done when there is a {{ConnectException}} in 
> {{HBaseInterClusterReplicationEndpoint.java}}
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that this be extended to cover {{UnknownHostException}}.
> We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code 
> still exists on the current master branch.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-17852) Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental backup)

2018-01-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329494#comment-16329494
 ] 

Appy commented on HBASE-17852:
--

Replication is doing it, but it's already in the hbase-server module, so it's 
definitely not the ideal example. But I think it's possible to do procv2 + 
backup without tight integration with hbase-server, i.e. while keeping things in 
a separate module. I won't be surprised if it requires some refactoring/small 
design improvements in the procv2 code itself, but that'll all be for the good. 
Maybe the backup module becomes the poster child for "Building features with 
procv2" and we make replication do the same.

> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> 
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch, 
> HBASE-17852-v9.patch
>
>
> The rollback-via-snapshot design approach implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the 
> backup meta-table (the backup system table). This procedure is lightweight 
> because the meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by 
> cleaning up partial data in the backup destination, followed by restoring 
> the backup meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for 
> example), the next time the user tries a create/merge/delete they will see 
> an error message that the system is in an inconsistent state and repair is 
> required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client 
> and the BackupObservers), we introduce a small table ONLY to keep the 
> listing of bulk loaded files. All backup observers will work only with this 
> new table. The reason: in case of a failure during backup 
> create/delete/merge/restore, when the system performs an automatic rollback, 
> some data written by backup observers during the failed operation may be 
> lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care 
> about the consistency of this table, because bulk load is an idempotent 
> operation and can be repeated after a failure. Partially written data in the 
> second table does not affect the BackupHFileCleaner plugin, because this 
> data (the list of bulk loaded files) corresponds to files which have not yet 
> been loaded successfully and, hence, are not visible to the system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19816) Replication sink list is not updated on UnknownHostException

2018-01-17 Thread Scott Wilson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329497#comment-16329497
 ] 

Scott Wilson commented on HBASE-19816:
--

Patch added.

> Replication sink list is not updated on UnknownHostException
> 
>
> Key: HBASE-19816
> URL: https://issues.apache.org/jira/browse/HBASE-19816
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.2.0
> Environment: We have two clusters set up with bi-directional 
> replication. The clusters are around 400 nodes each and hosted in AWS.
>Reporter: Scott Wilson
>Priority: Major
> Attachments: HBASE-19816.master.001.patch
>
>
> We have two clusters, call them 1 and 2. Cluster 1 was the current "primary" 
> cluster, taking all live traffic, which is replicated to cluster 2. We 
> decommissioned several instances in cluster 2, which involves deleting the 
> instance and its DNS record. After this happened, most of the region servers 
> in cluster 1 showed this message in their logs repeatedly.
>  
> {code}
> 2018-01-12 23:49:36,507 WARN 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint:
>  Can't replicate because of a local or network error:
> java.net.UnknownHostException: data-017b.hbase-2.prod
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.(AbstractRpcClient.java:315)
> at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(AbstractRpcClient.java:267)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1737)
> at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getAdmin(ConnectionManager.java:1719)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager.getReplicationSink(ReplicationSinkManager.java:119)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:339)
> at 
> org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:326)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The host data-017b.hbase-2.prod was one of those that had been removed from 
> cluster 2. Next we observed our replication lag from cluster 1 to cluster 2 
> was elevated. Some region servers reported ageOfLastShippedOperation to be 
> close to an hour.
> The only way we found to clear the message was to restart the region servers 
> that showed this message in the log. Once we did, replication returned to 
> normal. Restarting the affected region servers in cluster 1 took several days 
> because we could not bring the cluster down.
> From reading the code it appears the cause was the zookeeper watch not being 
> triggered for the region server list change in cluster 2. We verified the 
> list in zookeeper for cluster 2 was correct and did not include the removed 
> nodes.
> One concrete improvement to make would be to force a refresh of the sink 
> cluster region server list when an {{UnknownHostException}} is found. This is 
> already done when there is a {{ConnectException}} in 
> {{HBaseInterClusterReplicationEndpoint.java}}
> {code:java}
> } else if (ioe instanceof ConnectException) {
>   LOG.warn("Peer is unavailable, rechecking all sinks: ", ioe);
>   replicationSinkMgr.chooseSinks();
> {code}
> I propose that this be extended to cover {{UnknownHostException}}.
> We observed this behavior on 1.2.0-cdh-5.11.1 but it appears the same code 
> still exists on the current master branch.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

