[jira] [Commented] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673928#comment-16673928
 ] 

Hudson commented on HBASE-21407:


Results for branch branch-2.1
[build #573 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI use JSP instead of Jamon, the fix for 
> HBASE-18263 is not enough. Added the HBASE-18263 fix to header.jsp as well.
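The fix pattern described here, guarding shared page chrome so that it still renders on a backup Master, can be sketched in Java. This is a minimal illustration under stated assumptions: `HeaderGuard`, `MasterState`, `clusterId`, and `renderHeader` are hypothetical names, and the real change lives in header.jsp (following HBASE-18263), not in a class like this.

```java
import java.util.Optional;

// Hypothetical sketch: names are illustrative, not HBase's header.jsp code.
public class HeaderGuard {

  // Stand-in for Master state; clusterId is only known on an active Master here.
  static final class MasterState {
    final boolean active;
    final String clusterId; // may be null on a backup Master

    MasterState(boolean active, String clusterId) {
      this.active = active;
      this.clusterId = clusterId;
    }
  }

  // Builds header text without dereferencing state that a backup Master lacks.
  static String renderHeader(MasterState state) {
    if (state == null || !state.active) {
      // Backup Master: render a reduced header instead of throwing an NPE.
      return "Backup Master";
    }
    // Even on the active Master, fall back rather than assume the field is set.
    String id = Optional.ofNullable(state.clusterId).orElse("(unknown)");
    return "Active Master, cluster " + id;
  }

  public static void main(String[] args) {
    System.out.println(renderHeader(new MasterState(false, null))); // Backup Master
    System.out.println(renderHeader(new MasterState(true, "abc"))); // Active Master, cluster abc
  }
}
```

The design choice is to check "is this the active Master?" once, at the top of the shared header, so that every page including it (Jamon or JSP) inherits the guard.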



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673929#comment-16673929
 ] 

Hudson commented on HBASE-21351:


Results for branch branch-2.1
[build #573 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/573//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force updating its 
> state to prevent a race with the PE worker, but this does not work when the 
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED 
> and then start to roll back the whole procedure stack. We pop every 
> procedure in the stack and try to roll it back, so we may change the state 
> of a procedure without holding its procExecutionLock while rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures. 
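The locking rule this issue calls for, where every path that mutates or persists a procedure's state (rollback included) holds that procedure's execution lock, can be sketched with a plain `ReentrantLock`. This is a hypothetical model: `Proc`, `forceUpdate`, `rollback`, and the state strings are illustrative names, not HBase's `ProcedureExecutor` code or the attached patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: not HBase's ProcedureExecutor; names are illustrative.
public class RollbackLockSketch {

  static final class Proc {
    final long id;
    // Stand-in for the per-procedure procExecutionLock.
    final ReentrantLock execLock = new ReentrantLock();
    String state = "RUNNABLE";

    Proc(long id) {
      this.id = id;
    }
  }

  // Force-update path: reads the state to persist while holding the lock.
  static String forceUpdate(Proc p) {
    p.execLock.lock();
    try {
      return p.id + ":" + p.state;
    } finally {
      p.execLock.unlock();
    }
  }

  // Rollback path: pops each procedure and changes its state under its own lock,
  // so forceUpdate (which also takes the lock) never observes ROLLING_BACK.
  static int rollback(Deque<Proc> stack) {
    int rolledBack = 0;
    while (!stack.isEmpty()) {
      Proc p = stack.pop();
      p.execLock.lock();
      try {
        p.state = "ROLLING_BACK"; // intermediate state
        // ... undo work for this procedure would happen here ...
        p.state = "ROLLEDBACK";
        rolledBack++;
      } finally {
        p.execLock.unlock();
      }
    }
    return rolledBack;
  }

  public static void main(String[] args) {
    Deque<Proc> stack = new ArrayDeque<>();
    Proc parent = new Proc(1);
    Proc child = new Proc(2);
    stack.push(parent);
    stack.push(child);
    System.out.println(rollback(stack) + " " + forceUpdate(parent)); // 2 1:ROLLEDBACK
  }
}
```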





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673926#comment-16673926
 ] 

Hadoop QA commented on HBASE-21387:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  2s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  4s{color} | {color:green} hbase-server: The patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  6s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}121m 46s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 17s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21387 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946772/21387.v7.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fac8a2a4af88 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 62fe365934 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14941/testReport/ |
| 

[jira] [Commented] (HBASE-21396) Create 2.1.1 release

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673893#comment-16673893
 ] 

stack commented on HBASE-21396:
---

Thanks.  Let me fix

> Create 2.1.1 release
> 
>
> Key: HBASE-21396
> URL: https://issues.apache.org/jira/browse/HBASE-21396
> Project: HBase
>  Issue Type: Task
>  Components: rm
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Commented] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673890#comment-16673890
 ] 

Hudson commented on HBASE-21407:


Results for branch branch-2.0
[build #1053 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1053/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1053//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1053//console].


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- Something went wrong running this stage, please [check relevant console 
output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1053//console].


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI use JSP instead of Jamon, the fix for 
> HBASE-18263 is not enough. Added the HBASE-18263 fix to header.jsp as well.





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.v7.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt, 
> 21387.v7.txt
>
>
> While investigating a recent customer report in which ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when the RefreshCacheTask runs refreshCache, there is an 
> in-progress snapshot (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time: its files are 
> no longer under the in-progress (tmp) directory, yet the cache, refreshed while 
> the snapshot was still in progress, does not contain them either. As a result, 
> some file(s) are deemed unreferenced.
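One way to close the window described above is to make the cleaner's two checks (completed snapshots and in-progress snapshots) atomic with respect to snapshot completion. The Java sketch below assumes a single lock shared by both paths and models snapshot references as in-memory sets; `SnapshotCacheSketch`, `startSnapshot`, and `completeSnapshot` are hypothetical names, and this is not necessarily how the attached 21387 patches fix the race.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;

// Hypothetical sketch: not HBase's SnapshotFileCache; names are illustrative.
public class SnapshotCacheSketch {
  // Taken both by snapshot completion and by the cleaner's reference check.
  private final ReentrantLock snapshotLock = new ReentrantLock();
  private final Set<String> completedRefs = new HashSet<>();
  private final Set<String> inProgressRefs = new HashSet<>();

  void startSnapshot(Set<String> files) {
    snapshotLock.lock();
    try {
      inProgressRefs.addAll(files);
    } finally {
      snapshotLock.unlock();
    }
  }

  // Completion moves references from "in progress" to "completed" atomically.
  void completeSnapshot(Set<String> files) {
    snapshotLock.lock();
    try {
      completedRefs.addAll(files);
      inProgressRefs.removeAll(files);
    } finally {
      snapshotLock.unlock();
    }
  }

  // Both reference sets are read under one lock hold, so a snapshot cannot
  // complete between the two checks and leave its files counted in neither.
  List<String> getUnreferencedFiles(List<String> candidates) {
    snapshotLock.lock();
    try {
      return candidates.stream()
          .filter(f -> !completedRefs.contains(f) && !inProgressRefs.contains(f))
          .collect(Collectors.toList());
    } finally {
      snapshotLock.unlock();
    }
  }

  public static void main(String[] args) {
    SnapshotCacheSketch cache = new SnapshotCacheSketch();
    Set<String> snap = new HashSet<>(Arrays.asList("hfile-a"));
    List<String> candidates = Arrays.asList("hfile-a", "hfile-b");
    cache.startSnapshot(snap);
    System.out.println(cache.getUnreferencedFiles(candidates)); // [hfile-b]
    cache.completeSnapshot(snap);
    System.out.println(cache.getUnreferencedFiles(candidates)); // [hfile-b]
  }
}
```

In this model, "hfile-a" stays referenced on both sides of completion, which is exactly the invariant the reported race violated.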





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673884#comment-16673884
 ] 

Hadoop QA commented on HBASE-21387:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  3s{color} | {color:blue} The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  5s{color} | {color:red} hbase-server: The patch generated 10 new + 1 unchanged - 1 fixed = 11 total (was 2) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  4s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 47s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 16s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21387 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946752/21387.v6.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fa3e06a28883 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 25c964e9a3 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673868#comment-16673868
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #37 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/37/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/37//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/37//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/37//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also keep in mind what is happening in the Ratis LogService 
> (RATIS-272), but the LogService should not dictate what HBase's WAL API looks like.
> Other "systems" inside of HBase that use WALs are replication and 
> backup and restore (B&R). Replication has the use-case for "tail"'ing the WAL, 
> which we should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> that were "bolted" on, such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at so that they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
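To make the question of "primitive calls" concrete, here is one hypothetical shape for a minimal WAL API covering the needs the description mentions: appends, an explicit sync for durability, and tailing for replication. The `Wal` interface and the in-memory `InMemoryWal` are illustrative assumptions for discussion only, not HBase's `WAL` interface and not a proposal made in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of WAL primitives; not HBase's WAL or WALProvider API.
public class WalApiSketch {

  interface Wal {
    // Appends an entry and returns its sequence id; not durable until synced.
    long append(byte[] entry);

    // Makes every entry up to and including seqId durable.
    void sync(long seqId);

    // Returns durable entries with sequence id > fromSeqId (e.g. replication tailing).
    List<byte[]> tail(long fromSeqId);
  }

  // In-memory stand-in: "durable" just means "covered by a sync call".
  static final class InMemoryWal implements Wal {
    private final List<byte[]> entries = new ArrayList<>();
    private long syncedUpTo = 0; // highest synced sequence id

    @Override
    public synchronized long append(byte[] entry) {
      entries.add(entry.clone());
      return entries.size(); // sequence ids start at 1
    }

    @Override
    public synchronized void sync(long seqId) {
      if (seqId < 0 || seqId > entries.size()) {
        throw new IllegalArgumentException("unknown seqId " + seqId);
      }
      syncedUpTo = Math.max(syncedUpTo, seqId);
    }

    @Override
    public synchronized List<byte[]> tail(long fromSeqId) {
      if (fromSeqId < 0) {
        throw new IllegalArgumentException("negative seqId " + fromSeqId);
      }
      if (fromSeqId >= syncedUpTo) {
        return Collections.emptyList();
      }
      // The entry with sequence id n lives at index n - 1.
      return new ArrayList<>(entries.subList((int) fromSeqId, (int) syncedUpTo));
    }
  }

  public static void main(String[] args) {
    Wal wal = new InMemoryWal();
    long seq = wal.append("put row1".getBytes(StandardCharsets.UTF_8));
    wal.sync(seq);
    System.out.println(wal.tail(0).size()); // 1
  }
}
```

Separating `append` from `sync` lets a caller batch several appends behind one durability barrier, and `tail` only exposes synced entries so a replication reader never ships data that could still be lost.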





[jira] [Updated] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-02 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21351:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+.

Thanks [~stack] for reviewing.

> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force updating its 
> state to prevent a race with the PE worker, but this does not work when the 
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED 
> and then start to roll back the whole procedure stack. We pop every 
> procedure in the stack and try to roll it back, so we may change the state 
> of a procedure without holding its procExecutionLock while rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures. 





[jira] [Commented] (HBASE-21430) [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673857#comment-16673857
 ] 

Duo Zhang commented on HBASE-21430:
---

Is it possible to configure gitbox so that it does not post the full contents of 
every pull request to JIRA?

> [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo
> -
>
> Key: HBASE-21430
> URL: https://issues.apache.org/jira/browse/HBASE-21430
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors, spark
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Exploring moving the spark modules out of core hbase and into 
> hbase-connectors. Perhaps spark is deserving of its own repo (I think 
> [~busbey] was on about this) but meantime, experimenting w/ having it out in 
> hbase-connectors.
> Here is the thread on spark integration: 
> https://lists.apache.org/thread.html/fd74ef9b9da77abf794664f06ea19c839fb3d543647fb29115081683@%3Cdev.hbase.apache.org%3E





[jira] [Commented] (HBASE-21396) Create 2.1.1 release

2018-11-02 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673855#comment-16673855
 ] 

Duo Zhang commented on HBASE-21396:
---

One problem, sir.

http://hbase.apache.org/downloads.html

Shouldn't the release date of 2.1.1 be 10/26? It is still 07/18 on the page.

> Create 2.1.1 release
> 
>
> Key: HBASE-21396
> URL: https://issues.apache.org/jira/browse/HBASE-21396
> Project: HBase
>  Issue Type: Task
>  Components: rm
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Resolved] (HBASE-21396) Create 2.1.1 release

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21396.
---
   Resolution: Fixed
Fix Version/s: 2.1.1

Resolving as done.

> Create 2.1.1 release
> 
>
> Key: HBASE-21396
> URL: https://issues.apache.org/jira/browse/HBASE-21396
> Project: HBase
>  Issue Type: Task
>  Components: rm
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
>






[jira] [Commented] (HBASE-21396) Create 2.1.1 release

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673845#comment-16673845
 ] 

stack commented on HBASE-21396:
---

Sent notice to hbase-dev, hbase-user and to announce@apache: 
https://lists.apache.org/api/atom.lua?mid=2f2b9858bb71b21a98ae87ab84065293b89de7384c2e90694ce579e1@%3Cannounce.apache.org%3E

> Create 2.1.1 release
> 
>
> Key: HBASE-21396
> URL: https://issues.apache.org/jira/browse/HBASE-21396
> Project: HBase
>  Issue Type: Task
>  Components: rm
>Reporter: stack
>Assignee: stack
>Priority: Major
>






[jira] [Resolved] (HBASE-21399) Generate and commit 2.1.1 RELEASENOTES.md and CHANGES.md

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21399.
---
   Resolution: Fixed
 Assignee: stack
Fix Version/s: 2.1.1

> Generate and commit 2.1.1 RELEASENOTES.md and CHANGES.md
> 
>
> Key: HBASE-21399
> URL: https://issues.apache.org/jira/browse/HBASE-21399
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
>
> Ran ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 
> -l --sortorder=newer --skip-credits, then carefully stitched the output into 
> the current CHANGES.md and RELEASENOTES.md files, taking care to keep the 
> markdown header ABOVE the Apache license; otherwise the .md files won't render 
> as markdown. For example:
> {code}
> # HBASE  2.1.1 Release Notes
> 
> These release notes cover new developer and user-facing incompatibilities, 
> important issues, features, and major improvements.
> ---
> 
> {code}
> Check that CHANGES and RELEASENOTES render properly in a markdown parser.
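Part of that manual check can be automated. The sketch below assumes the Apache license in CHANGES.md and RELEASENOTES.md sits in an HTML comment starting with `<!--`, and only verifies that a `# ` header line appears before that comment; `MdHeaderCheck` is a hypothetical helper, not part of releasedocmaker, and it does not replace viewing the files in a real markdown renderer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: assumes the license block is an HTML comment ("<!--").
public class MdHeaderCheck {

  // True if a "# " header line appears before any line that opens an HTML comment.
  static boolean headerAboveLicense(List<String> lines) {
    for (String line : lines) {
      String t = line.trim();
      if (t.startsWith("# ")) {
        return true;
      }
      if (t.startsWith("<!--")) {
        return false; // license comment reached first
      }
    }
    return false; // no header at all
  }

  public static void main(String[] args) throws IOException {
    boolean allOk = true;
    for (String name : args) {
      List<String> lines = Files.readAllLines(Paths.get(name), StandardCharsets.UTF_8);
      boolean ok = headerAboveLicense(lines);
      allOk &= ok;
      System.out.println(name + ": " + (ok ? "ok" : "header missing or below license"));
    }
    if (!allOk) {
      System.exit(1);
    }
  }
}
```

Usage, under the same assumptions: `java MdHeaderCheck CHANGES.md RELEASENOTES.md`, which exits non-zero if either file would lose its header to the license comment.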





[jira] [Commented] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673839#comment-16673839
 ] 

Hudson commented on HBASE-21407:


Results for branch branch-2
[build #1480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI use JSP instead of Jamon, the fix for 
> HBASE-18263 is not enough. Added the HBASE-18263 fix to header.jsp as well.





[jira] [Commented] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673838#comment-16673838
 ] 

Hudson commented on HBASE-21422:


Results for branch branch-2
[build #1480 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1480//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21430) [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673813#comment-16673813
 ] 

stack commented on HBASE-21430:
---

Oh boy. Fat pull request in JIRA. I've more to do here, but in case anyone wants 
to have a looksee, see the link to the pull request above.

> [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo
> -
>
> Key: HBASE-21430
> URL: https://issues.apache.org/jira/browse/HBASE-21430
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors, spark
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Exploring moving the spark modules out of core hbase and into 
> hbase-connectors. Perhaps spark is deserving of its own repo (I think 
> [~busbey] was on about this) but meantime, experimenting w/ having it out in 
> hbase-connectors.
> Here is the thread on spark integration 
> https://lists.apache.org/thread.html/fd74ef9b9da77abf794664f06ea19c839fb3d543647fb29115081683@%3Cdev.hbase.apache.org%3E





[jira] [Commented] (HBASE-21430) [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673811#comment-16673811
 ] 

ASF GitHub Bot commented on HBASE-21430:


saintstack opened a new pull request #4: HBASE-21430 [hbase-connectors] Move 
hbase-spark* modules to hbase-connectors repo
URL: https://github.com/apache/hbase-connectors/pull/4
 
 
   Move over the hbase-spark* modules.
   
   TODO: profiles for hadoop2 and hadoop3.
   TODO: tests don't pass yet; dependency issue in spark.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] saintstack opened a new pull request #4: HBASE-21430 [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread GitBox
saintstack opened a new pull request #4: HBASE-21430 [hbase-connectors] Move 
hbase-spark* modules to hbase-connectors repo
URL: https://github.com/apache/hbase-connectors/pull/4
 
 
   Move over the hbase-spark* modules.
   
   TODO: profiles for hadoop2 and hadoop3.
   TODO: tests don't pass yet; dependency issue in spark.




With regards,
Apache Git Services


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Status: Patch Available  (was: Open)

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt, 21387.v6.txt
>
>
> In a recent customer report, ExportSnapshot failed with:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, there is an in-progress 
> snapshot that is about to finish.
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check the in-progress 
> snapshot(s). However, by that time the snapshot has completed, so some file(s) 
> it references are deemed unreferenced.
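The interleaving described above can be replayed deterministically. What follows is a minimal, single-threaded Java sketch built on stated assumptions. `NaiveSnapshotCache`, its methods and the file names `f1`/`f2` are hypothetical stand-ins, not HBase's `SnapshotFileCache` API. The cache is modeled as a set of files referenced by completed snapshots. The cleaner is assumed to trust that set without refreshing, because `lastModifiedTime` looked current.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, single-threaded replay of the race described above.
 * Names are illustrative only; this is not HBase's SnapshotFileCache.
 */
final class NaiveSnapshotCache {
  // Files referenced by completed snapshots, as of the last refresh.
  private final Set<String> cachedCompletedFiles = new HashSet<>();

  /** Like refreshCache(): only completed snapshots are cached; the tmp (in-progress) dir is skipped. */
  void refreshCache(Collection<String> completedSnapshotFiles) {
    cachedCompletedFiles.clear();
    cachedCompletedFiles.addAll(completedSnapshotFiles);
  }

  /**
   * Like getUnreferencedFiles() when the cache is judged up to date: a candidate is
   * deletable if it is in neither the cached completed set nor the in-progress set read now.
   */
  List<String> getUnreferencedFiles(Collection<String> candidates,
      Collection<String> inProgressFilesNow) {
    List<String> unreferenced = new ArrayList<>();
    for (String file : candidates) {
      if (!cachedCompletedFiles.contains(file) && !inProgressFilesNow.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }

  /** Replays the race: refresh while snapshot S (referencing "f2") is in progress, then S completes. */
  static List<String> replayRace() {
    NaiveSnapshotCache cache = new NaiveSnapshotCache();
    // 1. The refresh runs while S is still under the tmp dir, so only "f1" is cached.
    cache.refreshCache(Arrays.asList("f1"));
    // 2. S completes and leaves the tmp dir; the cache is not refreshed because
    //    lastModifiedTime still looks current.
    // 3. The cleaner checks in-progress snapshots, finds none, and trusts the stale cache.
    return cache.getUnreferencedFiles(Arrays.asList("f1", "f2"), Collections.<String>emptyList());
  }
}
```

In this replay, `f2` is referenced by the just-completed snapshot but is still reported as unreferenced. That matches the archive deletion seen in the logs above.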





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673793#comment-16673793
 ] 

Ted Yu commented on HBASE-21387:


In patch v6, I try to detect a discrepancy in the number of in-progress 
snapshots as seen by {{refreshCache}} versus as seen by 
{{getUnreferencedFiles}}.
If there is a discrepancy, the file(s) are kept for the current round.

See if this is easier to understand.

Thanks
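The v6 idea can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch. `InProgressAwareSnapshotCache` and its parameters are invented for the example. The in-progress snapshot count is passed in by the caller rather than read from the filesystem.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the patch v6 idea (not the patch itself): remember how many
 * in-progress snapshots the refresh saw, and if the cleaner's check sees a different
 * number, report nothing as unreferenced for this round.
 */
final class InProgressAwareSnapshotCache {
  private final Set<String> cachedCompletedFiles = new HashSet<>();
  private int inProgressCountAtRefresh = 0;

  /** Caches completed-snapshot files and records the in-progress count seen at refresh time. */
  synchronized void refreshCache(Collection<String> completedSnapshotFiles, int inProgressCount) {
    cachedCompletedFiles.clear();
    cachedCompletedFiles.addAll(completedSnapshotFiles);
    inProgressCountAtRefresh = inProgressCount;
  }

  /** Returns the candidates that are safe to delete in this round. */
  synchronized List<String> getUnreferencedFiles(Collection<String> candidates,
      Collection<String> inProgressFilesNow, int inProgressCountNow) {
    List<String> unreferenced = new ArrayList<>();
    if (inProgressCountNow != inProgressCountAtRefresh) {
      // A snapshot started or finished since the refresh: keep every file this round.
      return unreferenced;
    }
    for (String file : candidates) {
      if (!cachedCompletedFiles.contains(file) && !inProgressFilesNow.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }
}
```

Comparing counts alone has a limitation. Suppose one snapshot finishes and another starts between the refresh and the check. The counts then match, and this sketch would not detect the discrepancy.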




[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.v6.txt



[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673772#comment-16673772
 ] 

Hadoop QA commented on HBASE-20604:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  2s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  5s{color} | {color:red} hbase-server: The patch generated 1 new + 22 unchanged - 0 fixed = 23 total (was 22) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  3s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 49s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}121m 39s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 56s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20604 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924617/HBASE-20604.002.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 94552e4de446 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 25c964e9a3 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14939/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14939/testReport/ |
| 

[jira] [Updated] (HBASE-21430) [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21430:
--
Description: 
Exploring moving the spark modules out of core hbase and into hbase-connectors. 
Perhaps spark is deserving of its own repo (I think [~busbey] was on about 
this) but meantime, experimenting w/ having it out in hbase-connectors.

Here is the thread on spark integration 
https://lists.apache.org/thread.html/fd74ef9b9da77abf794664f06ea19c839fb3d543647fb29115081683@%3Cdev.hbase.apache.org%3E

  was:Exploring moving the spark modules out of core hbase and into 
hbase-connectors. Perhaps spark is deserving of its own repo (I think [~busbey] 
was on about this) but meantime, experimenting w/ having it out in 
hbase-connectors.




[jira] [Created] (HBASE-21430) [hbase-connectors] Move hbase-spark* modules to hbase-connectors repo

2018-11-02 Thread stack (JIRA)
stack created HBASE-21430:
-

 Summary: [hbase-connectors] Move hbase-spark* modules to 
hbase-connectors repo
 Key: HBASE-21430
 URL: https://issues.apache.org/jira/browse/HBASE-21430
 Project: HBase
  Issue Type: Bug
  Components: hbase-connectors, spark
Reporter: stack
Assignee: stack


Exploring moving the spark modules out of core hbase and into hbase-connectors. 
Perhaps spark is deserving of its own repo (I think [~busbey] was on about 
this) but meantime, experimenting w/ having it out in hbase-connectors.





[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673738#comment-16673738
 ] 

stack commented on HBASE-21425:
---

[~allan163] or [~Apache9], a +1 please.


> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed to migrate 
> table state from zookeeper to the hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] 
> master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> hbase:namespace
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above exception 
> is that procedures are not scheduled; for one, the namespace table is not 
> assigned.
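One missing table state stops all scheduling because of a general property of Java streams. An unchecked exception thrown from a `filter` predicate propagates out of the terminal `collect`, so nothing is collected. The trace above shows this path: `isTableEnabled` is called from a lambda inside `processOfflineRegions`. The following is a hypothetical sketch, not `AssignmentManager` code. The class, the `isTableEnabled` stand-in and the table names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration: an exception thrown inside a stream's filter predicate
 * aborts the whole pipeline, so no element reaches collect().
 */
final class OfflineTablesSketch {
  /** Stand-in for a table-state lookup; throws for a table with no recorded state. */
  static boolean isTableEnabled(String table) {
    if ("hbase:namespace".equals(table)) {
      throw new IllegalStateException("table state not found: " + table);
    }
    return true;
  }

  /** Same shape as the failing path: one bad lookup fails the entire collect(). */
  static List<String> tablesToAssign(List<String> tables) {
    return tables.stream()
        .filter(OfflineTablesSketch::isTableEnabled)
        .collect(Collectors.toList());
  }

  /** Alternative shape: a per-element try/catch lets the remaining tables through. */
  static List<String> tablesToAssignTolerant(List<String> tables) {
    List<String> result = new ArrayList<>();
    for (String table : tables) {
      try {
        if (isTableEnabled(table)) {
          result.add(table);
        }
      } catch (IllegalStateException e) {
        // Skip the table whose state lookup failed; the other tables are still returned.
      }
    }
    return result;
  }
}
```

The tolerant variant only keeps the other tables flowing. It skips the failing table, so on its own it would not get `hbase:namespace` assigned. How the attached patches handle the missing state is not shown here.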





[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673677#comment-16673677
 ] 

Hadoop QA commented on HBASE-21425:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 29s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 25s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 28s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 20s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 40s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 46s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 17s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}188m 43s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}236m 34s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21425 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946704/HBASE-21425.branch-2.1.002.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a2e18a7976d2 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.1 / 62d73d2068 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14938/testReport/ |
| Max. process+thread count | 4494 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the the WAL is rolled

2018-11-02 Thread Esteban Gutierrez (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673603#comment-16673603
 ] 

Esteban Gutierrez commented on HBASE-20604:
---

[~mdrob] I looked into that, and even though it seems related, we are doing 
positional reads and there is no pre-fetching involved. 

[~apurtell] we have been running this in a production environment for months 
and haven't run into an issue. Also, {{entry.getEdit().readFromCells}} needs to 
trigger a mismatch between the consumed entries and the expected entries, or see 
an {{InvalidProtocolBufferException}}, while consuming the WAL and seeking to 
{{originalPosition}}. So far, I think it is safe to commit at this point if you 
are OK with the change. Thanks!

> ProtobufLogReader#readNext can incorrectly loop to the same position in the 
> stream until the the WAL is rolled
> --
>
> Key: HBASE-20604
> URL: https://issues.apache.org/jira/browse/HBASE-20604
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, wal
>Affects Versions: 3.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
>Priority: Critical
> Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}}, we consume the input stream 
> associated with the {{FSDataInputStream}} of the WAL that we are reading. 
> Under certain conditions, e.g. when using encryption at rest 
> ({{CryptoInputStream}}), the stream can return partial data. This can cause a 
> premature EOF that makes {{inputStream.getPos()}} return to the same 
> original position, causing {{ProtobufLogReader#readNext}} to retry the 
> reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck 
> until the WAL is rolled, causing replication delays of up to an hour in some 
> cases.
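The description relies on a general I/O pitfall. `InputStream.read(byte[], int, int)` may return fewer bytes than requested without being at end of stream. A reader that treats a short read as a truncated entry will therefore keep retrying from the same position. The following is a minimal, hypothetical Java sketch. The `ChunkedInputStream` wrapper only simulates a stream that returns small chunks; it is not `CryptoInputStream`. The sketch is not the HBASE-20604 patch.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical illustration (not ProtobufLogReader code): a stream may return fewer bytes
 * than requested without being at EOF, so a reader must loop until it has the full entry.
 */
final class PartialReadSketch {
  /** Simulates a stream that hands back at most maxChunk bytes per read call. */
  static final class ChunkedInputStream extends FilterInputStream {
    private final int maxChunk;

    ChunkedInputStream(InputStream in, int maxChunk) {
      super(in);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, maxChunk));
    }
  }

  /** Fragile pattern: treats a single short read as a truncated (premature-EOF) entry. */
  static boolean naiveReadEntry(InputStream in, byte[] buf) throws IOException {
    int n = in.read(buf, 0, buf.length);
    return n == buf.length; // false on a short read even though more data is available
  }

  /** Robust pattern: keep reading until the buffer is full or the stream really ends. */
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("stream ended after " + off + " of " + buf.length + " bytes");
      }
      off += n;
    }
  }
}
```

The JDK's `java.io.DataInputStream#readFully` implements the same loop.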





[jira] [Updated] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21407:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 2.1.0)
   2.1.2
   2.0.3
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks for the patch [~tianjingyun]

> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI use JSP instead of Jamon, the fix from 
> HBASE-18263 is not sufficient. Added the HBASE-18263 fix to header.jsp as well.





[jira] [Commented] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673524#comment-16673524
 ] 

Hadoop QA commented on HBASE-21407:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 57s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 22s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21407 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946697/hbase-21407.master.001.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  |
| uname | Linux be63fcfb9fc3 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / e7f6c2972d |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14937/testReport/ |
| Max. process+thread count | 4794 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14937/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 2.2.0
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch





[jira] [Resolved] (HBASE-21429) [hbase-connectors] pom refactoring adding kafka dir intermediary

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21429.
---
   Resolution: Fixed
 Assignee: stack
Fix Version/s: connector-1.0.0

Pushed (though I should have done a fork and pull request; I will for the next one, as part of 
playing with gitbox).

https://github.com/apache/hbase-connectors/commit/824edc7d9e93ef7021304e611094b032b725be4f

> [hbase-connectors] pom refactoring adding kafka dir intermediary
> 
>
> Key: HBASE-21429
> URL: https://issues.apache.org/jira/browse/HBASE-21429
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors, kafka
>Affects Versions: connector-1.0.0
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: connector-1.0.0
>
>
> Some refactoring of the pom setup in hbase-connectors. Adds an 
> intermediary pom and moves the dependencies needed by kafka down into it from 
> the parent.





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673480#comment-16673480
 ] 

Ted Yu commented on HBASE-21387:


Currently refreshCache has a void return type:
{code}
  private synchronized void refreshCache() throws IOException {
{code}
One potential fix is for {{refreshCache}} to return the name of the in-progress 
snapshot.
{{getUnreferencedFiles}} would store the returned name and check whether it can 
still be found when calling {{getSnapshotsInProgress}}. 
If the name no longer appears as an in-progress snapshot, {{getUnreferencedFiles}} 
can invoke {{refreshCache}} again.
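A minimal single-threaded sketch of this proposal, under heavy simplifying assumptions: {{FakeSnapshotDir}}, {{SnapshotFileCacheSketch}} and the {{betweenRefreshAndCheck}} hook are hypothetical, snapshots are modeled as in-memory sets of hfile names, and the lastModifiedTime check is left out. {{refreshCache}} returns the in-progress names it skipped, and {{getUnreferencedFiles}} refreshes again if any of them has since left the in-progress set. The loop generalizes "invoke refreshCache again" to "retry until stable"; that is an extrapolation, not part of the comment:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the snapshot directories: snapshot name -> hfile names,
// split into completed snapshots and in-progress (tmp) snapshots.
class FakeSnapshotDir {
  final Map<String, Set<String>> completed = new HashMap<>();
  final Map<String, Set<String>> inProgress = new HashMap<>();

  // Simulates a snapshot finishing: its files move from in-progress to completed.
  synchronized void complete(String name) {
    Set<String> files = inProgress.remove(name);
    if (files != null) {
      completed.put(name, files);
    }
  }

  synchronized Map<String, Set<String>> inProgressCopy() {
    return new HashMap<>(inProgress);
  }
}

class SnapshotFileCacheSketch {
  private final FakeSnapshotDir dir;
  private final Set<String> cachedFiles = new HashSet<>();
  // Test hook that runs between refreshCache() and the in-progress check.
  Runnable betweenRefreshAndCheck = () -> { };

  SnapshotFileCacheSketch(FakeSnapshotDir dir) {
    this.dir = dir;
  }

  // Rebuilds the cache from completed snapshots only and, as proposed, returns the
  // names of the in-progress snapshots it skipped.
  synchronized Set<String> refreshCache() {
    synchronized (dir) {
      cachedFiles.clear();
      for (Set<String> files : dir.completed.values()) {
        cachedFiles.addAll(files);
      }
      return new HashSet<>(dir.inProgress.keySet());
    }
  }

  // With recheck=true, refreshes again whenever a skipped snapshot is no longer in progress.
  synchronized List<String> getUnreferencedFiles(List<String> candidates, boolean recheck) {
    Set<String> skipped;
    Map<String, Set<String>> inProgress;
    do {
      skipped = refreshCache();
      betweenRefreshAndCheck.run();
      inProgress = dir.inProgressCopy();
      // A skipped snapshot that vanished from the in-progress set completed in between,
      // so its files are in neither view yet.
    } while (recheck && !inProgress.keySet().containsAll(skipped));

    Set<String> inProgressFiles = new HashSet<>();
    for (Set<String> files : inProgress.values()) {
      inProgressFiles.addAll(files);
    }
    List<String> unreferenced = new ArrayList<>();
    for (String file : candidates) {
      if (!cachedFiles.contains(file) && !inProgressFiles.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }
}
```

With {{recheck}} set to false, the same interleaving reports the just-completed snapshot's hfile as unreferenced, which is the file loss described in this issue; with it set to true, only the genuinely unreferenced file is returned.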

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt
>
>
> In a recent customer report, ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
> snapshot.SnapshotReferenceUtil: Can't find hfile: 
> 44f6c3c646e84de6a63fe30da4fcb3aa in the real 
> (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  or archive 
> (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
>  directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG 
> [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
> cleaner.HFileCleaner: Removing: 
> hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
> from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> whose intention is to exclude in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, some snapshot is in 
> progress (about to finish).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check the in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some of its 
> file(s) are deemed unreferenced.





[jira] [Created] (HBASE-21429) [hbase-connectors] pom refactoring adding kafka dir intermediary

2018-11-02 Thread stack (JIRA)
stack created HBASE-21429:
-

 Summary: [hbase-connectors] pom refactoring adding kafka dir 
intermediary
 Key: HBASE-21429
 URL: https://issues.apache.org/jira/browse/HBASE-21429
 Project: HBase
  Issue Type: Bug
  Components: hbase-connectors, kafka
Affects Versions: connector-1.0.0
Reporter: stack


Some refactoring of the pom setup in hbase-connectors. Adds an intermediary pom and 
moves the dependencies needed by kafka down into it from the parent.





[jira] [Commented] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673465#comment-16673465
 ] 

Hadoop QA commented on HBASE-21421:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 55s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 43s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 12s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  9s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 24s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 53s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  8m 33s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 29s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m 36s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.assignment.TestRogueRSAssignment |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21421 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946694/HBASE-21421.branch-2.0.001.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 6bfd71ff9442 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / ec9c25561d |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14936/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test Results | 

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673467#comment-16673467
 ] 

Ted Yu commented on HBASE-21387:


For the unit test, my first idea is to use a CountDownLatch to reproduce the race 
condition.
I am looking for a way to pass the CountDownLatch between TakeSnapshotHandler and 
SnapshotFileCache.
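A sketch of the CountDownLatch idea, using hypothetical stand-ins rather than the real TakeSnapshotHandler and SnapshotFileCache: {{LatchRaceDemo}} is an invented name, and two thread-safe sets of hfile names model completed and in-progress snapshots. Two latches pin the interleaving from the description: the cleaner "refreshes" first, the snapshot then completes, and only afterwards does the cleaner look at in-progress files:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class LatchRaceDemo {
  // Forces: cleaner refreshes cache -> snapshot completes -> cleaner checks in-progress files.
  // Returns the files the cleaner would treat as unreferenced. Both sets must be thread-safe.
  static List<String> runRace(Set<String> completed, Set<String> inProgress,
      List<String> candidates) {
    CountDownLatch cacheRefreshed = new CountDownLatch(1);
    CountDownLatch snapshotCompleted = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // Cleaner side: stands in for refreshCache() followed by the in-progress check.
      Future<List<String>> cleaner = pool.submit(() -> {
        Set<String> cache = new HashSet<>(completed);
        cacheRefreshed.countDown();
        snapshotCompleted.await(); // wait until the snapshot has finished
        Set<String> stillInProgress = new HashSet<>(inProgress);
        List<String> unreferenced = new ArrayList<>();
        for (String file : candidates) {
          if (!cache.contains(file) && !stillInProgress.contains(file)) {
            unreferenced.add(file);
          }
        }
        return unreferenced;
      });
      // Snapshot side: stands in for TakeSnapshotHandler finishing after the refresh.
      Future<Object> snapshot = pool.submit(() -> {
        cacheRefreshed.await();
        completed.addAll(inProgress);
        inProgress.clear();
        snapshotCompleted.countDown();
        return null;
      });
      snapshot.get(10, TimeUnit.SECONDS);
      return cleaner.get(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException | TimeoutException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

In this interleaving, an hfile referenced by the just-completed snapshot comes back as unreferenced. A real test would still need a way to hand the latches to both components, which is the open question in the comment above.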

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: 21387.dbg.txt

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.dbg.txt, 21387.v2.txt, 21387.v3.txt





[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Attachment: (was: 21387.v1.txt)

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v2.txt, 21387.v3.txt





[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673419#comment-16673419
 ] 

Ted Yu commented on HBASE-21387:


Josh, the race condition surrounding the in-progress snapshot is explained in the 
description of this JIRA.

Let me try to:
* collect relevant SnapshotFileCache log
* see if a unit test can be written to reproduce the race condition

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Description: 
In a recent customer report, ExportSnapshot failed:
{code}
2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
snapshot.SnapshotReferenceUtil: Can't find hfile: 
44f6c3c646e84de6a63fe30da4fcb3aa in the real 
(hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 or archive 
(hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 directory for the primary table. 
{code}
We found the following in the log:
{code}
2018-10-09 18:54:23,675 DEBUG 
[00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
cleaner.HFileCleaner: Removing: 
hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
from archive
{code}
The root cause is a race condition in the handling of in-progress snapshot(s) 
between refreshCache() and getUnreferencedFiles().
There are two callers of refreshCache: one from RefreshCacheTask#run and the 
other from SnapshotHFileCleaner.

Let's look at the code of refreshCache:
{code}
  if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
{code}
whose intention is to exclude in-progress snapshot(s).
Suppose that when RefreshCacheTask runs refreshCache, some snapshot is in progress 
(about to finish).

When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
lastModifiedTime is up to date, so the cleaner proceeds to check the in-progress 
snapshot(s). However, the snapshot has completed by that time, so some of its 
file(s) are deemed unreferenced.

  was:
In a recent customer report, ExportSnapshot failed:
{code}
2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
snapshot.SnapshotReferenceUtil: Can't find hfile: 
44f6c3c646e84de6a63fe30da4fcb3aa in the real 
(hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 or archive 
(hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 directory for the primary table. 
{code}
We found the following in the log:
{code}
2018-10-09 18:54:23,675 DEBUG 
[00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
cleaner.HFileCleaner: Removing: 
hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
from archive
{code}
The root cause is a race condition in the handling of in-progress snapshot(s) 
between refreshCache() and getUnreferencedFiles().
There are two callers of refreshCache: one from RefreshCacheTask#run and the 
other from SnapshotHFileCleaner.

Let's look at the code of refreshCache:
{code}
  if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
{code}
which only excludes the temp dir, not in-progress snapshot(s).
Suppose that when RefreshCacheTask runs refreshCache, the SnapshotDirectoryInfo for 
the in-progress snapshot does not include all store files (leaving a hole in the 
cache).

When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
lastModifiedTime is up to date, so the cleaner proceeds to check the in-progress 
snapshot(s). However, the snapshot has completed by that time, so some of its 
file(s) are deemed unreferenced.


> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt

[jira] [Updated] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21387:
---
Status: Open  (was: Patch Available)

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt





[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673413#comment-16673413
 ] 

stack commented on HBASE-21425:
---

.0002 fixes a comment. The test failure is a flaky test.

> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from zookeeper to the hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: hbase:namespace
>   at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above exception 
> is that procedures are not scheduled; e.g., the namespace table, for one, is 
> not assigned.
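A minimal sketch of why a missing table state can leave hbase:namespace unassigned without aborting startup. The ERROR line is logged by TableStateManager, and the trace passes through a stream stage in processOfflineRegions. Based on that, the sketch assumes isTableState() catches the lookup failure, logs it, and returns false, so the region is silently filtered out. All classes below are hypothetical stand-ins, not HBase code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical model of the offline-region filter; names are illustrative, not HBase APIs. */
public class OfflineRegionFilter {

  enum State { ENABLED, DISABLED }

  /** Stand-in for TableStateManager$TableStateNotFoundException. */
  static final class TableStateNotFoundException extends Exception {
    TableStateNotFoundException(String table) {
      super(table);
    }
  }

  /** Table states found in hbase:meta; a table whose migration failed has no entry. */
  private final Map<String, State> metaStates;

  OfflineRegionFilter(Map<String, State> metaStates) {
    this.metaStates = metaStates;
  }

  State getTableState(String table) throws TableStateNotFoundException {
    State s = metaStates.get(table);
    if (s == null) {
      throw new TableStateNotFoundException(table);
    }
    return s;
  }

  /** Assumed catch-log-return-false shape: a missing state reads as "not enabled". */
  boolean isTableEnabled(String table) {
    try {
      return getTableState(table) == State.ENABLED;
    } catch (TableStateNotFoundException e) {
      System.err.println("Unable to get table " + table + " state");
      return false;
    }
  }

  /** Tables whose offline regions would get an assign procedure. */
  List<String> tablesToAssign(List<String> offlineTables) {
    return offlineTables.stream().filter(this::isTableEnabled).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, State> meta = new HashMap<>();
    meta.put("default:t1", State.ENABLED);
    // hbase:namespace has no row because the zookeeper -> hbase:meta migration failed.
    OfflineRegionFilter f = new OfflineRegionFilter(meta);
    System.out.println(f.tablesToAssign(Arrays.asList("hbase:namespace", "default:t1")));
    // prints [default:t1]; hbase:namespace is skipped and never assigned
  }
}
```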





[jira] [Updated] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21425:
--
Attachment: HBASE-21425.branch-2.1.002.patch






[jira] [Commented] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673408#comment-16673408
 ] 

Hadoop QA commented on HBASE-21351:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  5s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 39s{color} | {color:green} hbase-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 22s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}238m 54s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}293m 22s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21351 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946667/HBASE-21351-v2.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a48e31a99943 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / ee55b558c0 |
| 

[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673406#comment-16673406
 ] 

Josh Elser commented on HBASE-21387:


[~yuzhih...@gmail.com], I don't understand what you're doing with 
[^21387.v2.txt].

You filed this issue to fix this apparent race condition, but now you're 
submitting patches for something else entirely here?

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt
>
>
> During a recent customer report in which ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] snapshot.SnapshotReferenceUtil: Can't find hfile: 44f6c3c646e84de6a63fe30da4fcb3aa in the real (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) or archive (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) directory for the primary table.
> {code}
> We found the following in log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] cleaner.HFileCleaner: Removing: hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshot(s) 
> between refreshCache() and getUnreferencedFiles().
> There are two callers of refreshCache: one from RefreshCacheTask#run and the 
> other from SnapshotHFileCleaner.
> Let's look at the code of refreshCache:
> {code}
>   if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> which only excludes the temp dir, not in-progress snapshot(s).
> Suppose that when RefreshCacheTask runs refreshCache, the SnapshotDirectoryInfo 
> for the in-progress snapshot doesn't include all store files (leaving a hole 
> in the cache).
> When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that 
> lastModifiedTime is up to date, so the cleaner proceeds to check in-progress 
> snapshot(s). However, the snapshot has completed by that time, so some of its 
> file(s) are deemed unreferenced.
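The "hole in cache" variant can be modelled the same way. This is a hypothetical sketch, not HBase code, and it rests on three assumptions. First, the snapshot root's modification time changes only when a snapshot directory is added or removed, not when files land inside an existing one (ordinary directory-mtime semantics). Second, an unfinished snapshot outside the tmp dir is cached with a partial file list. Third, a `complete` flag exists; it is an invented marker. The separate in-progress check is omitted because, per the report, the snapshot has already completed when the cleaner looks. The "refresh while incomplete" mode is one possible mitigation, shown for contrast only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a snapshot cache built from a partial listing; not HBase classes. */
public class PartialSnapshotCache {

  /** One snapshot directory: the store files it references so far, and whether it is finished. */
  static final class Snapshot {
    final Set<String> files = new HashSet<>();
    boolean complete;
  }

  private final Map<String, Snapshot> root = new HashMap<>();
  // Assumed directory-mtime semantics: bumped when a snapshot dir is added, not when files land inside it.
  private long rootMtime = 0;

  private final boolean refreshWhileIncomplete;
  private Set<String> cached = new HashSet<>();
  private long cachedMtime = -1;
  private boolean cacheSawIncomplete;

  PartialSnapshotCache(boolean refreshWhileIncomplete) {
    this.refreshWhileIncomplete = refreshWhileIncomplete;
  }

  void addSnapshot(String name, Snapshot snapshot) {
    root.put(name, snapshot);
    rootMtime++;
  }

  private void refreshCache() {
    cached = new HashSet<>();
    cacheSawIncomplete = false;
    for (Snapshot s : root.values()) {
      cached.addAll(s.files); // for an unfinished snapshot this is only a partial list
      cacheSawIncomplete |= !s.complete;
    }
    cachedMtime = rootMtime;
  }

  boolean isUnreferenced(String file) {
    boolean stale = rootMtime != cachedMtime || (refreshWhileIncomplete && cacheSawIncomplete);
    if (stale) {
      refreshCache();
    }
    return !cached.contains(file);
  }

  /** Replays: cache built mid-snapshot, snapshot finishes, cleaner checks a late-written file. */
  static boolean wouldDelete(boolean refreshWhileIncomplete) {
    PartialSnapshotCache cache = new PartialSnapshotCache(refreshWhileIncomplete);
    Snapshot snap = new Snapshot();
    snap.files.add("hfile-a");
    cache.addSnapshot("snap1", snap);
    cache.isUnreferenced("some-other-file"); // a cleaner pass caches the partial listing
    snap.files.add("hfile-b");               // written later; root mtime does not change
    snap.complete = true;
    return cache.isUnreferenced("hfile-b");
  }

  public static void main(String[] args) {
    System.out.println("trust cache: deletes hfile-b = " + wouldDelete(false));             // true
    System.out.println("refresh while incomplete: deletes hfile-b = " + wouldDelete(true)); // false
  }
}
```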





[jira] [Updated] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21425:
--
Priority: Critical  (was: Major)

> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from zookeeper to the hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: hbase:namespace
>   at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above exception 
> is that procedures are not scheduled; e.g., the namespace table, for one, is 
> not assigned.





[jira] [Commented] (HBASE-21035) Meta Table should be able to online even if all procedures are lost

2018-11-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673369#comment-16673369
 ] 

stack commented on HBASE-21035:
---

Thanks [~allan163]. Yours was not the issue over in HBASE-21425? You could not 
bring meta online? Is hbck2 not an option on your hbase2 version? Not yet, 
perhaps?

> Meta Table should be able to online even if all procedures are lost
> ---
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch, 
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we initialize after the master starts. It 
> only checks the WAL dirs and compares them to the Zookeeper RS nodes to decide which 
> servers need to be expired. For servers whose dir ends with 'SPLITTING', we are 
> sure that there will be an SCP for them.
> But if the server hosting the meta region crashed before the master restarts, and 
> all the procedure WALs are lost (due to a bug, manual deletion, or whatever), 
> the restarted master will be stuck while initializing, since no one 
> will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we need 
> to bring the meta region online. Otherwise, we are sitting ducks; nothing can be done.
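The startup decision described above can be sketched as a pure function over three sets. This is a hypothetical model, not HBase code. The `-splitting` suffix stands in for what the description calls 'SPLITTING', and the server names are invented. The sketch shows that trusting "a SPLITTING dir already has an SCP" schedules nothing for the meta-carrying server once the procedure WALs are gone. Deriving the answer only from procedures that actually exist does schedule it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the master-startup expiry decision; not HBase code. */
public class MetaRecoveryCheck {

  // Assumed WAL-dir suffix for a server whose logs are being split.
  static final String SPLITTING_SUFFIX = "-splitting";

  /**
   * Returns the servers that need a new ServerCrashProcedure (SCP).
   *
   * @param walDirs server names that still have a WAL dir, possibly with SPLITTING_SUFFIX
   * @param liveServers servers registered in Zookeeper
   * @param serversWithScp servers that actually have an SCP in the procedure store
   * @param assumeSplittingHasScp true models "a SPLITTING dir always has an SCP"
   */
  static Set<String> serversNeedingScp(Set<String> walDirs, Set<String> liveServers,
      Set<String> serversWithScp, boolean assumeSplittingHasScp) {
    Set<String> result = new TreeSet<>();
    for (String dir : walDirs) {
      boolean splitting = dir.endsWith(SPLITTING_SUFFIX);
      String server = splitting ? dir.substring(0, dir.length() - SPLITTING_SUFFIX.length()) : dir;
      if (liveServers.contains(server) || serversWithScp.contains(server)) {
        continue; // alive, or recovery already queued
      }
      if (splitting && assumeSplittingHasScp) {
        continue; // trusts a procedure that may have been lost with the procedure WALs
      }
      result.add(server);
    }
    return result;
  }

  public static void main(String[] args) {
    // rs1 carried hbase:meta and crashed mid-split; the procedure WALs are gone.
    Set<String> walDirs = new HashSet<>(Arrays.asList("rs1,16020,1-splitting", "rs2,16020,1"));
    Set<String> live = new HashSet<>(Arrays.asList("rs2,16020,1"));
    Set<String> noProcedures = new HashSet<>();
    System.out.println(serversNeedingScp(walDirs, live, noProcedures, true));  // []
    System.out.println(serversNeedingScp(walDirs, live, noProcedures, false)); // [rs1,16020,1]
  }
}
```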





[jira] [Updated] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21407:
--
Attachment: hbase-21407.master.001.patch

> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.1.0, 2.2.0
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI use jsp instead of jamon, the fix of 
> HBASE-18263 is not enough. Added the fix of HBASE-18263 to header.jsp.





[jira] [Commented] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-02 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673203#comment-16673203
 ] 

Allan Yang commented on HBASE-21421:


Will commit this to all branches if there are no objections.

> Do not kill RS if reportOnlineRegions fails
> ---
>
> Key: HBASE-21421
> URL: https://issues.apache.org/jira/browse/HBASE-21421
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21421.branch-2.0.001.patch
>
>
> In the periodic regionServerReport from RS to master, we call 
> master.getAssignmentManager().reportOnlineRegions() to make sure the RS has the 
> same state as the Master. If the RS holds a region which the master thinks should 
> be on another RS, the Master kills the RS.
> But the regionServerReport could be lagging (due to the network or something else), 
> so it may not represent the current state of the RegionServer. Besides, when a 
> region is onlined, we call reportRegionStateTransition and retry forever until 
> it is successfully reported to the master. We can count on 
> reportRegionStateTransition calls.
> I have encountered cases where the regions were closed on the RS and 
> reportRegionStateTransition reported this to the master successfully. But later, a 
> lagging regionServerReport told the master the region was online on the RS (which 
> it was not at that moment; the report may have been generated some time earlier 
> and delayed by the network somehow). The master then thought the region should be 
> on another RS and killed the RS, which it should not have done.
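One way a master could tell a lagging report from a real disagreement, sketched under assumptions. Each RegionServer is assumed to stamp both its transitions and its periodic reports with its own increasing sequence number. That stamping is hypothetical; the description does not say HBase does this, and the patch itself simply proposes not killing the RS. The class and method names are illustrative, not HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of filtering lagging region reports on the master; not HBase code. */
public class RegionReportCheck {

  enum Verdict { CONSISTENT, STALE_REPORT, CONFLICT }

  private final Map<String, String> owner = new HashMap<>();          // region -> server
  private final Map<String, Long> lastTransitionSeq = new HashMap<>(); // "server/region" -> seq

  private static String key(String server, String region) {
    return server + "/" + region;
  }

  /** reportRegionStateTransition: retried until the master accepts it, so treated as authoritative. */
  void onTransition(String server, String region, long seq, boolean opened) {
    lastTransitionSeq.put(key(server, region), seq);
    if (opened) {
      owner.put(region, server);
    } else {
      owner.remove(region, server); // only clear ownership if this server held it
    }
  }

  /** Periodic regionServerReport claiming {@code region} is online on {@code server}. */
  Verdict onReport(String server, String region, long seq) {
    if (server.equals(owner.get(region))) {
      return Verdict.CONSISTENT;
    }
    Long last = lastTransitionSeq.get(key(server, region));
    if (last != null && seq < last) {
      return Verdict.STALE_REPORT; // generated before a transition the master already accepted
    }
    return Verdict.CONFLICT; // a real disagreement; what to do here is the open question
  }

  public static void main(String[] args) {
    RegionReportCheck master = new RegionReportCheck();
    master.onTransition("rs1", "r1", 1, true);  // r1 opened on rs1
    master.onTransition("rs1", "r1", 5, false); // r1 closed on rs1
    master.onTransition("rs2", "r1", 1, true);  // r1 opened on rs2
    System.out.println(master.onReport("rs1", "r1", 3)); // STALE_REPORT: the lagging report
    System.out.println(master.onReport("rs1", "r1", 7)); // CONFLICT: rs1 still claims r1 after closing it
    System.out.println(master.onReport("rs2", "r1", 2)); // CONSISTENT
  }
}
```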





[jira] [Updated] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-02 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21421:
---
Attachment: (was: HBASE-21421.branch-2.0.001.patch)






[jira] [Updated] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-02 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21421:
---
Attachment: HBASE-21421.branch-2.0.001.patch






[jira] [Commented] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files

2018-11-02 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673171#comment-16673171
 ] 

Ted Yu commented on HBASE-21387:


From https://builds.apache.org/job/PreCommit-HBASE-Build/14932/console :
{code}
00:38:23 +1 overall
00:38:23
00:38:23 | Vote |   Subsystem |  Runtime   | Comment
00:38:23
00:38:23 |   0  |     reexec  |   0m 11s   | Docker mode activated.
00:38:23 |   0  |      patch  |   0m  2s   | The patch file was not named according
00:38:23 |      |             |            | to hbase's naming conventions. Please
00:38:23 |      |             |            | see
00:38:23 |      |             |            | https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for
00:38:23 |      |             |            | instructions.
00:38:23 |      |             |            | Prechecks
00:38:23 |  +1  |  hbaseanti  |   0m  0s   | Patch does not have any anti-patterns.
00:38:23 |  +1  |    @author  |   0m  0s   | The patch does not contain any @author
00:38:23 |      |             |            | tags.
00:38:23 |  -0  | test4tests  |   0m  0s   | The patch doesn't appear to include any
00:38:23 |      |             |            | new or modified tests. Please justify
00:38:23 |      |             |            | why no new tests are needed for this
00:38:23 |      |             |            | patch. Also please list what manual
00:38:23 |      |             |            | steps were performed to verify this
00:38:23 |      |             |            | patch.
00:38:23 |      |             |            | master Compile Tests
00:38:23 |  +1  | mvninstall  |   4m 49s   | master passed
00:38:23 |  +1  |    compile  |   1m 46s   | master passed
00:38:23 |  +1  | checkstyle  |   1m  7s   | master passed
00:38:23 |  +1  | shadedjars  |   4m  2s   | branch has no errors when building our
00:38:23 |      |             |            | shaded downstream artifacts.
00:38:23 |  +1  |   findbugs  |   2m  1s   | master passed
00:38:23 |  +1  |    javadoc  |   0m 30s   | master passed
00:38:23 |      |             |            | Patch Compile Tests
00:38:23 |  +1  | mvninstall  |   4m 45s   | the patch passed
00:38:23 |  +1  |    compile  |   1m 50s   | the patch passed
00:38:23 |  +1  |      javac  |   1m 50s   | the patch passed
00:38:23 |  +1  | checkstyle  |   1m  4s   | the patch passed
00:38:23 |  +1  | whitespace  |   0m  0s   | The patch has no whitespace issues.
00:38:23 |  +1  | shadedjars  |   4m  6s   | patch has no errors when building our
00:38:23 |      |             |            | shaded downstream artifacts.
00:38:24 |  +1  |hadoopcheck  |   9m 53s   | Patch does not cause any errors with
00:38:24 |      |             |            | Hadoop 2.7.4 or 3.0.0.
00:38:24 |  +1  |   findbugs  |   2m 11s   | the patch passed
00:38:24 |  +1  |    javadoc  |   0m 29s   | the patch passed
00:38:24 |      |             |            | Other Tests
00:38:24 |  +1  |       unit  | 128m 21s   | hbase-server in the patch passed.
00:38:24 |  +1  | asflicense  |   0m 25s   | The patch does not generate ASF License
00:38:24 |      |             |            | warnings.
00:38:24 |      |             | 168m  0s   |
00:38:24
00:38:24
00:38:24 || Subsystem || Report/Notes ||
00:38:24
00:38:24 | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
00:38:24 | JIRA Issue | HBASE-21387 |
00:38:24 | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946617/21387.v3.txt |
{code}

> Race condition surrounding in progress snapshot handling in snapshot cache 
> leads to loss of snapshot files
> --
>
> Key: HBASE-21387
> URL: https://issues.apache.org/jira/browse/HBASE-21387
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>  Labels: snapshot
> Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt
>
>
> During a recent customer report in which ExportSnapshot failed:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] snapshot.SnapshotReferenceUtil: Can't find hfile: 44f6c3c646e84de6a63fe30da4fcb3aa in the real (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) or archive (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) directory for 

[jira] [Commented] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-02 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673150#comment-16673150
 ] 

Josh Elser commented on HBASE-19953:


{quote}So I think 'Avoid calling post* hook when procedure fails' is not stand 
here, since in 1.x we will call postModifyTable even before the modify process 
finish.
{quote}
This is surprising to me. I thought the code was otherwise. I'll have to take 
another look.
{quote}I think we can revert the change here, otherwise, user will suffer a RPC 
timeout when alter/truncate big tables.
{quote}
Caveat: if they're using the synchronous API, right?
{quote}Last but not least, as mentioned in HBASE-20658, the sync latch will be 
release after prepare state in DDLs like enable/disable other than 
alter/truncate(which only release it after the whole process finish). So there 
is a inconsistency here, we are trying hard to make sure postModifyTable to be 
called only after the whole process finish, but for other post* hooks like 
postEnableTable, they are not.
{quote}
How would we reconcile this? IIRC, our public API states that the call will 
block until the operation is complete. Seems to me that you're suggesting that 
we do not actually adhere to that in all places, nor that we should.

In reality, a synchronous API for DDL operations is super-useful – applications 
can't reasonably proceed to run if an action hasn't completed. So, I'd pose the 
question: how would we know when to say that a DDL operation is "completed 
enough"?

I am -1 on just reverting this. That would break a use-case while fixing 
another – not the right way to solve this. Let's talk through the semantics we 
can reasonably implement and what we need to provide for users. From there, we 
can figure out what we can safely implement.
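The two latch placements under discussion can be contrasted in a small model. In the first, the caller is released after the prepare state, as for enable/disable. In the second, it is released only after the whole procedure, as for alter/truncate. This is a hypothetical sketch: a plain `CountDownLatch` stands in for the master's sync latch, and the three "steps" are illustrative, not the actual procedure states. It shows what a synchronous caller can rely on at the moment it unblocks under each placement.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of where a synchronous DDL call unblocks; not HBase code. */
public class DdlLatchSketch {

  enum ReleaseAt { AFTER_PREPARE, AFTER_COMPLETION }

  /** Runs a fake three-step DDL on a worker thread; returns the steps done when the caller unblocks. */
  static int stepsDoneWhenCallerReturns(ReleaseAt releaseAt) {
    CountDownLatch callerLatch = new CountDownLatch(1); // stands in for the sync latch
    CountDownLatch reopenGate = new CountDownLatch(1);  // stands in for a slow region reopen
    AtomicInteger stepsDone = new AtomicInteger();

    Thread worker = new Thread(() -> {
      stepsDone.set(1); // step 1: prepare (validate, persist new descriptor)
      if (releaseAt == ReleaseAt.AFTER_PREPARE) {
        callerLatch.countDown(); // enable/disable style: the caller returns here
      }
      try {
        reopenGate.await(); // step 2 can take a long time on a big table
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      stepsDone.set(2); // step 2: regions reopened
      stepsDone.set(3); // step 3: post-completion hooks
      if (releaseAt == ReleaseAt.AFTER_COMPLETION) {
        callerLatch.countDown(); // alter/truncate style: the caller returns only here
      }
    });
    worker.start();

    if (releaseAt == ReleaseAt.AFTER_COMPLETION) {
      reopenGate.countDown(); // the caller must wait out the whole reopen (in practice: RPC timeout risk)
    }
    try {
      callerLatch.await();
      int observed = stepsDone.get();
      reopenGate.countDown(); // let the worker finish in the AFTER_PREPARE case
      worker.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("released after prepare: " + stepsDoneWhenCallerReturns(ReleaseAt.AFTER_PREPARE));       // 1
    System.out.println("released after completion: " + stepsDoneWhenCallerReturns(ReleaseAt.AFTER_COMPLETION)); // 3
  }
}
```

Releasing after prepare keeps the RPC short, but the caller cannot assume the regions are reopened. Releasing after completion gives the "call blocks until done" semantics at the cost of latency proportional to table size.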

> Avoid calling post* hook when procedure fails
> -
>
> Key: HBASE-19953
> URL: https://issues.apache.org/jira/browse/HBASE-19953
> Project: HBase
>  Issue Type: Bug
>  Components: master, proc-v2
>Reporter: Ramesh Mani
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2, 2.0.0
>
> Attachments: HBASE-19952.001.branch-2.patch, 
> HBASE-19953.002.branch-2.patch, HBASE-19953.003.branch-2.patch
>
>
> Ramesh pointed out a case where I think we're mishandling some post\* 
> MasterObserver hooks. Specifically, I'm looking at the deleteNamespace.
> We synchronously execute the DeleteNamespace procedure. When the user 
> provides a namespace that isn't empty, the procedure does a rollback (which 
> is just a no-op), but this doesn't propagate an exception up to the 
> NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh 
> explaining it a bit better for me to see that the code executes differently 
> than we actually expect.
> I think we need to double-check our post hooks and make sure we aren't 
> invoking them when the procedure actually failed. cc/ [~Apache9], [~stack].





[jira] [Comment Edited] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-02 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673126#comment-16673126
 ] 

Allan Yang edited comment on HBASE-19953 at 11/2/18 2:15 PM:
-

The discussion needs to continue here. I also noticed that we can get an RPC
timeout when altering/truncating a big table because of the modifications in
this issue.
This issue turns the whole alter/truncate into a sync op, so the op time will be
unacceptable if the table is huge.
Even in 1.x, modifying a table is an async op: we do not wait for the regions to
be reopened, but use admin.getAlterStatus() to check whether it has finished.
So I think 'Avoid calling post* hook when procedure fails' does not hold here,
since in 1.x we call postModifyTable even before the modify process finishes.
And, in 2.x, we have a hook named postCompletedModifyTableAction which is
guaranteed to run only after the whole process finishes.
Last but not least, as mentioned in HBASE-20658, the sync latch is released
after the prepare state in DDLs like enable/disable, unlike alter/truncate
(which only release it after the whole process finishes). So there is an
inconsistency here: we are trying hard to make sure postModifyTable is called
only after the whole process finishes, but other post* hooks like
postEnableTable are not.

I think we can revert the change for alter/truncate here, or we need to release
the latch in alter/truncate after the prepare state just like other DDLs;
otherwise, users will suffer an RPC timeout when altering/truncating big tables.
[~elserj], [~stack],[~Apache9]
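To make the latch point concrete, here is a rough sketch. It assumes a plain `java.util.concurrent.CountDownLatch` stands in for the sync latch and a `Thread.sleep` stands in for reopening regions. `LatchReleaseSketch` and `ReleasePoint` are hypothetical names, not HBase classes. The sketch measures how long the caller blocks when the latch is released after the prepare step, compared with after the whole procedure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a DDL procedure and its sync latch; not HBase code.
final class LatchReleaseSketch {
  enum ReleasePoint { AFTER_PREPARE, AFTER_COMPLETION }

  /** Returns how long the caller (think: the RPC handler) blocked, in milliseconds. */
  static long callerWaitMillis(ReleasePoint releasePoint, long reopenRegionsMillis)
      throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    ExecutorService worker = Executors.newSingleThreadExecutor();
    long start = System.nanoTime();
    try {
      worker.submit(() -> {
        // Prepare state: quick validation, after which enable/disable-style
        // DDLs already release the caller.
        if (releasePoint == ReleasePoint.AFTER_PREPARE) {
          latch.countDown();
        }
        try {
          // The slow part, e.g. reopening every region of a big table.
          Thread.sleep(reopenRegionsMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          // alter/truncate-style: the caller is only released at the very end.
          if (releasePoint == ReleasePoint.AFTER_COMPLETION) {
            latch.countDown();
          }
        }
      });
      latch.await();
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    } finally {
      worker.shutdownNow();
    }
  }
}
```

With `AFTER_COMPLETION`, the caller's wait grows with the time needed to reopen regions, which is the RPC-timeout risk described above. With `AFTER_PREPARE`, the wait stays small regardless of table size.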


was (Author: allan163):
The discussion needs to continue here. I also noticed that we can get an RPC
timeout when altering/truncating a big table because of the modifications in
this issue.
This issue turns the whole alter/truncate into a sync op, so the op time will be
unacceptable if the table is huge.
Even in 1.x, modifying a table is an async op: we do not wait for the regions to
be reopened, but use admin.getAlterStatus() to check whether it has finished.
So I think 'Avoid calling post* hook when procedure fails' does not hold here,
since in 1.x we call postModifyTable even before the modify process finishes.
And, in 2.x, we have a hook named postCompletedModifyTableAction which is
guaranteed to run only after the whole process finishes.
Last but not least, as mentioned in HBASE-20658, the sync latch is released
after the prepare state in DDLs like enable/disable, unlike alter/truncate
(which only release it after the whole process finishes). So there is an
inconsistency here: we are trying hard to make sure postModifyTable is called
only after the whole process finishes, but other post* hooks like
postEnableTable are not.
I think we can revert the change here; otherwise, users will suffer an RPC
timeout when altering/truncating big tables.
[~elserj], [~stack],[~Apache9]

> Avoid calling post* hook when procedure fails
> -
>
> Key: HBASE-19953
> URL: https://issues.apache.org/jira/browse/HBASE-19953
> Project: HBase
>  Issue Type: Bug
>  Components: master, proc-v2
>Reporter: Ramesh Mani
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2, 2.0.0
>
> Attachments: HBASE-19952.001.branch-2.patch, 
> HBASE-19953.002.branch-2.patch, HBASE-19953.003.branch-2.patch
>
>
> Ramesh pointed out a case where I think we're mishandling some post\* 
> MasterObserver hooks. Specifically, I'm looking at the deleteNamespace.
> We synchronously execute the DeleteNamespace procedure. When the user 
> provides a namespace that isn't empty, the procedure does a rollback (which 
> is just a no-op), but this doesn't propagate an exception up to the 
> NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh
> explaining it a bit better for me to see that the code executes a bit
> differently than we actually expect.
> I think we need to double-check our post hooks and make sure we aren't 
> invoking them when the procedure actually failed. cc/ [~Apache9], [~stack].





[jira] [Commented] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-02 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673126#comment-16673126
 ] 

Allan Yang commented on HBASE-19953:


The discussion needs to continue here. I also noticed that we can get an RPC
timeout when altering/truncating a big table because of the modifications in
this issue.
This issue turns the whole alter/truncate into a sync op, so the op time will be
unacceptable if the table is huge.
Even in 1.x, modifying a table is an async op: we do not wait for the regions to
be reopened, but use admin.getAlterStatus() to check whether it has finished.
So I think 'Avoid calling post* hook when procedure fails' does not hold here,
since in 1.x we call postModifyTable even before the modify process finishes.
And, in 2.x, we have a hook named postCompletedModifyTableAction which is
guaranteed to run only after the whole process finishes.
Last but not least, as mentioned in HBASE-20658, the sync latch is released
after the prepare state in DDLs like enable/disable, unlike alter/truncate
(which only release it after the whole process finishes). So there is an
inconsistency here: we are trying hard to make sure postModifyTable is called
only after the whole process finishes, but other post* hooks like
postEnableTable are not.
I think we can revert the change here; otherwise, users will suffer an RPC
timeout when altering/truncating big tables.
[~elserj], [~stack],[~Apache9]

> Avoid calling post* hook when procedure fails
> -
>
> Key: HBASE-19953
> URL: https://issues.apache.org/jira/browse/HBASE-19953
> Project: HBase
>  Issue Type: Bug
>  Components: master, proc-v2
>Reporter: Ramesh Mani
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2, 2.0.0
>
> Attachments: HBASE-19952.001.branch-2.patch, 
> HBASE-19953.002.branch-2.patch, HBASE-19953.003.branch-2.patch
>
>
> Ramesh pointed out a case where I think we're mishandling some post\* 
> MasterObserver hooks. Specifically, I'm looking at the deleteNamespace.
> We synchronously execute the DeleteNamespace procedure. When the user 
> provides a namespace that isn't empty, the procedure does a rollback (which 
> is just a no-op), but this doesn't propagate an exception up to the 
> NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh
> explaining it a bit better for me to see that the code executes a bit
> differently than we actually expect.
> I think we need to double-check our post hooks and make sure we aren't 
> invoking them when the procedure actually failed. cc/ [~Apache9], [~stack].





[jira] [Updated] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21422:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2.

Thanks [~stack] for reviewing.

> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21424) Change flakies and nightlies so scheduled less often

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673075#comment-16673075
 ] 

Hudson commented on HBASE-21424:


Results for branch master
[build #581 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/581/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Change flakies and nightlies so scheduled less often
> 
>
> Key: HBASE-21424
> URL: https://issues.apache.org/jira/browse/HBASE-21424
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
> Attachments: HBASE-21424.branch-2.1.001.patch
>
>
> Infra wrote us:
> {code}
> Chris Thistlethwaite 
> 9:09 AM (25 minutes ago)
>  to dev, team
> Greetings!
> During the Jenkins outage yesterday I noticed a ton of builds from
> HBase-Flaky-Tests 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Flaky-Tests/ in
> the queue. Turns out this runs a bunch of pipeline builds every hour
> which clogs up Jenkins, both for you and other projects. For example,
> branch-2.0 is currently queuing 3 builds, waiting on the 4th to finish,
> and it's also behind the HBase Nightly.
> That brings me to HBase Nightly 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/ it
> runs every 6 hours, which is a bit excessive for a nightly build, which
> by definition should run once a day, especially as it gets dangerously
> close to running into itself: builds currently take around 4-5 hours.
> I suggest something more like Flaky-Tests every 6 hours and the Nightly
> once a day. If you agree to these changes, feel free to update Jenkins.
> Otherwise, I'll update the jobs in the next few days if there is no
> response.
> Please add t...@infra.apache.org and/or my address to any replies as
> we're not subbed to your dev list.
> Thank you,
> Chris T.
> {code}





[jira] [Commented] (HBASE-19682) Use Collections.emptyList() For Empty List Values

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673074#comment-16673074
 ] 

Hudson commented on HBASE-19682:


Results for branch master
[build #581 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/581/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/581//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Use Collections.emptyList() For Empty List Values
> -
>
> Key: HBASE-19682
> URL: https://issues.apache.org/jira/browse/HBASE-19682
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-19682.1.patch, HBASE-19682.2.patch, 
> HBASE-19682.3.1.patch, HBASE-19682.4.patch, HBASE-19682.5.patch, 
> HBASE-19682.6.patch, HBASE-19682.7.patch, example.patch
>
>
> Use {{Collections.emptyList()}} for returning an empty list instead of
> {{return new ArrayList<>()}}. The default constructor gives an _ArrayList_ an
> initial capacity of 10 (allocated lazily on the first add in Java 8), so
> returning the shared static instance saves some memory and GC pressure, and
> saves the time of allocating a new object for each call.
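Below is a small before/after sketch of the proposed change. `EmptyListExample` and the region names are hypothetical. `Collections.emptyList()` is the standard JDK method, and it returns a shared immutable instance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; not taken from the HBase code base.
final class EmptyListExample {
  // Before: allocates a fresh ArrayList on every call, even when empty.
  static List<String> regionNamesOld(boolean hasRegions) {
    if (!hasRegions) {
      return new ArrayList<>();
    }
    List<String> names = new ArrayList<>();
    names.add("region-1");
    return names;
  }

  // After: the empty case returns the shared, immutable singleton.
  static List<String> regionNames(boolean hasRegions) {
    if (!hasRegions) {
      return Collections.emptyList();
    }
    List<String> names = new ArrayList<>();
    names.add("region-1");
    return names;
  }
}
```

Keep one design consequence in mind. The returned list is immutable, so any caller that adds to the result will now get an `UnsupportedOperationException`.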





[jira] [Updated] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21422:
--
Fix Version/s: 2.2.0
   3.0.0

> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673052#comment-16673052
 ] 

Duo Zhang commented on HBASE-21422:
---

TestMetaTableAccessor passed locally for me. This is a test issue and we do not
modify the logic of any non-test code, so I do not think it causes
TestMetaTableAccessor to fail.

Let me commit.

> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673021#comment-16673021
 ] 

Hadoop QA commented on HBASE-21422:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 59s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  2s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 29s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 28s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 34s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.TestMetaTableAccessor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946657/HBASE-21422-v1.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 0f47daad986b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / ee55b558c0 |
| maven | version: Apache Maven 3.5.4 

[jira] [Updated] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-02 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21351:
--
Attachment: HBASE-21351-v2.patch

> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force-updating its
> state, to prevent a race with the PE worker, but this does not work when the
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED, and then
> start to roll back the whole procedure stack. We pop every procedure in the
> stack and try to roll it back, so we may change the state of a procedure
> without holding its procExecutionLock while rolling back.
> This means we may persist an intermediate state of a procedure and cause
> corruption when loading procedures.
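The following simplified sketch shows the locking discipline described above. It assumes a `ReentrantLock` per procedure models procExecutionLock and a queue models the procedure store. `SketchProcedure` and `RollbackSketch` are hypothetical, and this is not the HBASE-21351 patch. The point is that rollback takes the same per-procedure lock as the force-update path. As a result, an intermediate state such as `ROLLING_BACK` is never persisted.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model; names are illustrative, not HBase's Procedure classes.
final class SketchProcedure {
  // Stand-in for the per-procedure procExecutionLock.
  final ReentrantLock execLock = new ReentrantLock();
  private String state = "FAILED"; // only read or written while holding execLock

  String state() { return state; }
  void setState(String newState) { state = newState; }
}

final class RollbackSketch {
  // Stand-in for the procedure store: every persisted state snapshot.
  private final ConcurrentLinkedQueue<String> store = new ConcurrentLinkedQueue<>();

  // Force-update thread: persist a procedure's state while holding its lock.
  void forceUpdate(SketchProcedure proc) {
    proc.execLock.lock();
    try {
      store.add(proc.state());
    } finally {
      proc.execLock.unlock();
    }
  }

  // Rollback: pop each procedure and hold the SAME lock while changing its
  // state, so forceUpdate can never persist the intermediate ROLLING_BACK state.
  void rollback(Deque<SketchProcedure> stack) {
    while (!stack.isEmpty()) {
      SketchProcedure proc = stack.pop();
      proc.execLock.lock();
      try {
        proc.setState("ROLLING_BACK");
        // ... undo the procedure's side effects here ...
        proc.setState("ROLLEDBACK");
      } finally {
        proc.execLock.unlock();
      }
    }
  }

  List<String> persisted() { return new ArrayList<>(store); }
}
```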





[jira] [Commented] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-02 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672953#comment-16672953
 ] 

Anoop Sam John commented on HBASE-21421:


A nice issue and find.

> Do not kill RS if reportOnlineRegions fails
> ---
>
> Key: HBASE-21421
> URL: https://issues.apache.org/jira/browse/HBASE-21421
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21421.branch-2.0.001.patch
>
>
> In the periodic regionServerReport from the RS to the master, we call
> master.getAssignmentManager().reportOnlineRegions() to make sure the RS has
> the same state as the Master. If the RS holds a region which the master thinks
> should be on another RS, the Master kills the RS.
> But the regionServerReport could be lagging (due to the network or something
> else), so it may not represent the current state of the RegionServer. Besides,
> when onlining a region we call reportRegionStateTransition and retry forever
> until it is successfully reported to the master, so we can count on the
> reportRegionStateTransition calls.
> I have encountered cases where the regions were closed on the RS and
> reportRegionStateTransition reported to the master successfully. But later, a
> lagging regionServerReport told the master the region was online on the RS
> (which it was not at that moment; this call may have been generated some time
> ago and delayed by the network somehow), so the master thought the region
> should be on another RS and killed the RS, which should not happen.
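Here is a hedged sketch of one way the master could treat a lagging report. It assumes each report carries the time it was generated, and that the master remembers when each region's last reportRegionStateTransition arrived. `RegionReportSketch`, its `Action` values and the timestamp comparison are illustrative assumptions. They are not the AssignmentManager API or the attached patch. In line with the issue title, no branch kills the RS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the AssignmentManager API or the HBASE-21421 patch.
final class RegionReportSketch {
  enum Action { CONSISTENT, IGNORE_STALE, LOG_MISMATCH }

  // region -> server the master believes hosts it (after the last transition)
  private final Map<String, String> expectedServer = new HashMap<>();
  // region -> when the last reportRegionStateTransition for it was accepted (ms)
  private final Map<String, Long> lastTransitionMillis = new HashMap<>();

  // Called when reportRegionStateTransition succeeds (the RS retries until it does).
  void onTransition(String region, String hostingServer, long atMillis) {
    expectedServer.put(region, hostingServer);
    lastTransitionMillis.put(region, atMillis);
  }

  // Called for each region listed as online in a periodic regionServerReport.
  Action onReportedOnline(String reportingServer, String region, long reportGeneratedMillis) {
    if (reportingServer.equals(expectedServer.get(region))) {
      return Action.CONSISTENT;
    }
    Long last = lastTransitionMillis.get(region);
    if (last != null && reportGeneratedMillis < last) {
      // The report was built before the latest transition: it is lagging.
      return Action.IGNORE_STALE;
    }
    // Still inconsistent, but trust reportRegionStateTransition; do not kill the RS.
    return Action.LOG_MISMATCH;
  }
}
```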





[jira] [Comment Edited] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.

2018-11-02 Thread Jeongdae Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672938#comment-16672938
 ] 

Jeongdae Kim edited comment on HBASE-21418 at 11/2/18 11:13 AM:


Thanks for your comments. I’ll reflect them in the next patch.
{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has 
to know about. (which is why I had removed the LOOK_AHEAD hint that I myself 
had added a bit earlier).
{quote}
I agree 100%, and would like to do without options too. But we don't have any
information like a next block index as long as we use ConcurrentSkipListMap as
the data structure, so I couldn’t find a nice solution without extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, 
right?
 
If we can, let's find a heuristic to do this automatically (like I did with 
HFiles), so that a user won't have to hint.
{quote}
Right, I used max versions as a heuristic in case users pass no hint; I had no
better idea for a proper heuristic.
If we can bear a small extra cost when putting cells into a memstore, what about
maintaining some stats for columns and using them to decide whether to do seek
operations or not? Let me try to make a patch for this.
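The following rough sketch illustrates why a hint or heuristic matters when there is no next-block index. It assumes string keys instead of Cells. On a `ConcurrentSkipListMap`, a reseek first tries a bounded number of cheap `next()` steps. It falls back to a real O(log n) seek via `tailMap` only when the target is further away. `NearSeekSketch` and `maxNextsBeforeSeek` are hypothetical. The step bound is exactly the kind of value that a heuristic (max versions, or per-column stats) would have to choose.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch with String keys instead of Cells; not MemStoreScanner code.
final class NearSeekSketch {
  private final ConcurrentSkipListMap<String, String> memstore;
  private final int maxNextsBeforeSeek;
  private Iterator<Map.Entry<String, String>> it;
  private Map.Entry<String, String> current;
  int seeks; // number of real O(log n) skip-list seeks performed

  NearSeekSketch(ConcurrentSkipListMap<String, String> memstore, int maxNextsBeforeSeek) {
    this.memstore = memstore;
    this.maxNextsBeforeSeek = maxNextsBeforeSeek;
    this.it = memstore.entrySet().iterator();
    this.current = it.hasNext() ? it.next() : null;
  }

  /** Forward-only reseek: positions on the first entry whose key is >= target. */
  Map.Entry<String, String> reseek(String target) {
    // Cheap path: the target is often only a few entries ahead, so try next().
    for (int i = 0; i < maxNextsBeforeSeek && current != null; i++) {
      if (current.getKey().compareTo(target) >= 0) {
        return current;
      }
      current = it.hasNext() ? it.next() : null;
    }
    if (current == null || current.getKey().compareTo(target) >= 0) {
      return current;
    }
    // Target is further away: fall back to a real seek in the skip list.
    seeks++;
    it = memstore.tailMap(target, true).entrySet().iterator();
    current = it.hasNext() ? it.next() : null;
    return current;
  }
}
```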


was (Author: jeongdae kim):
Thanks for your comments. I’ll reflect them in the next patch.
{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has 
to know about. (which is why I had removed the LOOK_AHEAD hint that I myself 
had added a bit earlier).
{quote}
I agree with you 100% and would like to do without options too, but I 
couldn’t find a nice solution without extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, 
right?
 
If we can, let's find a heuristic to do this automatically (like I did with 
HFiles), so that a user won't have to hint.
{quote}
Right, I used max versions as a heuristic for the case where users pass no hint. 
I had no better idea for a proper heuristic.
If we can bear a small extra cost when putting cells into a memstore, what about 
maintaining some stats for columns and using them to decide whether to do seek 
operations or not? Let me try to make a patch for this.


[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.

2018-11-02 Thread Jeongdae Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672938#comment-16672938
 ] 

Jeongdae Kim commented on HBASE-21418:
--

Thanks for your comments. I’ll address them in the next patch.
{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has 
to know about. (which is why I had removed the LOOK_AHEAD hint that I myself 
had added a bit earlier).
{quote}
I agree with you 100% and would like to do without options too, but I 
couldn’t find a nice solution without extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, 
right?
 
If we can, let's find a heuristic to do this automatically (like I did with 
HFiles), so that a user won't have to hint.
{quote}
Right, I used max versions as a heuristic for the case where users pass no hint. 
I had no better idea for a proper heuristic.
If we can bear a small extra cost when putting cells into a memstore, what about 
maintaining some stats for columns and using them to decide whether to do seek 
operations or not? Let me try to make a patch for this.

> Reduce a number of reseek operations in MemstoreScanner when seek point is 
> close to the current row.
> 
>
> Key: HBASE-21418
> URL: https://issues.apache.org/jira/browse/HBASE-21418
> Project: HBase
>  Issue Type: Improvement
>  Components: scan, Scanners
>Affects Versions: 1.2.5
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: performance
> Attachments: HBASE-21418.branch-1.2.001.patch, 
> HBASE-21418.branch-1.2.001.patch
>
>
> We observed “responseTooSlow” logs for Get requests in our production 
> clusters; some Get requests took more than 10 seconds to respond.
> The affected Get requests used a time range, and the target rows have many 
> columns, each with several versions.
> We reproduced this issue and found that it happens only when scanning 
> the memstore. After flushing the HStore, the slow responses for Get 
> disappeared and the same Get requests were answered very quickly.
>  
> We investigated this case and found that the performance difference between 
> the memstore scanner and the HFile scanner is caused by the number of reseek 
> operations executed while scanning. When a store scanner needs to reseek to the 
> next column, the HFile scanner decides whether it has to reseek at all by 
> checking whether the seek point is in the current block, whereas the memstore 
> scanner always reseeks without making that check. In our case, almost all columns 
> in the memstore have timestamps older than the scan's (Get's) time range, so 
> roughly one reseek occurs per column. This sporadically increases the 
> response time of Get requests.
>  
> To improve the reseek operation of the memstore scanner, I think it is better 
> to skip rather than seek when a reseek is requested, if the seek point is quite 
> close to the cell the scanner is currently pointing at. (In fact, I changed 
> MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and Get response times 
> were 6x faster than before.) But we can't decide whether the seek point is 
> close to the current cell or not, because the memstore scanner has no 
> information such as the next block index.
>  Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle cases like 
> this, and it may be deprecated someday. But I think that hint would still be 
> useful for the memstore scanner to try skipping first, before reseeking; with 
> this option we can make the reseek operations of the memstore scanner smarter.
>  
> I tested this patch in our case and got the same result as when I changed the 
> match code (mentioned above).
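A minimal Java sketch of the look-ahead idea, assuming cells are modelled as `String` keys in a `ConcurrentSkipListMap` (the real memstore stores `Cell` objects ordered by a `CellComparator`). `LookAheadMemstoreScanner`, `reseek` and the `lookAhead` budget are hypothetical names, not HBase APIs. The scanner first tries to reach the target with up to `lookAhead` cheap `next()` steps and only falls back to an O(log n) `tailMap` seek when the target is further away.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch: skip a few cells before falling back to a real reseek. */
public class LookAheadMemstoreScanner {
  private final ConcurrentSkipListMap<String, String> cells;
  private final int lookAhead;
  private Iterator<Map.Entry<String, String>> it;
  private Map.Entry<String, String> current;
  private int realSeeks = 0;

  public LookAheadMemstoreScanner(ConcurrentSkipListMap<String, String> cells, int lookAhead) {
    this.cells = cells;
    this.lookAhead = lookAhead;
    this.it = cells.entrySet().iterator();
    this.current = it.hasNext() ? it.next() : null;
  }

  /** Real seek: position at the first key >= target via the skip list. */
  private void seek(String target) {
    realSeeks++;
    it = cells.tailMap(target, true).entrySet().iterator();
    current = it.hasNext() ? it.next() : null;
  }

  /** Moves to the first key >= target, stepping with next() if it is close. */
  public void reseek(String target) {
    for (int i = 0; i < lookAhead; i++) {
      if (current == null || current.getKey().compareTo(target) >= 0) {
        return; // already at or past the target: no seek needed
      }
      current = it.hasNext() ? it.next() : null; // cheap skip
    }
    if (current != null && current.getKey().compareTo(target) < 0) {
      seek(target); // target is further than lookAhead cells away
    }
  }

  public String currentKey() {
    return current == null ? null : current.getKey();
  }

  public int realSeeks() {
    return realSeeks;
  }
}
```

With `lookAhead = 2` over keys `a` to `h`, reseeking from `a` to `b` takes one step and no seek, while reseeking from `b` to `g` takes two steps and then one real seek.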





[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.

2018-11-02 Thread Jeongdae Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672936#comment-16672936
 ] 

Jeongdae Kim commented on HBASE-21418:
--

Thanks for the comment [~yuzhih...@gmail.com].

{quote}
What is TestLookAheadBeforeReseek supposed to show without the fix ?
{quote}
It's for showing the performance difference. I'll remove the test from my patch 
and provide an external link to it instead.






[jira] [Commented] (HBASE-21424) Change flakies and nightlies so scheduled less often

2018-11-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672898#comment-16672898
 ] 

Hudson commented on HBASE-21424:


Results for branch branch-1
[build #535 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/535/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/535//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/535//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/535//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Change flakies and nightlies so scheduled less often
> 
>
> Key: HBASE-21424
> URL: https://issues.apache.org/jira/browse/HBASE-21424
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.9
>
> Attachments: HBASE-21424.branch-2.1.001.patch
>
>
> Infra wrote us:
> {code}
> Chris Thistlethwaite 
> 9:09 AM (25 minutes ago)
>  to dev, team
> Greetings!
> During the Jenkins outage yesterday I noticed a ton of builds from
> HBase-Flaky-Tests 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Flaky-Tests/ in
> the queue. Turns out this runs a bunch of pipeline builds every hour
> which clogs up Jenkins, both for you and other projects. For example,
> branch-2.0 is currently queuing 3 builds, waiting on the 4th to finish,
> and it's also behind the HBase Nightly.
> That brings me to HBase Nightly 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/ it
> runs every 6 hours, which is a bit excessive for a nightly build which
> by definition should be once a day. Especially as it gets dangerously
> close to running into itself as builds currently around 4-5 hours of
> build time.
> I suggest something more like Flaky-Tests every 6 hours and the Nightly
> once a day. If you agree to these changes, feel free to update Jenkins.
> Otherwise, I'll update the jobs in the next few days if there is no
> response.
> Please add t...@infra.apache.org and/or my address to any replies as
> we're not subbed to your dev list.
> Thank you,
> Chris T.
> {code}





[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-11-02 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672866#comment-16672866
 ] 

Reid Chan commented on HBASE-21301:
---

Looks like you didn't publish it after making the modifications? Never mind, you 
can do it next time.
Reviewing.

> Heatmap for key access patterns
> ---
>
> Key: HBASE-21301
> URL: https://issues.apache.org/jira/browse/HBASE-21301
> Project: HBase
>  Issue Type: Improvement
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21301.v0.master.patch
>
>
> Google recently released a beta feature for Cloud Bigtable which presents a 
> heat map of the keyspace. *Given how hotspotting comes up now and again here, 
> this is a good idea for giving HBase ops a tool to be proactive about it.* 
> >>>
> Additionally, we are announcing the beta version of Key Visualizer, a 
> visualization tool for Cloud Bigtable key access patterns. Key Visualizer 
> helps debug performance issues due to unbalanced access patterns across the 
> key space, or single rows that are too large or receiving too much read or 
> write activity. With Key Visualizer, you get a heat map visualization of 
> access patterns over time, along with the ability to zoom into specific key 
> or time ranges, or select a specific row to find the full row key ID that's 
> responsible for a hotspot. Key Visualizer is automatically enabled for Cloud 
> Bigtable clusters with sufficient data or activity, and does not affect Cloud 
> Bigtable cluster performance. 
> <<<
> From 
> [https://cloudplatform.googleblog.com/2018/07/on-gcp-your-database-your-way.html]
> (Copied this description from the write-up by [~apurtell], thanks Andrew.)
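For the heatmap itself, a minimal Java sketch of one possible underlying data structure, assuming a fixed grid of counters indexed by time bucket and key-space bucket. `KeyAccessHeatmap` is a hypothetical name, and bucketing by the first byte of the row key is a simplifying assumption; a real implementation would more likely bucket by region or by sampled key ranges.

```java
/**
 * Hypothetical sketch: a grid of access counters, one row per time bucket
 * and one column per key-space bucket. The key space is split by the first
 * byte of the row key (0..255 folded into keyBuckets), an assumption made
 * only to keep the sketch small.
 */
public class KeyAccessHeatmap {
  private final long startMillis;
  private final long bucketMillis;
  private final int keyBuckets;
  private final long[][] counts; // counts[timeBucket][keyBucket]

  public KeyAccessHeatmap(long startMillis, long bucketMillis, int timeBuckets, int keyBuckets) {
    if (bucketMillis <= 0 || timeBuckets <= 0 || keyBuckets <= 0 || keyBuckets > 256) {
      throw new IllegalArgumentException("invalid heatmap dimensions");
    }
    this.startMillis = startMillis;
    this.bucketMillis = bucketMillis;
    this.keyBuckets = keyBuckets;
    this.counts = new long[timeBuckets][keyBuckets];
  }

  /** Records one read or write of rowKey at time nowMillis. */
  public synchronized void recordAccess(byte[] rowKey, long nowMillis) {
    if (nowMillis < startMillis) {
      return; // before the tracked window
    }
    long t = (nowMillis - startMillis) / bucketMillis;
    if (t >= counts.length) {
      return; // after the tracked window
    }
    int firstByte = rowKey.length == 0 ? 0 : (rowKey[0] & 0xFF);
    int k = firstByte * keyBuckets / 256;
    counts[(int) t][k]++;
  }

  public synchronized long count(int timeBucket, int keyBucket) {
    return counts[timeBucket][keyBucket];
  }

  /** Key bucket with the most accesses in a time bucket, i.e. the hotspot. */
  public synchronized int hottestKeyBucket(int timeBucket) {
    int best = 0;
    for (int k = 1; k < keyBuckets; k++) {
      if (counts[timeBucket][k] > counts[timeBucket][best]) {
        best = k;
      }
    }
    return best;
  }
}
```

Rendering `counts` as a colored grid (time on one axis, key space on the other) gives the heat map view described above.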





[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-11-02 Thread Archana Katiyar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672860#comment-16672860
 ] 

Archana Katiyar commented on HBASE-21301:
-

Thanks [~reidchan]; this is my first review request, so I was not aware of the 
drill. I added the 'hbase' group, and you too, to the review.






[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-11-02 Thread Reid Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672847#comment-16672847
 ] 

Reid Chan commented on HBASE-21301:
---

For reviews, besides *people*, you are free to include *hbase* groups; then 
other developers and I will receive the review request as well. ;)







[jira] [Commented] (HBASE-21301) Heatmap for key access patterns

2018-11-02 Thread Archana Katiyar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672831#comment-16672831
 ] 

Archana Katiyar commented on HBASE-21301:
-

Thanks [~reidchan] for the pointer; patch uploaded and review request created: 
[https://reviews.apache.org/r/69240/].

The patch includes the UI as well.






[jira] [Updated] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21422:
--
Attachment: HBASE-21422-v1.patch

> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672803#comment-16672803
 ] 

Hadoop QA commented on HBASE-21422:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 19s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m  5s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 12m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 45s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}282m 45s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}341m 45s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestFromClientSideWithCoprocessor |
|   | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
|   | hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas |
|   | hadoop.hbase.client.TestFromClientSide3 |
|   | hadoop.hbase.client.TestFromClientSide |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21422 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946639/HBASE-21422-v1.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 50b6368746f7 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 

[jira] [Created] (HBASE-21428) Performance issue due to userRegionLock in the ConnectionManager.

2018-11-02 Thread koo (JIRA)
koo created HBASE-21428:
---

 Summary: Performance issue due to userRegionLock in the 
ConnectionManager.
 Key: HBASE-21428
 URL: https://issues.apache.org/jira/browse/HBASE-21428
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.7
Reporter: koo


My service executes a lot of puts using HTableMultiplexer.
After the version change, most of the requests are rejected.

It works fine in 1.2.6.1, but there is a problem in 1.2.7.

This issue is related to HBASE-19260.

Most of my threads spend a lot of time waiting on a lock, as shown below.

 
{noformat}
"Worker-972" #2479 daemon prio=5 os_prio=0 tid=0x7f8cea86b000 nid=0x4c8c waiting on condition [0x7f8b78104000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x0005dd703b78> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
 at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1274)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1186)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1170)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1127)
 at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:962)
 at org.apache.hadoop.hbase.client.HTableMultiplexer.put(HTableMultiplexer.java:206)
 at org.apache.hadoop.hbase.client.HTableMultiplexer.put(HTableMultiplexer.java:150)
{noformat}

 

When I looked at HBASE-19260, I recognized the danger of allowing multiple 
threads to locate regions in meta concurrently.
However, the meta lookup pool already creates many threads within its configured 
limits (see below), so I think it is very inefficient to allow only one thread 
at a time.

 
{code}
this.metaLookupPool = getThreadPool(
    conf.getInt("hbase.hconnection.meta.lookup.threads.max", 128),
    conf.getInt("hbase.hconnection.meta.lookup.threads.core", 10),
    "-metaLookup-shared-", new LinkedBlockingQueue());
{code}

 

I suggest changing it to allow multiple locks (though fewer locks than 
threads).
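A minimal Java sketch of the multiple-lock idea using lock striping, assuming a hypothetical `StripedRegionLocks` helper (not an HBase class) whose stripe count is taken from `hbase.hconnection.meta.lookup.threads.max / 2` as proposed here. Every lookup for the same table and row maps to the same `ReentrantLock`, so duplicate lookups for that row are still serialized, while up to `stripes` lookups can be in flight at once instead of one. Different rows of the same region may hash to different stripes, so this bounds, but does not eliminate, concurrent meta lookups for one region.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: a fixed array of locks replacing a single userRegionLock. */
public class StripedRegionLocks {
  private final ReentrantLock[] locks;

  public StripedRegionLocks(int stripes) {
    if (stripes <= 0) {
      throw new IllegalArgumentException("stripes must be > 0");
    }
    locks = new ReentrantLock[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  /** Same (table, row) always maps to the same stripe; floorMod keeps the index non-negative. */
  public ReentrantLock lockFor(String tableName, byte[] row) {
    int h = 31 * tableName.hashCode() + Arrays.hashCode(row);
    return locks[Math.floorMod(h, locks.length)];
  }

  /** Runs a meta lookup while holding the stripe lock for (table, row). */
  public <T> T withLock(String tableName, byte[] row, Supplier<T> lookup) {
    ReentrantLock lock = lockFor(tableName, row);
    lock.lock();
    try {
      return lookup.get();
    } finally {
      lock.unlock();
    }
  }

  public int stripes() {
    return locks.length;
  }
}
```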

The following is pseudocode.

 
|int lockSize = conf.getInt("hbase.hconnection.meta.lookup.threads.max", 128) / 
2;
BlockingQueue userRegionLockQueue = new 
LinkedBlockingQueue();
 for (int i=0; i 

[jira] [Commented] (HBASE-21347) Backport HBASE-21200 "Memstore flush doesn't finish because of seekToPreviousRow() in memstore scanner." to branch-1

2018-11-02 Thread Toshihiro Suzuki (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672629#comment-16672629
 ] 

Toshihiro Suzuki commented on HBASE-21347:
--

Could you please review the patch when you get a chance? [~apurtell]

> Backport HBASE-21200 "Memstore flush doesn't finish because of 
> seekToPreviousRow() in memstore scanner." to branch-1
> 
>
> Key: HBASE-21347
> URL: https://issues.apache.org/jira/browse/HBASE-21347
> Project: HBase
>  Issue Type: Sub-task
>  Components: backport, Scanners
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Critical
> Attachments: HBASE-21347.branch-1.001.patch
>
>
> Backport parent issue to branch-1.


