date:20181216

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Jingyun Tian (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722750#comment-16722750
 ] 

Jingyun Tian commented on HBASE-21565:
--

[~zghaobac] Before HBASE-21508, there is no ServerNode check when submitting a 
SCP. Thus I think there may also have other changes that may affect this patch. 
I didn't dig into that patch, let me know if I'm wrong.

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch, 
> HBASE-21565.master.008.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21583) Insert info Level0 instead of throwing Exceptions when doing bulkload on tables with StripeCompaction enabled

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722742#comment-16722742
 ] 

Hadoop QA commented on HBASE-21583:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 9s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} hbase-server: The patch generated 0 new + 14 
unchanged - 5 fixed = 14 total (was 19) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
21s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}130m  
3s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21583 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951986/HBASE-21583-master-v3.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 529945a40ca3 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ac0b3bb547 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15299/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15299/testReport/ |
| Max. process+thread count | 4098 (vs. ulimit of

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Guanghao Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722726#comment-16722726
 ] 

Guanghao Zhang commented on HBASE-21565:


{quote}bq.This patch relies on HBASE-21508, and I think it's useful to port 
these 2 patches to branch-2.0.
{quote}
Can you explain more about this?

And FYI [~allan163] please take a look about this for branch-2.0. Thanks.

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch, 
> HBASE-21565.master.008.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Guanghao Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722724#comment-16722724
 ] 

Guanghao Zhang commented on HBASE-21565:


{quote}bq. Assert.assertTrue(serverNode != null);
{quote}
assertNull and assertNotNull may be better... And please add check for ONLINE 
state when server node is not null. Thanks.

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch, 
> HBASE-21565.master.008.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Jingyun Tian (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722719#comment-16722719
 ] 

Jingyun Tian commented on HBASE-21565:
--

[~zghaobac] Thanks for your comment. Added some status check for ServerNode in 
patch 008. 
[~stack] This patch relies on HBASE-21508, and I think it's useful to port 
these 2 patches to branch-2.0. But I'm not sure if there is any conflicts. 
How do you think? [~zghaobac] and [~stack].

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch, 
> HBASE-21565.master.008.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722713#comment-16722713
 ] 

Hadoop QA commented on HBASE-21565:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
18s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
40s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 59s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}149m 
34s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}192m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21565 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951979/HBASE-21565.master.007.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 1ddeadaa93c6 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / ac0b3bb547 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15298/testReport/ |
| Max. process+thread count | 4362 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15298/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Delete dead

[jira] [Updated] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Jingyun Tian (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian updated HBASE-21565:
-
Attachment: HBASE-21565.master.008.patch

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch, 
> HBASE-21565.master.008.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-16 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722700#comment-16722700
 ] 

Duo Zhang commented on HBASE-21514:
---

+1.

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, 
> HBASE-21514.master.008.patch, HBASE-21514.master.009.patch, 
> HBASE-21514.master.010.patch, HBASE-21514.master.011.patch, 
> HBASE-21514.master.011.patch, HBASE-21514.master.012.patch, 
> HBASE-21514.master.013.patch, HBASE-21514.master.013.patch, 
> HBASE-21514.master.014.patch
>
>
> # Add block cache and mob file cache to HRegionServer's member variable. One 
> rs has one block cache and one mob file cache.
>  # Move the global cache instances from CacheConfig to BlockCacheFactory. 
> Only keep config stuff in CacheConfig. And the CacheConfig still have a 
> reference to the RegionServer's block cache. Whether to cache a block need 
> block cache is present and the related config is true.
>  # Remove MobCacheCofnig. It only used for the global mob file cache 
> instance. After move the mob file cache to RegionServer. It is not used 
> anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722698#comment-16722698
 ] 

Hadoop QA commented on HBASE-21019:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
11s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
11s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 21s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}146m 
15s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21019 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951976/HBASE-21019.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c6344568eafc 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ac0b3bb547 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15297/testReport/ |
| Max. process+thread count | 4798 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output |

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-16 Thread Guanghao Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722696#comment-16722696
 ] 

Guanghao Zhang commented on HBASE-21514:


Rebased and add 014 patch. Any more comments? [~anoop.hbase] [~stack] [~Apache9]

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, 
> HBASE-21514.master.008.patch, HBASE-21514.master.009.patch, 
> HBASE-21514.master.010.patch, HBASE-21514.master.011.patch, 
> HBASE-21514.master.011.patch, HBASE-21514.master.012.patch, 
> HBASE-21514.master.013.patch, HBASE-21514.master.013.patch, 
> HBASE-21514.master.014.patch
>
>
> # Add block cache and mob file cache to HRegionServer's member variable. One 
> rs has one block cache and one mob file cache.
>  # Move the global cache instances from CacheConfig to BlockCacheFactory. 
> Only keep config stuff in CacheConfig. And the CacheConfig still have a 
> reference to the RegionServer's block cache. Whether to cache a block need 
> block cache is present and the related config is true.
>  # Remove MobCacheCofnig. It only used for the global mob file cache 
> instance. After move the mob file cache to RegionServer. It is not used 
> anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21514) Refactor CacheConfig

2018-12-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21514:
---
Attachment: HBASE-21514.master.014.patch

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, 
> HBASE-21514.master.008.patch, HBASE-21514.master.009.patch, 
> HBASE-21514.master.010.patch, HBASE-21514.master.011.patch, 
> HBASE-21514.master.011.patch, HBASE-21514.master.012.patch, 
> HBASE-21514.master.013.patch, HBASE-21514.master.013.patch, 
> HBASE-21514.master.014.patch
>
>
> # Add block cache and mob file cache to HRegionServer's member variable. One 
> rs has one block cache and one mob file cache.
>  # Move the global cache instances from CacheConfig to BlockCacheFactory. 
> Only keep config stuff in CacheConfig. And the CacheConfig still have a 
> reference to the RegionServer's block cache. Whether to cache a block need 
> block cache is present and the related config is true.
>  # Remove MobCacheCofnig. It only used for the global mob file cache 
> instance. After move the mob file cache to RegionServer. It is not used 
> anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21592) quota.addGetResult(r) throw NPE

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722678#comment-16722678
 ] 

Hadoop QA commented on HBASE-21592:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
53s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
48s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}129m  
3s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21592 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951973/HBASE-21592.master.0003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a8d75e11a14f 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ac0b3bb547 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15294/testReport/ |
| Max. process+thread count | 4675 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15294/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> quota.addGetResult(r)

[jira] [Updated] (HBASE-21583) Insert info Level0 instead of throwing Exceptions when doing bulkload on tables with StripeCompaction enabled

2018-12-16 Thread chenxu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenxu updated HBASE-21583:
---
Attachment: HBASE-21583-master-v3.patch

> Insert info Level0 instead of throwing Exceptions when doing bulkload on 
> tables with StripeCompaction enabled
> -
>
> Key: HBASE-21583
> URL: https://issues.apache.org/jira/browse/HBASE-21583
> Project: HBase
>  Issue Type: Bug
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-21583-master-v1.patch, 
> HBASE-21583-master-v2.patch, HBASE-21583-master-v3.patch
>
>
> Table-A is an existing table that enable StripeCompaction，when create 
> Table-B(also enable StripeCompaction) and bulkload data copy from Table-A，the 
> following exception will occur.
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=1, exceptions:
> Mon Dec 10 21:26:34 CST 2018, 
> RpcRetryingCaller{globalStartTime=158389041, pause=100, retries=1}, 
> java.io.IOException: java.io.IOException: Newly created stripes do not cover 
> the entire key space.
> at 
> org.apache.hadoop.hbase.regionserver.StripeStoreFileManager$CompactionOrFlushMergeCopy.processNewCandidateStripes(StripeStoreFileManager.java:886)
> at 
> org.apache.hadoop.hbase.regionserver.StripeStoreFileManager$CompactionOrFlushMergeCopy.mergeResults(StripeStoreFileManager.java:717)
> at 
> org.apache.hadoop.hbase.regionserver.StripeStoreFileManager$CompactionOrFlushMergeCopy.access$100(StripeStoreFileManager.java:690)
> at 
> org.apache.hadoop.hbase.regionserver.StripeStoreFileManager.insertNewFiles(StripeStoreFileManager.java:151)
> at org.apache.hadoop.hbase.regionserver.HStore.bulkLoadHFile(HStore.java:891)
> at org.apache.hadoop.hbase.regionserver.HStore.bulkLoadHFile(HStore.java:869)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5524)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:1980)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33650)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2208)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:116)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:135)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:110)
> at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722666#comment-16722666
 ] 

stack commented on HBASE-21565:
---

+1 on .007 after addressing [~zghaobac] 's comments. Should go to branch-2.0+? 
Thanks [~tianjingyun]

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Guanghao Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722665#comment-16722665
 ] 

Guanghao Zhang commented on HBASE-21565:


Add assert check for ServerNode in testClusterRestartFailOver? check ServerNode 
exists before kill, check ServerNode state is not online when there is a SCP, 
check ServerNode not exists after SCP finished.

+1 for others.

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Guanghao Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722665#comment-16722665
 ] 

Guanghao Zhang edited comment on HBASE-21565 at 12/17/18 4:12 AM:
--

Add assert check for ServerNode in testClusterRestartFailOver? check ServerNode 
exists before kill, check ServerNode state is not online when there is a SCP, 
check ServerNode not exists after SCP finished. As the SCP will relay on 
ServerNode, it will be safe to add more assert tests for ServerNode.

+1 for others.


was (Author: zghaobac):
Add assert check for ServerNode in testClusterRestartFailOver? check ServerNode 
exists before kill, check ServerNode state is not online when there is a SCP, 
check ServerNode not exists after SCP finished.

+1 for others.

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

2018-12-16 Thread Jingyun Tian (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingyun Tian updated HBASE-21565:
-
Attachment: HBASE-21565.master.007.patch

> Delete dead server from dead server list too early leads to concurrent Server 
> Crash Procedures(SCP) for a same server
> -
>
> Key: HBASE-21565
> URL: https://issues.apache.org/jira/browse/HBASE-21565
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Critical
> Attachments: HBASE-21565.master.001.patch, 
> HBASE-21565.master.002.patch, HBASE-21565.master.003.patch, 
> HBASE-21565.master.004.patch, HBASE-21565.master.005.patch, 
> HBASE-21565.master.006.patch, HBASE-21565.master.007.patch
>
>
> There are 2 kinds of SCP for a same server will be scheduled during cluster 
> restart, one is ZK session timeout, the other one is new server report in 
> will cause the stale one do fail over. The only barrier for these 2 kinds of 
> SCP is check if the server is in the dead server list.
> {code}
> if (this.deadservers.isDeadServer(serverName)) {
>   LOG.warn("Expiration called on {} but crash processing already in 
> progress", serverName);
>   return false;
> }
> {code}
> But the problem is when master finish initialization, it will delete all 
> stale servers from dead server list. Thus when the SCP for ZK session timeout 
> come in, the barrier is already removed.
> Here is the logs that how this problem occur.
> {code}
> 2018-12-07,11:42:37,589 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> 2018-12-07,11:42:58,007 INFO 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, 
> state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false
> {code}
> Now we can see two SCP are scheduled for the same server.
> But the first procedure is finished after the second SCP starts.
> {code}
> 2018-12-07,11:43:08,038 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, 
> state=SUCCESS, hasLock=false; ServerCrashProcedure 
> server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 
> in 30.5340sec
> {code}
> Thus it will leads the problem that regions will be assigned twice.
> {code}
> 2018-12-07,12:16:33,039 WARN 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, 
> location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, 
> region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on 
> server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise
> {code}
> And here we can see the server is removed from dead server list before the 
> second SCP starts.
> {code}
> 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: 
> Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3
> {code}
> Thus we should not delete dead server from dead server list immediately.
> Patch to fix this problem will be upload later.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

2018-12-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21019:
---
Attachment: HBASE-21019.master.001.patch

> Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler
> --
>
> Key: HBASE-21019
> URL: https://issues.apache.org/jira/browse/HBASE-21019
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21019.master.001.patch, 
> HBASE-21019.master.001.patch, HBASE-21019.master.001.patch
>
>
> In HBASE-20965, we propose MasterFifoRpcScheduler to handle RSReport requests 
> in independent handlers, other requests are handled in other handlers.
> To better utilize handlers, we can use StealJobQueue to allow other handlers 
> to steal jobs from RSReport handlers when there is a few RSReport handlers.
> And it's the same for SimpleRpcScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

2018-12-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21019:
---
Attachment: (was: HBASE-21578.master.002.patch)

> Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler
> --
>
> Key: HBASE-21019
> URL: https://issues.apache.org/jira/browse/HBASE-21019
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21019.master.001.patch, 
> HBASE-21019.master.001.patch, HBASE-21019.master.001.patch
>
>
> In HBASE-20965, we propose MasterFifoRpcScheduler to handle RSReport requests 
> in independent handlers, other requests are handled in other handlers.
> To better utilize handlers, we can use StealJobQueue to allow other handlers 
> to steal jobs from RSReport handlers when there is a few RSReport handlers.
> And it's the same for SimpleRpcScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722645#comment-16722645
 ] 

Hadoop QA commented on HBASE-21019:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HBASE-21019 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21019 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951975/HBASE-21578.master.002.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15296/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler
> --
>
> Key: HBASE-21019
> URL: https://issues.apache.org/jira/browse/HBASE-21019
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21019.master.001.patch, 
> HBASE-21019.master.001.patch, HBASE-21578.master.002.patch
>
>
> In HBASE-20965, we propose MasterFifoRpcScheduler to handle RSReport requests 
> in independent handlers, other requests are handled in other handlers.
> To better utilize handlers, we can use StealJobQueue to allow other handlers 
> to steal jobs from RSReport handlers when there is a few RSReport handlers.
> And it's the same for SimpleRpcScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

2018-12-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21019:
---
Attachment: HBASE-21578.master.002.patch

> Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler
> --
>
> Key: HBASE-21019
> URL: https://issues.apache.org/jira/browse/HBASE-21019
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Yi Mei
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21019.master.001.patch, 
> HBASE-21019.master.001.patch, HBASE-21578.master.002.patch
>
>
> In HBASE-20965, we propose MasterFifoRpcScheduler to handle RSReport requests 
> in independent handlers, other requests are handled in other handlers.
> To better utilize handlers, we can use StealJobQueue to allow other handlers 
> to steal jobs from RSReport handlers when there is a few RSReport handlers.
> And it's the same for SimpleRpcScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722641#comment-16722641
 ] 

Hadoop QA commented on HBASE-21514:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HBASE-21514 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21514 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951974/HBASE-21514.master.013.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15295/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, 
> HBASE-21514.master.008.patch, HBASE-21514.master.009.patch, 
> HBASE-21514.master.010.patch, HBASE-21514.master.011.patch, 
> HBASE-21514.master.011.patch, HBASE-21514.master.012.patch, 
> HBASE-21514.master.013.patch, HBASE-21514.master.013.patch
>
>
> # Add block cache and mob file cache to HRegionServer's member variable. One 
> rs has one block cache and one mob file cache.
>  # Move the global cache instances from CacheConfig to BlockCacheFactory. 
> Only keep config stuff in CacheConfig. And the CacheConfig still have a 
> reference to the RegionServer's block cache. Whether to cache a block need 
> block cache is present and the related config is true.
>  # Remove MobCacheCofnig. It only used for the global mob file cache 
> instance. After move the mob file cache to RegionServer. It is not used 
> anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21514) Refactor CacheConfig

2018-12-16 Thread Guanghao Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21514:
---
Attachment: HBASE-21514.master.013.patch

> Refactor CacheConfig
> 
>
> Key: HBASE-21514
> URL: https://issues.apache.org/jira/browse/HBASE-21514
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21514.master.001.patch, 
> HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, 
> HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, 
> HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, 
> HBASE-21514.master.008.patch, HBASE-21514.master.009.patch, 
> HBASE-21514.master.010.patch, HBASE-21514.master.011.patch, 
> HBASE-21514.master.011.patch, HBASE-21514.master.012.patch, 
> HBASE-21514.master.013.patch, HBASE-21514.master.013.patch
>
>
> # Add block cache and mob file cache to HRegionServer's member variable. One 
> rs has one block cache and one mob file cache.
>  # Move the global cache instances from CacheConfig to BlockCacheFactory. 
> Only keep config stuff in CacheConfig. And the CacheConfig still have a 
> reference to the RegionServer's block cache. Whether to cache a block need 
> block cache is present and the related config is true.
>  # Remove MobCacheCofnig. It only used for the global mob file cache 
> instance. After move the mob file cache to RegionServer. It is not used 
> anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21592) quota.addGetResult(r) throw NPE

2018-12-16 Thread xuqinya (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuqinya updated HBASE-21592:

Attachment: HBASE-21592.master.0003.patch

> quota.addGetResult(r)  throw  NPE
> -
>
> Key: HBASE-21592
> URL: https://issues.apache.org/jira/browse/HBASE-21592
> Project: HBase
>  Issue Type: Bug
>Reporter: xuqinya
>Assignee: xuqinya
>Priority: Major
> Attachments: HBASE-21592.master.0001.patch, 
> HBASE-21592.master.0002.patch, HBASE-21592.master.0003.patch
>
>
> Setting the RPC quota, table.exists(Get) cause quota.addGetResult(r)  throw  
> NPE.
> {code:java}
> set_quota TYPE => THROTTLE, NAMESPACE => 'ns1', LIMIT => '1000req/sec'
> {code}
> {code:java}
> Connection conn = ConnectionFactory.createConnection(config);
> Table htable = conn.getTable(TableName.valueOf("ns1:t1"));
> boolean exists = htable.exists(new Get(Bytes.toBytes("123"))); {code}
> log:
> java.io.IOException: java.io.IOException
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>  at java.lang.Thread.run(Thread.java:745)
>  Caused by: java.lang.NullPointerException
>  at 
> org.apache.hadoop.hbase.quotas.QuotaUtil.calculateResultSize(QuotaUtil.java:282)
>  at 
> org.apache.hadoop.hbase.quotas.DefaultOperationQuota.addGetResult(DefaultOperationQuota.java:99)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1907)
>  at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32381)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2135)
>  ... 4 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (HBASE-21608) Having RPC quota, Not completely deleted SPACE quota

2018-12-16 Thread Sakthi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi resolved HBASE-21608.

Resolution: Duplicate

> Having RPC quota, Not completely deleted SPACE quota
> 
>
> Key: HBASE-21608
> URL: https://issues.apache.org/jira/browse/HBASE-21608
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: xuqinya
>Assignee: xuqinya
>Priority: Major
> Attachments: HBASE-21608-branch-2.0001.patch
>
>
> When RPC quota and SPACE quota are set , only the SPACE quota is cancelled, 
> and the SPACE quota information cannot be deleted completely in hbase:quota. 
> And new quotas cannot be updated.
> {code:java}
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => '50T', POLICY => 
> NO_WRITES
> Took 0.0288 seconds
> set_quota TYPE => THROTTLE, NAMESPACE => 'ns1', LIMIT => '10req/sec'
> Took 0.0238 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, LIMIT => 54975581388800, 
> VIOLATION_POLICY => NO_WRITES
> 2 row(s)
> Took 0.0349 seconds
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => NONE
> Took 0.0169 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, REMOVE => true
> 2 row(s)
> Took 0.0407 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, REMOVE => true
> 2 row(s)
> Took 0.0313 seconds
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => '50T', POLICY => 
> NO_WRITES
> Took 0.0086 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, REMOVE => true
> 2 row(s)
> Took 0.0303 seconds
> {code}
> When we remove the SPACE quota,We should delete it in hbase :quota.
>  As follows:
> {code:java}
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => '1T', POLICY => 
> NO_WRITES
> Took 0.0186 seconds
> set_quota TYPE => THROTTLE, NAMESPACE => 'ns1', LIMIT => '10req/sec'
> Took 0.0191 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, LIMIT => 1099511627776, 
> VIOLATION_POLICY => NO_WRITES
> 2 row(s)
> Took 0.0332 seconds
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => NONE
> Took 0.0131 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
> 1 row(s)
> Took 0.0306 seconds
> set_quota TYPE => SPACE, NAMESPACE => 'ns1', LIMIT => '50T', POLICY => 
> NO_WRITES
> Took 0.0183 seconds
> list_quotas
> OWNER  QUOTAS
>  NAMESPACE => ns1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_NUMBER, LIMIT => 
> 10req/sec, SCOPE => MACHINE
>  NAMESPACE => ns1 TYPE => SPACE, NAMESPACE => ns1, LIMIT => 54975581388800, 
> VIOLATION_POLICY => NO_WRITES
> 2 row(s)
> Took 0.0332 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21589) TestCleanupMetaWAL fails

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722626#comment-16722626
 ] 

Hadoop QA commented on HBASE-21589:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 9s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
38s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}116m 
54s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21589 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951966/HBASE-21589.branch-2.0.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 25dcbedb98cf 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 3180a6864a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15293/testReport/ |
| Max. process+thread count | 4275 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15293/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically

[jira] [Updated] (HBASE-21589) TestCleanupMetaWAL fails

2018-12-16 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21589:
--
Assignee: stack
  Status: Patch Available  (was: Open)

> TestCleanupMetaWAL fails
> 
>
> Key: HBASE-21589
> URL: https://issues.apache.org/jira/browse/HBASE-21589
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: stack
>Priority: Blocker
> Fix For: 2.1.2, 2.0.4
>
> Attachments: HBASE-21589.branch-2.0.001.patch, 
> org.apache.hadoop.hbase.regionserver.TestCleanupMetaWAL-output.txt
>
>
> This test fails near all-the-time. Sunk two RCs. Fix. Made it a blocker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21589) TestCleanupMetaWAL fails

2018-12-16 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722595#comment-16722595
 ] 

stack commented on HBASE-21589:
---

.001 Changes the wait on the SCP finish from 10 seconds to 30 seconds. Odd is 
that in studying runs, it doesn't take 10 seconds to complete the SCP. I could 
see the test pass on occasion when it so happened that of the two RS in the 
test, after the move of the meta, all regions were on the RS that was NOT 
killed by the test but otherwise the waitFor on 10 seconds would fail.

> TestCleanupMetaWAL fails
> 
>
> Key: HBASE-21589
> URL: https://issues.apache.org/jira/browse/HBASE-21589
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Priority: Blocker
> Fix For: 2.1.2, 2.0.4
>
> Attachments: HBASE-21589.branch-2.0.001.patch, 
> org.apache.hadoop.hbase.regionserver.TestCleanupMetaWAL-output.txt
>
>
> This test fails near all-the-time. Sunk two RCs. Fix. Made it a blocker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21589) TestCleanupMetaWAL fails

2018-12-16 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21589:
--
Attachment: HBASE-21589.branch-2.0.001.patch

> TestCleanupMetaWAL fails
> 
>
> Key: HBASE-21589
> URL: https://issues.apache.org/jira/browse/HBASE-21589
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Priority: Blocker
> Fix For: 2.1.2, 2.0.4
>
> Attachments: HBASE-21589.branch-2.0.001.patch, 
> org.apache.hadoop.hbase.regionserver.TestCleanupMetaWAL-output.txt
>
>
> This test fails near all-the-time. Sunk two RCs. Fix. Made it a blocker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21519) Namespace region is never assigned in a HM failover scenario and HM abort always due to init timeout

2018-12-16 Thread Pankaj Kumar (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722527#comment-16722527
 ] 

Pankaj Kumar commented on HBASE-21519:
--

Here the problem is when multiwal is enabled. During RegionGroupingProvider 
init, providerId will be set as the server name always. But during Meta region 
open, WAL group will be selected based on below condition in 
RegionGroupingProvider,

https://github.com/apache/hbase/blob/3180a6864aa6a019fc1ec21ae78f009475bacea1/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/RegionGroupingProvider.java#L192

Since providerId is set as server name in case of RegionGroupingProvider, so 
WAL group wont be META_WAL_GROUP_NAME (meta). Meta WAL will be created as a 
normal WAL (without extension .meta). And during meta SCP we split log based on 
the filter (meta), so it wont split the META file and can't reassign namespace 
region since as per meta (based on Hfile) it was on a dead server and already 
SCP finished.

In the attached patch, WAL group will be init based on the region nto 
providerId. UT also attached to reproduce the problem.

Please correct me if I missed something.

> Namespace region is never assigned in a HM failover scenario and HM abort 
> always due to init timeout
> 
>
> Key: HBASE-21519
> URL: https://issues.apache.org/jira/browse/HBASE-21519
> Project: HBase
>  Issue Type: Bug
>  Components: master, wal
>Affects Versions: 2.1.1
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HBASE-21519.branch-2.patch
>
>
> In our test env we found that namespace region is never be assigned on HM 
> failover scenario when multiwal feature is enabled,
> {noformat}
> 2018-11-28 01:38:28,085 WARN [master/HM-1:16000:becomeActiveMaster] 
> master.HMaster: 
> hbase:namespace,,1543339859614.31f6d3383af09e18e1e81ca02a93de15. is NOT 
> online; state=\{31f6d3383af09e18e1e81ca02a93de15 state=OPEN, 
> ts=1543340156928, server=RS-2,16020,1543339824397}; 
> ServerCrashProcedures=false. Master startup cannot progress, in 
> holding-pattern until region onlined.
> {noformat}
> And finally HM abort with following error,
> {noformat}
> 2018-11-28 01:39:16,858 ERROR 
> [ActiveMasterInitializationMonitor-1543338648565] master.HMaster: Master 
> failed to complete initialization after 24ms. Please consider submitting 
> a bug report including a thread dump of this process.
> 2018-11-28 01:39:18,980 ERROR 
> [ActiveMasterInitializationMonitor-1543338648565] master.HMaster: Zombie 
> Master exiting. Thread dump to stdout
> {noformat}
> Stack trace:
> {noformat}
> Thread 102 (master/HM-1:16000:becomeActiveMaster):
>  State: TIMED_WAITING
>  Blocked count: 100
>  Waited count: 246
>  Stack:
>  java.lang.Thread.sleep(Native Method)
>  org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:148)
>  org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1166)
>  
> org.apache.hadoop.hbase.master.HMaster.waitForNamespaceOnline(HMaster.java:1187)
>  
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1044)
>  
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2285)
>  org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:590)
>  org.apache.hadoop.hbase.master.HMaster$$Lambda$40/1078246575.run(Unknown 
> Source)
>  java.lang.Thread.run(Thread.java:745)
> {noformat}
>  
> Step to reproduce:
>  1) Setup a HBase cluster with 1/2 HM (say HM-1) and 2 RS(say RS-1 & RS-2)
>  2) Enable multiwal feature with following configuration setting and start 
> the cluster,
> {noformat}
>  
>  hbase.wal.provider
>  multiwal
>  
> 
>  hbase.wal.regiongrouping.strategy
>  identity
>  
> {noformat}
> 3) Make sure meta and namespace regions are assigned on different RS, suppose 
> RS-1 & RS-2 respectively.
>  4) Create table 't1' 
>  5) Flush the meta table explicitly
>  6) Kill the RS-2, so during RS-2 SCP all regions including namespace region 
> will be assigned to RS-1.
>  7) Now Kill RS-1 before meta flush happen. Here both RS-2 & RS-1 are 
> shutdown now.
>  8) Stop the HM and start RS-1 & RS-2.
>  9) Now start the HM.
> Meta region is assigned successfully but HM is keep waiting for the namespace 
> region onlline (Master startup cannot progress, in holding-pattern until 
> region onlined) and abort with timeout.
> Observation:
>  1) After step-3 namespace region was assigned to RS-2 and meta entry was as 
> follows,
> {noformat}
>  hbase:namespace,,1543339859614.31f6d3383af09e18e1e81ca02a93de15. 
> column=info:server, timestamp=1543339860920, value=RS-2:16020
>  hbase:namespace,,1543339859614.31f6d3383af09e18e1e81ca02a93de15. 
>

[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

2018-12-16 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722512#comment-16722512
 ] 

Hudson commented on HBASE-21512:


Results for branch HBASE-21512
[build #19 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/19/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/19//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/19//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/19//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
> --
>
> Key: HBASE-21512
> URL: https://issues.apache.org/jira/browse/HBASE-21512
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
>
> At least for the RSProcedureDispatcher, with CompletableFuture we do not need 
> to set a delay and use a thread pool any more, which could reduce the 
> resource usage and also the latency.
> Once this is done, I think we can remove the ClusterConnection completely, 
> and start to rewrite the old sync client based on the async client, which 
> could reduce the code base a lot for our client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21520) TestMultiColumnScanner cost long time when using ROWCOL bloom type

2018-12-16 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722489#comment-16722489
 ] 

Hudson commented on HBASE-21520:


Results for branch master
[build #666 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/666/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/666//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/666//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/666//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> TestMultiColumnScanner cost long time when using ROWCOL bloom type
> --
>
> Key: HBASE-21520
> URL: https://issues.apache.org/jira/browse/HBASE-21520
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.2.10, 1.4.10, 2.1.3, 2.0.5
>
> Attachments: HBASE-21520.branch-1.v4.patch, HBASE-21520.v1.patch, 
> HBASE-21520.v2.patch, HBASE-21520.v3.patch, HBASE-21520.v4.patch, 
> TestMultiColumnScanner.png, rowcol.txt
>
>
> The TestMultiColumnScanner is easy to be timeout,  you can see HBASE-21517.   
> In my localhost,  when I set the parameters to be { 
> Compression.Algorithm.NONE, BloomType.ROW, false },  it took about 5 seconds. 
>  but if I set the parameters to be  { Compression.Algorithm.NONE, 
> BloomType.ROWCOL, false },  it would take about 45 seconds, which means 
> ROWCOL cost much more time than ROW.
> Need to find out what's wrong with this ut.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21520) TestMultiColumnScanner cost long time when using ROWCOL bloom type

2018-12-16 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722456#comment-16722456
 ] 

Hudson commented on HBASE-21520:


Results for branch branch-1
[build #595 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/595/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/595//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/595//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/595//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> TestMultiColumnScanner cost long time when using ROWCOL bloom type
> --
>
> Key: HBASE-21520
> URL: https://issues.apache.org/jira/browse/HBASE-21520
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.2.10, 1.4.10, 2.1.3, 2.0.5
>
> Attachments: HBASE-21520.branch-1.v4.patch, HBASE-21520.v1.patch, 
> HBASE-21520.v2.patch, HBASE-21520.v3.patch, HBASE-21520.v4.patch, 
> TestMultiColumnScanner.png, rowcol.txt
>
>
> The TestMultiColumnScanner is easy to be timeout,  you can see HBASE-21517.   
> In my localhost,  when I set the parameters to be { 
> Compression.Algorithm.NONE, BloomType.ROW, false },  it took about 5 seconds. 
>  but if I set the parameters to be  { Compression.Algorithm.NONE, 
> BloomType.ROWCOL, false },  it would take about 45 seconds, which means 
> ROWCOL cost much more time than ROW.
> Need to find out what's wrong with this ut.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21592) quota.addGetResult(r) throw NPE

2018-12-16 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722435#comment-16722435
 ] 

Hadoop QA commented on HBASE-21592:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
47s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}129m 
18s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21592 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951923/HBASE-21592.master.0002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9bf5e4043203 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ac0b3bb547 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15292/testReport/ |
| Max. process+thread count | 4750 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15292/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> quota.addGetResult(r)

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21583) Insert info Level0 instead of throwing Exceptions when doing bulkload on tables with StripeCompaction enabled

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Updated] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

[jira] [Commented] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

[jira] [Updated] (HBASE-21514) Refactor CacheConfig

[jira] [Commented] (HBASE-21592) quota.addGetResult(r) throw NPE

[jira] [Updated] (HBASE-21583) Insert info Level0 instead of throwing Exceptions when doing bulkload on tables with StripeCompaction enabled

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Commented] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Comment Edited] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Updated] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

[jira] [Commented] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

[jira] [Updated] (HBASE-21019) Use StealJobQueue to better utilize handlers in MasterFifoRpcScheduler

[jira] [Commented] (HBASE-21514) Refactor CacheConfig

[jira] [Updated] (HBASE-21514) Refactor CacheConfig

[jira] [Updated] (HBASE-21592) quota.addGetResult(r) throw NPE

[jira] [Resolved] (HBASE-21608) Having RPC quota, Not completely deleted SPACE quota

[jira] [Commented] (HBASE-21589) TestCleanupMetaWAL fails

[jira] [Updated] (HBASE-21589) TestCleanupMetaWAL fails

[jira] [Commented] (HBASE-21589) TestCleanupMetaWAL fails

[jira] [Updated] (HBASE-21589) TestCleanupMetaWAL fails

[jira] [Commented] (HBASE-21519) Namespace region is never assigned in a HM failover scenario and HM abort always due to init timeout

[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

[jira] [Commented] (HBASE-21520) TestMultiColumnScanner cost long time when using ROWCOL bloom type

[jira] [Commented] (HBASE-21520) TestMultiColumnScanner cost long time when using ROWCOL bloom type

[jira] [Commented] (HBASE-21592) quota.addGetResult(r) throw NPE

34 matches

Site Navigation

Mail list logo

Footer information