[jira] [Commented] (HBASE-21332) HBase scan with PageFilter cannot get all rows, non-edge region skiped

2018-10-23 Thread pddNick (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661820#comment-16661820
 ] 

pddNick commented on HBASE-21332:
-

Thank u so much , it helps me a lot.

 '+PageFilter isn't a global filter which consider the cross regions case, it 
only consider the region level+'. This is the thing that matters. 
But{color:#d04437} it causes problem only when scanner crosses region.{color}

{color:#33}Say now the scanner is in [111, 222) and there are plenty of 
rows to retrieve which are more than PAGE_SIZE.The scanner will get PAGE_SIZE 
rows and returns them to client , everything works just fine.{color}

{color:#33}But when there are not enough rows left in [111, 222).The 
scanner will combine all rows of parallel scanner in region [222, 333) and 
[333, 444) and return them to client.That means the client will get 
(RowsLeftInFirstRegion + PAGE_SIZE + PAGE_SIZE).In this way the clinet skips 
region [222,333) while the last rowkey is in [333,444).I add some log to the UT 
and the output is exactly what i expect.{color}

{color:#33}So the solution is to get PAGE_SIZE rows by ourselves every time 
we do scan, or to limit scan range in single region(this is what i do in my 
real life project).{color}

 

> HBase scan with PageFilter cannot get all rows, non-edge region skiped
> --
>
> Key: HBASE-21332
> URL: https://issues.apache.org/jira/browse/HBASE-21332
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, scan
>Affects Versions: 1.1.2
> Environment: * Server version:1.1.2.2.6.5.0-292, 
> revision=897822d4dd5956ca186974c10382e9094683fa29
>  * 2 region servers
>  * 4 regions
>  * HBase client:1.3.1
>  
>Reporter: pddNick
>Assignee: Zheng Hu
>Priority: Minor
> Attachments: HBaseTest.java, image-2018-10-17-21-14-25-354.png, 
> image-2018-10-17-21-15-23-439.png, image-2018-10-23-17-37-22-028.png
>
>
> When using scan with pagefilter to get data from hbase, the scanner will 
> skip{color:#ff} 'non-edge'{color} regions.The code i use comes from the 
> book _HBase: Definitive Guide, Example 4.8, PageFilter example._ Difference 
> is i use scan with startRow and stopRow.
> Say i have regions with start and end keys like \{'111', '222', '333', 
> '444'}, which means i have 3 regions \{111, 222}, \{222, 333}, \{333, 444} 
> and they are in different region servers. When scan with startRow '111' and 
> stopRow '444' , most data in region \{222, 333} will be skiped and won't be 
> returned by ResultScanner.Region \{111,222} or \{333,444} works just fine and 
> because region \{222,333} doesn't contain startRowkey or stopRowkey i call it 
> non-edge region.
> Below is some explanation with log:
>  
> {code:java}
> // Here scanner works just fine in region {111,222}, it gets exactly 
> {pageSize} rows each time, which is 1000
> ...
> 2018-10-17 21:25:57.810 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [213971861069] to [2179067497952422], sum [1000 : 64000], cost: 
> [77ms]
> 2018-10-17 21:25:57.885 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [2179098921079755] to [21c2879280113661], sum [1000 : 65000], cost: 
> [75ms]
> 2018-10-17 21:25:57.962 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [21c2899018774688] to [2203180876471552], sum [1000 : 66000], cost: 
> [77ms]
> // Here scanner goes from region {111,222} to {222,333}. As you can see, the 
> scanner gets 2405 rows with stopRow '3373621463365126'.The scanner moves to 
> regin {333,444} too early and most data in {222,333} are skiped.
> 2018-10-17 21:25:58.321 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [2203223414254308] to [3373621463365126], sum [2405 : 68405], cost: 
> [359ms]
> // Now the scanner is in region {333,444}, everything works just fine
> 2018-10-17 21:25:58.396 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [3373764408525604] to [33b3849714659525], sum [1000 : 69405], cost: 
> [74ms]
> 2018-10-17 21:25:58.467 INFO 213872 [ main] c.p.s.c.HBaseTest : Test: results 
> from [33b3882378177107] to [33f5221377695765], sum [1000 : 70405], cost: 
> [71ms]
> ...{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661819#comment-16661819
 ] 

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m  1s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
16s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945343/HBASE-21363-v5.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 5615fc1d5f9b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14841/testReport/ |
| Max. process+thread count | 272 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14841/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Rewrit

[jira] [Commented] (HBASE-21215) Figure how to invoke hbck2; make it easy to find

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661814#comment-16661814
 ] 

Duo Zhang commented on HBASE-21215:
---

This is the only one left for 2.1.1.

> Figure how to invoke hbck2; make it easy to find
> 
>
> Key: HBASE-21215
> URL: https://issues.apache.org/jira/browse/HBASE-21215
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 2.1.1
>
>
> In 
> https://docs.google.com/document/d/1Oun4G3M5fyrM0OxXcCKYF8td0KD7gJQjnU9Ad-2t-uk/edit#,
>  the doc on hbck2 'form', one item to figure is how to invoke hbck2. Related, 
> how to make it easy to find? [~busbey] has some ideas (posted in doc). This 
> issue is for implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20540) [umbrella] Hadoop 3 compatibility

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20540:
--
Fix Version/s: 2.0.3
   2.2.0

> [umbrella] Hadoop 3 compatibility
> -
>
> Key: HBASE-20540
> URL: https://issues.apache.org/jira/browse/HBASE-20540
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> There are known issues about the hadoop 3 compatibility for hbase 2. But 
> hadoop 3 is still not production ready. So we will link the issues here and 
> once there is a production ready hadoop 3 release, we will fix these issues 
> soon and upgrade our dependencies on hadoop, and also update the support 
> matrix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-20540) [umbrella] Hadoop 3 compatibility

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-20540.
---
Resolution: Fixed

All sub tasks are done. Let's resolve.

> [umbrella] Hadoop 3 compatibility
> -
>
> Key: HBASE-20540
> URL: https://issues.apache.org/jira/browse/HBASE-20540
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
>
> There are known issues about the hadoop 3 compatibility for hbase 2. But 
> hadoop 3 is still not production ready. So we will link the issues here and 
> once there is a production ready hadoop 3 release, we will fix these issues 
> soon and upgrade our dependencies on hadoop, and also update the support 
> matrix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks [~allan163] for reviewing.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363-v5.patch, 
> HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21351:
--
Fix Version/s: (was: 2.0.3)
   (was: 2.1.1)

> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
>
> We will acquire the procExecutionLock for a procedure when force updating its 
> state to prevent race with PE worker, but this does not work then the 
> procedure is rolling back.
> If a procedure is failed, we will mark the root procedure stack as FAILED, 
> and then start to rollback the whole procedure stack. We will pop every 
> procedure in the stack and try to rollback them. So we may change the state 
> of a procedure without holding its procExecutionLock when rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Fix Version/s: 3.0.0

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Component/s: Client

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Affects Version/s: 3.0.0

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661791#comment-16661791
 ] 

Hadoop QA commented on HBASE-21376:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
56s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
57s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} hbase-server: The patch generated 0 new + 7 
unchanged - 1 fixed = 7 total (was 8) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 4s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m  0s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}162m 
51s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}198m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21376 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945328/HBASE-21376.branch-2.0.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 8714fdd00ecf 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 169e3bafc8 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14838/testReport/ |
| Max. process+thread count | 4544 (vs. ulimit of 1) |
| mod

[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Attachment: HBASE-21365.master.001.patch

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Assignee: Guanghao Zhang
  Status: Patch Available  (was: Open)

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21365) Throw exception when user put data with skip wal to a table which may be replicated

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21365:
---
Hadoop Flags: Incompatible change

> Throw exception when user put data with skip wal to a table which may be 
> replicated
> ---
>
> Key: HBASE-21365
> URL: https://issues.apache.org/jira/browse/HBASE-21365
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Minor
> Attachments: HBASE-21365.master.001.patch
>
>
> A real problem in our production cluster. A user point that his table's data 
> can't be replicate to the peer cluster. Then we start to debug the reason. We 
> checked the replication scope, checked the replication wal entry filter, and 
> check the namespace,tablecfs config. But didn't found any problem. We enabled 
> the RS's debug log to find the reason. Finally, we found use use put with 
> skip wal to write data. But it taked a long time... Our replication use wal 
> to replicate data. So the data can't be replicated to peer cluster. I thought 
> throw a exception may be better for user if the table's replication scope is 
> not 0. (as 0 means not replicated).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661788#comment-16661788
 ] 

Hadoop QA commented on HBASE-21344:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
59s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 3s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 4s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 44s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 36s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}158m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21344 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945333/HBASE-21344.branch-2.0.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux df63925f8666 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 169e3bafc8 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14839/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14839/testReport/ |
| Max. process+thread count | 4057 

[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661783#comment-16661783
 ] 

Hudson commented on HBASE-21349:


Results for branch branch-2
[build #1435 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meantime catalog janitor and or normalizer 
> (etc?) try to run and when they can't, they fail noisely. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegion

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661782#comment-16661782
 ] 

Hudson commented on HBASE-21342:


Results for branch branch-2
[build #1435 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661781#comment-16661781
 ] 

Hudson commented on HBASE-21338:


Results for branch branch-2
[build #1435 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1435//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [balancer] If balancer is an ill-fit for cluster size, it gives little 
> indication
> -
>
> Key: HBASE-21338
> URL: https://issues.apache.org/jira/browse/HBASE-21338
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21338.master.001.patch
>
>
> See parent issue. Running balancer on a cluster where the max steps was way 
> inadequate, the balancer gave little to no indication that it was 
> ill-configured. In fact, it only logged its starting and then that there was 
> nothing to do though the cluster was obviously out-of-whack.
> Ideally the balancer would complain when say the maxSteps limit is a small 
> fraction of what the cluster's calculated max steps are, or it would notice 
> that the balancer is making little progress on an imbalanced cluster and 
> shout. Can we set balancer configs w/o having to restart Master?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Attachment: HBASE-21363-v5.patch

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363-v5.patch, 
> HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661751#comment-16661751
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #27 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/27//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661744#comment-16661744
 ] 

Hudson commented on HBASE-21342:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661742#comment-16661742
 ] 

Hudson commented on HBASE-21338:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> [balancer] If balancer is an ill-fit for cluster size, it gives little 
> indication
> -
>
> Key: HBASE-21338
> URL: https://issues.apache.org/jira/browse/HBASE-21338
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21338.master.001.patch
>
>
> See parent issue. Running balancer on a cluster where the max steps was way 
> inadequate, the balancer gave little to no indication that it was 
> ill-configured. In fact, it only logged its starting and then that there was 
> nothing to do though the cluster was obviously out-of-whack.
> Ideally the balancer would complain when say the maxSteps limit is a small 
> fraction of what the cluster's calculated max steps are, or it would notice 
> that the balancer is making little progress on an imbalanced cluster and 
> shout. Can we set balancer configs w/o having to restart Master?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661743#comment-16661743
 ] 

Hudson commented on HBASE-21349:


Results for branch branch-2.0
[build #1004 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1004//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meantime catalog janitor and or normalizer 
> (etc?) try to run and when they can't, they fail noisely. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normalizeRegions(HMaster.java:1718)
> at 
> org.ap

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661729#comment-16661729
 ] 

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
33s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
36s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945335/HBASE-21363-v4.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b207abd0da91 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 
17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840/testReport/ |
| Max. process+thread count | 273 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.


[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661720#comment-16661720
 ] 

Hadoop QA commented on HBASE-21325:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
32s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 32s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
12s{color} | {color:red} hbase-server: The patch generated 3 new + 76 unchanged 
- 0 fixed = 79 total (was 76) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 55s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.TestSyncReplicationRemoveRemoteWAL |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21325 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945315/HBASE-21325.master.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3106f256c7e3 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| compile | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchprocess/patch-compile-hbase-server.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14836/artifact/patchproces

[jira] [Commented] (HBASE-21338) [balancer] If balancer is an ill-fit for cluster size, it gives little indication

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661716#comment-16661716
 ] 

Hudson commented on HBASE-21338:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [balancer] If balancer is an ill-fit for cluster size, it gives little 
> indication
> -
>
> Key: HBASE-21338
> URL: https://issues.apache.org/jira/browse/HBASE-21338
> Project: HBase
>  Issue Type: Sub-task
>  Components: Balancer, Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21338.master.001.patch
>
>
> See parent issue. Running balancer on a cluster where the max steps was way 
> inadequate, the balancer gave little to no indication that it was 
> ill-configured. In fact, it only logged its starting and then that there was 
> nothing to do though the cluster was obviously out-of-whack.
> Ideally the balancer would complain when say the maxSteps limit is a small 
> fraction of what the cluster's calculated max steps are, or it would notice 
> that the balancer is making little progress on an imbalanced cluster and 
> shout. Can we set balancer configs w/o having to restart Master?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661718#comment-16661718
 ] 

Hudson commented on HBASE-21342:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> HBASE-21342.006.patch, HBASE-21342.007.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661717#comment-16661717
 ] 

Hudson commented on HBASE-21349:


Results for branch branch-2.1
[build #522 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/522//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Cluster is going down but CatalogJanitor and Normalizer try to run and fail 
> noisely
> ---
>
> Key: HBASE-21349
> URL: https://issues.apache.org/jira/browse/HBASE-21349
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: Xu Cang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21349.master.002.patch, 
> HBASE-21349.master.002.patch, HBASE-21349.master.002.patch, 
> HBASE-22349.master.001.patch
>
>
> Shutting down can take a while. Meantime catalog janitor and or normalizer 
> (etc?) try to run and when they can't, they fail noisely. Looks bad:
> {code}
> 2018-10-19 21:23:24,962 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Cluster shutdown set; vc1205.halxg.cloudera.com,22101,1539991730711 expired; 
> onlineServers=51
> 2018-10-19 21:25:54,502 WARN org.apache.hadoop.hbase.master.CatalogJanitor: 
> Failed scan of catalog table
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:684)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:679)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:185)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:137)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:243)
> at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:116)
> at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-19 21:25:54,507 ERROR 
> org.apache.hadoop.hbase.master.normalizer.RegionNormalizerChore: Failed to 
> normalize regions.
> java.io.IOException: connection is closed
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:267)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:763)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:734)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:690)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.fullScanTables(MetaTableAccessor.java:240)
> at 
> org.apache.hadoop.hbase.master.TableStateManager.getTablesInStates(TableStateManager.java:189)
> at 
> org.apache.hadoop.hbase.master.HMaster.normal

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661708#comment-16661708
 ] 

Allan Yang commented on HBASE-21363:


{code}
+this.partial = resetDelete ? false : other.partial;
{code}
Add a comment for this one
The v4 patch looks great, +1 for it. You can add a comment while committing 

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661707#comment-16661707
 ] 

Hadoop QA commented on HBASE-21372:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
38s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
52s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
35s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}176m 
23s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945308/HBASE-21372.branch-2.1.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d021c3937513 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.1 / d35f65f396 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14833/testReport/ |
| Max. process+thread count | 4249 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://bui

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661695#comment-16661695
 ] 

Duo Zhang commented on HBASE-21363:
---

Add a UT to confirm that we will reset all the deleted flags when building 
holdingCleanupTracker even if the original tracker is partial.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Attachment: HBASE-21363-v4.patch

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363-v4.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344.branch-2.0.003.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch, 
> HBASE-21344.branch-2.0.003.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.po

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661668#comment-16661668
 ] 

Duo Zhang commented on HBASE-21363:
---

Talked with [~allan163] offline, there are still problems. I'm writing a UT 
now. Will be back soon.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661667#comment-16661667
 ] 

stack commented on HBASE-21364:
---

Thanks boys.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661666#comment-16661666
 ] 

stack commented on HBASE-21344:
---

You need to add .branch-2.0. into the name of your patch [~an...@apache.org] 
Also, this is an important patch because the left-over start will prevent us 
getting to the wait-on-meta holding pattern.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>  

[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661663#comment-16661663
 ] 

Duo Zhang commented on HBASE-21254:
---

Can do this later.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21357) RS should abort if OOM in Reader thread

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21357:
---
   Resolution: Fixed
Fix Version/s: 1.4.9
   Status: Resolved  (was: Patch Available)

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 1.4.9
>
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange, we will abort the RS if OOM in Listener thread, 
> Responder thread and in CallRunner thread, only not in Reader thread... 
> We should abort RS if OOM happens in Reader thread, too. If not, the reader 
> thread exists because of OOM, and the selector closes. Later connection 
> select to this reader will be ignored
> {code}
> try {
>   if (key.isValid()) {
> if (key.isAcceptable())
>   doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> Leaving the client (or Master and other RS)'s call wait until SocketTimeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21357) RS should abort if OOM in Reader thread

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661659#comment-16661659
 ] 

Allan Yang commented on HBASE-21357:


Pushed to branch-1, thanks [~stack] for reviewing.

> RS should abort if OOM in Reader thread
> ---
>
> Key: HBASE-21357
> URL: https://issues.apache.org/jira/browse/HBASE-21357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.8
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21357.branch-1.001.patch, 
> HBASE-21357.branch-1.001.patch
>
>
> It is a bit strange, we will abort the RS if OOM in Listener thread, 
> Responder thread and in CallRunner thread, only not in Reader thread... 
> We should abort RS if OOM happens in Reader thread, too. If not, the reader 
> thread exists because of OOM, and the selector closes. Later connection 
> select to this reader will be ignored
> {code}
> try {
>   if (key.isValid()) {
> if (key.isAcceptable())
>   doAccept(key);
>   }
> } catch (IOException ignored) {
>   if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
> }
> {code}
> Leaving the client (or Master and other RS)'s call wait until SocketTimeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661658#comment-16661658
 ] 

Hadoop QA commented on HBASE-21344:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HBASE-21344 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21344 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945321/HBASE-21344-branch-2.0_v3.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14837/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAs

[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21376:
---
Status: Patch Available  (was: Open)

> Add some verbose log to MasterProcedureScheduler
> 
>
> Key: HBASE-21376
> URL: https://issues.apache.org/jira/browse/HBASE-21376
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21376.branch-2.0.001.patch
>
>
> As discussed in HBASE-21364, we divided the patch in HBASE-21364 to two, the 
> critical one is already submitted in HBASE-21364 to branch-2.0 and 
> branch-2.1, but I also added some useful logs  which need to commit to all 
> branches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21376:
---
Attachment: HBASE-21376.branch-2.0.001.patch

> Add some verbose log to MasterProcedureScheduler
> 
>
> Key: HBASE-21376
> URL: https://issues.apache.org/jira/browse/HBASE-21376
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21376.branch-2.0.001.patch
>
>
> As discussed in HBASE-21364, we divided the patch in HBASE-21364 to two, the 
> critical one is already submitted in HBASE-21364 to branch-2.0 and 
> branch-2.1, but I also added some useful logs  which need to commit to all 
> branches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21344:
--
Fix Version/s: 2.0.3

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.C

[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21344:
--
Status: Patch Available  (was: Open)

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFutur

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661655#comment-16661655
 ] 

stack commented on HBASE-21344:
---

The change in HBaseTestingUtility is just formatting?

Otherwise, patch seems good.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 2.0.3
>
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> j

[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661653#comment-16661653
 ] 

Allan Yang commented on HBASE-21254:


But we can use public Entry tryLockEntry(long id, long time) instead, we can 
still block the forceUpdateExecutor thread if some procedure is stuck.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661646#comment-16661646
 ] 

Allan Yang edited comment on HBASE-21364 at 10/24/18 2:58 AM:
--

The patch without verbose log has already committed to branch-2.0 and 
branch-2.1, thanks! Opened HBASE-21376 to commit the verbose log.


was (Author: allan163):
The patch without verbose log has already committed to branch-2.0 and 
branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21364:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661646#comment-16661646
 ] 

Allan Yang commented on HBASE-21364:


The patch without verbose log has already committed to branch-2.0 and 
branch-2.1, thanks!

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21376:
--

 Summary: Add some verbose log to MasterProcedureScheduler
 Key: HBASE-21376
 URL: https://issues.apache.org/jira/browse/HBASE-21376
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


As discussed in HBASE-21364, we divided the patch in HBASE-21364 to two, the 
critical one is already submitted in HBASE-21364 to branch-2.0 and branch-2.1, 
but I also added some useful logs  which need to commit to all branches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661644#comment-16661644
 ] 

Duo Zhang commented on HBASE-21254:
---

But we do need to review the implementation again. As I found that, the actual 
deletion for a root procedure is done later in CompletedProcedureCleaner, so a 
successful procedure could also block us from removing a file...

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661643#comment-16661643
 ] 

Duo Zhang commented on HBASE-21254:
---

No it will be executed in a separated thread. In the log rolling thread we will 
just schedule a task into the forceUpdateExecutor.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661642#comment-16661642
 ] 

Allan Yang commented on HBASE-21254:


In forceUpdateProcedure, we will acquire the execution lock before force update:
{code}
 private void forceUpdateProcedure(long procId) throws IOException {
IdLock.Entry lockEntry = procExecutionLock.getLockEntry(procId);
try {
{code}
We will wait forever here if the procedure is stuck. And IIRC, 
forceUpdateProcedure will be executed in the roll procedure log thread, which 
will stuck it too.

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661628#comment-16661628
 ] 

Duo Zhang commented on HBASE-21254:
---

Which lock? I do not get your point, about the 'wait for the lock when rolling 
log'...

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21254) Need to find a way to limit the number of proc wal files

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661625#comment-16661625
 ] 

Allan Yang commented on HBASE-21254:


I have a question for this one, what if a procedure is stuck there, and can't 
get the lock for it, will it stuck to wait for the lock when rolling log?

> Need to find a way to limit the number of proc wal files
> 
>
> Key: HBASE-21254
> URL: https://issues.apache.org/jira/browse/HBASE-21254
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21254-v1.patch, HBASE-21254-v2.patch, 
> HBASE-21254-v3.patch, HBASE-21254.patch
>
>
> For regionserver, we have a max wal file limitation, if we reach the 
> limitation, we will trigger a flush on specific regions so that we can delete 
> old wal files. But for proc wals, we do not have this mechanism, and it will 
> be worse after HBASE-21233, as if there is an old procedure which can not 
> make progress and do not persist its state, we need to keep the old proc wal 
> file for ever...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344-branch-2.0_v3.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch, HBASE-21344-branch-2.0_v3.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.conc

[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661612#comment-16661612
 ] 

Duo Zhang commented on HBASE-21372:
---

TRSP uses the same config to decide whether to give up retrying.

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there a change -- an SCP triggers failure -- or at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661613#comment-16661613
 ] 

Duo Zhang commented on HBASE-21363:
---

Any other concerns? [~allan163].

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661611#comment-16661611
 ] 

Hadoop QA commented on HBASE-21363:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
17s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 30s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
8s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
10s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21363 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945314/HBASE-21363-v3.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 1194f80ac5e4 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14835/testReport/ |
| Max. process+thread count | 278 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14835/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Rewri

[jira] [Comment Edited] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661606#comment-16661606
 ] 

Allan Yang edited comment on HBASE-21372 at 10/24/18 2:06 AM:
--

+1 for it.
TransitRegionStateProcedure is introduced in branch-2+ by [~Apache9], can 
[~Apache9] conform that in branch-2+, we have already retry forever for 
assignment?


was (Author: allan163):
+1 for it, TransitRegionStateProcedure is introduced in branch-2+ by 
[~Apache9], can [~Apache9] conform that in branch-2+, we have already retry 
forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there a change -- an SCP triggers failure -- or at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661606#comment-16661606
 ] 

Allan Yang commented on HBASE-21372:


+1 for it, TransitRegionStateProcedure is introduced in branch-2+ by 
[~Apache9], can [~Apache9] conform that in branch-2+, we have already retry 
forever for assignment?

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there a change -- an SCP triggers failure -- or at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661593#comment-16661593
 ] 

Allan Yang commented on HBASE-21364:


The checkstyle error is not valid, we add too many comments in the method, so 
it exceeds 150 lines. Will commit the actual fix to  branch-2.1 and branch-2.0, 
and open another issue to commit the verbose code to all branches as [~Apache9] 
said.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661584#comment-16661584
 ] 

Hadoop QA commented on HBASE-21373:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
22s{color} | {color:red} hbase-server: The patch generated 2 new + 37 unchanged 
- 0 fixed = 39 total (was 37) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
45s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
1m 44s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure.TestFailedProcCleanup |
|   | hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-21373 |
| JIRA Patch URL | 
https://issues.apache.org/jira

[jira] [Created] (HBASE-21375) Forward port "HBASE-21364 Procedure holds the lock should put to front of the queue after restart"

2018-10-23 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21375:
-

 Summary: Forward port "HBASE-21364 Procedure holds the lock should 
put to front of the queue after restart"
 Key: HBASE-21375
 URL: https://issues.apache.org/jira/browse/HBASE-21375
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 3.0.0, 2.2.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661580#comment-16661580
 ] 

stack commented on HBASE-21344:
---

Make new patch [~an...@apache.org]? Thanks.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util

[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661576#comment-16661576
 ] 

Hadoop QA commented on HBASE-21371:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  
9s{color} | {color:green} hbase-resource-bundle in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21371 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945311/HBASE-21371.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  xml  |
| uname | Linux 92319fb5b492 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1f437ac221 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14834/testReport/ |
| Max. process+thread count | 87 (vs. ulimit of 1) |
| modules | C: hbase-resource-bundle U: hbase-resource-bundle |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14834/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quo

[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:36 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash). 
Anyways, a sub-task to increase the no. of Assign max attempt , will not let 
the call go in FAILED_OPEN path(I think).

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.


was (Author: an...@apache.org):
{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash). 
Anyways, a sub-task to increase the no. of Assign procedure , will not let the 
call go in FAILED_OPEN path(I think).

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashP

[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-23 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21325:
---
Attachment: HBASE-21325.master.003.patch

> Force to terminate regionserver when abort hang in somewhere
> 
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21325.master.001.patch, 
> HBASE-21325.master.001.patch, HBASE-21325.master.002.patch, 
> HBASE-21325.master.003.patch
>
>
> When testing sync replication, I found that, if I transit the remote cluster 
> to DA, while the local cluster is still in A, the region server will hang 
> when shutdown. As the fsOk flag only test the local cluster(which is 
> reasonable), we will enter the waitOnAllRegionsToClose, and since the WAL is 
> broken(the remote wal directory is gone)  so we will never succeed. And this 
> lead to an infinite wait inside waitOnAllRegionsToClose.
> So I think here we should have an upper bound for the wait time in 
> waitOnAllRegionsToClose method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661570#comment-16661570
 ] 

Duo Zhang commented on HBASE-21363:
---

Add simple comments for the ProcedureWALFormat.load method.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:30 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash). 
Anyways, a sub-task to increase the no. of Assign procedure , will not let the 
call go in FAILED_OPEN path(I think).

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.


was (Author: an...@apache.org):
{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash)

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.ja

[jira] [Updated] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21363:
--
Attachment: HBASE-21363-v3.patch

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363-v3.patch, HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554
 ] 

Ankit Singhal edited comment on HBASE-21344 at 10/24/18 1:24 AM:
-

{quote}What you doing here from patch?{quote}
I'm just removing duplicate tableStateManager.start() (and keeping the 
tableStateManager.start() after checking meta is actually online).
And the test in the patch is to check if we have OPENING state for meta also, 
still SCP can succeed , so we don't need to change state of meta znode to 
offline during FAILED_OPEN of assign procedure(this meta znode state will 
intern also help in avoiding IMP if meta is in transition due to Server crash)

{quote}
1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
 1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
 1098 .filter(s -> s.hasMetaTableRegion()).findAny();
 You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}
Yes, My bad, we don't need this particular change.


was (Author: an...@apache.org):
{quote}1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
1098 .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}

Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThre

[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Attachment: (was: HBASE-21371.001.patch)

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Status: Patch Available  (was: Open)

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661560#comment-16661560
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

yes, this will need to be done wherever we ship the relevant jar. if we already 
have bouncycastle as a dependency shouldn't we have an entry for it though?

I've checked the so called "Bouncy Castle License" is literally the same as the 
MIT License, character-by-character.

Interestingly I found LICENSE.vm contains this hard coded license text:
{quote}Bouncycastle is released under the MIT license (available above),
 and is Copyright (c) 2000 - 2006 The Legion Of The Bouncy Castle.
{quote}
 Maybe there's a bug in the license checker plugin and it didn't find 
Bouncycastle before so you had to manually add this license text?

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HBASE-21371:

Attachment: HBASE-21371.master.001.patch

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21371) Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license error

2018-10-23 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661558#comment-16661558
 ] 

Wei-Chiu Chuang commented on HBASE-21371:
-

 
{code:java}
$ mvn dependency:tree -Dhadoop.profile=3.0 
-Dhadoop-three.version=3.3.0-SNAPSHOT{code}
 javax.activation is included indirectly from hadoop-common:
{quote}[INFO] +- org.apache.hadoop:hadoop-common:jar:3.3.0-SNAPSHOT:compile

[INFO] |  +- javax.activation:javax.activation-api:jar:1.2.0:runtime
{quote}
bouncycastle is included in test jars only:
{quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test

[INFO] | - 
org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.3.0-SNAPSHOT:test
 [INFO] | +- org.bouncycastle:bcprov-jdk15on:jar:1.60:test
 [INFO] | - org.bouncycastle:bcpkix-jdk15on:jar:1.60:test
{quote}
the new jetty dependencies are included in test jars only too:
{quote}[INFO] +- org.apache.hadoop:hadoop-minicluster:jar:3.3.0-SNAPSHOT:test

[INFO] |  +- 
org.apache.hadoop:hadoop-yarn-server-tests:test-jar:tests:3.3.0-SNAPSHOT:test

[INFO] |  |  +- 
org.apache.hadoop:hadoop-yarn-server-nodemanager:jar:3.3.0-SNAPSHOT:test

[INFO] |  |  |  +- 
org.eclipse.jetty.websocket:javax-websocket-server-impl:jar:9.3.19.v20170502:test

 
{quote}

> Hbase unable to compile against Hadoop trunk (3.3.0-SNAPSHOT) due to license 
> error
> --
>
> Key: HBASE-21371
> URL: https://issues.apache.org/jira/browse/HBASE-21371
> Project: HBase
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HBASE-21371.001.patch, HBASE-21371.master.001.patch
>
>
> Hadoop 3.3.0 (trunk) updated various dependencies, which adds additional 
> licenses that break HBase's license check plugin.
> CDDL/GPLv2+CE license
> {quote}This product includes JavaBeans Activation Framework API jar licensed 
> under the CDDL/GPLv2+CE.
> CDDL or GPL version 2 plus the Classpath Exception
>  ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> javax.activation
>  javax.activation-api
>  1.2.0
> maven central search
>  g:javax.activation AND a:javax.activation-api AND v:1.2.0
> project website
>  [http://java.net/all/javax.activation-api/]
>  project source
>  [https://github.com/javaee/activation/javax.activation-api]
> {quote}
> Bouncy Castle License 
> {quote}–
>  This product includes Bouncy Castle PKIX, CMS, EAC, TSP, PKCS, OCSP, CMP, 
> and CRMF APIs licensed under the Bouncy Castle Licence.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.bouncycastle
>  bcpkix-jdk15on
>  1.60
> maven central search
>  g:org.bouncycastle AND a:bcpkix-jdk15on AND v:1.60
> project website
>  [http://www.bouncycastle.org/java.html]
>  project source
>  [https://github.com/bcgit/bc-java]
>  –
> {quote}
>  
> And a long list of "Apache Software License - Version 2.0" licensed Jetty 
> dependencies like this:
> {quote}
> This product includes Jetty :: Servlet Annotations licensed under the Apache 
> Software License - Version 2.0.
> ERROR: Please check  this License for acceptability here:
> [https://www.apache.org/legal/resolved]
> If it is okay, then update the list named 'non_aggregate_fine' in the 
> LICENSE.vm file.
>  If it isn't okay, then revert the change that added the dependency.
> More info on the dependency:
> org.eclipse.jetty
>  jetty-annotations
>  9.3.19.v20170502
> maven central search
>  g:org.eclipse.jetty AND a:jetty-annotations AND v:9.3.19.v20170502
> project website
>  [http://www.eclipse.org/jetty]
>  project source
>  [https://github.com/eclipse/jetty.project/jetty-annotations]
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661554#comment-16661554
 ] 

Ankit Singhal commented on HBASE-21344:
---

{quote}1096 Optional optProc = 
this.procedureExecutor.getProcedures().stream()
1097 .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
1098 .filter(s -> s.hasMetaTableRegion()).findAny();
You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?
{quote}

Yes, My bad, we don't need this particular change.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$ge

[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661555#comment-16661555
 ] 

Duo Zhang commented on HBASE-21363:
---

OK, it is in ProcedureWALFormat... Let me update the patch.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661553#comment-16661553
 ] 

Duo Zhang commented on HBASE-21363:
---

[~allan163] Where is the code we call resetModified on the tracker in 
ProcedureWALFormatReader? I can find it.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661548#comment-16661548
 ] 

stack commented on HBASE-21344:
---

bq. Actually, I'm working against branch-2.0 only, here you can see 
tableStateManager is started 2 times,

My bad. Indeed, 2.0 has this. 2.1 does not.

What you doing here from patch?

1096  Optional optProc = 
this.procedureExecutor.getProcedures().stream()
1097  .filter(p -> p instanceof ServerCrashProcedure).map(o -> 
(ServerCrashProcedure) o)
1098  .filter(s -> s.hasMetaTableRegion()).findAny();

You are reporting SCPs only if they have meta on them? Isn't this method more 
generic than just meta searches?

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661544#comment-16661544
 ] 

Ankit Singhal commented on HBASE-21344:
---

bq. You don't seem to be working against the tip of branch-2.0 or branch-2.1. 
You seem to be working in your own branch? Is that so? If so, startup has 
changed pretty radically since 2.0.0.
Actually, I'm working against branch-2.0 only, here you can see 
tableStateManager is started 2 times,

At this instance, we only wait for IMP( which will be ok during the first start 
after deploy) but not when there are SCPs.
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L929

TableStateManger is started after meta is actually online(which is correct).
https://github.com/apache/hbase/blob/branch-2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L958


> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
>

[jira] [Commented] (HBASE-21224) Handle compaction queue duplication

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661536#comment-16661536
 ] 

Hadoop QA commented on HBASE-21224:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
57s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 1s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m  3s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}285m 41s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}325m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestFromClientSide |
|   | hadoop.hbase.namespace.TestNamespaceAuditor |
|   | hadoop.hbase.client.TestSnapshotTemporaryDirectoryWithRegionReplicas |
|   | hadoop.hbase.replication.TestReplicationKillSlaveRS |
|   | hadoop.hbase.client.TestSnapshotDFSTemporaryDirectory |
|   | hadoop.hbase.quotas.TestSpaceQuotas |
|   | hadoop.hbase.regionserver.TestRegionReplicaFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945266/HBASE-21224-master.004.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9607297f8ebf 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 3b68e5393e |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1

[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661533#comment-16661533
 ] 

stack commented on HBASE-21364:
---

Ok. Thanks. Yeah, want to cut an RC0 if I can. Thanks for working on this.

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661528#comment-16661528
 ] 

Duo Zhang commented on HBASE-21364:
---

This is a critical problem so mark it as blocker for 2.1.1.

And for the patch, I suggest that we split it into two piece. The verbose 
related code can be done in a separated issue, and can be committed to all 
branches. And the code for fixing the actual problem should be committed to 
branch-2.1 and branch-2.0, which should be done ASAP as we want to push out 
2.1.1 now.

Ping [~stack].



> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21364:
--
Priority: Blocker  (was: Major)

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Blocker
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21364:
--
Fix Version/s: 2.0.3
   2.1.1

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restore the procedures form Procedure WALs. We will put the runable 
> procedures back to the queue to execute. The order is not the problem before 
> HBASE-20846 since the first one to execute will acquire the lock itself. But 
> since the locks will restored after HBASE-20846. If we execute a procedure 
> without the lock first before a procedure with the lock in the same queue, 
> there is a race condition that we may not be able to execute all procedures 
> in the same queue at all.
> The race condtion is:
> 1. A procedure need to take the table's exclusive lock was put into the 
> table's queue, but the table's shard lock was lock by a Region Procedure. 
> Since no one takes the exclusive lock, the queue is put to run queue to 
> execute. But soon, the worker thread see the procedure can't execute because 
> it doesn't hold the lock, so it will stop execute and remove the queue from 
> run queue.
> 2. At the same time, the Region procedure which holds the table's shard lock 
> and the region's exclusive lock is put to the table's queue. But, since the 
> queue already added to the run queue, it won't add again.
> 3. Since 1, the table's queue was removed from the run queue.
> 4. Then, no one will put the table's queue back, thus no worker will execute 
> the procedures inside
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21363) Rewrite the buildingHoldCleanupTracker method in WALProcedureStore

2018-10-23 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661525#comment-16661525
 ] 

Duo Zhang commented on HBASE-21363:
---

Oh shit. Let me check the code. I think this should be in 2.1. The patch is 
almost there. Thanks.

> Rewrite the buildingHoldCleanupTracker method in WALProcedureStore
> --
>
> Key: HBASE-21363
> URL: https://issues.apache.org/jira/browse/HBASE-21363
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21363-v1.patch, HBASE-21363-v2.patch, 
> HBASE-21363.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661524#comment-16661524
 ] 

stack commented on HBASE-20828:
---

I need to write up what is in here. The subtasks have changed AMv2 for the 
better. Stuff like HBASE-21278 where now we do not try to rollback successful 
procedures but rather the parent needs to schedule compensatory, new Procedures 
needs evangelizing. Ditto the background task that is trying to limit our 
backlog of master proc wals TODO.

> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---
>
> Key: HBASE-20828
> URL: https://issues.apache.org/jira/browse/HBASE-20828
> Project: HBase
>  Issue Type: Umbrella
>  Components: amv2
>Reporter: stack
>Priority: Major
>
> AMv2 is missing specification. There are too many grey-areas still. Also 
> missing are a concise listing of the tenets of AMv2 operation. Here are some 
> examples:
>  * HBASE-19529 "Handle null states in AM": Asks how we should treat null 
> state in hbase:meta. What does it 'mean'. We seem to treat it differently 
> dependent on context. Needs clarification. [~Apache9] recently asked similar 
> about the meaning of OFFLINE.
>  * Logging needs to have a particular form to help trace Procedure progress; 
> needs a write-up.
> Lets fill in items to address in this umbrella issue. Can address in 
> subissues and produce specification doc too. We have the below but these are 
> mostly (incomplete) description for devs on pv2 and amv2; the specification 
> is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up w/ rollback -- when, how much, and 
> when it is not appropriate -- as well as recommendation on Procedures 
> coarseness, locking -- is it ok to lock table in alter table procedure for 
> the life of the procedure? -- and so on).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21372:
--
Attachment: HBASE-21372.branch-2.1.001.patch

> Set hbase.assignment.maximum.attempts to Long.MAX
> -
>
> Key: HBASE-21372
> URL: https://issues.apache.org/jira/browse/HBASE-21372
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Attachments: HBASE-21372.branch-2.1.001.patch, 
> HBASE-21372.branch-2.1.001.patch
>
>
> From parent issue, [~allan163] suggests that we not give up on assign unless 
> there a change -- an SCP triggers failure -- or at the extreme, an operator 
> intervenes. This jibes w/ how we're thinking about assign (or to put it 
> another way, we have no handling for the case where we exhaust retries).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21372) Set hbase.assignment.maximum.attempts to Long.MAX

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661515#comment-16661515
 ] 

Hadoop QA commented on HBASE-21372:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
15s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
13s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
58s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
16s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
13m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}210m 22s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}261m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945268/HBASE-21372.branch-2.1.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux cd3b360c9c27 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.1 / e29ce9f937 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14830/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14830/testReport/ |
| Max. process+thread cou

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661513#comment-16661513
 ] 

stack commented on HBASE-21344:
---

[~an...@apache.org] You don't seem to be working against the tip of branch-2.0 
or branch-2.1. You seem to be working in your own branch? Is that so? If so, 
startup has changed pretty radically since 2.0.0.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenCo

[jira] [Commented] (HBASE-21349) Cluster is going down but CatalogJanitor and Normalizer try to run and fail noisely

2018-10-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661486#comment-16661486
 ] 

Hadoop QA commented on HBASE-21349:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
19s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
29s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 40s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21349 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945273/HBASE-21349.master.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fc32ee6e94ae 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 1e9d998727 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14831/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/jo

[jira] [Work stopped] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21373 stopped by Xu Cang.
---
> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-21373.branch-1.001.patch
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-21373 started by Xu Cang.
---
> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-21373.branch-1.001.patch
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21373) Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for cluster size, it gives little indication"

2018-10-23 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21373:

Attachment: HBASE-21373.branch-1.001.patch
Status: Patch Available  (was: Open)

> Backport to branch-1, "HBASE-21338 [balancer] If balancer is an ill-fit for 
> cluster size, it gives little indication"
> -
>
> Key: HBASE-21373
> URL: https://issues.apache.org/jira/browse/HBASE-21373
> Project: HBase
>  Issue Type: Bug
>  Components: Operability
>Reporter: stack
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-21373.branch-1.001.patch
>
>
> Issue to backport to branch-1. Hope you don't mind my assigning it to you Xu 
> Cang.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21073) "Maintenance mode" master

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661421#comment-16661421
 ] 

Hudson commented on HBASE-21073:


Results for branch branch-2.0
[build #1002 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1002//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> "Maintenance mode" master
> -
>
> Key: HBASE-21073
> URL: https://issues.apache.org/jira/browse/HBASE-21073
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, hbck2, master
>Reporter: stack
>Assignee: Mike Drob
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21073.branch-2.001.patch, 
> HBASE-21073.branch-2.1.001.patch, HBASE-21073.branch-2.1.002.patch, 
> HBASE-21073.master.001.patch, HBASE-21073.master.002.patch, 
> HBASE-21073.master.003.patch, HBASE-21073.master.004.patch, 
> HBASE-21073.master.005.patch, HBASE-21073.master.006.patch, 
> HBASE-21073.master.007.patch, HBASE-21073.master.008.patch, 
> HBASE-21073.master.009.patch, HBASE-21073.master.010.patch, 
> HBASE-21073.master.011.patch
>
>
> Make it so we can bring up a Master in "maintenance mode". This is parse of 
> master wal procs but not taking on regionservers. It would be in a state 
> where "repair" Procedures could run; e.g. a Procedure that could recover meta 
> by looking for meta WALs, splitting them, dropping recovered.edits, and even 
> making it so meta is readable. See parent issue for why needed (disaster 
> recovery).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661395#comment-16661395
 ] 

Ankit Singhal commented on HBASE-21344:
---

bq. This should be happening already. We wait on meta assign. If SCPs, they'll 
run and recover meta if one of them was holding it. If no assign for meta in 
the procedure store, then something untoward and at least for now, operator 
needs to figure what happened until we fix the bug. Operator can schedule an 
assign with hbck2
bq. branch-2.0 will go into a holding pattern if hbase:meta is not assigned 
(ditto if hbase:namespace is not assigned) waiting on operator intevention to 
clear the lack-of-assign.
Thanks [~stack] for the pointer, I didn't go down as the problem was started 
when we are starting tableStateManager without waiting for meta assignment by 
SCPs. I think we can just remove this from here as we already starting after 
waiting for meta to get online.(attached patch for the same)
{code}
 if (initMetaProc != null) {
   initMetaProc.await();
 }
-tableStateManager.start();
{code}

bq. That said, I see some value in this patch. In particular the bit around 
resetting hbase:meta state if failure.
We shouldn't offline the meta if we are failing the assignment as it will start 
the InitMetaProcedure (which we don't want as SCP need to take care of 
recovering of Meta).



> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7

[jira] [Updated] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-21344:
--
Attachment: HBASE-21344-branch-2.0_v2.patch

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch, 
> HBASE-21344-branch-2.0_v2.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at 
> java.util.concurrent.CompletableFuture.complete

[jira] [Resolved] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport

2018-10-23 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-21353.
---
   Resolution: Fixed
 Assignee: stack
Fix Version/s: hbck2-1.0.0

Pushed fix over on hbase-operator-tools/hbase-hbck2.

> TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to 
> HBCK2#checkHBCKSupport
> -
>
> Key: HBASE-21353
> URL: https://issues.apache.org/jira/browse/HBASE-21353
> Project: HBase
>  Issue Type: Test
>  Components: hbase-operator-tools, hbck2
>Reporter: Ted Yu
>Assignee: stack
>Priority: Major
> Fix For: hbck2-1.0.0
>
>
> I noticed the following when running 
> TestHBCKCommandLineParsing#testCommandWithOptions :
> {code}
> "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on 
> condition [0x70216000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00076d3055d8> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.(ConnectionImplementation.java:297)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown
>  Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227)
>   at 
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127)
>   at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93)
>   at org.apache.hbase.HBCK2.run(HBCK2.java:352)
>   at 
> org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62)
> {code}
> The test doesn't spin up hbase cluster.
> Hence the call to check hbck support hangs.
> In HBCK2#run, we can refactor the code such that argument parsing is done 
> prior to calling HBCK2#checkHBCKSupport .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   3   >