[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658604#comment-16658604
 ] 

Hadoop QA commented on HBASE-20973:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
41s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
28s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
58s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
45s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
56s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 47s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}140m 55s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.coprocessor.TestMetaTableMetrics |
|   | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-20973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944930/HBASE-20973.branch-2.0.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3c2a3a66f876 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658576#comment-16658576
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #25 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658540#comment-16658540
 ] 

Allan Yang commented on HBASE-21344:


For the problem SCP or AP may fail due to all kinds of issues (security issue 
like this one), I suggest we can set hbase.assignment.maximum.attempts to 
Long.MAX. So AP will try forever, until we resolve the problem which cause the 
region can't assign.
We can't afford that the AP fails(thus SCP roll back) leaving some region 
unassigned and we don't know about it(since the corresponding procedures are 
rolled back and the region won't show as RIT in the WebUI ). I think branch-1.x 
also have this kind of issue. The assign failed, but shows nothing in WebUI, we 
don't know a region is not online until the customer reach us.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, details=row 'backup:system' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, 
> exception=java.io.IOException: Meta region is in state OPENING
> at 
> org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> at 
> 

[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658541#comment-16658541
 ] 

Guanghao Zhang commented on HBASE-21325:


I found the ut not hanged anymore after i rebase master code... Need dig more.

> Force to terminate regionserver when abort hang in somewhere
> 
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21325.master.001.patch, 
> HBASE-21325.master.001.patch
>
>
> When testing sync replication, I found that, if I transit the remote cluster 
> to DA, while the local cluster is still in A, the region server will hang 
> when shutdown. As the fsOk flag only test the local cluster(which is 
> reasonable), we will enter the waitOnAllRegionsToClose, and since the WAL is 
> broken(the remote wal directory is gone)  so we will never succeed. And this 
> lead to an infinite wait inside waitOnAllRegionsToClose.
> So I think here we should have an upper bound for the wait time in 
> waitOnAllRegionsToClose method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21316) All RegionServer Down when RS_LOG_REPLAY_OPS

2018-10-21 Thread justice (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

justice updated HBASE-21316:

Labels: easyfix  (was: )
Attachment: 0001-add-catch-for-ArrayIndexOutOfBoundsException-when-ch.patch
  Tags: HBASE-21316,WALSplitter
Status: Patch Available  (was: Open)

> All RegionServer Down when RS_LOG_REPLAY_OPS
> 
>
> Key: HBASE-21316
> URL: https://issues.apache.org/jira/browse/HBASE-21316
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: justice
>Priority: Major
>  Labels: easyfix
> Attachments: 
> 0001-add-catch-for-ArrayIndexOutOfBoundsException-when-ch.patch, log.tgz
>
>
> 1. One RegionServer die as unknow reason, log as follow:
> {code:java}
> 2018-10-14 20:31:47,423 INFO [main-SendThread(11.3.20.101:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> 11.3.20.101/11.3.20.101:2181, initiating session 2018-10-14 20:31:47,433 INFO 
> [main-SendThread(11.3.20.101:2181)] zookeeper.ClientCnxn: Session 
> establishment complete on server 11.3.20.101/11.3.20.101:2181, sessionid = 
> 0x6500073f944a8e79, negotiated timeout = 3 2018-10-14日 Sunday 21:03:05 
> CST Starting regionserver on 11-3-19-199.JD.LOCAL core file size (blocks, -c) 
> 0 data seg size (kbytes, -d) unlimited
> {code}
> 2. Master receive zk deletenode event, and start ServerCrashProcedure Task
> {code:java}
> 2018-10-14 20:31:47,437 INFO [main-EventThread] master.RegionServerTracker: 
> RegionServer ephemeral node deleted, processing expiration 
> [11-3-19-199.jd.local,16020,1539492869470]
> 2018-10-14 20:31:47,539 INFO [PEWorker-1] procedure.ServerCrashProcedure: 
> Start pid=25053, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=11-3-19-199.jd.local,16020,1539492869470, splitWal=true, meta=false
> 2018-10-14 20:31:47,550 INFO [PEWorker-1] master.SplitLogManager: Started 
> splitting 63 logs in 
> [hdfs://11-3-18-67.JD.LOCAL:9000/hbase/WALs/11-3-19-199.jd.local,16020,1539492869470-splitting]
>  for [11-3-19-199.jd.local,16020,1539492869470] ... 2018-10-14 20:31:48,592 
> INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task 
> /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598
>  acquired by 11-3-18-71.jd.local,16020,1539492869409
> {code}
> 3. One alive RegionServer Node get SplitLogWorker,  has an error and stop
> {code:java}
> 2018-10-14 20:31:48,602 INFO [SplitLogWorker-11-3-18-71:16020] 
> coordination.ZkSplitLogWorkerCoordination: worker 
> 11-3-18-71.jd.local,16020,1539492869409 acquired task 
> /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598
>  
> ...
> 2018-10-14 21:03:26,219 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] executor.EventHandler: 
> Caught throwable while processing event RS_LOG_REPLAY
> java.lang.ArrayIndexOutOfBoundsException: 8811
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at 
> org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:102)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:107)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:296)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:194)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-14 21:03:26,227 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] 
> regionserver.HRegionServer: * ABORTING region server 
> 11-3-18-71.jd.local,16020,1539522186368: Caught throwable while processing 
> event RS_LOG_REPLAY *
> 
> 2018-10-14 20:31:48,780 INFO 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-0] 
> regionserver.HRegionServer: * STOPPING region server 
> '11-3-18-71.jd.local,16020,1539492869409' *
> {code}
> 4. other alive regionserver node die one by one, at last, all regionserver 
> node die



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21316) All RegionServer Down when RS_LOG_REPLAY_OPS

2018-10-21 Thread justice (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658533#comment-16658533
 ] 

justice commented on HBASE-21316:
-

I don't save the WAL files with this environment , And there has not catch 
java.lang.ArrayIndexOutOfBoundsException at splitLogFile() function of 
WALSplitter

> All RegionServer Down when RS_LOG_REPLAY_OPS
> 
>
> Key: HBASE-21316
> URL: https://issues.apache.org/jira/browse/HBASE-21316
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: justice
>Priority: Major
> Attachments: log.tgz
>
>
> 1. One RegionServer die as unknow reason, log as follow:
> {code:java}
> 2018-10-14 20:31:47,423 INFO [main-SendThread(11.3.20.101:2181)] 
> zookeeper.ClientCnxn: Socket connection established to 
> 11.3.20.101/11.3.20.101:2181, initiating session 2018-10-14 20:31:47,433 INFO 
> [main-SendThread(11.3.20.101:2181)] zookeeper.ClientCnxn: Session 
> establishment complete on server 11.3.20.101/11.3.20.101:2181, sessionid = 
> 0x6500073f944a8e79, negotiated timeout = 3 2018-10-14日 Sunday 21:03:05 
> CST Starting regionserver on 11-3-19-199.JD.LOCAL core file size (blocks, -c) 
> 0 data seg size (kbytes, -d) unlimited
> {code}
> 2. Master receive zk deletenode event, and start ServerCrashProcedure Task
> {code:java}
> 2018-10-14 20:31:47,437 INFO [main-EventThread] master.RegionServerTracker: 
> RegionServer ephemeral node deleted, processing expiration 
> [11-3-19-199.jd.local,16020,1539492869470]
> 2018-10-14 20:31:47,539 INFO [PEWorker-1] procedure.ServerCrashProcedure: 
> Start pid=25053, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=11-3-19-199.jd.local,16020,1539492869470, splitWal=true, meta=false
> 2018-10-14 20:31:47,550 INFO [PEWorker-1] master.SplitLogManager: Started 
> splitting 63 logs in 
> [hdfs://11-3-18-67.JD.LOCAL:9000/hbase/WALs/11-3-19-199.jd.local,16020,1539492869470-splitting]
>  for [11-3-19-199.jd.local,16020,1539492869470] ... 2018-10-14 20:31:48,592 
> INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task 
> /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598
>  acquired by 11-3-18-71.jd.local,16020,1539492869409
> {code}
> 3. One alive RegionServer Node get SplitLogWorker,  has an error and stop
> {code:java}
> 2018-10-14 20:31:48,602 INFO [SplitLogWorker-11-3-18-71:16020] 
> coordination.ZkSplitLogWorkerCoordination: worker 
> 11-3-18-71.jd.local,16020,1539492869409 acquired task 
> /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598
>  
> ...
> 2018-10-14 21:03:26,219 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] executor.EventHandler: 
> Caught throwable while processing event RS_LOG_REPLAY
> java.lang.ArrayIndexOutOfBoundsException: 8811
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365)
> at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358)
> at 
> org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735)
> at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:102)
> at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:107)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:296)
> at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:194)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:99)
> at 
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-10-14 21:03:26,227 ERROR 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] 
> regionserver.HRegionServer: * ABORTING region server 
> 11-3-18-71.jd.local,16020,1539522186368: Caught throwable while processing 
> event RS_LOG_REPLAY *
> 
> 2018-10-14 20:31:48,780 INFO 
> [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-0] 
> regionserver.HRegionServer: * STOPPING region server 
> '11-3-18-71.jd.local,16020,1539492869409' *
> {code}
> 4. other alive regionserver node die one by one, at last, all regionserver 
> node die



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658527#comment-16658527
 ] 

Duo Zhang commented on HBASE-21355:
---

Talked with [~openinx] offline, we think that the problem is the code 
introduced in HBASE-20940. We should not try to open the store files every time 
when calling getRegionInfo, and then close them. It is too expensive. Instead, 
I think we could also check the compacted files when testing whether a region 
is mergable or splittable. And I think we should provide a UT for this, as 
there is no test for HBASE-20940.

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9
>
> Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere

2018-10-21 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21325:
---
Attachment: HBASE-21325.master.001.patch

> Force to terminate regionserver when abort hang in somewhere
> 
>
> Key: HBASE-21325
> URL: https://issues.apache.org/jira/browse/HBASE-21325
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Attachments: HBASE-21325.master.001.patch, 
> HBASE-21325.master.001.patch
>
>
> When testing sync replication, I found that, if I transit the remote cluster 
> to DA, while the local cluster is still in A, the region server will hang 
> when shutdown. As the fsOk flag only test the local cluster(which is 
> reasonable), we will enter the waitOnAllRegionsToClose, and since the WAL is 
> broken(the remote wal directory is gone)  so we will never succeed. And this 
> lead to an infinite wait inside waitOnAllRegionsToClose.
> So I think here we should have an upper bound for the wait time in 
> waitOnAllRegionsToClose method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658515#comment-16658515
 ] 

Allan Yang commented on HBASE-21344:


{code}
-if 
(assignmentManager.getRegionStates().getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO)
-  .isOffline()) {
+RegionState metaRegionState =
+
assignmentManager.getRegionStates().getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO);
+if (!metaRegionState.isOpened()) {
   Optional> optProc = 
procedureExecutor.getProcedures().stream()
 .filter(p -> p instanceof InitMetaProcedure).findAny();
-  if (optProc.isPresent()) {
+  // check if we are not loading a successful procedure by the last master 
as Meta is still not
+  // in OPEN state
+  // this also helps in unnecessary waiting on the latch(and get stuck) as 
the countdown was
+  // reset and will never be down to zero as the procedure is not running
+  if (optProc.isPresent() && !((InitMetaProcedure) 
optProc.get()).isSuccess()) {
 initMetaProc = (InitMetaProcedure) optProc.get();
   } else {
 // schedule an init meta procedure if meta has not been deployed yet
{code}
This is not right, the meta table may on a crashed server, and in RIT state, if 
we assign it directly, it may loss some data, since the WAL may not replayed.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, 
> retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state 
> OPENING, 

[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658508#comment-16658508
 ] 

Allan Yang commented on HBASE-21354:


I add some comments in the holdingCleanupTracker() method, it is not what you 
mean?

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, 
> HBASE-21354.branch-2.0.004.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658502#comment-16658502
 ] 

Hadoop QA commented on HBASE-21355:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-21355 does not apply to branch-1. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21355 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944932/HBASE-21355.branch-1.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14791/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9
>
> Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Fix Version/s: 1.2.9
   1.4.9
   1.3.3
   1.5.0

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9
>
> Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658498#comment-16658498
 ] 

Duo Zhang commented on HBASE-21354:
---

{quote}
Done in V4 patch.
{quote}

Where? I skimmed and haven't found...

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, 
> HBASE-21354.branch-2.0.004.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Attachment: HBASE-21355.branch-1.patch

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658495#comment-16658495
 ] 

Allan Yang commented on HBASE-21354:


{quote}
Just add a note when building holdingCleanupTracker?
{quote}
Done in V4 patch.
{quote}
Make these info-level?
{quote}
What's your concern, sir [~stack]? I think DEBUG is enough.

{quote}
I've seen when lots of chaos where I cannot clean up a Procedure because 
another holds a lock but the 'other' no longer exists.
{quote}
I don't think this one can solve this issue... If the other procedure is 
holding a lock, that means the procedure is loaded normally form replay, and it 
is somewhere for sure...
This patch only solve issues that the procedures may deleted improperly and 
thus making their parent/children procedure 'corrupt'

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, 
> HBASE-21354.branch-2.0.004.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21354:
---
Attachment: HBASE-21354.branch-2.0.004.patch

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, 
> HBASE-21354.branch-2.0.004.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20973:
---
Attachment: HBASE-20973.branch-2.0.002.patch

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch, 
> HBASE-20973.branch-2.0.002.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21302) Release 1.2.8

2018-10-21 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey resolved HBASE-21302.
-
Resolution: Fixed

release announcement sent to announce@apache and {dev,user}@hbase

> Release 1.2.8
> -
>
> Key: HBASE-21302
> URL: https://issues.apache.org/jira/browse/HBASE-21302
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.8
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.8
>
>
> 1.4.8 is out, time to make 1.2.8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658487#comment-16658487
 ] 

Duo Zhang commented on HBASE-21336:
---

Maybe. Now I moved the logic to a separated step so it should not happen any 
more.

> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658486#comment-16658486
 ] 

Duo Zhang commented on HBASE-20973:
---

I agree to disable grow or merge for now, but my point is that, if the max node 
size limit works correctly, then the grow or merge should not happen, as now 
the max node size is 64...

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658484#comment-16658484
 ] 

Allan Yang commented on HBASE-20973:


{quote}
This does not make sense... we should use Math.abs(67 - 191) to test whether it 
exceeds the max node size?
{quote}
If Math.abs() is not used, then there is no case that BitSetNode can grow. For 
merging, I can't think of a case that two BitSetNode can merged. Unless two 
BitSetNodes are overlap(That is impossible). 
So, since it can't grow or merge in normal cases, what about this patch, can we 
disable them for now, [~Apache9]?

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)

[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658482#comment-16658482
 ] 

Allan Yang commented on HBASE-21336:


{quote}
 if it has no sub procedures. But now I sort the procedures so the parent 
procedure will arrive at first then the test will always pass and then we will 
set it to RUNNABLE...
{quote}
Do you remember I sent you some logs on Wechat? I encountered a case that a 
parent procedure executed first even there are children procedures. And later 
the children procedures began to execute but only find its parent gone. It may 
have something to do with these logic, may have bug in it...
{code}
2018-10-10 14:38:08,662 DEBUG [PEWorker-2] procedure2.ProcedureExecutor(1451): 
LOCK_EVENT_WAIT pid=15, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, 
hasLock=false; AssignProcedure table=hbase:acl, 
region=267335c85766c62479fb4a5f18a1e95f
2018-10-10 14:38:08,964 INFO  [PEWorker-1] procedure2.ProcedureExecutor(1461): 
Finished pid=14, state=SUCCESS, hasLock=false; ServerCrashProcedure 
server=hb-uf6oyi699w8h700f0-003,16020,1539076734964, splitWal=true, meta=false 
in 3mins, 18.934sec
2018-10-10 14:38:13,699 WARN  [PEWorker-3] procedure2.ProcedureExecutor(1385): 
Rollback because parent is done/rolledback proc=pid=15, ppid=14, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f
{code} 

> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658478#comment-16658478
 ] 

Duo Zhang commented on HBASE-20973:
---

This does not make sense... we should use Math.abs(67 - 191) to test whether it 
exceeds the max node size?

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658476#comment-16658476
 ] 

Allan Yang commented on HBASE-20973:


{quote}
So the max node size is useless? Or there are holes where we miss the max size 
check?
{quote}
No, it is not useless. But, a Math.abs() was used to check whether it can grow. 
That means it can't grow up, but it can grow down.
For the code I pasted above. when inserting 129, it will create a BitSetNode 
from 127 to 191. When inserting 67. It will find Math(67 - 127) < max node 
size, so this BItSetNode will grow down to 64. So it became (64-191).

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658473#comment-16658473
 ] 

Duo Zhang commented on HBASE-20973:
---

And the left shift of java is not cyclical, it just uses the lowest several 
bits for shift, I think we need to add comments in the BitSetNode 
implementation about this.

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658472#comment-16658472
 ] 

Duo Zhang commented on HBASE-20973:
---

So the max node size is useless? Or there are holes where we miss the max size 
check?

And the memory waste is huge if we do not grow the BitSetNode, I'd say. 
Although it seems only a few bytes, but the BitSetNode itself also just consume 
a few bytes, which means that for the worst case the memory could be doubled if 
we can not grow the BitSetNode.

Anyway, correctness is the first thing. I've already filed HBASE-21314 for the 
efficient problem.

Thanks.

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 

[jira] [Comment Edited] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264
 ] 

Allan Yang edited comment on HBASE-20973 at 10/22/18 1:45 AM:
--

Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(127-191)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …So left 
shifting more than 64 time is OK.



was (Author: allan163):
Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(127-191)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …… 


> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious 

[jira] [Comment Edited] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264
 ] 

Allan Yang edited comment on HBASE-20973 at 10/22/18 1:45 AM:
--

Actually it can, you can use these lines of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When inserting proc=67, the BitSetNode of(127-191)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …So left 
shifting more than 64 times is OK.



was (Author: allan163):
Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(127-191)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …So left 
shifting more than 64 time is OK.


> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> 

[jira] [Comment Edited] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264
 ] 

Allan Yang edited comment on HBASE-20973 at 10/22/18 1:44 AM:
--

Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(127-191)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …… 



was (Author: allan163):
Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(64-127)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …… 


> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658467#comment-16658467
 ] 

Allan Yang commented on HBASE-20973:


{quote}
What would implication be for not growing bitsetnode? There'd be a max on 
possible Procedure counts?
{quote}
I don't think so, growing is just for saving a little memory(a few bytes, which 
are a boolean called partial and a long called start).

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, 

[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658466#comment-16658466
 ] 

Duo Zhang commented on HBASE-21334:
---

No useful information in the log. Anyway let me commit the patch to all 
branches first, at least it solves one of the problems. Will keep an eye on the 
flakey dashboard.

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21334.patch, 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658465#comment-16658465
 ] 

Duo Zhang commented on HBASE-21336:
---

I haven't changed the force update yet. Will address the problem HBASE-20351 
and HBASE-20352.

I sort the procedures by id, before passing it to the upper layer. This is not 
necessary, but could make the log more friendly I think.

Reverted from master as one of the problems has been found in HBASE-21334. Will 
upload a new patch soon to address the comments on rb.

> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658455#comment-16658455
 ] 

Duo Zhang commented on HBASE-21334:
---

Much stable now but TestMergeTableRegionsProcedure still failed once, a NPE...

Let me dig.

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21334.patch, 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658438#comment-16658438
 ] 

Hadoop QA commented on HBASE-21318:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} hbase-examples: The patch generated 0 new + 16 
unchanged - 4 fixed = 16 total (was 20) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
14s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
26s{color} | {color:green} hbase-examples in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21318 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944919/HBASE-21318.master.004.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 1edf18626275 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / dd474ef199 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14788/testReport/ |
| Max. process+thread count | 2727 (vs. ulimit of 1) |
| modules | C: hbase-examples U: hbase-examples |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14788/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Tak Lon (Stephen) Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658428#comment-16658428
 ] 

Tak Lon (Stephen) Wu commented on HBASE-21318:
--

Thanks [~yuzhih...@gmail.com] for reviewing it, I have attached new patch with 
more style fixes as well.

> Make RefreshHFilesClient runnable
> -
>
> Key: HBASE-21318
> URL: https://issues.apache.org/jira/browse/HBASE-21318
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0, 1.5.0, 2.1.2
>Reporter: Tak Lon (Stephen) Wu
>Assignee: Tak Lon (Stephen) Wu
>Priority: Minor
> Attachments: HBASE-21318.master.001.patch, 
> HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, 
> HBASE-21318.master.004.patch
>
>
> Other than when user enables hbase.coprocessor.region.classes with 
> RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI 
> and calls refresh HFiles directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Tak Lon (Stephen) Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tak Lon (Stephen) Wu updated HBASE-21318:
-
Attachment: HBASE-21318.master.004.patch

> Make RefreshHFilesClient runnable
> -
>
> Key: HBASE-21318
> URL: https://issues.apache.org/jira/browse/HBASE-21318
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0, 1.5.0, 2.1.2
>Reporter: Tak Lon (Stephen) Wu
>Assignee: Tak Lon (Stephen) Wu
>Priority: Minor
> Attachments: HBASE-21318.master.001.patch, 
> HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, 
> HBASE-21318.master.004.patch
>
>
> Other than when user enables hbase.coprocessor.region.classes with 
> RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI 
> and calls refresh HFiles directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-21 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658426#comment-16658426
 ] 

Ankit Singhal commented on HBASE-21344:
---

bq. To what are you referring? Looking in AP and SCP, there is no rollback – 
but you are referring to a specific location. Looking at patch, it looks like 
you are working in AP#FAILED_OPEN.
yes, I meant AP#FAILED_OPEN. Thank you for checking.

bq. 400 try
bq. { 401   handleFailure(env, regionNode); 402 }
bq. catch (IOException e)
bq. { 403   return false; 404   }
bq. Previous we'd let out IOEs now you are catching them and converting them to 
false. Maybe catch more local to undoRegionAsOpening since this is new source 
of IOE.
Previously there was no IOException thrown by handleFailure, but agreed, I'll 
catch locally to undoRegionAsOpening and add a warning.

bq. Did you change decrementMinRegionServerCount to public for tests? If so, 
add @VisibleForTesting... 
Yes, Sure will make the change.

bq.Yes. Philosophy is that all recovery is done via SCP since it has the means 
for splitting WALs (or figuring this step can be skipped).
ok, so during master initialization(restart or standby becoming active), do I 
search for SCP in procedure queue(and wait on it) and see if it is holding meta 
and can recover from splitting of logs to the assignment of Meta?

Thank you so much for the review, let me know once you have more comments on 
the same.(I'll be happy to work on them)









> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2] 

[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658410#comment-16658410
 ] 

stack commented on HBASE-21344:
---

bq. Here actually, we are correcting the rollback of assign procedure for Meta 
in both IMP and SCP. 

Do you mean roll back of AssignProcedure? (IMP has one step, making an 
AssignProcedure as subprocedure for meta). Rollback and AP don't really go 
together; AP doesn't support Rollback.

When you say...

bq. Earlier, rollback of assign corrects the meta region node(by moving it to 
offline state)

To what are you referring? Looking in AP and SCP, there is no rollback -- but 
you are referring to a specific location. Looking at patch, it looks like you 
are working in AP#FAILED_OPEN.

If so, your changes in here look good...   I wonder about this one though:

400 try {
401   handleFailure(env, regionNode);
402 } catch (IOException e) {
403   return false;
404 }

Previous we'd let out IOEs now you are catching them and converting them to 
false. Maybe catch more local to undoRegionAsOpening since this is new source 
of IOE.

I think this addition of yours looks good down here in undoRegionAsOpening.

Did you change decrementMinRegionServerCount to public for tests? If so, add 
@VisibleForTesting... annotation sir.

Let me get back to you after I study your tests more. They look good.

bq. Do you think scheduling IMP without checking whether meta logs were split 
or not, will cause in any problem?

Yes. Philosophy is that all recovery is done via SCP since it has the means for 
splitting WALs (or figuring this step can be skipped).

bq. HBASE-21035 looks quite similar but it seems that the handling is more 
related to the case where the procedures WALs is accidentally/ intentionally 
cleared.

That is correct. but the general practice suggested is that when unsure, 
then for now at least, we fall back to operator intervention.

Please be patient with me Ankit. I'm generally slow. It takes me a while to 
understand. Thanks for the help.

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> 

[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658404#comment-16658404
 ] 

Ted Yu commented on HBASE-21318:


We're close.
{code}
HBase clusters are sharing 
{code}
Remove 'are'.

Looks good otherwise.

> Make RefreshHFilesClient runnable
> -
>
> Key: HBASE-21318
> URL: https://issues.apache.org/jira/browse/HBASE-21318
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0, 1.5.0, 2.1.2
>Reporter: Tak Lon (Stephen) Wu
>Assignee: Tak Lon (Stephen) Wu
>Priority: Minor
> Attachments: HBASE-21318.master.001.patch, 
> HBASE-21318.master.002.patch, HBASE-21318.master.003.patch
>
>
> Other than when user enables hbase.coprocessor.region.classes with 
> RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI 
> and calls refresh HFiles directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658403#comment-16658403
 ] 

stack commented on HBASE-21336:
---

Left comments up on rb. This looks great.

> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658397#comment-16658397
 ] 

Hadoop QA commented on HBASE-21318:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
56s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} hbase-examples: The patch generated 1 new + 19 
unchanged - 1 fixed = 20 total (was 20) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 3s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
12m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
54s{color} | {color:green} hbase-examples in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21318 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944917/HBASE-21318.master.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 630b4b350c77 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / dd474ef199 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14787/artifact/patchprocess/diff-checkstyle-hbase-examples.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14787/testReport/ |
| Max. process+thread count | 2481 (vs. ulimit of 1) |
| modules | C: hbase-examples U: hbase-examples |
| Console output | 

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658392#comment-16658392
 ] 

Ted Yu commented on HBASE-21342:


[~mdrob]:
Can you review the patch one more time ?

Thanks

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21318) Make RefreshHFilesClient runnable

2018-10-21 Thread Tak Lon (Stephen) Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tak Lon (Stephen) Wu updated HBASE-21318:
-
Attachment: HBASE-21318.master.003.patch

> Make RefreshHFilesClient runnable
> -
>
> Key: HBASE-21318
> URL: https://issues.apache.org/jira/browse/HBASE-21318
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Affects Versions: 3.0.0, 1.5.0, 2.1.2
>Reporter: Tak Lon (Stephen) Wu
>Assignee: Tak Lon (Stephen) Wu
>Priority: Minor
> Attachments: HBASE-21318.master.001.patch, 
> HBASE-21318.master.002.patch, HBASE-21318.master.003.patch
>
>
> Other than when user enables hbase.coprocessor.region.classes with 
> RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI 
> and calls refresh HFiles directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658372#comment-16658372
 ] 

Hadoop QA commented on HBASE-21342:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
14s{color} | {color:red} hbase-server: The patch generated 3 new + 3 unchanged 
- 0 fixed = 6 total (was 3) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
22s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}145m 
16s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}190m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21342 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944909/HBASE-21342.005.patch 
|
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2f5b93405f3e 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / dd474ef199 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14786/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14786/testReport/ |
| Max. process+thread count | 4679 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658369#comment-16658369
 ] 

stack commented on HBASE-21336:
---

bq. I do not think we need to guarantee any order when loading procedures.

bq. But now I sort the procedures so the parent procedure will arrive at first 
then the test will always pass and then we will set it to RUNNABLE...

bq. And I think the force update logic can also make this happen, as it will 
update the parent procedure if it is stuck there, and mess up the replay 
order(for the old implement). Let me see how to fix this.

So, did you drop ordering? You just have a sort on load? What did you do for 
force update?

Would be sweet if you could do without ordering.

Looking at patch...

> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658360#comment-16658360
 ] 

stack commented on HBASE-21354:
---

bq. So this could only happen after restarting, and it also requires that we 
fail to write the trailer for a file(not only the newest file, and maybe this 
could also happen with multiple restarts?)?

I've seen this happen, where on restart, it complains that one of the many WAL 
files is missing its ending/tracker.

bq. LOG.debug("Remove the oldest log {}", logs.getFirst());

Make these info-level? This and their creation on log roll if not already.

I like your adding a 'why' to log message.

+1




> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658270#comment-16658270
 ] 

stack edited comment on HBASE-21354 at 10/21/18 7:58 PM:
-

Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird 
issue I've seen when lots of chaos where I cannot clean up a Procedure because 
another holds a lock but the 'other' no longer exists.

(I removed  my 'nit' after taking a deeper look -- log makes sense).

Great test.


was (Author: stack):
Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird 
issue I've seen when lots of chaos where I cannot clean up a Procedure because 
another holds a lock but the 'other' no longer exists.

nit: These kind of logs w/o adding context -- name of the file being recovered 
-- can be useless  LOG.debug("Starting WAL Procedure Store lease 
recovery"); 

Great test.

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658354#comment-16658354
 ] 

Hadoop QA commented on HBASE-21342:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
14s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
10s{color} | {color:red} hbase-server: The patch generated 3 new + 3 unchanged 
- 0 fixed = 6 total (was 3) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 36s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}185m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21342 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944908/HBASE-21342.004.patch 
|
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 5a0d9b55f3dc 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / dd474ef199 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14785/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14785/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 

[jira] [Commented] (HBASE-13468) hbase.zookeeper.quorum supports ipv6 address

2018-10-21 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658336#comment-16658336
 ] 

Mike Drob commented on HBASE-13468:
---

You need to add a relocation stanza at 
https://github.com/apache/hbase/blob/master/hbase-shaded/pom.xml#L340

> hbase.zookeeper.quorum supports ipv6 address
> 
>
> Key: HBASE-13468
> URL: https://issues.apache.org/jira/browse/HBASE-13468
> Project: HBase
>  Issue Type: Bug
>Reporter: Mingtao Zhang
>Assignee: maoling
>Priority: Major
> Attachments: HBASE-13468.master.001.patch, 
> HBASE-13468.master.002.patch, HBASE-13468.master.003.patch
>
>
> I put ipv6 address in hbase.zookeeper.quorum, by the time this string went to 
> zookeeper code, the address is messed up, i.e. only '[1234' left. 
> I started using pseudo mode with embedded zk = true.
> I downloaded 1.0.0, not sure which affected version should be here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

2018-10-21 Thread Ankit Singhal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658335#comment-16658335
 ] 

Ankit Singhal commented on HBASE-21344:
---

Thanks [~stack] for taking a look. PFB , my responses
{quote}So, you are trying to figure the case where the assign in IMP failed to 
succeed – where the region is stuck in the OPENING state – and if you can find 
this condition, you'd reschedule an IMP (the body of which happens to be an 
assign of meta)?
{quote}
Here actually, we are correcting the rollback of assign procedure for Meta in 
both IMP and SCP. We are not re-scheduling the IMP until the master restarts(or 
standby become active) and it finds that meta is still not OPEN.

Earlier, rollback of assign corrects the meta region node(by moving it to 
offline state) but for Meta, we also store state in another meta 
znode(/hbase/meta-region-server), which was not cleared or set back to offline. 
(patch is trying to fix this)
{quote}What you think of the discussion over in HBASE-21035 where we decide to 
punt on auto-assign for now at least (IMP only assigns, doesn't do recovery of 
meta WALs if any).
{quote}
HBASE-21035 looks quite similar but it seems that the handling is more related 
to the case where the procedures WALs is accidentally/ intentionally cleared.

In our case, splitting of meta logs is completed but assign in SCP failed, 
which left the meta region node in the OPENING state still after rollback, now 
meta is never assigned(even after restart) resulting in SCP to never kick-in or 
cluster to get stuck. 
So to fix this we are doing two things:-
 * fixing the meta regionserver node(back to offline state) during the 
rollback(undoRegionOpening)
 * During master initialization, checking meta assignment, if it's still not 
open, we are scheduling another IMP for assignment and waiting on it for 
completion (Do you think scheduling IMP without checking whether meta logs were 
split or not, will cause in any problem?)

> hbase:meta location in ZooKeeper set to OPENING by the procedure which 
> eventually failed but precludes Master from assigning it forever
> ---
>
> Key: HBASE-21344
> URL: https://issues.apache.org/jira/browse/HBASE-21344
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Attachments: HBASE-21344-branch-2.0.patch
>
>
> [~elserj] has already summarized it well.
> 1. hbase:meta was on RS8
> 2. RS8 crashed, SCP was queued for it, meta first
> 3. meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: 
> Usually this should not happen, we will release the lock before if the 
> procedure is finished, even if the holdLock is true, arrive here means we 
> have some holes where we do not release the lock. And the releaseLock below 
> may fail since the procedure may have already been deleted from the procedure 
> store.
> 2018-10-08 06:51:24,543 INFO  [PEWorker-9] 
> procedure.MasterProcedureScheduler: pid=48, ppid=47, 
> state=FAILED:REGION_TRANSITION_QUEUE, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 
> checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=47, 
> state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; ServerCrashProcedure 
> server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled 
> state=SERVER_CRASH_GET_REGIONS
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
>   at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
>   at 
> 

[jira] [Commented] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658289#comment-16658289
 ] 

Hudson commented on HBASE-21178:


Results for branch branch-2.1
[build #507 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [BC break] : Get and Scan operation with a custom converter_class not working
> -
>
> Key: HBASE-21178
> URL: https://issues.apache.org/jira/browse/HBASE-21178
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21178.master.001.patch, 
> HBASE-21178.master.002.patch, HBASE-21178.master.003.patch
>
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} 
> scan 'foo',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fails with ERROR
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> Looks like in table.rb file converter_method expects 3 arguments [(bytes, 
> offset, len)] since version 2.0.0, prior to version 2.0.0 it was taking only 
> 1 argument [(bytes)]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread mazhenlin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mazhenlin updated HBASE-21342:
--
Attachment: HBASE-21342.005.patch

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658283#comment-16658283
 ] 

Ted Yu commented on HBASE-21342:


bq. It is in finally block.

Perhaps I was not very clear in the previous comment.

postBulkLoadHFile() throws IOE. When this happens, I don't think Java would run 
the rest of the code (decrementUgiReference in this case) in the finally block.
You can either move decrementUgiReference call to the beginning of finally 
block (since it doesn't throw IOE) or, use nested finally in the current 
finally block.



> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread mazhenlin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658279#comment-16658279
 ] 

mazhenlin commented on HBASE-21342:
---

{quote}Should the above be enclosed in finally block ?
{quote}
It is in finally block.

 

Other comments were fixed. Thank you very much for the review, I have learned a 
lot.

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

2018-10-21 Thread mazhenlin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mazhenlin updated HBASE-21342:
--
Attachment: HBASE-21342.004.patch

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> 
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>Reporter: mazhenlin
>Assignee: mazhenlin
>Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658273#comment-16658273
 ] 

Hudson commented on HBASE-21178:


Results for branch branch-2.0
[build #990 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> [BC break] : Get and Scan operation with a custom converter_class not working
> -
>
> Key: HBASE-21178
> URL: https://issues.apache.org/jira/browse/HBASE-21178
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21178.master.001.patch, 
> HBASE-21178.master.002.patch, HBASE-21178.master.003.patch
>
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} 
> scan 'foo',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fails with ERROR
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> Looks like in table.rb file converter_method expects 3 arguments [(bytes, 
> offset, len)] since version 2.0.0, prior to version 2.0.0 it was taking only 
> 1 argument [(bytes)]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658270#comment-16658270
 ] 

stack commented on HBASE-21354:
---

Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird 
issue I've seen when lots of chaos where I cannot clean up a Procedure because 
another holds a lock but the 'other' no longer exists.

nit: These kind of logs w/o adding context -- name of the file being recovered 
-- can be useless  LOG.debug("Starting WAL Procedure Store lease 
recovery"); 

Great test.

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658269#comment-16658269
 ] 

stack commented on HBASE-20973:
---

What would implication be for not growing bitsetnode? There'd be a max on 
possible Procedure counts?



> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-13468) hbase.zookeeper.quorum supports ipv6 address

2018-10-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658267#comment-16658267
 ] 

Ted Yu commented on HBASE-13468:


We should be able to use commons validator.

[~mdrob] may have more insight on how to avoid the errors seen at the end of 
https://builds.apache.org/job/PreCommit-HBASE-Build/14701/artifact/patchprocess/patch-javac-3.0.0.txt

Thanks

> hbase.zookeeper.quorum supports ipv6 address
> 
>
> Key: HBASE-13468
> URL: https://issues.apache.org/jira/browse/HBASE-13468
> Project: HBase
>  Issue Type: Bug
>Reporter: Mingtao Zhang
>Assignee: maoling
>Priority: Major
> Attachments: HBASE-13468.master.001.patch, 
> HBASE-13468.master.002.patch, HBASE-13468.master.003.patch
>
>
> I put ipv6 address in hbase.zookeeper.quorum, by the time this string went to 
> zookeeper code, the address is messed up, i.e. only '[1234' left. 
> I started using pseudo mode with embedded zk = true.
> I downloaded 1.0.0, not sure which affected version should be here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264
 ] 

Allan Yang commented on HBASE-20973:


Actually it can, you can use these line of code
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When insert proc=67, the BitSetNode of(64-127)will grow to (64-191).
And Java left shift is cyclical... 1L <<65 will equals 1L << 1 …… 


> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658243#comment-16658243
 ] 

Duo Zhang commented on HBASE-20973:
---

It is a bit strange. As mentioned in HBASE-21314, the max size for a BitSetNode 
is set to 64, which means that a BitSetNode can not grow to more than one long, 
i.e, we can never grow a BitSetNode, or merge two BitSetNodes...

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock 

[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658241#comment-16658241
 ] 

Duo Zhang commented on HBASE-21354:
---

Just add a note when building holdingCleanupTracker? That we may fail to 
persist the store tracker for a wal proc file, so only the global store tracker 
can be trust to contain all the information, and can be used to determine 
whether a procedure has been deleted.

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658240#comment-16658240
 ] 

Duo Zhang commented on HBASE-21334:
---

Pushed to master. Let's see how it works.

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21334.patch, 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21224) Handle compaction queue duplication

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658238#comment-16658238
 ] 

Hadoop QA commented on HBASE-21224:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
18s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
17s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}248m 35s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}294m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944884/HBASE-21224-master.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 22f7293c2930 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ae5308ac4a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14782/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14782/testReport/ |
| Max. process+thread count | 4941 (vs. ulimit of 1) |
| modules | C: hbase-server U: 

[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658237#comment-16658237
 ] 

Hadoop QA commented on HBASE-21354:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
49s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
10s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 3s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
54s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
7s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 22s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21354 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944894/HBASE-21354.branch-2.0.003.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a2d8a26264f4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 

[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658235#comment-16658235
 ] 

Hadoop QA commented on HBASE-21334:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
49s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 5s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
10s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m  0s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}247m 22s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}291m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.io.asyncfs.TestSaslFanOutOneBlockAsyncDFSOutput |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21334 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944885/HBASE-21334.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux fcf4a43f965b 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ae5308ac4a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14780/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14780/testReport/ |
| Max. process+thread count | 4851 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| 

[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658234#comment-16658234
 ] 

Hadoop QA commented on HBASE-20973:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
43s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
28s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
48s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-20973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944897/HBASE-20973.branch-2.0.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 21334225081f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 25167fb0f9 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14784/testReport/ |
| Max. process+thread count | 279 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | 

[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658221#comment-16658221
 ] 

Hadoop QA commented on HBASE-21355:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
21s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}201m 
44s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}244m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21355 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944888/HBASE-21355.v1.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2b6833769442 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ae5308ac4a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14781/testReport/ |
| Max. process+thread count | 4869 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14781/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> HStore's storeSize is 

[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20973:
---
Status: Patch Available  (was: Open)

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-20973:
---
Attachment: HBASE-20973.branch-2.0.001.patch

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921 but I just 
> can't reproduce it.
> A easy way to resolve this is add a try catch, making sure no matter what 
> happens, the table's exclusive lock can always be relased.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658220#comment-16658220
 ] 

Allan Yang commented on HBASE-20973:


I haven't find out the root cause of the problem here, but I suspect there is a 
race condition that when one thread to get the bit in the BitSetNode, another 
thread merges the BitSetNode with another one, so that the arrays in BitSetNode 
shrinks, result in an ArrayIndexOutOfBoundsException. I suggest we disable the 
ability of grow and merge of BitSetNode, to avoid this kind of problem until we 
find the root cause, I uploaded a patch to disable them. What do you think, 
[~stack], [~Apache9]?

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.1.0, 2.0.1
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Critical
>
> Find this one while investigating HBASE-20921. After the root 
> procedure(ModifyTableProcedure  in this case) rolled back, a 
> ArrayIndexOutOfBoundsException was thrown
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
> interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
> ang.NullPointerException; ModifyTableProcedure 
> table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition, After this exception thrown, the exclusive 
> lock held by ModifyTableProcedure was never released. All the procedure 
> against this table were blocked. Until the master restarted, and since the 
> lock info for the procedure won't be restored, the other procedures can go 
> again, it is quite embarrassing that a bug save us...(this bug will be fixed 
> in HBASE-20846)

[jira] [Created] (HBASE-21356) bulkLoadHFile API should ensure that rs has the source hfile's write permission

2018-10-21 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-21356:


 Summary: bulkLoadHFile API should ensure that rs has the source 
hfile's write permission
 Key: HBASE-21356
 URL: https://issues.apache.org/jira/browse/HBASE-21356
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu
Assignee: Zheng Hu


If the rs bulk load a HFile but has no write permission of it,  we can read & 
compact the hfile, but after the compaction finished, the HFile willl be moved 
to archive directory,  the HFileCleaner won't has permission to delete, then 
the HFile will always be keep in HDFS. 

Need check the file's write permission when run bulkLoadHFile at server side,  
if no write permission, then reject.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21281) Update bouncycastle dependency.

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658196#comment-16658196
 ] 

Hudson commented on HBASE-21281:


Results for branch master
[build #559 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Update bouncycastle dependency.
> ---
>
> Key: HBASE-21281
> URL: https://issues.apache.org/jira/browse/HBASE-21281
> Project: HBase
>  Issue Type: Task
>  Components: dependencies, test
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21281.addendum.patch, 21281.addendum2.patch, 
> HBASE-21281.001.branch-2.0.patch
>
>
> Looks like we still depend on bcprov-jdk16 for some x509 certificate 
> generation in our tests. Bouncycastle has moved beyond this in 1.47, changing 
> the artifact names.
> [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later]
> There are some API changes too, but it looks like we don't use any of these.
> It seems like we also have vestiges in the POMs from when we were depending 
> on a specific BC version that came in from Hadoop. We now have a 
> KeyStoreTestUtil class in HBase, which makes me think we can also clean up 
> some dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21302) Release 1.2.8

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658197#comment-16658197
 ] 

Hudson commented on HBASE-21302:


Results for branch master
[build #559 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Release 1.2.8
> -
>
> Key: HBASE-21302
> URL: https://issues.apache.org/jira/browse/HBASE-21302
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.8
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.8
>
>
> 1.4.8 is out, time to make 1.2.8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658194#comment-16658194
 ] 

Hudson commented on HBASE-21336:


Results for branch master
[build #559 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Simplify the implementation of WALProcedureMap
> --
>
> Key: HBASE-21336
> URL: https://issues.apache.org/jira/browse/HBASE-21336
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, 
> HBASE-21336-v3.patch, HBASE-21336.patch
>
>
> I do not think we need to implement the logic from such a low level, i.e, 
> building complicated linked list by hand, which makes it really hard to 
> understand.
> Let me try to implement it with existing data structures...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21194) Add tests in TestCopyTable which exercise MOB feature

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658195#comment-16658195
 ] 

Hudson commented on HBASE-21194:


Results for branch master
[build #559 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add tests in TestCopyTable which exercise MOB feature
> -
>
> Key: HBASE-21194
> URL: https://issues.apache.org/jira/browse/HBASE-21194
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Artem Ervits
>Priority: Minor
>  Labels: mob
> Fix For: 3.0.0
>
> Attachments: 21194.v08.patch, HBASE-21194.v01.patch, 
> HBASE-21194.v02.patch, HBASE-21194.v03.patch, HBASE-21194.v06.patch, 
> HBASE-21194.v07.patch, HBASE-21194.v08.patch
>
>
> Currently TestCopyTable doesn't cover table(s) with MOB feature enabled.
> We should add variant that enables MOB on the table being copied and verify 
> that MOB content is copied correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21354:
---
Attachment: HBASE-21354.branch-2.0.003.patch

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658190#comment-16658190
 ] 

Allan Yang commented on HBASE-21354:


{quote}
So this could only happen after restarting, and it also requires that we fail 
to write the trailer for a file
{quote}
Yes, you are right, if the latest log has the global tracker written in the 
trailer, then there is no problem.
About the comments, I'm not quit understand you, where to add?

The failed TestWALProcedureStore.testNoTrailerDoubleRestart() is related, it 
helps me find a bug - We only need to check the deleted bits in the global 
tracker. Uploaded a V3 to fix it.

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21302) Release 1.2.8

2018-10-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658161#comment-16658161
 ] 

Hudson commented on HBASE-21302:


Results for branch branch-1.2
[build #520 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//General_Nightly_Build_Report/]


(/) {color:green}+1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//JDK7_Nightly_Build_Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Release 1.2.8
> -
>
> Key: HBASE-21302
> URL: https://issues.apache.org/jira/browse/HBASE-21302
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.2.8
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Major
> Fix For: 1.2.8
>
>
> 1.4.8 is out, time to make 1.2.8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21224) Handle compaction queue duplication

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658156#comment-16658156
 ] 

Hadoop QA commented on HBASE-21224:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
15s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
18s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 49s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}132m 16s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.TestSyncReplicationStandbyKillMaster |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21224 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944882/HBASE-21224-master.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 01f1e871aabd 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / ae5308ac4a |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14779/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14779/testReport/ |
| Max. process+thread count | 4722 (vs. ulimit of 1) |
| modules | C: hbase-server U: 

[jira] [Resolved] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-21178.
---
Resolution: Fixed

Pushed to branch-2.1 and branch-2.0.

> [BC break] : Get and Scan operation with a custom converter_class not working
> -
>
> Key: HBASE-21178
> URL: https://issues.apache.org/jira/browse/HBASE-21178
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21178.master.001.patch, 
> HBASE-21178.master.002.patch, HBASE-21178.master.003.patch
>
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} 
> scan 'foo',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fails with ERROR
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> Looks like in table.rb file converter_method expects 3 arguments [(bytes, 
> offset, len)] since version 2.0.0, prior to version 2.0.0 it was taking only 
> 1 argument [(bytes)]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21178:
--
Fix Version/s: 2.0.3
   2.1.1

> [BC break] : Get and Scan operation with a custom converter_class not working
> -
>
> Key: HBASE-21178
> URL: https://issues.apache.org/jira/browse/HBASE-21178
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21178.master.001.patch, 
> HBASE-21178.master.002.patch, HBASE-21178.master.003.patch
>
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} 
> scan 'foo',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fails with ERROR
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> Looks like in table.rb file converter_method expects 3 arguments [(bytes, 
> offset, len)] since version 2.0.0, prior to version 2.0.0 it was taking only 
> 1 argument [(bytes)]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-21178:
---

The discussion here mentioned that the fix should be committed to all 2.x 
branches but we haven't committed to branch-2.1 and branch-2.0 yet.

> [BC break] : Get and Scan operation with a custom converter_class not working
> -
>
> Key: HBASE-21178
> URL: https://issues.apache.org/jira/browse/HBASE-21178
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0
>Reporter: Subrat Mishra
>Assignee: Subrat Mishra
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21178.master.001.patch, 
> HBASE-21178.master.002.patch, HBASE-21178.master.003.patch
>
>
> Consider a simple scenario:
> {code:java}
> create 'foo', {NAME => 'f1'}
> put 'foo','r1','f1:a',1000
> get 'foo','r1',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} 
> scan 'foo',{COLUMNS => 
> ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code}
> Both get and scan fails with ERROR
> {code:java}
> ERROR: wrong number of arguments (3 for 1) {code}
> Looks like in table.rb file converter_method expects 3 arguments [(bytes, 
> offset, len)] since version 2.0.0, prior to version 2.0.0 it was taking only 
> 1 argument [(bytes)]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Attachment: HBASE-21355.v1.patch

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Status: Patch Available  (was: Open)

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21355.v1.patch
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21355:
--
Component/s: regionserver

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21355:
--
Priority: Blocker  (was: Critical)

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21334:
--
 Assignee: Duo Zhang
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   3.0.0
   Status: Patch Available  (was: Open)

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21334.patch, 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658131#comment-16658131
 ] 

Hadoop QA commented on HBASE-21354:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
56s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
7s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
51s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
48s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m  7s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 13s{color} 
| {color:red} hbase-procedure in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 
32s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}157m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21354 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12944880/HBASE-21354.branch-2.0.002.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2f024bdade66 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.0 / 5908655244 |

[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21334:
--
Attachment: HBASE-21334.patch

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-21334.patch, 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21224) Handle compaction queue duplication

2018-10-21 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21224:

Attachment: HBASE-21224-master.003.patch

> Handle compaction queue duplication
> ---
>
> Key: HBASE-21224
> URL: https://issues.apache.org/jira/browse/HBASE-21224
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-21224-master.001.patch, 
> HBASE-21224-master.002.patch, HBASE-21224-master.003.patch
>
>
> Mentioned by [~allan163] that we may want to handle compaction queue 
> duplication in this Jira https://issues.apache.org/jira/browse/HBASE-18451 
> Creating this item for further assessment and discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658125#comment-16658125
 ] 

Zheng Hu edited comment on HBASE-21355 at 10/21/18 8:06 AM:


bq. BTW, we seems forget to maintain the read replica's storeSize when refresh 
the store files.
Oh, it's not true,  in the HStore#refreshStoreFilesInternal, we will execute 
completeCompaction in the final, so the storeSize & totalUncompressedBytes will 
also be refreshed..


was (Author: openinx):
bq. BTW, we seems forget to maintain the read replica's storeSize when refresh 
the store files.
Oh, it's no ture,  in the HStore#refreshStoreFilesInternal, we will execute 
completeCompaction in the final, so the storeSize & totalUncompressedBytes will 
also be refreshed..

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658125#comment-16658125
 ] 

Zheng Hu commented on HBASE-21355:
--

bq. BTW, we seems forget to maintain the read replica's storeSize when refresh 
the store files.
Oh, it's no ture,  in the HStore#refreshStoreFilesInternal, we will execute 
completeCompaction in the final, so the storeSize & totalUncompressedBytes will 
also be refreshed..

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21334:
--
Component/s: test

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Description: 
When testing the branch-2's write performance in our internal cluster,  we 
found that the region will be inexplicably split.  

We use the default ConstantSizeRegionSplitPolicy and 
hbase.hregion.max.filesize=40G,but  the region will be split even if its bytes 
size is less than 40G(only ~6G). 

Checked the code, I found that the following path  will  accumulate the store's 
storeSize to a very big value, because the path has no reset..

{code}
RsRpcServices#getRegionInfo
  -> HRegion#isMergeable
   -> HRegion#hasReferences
-> HStore#hasReferences
-> HStore#openStoreFiles
{code}

BTW, we seems forget to maintain the read replica's storeSize when refresh the 
store files.


  was:
When testing the branch-2's write performance in our internal cluster,  we 
found that the region will be inexplicably split.  

We use the default ConstantSizeRegionSplitPolicy and 
hbase.hregion.max.filesize=40G,but  the region will be split even if its bytes 
size is less than 40G(only ~6G). 

Checked the code, I found that the following path  will  accumulate the store's 
storeSize to a very big value, because the path has no reset..

{code}
RsRpcServices#getRegionInfo
  -> HRegion#isMergeable
   -> HRegion#hasReferences
-> HStore#hasReferences
-> HStore#openStoreFiles
{code}

BTW, we seems forget to maintain the read replica's storeSize when 
openStoreFiles.



> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when refresh 
> the store files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21355:
-
Fix Version/s: 2.0.3
   2.1.1
   2.2.0
   3.0.0

> HStore's storeSize is calculated repeatedly which causing the confusing 
> region split 
> -
>
> Key: HBASE-21355
> URL: https://issues.apache.org/jira/browse/HBASE-21355
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
>
> When testing the branch-2's write performance in our internal cluster,  we 
> found that the region will be inexplicably split.  
> We use the default ConstantSizeRegionSplitPolicy and 
> hbase.hregion.max.filesize=40G,but  the region will be split even if its 
> bytes size is less than 40G(only ~6G). 
> Checked the code, I found that the following path  will  accumulate the 
> store's storeSize to a very big value, because the path has no reset..
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>-> HRegion#hasReferences
> -> HStore#hasReferences
> -> HStore#openStoreFiles
> {code}
> BTW, we seems forget to maintain the read replica's storeSize when 
> openStoreFiles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split

2018-10-21 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-21355:


 Summary: HStore's storeSize is calculated repeatedly which causing 
the confusing region split 
 Key: HBASE-21355
 URL: https://issues.apache.org/jira/browse/HBASE-21355
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu
Assignee: Zheng Hu


When testing the branch-2's write performance in our internal cluster,  we 
found that the region will be inexplicably split.  

We use the default ConstantSizeRegionSplitPolicy and 
hbase.hregion.max.filesize=40G,but  the region will be split even if its bytes 
size is less than 40G(only ~6G). 

Checked the code, I found that the following path  will  accumulate the store's 
storeSize to a very big value, because the path has no reset..

{code}
RsRpcServices#getRegionInfo
  -> HRegion#isMergeable
   -> HRegion#hasReferences
-> HStore#hasReferences
-> HStore#openStoreFiles
{code}

BTW, we seems forget to maintain the read replica's storeSize when 
openStoreFiles.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658121#comment-16658121
 ] 

Duo Zhang commented on HBASE-21334:
---

OK I think this is test issue. For MergeTableRegionsProcedure and 
SplitTableRegionProcedure, we will schedule TRSPs to bring the region online, 
and since the MergeTableRegionsProcedure or SplitTableRegionProcedure still 
holds the lock when rolling back, the TRSPs can only be executed after the 
rollback is finished, and since we have set kill after every step so these 
TRSPs may also be effected.

We have a piece of code in MasterProcedureTestingUtility to deal with this but 
obviously it does not always work...

{code}
if (waitForAsyncProcs) {
  // Sometimes there are other procedures still executing (including 
asynchronously spawned by
  // procId) and due to KillAndToggleBeforeStoreUpdate flag 
ProcedureExecutor is stopped before
  // store update. Let all pending procedures finish normally.
  if (!procExec.isRunning()) {
LOG.warn("ProcedureExecutor not running, may have been stopped by 
pending procedure due to"
+ " KillAndToggleBeforeStoreUpdate flag.");
ProcedureTestingUtility.setKillAndToggleBeforeStoreUpdate(procExec, 
false);
restartMasterProcedureExecutor(procExec);
ProcedureTestingUtility.waitNoProcedureRunning(procExec);
  }
}
{code}

Let me think how to make it more stable...

> TestMergeTableRegionsProcedure is flakey
> 
>
> Key: HBASE-21334
> URL: https://issues.apache.org/jira/browse/HBASE-21334
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Priority: Major
> Attachments: 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'

2018-10-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658115#comment-16658115
 ] 

Duo Zhang commented on HBASE-21354:
---

So this could only happen after restarting, and it also requires that we fail 
to write the trailer for a file(not only the newest file, and maybe this could 
also happen with multiple restarts?)? As when rolling, we will reuse the 
storeTracker, so the storeTracker for a newer file will always contains all the 
procedures for an older file. I think we also need to add this to the comment 
to prevent someone changes it back in the future. And we can also mention in 
the comment that this will not cause any procedures can not be deleted, as if a 
procedure has been deleted, we can always get this information through the 
newest storeTracker(the global one).

No other problems. Nice catch, the patch is great. Thanks our mighty 
[~allan163].

> Procedure may be deleted improperly during master restarts resulting in 
> 'Corrupt'
> -
>
> Key: HBASE-21354
> URL: https://issues.apache.org/jira/browse/HBASE-21354
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21354.branch-2.0.001.patch, 
> HBASE-21354.branch-2.0.002.patch
>
>
> Good news! [~stack], [~Apache9], I may find the root cause of mysterious 
> ‘Corrupted procedure’ or some procedures disappeared after master 
> restarts(happens during ITBLL).
> This is because during master restarts, we load procedures from the log, and 
> builds the 'holdingCleanupTracker' according each log's tracker. We may mark 
> a procedure in the oldest log as deleted if one log doesn't contain the 
> procedure. This is Inappropriate since one log will not contain info of the 
> log if this procedure was not updated during the time. We can only delete the 
> procedure only if it is not in the global tracker, which have the whole 
> picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>   trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the 
> corruption happened in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21224) Handle compaction queue duplication

2018-10-21 Thread Xu Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21224:

Attachment: HBASE-21224-master.002.patch

> Handle compaction queue duplication
> ---
>
> Key: HBASE-21224
> URL: https://issues.apache.org/jira/browse/HBASE-21224
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Minor
> Attachments: HBASE-21224-master.001.patch, 
> HBASE-21224-master.002.patch
>
>
> Mentioned by [~allan163] that we may want to handle compaction queue 
> duplication in this Jira https://issues.apache.org/jira/browse/HBASE-18451 
> Creating this item for further assessment and discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >