[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658604#comment-16658604 ] Hadoop QA commented on HBASE-20973: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.0 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 41s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 58s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} branch-2.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 56s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 47s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s{color} | {color:green} hbase-procedure in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}140m 55s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}186m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.coprocessor.TestMetaTableMetrics | | | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 | | JIRA Issue | HBASE-20973 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944930/HBASE-20973.branch-2.0.002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 3c2a3a66f876 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658576#comment-16658576 ] Hudson commented on HBASE-20952: Results for branch HBASE-20952 [build #25 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/25//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). 
We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in part). We also need to consider other methods > which were "bolted" on, such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}}) should also be looked at to use WAL APIs only. > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
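To make the primitives under discussion concrete, here is a purely hypothetical sketch (all interface and class names are invented here, not taken from HBase or the Ratis LogService) of the kind of minimal API surface the issue asks about: append an edit, sync for durability, and tail the log for consumers like replication:

```java
// Hypothetical sketch only: illustrates append/sync/tail primitives for a WAL
// API. Nothing here is the actual HBase WAL interface.
import java.util.ArrayList;
import java.util.List;

interface WriteAheadLog {
    long append(byte[] entry);          // returns a sequence id for the entry
    void sync(long seqId);              // blocks until seqId is durable
    List<byte[]> tail(long fromSeqId);  // read entries at/after fromSeqId (replication use-case)
}

// In-memory toy implementation so the sketch is runnable.
class InMemoryWal implements WriteAheadLog {
    private final List<byte[]> entries = new ArrayList<>();
    private long synced = -1;

    @Override public synchronized long append(byte[] entry) {
        entries.add(entry);
        return entries.size() - 1;
    }

    @Override public synchronized void sync(long seqId) {
        synced = Math.max(synced, seqId);  // a real WAL would fsync/flush replicas here
    }

    @Override public synchronized List<byte[]> tail(long fromSeqId) {
        return new ArrayList<>(entries.subList((int) fromSeqId, entries.size()));
    }
}

public class WalApiSketch {
    public static void main(String[] args) {
        WriteAheadLog wal = new InMemoryWal();
        long seq = wal.append("put row1".getBytes());
        wal.sync(seq);  // the durability point for the write path
        // A replication consumer tails everything from the start.
        System.out.println("tailed entries: " + wal.tail(0).size());
    }
}
```

The tail() primitive is the part the replication use-case mentioned above would rely on; whether it belongs in the core API or a side interface is exactly the kind of question the issue raises.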
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658540#comment-16658540 ] Allan Yang commented on HBASE-21344: For the problem that an SCP or AP may fail due to all kinds of issues (a security issue like this one), I suggest we set hbase.assignment.maximum.attempts to Long.MAX_VALUE. The AP will then retry forever, until we resolve the problem that prevents the region from being assigned. We can't afford to have the AP fail (and thus the SCP roll back), leaving some region unassigned without our knowing about it (since the corresponding procedures are rolled back, the region won't show as RIT in the WebUI). I think branch-1.x also has this kind of issue: the assign failed but shows nothing in the WebUI, and we don't know a region is not online until the customer reaches us. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms 
ago, cancelled=false, msg=Meta region is in state > OPENING, details=row 'backup:system' on table 'hbase:meta' at > region=hbase:meta,,1.1588230740, hostname=, seqNum=-1, > exception=java.io.IOException: Meta region is in state OPENING > at > org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at >
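Allan Yang's mitigation above (keep the AssignProcedure retrying indefinitely rather than letting it fail and roll the SCP back) would, as a sketch, amount to a configuration change like the following; the property name appears in the comment above, but treat the exact value and placement as illustrative:

```xml
<!-- Illustrative hbase-site.xml fragment for the suggestion above: raise the
     assignment retry limit so the AssignProcedure keeps retrying instead of
     exhausting its attempts and triggering a rollback that leaves the region
     unassigned (and invisible as RIT in the WebUI). -->
<property>
  <name>hbase.assignment.maximum.attempts</name>
  <value>9223372036854775807</value> <!-- Long.MAX_VALUE, i.e. effectively retry forever -->
</property>
```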
[jira] [Commented] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658541#comment-16658541 ] Guanghao Zhang commented on HBASE-21325: I found the UT no longer hangs after I rebased onto the master code... Need to dig more. > Force to terminate regionserver when abort hang in somewhere > > > Key: HBASE-21325 > URL: https://issues.apache.org/jira/browse/HBASE-21325 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21325.master.001.patch, > HBASE-21325.master.001.patch > > > When testing sync replication, I found that, if I transit the remote cluster > to DA while the local cluster is still in A, the region server will hang > when shutting down. As the fsOk flag only tests the local cluster (which is > reasonable), we will enter waitOnAllRegionsToClose, and since the WAL is > broken (the remote wal directory is gone) we will never succeed. This > leads to an infinite wait inside waitOnAllRegionsToClose. > So I think we should have an upper bound for the wait time in > the waitOnAllRegionsToClose method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
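The upper-bound idea from the issue description above could be sketched roughly like this (hypothetical names; the real HRegionServer logic is different): wait for regions to close in a loop, but give up once a configured deadline passes, so a broken WAL cannot hang shutdown forever:

```java
// Rough sketch of the proposed fix, NOT the actual HRegionServer code: poll
// for regions to close, but bail out after a configurable timeout so that a
// broken WAL (e.g. a vanished remote wal directory) cannot hang shutdown.
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

public class BoundedRegionCloseWait {
    /**
     * @param onlineRegionCount supplier of how many regions are still open
     * @param timeoutMs give up after this many milliseconds
     * @return true if all regions closed, false if we timed out
     */
    static boolean waitOnAllRegionsToClose(IntSupplier onlineRegionCount, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (onlineRegionCount.getAsInt() > 0) {
            if (System.nanoTime() >= deadline) {
                return false;  // caller would force-terminate the regionserver here
            }
            Thread.sleep(10);  // the real code waits/retries on its own interval
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a region stuck open because its WAL is broken: the count never drops.
        boolean closed = waitOnAllRegionsToClose(() -> 1, 100);
        System.out.println(closed ? "all regions closed" : "timed out; force terminating");
    }
}
```

The interesting design choice is what to do on timeout: the issue title suggests force-terminating the process, on the theory that a hung abort is worse than an unclean exit.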
[jira] [Updated] (HBASE-21316) All RegionServer Down when RS_LOG_REPLAY_OPS
[ https://issues.apache.org/jira/browse/HBASE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] justice updated HBASE-21316: Labels: easyfix (was: ) Attachment: 0001-add-catch-for-ArrayIndexOutOfBoundsException-when-ch.patch Tags: HBASE-21316,WALSplitter Status: Patch Available (was: Open) > All RegionServer Down when RS_LOG_REPLAY_OPS > > > Key: HBASE-21316 > URL: https://issues.apache.org/jira/browse/HBASE-21316 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0 >Reporter: justice >Priority: Major > Labels: easyfix > Attachments: > 0001-add-catch-for-ArrayIndexOutOfBoundsException-when-ch.patch, log.tgz > > > 1. One RegionServer died for an unknown reason; log as follows: > {code:java} > 2018-10-14 20:31:47,423 INFO [main-SendThread(11.3.20.101:2181)] > zookeeper.ClientCnxn: Socket connection established to > 11.3.20.101/11.3.20.101:2181, initiating session 2018-10-14 20:31:47,433 INFO > [main-SendThread(11.3.20.101:2181)] zookeeper.ClientCnxn: Session > establishment complete on server 11.3.20.101/11.3.20.101:2181, sessionid = > 0x6500073f944a8e79, negotiated timeout = 3 2018-10-14日 Sunday 21:03:05 > CST Starting regionserver on 11-3-19-199.JD.LOCAL core file size (blocks, -c) > 0 data seg size (kbytes, -d) unlimited > {code} > 2. 
Master receive zk deletenode event, and start ServerCrashProcedure Task > {code:java} > 2018-10-14 20:31:47,437 INFO [main-EventThread] master.RegionServerTracker: > RegionServer ephemeral node deleted, processing expiration > [11-3-19-199.jd.local,16020,1539492869470] > 2018-10-14 20:31:47,539 INFO [PEWorker-1] procedure.ServerCrashProcedure: > Start pid=25053, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=11-3-19-199.jd.local,16020,1539492869470, splitWal=true, meta=false > 2018-10-14 20:31:47,550 INFO [PEWorker-1] master.SplitLogManager: Started > splitting 63 logs in > [hdfs://11-3-18-67.JD.LOCAL:9000/hbase/WALs/11-3-19-199.jd.local,16020,1539492869470-splitting] > for [11-3-19-199.jd.local,16020,1539492869470] ... 2018-10-14 20:31:48,592 > INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task > /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598 > acquired by 11-3-18-71.jd.local,16020,1539492869409 > {code} > 3. One alive RegionServer Node get SplitLogWorker, has an error and stop > {code:java} > 2018-10-14 20:31:48,602 INFO [SplitLogWorker-11-3-18-71:16020] > coordination.ZkSplitLogWorkerCoordination: worker > 11-3-18-71.jd.local,16020,1539492869409 acquired task > /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598 > > ... 
> 2018-10-14 21:03:26,219 ERROR > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] executor.EventHandler: > Caught throwable while processing event RS_LOG_REPLAY > java.lang.ArrayIndexOutOfBoundsException: 8811 > at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365) > at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358) > at > org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735) > at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816) > at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:102) > at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:107) > at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:296) > at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:194) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:99) > at > org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-10-14 21:03:26,227 ERROR > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] > regionserver.HRegionServer: * ABORTING region server > 11-3-18-71.jd.local,16020,1539522186368: Caught throwable while processing > event RS_LOG_REPLAY * > > 2018-10-14 20:31:48,780 INFO > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-0] > regionserver.HRegionServer: * STOPPING region server > '11-3-18-71.jd.local,16020,1539492869409' * > {code} > 4. other alive regionserver node die one by one, at last, all regionserver > node die -- This message was sent by Atlassian JIRA (v7.6.3#76005)
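Judging only from the attached patch's filename (0001-add-catch-for-ArrayIndexOutOfBoundsException-when-ch.patch), the proposed approach is to catch the ArrayIndexOutOfBoundsException during the split rather than letting it propagate and abort the regionserver. A heavily simplified, hypothetical sketch of that containment pattern (none of these names are the real WALSplitter internals):

```java
// Hypothetical sketch of the containment idea, NOT the actual WALSplitter
// code: treat an ArrayIndexOutOfBoundsException from a corrupt entry as a
// problem scoped to that one log file, instead of a throwable that escapes
// to EventHandler.run() and aborts the whole regionserver (and, as step 4
// above shows, eventually every regionserver that picks up the task).
import java.util.Iterator;

public class SplitCatchSketch {
    /** @return number of entries replayed; a corrupt entry stops this file only. */
    static int splitLogFile(Iterator<byte[]> entries) {
        int replayed = 0;
        while (entries.hasNext()) {
            try {
                byte[] cell = entries.next();
                // Stand-in for the cell parsing (KeyValue.getFamilyLength etc.)
                // where the AIOOBE surfaced in the stack trace above.
                if (cell.length == 0) {
                    throw new ArrayIndexOutOfBoundsException(0);
                }
                replayed++;
            } catch (ArrayIndexOutOfBoundsException e) {
                // Corrupt cell: log it and mark this single log file corrupt;
                // do NOT let the throwable escape and abort the regionserver.
                System.err.println("corrupt WAL entry, stopping this file: " + e);
                break;
            }
        }
        return replayed;
    }
}
```

Whether to skip the corrupt entry, stop the file, or fail the split task (as opposed to crashing the server) is a policy question the real patch would have to answer; the sketch only shows the boundary where the exception should be contained.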
[jira] [Commented] (HBASE-21316) All RegionServer Down when RS_LOG_REPLAY_OPS
[ https://issues.apache.org/jira/browse/HBASE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658533#comment-16658533 ] justice commented on HBASE-21316: I didn't save the WAL files from this environment. Also, there is no catch for java.lang.ArrayIndexOutOfBoundsException in the splitLogFile() function of WALSplitter. > All RegionServer Down when RS_LOG_REPLAY_OPS > > > Key: HBASE-21316 > URL: https://issues.apache.org/jira/browse/HBASE-21316 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.0.0 >Reporter: justice >Priority: Major > Attachments: log.tgz > > > 1. One RegionServer died for an unknown reason; log as follows: > {code:java} > 2018-10-14 20:31:47,423 INFO [main-SendThread(11.3.20.101:2181)] > zookeeper.ClientCnxn: Socket connection established to > 11.3.20.101/11.3.20.101:2181, initiating session 2018-10-14 20:31:47,433 INFO > [main-SendThread(11.3.20.101:2181)] zookeeper.ClientCnxn: Session > establishment complete on server 11.3.20.101/11.3.20.101:2181, sessionid = > 0x6500073f944a8e79, negotiated timeout = 3 2018-10-14日 Sunday 21:03:05 > CST Starting regionserver on 11-3-19-199.JD.LOCAL core file size (blocks, -c) > 0 data seg size (kbytes, -d) unlimited > {code} > 2. Master received the zk deletenode event and started a ServerCrashProcedure task > {code:java} > 2018-10-14 20:31:47,437 INFO [main-EventThread] master.RegionServerTracker: > RegionServer ephemeral node deleted, processing expiration > [11-3-19-199.jd.local,16020,1539492869470] > 2018-10-14 20:31:47,539 INFO [PEWorker-1] procedure.ServerCrashProcedure: > Start pid=25053, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=11-3-19-199.jd.local,16020,1539492869470, splitWal=true, meta=false > 2018-10-14 20:31:47,550 INFO [PEWorker-1] master.SplitLogManager: Started > splitting 63 logs in > [hdfs://11-3-18-67.JD.LOCAL:9000/hbase/WALs/11-3-19-199.jd.local,16020,1539492869470-splitting] > for [11-3-19-199.jd.local,16020,1539492869470] ... 
2018-10-14 20:31:48,592 > INFO [main-EventThread] coordination.SplitLogManagerCoordination: Task > /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598 > acquired by 11-3-18-71.jd.local,16020,1539492869409 > {code} > 3. One alive RegionServer Node get SplitLogWorker, has an error and stop > {code:java} > 2018-10-14 20:31:48,602 INFO [SplitLogWorker-11-3-18-71:16020] > coordination.ZkSplitLogWorkerCoordination: worker > 11-3-18-71.jd.local,16020,1539492869409 acquired task > /hbase/splitWAL/WALs%2F11-3-19-199.jd.local%2C16020%2C1539492869470-splitting%2F11-3-19-199.jd.local%252C16020%252C1539492869470.1539520250598 > > ... > 2018-10-14 21:03:26,219 ERROR > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] executor.EventHandler: > Caught throwable while processing event RS_LOG_REPLAY > java.lang.ArrayIndexOutOfBoundsException: 8811 > at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1365) > at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1358) > at > org.apache.hadoop.hbase.PrivateCellUtil.matchingFamily(PrivateCellUtil.java:735) > at org.apache.hadoop.hbase.CellUtil.matchingFamily(CellUtil.java:816) > at org.apache.hadoop.hbase.wal.WALEdit.isMetaEditFamily(WALEdit.java:102) > at org.apache.hadoop.hbase.wal.WALEdit.isMetaEdit(WALEdit.java:107) > at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:296) > at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:194) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:99) > at > org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 
at java.lang.Thread.run(Thread.java:748) > 2018-10-14 21:03:26,227 ERROR > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-1] > regionserver.HRegionServer: * ABORTING region server > 11-3-18-71.jd.local,16020,1539522186368: Caught throwable while processing > event RS_LOG_REPLAY * > > 2018-10-14 20:31:48,780 INFO > [RS_LOG_REPLAY_OPS-regionserver/11-3-18-71:16020-0] > regionserver.HRegionServer: * STOPPING region server > '11-3-18-71.jd.local,16020,1539492869409' * > {code} > 4. other alive regionserver node die one by one, at last, all regionserver > node die -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658527#comment-16658527 ] Duo Zhang commented on HBASE-21355: --- Talked with [~openinx] offline; we think the problem is the code introduced in HBASE-20940. We should not try to open the store files every time getRegionInfo is called, and then close them; it is too expensive. Instead, I think we could also check the compacted files when testing whether a region is mergeable or splittable. And I think we should provide a UT for this, as there is no test for HBASE-20940. > HStore's storeSize is calculated repeatedly which causing the confusing > region split > - > > Key: HBASE-21355 > URL: https://issues.apache.org/jira/browse/HBASE-21355 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9 > > Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch > > > When testing branch-2's write performance in our internal cluster, we > found that regions would be inexplicably split. > We use the default ConstantSizeRegionSplitPolicy and > hbase.hregion.max.filesize=40G, but a region will be split even if its > byte size is less than 40G (only ~6G). > Checking the code, I found that the following path accumulates the > store's storeSize to a very big value, because the path has no reset:
> {code}
> RsRpcServices#getRegionInfo
>   -> HRegion#isMergeable
>     -> HRegion#hasReferences
>       -> HStore#hasReferences
>         -> HStore#openStoreFiles
> {code}
> BTW, we seem to forget to maintain the read replica's storeSize when refreshing > the store files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
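The accumulation bug described above can be reduced to a toy example (hypothetical classes only, not the real HStore code): an "open" routine that adds file sizes into a cumulative field without resetting it, so repeated getRegionInfo-style calls inflate the apparent store size until the split policy fires:

```java
// Toy illustration of the storeSize double-counting bug described above.
// ToyStore is invented for this sketch; the real HStore logic is more complex.
import java.util.List;

class ToyStore {
    long storeSize = 0;  // cumulative field

    // Buggy version: mirrors the reported path where each open accumulates
    // into storeSize with no reset, so repeated calls keep growing it.
    void openStoreFilesBuggy(List<Long> fileSizes) {
        for (long size : fileSizes) {
            storeSize += size;  // no reset: grows on every invocation
        }
    }

    // Fixed version: recompute from scratch on each open.
    void openStoreFilesFixed(List<Long> fileSizes) {
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        storeSize = total;  // replace, don't accumulate
    }
}

public class StoreSizeDemo {
    public static void main(String[] args) {
        List<Long> files = List.of(2L << 30, 4L << 30);  // ~2G + ~4G of store files

        ToyStore buggy = new ToyStore();
        for (int i = 0; i < 10; i++) {
            buggy.openStoreFilesBuggy(files);  // e.g. repeated getRegionInfo calls
        }
        // 10 calls x 6G = 60G, well past a 40G max-filesize threshold,
        // even though the store really holds only ~6G: a spurious split.
        System.out.println("buggy storeSize GB: " + (buggy.storeSize >> 30));

        ToyStore fixed = new ToyStore();
        for (int i = 0; i < 10; i++) {
            fixed.openStoreFilesFixed(files);
        }
        System.out.println("fixed storeSize GB: " + (fixed.storeSize >> 30));
    }
}
```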
[jira] [Updated] (HBASE-21325) Force to terminate regionserver when abort hang in somewhere
[ https://issues.apache.org/jira/browse/HBASE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21325: --- Attachment: HBASE-21325.master.001.patch > Force to terminate regionserver when abort hang in somewhere > > > Key: HBASE-21325 > URL: https://issues.apache.org/jira/browse/HBASE-21325 > Project: HBase > Issue Type: Improvement >Reporter: Duo Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21325.master.001.patch, > HBASE-21325.master.001.patch > > > When testing sync replication, I found that, if I transit the remote cluster > to DA while the local cluster is still in A, the region server will hang > when shutting down. As the fsOk flag only tests the local cluster (which is > reasonable), we will enter waitOnAllRegionsToClose, and since the WAL is > broken (the remote wal directory is gone) we will never succeed. This > leads to an infinite wait inside waitOnAllRegionsToClose. > So I think we should have an upper bound for the wait time in > the waitOnAllRegionsToClose method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658515#comment-16658515 ] Allan Yang commented on HBASE-21344:
{code}
-    if (assignmentManager.getRegionStates().getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO)
-        .isOffline()) {
+    RegionState metaRegionState =
+        assignmentManager.getRegionStates().getRegionState(RegionInfoBuilder.FIRST_META_REGIONINFO);
+    if (!metaRegionState.isOpened()) {
       Optional> optProc = procedureExecutor.getProcedures().stream()
           .filter(p -> p instanceof InitMetaProcedure).findAny();
-      if (optProc.isPresent()) {
+      // check if we are not loading a successful procedure by the last master as Meta is still not
+      // in OPEN state
+      // this also helps in unnecessary waiting on the latch(and get stuck) as the countdown was
+      // reset and will never be down to zero as the procedure is not running
+      if (optProc.isPresent() && !((InitMetaProcedure) optProc.get()).isSuccess()) {
         initMetaProc = (InitMetaProcedure) optProc.get();
       } else {
         // schedule an init meta procedure if meta has not been deployed yet
{code}
This is not right: the meta table may be on a crashed server and in RIT state; if we assign it directly, we may lose some data, since the WAL may not have been replayed. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. 
We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. > 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) > {code} > {code:java} > { DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, > retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state > OPENING,
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658508#comment-16658508 ] Allan Yang commented on HBASE-21354: I added some comments in the holdingCleanupTracker() method; is that not what you meant? > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, > HBASE-21354.branch-2.0.004.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > 'Corrupted procedure' errors and of procedures disappearing after master > restarts (happens during ITBLL). > This is because during master restarts we load procedures from the log and > build the 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about the > procedure if the procedure was not updated while that log was active. We should delete the > procedure only if it is not in the global tracker, which has the whole > picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>     trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case (testProcedureShouldNotCleanOnLoad) in the patch shows cleanly how the > corruption happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
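The invariant Allan describes (only the global tracker, which has the whole picture, may authorize deletion) can be illustrated with plain sets; everything below is invented for the sketch, and the real ProcedureStoreTracker/BitSetNode bitmap logic is far more involved:

```java
// Toy illustration of the bug described above, using sets instead of the real
// ProcedureStoreTracker bitmaps. A procedure that simply was not updated while
// one log was active is absent from that log's tracker -- the buggy rule reads
// that absence as "deleted", while the fixed rule consults the global tracker.
import java.util.Set;

public class TrackerDeleteSketch {
    /** Buggy: absence from a single log's tracker is treated as deletion. */
    static boolean buggyShouldDelete(long procId, Set<Long> oneLogTracker) {
        return !oneLogTracker.contains(procId);
    }

    /** Fixed: only the global tracker, with the whole picture, decides. */
    static boolean fixedShouldDelete(long procId, Set<Long> globalTracker) {
        return !globalTracker.contains(procId);
    }

    public static void main(String[] args) {
        long procId = 42L;
        Set<Long> oneLogTracker = Set.of(7L, 8L);      // proc 42 untouched in this log
        Set<Long> globalTracker = Set.of(7L, 8L, 42L); // but it is still live globally

        // The buggy rule deletes a live procedure; the fixed rule keeps it.
        System.out.println("buggy deletes live proc: " + buggyShouldDelete(procId, oneLogTracker));
        System.out.println("fixed deletes live proc: " + fixedShouldDelete(procId, globalTracker));
    }
}
```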
[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658502#comment-16658502 ] Hadoop QA commented on HBASE-21355: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HBASE-21355 does not apply to branch-1. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HBASE-21355 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944932/HBASE-21355.branch-1.patch | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14791/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > HStore's storeSize is calculated repeatedly which causing the confusing > region split > - > > Key: HBASE-21355 > URL: https://issues.apache.org/jira/browse/HBASE-21355 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9 > > Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch > > > When testing the branch-2's write performance in our internal cluster, we > found that the region will be inexplicably split. > We use the default ConstantSizeRegionSplitPolicy and > hbase.hregion.max.filesize=40G,but the region will be split even if its > bytes size is less than 40G(only ~6G). > Checked the code, I found that the following path will accumulate the > store's storeSize to a very big value, because the path has no reset.. 
> {code} > RsRpcServices#getRegionInfo > -> HRegion#isMergeable > -> HRegion#hasReferences > -> HStore#hasReferences > -> HStore#openStoreFiles > {code} > BTW, we seem to forget to maintain the read replica's storeSize when refreshing > the store files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
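The accumulation bug described above can be sketched with a minimal, self-contained example: a size field that is added to on every (re)open instead of being recomputed, so repeated calls via the getRegionInfo path inflate it past the real store size. Class and method names here are hypothetical stand-ins, not the real HStore code:

```java
import java.util.Arrays;

// Hypothetical sketch of the bug pattern: storeSize is accumulated on
// every (re)open instead of being recomputed from the file set.
public class StoreSizeSketch {
    private long storeSize = 0;

    // Buggy: += without a reset, so each call grows the running total.
    public long openStoreFilesBuggy(long[] fileSizes) {
        for (long s : fileSizes) storeSize += s;
        return storeSize;
    }

    // Fixed: recompute the size from the current set of store files.
    public long openStoreFilesFixed(long[] fileSizes) {
        storeSize = Arrays.stream(fileSizes).sum();
        return storeSize;
    }

    public static void main(String[] args) {
        long[] files = {6L << 30}; // one ~6G file, as in the report
        StoreSizeSketch buggy = new StoreSizeSketch();
        long size = 0;
        for (int i = 0; i < 7; i++) size = buggy.openStoreFilesBuggy(files);
        // After 7 reopens the accumulated size (~42G) crosses a 40G split
        // threshold even though the real store is only ~6G.
        System.out.println(size > (40L << 30)); // true

        StoreSizeSketch fixed = new StoreSizeSketch();
        for (int i = 0; i < 7; i++) size = fixed.openStoreFilesFixed(files);
        System.out.println(size == (6L << 30)); // true
    }
}
```

With ConstantSizeRegionSplitPolicy comparing this inflated value against hbase.hregion.max.filesize, the spurious splits follow directly.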
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Fix Version/s: 1.2.9 1.4.9 1.3.3 1.5.0 > HStore's storeSize is calculated repeatedly which causing the confusing > region split > - > > Key: HBASE-21355 > URL: https://issues.apache.org/jira/browse/HBASE-21355 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.3, 1.4.9, 1.2.9 > > Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch > > > When testing the branch-2's write performance in our internal cluster, we > found that the region will be inexplicably split. > We use the default ConstantSizeRegionSplitPolicy and > hbase.hregion.max.filesize=40G,but the region will be split even if its > bytes size is less than 40G(only ~6G). > Checked the code, I found that the following path will accumulate the > store's storeSize to a very big value, because the path has no reset.. > {code} > RsRpcServices#getRegionInfo > -> HRegion#isMergeable >-> HRegion#hasReferences > -> HStore#hasReferences > -> HStore#openStoreFiles > {code} > BTW, we seems forget to maintain the read replica's storeSize when refresh > the store files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658498#comment-16658498 ] Duo Zhang commented on HBASE-21354: --- {quote} Done in V4 patch. {quote} Where? I skimmed and haven't found... > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, > HBASE-21354.branch-2.0.004.patch > > > Good news! [~stack], [~Apache9], I may find the root cause of mysterious > ‘Corrupted procedure’ or some procedures disappeared after master > restarts(happens during ITBLL). > This is because during master restarts, we load procedures from the log, and > builds the 'holdingCleanupTracker' according each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is Inappropriate since one log will not contain info of the > log if this procedure was not updated during the time. We can only delete the > procedure only if it is not in the global tracker, which have the whole > picture. > {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the > corruption happened in the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Attachment: HBASE-21355.branch-1.patch > HStore's storeSize is calculated repeatedly which causing the confusing > region split > - > > Key: HBASE-21355 > URL: https://issues.apache.org/jira/browse/HBASE-21355 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21355.branch-1.patch, HBASE-21355.v1.patch > > > When testing the branch-2's write performance in our internal cluster, we > found that the region will be inexplicably split. > We use the default ConstantSizeRegionSplitPolicy and > hbase.hregion.max.filesize=40G,but the region will be split even if its > bytes size is less than 40G(only ~6G). > Checked the code, I found that the following path will accumulate the > store's storeSize to a very big value, because the path has no reset.. > {code} > RsRpcServices#getRegionInfo > -> HRegion#isMergeable >-> HRegion#hasReferences > -> HStore#hasReferences > -> HStore#openStoreFiles > {code} > BTW, we seems forget to maintain the read replica's storeSize when refresh > the store files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658495#comment-16658495 ] Allan Yang commented on HBASE-21354: {quote} Just add a note when building holdingCleanupTracker? {quote} Done in V4 patch. {quote} Make these info-level? {quote} What's your concern, sir [~stack]? I think DEBUG is enough. {quote} I've seen when lots of chaos where I cannot clean up a Procedure because another holds a lock but the 'other' no longer exists. {quote} I don't think this patch can solve that issue... If the other procedure is holding a lock, that means the procedure was loaded normally from replay, and it is somewhere for sure... This patch only solves cases where procedures may be deleted improperly, thus making their parent/child procedures 'corrupt'. > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, > HBASE-21354.branch-2.0.004.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > ‘Corrupted procedure’ errors and of procedures disappearing after master > restarts (happens during ITBLL). > This is because during master restarts we load procedures from the logs and > build the 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if some other log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about the > procedure if the procedure was not updated while that log was active. We may delete the > procedure only if it is absent from the global tracker, which has the whole > picture. 
> {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case (testProcedureShouldNotCleanOnLoad) in the patch shows cleanly how the > corruption happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
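The point about the global tracker can be illustrated with a small sketch that uses java.util.BitSet as a stand-in for the tracker bitmaps. The class and method names are hypothetical, not the real ProcedureStoreTracker API; the sketch only shows why a per-log decision is unsafe while a global-tracker decision is not:

```java
import java.util.BitSet;

// Hypothetical sketch: deletion must be decided against the GLOBAL
// tracker (the union of all logs), because a single log simply may not
// mention a procedure that was not updated while that log was active.
public class TrackerSketch {
    // Safe only when the whole-picture tracker says the proc is gone.
    public static boolean safeToDelete(BitSet globalTracker, long procId) {
        return !globalTracker.get((int) procId);
    }

    public static void main(String[] args) {
        BitSet log1 = new BitSet();   // oldest log: proc 5 was written here
        log1.set(5);
        BitSet log2 = new BitSet();   // newer log: proc 5 never updated, so absent
        BitSet global = new BitSet(); // union of all logs = the whole picture
        global.or(log1);
        global.or(log2);

        // Per-log decision: log2 does not contain proc 5, so a per-log
        // check would wrongly mark a still-live procedure as deleted.
        System.out.println(!log2.get(5));          // true  -> the bug
        // Global decision: proc 5 is still live, so do not delete it.
        System.out.println(safeToDelete(global, 5)); // false -> correct
    }
}
```

This mirrors the description above: a procedure absent from one log is not necessarily dead; only absence from the global tracker proves it.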
[jira] [Updated] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21354: --- Attachment: HBASE-21354.branch-2.0.004.patch > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch, > HBASE-21354.branch-2.0.004.patch > > > Good news! [~stack], [~Apache9], I may find the root cause of mysterious > ‘Corrupted procedure’ or some procedures disappeared after master > restarts(happens during ITBLL). > This is because during master restarts, we load procedures from the log, and > builds the 'holdingCleanupTracker' according each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is Inappropriate since one log will not contain info of the > log if this procedure was not updated during the time. We can only delete the > procedure only if it is not in the global tracker, which have the whole > picture. > {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the > corruption happened in the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-20973: --- Attachment: HBASE-20973.branch-2.0.002.patch > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch, > HBASE-20973.branch-2.0.002.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition, After this exception thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedure > against this table were blocked. 
Until the master restarted; since the > lock info for the procedure won't be restored, the other procedures could go > again. It is quite embarrassing that a bug saved us... (this bug will be fixed > in HBASE-20846) > I tried to reproduce this one using the test case in HBASE-20921, but I just > can't reproduce it. > An easy way to resolve this is to add a try/catch, making sure that no matter what > happens, the table's exclusive lock can always be released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
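The proposed try/catch fix can be sketched as follows, using a plain ReentrantLock as a stand-in for the table's exclusive lock. The names are hypothetical, not the actual ProcedureExecutor code; the point is only that a finally block keeps an unexpected runtime exception from leaving the table locked forever:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: whatever the rollback does, release the table's
// exclusive lock in a finally block so a CODE-BUG runtime exception
// cannot wedge every other procedure against the table.
public class RollbackLockSketch {
    private final ReentrantLock tableExclusiveLock = new ReentrantLock();

    public boolean rollbackSafely(Runnable rollback) {
        tableExclusiveLock.lock();
        try {
            rollback.run(); // may throw, e.g. ArrayIndexOutOfBoundsException
            return true;
        } catch (RuntimeException e) {
            return false;   // logged/handled elsewhere; the finally is the fix
        } finally {
            tableExclusiveLock.unlock(); // always released
        }
    }

    public boolean isLocked() {
        return tableExclusiveLock.isLocked();
    }

    public static void main(String[] args) {
        RollbackLockSketch s = new RollbackLockSketch();
        s.rollbackSafely(() -> { throw new ArrayIndexOutOfBoundsException(1); });
        System.out.println(s.isLocked()); // false: lock freed despite the throw
    }
}
```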
[jira] [Resolved] (HBASE-21302) Release 1.2.8
[ https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey resolved HBASE-21302. - Resolution: Fixed release announcement sent to announce@apache and {dev,user}@hbase > Release 1.2.8 > - > > Key: HBASE-21302 > URL: https://issues.apache.org/jira/browse/HBASE-21302 > Project: HBase > Issue Type: Task > Components: community >Affects Versions: 1.2.8 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Major > Fix For: 1.2.8 > > > 1.4.8 is out, time to make 1.2.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658487#comment-16658487 ] Duo Zhang commented on HBASE-21336: --- Maybe. Now I have moved the logic to a separate step, so it should not happen any more. > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic at such a low level, i.e., > building complicated linked lists by hand, which makes it really hard to > understand. > Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658486#comment-16658486 ] Duo Zhang commented on HBASE-20973: --- I agree to disable grow or merge for now, but my point is that, if the max node size limit works correctly, then the grow or merge should not happen, as now the max node size is 64... > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > 
This is a very serious condition, After this exception thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedure > against this table were blocked. Until the master restarted, and since the > lock info for the procedure won't be restored, the other procedures can go > again, it is quite embarrassing that a bug save us...(this bug will be fixed > in HBASE-20846) > I tried to reproduce this one using the test case in HBASE-20921 but I just > can't reproduce it. > A easy way to resolve this is add a try catch, making sure no matter what > happens, the table's exclusive lock can always be relased. -- This message
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658484#comment-16658484 ] Allan Yang commented on HBASE-20973: {quote} This does not make sense... we should use Math.abs(67 - 191) to test whether it exceeds the max node size? {quote} If Math.abs() is not used, then there is no case in which a BitSetNode can grow. As for merging, I can't think of a case in which two BitSetNodes can be merged, unless the two BitSetNodes overlap (which is impossible). So, since a node can't grow or merge in normal cases, what about this patch: can we disable them for now, [~Apache9]? > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. 
After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition, After this exception thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedure > against this table were blocked. Until the master restarted, and since the > lock info for the procedure won't be restored, the other procedures can go > again, it is quite embarrassing that a bug save us...(this bug will be fixed > in HBASE-20846)
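The grow-down behaviour discussed in the comments above can be sketched numerically. This assumes 64-slot word alignment for node boundaries (so the node for 129 covers 128..191; the comment above says 127 to 191, and the exact bounds depend on the real implementation); all constants and names are hypothetical stand-ins, not the real BitSetNode code:

```java
// Hypothetical sketch of the grow check: node bounds word-aligned to 64
// slots, and a Math.abs distance test that lets a node grow BELOW its
// start even though growing above is capped by the max node size.
public class BitSetNodeSketch {
    static final int MAX_NODE_SIZE = 64;

    // Word-align a proc id down / up, as a bitmap node would.
    static long alignDown(long procId) { return procId & ~63L; }
    static long alignUp(long procId)   { return procId | 63L; }

    // The questionable check: absolute distance from the node start.
    static boolean canGrow(long nodeStart, long procId) {
        return Math.abs(procId - nodeStart) < MAX_NODE_SIZE;
    }

    public static void main(String[] args) {
        long start = alignDown(129), end = alignUp(129); // node covers 128..191
        System.out.println(start + ".." + end);

        // Inserting 67: |67 - 128| = 61 < 64, so the node may grow DOWN...
        System.out.println(canGrow(start, 67)); // true
        long grownStart = alignDown(67);        // ...to cover 64..191...
        // ...and it now spans 128 slots, double the intended max node size.
        System.out.println((end - grownStart + 1) > MAX_NODE_SIZE); // true
    }
}
```

Under a signed (non-abs) check, 67 - 128 is negative and the grow would never fire, matching the observation that grow and merge cannot happen in normal cases.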
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658482#comment-16658482 ] Allan Yang commented on HBASE-21336: {quote} if it has no sub procedures. But now I sort the procedures so the parent procedure will arrive at first then the test will always pass and then we will set it to RUNNABLE... {quote} Do you remember the logs I sent you on WeChat? I encountered a case in which a parent procedure finished first even though there were still child procedures. Later the child procedures began to execute, only to find their parent gone. It may have something to do with this logic; there may be a bug in it... {code} 2018-10-10 14:38:08,662 DEBUG [PEWorker-2] procedure2.ProcedureExecutor(1451): LOCK_EVENT_WAIT pid=15, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f 2018-10-10 14:38:08,964 INFO [PEWorker-1] procedure2.ProcedureExecutor(1461): Finished pid=14, state=SUCCESS, hasLock=false; ServerCrashProcedure server=hb-uf6oyi699w8h700f0-003,16020,1539076734964, splitWal=true, meta=false in 3mins, 18.934sec 2018-10-10 14:38:13,699 WARN [PEWorker-3] procedure2.ProcedureExecutor(1385): Rollback because parent is done/rolledback proc=pid=15, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f {code} > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic at such a low level, i.e., > building complicated linked lists by hand, which makes it really hard to > understand. 
> Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658478#comment-16658478 ] Duo Zhang commented on HBASE-20973: --- This does not make sense... we should use Math.abs(67 - 191) to test whether it exceeds the max node size?

> ArrayIndexOutOfBoundsException when rolling back procedure
> --
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Critical
> Attachments: HBASE-20973.branch-2.0.001.patch
>
> Found this one while investigating HBASE-20921. After the root procedure (ModifyTableProcedure in this case) rolled back, an ArrayIndexOutOfBoundsException was thrown:
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList: java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
>   at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
>   at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
>   at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>   at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>   at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>   at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>   at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>   at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>   at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition. After this exception is thrown, the exclusive lock held by ModifyTableProcedure is never released, and all the procedures against this table are blocked until the master restarts. Since the lock info for the procedure won't be restored after the restart, the other procedures can go again; it is quite embarrassing that a bug saves us... (this bug will be fixed in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921, but I just can't reproduce it.
> An easy way to resolve this is to add a try/catch, making sure that no matter what happens, the table's exclusive lock can always be released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
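The try/catch fix suggested at the end of the description can be sketched as follows. This is an illustrative reconstruction only; the interface and method names here are hypothetical, not the actual HBase APIs.

```java
// Hedged sketch of the suggested fix: whatever the rollback step throws,
// the table's exclusive lock must still be released. Names are illustrative.
public class RollbackLockGuard {
    public interface TableLock { void release(); }
    public interface RollbackStep { void run() throws Exception; }

    public static void rollbackWithLockRelease(RollbackStep step, TableLock lock) throws Exception {
        try {
            step.run();
        } finally {
            // runs even when step.run() throws, so the lock cannot leak
            lock.release();
        }
    }
}
```

Even if the store layer throws an unexpected RuntimeException (as in the ArrayIndexOutOfBoundsException above), the finally block guarantees the lock is handed back.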
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658476#comment-16658476 ] Allan Yang commented on HBASE-20973:
{quote} So the max node size is useless? Or there are holes where we miss the max size check? {quote}
No, it is not useless. But a Math.abs() was used to check whether the node can grow, which means it can't grow up past the limit, but it can still grow down. For the code I pasted above: when inserting 129, it will create a BitSetNode covering (127-191). When inserting 67, it will find Math.abs(67 - 127) < max node size, so this BitSetNode will grow down to 64, becoming (64-191).
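The grow-direction issue described above can be sketched in isolation. This is an illustrative reconstruction, not the actual BitSetNode code; the constant value and method name are hypothetical.

```java
public class GrowCheckSketch {
    // Hypothetical stand-in for the tracker's max node size (4 words of 64 bits).
    static final int MAX_NODE_SIZE = 4 * 64;

    // The check described in the comment: Math.abs() hides the direction of
    // growth, so a node starting at 127 accepts proc id 67 (growing down)
    // just as readily as an id slightly above its range.
    public static boolean canGrowWithAbs(long start, long procId) {
        return Math.abs(procId - start) < MAX_NODE_SIZE;
    }
}
```

With start=127, canGrowWithAbs(127, 67) is true because |67 - 127| = 60 < 256, which is how the node covering (127-191) ends up stretched down to 64.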
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658473#comment-16658473 ] Duo Zhang commented on HBASE-20973: --- And the left shift in Java is not cyclical; it just uses the lowest several bits of the shift distance (the low 6 bits for a long, so 1L << 65 == 1L << 1). I think we need to add comments about this in the BitSetNode implementation.
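Duo's point about the shift distance can be checked directly; by the Java language semantics, shifting a long uses only the low six bits of the shift count:

```java
public class ShiftWrapDemo {
    public static void main(String[] args) {
        // For a long, only the low 6 bits of the shift distance are used,
        // so 1L << 65 behaves like 1L << 1, not like shifting the bit "off the end".
        System.out.println((1L << 65) == (1L << 1)); // prints "true"
        System.out.println(1L << 64);                // prints "1": same as 1L << 0
    }
}
```

This is why shifting "more than 64 times" silently lands on a small index instead of failing, which is worth a comment wherever BitSetNode computes bit positions.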
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658472#comment-16658472 ] Duo Zhang commented on HBASE-20973: --- So the max node size is useless? Or are there holes where we miss the max size check? And the memory waste is huge if we do not grow the BitSetNode, I'd say. Although it seems like only a few bytes, the BitSetNode itself also consumes only a few bytes, which means that in the worst case the memory could be doubled if we cannot grow the BitSetNode. Anyway, correctness is the first thing. I've already filed HBASE-21314 for the efficiency problem. Thanks.
[jira] [Comment Edited] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264 ] Allan Yang edited comment on HBASE-20973 at 10/22/18 1:45 AM: -- Actually it can; you can use these lines of code:
{code}
ProcedureStoreTracker tracker = new ProcedureStoreTracker();
tracker.setPartialFlag(false);
tracker.insert(1);
tracker.insert(129);
tracker.insert(67);
{code}
When inserting proc=67, the BitSetNode of (127-191) will grow to (64-191). And the Java left shift distance wraps... 1L << 65 equals 1L << 1, so left shifting more than 64 times is OK.
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658467#comment-16658467 ] Allan Yang commented on HBASE-20973: {quote} What would implication be for not growing bitsetnode? There'd be a max on possible Procedure counts? {quote} I don't think so, growing is just for saving a little memory(a few bytes, which are a boolean called partial and a long called start). > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > 
This is a very serious condition. After this exception was thrown, the exclusive lock held by the ModifyTableProcedure was never released, so all procedures against this table were blocked until the master restarted. Since the lock info for the procedure is not restored on restart, the other procedures could then proceed again; it is quite embarrassing that a bug saved us... (that bug will be fixed in HBASE-20846). I tried to reproduce this one using the test case in HBASE-20921 but I just can't reproduce it. An easy way to resolve this is to add a try/catch, making sure that no matter what happens,
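The "add a try catch" fix described above can be sketched as follows. This is a hedged, illustrative model, not the actual ProcedureExecutor code: `RollbackSketch`, its fields, and `executeRollback` are stand-ins showing how a `finally` block guarantees the exclusive lock is released even when the store throws an unexpected RuntimeException.

```java
// Illustrative sketch (assumed names, not HBase's real implementation):
// ensure the procedure's exclusive lock is released no matter what happens
// during rollback, instead of letting an AIOOBE kill the worker with the
// lock still held.
public class RollbackSketch {
  static boolean lockHeld;
  static boolean rollbackFailed;

  static void executeRollback(boolean storeThrows) {
    lockHeld = true; // the procedure holds the table's exclusive lock
    try {
      if (storeThrows) {
        // simulates the ArrayIndexOutOfBoundsException from the store tracker
        throw new ArrayIndexOutOfBoundsException(1);
      }
    } catch (RuntimeException e) {
      rollbackFailed = true; // record the failure instead of killing the worker
    } finally {
      lockHeld = false; // the lock is released even on an unexpected exception
    }
  }
}
```

With this shape, a store-level bug still marks the rollback as failed, but it can no longer leave the table permanently locked.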
[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658466#comment-16658466 ] Duo Zhang commented on HBASE-21334: --- No useful information in the log. Anyway let me commit the patch to all branches first, at least it solves one of the problems. Will keep an eye on the flakey dashboard. > TestMergeTableRegionsProcedure is flakey > > > Key: HBASE-21334 > URL: https://issues.apache.org/jira/browse/HBASE-21334 > Project: HBase > Issue Type: Bug > Components: amv2, proc-v2, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21334.patch, > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt > > > {noformat} > Error Message > found 5 corrupted procedure(s) on replay > Stacktrace > java.io.IOException: found 5 corrupted procedure(s) on replay > at > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658465#comment-16658465 ] Duo Zhang commented on HBASE-21336: --- I haven't changed the force update yet. Will address that problem in HBASE-20351 and HBASE-20352. I sort the procedures by id before passing them to the upper layer. This is not strictly necessary, but I think it makes the log friendlier. Reverted from master, as one of the problems was found in HBASE-21334. Will upload a new patch soon to address the comments on rb. > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic from such a low level, i.e., > building complicated linked lists by hand, which makes it really hard to > understand. > Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
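The "sort the procedures by id before passing them to the upper layer" step can be sketched like this. It is a simplified model under assumed names: `Proc` stands in for the real Procedure class, and only the id is modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: hand loaded procedures to the upper layer in
// ascending procId order. A parent is created before its sub-procedures and
// therefore has a smaller id, so this order also guarantees parents are
// seen first.
public class SortByProcId {
  static class Proc {
    final long procId;
    Proc(long procId) { this.procId = procId; }
  }

  static List<Proc> sortedById(List<Proc> loaded) {
    List<Proc> copy = new ArrayList<>(loaded); // do not mutate the input
    copy.sort(Comparator.comparingLong(p -> p.procId));
    return copy;
  }
}
```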
[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658455#comment-16658455 ] Duo Zhang commented on HBASE-21334: --- Much more stable now, but TestMergeTableRegionsProcedure still failed once, with an NPE... Let me dig. > TestMergeTableRegionsProcedure is flakey > > > Key: HBASE-21334 > URL: https://issues.apache.org/jira/browse/HBASE-21334 > Project: HBase > Issue Type: Bug > Components: amv2, proc-v2, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21334.patch, > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt > > > {noformat} > Error Message > found 5 corrupted procedure(s) on replay > Stacktrace > java.io.IOException: found 5 corrupted procedure(s) on replay > at > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658438#comment-16658438 ] Hadoop QA commented on HBASE-21318: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} hbase-examples: The patch generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 14s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 26s{color} | {color:green} hbase-examples in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21318 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944919/HBASE-21318.master.004.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 1edf18626275 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / dd474ef199 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14788/testReport/ | | Max. process+thread count | 2727 (vs. ulimit of 1) | | modules | C: hbase-examples U: hbase-examples | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14788/console | | Powered by | Apache Yetus 0.8.0
[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658428#comment-16658428 ] Tak Lon (Stephen) Wu commented on HBASE-21318: -- Thanks [~yuzhih...@gmail.com] for reviewing it; I have attached a new patch with more style fixes as well. > Make RefreshHFilesClient runnable > - > > Key: HBASE-21318 > URL: https://issues.apache.org/jira/browse/HBASE-21318 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0, 1.5.0, 2.1.2 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Minor > Attachments: HBASE-21318.master.001.patch, > HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, > HBASE-21318.master.004.patch > > > Other than when user enables hbase.coprocessor.region.classes with > RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI > and calls refresh HFiles directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tak Lon (Stephen) Wu updated HBASE-21318: - Attachment: HBASE-21318.master.004.patch > Make RefreshHFilesClient runnable > - > > Key: HBASE-21318 > URL: https://issues.apache.org/jira/browse/HBASE-21318 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0, 1.5.0, 2.1.2 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Minor > Attachments: HBASE-21318.master.001.patch, > HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, > HBASE-21318.master.004.patch > > > Other than when user enables hbase.coprocessor.region.classes with > RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI > and calls refresh HFiles directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658426#comment-16658426 ] Ankit Singhal commented on HBASE-21344: ---
bq. To what are you referring? Looking in AP and SCP, there is no rollback – but you are referring to a specific location. Looking at patch, it looks like you are working in AP#FAILED_OPEN.

Yes, I meant AP#FAILED_OPEN. Thank you for checking.

{code}
400 try {
401   handleFailure(env, regionNode);
402 } catch (IOException e) {
403   return false;
404 }
{code}
bq. Previous we'd let out IOEs now you are catching them and converting them to false. Maybe catch more local to undoRegionAsOpening since this is new source of IOE.

Previously there was no IOException thrown by handleFailure, but agreed, I'll catch locally to undoRegionAsOpening and add a warning.

bq. Did you change decrementMinRegionServerCount to public for tests? If so, add @VisibleForTesting...

Yes, sure, will make the change.

bq. Yes. Philosophy is that all recovery is done via SCP since it has the means for splitting WALs (or figuring this step can be skipped).

OK, so during master initialization (restart or standby becoming active), do I search for the SCP in the procedure queue (and wait on it) and see if it is holding meta and can recover, from splitting of the logs through the assignment of meta?

Thank you so much for the review; let me know once you have more comments on the same (I'll be happy to work on them). > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3.
meta was marked OFFLINE
> 4. meta marked as OPENING on RS3
> 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue
> 6. We attempt the openRegion/assignment 10 times, failing each time
> 7. We start rolling back the procedure:
> {code:java}
> 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: Usually this should not happen, we will release the lock before if the procedure is finished, even if the holdLock is true, arrive here means we have some holes where we do not release the lock. And the releaseLock below may fail since the procedure may have already been deleted from the procedure store.
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
> {code}
> {code:java}
> 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=,16020,1538974612843, splitWal=true, meta=true
> java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
> at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
> at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> {code}
> {code:java}
> { DEBUG [PEWorker-2]
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658410#comment-16658410 ] stack commented on HBASE-21344: ---
bq. Here actually, we are correcting the rollback of assign procedure for Meta in both IMP and SCP.

Do you mean roll back of AssignProcedure? (IMP has one step, making an AssignProcedure as subprocedure for meta). Rollback and AP don't really go together; AP doesn't support Rollback. When you say...

bq. Earlier, rollback of assign corrects the meta region node (by moving it to offline state)

To what are you referring? Looking in AP and SCP, there is no rollback -- but you are referring to a specific location. Looking at patch, it looks like you are working in AP#FAILED_OPEN. If so, your changes in here look good... I wonder about this one though:
{code}
400 try {
401   handleFailure(env, regionNode);
402 } catch (IOException e) {
403   return false;
404 }
{code}
Previously we'd let out IOEs; now you are catching them and converting them to false. Maybe catch more local to undoRegionAsOpening since this is the new source of IOE. I think this addition of yours looks good down here in undoRegionAsOpening. Did you change decrementMinRegionServerCount to public for tests? If so, add the @VisibleForTesting... annotation sir. Let me get back to you after I study your tests more. They look good.

bq. Do you think scheduling IMP without checking whether meta logs were split or not, will cause in any problem?

Yes. Philosophy is that all recovery is done via SCP since it has the means for splitting WALs (or figuring this step can be skipped).

bq. HBASE-21035 looks quite similar but it seems that the handling is more related to the case where the procedures WALs is accidentally/intentionally cleared.

That is correct, but the general practice suggested is that when unsure, then for now at least, we fall back to operator intervention. Please be patient with me Ankit. I'm generally slow. It takes me a while to understand.
Thanks for the help. > hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at >
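The review suggestion in this thread, catching the IOException close to its new source (undoRegionAsOpening) rather than converting every failure of handleFailure into `false`, can be sketched as follows. Both method bodies are illustrative stand-ins, not the AP#FAILED_OPEN code from the patch.

```java
import java.io.IOException;

// Illustrative sketch (assumed names): only the new IOE source is wrapped in
// a try/catch, so the failure is logged as a warning and the rest of the
// failed-open handling still runs, instead of being short-circuited.
public class FailedOpenSketch {
  static boolean warned;

  static void undoRegionAsOpening() throws IOException {
    throw new IOException("simulated failure while undoing OPENING state");
  }

  static boolean handleFailure() {
    try {
      undoRegionAsOpening();
    } catch (IOException e) {
      warned = true; // WARN and continue; other IOEs would still propagate
    }
    return true; // the remaining failure handling is not skipped
  }
}
```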
[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658404#comment-16658404 ] Ted Yu commented on HBASE-21318: We're close. {code} HBase clusters are sharing {code} Remove 'are'. Looks good otherwise. > Make RefreshHFilesClient runnable > - > > Key: HBASE-21318 > URL: https://issues.apache.org/jira/browse/HBASE-21318 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0, 1.5.0, 2.1.2 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Minor > Attachments: HBASE-21318.master.001.patch, > HBASE-21318.master.002.patch, HBASE-21318.master.003.patch > > > Other than when user enables hbase.coprocessor.region.classes with > RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI > and calls refresh HFiles directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658403#comment-16658403 ] stack commented on HBASE-21336: --- Left comments up on rb. This looks great. > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic from such a low level, i.e, > building complicated linked list by hand, which makes it really hard to > understand. > Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658397#comment-16658397 ] Hadoop QA commented on HBASE-21318: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 56s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s{color} | {color:red} hbase-examples: The patch generated 1 new + 19 unchanged - 1 fixed = 20 total (was 20) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 3s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 12m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s{color} | {color:green} hbase-examples in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21318 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944917/HBASE-21318.master.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 630b4b350c77 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / dd474ef199 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14787/artifact/patchprocess/diff-checkstyle-hbase-examples.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14787/testReport/ | | Max. process+thread count | 2481 (vs. ulimit of 1) | | modules | C: hbase-examples U: hbase-examples | | Console output |
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658392#comment-16658392 ] Ted Yu commented on HBASE-21342: [~mdrob]: Can you review the patch one more time? Thanks > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If two secure bulkload calls from the same UGI go into two different regions and one region finishes earlier, it will close the bulk load fs, and the other region will fail. > > Another case would be more serious. The FileSystem.close() function needs two synchronized variables: CACHE and deleteOnExit. If one region calls FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while another region is trying to close srcFS (in SecureBulkLoadListener.closeSrcFs), a deadlock can occur. > > I have written a UT for this and fixed it using a reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
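The reference-counter idea from the issue description above can be modeled like this. It is a hedged, simplified sketch: the real fix lives in SecureBulkLoadManager, while `FsRefCounter` and its method names are stand-ins that only show when it becomes safe to close the FileSystem cached for a UGI.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model: count concurrent bulk loads per UGI and only report
// "safe to close" (i.e. safe to call FileSystem.closeAllForUGI) when the
// last one finishes, avoiding the race where one region closes the
// FileSystem another region is still using.
public class FsRefCounter {
  private final Map<String, Integer> refs = new HashMap<>();

  synchronized void acquire(String ugi) {
    refs.merge(ugi, 1, Integer::sum); // one more bulk load using this UGI's fs
  }

  /** Returns true when the caller was the last user and may close the fs. */
  synchronized boolean release(String ugi) {
    Integer n = refs.merge(ugi, -1, Integer::sum);
    if (n != null && n <= 0) {
      refs.remove(ugi);
      return true;  // last bulk load for this UGI finished
    }
    return false;   // another region is still loading with the same UGI
  }
}
```

Keeping both operations `synchronized` on the counter (rather than on the FileSystem itself) also sidesteps the lock-ordering problem between `closeAllForUGI` and `closeSrcFs` described in the report.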
[jira] [Updated] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tak Lon (Stephen) Wu updated HBASE-21318: - Attachment: HBASE-21318.master.003.patch > Make RefreshHFilesClient runnable > - > > Key: HBASE-21318 > URL: https://issues.apache.org/jira/browse/HBASE-21318 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0, 1.5.0, 2.1.2 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Minor > Attachments: HBASE-21318.master.001.patch, > HBASE-21318.master.002.patch, HBASE-21318.master.003.patch > > > Other than when user enables hbase.coprocessor.region.classes with > RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI > and calls refresh HFiles directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658372#comment-16658372 ] Hadoop QA commented on HBASE-21342: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 30s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 3 new + 3 unchanged - 0 fixed = 6 total (was 3) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 22s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 51s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}145m 16s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}190m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21342 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944909/HBASE-21342.005.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2f5b93405f3e 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / dd474ef199 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14786/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14786/testReport/ | | Max. process+thread count | 4679 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658369#comment-16658369 ] stack commented on HBASE-21336: --- bq. I do not think we need to guarantee any order when loading procedures. bq. But now I sort the procedures so the parent procedure will arrive at first then the test will always pass and then we will set it to RUNNABLE... bq. And I think the force update logic can also make this happen, as it will update the parent procedure if it is stuck there, and mess up the replay order(for the old implement). Let me see how to fix this. So, did you drop ordering? You just have a sort on load? What did you do for force update? Would be sweet if you could do without ordering. Looking at patch... > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic from such a low level, i.e, > building complicated linked list by hand, which makes it really hard to > understand. > Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
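The sort-on-load idea discussed above can be condensed into a minimal sketch (the class names here are illustrative stand-ins, not the actual WALProcedureMap internals): since a parent procedure is always created before its children, its proc id is smaller, so sorting loaded procedures by id before handing them to the upper layer guarantees the parent arrives first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal stand-in for a loaded procedure entry; the real
// class is org.apache.hadoop.hbase.procedure2.Procedure.
class LoadedProc {
    final long procId;
    final long parentId; // -1 when the procedure has no parent
    LoadedProc(long procId, long parentId) {
        this.procId = procId;
        this.parentId = parentId;
    }
}

class ProcedureLoadOrder {
    // Sort by procId ascending: a parent is always created before its
    // children, so its id is smaller and it is replayed first.
    static List<LoadedProc> sortForReplay(List<LoadedProc> loaded) {
        List<LoadedProc> sorted = new ArrayList<>(loaded);
        sorted.sort(Comparator.comparingLong(p -> p.procId));
        return sorted;
    }
}
```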
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658360#comment-16658360 ] stack commented on HBASE-21354: --- bq. So this could only happen after restarting, and it also requires that we fail to write the trailer for a file(not only the newest file, and maybe this could also happen with multiple restarts?)? I've seen this happen, where on restart, it complains that one of the many WAL files is missing its ending/tracker. bq. LOG.debug("Remove the oldest log {}", logs.getFirst()); Make these info-level? This and their creation on log roll if not already. I like your adding a 'why' to the log message. +1 > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > ‘Corrupted procedure’ or procedures disappearing after master > restarts (happens during ITBLL). > This is because during master restarts, we load procedures from the log and > build the 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about the > procedure if it was not updated during that time. We can only delete the > procedure if it is not in the global tracker, which has the whole > picture. 
> {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the > corruption happened in the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
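The fix described above can be condensed into a toy sketch (ToyTracker is a deliberately simplified stand-in for the real bitmap-based ProcedureStoreTracker, which stores proc ids in BitSetNode ranges): a proc id found in an old log may only be marked deleted when the global tracker, which has the whole picture, no longer contains it.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for ProcedureStoreTracker: just a set of live proc ids.
class ToyTracker {
    private final Set<Long> live = new HashSet<>();
    void insert(long procId) { live.add(procId); }
    boolean contains(long procId) { return live.contains(procId); }
}

class HoldingCleanupSketch {
    // Delete only when the *global* tracker no longer contains the
    // procedure -- not merely because one newer log lacks it (a log
    // omits a procedure that simply wasn't updated during its lifetime).
    static boolean mayDelete(ToyTracker globalTracker, long procId) {
        return !globalTracker.contains(procId);
    }
}
```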
[jira] [Comment Edited] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658270#comment-16658270 ] stack edited comment on HBASE-21354 at 10/21/18 7:58 PM: - Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird issue I've seen when lots of chaos where I cannot clean up a Procedure because another holds a lock but the 'other' no longer exists. (I removed my 'nit' after taking a deeper look -- log makes sense). Great test. was (Author: stack): Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird issue I've seen when lots of chaos where I cannot clean up a Procedure because another holds a lock but the 'other' no longer exists. nit: These kind of logs w/o adding context -- name of the file being recovered -- can be useless LOG.debug("Starting WAL Procedure Store lease recovery"); Great test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658354#comment-16658354 ] Hadoop QA commented on HBASE-21342: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 14s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 10s{color} | {color:red} hbase-server: The patch generated 3 new + 3 unchanged - 0 fixed = 6 total (was 3) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 36s{color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}185m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21342 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944908/HBASE-21342.004.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 5a0d9b55f3dc 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / dd474ef199 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14785/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14785/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results |
[jira] [Commented] (HBASE-13468) hbase.zookeeper.quorum supports ipv6 address
[ https://issues.apache.org/jira/browse/HBASE-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658336#comment-16658336 ] Mike Drob commented on HBASE-13468: --- You need to add a relocation stanza at https://github.com/apache/hbase/blob/master/hbase-shaded/pom.xml#L340 > hbase.zookeeper.quorum supports ipv6 address > > > Key: HBASE-13468 > URL: https://issues.apache.org/jira/browse/HBASE-13468 > Project: HBase > Issue Type: Bug >Reporter: Mingtao Zhang >Assignee: maoling >Priority: Major > Attachments: HBASE-13468.master.001.patch, > HBASE-13468.master.002.patch, HBASE-13468.master.003.patch > > > I put ipv6 address in hbase.zookeeper.quorum, by the time this string went to > zookeeper code, the address is messed up, i.e. only '[1234' left. > I started using pseudo mode with embedded zk = true. > I downloaded 1.0.0, not sure which affected version should be here. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
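For context, a maven-shade-plugin relocation stanza has this general shape (the package pattern shown is purely illustrative; the actual pattern to relocate depends on the dependency in question and on the conventions already used in hbase-shaded/pom.xml):

```xml
<!-- Illustrative relocation; the real pattern depends on the dependency
     being shaded and should follow the existing stanzas in the pom. -->
<relocation>
  <pattern>com.example.thirdparty</pattern>
  <shadedPattern>org.apache.hadoop.hbase.shaded.com.example.thirdparty</shadedPattern>
</relocation>
```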
[jira] [Commented] (HBASE-21344) hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever
[ https://issues.apache.org/jira/browse/HBASE-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658335#comment-16658335 ] Ankit Singhal commented on HBASE-21344: --- Thanks [~stack] for taking a look. PFB my responses. {quote}So, you are trying to figure the case where the assign in IMP failed to succeed – where the region is stuck in the OPENING state – and if you can find this condition, you'd reschedule an IMP (the body of which happens to be an assign of meta)? {quote} Here, actually, we are correcting the rollback of the assign procedure for meta in both IMP and SCP. We are not re-scheduling the IMP until the master restarts (or a standby becomes active) and finds that meta is still not OPEN. Earlier, rollback of the assign corrected the meta region node (by moving it to the offline state), but for meta we also store state in another znode (/hbase/meta-region-server), which was not cleared or set back to offline. (The patch is trying to fix this.) {quote}What you think of the discussion over in HBASE-21035 where we decide to punt on auto-assign for now at least (IMP only assigns, doesn't do recovery of meta WALs if any). {quote} HBASE-21035 looks quite similar, but it seems that the handling there is more related to the case where the procedure WALs are accidentally/intentionally cleared. In our case, splitting of the meta logs completed but the assign in SCP failed, which left the meta region node still in the OPENING state after rollback; meta is then never assigned (even after restart), so SCP never kicks in and the cluster gets stuck. So to fix this we are doing two things: * fixing the meta regionserver znode (back to the offline state) during the rollback (undoRegionOpening) * during master initialization, checking meta assignment; if it is still not open, we schedule another IMP for assignment and wait on it for completion (Do you think scheduling an IMP without checking whether the meta logs were split will cause any problem?) 
> hbase:meta location in ZooKeeper set to OPENING by the procedure which > eventually failed but precludes Master from assigning it forever > --- > > Key: HBASE-21344 > URL: https://issues.apache.org/jira/browse/HBASE-21344 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Attachments: HBASE-21344-branch-2.0.patch > > > [~elserj] has already summarized it well. > 1. hbase:meta was on RS8 > 2. RS8 crashed, SCP was queued for it, meta first > 3. meta was marked OFFLINE > 4. meta marked as OPENING on RS3 > 5. Can't actually send the openRegion RPC to RS3 due to the krb ticket issue > 6. We attempt the openRegion/assignment 10 times, failing each time > 7. We start rolling back the procedure: > {code:java} > 2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: > Usually this should not happen, we will release the lock before if the > procedure is finished, even if the holdLock is true, arrive here means we > have some holes where we do not release the lock. And the releaseLock below > may fail since the procedure may have already been deleted from the procedure > store. 
> 2018-10-08 06:51:24,543 INFO [PEWorker-9] > procedure.MasterProcedureScheduler: pid=48, ppid=47, > state=FAILED:REGION_TRANSITION_QUEUE, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 > checking lock on 1588230740 > {code} > {code:java} > 2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception for pid=47, > state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, > exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via > AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max > attempts exceeded; ServerCrashProcedure > server=,16020,1538974612843, splitWal=true, meta=true > java.lang.UnsupportedOperationException: unhandled > state=SERVER_CRASH_GET_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254) > at > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577) > at >
[jira] [Commented] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658289#comment-16658289 ] Hudson commented on HBASE-21178: Results for branch branch-2.1 [build #507 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/507//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color} > [BC break] : Get and Scan operation with a custom converter_class not working > - > > Key: HBASE-21178 > URL: https://issues.apache.org/jira/browse/HBASE-21178 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 2.0.0 >Reporter: Subrat Mishra >Assignee: Subrat Mishra >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21178.master.001.patch, > HBASE-21178.master.002.patch, HBASE-21178.master.003.patch > > > Consider a simple scenario: > {code:java} > create 'foo', {NAME => 'f1'} > put 'foo','r1','f1:a',1000 > get 'foo','r1',{COLUMNS => > ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} > scan 'foo',{COLUMNS => > ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code} > Both get and scan fail with ERROR > {code:java} > ERROR: wrong number of arguments (3 for 1) {code} > Looks like converter_method in the table.rb file expects 3 arguments [(bytes, > offset, len)] since version 2.0.0; prior to version 2.0.0 it took only > 1 argument [(bytes)] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
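As an illustration of the signature change described above (MyConverter is a hypothetical custom converter, not part of HBase): since 2.0.0 the shell invokes the converter method with three arguments, so a converter needs a (byte[], int, int) shape rather than only (byte[]).

```java
// Hypothetical custom converter class for the HBase shell. A real one
// would be public on the server classpath; shown package-private here
// so the sketch is self-contained.
class MyConverter {
    // Pre-2.0.0 shape: a single byte[] argument.
    static String len(byte[] bytes) {
        return String.valueOf(bytes.length);
    }

    // 2.0.0+ shape: (bytes, offset, len), mirroring the three-argument
    // helpers in org.apache.hadoop.hbase.util.Bytes.
    static String len(byte[] bytes, int offset, int length) {
        return String.valueOf(length);
    }
}
```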
[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mazhenlin updated HBASE-21342: -- Attachment: HBASE-21342.005.patch > FileSystem in use may get closed by other bulk load call in secure bulkLoad > > > Key: HBASE-21342 > URL: https://issues.apache.org/jira/browse/HBASE-21342 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7 >Reporter: mazhenlin >Assignee: mazhenlin >Priority: Major > Attachments: 21342.v1.txt, HBASE-21342.002.patch, > HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, > race.patch > > > As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If > two secure bulk load calls from the same UGI go into two different regions and > one region finishes earlier, it will close the bulk load fs, and the other > region will fail. > > Another case would be more serious. The FileSystem.close() function needs two > synchronized variables: CACHE and deleteOnExit. If one region calls > FileSystem.closeAllForUGI (in SecureBulkLoadManager.cleanupBulkLoad) while > another region is trying to close srcFS (in > SecureBulkLoadListener.closeSrcFs), it can cause a deadlock. > > I have written a UT for this and fixed it using a reference counter. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
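The reference-counter idea from the description might look like this minimal sketch (the method names echo the decrementUgiReference mentioned elsewhere in the thread, but the map-based implementation is an assumption, not the patch itself): the FileSystem for a UGI is only closed when the last concurrent bulk load for that UGI finishes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative reference counter keyed by UGI name.
class UgiReferenceCounter {
    private final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    void incrementUgiReference(String ugi) {
        counts.computeIfAbsent(ugi, k -> new AtomicInteger()).incrementAndGet();
    }

    // Returns true when this was the last reference, i.e. when it is now
    // safe to call FileSystem.closeAllForUGI for this user without
    // pulling the filesystem out from under another in-flight bulk load.
    boolean decrementUgiReference(String ugi) {
        AtomicInteger c = counts.get(ugi);
        if (c == null) {
            return true;
        }
        if (c.decrementAndGet() <= 0) {
            counts.remove(ugi);
            return true;
        }
        return false;
    }
}
```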
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658283#comment-16658283 ] Ted Yu commented on HBASE-21342: bq. It is in finally block. Perhaps I was not very clear in the previous comment. postBulkLoadHFile() throws IOE. When this happens, I don't think Java would run the rest of the code (decrementUgiReference in this case) in the finally block. You can either move decrementUgiReference call to the beginning of finally block (since it doesn't throw IOE) or, use nested finally in the current finally block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
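Ted's point about ordering within a finally block can be demonstrated directly (FinallyOrderDemo is a self-contained toy, not HBase code): if an earlier statement of the finally block throws, the statements after it never run, so the non-throwing cleanup must come first (or sit in its own nested finally).

```java
// Demonstrates that a throw inside a finally block skips the rest of
// that block.
class FinallyOrderDemo {
    static boolean decremented;

    static void run(boolean cleanupFirst) {
        try {
            // main work elided
        } finally {
            if (cleanupFirst) {
                decremented = true;   // safe: cannot throw
                throwingPostHook();   // stand-in for postBulkLoadHFile()
            } else {
                throwingPostHook();   // throws...
                decremented = true;   // ...so this line never executes
            }
        }
    }

    static void throwingPostHook() {
        throw new RuntimeException("postBulkLoadHFile failed");
    }
}
```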
[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658279#comment-16658279 ] mazhenlin commented on HBASE-21342: --- {quote}Should the above be enclosed in finally block ? {quote} It is in finally block. Other comments were fixed. Thank you very much for the review, I have learned a lot. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad
[ https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mazhenlin updated HBASE-21342: -- Attachment: HBASE-21342.004.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658273#comment-16658273 ] Hudson commented on HBASE-21178: Results for branch branch-2.0 [build #990 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/990//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658270#comment-16658270 ] stack commented on HBASE-21354: --- Makes sense [~allan163] Nice find sir. Here's hoping this addresses the weird issue I've seen when lots of chaos where I cannot clean up a Procedure because another holds a lock but the 'other' no longer exists. nit: These kind of logs w/o adding context -- name of the file being recovered -- can be useless LOG.debug("Starting WAL Procedure Store lease recovery"); Great test. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658269#comment-16658269 ] stack commented on HBASE-20973: --- What would implication be for not growing bitsetnode? There'd be a max on possible Procedure counts? > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition, After this exception thrown, the exclusive > lock held by 
ModifyTableProcedure was never released. All the procedures > against this table were blocked until the master restarted. And since the > lock info for the procedure isn't restored on restart, the other procedures could run > again; it is quite embarrassing that a bug saved us... (this bug will be fixed > in HBASE-20846) > I tried to reproduce this one using the test case in HBASE-20921 but I just > can't reproduce it. > An easy way to resolve this is to add a try/catch, making sure that no matter what > happens, the table's exclusive lock can always be released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
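The try/catch fix suggested in the description amounts to wrapping rollback in a finally-style guard so the lock is released even when rollback throws. A minimal sketch with a plain ReentrantLock (an assumption for illustration, not HBase's actual LockProcedure machinery):

```java
import java.util.concurrent.locks.ReentrantLock;

public class RollbackLockDemo {
    static final ReentrantLock tableLock = new ReentrantLock();

    // Guarded rollback: the lock is released no matter what the body throws.
    static void rollbackWithGuard(Runnable rollback) {
        tableLock.lock();
        try {
            rollback.run();          // may throw, as in the reported bug
        } finally {
            tableLock.unlock();      // always executed, so the lock never leaks
        }
    }

    public static void main(String[] args) {
        try {
            rollbackWithGuard(() -> { throw new ArrayIndexOutOfBoundsException(1); });
        } catch (ArrayIndexOutOfBoundsException expected) {
            // exception still propagates to the caller
        }
        System.out.println(tableLock.isLocked()); // false -- lock was released
    }
}
```

Without the finally block, the exception escaping mid-rollback leaves the lock held forever, which is exactly the blocked-procedures symptom described above.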
[jira] [Commented] (HBASE-13468) hbase.zookeeper.quorum supports ipv6 address
[ https://issues.apache.org/jira/browse/HBASE-13468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658267#comment-16658267 ] Ted Yu commented on HBASE-13468: We should be able to use commons validator. [~mdrob] may have more insight on how to avoid the errors seen at the end of https://builds.apache.org/job/PreCommit-HBASE-Build/14701/artifact/patchprocess/patch-javac-3.0.0.txt Thanks > hbase.zookeeper.quorum supports ipv6 address > > > Key: HBASE-13468 > URL: https://issues.apache.org/jira/browse/HBASE-13468 > Project: HBase > Issue Type: Bug >Reporter: Mingtao Zhang >Assignee: maoling >Priority: Major > Attachments: HBASE-13468.master.001.patch, > HBASE-13468.master.002.patch, HBASE-13468.master.003.patch > > > I put an ipv6 address in hbase.zookeeper.quorum; by the time this string reached the > zookeeper code, the address was messed up, i.e. only '[1234' was left. > I started using pseudo mode with embedded zk = true. > I downloaded 1.0.0, not sure which affected version should be here.
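The truncation described (only '[1234' surviving) is what a naive split on ':' does to a bracketed IPv6 literal such as [2001:db8::1]:2181. A bracket-aware parse is one way out; this is a hedged sketch for illustration only, not the code from the attached patches:

```java
// Sketch of bracket-aware quorum-entry parsing. A plain split on ':' cannot
// work for IPv6 literals, which themselves contain colons; RFC 3986 bracket
// notation lets us locate the port separator after the closing ']'.
public class QuorumEntry {
    static String[] splitHostPort(String entry) {
        if (entry.startsWith("[")) {                 // IPv6 literal in brackets
            int close = entry.indexOf(']');
            if (close < 0) throw new IllegalArgumentException("unclosed '[' in " + entry);
            int portSep = entry.indexOf(':', close); // first ':' after the ']'
            String host = entry.substring(1, close);
            return portSep < 0 ? new String[] { host }
                               : new String[] { host, entry.substring(portSep + 1) };
        }
        int portSep = entry.lastIndexOf(':');        // IPv4 address or hostname
        return portSep < 0 ? new String[] { entry }
                           : new String[] { entry.substring(0, portSep), entry.substring(portSep + 1) };
    }

    public static void main(String[] args) {
        String[] v6 = splitHostPort("[2001:db8::1]:2181");
        String[] v4 = splitHostPort("10.0.0.1:2181");
        System.out.println(v6[0] + " " + v6[1]);   // 2001:db8::1 2181
        System.out.println(v4[0] + " " + v4[1]);   // 10.0.0.1 2181
    }
}
```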
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658264#comment-16658264 ] Allan Yang commented on HBASE-20973: Actually it can; you can reproduce it with these lines of code: {code} ProcedureStoreTracker tracker = new ProcedureStoreTracker(); tracker.setPartialFlag(false); tracker.insert(1); tracker.insert(129); tracker.insert(67); {code} When proc=67 is inserted, the BitSetNode covering (64-127) grows to (64-191). And Java left shifts are cyclical: 1L << 65 equals 1L << 1...
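Allan Yang's point about cyclical shifts is easy to verify: for a long, Java keeps only the low six bits of the shift distance (JLS 15.19), so a bitmap index computed past a BitSetNode's range can wrap around and set a plausible-looking bit instead of failing fast:

```java
public class ShiftWrapDemo {
    public static void main(String[] args) {
        // For a long shift, Java masks the distance to its low 6 bits
        // (JLS 15.19): 65 & 0x3f == 1, so 1L << 65 is silently 1L << 1.
        System.out.println((1L << 65) == (1L << 1)); // true
        System.out.println(1L << 65);                // 2
    }
}
```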
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658243#comment-16658243 ] Duo Zhang commented on HBASE-20973: --- It is a bit strange. As mentioned in HBASE-21314, the max size for a BitSetNode is set to 64, which means that a BitSetNode cannot grow to more than one long, i.e., we can never grow a BitSetNode, or merge two BitSetNodes...
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658241#comment-16658241 ] Duo Zhang commented on HBASE-21354: --- Just add a note when building holdingCleanupTracker? That we may fail to persist the store tracker for a wal proc file, so only the global store tracker can be trusted to contain all the information, and can be used to determine whether a procedure has been deleted. > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > 'Corrupted procedure' errors and of procedures disappearing after master > restarts (happens during ITBLL). > During master restarts we load procedures from the logs and build the > 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if a newer log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about a > procedure that was not updated while that log was active. We can delete a > procedure only if it is not in the global tracker, which has the whole > picture. > {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case (testProcedureShouldNotCleanOnLoad) in the patch shows cleanly how the > corruption happens.
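The failure mode in the description can be illustrated with a deliberately simplified model (plain sets standing in for HBase's per-log and global ProcedureStoreTrackers; the names and structure here are illustrative assumptions, not the real classes):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model of the restart-cleanup bug: a procedure written only to an
// old WAL never appears in newer per-log trackers, so "absent from a newer
// log" must not be read as "deleted". Only the merged (global) view can say.
public class TrackerModel {
    public static void main(String[] args) {
        long procId = 42L;
        // Per-log trackers: proc 42 was written in log1 and never updated again.
        Set<Long> log1 = new HashSet<>(Arrays.asList(42L, 43L));
        Set<Long> log2 = new HashSet<>(Arrays.asList(43L, 44L));
        Set<Long> log3 = new HashSet<>(Arrays.asList(44L));

        // Buggy inference: "a later log doesn't contain 42, so 42 was deleted".
        boolean buggyDeleted = !log2.contains(procId);

        // Correct inference: consult the union of all logs (the global view).
        Set<Long> global = new HashSet<>();
        for (Set<Long> t : Arrays.asList(log1, log2, log3)) global.addAll(t);
        boolean reallyDeleted = !global.contains(procId);

        System.out.println(buggyDeleted);   // true  -- wrongly marks 42 deleted
        System.out.println(reallyDeleted);  // false -- 42 is still live
    }
}
```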
[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658240#comment-16658240 ] Duo Zhang commented on HBASE-21334: --- Pushed to master. Let's see how it works. > TestMergeTableRegionsProcedure is flakey > > > Key: HBASE-21334 > URL: https://issues.apache.org/jira/browse/HBASE-21334 > Project: HBase > Issue Type: Bug > Components: amv2, proc-v2, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21334.patch, > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt > > > {noformat} > Error Message > found 5 corrupted procedure(s) on replay > Stacktrace > java.io.IOException: found 5 corrupted procedure(s) on replay > at > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658238#comment-16658238 ] Hadoop QA commented on HBASE-21224: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 17s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 50s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}248m 35s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}294m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.master.procedure.TestServerCrashProcedureWithReplicas | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21224 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944884/HBASE-21224-master.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 22f7293c2930 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / ae5308ac4a | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14782/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14782/testReport/ | | Max. process+thread count | 4941 (vs. ulimit of 1) | | modules | C: hbase-server U:
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658237#comment-16658237 ] Hadoop QA commented on HBASE-21354: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-2.0 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 49s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 3s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} branch-2.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 7s{color} | {color:green} hbase-procedure in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 22s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}157m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 | | JIRA Issue | HBASE-21354 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944894/HBASE-21354.branch-2.0.003.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux a2d8a26264f4 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2.0 /
[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658235#comment-16658235 ] Hadoop QA commented on HBASE-21334: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 14s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 10s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 0s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}247m 22s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}291m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.io.asyncfs.TestSaslFanOutOneBlockAsyncDFSOutput | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21334 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944885/HBASE-21334.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux fcf4a43f965b 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / ae5308ac4a | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14780/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14780/testReport/ | | Max. process+thread count | 4851 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | |
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658234#comment-16658234 ] Hadoop QA commented on HBASE-20973: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2.0 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 43s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 28s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} branch-2.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 14s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 35m 57s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 | | JIRA Issue | HBASE-20973 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944897/HBASE-20973.branch-2.0.001.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 21334225081f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | branch-2.0 / 25167fb0f9 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14784/testReport/ | | Max. process+thread count | 279 (vs. ulimit of 1) | | modules | C: hbase-procedure U: hbase-procedure | | Console output |
[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658221#comment-16658221 ] Hadoop QA commented on HBASE-21355: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 21s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 26s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}201m 44s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}244m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21355 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944888/HBASE-21355.v1.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2b6833769442 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / ae5308ac4a | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14781/testReport/ | | Max. process+thread count | 4869 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14781/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > HStore's storeSize is
[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-20973: --- Status: Patch Available (was: Open) > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.0.1, 2.1.0 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition. After this exception was thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedures > against this table were blocked. 
They stayed blocked until the master restarted; and since the > lock info for the procedure won't be restored, the other procedures can go > again. It is quite embarrassing that a bug saved us...(this bug will be fixed > in HBASE-20846) > I tried to reproduce this one using the test case in HBASE-20921 but I just > can't reproduce it. > An easy way to resolve this is to add a try/catch, making sure that no matter what > happens, the table's exclusive lock can always be released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
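The try/catch approach suggested above can be sketched as follows. This is an illustrative shape only, not the actual HBase patch; `TableLock` and `rollbackWithLockSafety` are hypothetical names. The key idea is that a finally block releases the table's exclusive lock even when rollback throws an unexpected runtime exception.

```java
// Illustrative sketch only: TableLock and rollbackWithLockSafety are
// hypothetical names, not HBase APIs. The point is the try/finally shape:
// the exclusive lock is released no matter what the rollback does.
public class RollbackSketch {
  public interface TableLock {
    void release();
  }

  /** Runs the rollback; returns false if it threw, but always releases the lock. */
  public static boolean rollbackWithLockSafety(TableLock lock, Runnable rollback) {
    try {
      rollback.run();
      return true;
    } catch (RuntimeException e) {
      // Even a CODE-BUG style uncaught runtime exception must not leave the lock held.
      return false;
    } finally {
      lock.release();
    }
  }
}
```

With this shape, a failure like the ArrayIndexOutOfBoundsException above would still leave the table unlocked, so other procedures against the table could proceed without a master restart.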
[jira] [Updated] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-20973: --- Attachment: HBASE-20973.branch-2.0.001.patch > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > Attachments: HBASE-20973.branch-2.0.001.patch > > > Find this one while investigating HBASE-20921. After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition. After this exception was thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedures > against this table were blocked. 
They stayed blocked until the master restarted; and since the > lock info for the procedure won't be restored, the other procedures can go > again. It is quite embarrassing that a bug saved us...(this bug will be fixed > in HBASE-20846) > I tried to reproduce this one using the test case in HBASE-20921 but I just > can't reproduce it. > An easy way to resolve this is to add a try/catch, making sure that no matter what > happens, the table's exclusive lock can always be released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure
[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658220#comment-16658220 ] Allan Yang commented on HBASE-20973: I haven't found the root cause of the problem here, but I suspect there is a race condition: while one thread is getting a bit in the BitSetNode, another thread merges that BitSetNode with another one, so the arrays in the BitSetNode shrink, resulting in an ArrayIndexOutOfBoundsException. I suggest we disable BitSetNode's ability to grow and merge, to avoid this kind of problem until we find the root cause; I uploaded a patch to disable them. What do you think, [~stack], [~Apache9]? > ArrayIndexOutOfBoundsException when rolling back procedure > -- > > Key: HBASE-20973 > URL: https://issues.apache.org/jira/browse/HBASE-20973 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.1.0, 2.0.1 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Critical > > Found this one while investigating HBASE-20921. 
After the root > procedure(ModifyTableProcedure in this case) rolled back, a > ArrayIndexOutOfBoundsException was thrown > {code} > 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): > CODE-BUG: Uncaught runtime exception for pid=5973, > state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo > interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, > state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; > ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l > ang.NullPointerException; ModifyTableProcedure > table=IntegrationTestBigLinkedList > java.lang.UnsupportedOperationException: unhandled > state=MODIFY_TABLE_REOPEN_ALL_REGIONS > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147) > at > org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203) > at > org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > 2018-07-18 01:39:10,243 WARN [PEWorker-8] > procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405) > at > 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513) > at > org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741) > {code} > This is a very serious condition. After this exception was thrown, the exclusive > lock held by ModifyTableProcedure was never released. All the procedures > against this table were blocked. They stayed blocked until the master restarted; and since the > lock info for the procedure won't be restored, the other procedures can go > again. It is quite embarrassing that a bug saved us...(this bug will be fixed > in HBASE-20846)
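The suspected race in the comment above can be pictured with a minimal sketch. This is illustrative only, not the ProcedureStoreTracker code: a reader computes a word index against one array length, a concurrent merge swaps in a shorter array, and the indexed access then lands out of bounds, exactly the shape of the ArrayIndexOutOfBoundsException in the stack trace.

```java
// Illustrative only: shows why an index computed against an old, larger
// bitmap can overflow a merged/shrunk replacement array, yielding an
// ArrayIndexOutOfBoundsException like the one in the BitSetNode trace.
public class ShrinkRaceSketch {
  private long[] modified = new long[4]; // a reader sizes its index from this

  /** Simulates another thread merging/shrinking the backing array. */
  public void shrinkTo(int words) {
    long[] smaller = new long[words];
    System.arraycopy(modified, 0, smaller, 0, words);
    modified = smaller;
  }

  /** Flips the bit for a slot; throws if the array shrank underneath us. */
  public void updateState(int wordIndex) {
    modified[wordIndex] ^= 1L; // AIOOBE when wordIndex >= modified.length
  }
}
```

Disabling grow/merge, as the uploaded patch does, removes the shrink step and with it this failure mode, at the cost of keeping the larger arrays around until the root cause is found.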
[jira] [Created] (HBASE-21356) bulkLoadHFile API should ensure that rs has the source hfile's write permission
Zheng Hu created HBASE-21356: Summary: bulkLoadHFile API should ensure that rs has the source hfile's write permission Key: HBASE-21356 URL: https://issues.apache.org/jira/browse/HBASE-21356 Project: HBase Issue Type: Bug Reporter: Zheng Hu Assignee: Zheng Hu If the rs bulk loads an HFile but has no write permission on it, we can read & compact the hfile, but after the compaction finishes, the HFile will be moved to the archive directory, where the HFileCleaner won't have permission to delete it, so the HFile will be kept in HDFS forever. We need to check the file's write permission when running bulkLoadHFile on the server side, and reject the request if there is no write permission. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
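A hedged sketch of the kind of server-side guard being proposed. It uses java.nio.file stand-ins rather than the real Hadoop FileSystem/HDFS permission API, and the class and method names are hypothetical; only the shape of the check is the point: reject the bulk load up front when the file is not writable, since a file the server can never delete would accumulate under the archive directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: the real check would go through Hadoop's FileSystem and
// HDFS permissions, not java.nio.file. The guard's shape is what matters.
public class BulkLoadPermissionSketch {
  /** Throws if the server lacks write permission on the source HFile. */
  public static void checkWritableOrReject(Path hfile) throws IOException {
    if (!Files.isWritable(hfile)) {
      throw new IOException("Reject bulkLoadHFile: no write permission on " + hfile);
    }
  }
}
```

Failing fast here is cheaper than the alternative the issue describes: accepting the load, compacting, and only discovering at cleanup time that the archived HFile cannot be removed.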
[jira] [Commented] (HBASE-21281) Update bouncycastle dependency.
[ https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658196#comment-16658196 ] Hudson commented on HBASE-21281: Results for branch master [build #559 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Update bouncycastle dependency. > --- > > Key: HBASE-21281 > URL: https://issues.apache.org/jira/browse/HBASE-21281 > Project: HBase > Issue Type: Task > Components: dependencies, test >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21281.addendum.patch, 21281.addendum2.patch, > HBASE-21281.001.branch-2.0.patch > > > Looks like we still depend on bcprov-jdk16 for some x509 certificate > generation in our tests. Bouncycastle has moved beyond this in 1.47, changing > the artifact names. > [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later] > There are some API changes too, but it looks like we don't use any of these. > It seems like we also have vestiges in the POMs from when we were depending > on a specific BC version that came in from Hadoop. 
We now have a > KeyStoreTestUtil class in HBase, which makes me think we can also clean up > some dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21302) Release 1.2.8
[ https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658197#comment-16658197 ] Hudson commented on HBASE-21302: Results for branch master [build #559 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Release 1.2.8 > - > > Key: HBASE-21302 > URL: https://issues.apache.org/jira/browse/HBASE-21302 > Project: HBase > Issue Type: Task > Components: community >Affects Versions: 1.2.8 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Major > Fix For: 1.2.8 > > > 1.4.8 is out, time to make 1.2.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21336) Simplify the implementation of WALProcedureMap
[ https://issues.apache.org/jira/browse/HBASE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658194#comment-16658194 ] Hudson commented on HBASE-21336: Results for branch master [build #559 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Simplify the implementation of WALProcedureMap > -- > > Key: HBASE-21336 > URL: https://issues.apache.org/jira/browse/HBASE-21336 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21336-v1.patch, HBASE-21336-v2.patch, > HBASE-21336-v3.patch, HBASE-21336.patch > > > I do not think we need to implement the logic from such a low level, i.e, > building complicated linked list by hand, which makes it really hard to > understand. > Let me try to implement it with existing data structures... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21194) Add tests in TestCopyTable which exercise MOB feature
[ https://issues.apache.org/jira/browse/HBASE-21194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658195#comment-16658195 ] Hudson commented on HBASE-21194: Results for branch master [build #559 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/559/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/559//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add tests in TestCopyTable which exercise MOB feature > - > > Key: HBASE-21194 > URL: https://issues.apache.org/jira/browse/HBASE-21194 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Artem Ervits >Priority: Minor > Labels: mob > Fix For: 3.0.0 > > Attachments: 21194.v08.patch, HBASE-21194.v01.patch, > HBASE-21194.v02.patch, HBASE-21194.v03.patch, HBASE-21194.v06.patch, > HBASE-21194.v07.patch, HBASE-21194.v08.patch > > > Currently TestCopyTable doesn't cover table(s) with MOB feature enabled. > We should add variant that enables MOB on the table being copied and verify > that MOB content is copied correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allan Yang updated HBASE-21354: --- Attachment: HBASE-21354.branch-2.0.003.patch > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch, HBASE-21354.branch-2.0.003.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > ‘Corrupted procedure’ errors and of procedures disappearing after master > restarts (happens during ITBLL). > This happens because during master restarts we load procedures from the log and > build the 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about the > procedure if the procedure was not updated during that time. We can delete the > procedure only if it is not in the global tracker, which has the whole > picture. > {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case (testProcedureShouldNotCleanOnLoad) in the patch shows cleanly how the > corruption happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658190#comment-16658190 ] Allan Yang commented on HBASE-21354: {quote} So this could only happen after restarting, and it also requires that we fail to write the trailer for a file {quote} Yes, you are right: if the latest log has the global tracker written in its trailer, then there is no problem. About the comments, I don't quite understand you; where should I add them? The failed TestWALProcedureStore.testNoTrailerDoubleRestart() is related; it helped me find a bug: we only need to check the deleted bits in the global tracker. Uploaded a V3 to fix it. > Procedure may be deleted improperly during master restarts resulting in > 'Corrupt' > - > > Key: HBASE-21354 > URL: https://issues.apache.org/jira/browse/HBASE-21354 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.1.0, 2.0.2 >Reporter: Allan Yang >Assignee: Allan Yang >Priority: Major > Attachments: HBASE-21354.branch-2.0.001.patch, > HBASE-21354.branch-2.0.002.patch > > > Good news! [~stack], [~Apache9], I may have found the root cause of the mysterious > ‘Corrupted procedure’ errors and of procedures disappearing after master > restarts (happens during ITBLL). > This happens because during master restarts we load procedures from the log and > build the 'holdingCleanupTracker' according to each log's tracker. We may mark > a procedure in the oldest log as deleted if one log doesn't contain the > procedure. This is inappropriate, since a log will not contain info about the > procedure if the procedure was not updated during that time. We can delete the > procedure only if it is not in the global tracker, which has the whole > picture. 
> {code} > trackerNode = tracker.lookupClosestNode(trackerNode, procId); > if (trackerNode == null || !trackerNode.contains(procId) || > trackerNode.isModified(procId)) { > // the procedure was removed or modified > node.delete(procId); > } > {code} > A test case(testProcedureShouldNotCleanOnLoad) shows cleanly how the > corruption happened in the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
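The corrected cleanup rule described above can be sketched as follows. This is a simplified illustration, not HBase's actual `ProcedureStoreTracker` API: the `GlobalTracker` class and both `shouldDelete` helpers are hypothetical stand-ins for the real tracker logic.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the cleanup decision, assuming simplified hypothetical types.
// A per-log tracker not containing a procId only means the procedure was not
// touched while that log was active; only the global tracker, which has the
// whole picture, can say the procedure was really deleted.
public class CleanupSketch {

  // Stand-in for the global tracker's deleted bits.
  static class GlobalTracker {
    private final Set<Long> deleted = new HashSet<>();
    void setDeleted(long procId) { deleted.add(procId); }
    boolean isDeleted(long procId) { return deleted.contains(procId); }
  }

  /** Buggy rule: delete if a single log's tracker doesn't contain the procedure. */
  static boolean buggyShouldDelete(Set<Long> oneLogProcIds, long procId) {
    return !oneLogProcIds.contains(procId);
  }

  /** Fixed rule: delete only if the global tracker marks the procedure deleted. */
  static boolean fixedShouldDelete(GlobalTracker global, long procId) {
    return global.isDeleted(procId);
  }
}
```

Under the buggy rule, a live but long-idle procedure absent from a newer log gets its bits flipped to deleted, which later surfaces as a 'Corrupt' or vanished procedure on load.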
[jira] [Commented] (HBASE-21302) Release 1.2.8
[ https://issues.apache.org/jira/browse/HBASE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658161#comment-16658161 ] Hudson commented on HBASE-21302: Results for branch branch-1.2 [build #520 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//General_Nightly_Build_Report/] (/) {color:green}+1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//JDK7_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.2/520//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Release 1.2.8 > - > > Key: HBASE-21302 > URL: https://issues.apache.org/jira/browse/HBASE-21302 > Project: HBase > Issue Type: Task > Components: community >Affects Versions: 1.2.8 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Major > Fix For: 1.2.8 > > > 1.4.8 is out, time to make 1.2.8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658156#comment-16658156 ] Hadoop QA commented on HBASE-21224: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 11s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 15s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 18s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 49s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}132m 16s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 6s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.TestSyncReplicationStandbyKillMaster | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21224 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944882/HBASE-21224-master.002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 01f1e871aabd 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / ae5308ac4a | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14779/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14779/testReport/ | | Max. process+thread count | 4722 (vs. ulimit of 1) | | modules | C: hbase-server U:
[jira] [Resolved] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-21178. --- Resolution: Fixed Pushed to branch-2.1 and branch-2.0. > [BC break] : Get and Scan operation with a custom converter_class not working > - > > Key: HBASE-21178 > URL: https://issues.apache.org/jira/browse/HBASE-21178 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 2.0.0 >Reporter: Subrat Mishra >Assignee: Subrat Mishra >Priority: Critical > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21178.master.001.patch, > HBASE-21178.master.002.patch, HBASE-21178.master.003.patch > > > Consider a simple scenario: > {code:java} > create 'foo', {NAME => 'f1'} > put 'foo','r1','f1:a',1000 > get 'foo','r1',{COLUMNS => > ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']} > scan 'foo',{COLUMNS => > ['f1:a:c(org.apache.hadoop.hbase.util.Bytes).len']}{code} > Both get and scan fail with ERROR > {code:java} > ERROR: wrong number of arguments (3 for 1) {code} > Looks like converter_method in table.rb expects 3 arguments [(bytes, > offset, len)] since version 2.0.0; prior to version 2.0.0 it took only > 1 argument [(bytes)] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21178: -- Fix Version/s: 2.0.3 2.1.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21178) [BC break] : Get and Scan operation with a custom converter_class not working
[ https://issues.apache.org/jira/browse/HBASE-21178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang reopened HBASE-21178: --- The discussion here mentioned that the fix should be committed to all 2.x branches but we haven't committed to branch-2.1 and branch-2.0 yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
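The practical consequence of the BC break above is the arity change: since 2.0.0 the shell invokes a custom converter with (bytes, offset, len) rather than just (bytes). A hypothetical converter class written against the new calling convention, with a one-argument overload kept for pre-2.0 callers, might look like this. The `HexConverter` class and `toHex` method names are illustrative only and are not part of HBase.

```java
// Hypothetical custom converter compatible with the 2.0+ shell calling
// convention, where converter_method in table.rb passes (bytes, offset, len).
public class HexConverter {

  /** 3-argument form expected by the HBase shell since 2.0.0. */
  public static String toHex(byte[] bytes, int offset, int len) {
    StringBuilder sb = new StringBuilder();
    for (int i = offset; i < offset + len; i++) {
      // %02x on a Byte widens negative bytes correctly (e.g. (byte)0xff -> "ff")
      sb.append(String.format("%02x", bytes[i]));
    }
    return sb.toString();
  }

  /** Pre-2.0 single-argument form, delegating to the 3-argument overload. */
  public static String toHex(byte[] bytes) {
    return toHex(bytes, 0, bytes.length);
  }
}
```

Providing both overloads is one way a converter can survive the signature change on either side of the 2.0.0 boundary.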
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Attachment: HBASE-21355.v1.patch > HStore's storeSize is calculated repeatedly which causing the confusing > region split > - > > Key: HBASE-21355 > URL: https://issues.apache.org/jira/browse/HBASE-21355 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21355.v1.patch > > > When testing branch-2's write performance in our internal cluster, we > found that regions were being inexplicably split. > We use the default ConstantSizeRegionSplitPolicy with > hbase.hregion.max.filesize=40G, but a region will be split even if its > size is less than 40G (only ~6G). > Checking the code, I found that the following path accumulates the > store's storeSize to a very big value, because the path has no reset: > {code} > RsRpcServices#getRegionInfo > -> HRegion#isMergeable >-> HRegion#hasReferences > -> HStore#hasReferences > -> HStore#openStoreFiles > {code} > BTW, it seems we forget to maintain the read replica's storeSize when > refreshing the store files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
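The bug pattern in the call path above is an accumulate-without-reset: every pass through the open-store-files path adds the file sizes onto the existing total, so the split policy eventually sees a bogus size. A minimal sketch, using hypothetical names rather than the real HStore API:

```java
// Minimal sketch of the accumulate-without-reset pattern (hypothetical names,
// not the real HStore). Each long in fileSizes stands for one store file's size.
public class StoreSizeSketch {
  private long storeSize;

  /** Buggy: repeated calls (e.g. via hasReferences) keep inflating storeSize. */
  public long openStoreFilesBuggy(long[] fileSizes) {
    for (long s : fileSizes) {
      storeSize += s; // never reset, so the total grows on every call
    }
    return storeSize;
  }

  /** Fixed: recompute from scratch so the total reflects the actual files. */
  public long openStoreFilesFixed(long[] fileSizes) {
    long total = 0;
    for (long s : fileSizes) {
      total += s;
    }
    storeSize = total;
    return storeSize;
  }
}
```

With the buggy variant, a store holding 6G of files can report 40G+ after enough getRegionInfo calls, which matches the premature splits described in the issue.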
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21355: -- Component/s: regionserver -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21355: -- Priority: Blocker (was: Critical) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21334: -- Assignee: Duo Zhang Fix Version/s: 2.0.3 2.1.1 2.2.0 3.0.0 Status: Patch Available (was: Open) > TestMergeTableRegionsProcedure is flakey > > > Key: HBASE-21334 > URL: https://issues.apache.org/jira/browse/HBASE-21334 > Project: HBase > Issue Type: Bug > Components: amv2, proc-v2, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21334.patch, > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt > > > {noformat} > Error Message > found 5 corrupted procedure(s) on replay > Stacktrace > java.io.IOException: found 5 corrupted procedure(s) on replay > at > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658131#comment-16658131 ] Hadoop QA commented on HBASE-21354: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-2.0 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 56s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 51s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s{color} | {color:green} branch-2.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} branch-2.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 7s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s{color} | {color:red} hbase-procedure in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 32s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}157m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.procedure2.store.wal.TestWALProcedureStore | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 | | JIRA Issue | HBASE-21354 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12944880/HBASE-21354.branch-2.0.002.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 2f024bdade66 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2.0 / 5908655244 |
[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21334: -- Attachment: HBASE-21334.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-21224: Attachment: HBASE-21224-master.003.patch > Handle compaction queue duplication > --- > > Key: HBASE-21224 > URL: https://issues.apache.org/jira/browse/HBASE-21224 > Project: HBase > Issue Type: Improvement > Components: Compaction >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Minor > Attachments: HBASE-21224-master.001.patch, > HBASE-21224-master.002.patch, HBASE-21224-master.003.patch > > > Mentioned by [~allan163] that we may want to handle compaction queue > duplication in this Jira https://issues.apache.org/jira/browse/HBASE-18451 > Creating this item for further assessment and discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658125#comment-16658125 ] Zheng Hu edited comment on HBASE-21355 at 10/21/18 8:06 AM: bq. BTW, we seems forget to maintain the read replica's storeSize when refresh the store files. Oh, it's not true, in the HStore#refreshStoreFilesInternal, we will execute completeCompaction in the final, so the storeSize & totalUncompressedBytes will also be refreshed.. was (Author: openinx): bq. BTW, we seems forget to maintain the read replica's storeSize when refresh the store files. Oh, it's no ture, in the HStore#refreshStoreFilesInternal, we will execute completeCompaction in the final, so the storeSize & totalUncompressedBytes will also be refreshed.. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658125#comment-16658125 ] Zheng Hu commented on HBASE-21355: -- bq. BTW, we seems forget to maintain the read replica's storeSize when refresh the store files. Oh, it's no ture, in the HStore#refreshStoreFilesInternal, we will execute completeCompaction in the final, so the storeSize & totalUncompressedBytes will also be refreshed.. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21334: -- Component/s: test -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Description: When testing the branch-2's write performance in our internal cluster, we found that the region will be inexplicably split. We use the default ConstantSizeRegionSplitPolicy and hbase.hregion.max.filesize=40G,but the region will be split even if its bytes size is less than 40G(only ~6G). Checked the code, I found that the following path will accumulate the store's storeSize to a very big value, because the path has no reset.. {code} RsRpcServices#getRegionInfo -> HRegion#isMergeable -> HRegion#hasReferences -> HStore#hasReferences -> HStore#openStoreFiles {code} BTW, we seems forget to maintain the read replica's storeSize when refresh the store files. was: When testing the branch-2's write performance in our internal cluster, we found that the region will be inexplicably split. We use the default ConstantSizeRegionSplitPolicy and hbase.hregion.max.filesize=40G,but the region will be split even if its bytes size is less than 40G(only ~6G). Checked the code, I found that the following path will accumulate the store's storeSize to a very big value, because the path has no reset.. {code} RsRpcServices#getRegionInfo -> HRegion#isMergeable -> HRegion#hasReferences -> HStore#hasReferences -> HStore#openStoreFiles {code} BTW, we seems forget to maintain the read replica's storeSize when openStoreFiles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
[ https://issues.apache.org/jira/browse/HBASE-21355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21355: - Fix Version/s: 2.0.3 2.1.1 2.2.0 3.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21355) HStore's storeSize is calculated repeatedly which causing the confusing region split
Zheng Hu created HBASE-21355:
--------------------------------

             Summary: HStore's storeSize is calculated repeatedly which causing the confusing region split
                 Key: HBASE-21355
                 URL: https://issues.apache.org/jira/browse/HBASE-21355
             Project: HBase
          Issue Type: Bug
            Reporter: Zheng Hu
            Assignee: Zheng Hu

When testing branch-2's write performance in our internal cluster, we found that regions were being split inexplicably. We use the default ConstantSizeRegionSplitPolicy with hbase.hregion.max.filesize=40G, but a region was split even though its size was far below 40G (only ~6G).

Checking the code, I found that the following path accumulates the store's storeSize into a very large value, because nothing along the path resets it:

{code}
RsRpcServices#getRegionInfo
  -> HRegion#isMergeable
    -> HRegion#hasReferences
      -> HStore#hasReferences
        -> HStore#openStoreFiles
{code}

BTW, we seem to have forgotten to maintain the read replica's storeSize when openStoreFiles.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-21334) TestMergeTableRegionsProcedure is flakey
[ https://issues.apache.org/jira/browse/HBASE-21334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658121#comment-16658121 ]

Duo Zhang commented on HBASE-21334:
-----------------------------------

OK, I think this is a test issue. For MergeTableRegionsProcedure and SplitTableRegionProcedure, we schedule TRSPs to bring the regions back online, and since the MergeTableRegionsProcedure or SplitTableRegionProcedure still holds the lock while rolling back, the TRSPs can only execute after the rollback has finished. And since we have set kill-after-every-step, these TRSPs may also be affected. We have a piece of code in MasterProcedureTestingUtility to deal with this, but obviously it does not always work...

{code}
    if (waitForAsyncProcs) {
      // Sometimes there are other procedures still executing (including asynchronously spawned by
      // procId) and due to KillAndToggleBeforeStoreUpdate flag ProcedureExecutor is stopped before
      // store update. Let all pending procedures finish normally.
      if (!procExec.isRunning()) {
        LOG.warn("ProcedureExecutor not running, may have been stopped by pending procedure due to"
          + " KillAndToggleBeforeStoreUpdate flag.");
        ProcedureTestingUtility.setKillAndToggleBeforeStoreUpdate(procExec, false);
        restartMasterProcedureExecutor(procExec);
        ProcedureTestingUtility.waitNoProcedureRunning(procExec);
      }
    }
{code}

Let me think about how to make it more stable...
> TestMergeTableRegionsProcedure is flakey
> ----------------------------------------
>
>                 Key: HBASE-21334
>                 URL: https://issues.apache.org/jira/browse/HBASE-21334
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>         Attachments: org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure-output.txt
>
>
> {noformat}
> Error Message
> found 5 corrupted procedure(s) on replay
> Stacktrace
> java.io.IOException: found 5 corrupted procedure(s) on replay
> 	at org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:295)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
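The guard quoted in the comment above checks isRunning() only once, so a second asynchronously spawned procedure hitting the kill flag after the first restart still leaves the executor stopped. One way to make it sturdier is a retry loop that keeps restarting until the executor stays up. The sketch below uses a hypothetical stand-in Executor class (not the real ProcedureExecutor or MasterProcedureTestingUtility API) purely to show the loop shape:

```java
// Sketch of a retried waitForAsyncProcs guard. Executor is a hypothetical
// stand-in for ProcedureExecutor; restartAndDrain() stands in for
// restartMasterProcedureExecutor + waitNoProcedureRunning.
class AsyncProcWaitSketch {
    static class Executor {
        int pendingKills;        // how many more queued procedures will trip the kill flag
        boolean running = true;
        Executor(int pendingKills) { this.pendingKills = pendingKills; }
        void restartAndDrain() {
            if (pendingKills > 0) {
                pendingKills--;
                running = false; // another spawned procedure stopped the executor again
            } else {
                running = true;  // nothing left to kill; the executor stays up
            }
        }
    }

    // Instead of a single isRunning() check, keep restarting until the
    // executor survives a drain, or give up after maxRetries attempts.
    static int waitForAsyncProcs(Executor exec, int maxRetries) {
        int attempts = 0;
        while (!exec.running && attempts < maxRetries) {
            attempts++;
            exec.restartAndDrain();
        }
        if (!exec.running) {
            throw new IllegalStateException("executor still stopped after " + maxRetries + " retries");
        }
        return attempts;
    }

    public static void main(String[] args) {
        Executor exec = new Executor(3);
        exec.restartAndDrain(); // the first kill stops the executor
        int attempts = waitForAsyncProcs(exec, 10);
        System.out.println("recovered after " + attempts + " restart(s)");
    }
}
```

The design point is only that the recovery must be a loop rather than a single check, because each restart can release another TRSP that trips the kill flag again; whether the real fix took this shape is not stated in the thread.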
[jira] [Commented] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
[ https://issues.apache.org/jira/browse/HBASE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658115#comment-16658115 ]

Duo Zhang commented on HBASE-21354:
-----------------------------------

So this could only happen after restarting, and it also requires that we failed to write the trailer for a file (not only the newest file, and maybe this could also happen with multiple restarts?)? As when rolling, we reuse the storeTracker, so the storeTracker of a newer file will always contain all the procedures of an older file.

I think we also need to add this to the comment, to prevent someone from changing it back in the future. And we can also mention in the comment that this will not cause any procedure to become undeletable, because if a procedure has been deleted, we can always get this information through the newest storeTracker (the global one).

No other problems. Nice catch, the patch is great. Thanks our mighty [~allan163].

> Procedure may be deleted improperly during master restarts resulting in 'Corrupt'
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-21354
>                 URL: https://issues.apache.org/jira/browse/HBASE-21354
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21354.branch-2.0.001.patch, HBASE-21354.branch-2.0.002.patch
>
>
> Good news! [~stack], [~Apache9], I may have found the root cause of the
> mysterious 'Corrupted procedure' errors, and of procedures disappearing after
> master restarts (happens during ITBLL).
> During master restarts we load procedures from the logs and build the
> 'holdingCleanupTracker' according to each log's tracker. We may mark a
> procedure in the oldest log as deleted if a newer log does not contain the
> procedure. This is inappropriate, since a log will not contain info about a
> procedure if that procedure was not updated while the log was active. We can
> only delete the procedure if it is absent from the global tracker, which has
> the whole picture.
> {code}
> trackerNode = tracker.lookupClosestNode(trackerNode, procId);
> if (trackerNode == null || !trackerNode.contains(procId) ||
>     trackerNode.isModified(procId)) {
>   // the procedure was removed or modified
>   node.delete(procId);
> }
> {code}
> A test case (testProcedureShouldNotCleanOnLoad) in the patch shows cleanly
> how the corruption happens.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
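The cleanup rule being fixed above can be illustrated with a small sketch, using java.util.BitSet as a hypothetical stand-in for the procedure store trackers (this is a simplification, not the real tracker code): a procedure absent from a newer log's tracker may simply not have been updated while that log was active, so deletion has to be decided against the global tracker, which alone knows whether the procedure was really removed.

```java
import java.util.BitSet;

// Sketch of the holdingCleanupTracker decision, with BitSets standing in
// for the per-log and global procedure store trackers (hypothetical
// simplification, not the real HBase tracker implementation).
class TrackerCleanupSketch {
    // Buggy rule: a procedure in the oldest log is considered deletable if
    // some newer log's tracker does not contain it.
    static boolean buggyDeletable(int procId, BitSet[] newerLogTrackers) {
        for (BitSet tracker : newerLogTrackers) {
            if (!tracker.get(procId)) {
                // Wrong: the procedure may simply not have been updated
                // while this log was active.
                return true;
            }
        }
        return false;
    }

    // Fixed rule: only the global tracker, which has the whole picture,
    // decides whether the procedure was actually deleted.
    static boolean fixedDeletable(int procId, BitSet globalTracker) {
        return !globalTracker.get(procId);
    }

    public static void main(String[] args) {
        int procId = 7;
        BitSet newerLog = new BitSet(); // proc 7 was idle while this log was active
        BitSet global = new BitSet();
        global.set(procId);             // ...but it is still alive globally
        // The buggy rule would delete a live procedure ("corrupt" on replay);
        // the fixed rule keeps it.
        System.out.println("buggy deletable: " + buggyDeletable(procId, new BitSet[] { newerLog }));
        System.out.println("fixed deletable: " + fixedDeletable(procId, global));
    }
}
```

Deleting a still-live procedure this way is exactly what surfaces later as "found N corrupted procedure(s) on replay", since its surviving log entries then reference a procedure the cleanup already discarded.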
[jira] [Updated] (HBASE-21224) Handle compaction queue duplication
[ https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xu Cang updated HBASE-21224:
----------------------------
    Attachment: HBASE-21224-master.002.patch

> Handle compaction queue duplication
> -----------------------------------
>
>                 Key: HBASE-21224
>                 URL: https://issues.apache.org/jira/browse/HBASE-21224
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Xu Cang
>            Assignee: Xu Cang
>            Priority: Minor
>         Attachments: HBASE-21224-master.001.patch, HBASE-21224-master.002.patch
>
>
> [~allan163] mentioned in https://issues.apache.org/jira/browse/HBASE-18451
> that we may want to handle compaction queue duplication.
> Creating this item for further assessment and discussion.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)