[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712419#comment-16712419 ] Hadoop QA commented on HBASE-21564:
---
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 49s{color} | {color:red} hbase-server generated 6 new + 182 unchanged - 6 fixed = 188 total (was 188) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s{color} | {color:red} hbase-server: The patch generated 6 new + 1 unchanged - 0 fixed = 7 total (was 1) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 50s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 15s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}148m 12s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 5s{color} | {color:red} hbase-backup in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 0s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}241m 35s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.replication.TestTransitPeerSyncReplicationStateProcedureRetry |
| | hadoop.hbase.fs.TestBlockReorderMultiBlocks |
| | hadoop.hbase.client.replication.TestReplicationAdmin |
| | hadoop.hbase.replication.TestSyncReplicationStandBy |
| | hadoop.hbase.replication.TestReplicationSmallTestsSync |
| | hadoop.hbase.replication.TestSerialSyncReplication |
| | hadoop.hbase.replication.TestReplicationChangingPeerRegionservers |
| | hadoop.hbase.replication.TestReplicationSmallTests |
| | hadoop.hbase.client.TestAsyncClusterAdminApi |
| | hadoop.hbase.replication.TestAddToSerialReplicationPeer |
| | hadoop.hbase.replication.
[jira] [Updated] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Cang updated HBASE-21553: Attachment: HBASE-21553-branch-1.001.patch Status: Patch Available (was: Open) > schedLock not released in MasterProcedureScheduler > -- > > Key: HBASE-21553 > URL: https://issues.apache.org/jira/browse/HBASE-21553 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > Attachments: HBASE-21553-branch-1.001.patch > > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749 > As shown above, we didn't unlock schedLock, which can cause a deadlock. > Besides this, there are other places in this class that handle schedLock.unlock > in a risky manner. I'd like to move them to finally blocks to improve the > robustness of lock handling. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
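The fix proposed above — moving every `schedLock.unlock` into a finally block — can be sketched as follows. This is an illustrative toy example, not the actual MasterProcedureScheduler code; the class and method names are hypothetical:

```java
import java.util.concurrent.locks.ReentrantLock;

// Releasing the lock in a finally block guarantees it is freed even when
// the guarded section returns early or throws, so a later caller cannot
// deadlock waiting on a lock that was never released.
public class LockDemo {
    private static final ReentrantLock schedLock = new ReentrantLock();

    static boolean doSchedWork(boolean fail) {
        schedLock.lock();
        try {
            if (fail) {
                // Without the finally block below, this exception path
                // would leave schedLock held forever.
                throw new IllegalStateException("simulated failure");
            }
            return true;
        } finally {
            schedLock.unlock(); // always runs, on every exit path
        }
    }

    public static void main(String[] args) {
        try {
            doSchedWork(true);
        } catch (IllegalStateException expected) {
            // the lock was still released by the finally block
        }
        // a second acquisition succeeds instead of deadlocking
        System.out.println(doSchedWork(false)); // prints "true"
    }
}
```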
[jira] [Resolved] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-21566. --- Resolution: Fixed Fix Version/s: 2.1.2 Pushed the appropriate changes and releasenotes on branch-2.0 and branch-2.1. > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.1.2, 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21566: -- Summary: Release notes and changes for 2.0.4RC0 and 2.1.2RC0 (was: Release notes and changes for 2.0.4RC0 and 2.1.1RC0) > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.2RC0
[ https://issues.apache.org/jira/browse/HBASE-21566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21566: -- Description: $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown was: $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown > Release notes and changes for 2.0.4RC0 and 2.1.2RC0 > --- > > Key: HBASE-21566 > URL: https://issues.apache.org/jira/browse/HBASE-21566 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l > --sortorder=newer --skip-credits > $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.2 -l > --sortorder=newer --skip-credits > ... using yetus tagged 0.8.0 > ...then carefully stitched the product into the current CHANGES.md and > RELEASENOTES.md files being careful to preserve markdown header ABOVE the > apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21566) Release notes and changes for 2.0.4RC0 and 2.1.1RC0
stack created HBASE-21566: - Summary: Release notes and changes for 2.0.4RC0 and 2.1.1RC0 Key: HBASE-21566 URL: https://issues.apache.org/jira/browse/HBASE-21566 Project: HBase Issue Type: Sub-task Components: release Reporter: stack Assignee: stack Fix For: 2.0.4 $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.0.4 -l --sortorder=newer --skip-credits $ ./release-doc-maker/releasedocmaker.py -p HBASE --fileversions -v 2.1.1 -l --sortorder=newer --skip-credits ... using yetus tagged 0.8.0 ...then carefully stitched the product into the current CHANGES.md and RELEASENOTES.md files being careful to preserve markdown header ABOVE the apache license else the .md files won't render as markdown -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI
[ https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21554: --- Attachment: HBASE-21554.branch-2.001.patch > Show replication endpoint classname for replication peer on master web UI > - > > Key: HBASE-21554 > URL: https://issues.apache.org/jira/browse/HBASE-21554 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-21554.branch-2.001.patch, > HBASE-21554.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21554) Show replication endpoint classname for replication peer on master web UI
[ https://issues.apache.org/jira/browse/HBASE-21554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712358#comment-16712358 ] Zheng Hu commented on HBASE-21554: -- +1 > Show replication endpoint classname for replication peer on master web UI > - > > Key: HBASE-21554 > URL: https://issues.apache.org/jira/browse/HBASE-21554 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Minor > Attachments: HBASE-21554.master.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21413: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to branch-2.0+. Andrew made an issue for backport to branch-1. Pushing so I can work on an RC. Thanks [~allan163] > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart the whole cluster, there is a splitting directory that still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21413: -- Fix Version/s: 2.2.0 > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart the whole cluster, there is a splitting directory that still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21559: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolving so can put up an RC. Can open new issue if still a prob. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712338#comment-16712338 ] stack commented on HBASE-21559: --- Nice fix [~openinx]. Running it locally, I can't make it hang anymore. Thanks. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a deadlock between SplitTableRegionProcedure > and SnapshotProcedure. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712337#comment-16712337 ] ramkrishna.s.vasudevan commented on HBASE-21563: I think we have seen similar issues earlier with these encoders? I don't remember if they got fixed. > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered an issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting it. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMN CELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePre
[jira] [Resolved] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
[ https://issues.apache.org/jira/browse/HBASE-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-21562. --- Resolution: Duplicate Resolve. Duplicate of HBASE-21559. Thanks [~Apache9] > TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are > flakey > --- > > Key: HBASE-21562 > URL: https://issues.apache.org/jira/browse/HBASE-21562 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Priority: Major > Fix For: 2.1.2 > > > Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and > branch-2.0 at least. > Looking, it's a bit tough figuring out what is going on. The test asks us to split > regions. The split starts then hangs. Last thing reported is: > 2018-12-06 10:20:30,823 INFO [PEWorker-16] > procedure.MasterProcedureScheduler(741): Took xlock for pid=174, > state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure > table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, > parent=034bb3ebb3f9a7442f927caacdda5354, > daughterA=fbe392ca659b3913181d05ac4fb19b4c, > daughterB=3646ac333722af33c32e6f3428d23f95 > ... then all we get is that the worker is stuck. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-21559: -- Fix Version/s: (was: 2.0.5) 2.2.0 > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21549) Add shell command for serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21549: --- Attachment: HBASE-21549.branch-2.001.patch > Add shell command for serial replication peer > - > > Key: HBASE-21549 > URL: https://issues.apache.org/jira/browse/HBASE-21549 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21549.branch-2.001.patch, > HBASE-21549.master.001.patch, HBASE-21549.master.002.patch, > HBASE-21549.master.003.patch > > > add_peer support add a serial replication peer directly. > set_peer_serial support change a replication peer's serial flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712323#comment-16712323 ] Hudson commented on HBASE-21559: Results for branch branch-2.1 [build #664 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/664//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. 
(/) {color:green}+1 client integration test{color} > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712307#comment-16712307 ] Duo Zhang commented on HBASE-21564: --- Thanks for the nice finding. Can apply the fix for now. And I think we should redesign the log roller; at least we should use different threads for different WALs... > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in the finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
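The race described in the issue body — a roll request arriving between the writer swap and the blind `rollLog = false` reset in the finally block — can be sketched with a minimal toy model. The names below are hypothetical, not the actual LogRoller/AsyncFSWAL code; the point is that consuming the flag with a compare-and-set cannot lose a request that arrives mid-roll:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of atomic roll-request acknowledgment. A blind set(false)
// in a finally block after the roll finishes would wipe out a request
// made while the roll was in progress; a CAS-based consume only clears
// the request it actually observed.
public class RollFlagDemo {
    private final AtomicBoolean rollLog = new AtomicBoolean(false);

    /** WAL side: ask the roller for a roll. */
    void requestRoll() {
        rollLog.set(true);
    }

    /** Roller side: atomically consume one pending request. A request
     *  arriving after this CAS stays pending for the next loop pass. */
    boolean consumeRequest() {
        return rollLog.compareAndSet(true, false);
    }

    boolean pending() {
        return rollLog.get();
    }

    public static void main(String[] args) {
        RollFlagDemo roller = new RollFlagDemo();
        roller.requestRoll();
        roller.consumeRequest(); // roller starts rolling for this request
        roller.requestRoll();    // WAL asks again while the roll is in flight
        // The buggy pattern would now reset the flag in a finally block and
        // lose the second request; with CAS consumption it is still visible:
        System.out.println(roller.pending()); // prints "true"
    }
}
```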
[jira] [Created] (HBASE-21565) Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server
Jingyun Tian created HBASE-21565: Summary: Delete dead server from dead server list too early leads to concurrent Server Crash Procedures(SCP) for a same server Key: HBASE-21565 URL: https://issues.apache.org/jira/browse/HBASE-21565 Project: HBase Issue Type: Bug Reporter: Jingyun Tian Assignee: Jingyun Tian There are 2 kinds of SCP that can be scheduled for the same server during cluster restart: one is triggered by ZK session timeout, the other when a new server reporting in causes the stale one to be failed over. The only barrier between these 2 kinds of SCP is the check of whether the server is in the dead server list. {code} if (this.deadservers.isDeadServer(serverName)) { LOG.warn("Expiration called on {} but crash processing already in progress", serverName); return false; } {code} But the problem is that when the master finishes initialization, it will delete all stale servers from the dead server list. Thus when the SCP for the ZK session timeout comes in, the barrier is already removed. Here are the logs showing how this problem occurs. {code} 2018-12-07,11:42:37,589 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=9, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false 2018-12-07,11:42:58,007 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=444, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false {code} Now we can see two SCP are scheduled for the same server. But the first procedure finishes after the second SCP starts. {code} 2018-12-07,11:43:08,038 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=9, state=SUCCESS, hasLock=false; ServerCrashProcedure server=c4-hadoop-tst-st27.bj,29100,1544153846859, splitWal=true, meta=false in 30.5340sec {code} Thus it leads to the problem that regions will be assigned twice. 
{code} 2018-12-07,12:16:33,039 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: rit=OPEN, location=c4-hadoop-tst-st28.bj,29100,1544154149607, table=test_failover, region=459b3130b40caf3b8f3e1421766f4089 reported OPEN on server=c4-hadoop-tst-st29.bj,29100,1544154149615 but state has otherwise {code} And here we can see the server is removed from the dead server list before the second SCP starts. {code} 2018-12-07,11:42:44,938 DEBUG org.apache.hadoop.hbase.master.DeadServer: Removed c4-hadoop-tst-st27.bj,29100,1544153846859 ; numProcessing=3 {code} Thus we should not delete the dead server from the dead server list immediately. A patch to fix this problem will be uploaded later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
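The failure sequence above can be modeled with a small sketch. This is a toy illustration with hypothetical names, not the real DeadServer/ServerCrashProcedure code: the dead-server set is the only barrier against scheduling a second crash procedure for the same server, so removing the entry before the late expiration arrives re-opens the race.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the dead-server barrier. expireServer() stands in for the
// master's server-expiration path that schedules an SCP.
public class DeadServerDemo {
    private final Set<String> deadServers = ConcurrentHashMap.newKeySet();

    /** @return true if a (simulated) SCP would be scheduled. */
    boolean expireServer(String serverName) {
        if (deadServers.contains(serverName)) {
            // barrier holds: crash processing already in progress
            return false;
        }
        deadServers.add(serverName);
        return true;
    }

    /** Simulates the premature cleanup of stale servers at master init. */
    void removeFromDeadList(String serverName) {
        deadServers.remove(serverName);
    }

    public static void main(String[] args) {
        DeadServerDemo master = new DeadServerDemo();
        String server = "host1,29100,1544153846859";
        master.expireServer(server);       // first SCP (e.g. ZK session timeout)
        master.removeFromDeadList(server); // stale entry cleared too early
        // the barrier is gone, so a duplicate SCP is scheduled:
        System.out.println(master.expireServer(server)); // prints "true"
    }
}
```

Keeping the entry in the set until crash processing actually completes (rather than at master-init cleanup) is what makes the second `expireServer` call return false.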
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712311#comment-16712311 ] Hudson commented on HBASE-21559: Results for branch branch-2.0 [build #1143 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1143//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. 
> Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712302#comment-16712302 ] stack commented on HBASE-21564: --- bq. reaching target size causes all WALs to roll... all WALs ... you mean the user-space WAL and meta WAL? If so, that sounds wrong. Unintentional. This looks like a gnarly bug. Good find. You can repro it [~sershe]? Thanks. > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712290#comment-16712290 ] Sergey Shelukhin commented on HBASE-21564: -- [~stack] do you remember why on WAL reaching target size causes all WALs to roll (in normal non-multi-wal case, only meta wal will be affected)? See LogRoller walNeedsToRoll map before this patch - in normal case, the value is set to true for a particular WAL when requesting a WAL roll based on size, but when actually rolling WALs in run() it's not used as a filter but merely as a value for "force" flag and all WALs are rolled. It seems like a random thing to do, esp. if using multi-wal. > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712283#comment-16712283 ] Sergey Shelukhin commented on HBASE-21564: -- This patch should fix the issue (I'll test on an internal repro); it also changes log roll request to rely on future-s instead of looping (since with roll requests arriving frequently, the way the wait... is implemented it may never return because some log will always be rolling based on size) > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Attachment: HBASE-21564.master.001.patch > race condition in WAL rolling > - > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling resulting in size-based rolling getting stuck
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Summary: race condition in WAL rolling resulting in size-based rolling getting stuck (was: race condition in WAL rolling) > race condition in WAL rolling resulting in size-based rolling getting stuck > --- > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21564) race condition in WAL rolling
[ https://issues.apache.org/jira/browse/HBASE-21564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21564: - Status: Patch Available (was: Open) > race condition in WAL rolling > - > > Key: HBASE-21564 > URL: https://issues.apache.org/jira/browse/HBASE-21564 > Project: HBase > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HBASE-21564.master.001.patch > > > Manifests at least with AsyncFsWriter. > There's a window after LogRoller replaces the writer in the WAL, but before > it sets the rollLog boolean to false in the finally, where the WAL class can > request another log roll (it can happen in particular when the logs are > getting archived in the LogRoller thread, and there's high write volume > causing the logs to roll quickly). > LogRoller will blindly reset the rollLog flag in finally and "forget" about > this request. > AsyncWAL in turn never requests it again because its own rollRequested field > is set and it expects a callback. Logs don't get rolled until a periodic roll > is triggered after that. > The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21564) race condition in WAL rolling
Sergey Shelukhin created HBASE-21564: Summary: race condition in WAL rolling Key: HBASE-21564 URL: https://issues.apache.org/jira/browse/HBASE-21564 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Manifests at least with AsyncFsWriter. There's a window after LogRoller replaces the writer in the WAL, but before it sets the rollLog boolean to false in the finally, where the WAL class can request another log roll (it can happen in particular when the logs are getting archived in the LogRoller thread, and there's high write volume causing the logs to roll quickly). LogRoller will blindly reset the rollLog flag in finally and "forget" about this request. AsyncWAL in turn never requests it again because its own rollRequested field is set and it expects a callback. Logs don't get rolled until a periodic roll is triggered after that. The acknowledgment of roll requests by LogRoller should be atomic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
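The lost-request race in the description can be sketched as follows. This is a hedged, hypothetical model (not the real LogRoller/WAL code): the buggy pattern resets the flag in a finally block after rolling, wiping out any request that arrived mid-roll, while acknowledging the request atomically before rolling preserves it, which is the "atomic acknowledgment" the issue asks for.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the roll-request race, not the actual HBase classes.
public class RollFlagSketch {
    final AtomicBoolean rollLog = new AtomicBoolean(false);

    void requestRoll() {
        rollLog.set(true);
    }

    // Buggy pattern: check the flag, roll, then blindly reset in finally.
    // A request arriving while the roll is in progress is forgotten.
    void buggyRunOnce(Runnable duringRoll) {
        if (!rollLog.get()) {
            return;
        }
        try {
            duringRoll.run(); // writer replaced; a new roll request may arrive here
        } finally {
            rollLog.set(false); // wipes out the mid-roll request
        }
    }

    // Fixed pattern: acknowledge the request atomically BEFORE rolling, so a
    // request arriving mid-roll re-sets the flag and is seen next iteration.
    void fixedRunOnce(Runnable duringRoll) {
        if (!rollLog.compareAndSet(true, false)) {
            return;
        }
        duringRoll.run();
    }

    public static void main(String[] args) {
        RollFlagSketch buggy = new RollFlagSketch();
        buggy.requestRoll();
        buggy.buggyRunOnce(buggy::requestRoll);
        System.out.println("buggy roller pending request: " + buggy.rollLog.get());

        RollFlagSketch fixed = new RollFlagSketch();
        fixed.requestRoll();
        fixed.fixedRunOnce(fixed::requestRoll);
        System.out.println("fixed roller pending request: " + fixed.rollLog.get());
    }
}
```

With the buggy roller the mid-roll request vanishes, matching the symptom where size-based rolling stalls until the periodic roll fires; with the CAS-first roller the request survives to the next loop iteration.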
[jira] [Updated] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-21514: --- Description: # move the global cache instances from CacheConfig to BlockCacheFactory. Only keep config stuff in CacheConfig. # Move block cache to HRegionServer's member variable. One rs has one block cache. was: # move the global cache instances from CacheConfig to BlockCacheFactory. Only keep config stuff in CacheConfig. # Move block cache to HRegionServer's member variable. One rs has one block cache. # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory. As there are some unit tests which don't start a mini cluster. But want to use block cache, too. > Refactor CacheConfig > > > Key: HBASE-21514 > URL: https://issues.apache.org/jira/browse/HBASE-21514 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21514.master.001.patch, > HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, > HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, > HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, > HBASE-21514.master.008.patch, HBASE-21514.master.009.patch > > > # move the global cache instances from CacheConfig to BlockCacheFactory. Only > keep config stuff in CacheConfig. > # Move block cache to HRegionServer's member variable. One rs has one block > cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
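The refactor direction in the description can be sketched as follows; a hedged illustration with hypothetical class bodies (only BlockCacheFactory, CacheConfig, and HRegionServer are names from the issue): cache construction moves into a factory, and each region server owns its single cache instance rather than reaching for a global static.

```java
// Hypothetical sketch of the HBASE-21514 structure; bodies are illustrative.
public class BlockCacheSketch {
    public interface BlockCache {}
    public static final class LruBlockCache implements BlockCache {}

    // Construction logic lives in the factory; CacheConfig keeps config only.
    public static final class BlockCacheFactory {
        public static BlockCache createBlockCache() {
            return new LruBlockCache();
        }
    }

    // One region server owns one block cache as a member variable.
    public static final class RegionServer {
        private final BlockCache blockCache = BlockCacheFactory.createBlockCache();
        public BlockCache getBlockCache() {
            return blockCache;
        }
    }

    public static void main(String[] args) {
        RegionServer rs1 = new RegionServer();
        RegionServer rs2 = new RegionServer();
        System.out.println("stable per-RS cache: " + (rs1.getBlockCache() == rs1.getBlockCache()));
        System.out.println("no shared global cache: " + (rs1.getBlockCache() != rs2.getBlockCache()));
    }
}
```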
[jira] [Commented] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712259#comment-16712259 ] Guanghao Zhang commented on HBASE-21514: The javac warning was not introduced by this patch. > Refactor CacheConfig > > > Key: HBASE-21514 > URL: https://issues.apache.org/jira/browse/HBASE-21514 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21514.master.001.patch, > HBASE-21514.master.002.patch, HBASE-21514.master.003.patch, > HBASE-21514.master.004.patch, HBASE-21514.master.005.patch, > HBASE-21514.master.006.patch, HBASE-21514.master.007.patch, > HBASE-21514.master.008.patch, HBASE-21514.master.009.patch > > > # move the global cache instances from CacheConfig to BlockCacheFactory. Only > keep config stuff in CacheConfig. > # Move block cache to HRegionServer's member variable. One rs has one block > cache. > # Still keep GLOBAL_BLOCK_CACHE_INSTANCE in BlockCacheFactory. As there are > some unit tests which don't start a mini cluster. But want to use block > cache, too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712258#comment-16712258 ] Allan Yang commented on HBASE-21413: [~apurtell], sure, I can do the backport > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restart whole cluster, there is a splitting directory still exists on > hdfs. Then I found there is only an empty meta wal file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712243#comment-16712243 ] Zheng Hu commented on HBASE-21551: -- Thanks [~busbey] for the release note. > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
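The leak pattern in the description above can be sketched as follows; a minimal hypothetical model (not the real StoreFileReader/HStoreFile code): each STREAM scan registers its reader in a shared concurrent map, and the bug is that nothing removes the entry until the store file itself is closed, so repeated scans grow the map without bound.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the streamReaders leak; names are illustrative.
public class StreamReaderRegistry {
    private final Set<Object> streamReaders = ConcurrentHashMap.newKeySet();

    // Each STREAM scan registers a reader. Without the remove-on-close below,
    // entries accumulate for the lifetime of the store file.
    public class Reader implements AutoCloseable {
        public Reader() {
            streamReaders.add(this);
        }
        @Override
        public void close() {
            streamReaders.remove(this); // the deregistration step the bug was missing
        }
    }

    public int openReaders() {
        return streamReaders.size();
    }

    public static void main(String[] args) {
        StreamReaderRegistry registry = new StreamReaderRegistry();
        try (Reader r = registry.new Reader()) {
            System.out.println("open readers during scan: " + registry.openReaders());
        }
        System.out.println("open readers after close: " + registry.openReaders());
    }
}
```

With deregistration tied to the reader's own close, the map size returns to zero after each scan instead of growing until full GC pauses appear.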
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712241#comment-16712241 ] Zheng Hu commented on HBASE-21563: -- Let me take a look. > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting the issue. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug on a corner case for FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMNCELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:677) > {noformat} > I have attached the HFile related to this issue for
[jira] [Resolved] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey resolved HBASE-21551. - Resolution: Fixed Release Note: ### Summary HBase clusters will experience Region Server failures from out-of-memory errors caused by a leak when any of the following occur: * User initiates Scan operations set to use the STREAM reading type * User initiates Scan operations set to use the default reading type that read more than 4 * the block size of column families involved in the scan (e.g. by default 4*64KiB) * Compactions run ### Root cause When there are long-running scans, the Region Server process attempts to optimize access by using a different API geared towards sequential access. Due to an error in HBASE-20704 for HBase 2.0+, the Region Server fails to release related resources when those scans finish. That same optimization path is always used for the HBase internal file compaction process. ### Workaround Impact for this error can be minimized by setting the config value "hbase.storescanner.pread.max.bytes" to MAX_INT to avoid the optimization for default user scans. Clients should also be checked to ensure they do not pass the STREAM read type to the Scan API. This will have a severe impact on performance for long scans. Compactions always use this sequential optimized reading mechanism, so downstream users will need to periodically restart Region Server roles after compactions have happened. 
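The workaround in the release note can be expressed as an hbase-site.xml fragment. This is a sketch under the note's own assumptions (the property name comes from the release note; verify it against the documentation for your HBase version before applying, and note the stated performance cost for long scans):

```xml
<!-- Workaround sketch: raise the pread-to-stream switch-over threshold so
     default user scans avoid the leaking STREAM path.
     2147483647 is Integer.MAX_VALUE, i.e. the MAX_INT the release note names. -->
<property>
  <name>hbase.storescanner.pread.max.bytes</name>
  <value>2147483647</value>
</property>
```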
> Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey reopened HBASE-21551: - reopening so I can add a release note > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. we can see the heap dump in the attached heap-dump.jpg. > I found this bug, because when i benchmark the scan performance by using YCSB > in a cluster (heap size of RS is 50g), the Rs was easy to occur a long time > full gc ( ~ 110 sec) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712174#comment-16712174 ] Duo Zhang commented on HBASE-21559: --- Pushed to branch-2.0+. Let's see how it works. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
[ https://issues.apache.org/jira/browse/HBASE-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712166#comment-16712166 ] Duo Zhang commented on HBASE-21562: --- HBASE-21559? > TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are > flakey > --- > > Key: HBASE-21562 > URL: https://issues.apache.org/jira/browse/HBASE-21562 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Priority: Major > Fix For: 2.1.2 > > > Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and > branch-2.0 at least. > Looking its a bit tough figuring what is going on. Test asks us split > regions. The split starts then hangs. Last thing reported is: > 2018-12-06 10:20:30,823 INFO [PEWorker-16] > procedure.MasterProcedureScheduler(741): Took xlock for pid=174, > state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure > table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, > parent=034bb3ebb3f9a7442f927caacdda5354, > daughterA=fbe392ca659b3913181d05ac4fb19b4c, > daughterB=3646ac333722af33c32e6f3428d23f95 > ... then all we get is that the worker is stuck. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-21414: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the patch! > StoreFileSize growth rate metric > > > Key: HBASE-21414 > URL: https://issues.apache.org/jira/browse/HBASE-21414 > Project: HBase > Issue Type: Improvement > Components: metrics, monitoring >Reporter: Tommy Li >Assignee: Tommy Li >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-21414.master.001.patch, > HBASE-21414.master.002.patch, HBASE-21414.master.003.patch > > > A metric on the growth rate of storefile sizes would be nice to have as a way > of monitoring traffic patterns. I know you can get the same insight from > graphing the delta on the storeFileSize metric, but not all metrics > visualization tools support that -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712047#comment-16712047 ] Hadoop QA commented on HBASE-21414: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 29s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 47s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 25s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop-compat {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} The patch passed checkstyle in hbase-hadoop2-compat {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} hbase-server: The patch generated 0 new + 3 unchanged - 2 fixed = 3 total (was 5) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 10s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}122m 14s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}165m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21414 | | JIRA Patch URL | https://issues.apache.org/jira
[jira] [Commented] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-21563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712031#comment-16712031 ] stack commented on HBASE-21563: --- [~openinx] Is this like your HBASE-21379? In a different location? Different encoding? > HBase Get Encounters java.lang.IndexOutOfBoundsException > > > Key: HBASE-21563 > URL: https://issues.apache.org/jira/browse/HBASE-21563 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 1.2.0 >Reporter: William Shen >Priority: Major > Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz > > > We've recently encountered an issue retrieving data from our HBase cluster, and > have not had much luck troubleshooting it. We narrowed down our issue > to a single GET, which appears to be caused by FastDiffDeltaEncoder.java > running into java.lang.IndexOutOfBoundsException. > Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? > We are running 1.2.0-cdh5.9.2, and the GET in question is: > {noformat} > hbase(main):004:0> get 'qa2.ADGROUPS', > "\x05\x80\x00\x00\x00\x00\x1F\x54\x9C\x80\x00\x00\x00\x00\x1C\x7D\x45\x00\x04\x80\x00\x00\x00\x00\x1D\x0F\x19\x80\x00\x00\x00\x00\x4A\x64\x6F\x80\x00\x00\x00\x01\xD9\xDB\xCE" > COLUMNCELL > > > ERROR: java.io.IOException > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkBounds(Buffer.java:567) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:149) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decode(FastDiffDeltaEncoder.java:465) > at > org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder$1.decodeNext(FastDiffDeltaEncoder.java:516) > at > 
org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$BufferedEncodedSeeker.next(BufferedDataBlockEncoder.java:618) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1277) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:180) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:108) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5706) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5865) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5643) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5620) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5606) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6801) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6779) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2029) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more {noformat} > Likewise, running {{ hbase hfile -f -p }} on the specific hfile, a subset of > kv pairs were printed until the program hits the following exception and > crashes: > {noformat} > Exception in thread "main" java.lang.RuntimeException: Unknown code 65 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:259) > at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1246) > at > org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder$ClonedSeekerState.toString(BufferedDataBlockEncoder.java:506) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.scanKeysValues(HFilePrettyPrinter.java:382) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.processFile(HFilePrettyPrinter.java:316) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.run(HFilePrettyPrinter.java:255) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.hbase.io.hfile.HFilePrettyPrinter.main(HFilePrettyPrinter.java:677) > {noformat}
[jira] [Created] (HBASE-21563) HBase Get Encounters java.lang.IndexOutOfBoundsException
William Shen created HBASE-21563: Summary: HBase Get Encounters java.lang.IndexOutOfBoundsException Key: HBASE-21563 URL: https://issues.apache.org/jira/browse/HBASE-21563 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 1.2.0 Reporter: William Shen Attachments: 67a04bc049be4f58afecdcc0a3ba62ca.tar.gz We've recently encountered an issue retrieving data from our HBase cluster, and have not had much luck troubleshooting it. We narrowed down our issue to a single GET, which appears to be caused by FastDiffDeltaEncoder.java running into java.lang.IndexOutOfBoundsException. Perhaps there is a bug in a corner case of FastDiffDeltaEncoder? We are running 1.2.0-cdh5.9.2; the failing GET and the full stack traces are quoted in the comment above. I have attached the HFile related to this issue for debugging.
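The IndexOutOfBoundsException in both traces originates in java.nio.Buffer.checkBounds, which fires when a bulk get() is asked to copy more bytes than the destination array can hold — consistent with the decoder computing a bogus length from a corrupt or mis-encoded block. The JDK behavior can be shown in isolation (a generic illustration, not the HBase decode path):

```java
import java.nio.ByteBuffer;

public class BufferBounds {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16); // plenty of bytes remaining
        byte[] dest = new byte[4];                // but only a 4-byte destination
        try {
            // Requesting 8 bytes into a 4-byte array fails the
            // Buffer.checkBounds test, matching the stack trace above.
            buf.get(dest, 0, 8);
            System.out.println("unexpected success");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException, as in the report");
        }
    }
}
```

(Had the destination been large enough but the buffer short, the JDK would instead throw BufferUnderflowException — so the exception type itself points at a bad decoded length, not a short block.)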
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711921#comment-16711921 ] Hadoop QA commented on HBASE-21559: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 10s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}225m 27s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}271m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950848/HBASE-21559.v2.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux daa900739133 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15211/testReport/ | | Max. process+thread count | 5021 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15
[jira] [Created] (HBASE-21562) TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey
stack created HBASE-21562: - Summary: TestRestoreSnapshotFromClientAfterSplittingRegions and related tests are flakey Key: HBASE-21562 URL: https://issues.apache.org/jira/browse/HBASE-21562 Project: HBase Issue Type: Bug Components: test Reporter: stack Fix For: 2.1.2 Fails 60% of the time on GCE runs. Messes up our nightlies for branch-2.1 and branch-2.0 at least. Looking at it, it's a bit tough figuring out what is going on. The test asks us to split regions. The split starts, then hangs. The last thing reported is: 2018-12-06 10:20:30,823 INFO [PEWorker-16] procedure.MasterProcedureScheduler(741): Took xlock for pid=174, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testRestoreSnapshotAfterSplittingRegions_1__regionReplication_3_-1544120421990, parent=034bb3ebb3f9a7442f927caacdda5354, daughterA=fbe392ca659b3913181d05ac4fb19b4c, daughterB=3646ac333722af33c32e6f3428d23f95 ... then all we get is that the worker is stuck.
[jira] [Commented] (HBASE-21453) Convert ReadOnlyZKClient to DEBUG instead of INFO
[ https://issues.apache.org/jira/browse/HBASE-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711889#comment-16711889 ] Peter Somogyi commented on HBASE-21453: --- You're right that this Jira is really focused on ReadOnlyZKClient class. Let's have this as it is now and have a follow up issue for other zookeeper related log messages. +1 on this patch > Convert ReadOnlyZKClient to DEBUG instead of INFO > - > > Key: HBASE-21453 > URL: https://issues.apache.org/jira/browse/HBASE-21453 > Project: HBase > Issue Type: Bug > Components: logging, Zookeeper >Reporter: stack >Assignee: Sakthi >Priority: Major > Attachments: hbase-21453.master.001.patch > > > Running commands in spark-shell, this is what it looks like on each > invocation: > {code} > scala> val count = rdd.count() > 2018-11-07 21:01:46,026 INFO [Executor task launch worker for task 1] > zookeeper.ReadOnlyZKClient: Connect 0x18f3d868 to localhost:2181 with session > timeout=9ms, retries 30, retry interval 1000ms, keepAlive=6ms > 2018-11-07 21:01:46,027 INFO [ReadOnlyZKClient-localhost:2181@0x18f3d868] > zookeeper.ZooKeeper: Initiating client connection, > connectString=localhost:2181 sessionTimeout=9 > watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$20/1362339879@743dab9f > 2018-11-07 21:01:46,030 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Opening socket connection to server > localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL > (unknown error) > 2018-11-07 21:01:46,031 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Socket connection established to > localhost/127.0.0.1:2181, initiating session > 2018-11-07 21:01:46,033 INFO > [ReadOnlyZKClient-localhost:2181@0x18f3d868-SendThread(localhost:2181)] > zookeeper.ClientCnxn: Session establishment complete on server > localhost/127.0.0.1:2181, sessionid = 0x166f1b283080005, negotiated timeout = > 4 > 2018-11-07 21:01:46,035 INFO [Executor task launch worker for task 1] > mapreduce.TableInputFormatBase: Input split length: 0 bytes. > [Stage 1:> (0 + 1) / > 1]2018-11-07 21:01:48,074 INFO [Executor task launch worker for task 1] > zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x18f3d868 to > localhost:2181 > 2018-11-07 21:01:48,075 INFO [ReadOnlyZKClient-localhost:2181@0x18f3d868] > zookeeper.ZooKeeper: Session: 0x166f1b283080005 closed > 2018-11-07 21:01:48,076 INFO [ReadOnlyZKClient > -localhost:2181@0x18f3d868-EventThread] zookeeper.ClientCnxn: EventThread > shut down for session: 0x166f1b283080005 > count: Long = 10 > {code} > Let me shut down the ReadOnlyZKClient log level.
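The patch demotes the logger calls in the class itself, but operators on existing releases can get a similar effect from configuration alone. A sketch for a log4j 1.x properties file (this assumes the stock log4j setup that HBase clients ship with; the logger names are taken from the output above):

```properties
# Silence per-invocation INFO chatter from HBase's read-only ZK client
# and from the ZooKeeper client classes it drives.
log4j.logger.org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient=WARN
log4j.logger.org.apache.zookeeper=WARN
```

This hides all INFO from those packages, whereas the patch keeps the messages available at DEBUG for troubleshooting.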
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711870#comment-16711870 ] stack commented on HBASE-15560: --- Scheduled this on 1.5 too (smile). > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, through the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real-world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby tracking the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies, with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lackluster > result is because a synthetic Zipfian distribution is used, on which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. > The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
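The admission rule described in the issue (compare the candidate's sketch frequency with the eviction victim's, keep the higher, and periodically halve counters to age the history) can be sketched as follows. This is a toy illustration, not Caffeine's code: exact per-key counters stand in for the compact count-min sketch a real W-TinyLFU implementation uses.

```java
import java.util.HashMap;
import java.util.Map;

// Toy TinyLFU admission filter: exact counters replace the count-min
// sketch used by real implementations such as Caffeine.
public class TinyLfuSketch {
    private final Map<String, Integer> freq = new HashMap<>();
    private final int sampleSize; // aging period, in recorded events
    private int events = 0;

    public TinyLfuSketch(int sampleSize) { this.sampleSize = sampleSize; }

    public void record(String key) {
        freq.merge(key, 1, Integer::sum);
        if (++events >= sampleSize) {  // periodic aging: halve every counter
            freq.replaceAll((k, v) -> v / 2);
            freq.values().removeIf(v -> v == 0);
            events = 0;
        }
    }

    public int frequency(String key) { return freq.getOrDefault(key, 0); }

    // Admission: the candidate displaces the victim only if it has been
    // seen more often historically.
    public boolean admit(String candidate, String victim) {
        return frequency(candidate) > frequency(victim);
    }

    public static void main(String[] args) {
        TinyLfuSketch sketch = new TinyLfuSketch(100);
        for (int i = 0; i < 5; i++) sketch.record("hot");
        sketch.record("cold");
        System.out.println(sketch.admit("hot", "cold"));  // true
        System.out.println(sketch.admit("cold", "hot"));  // false
    }
}
```

The halving step is what gives the policy its "aging": a once-hot key that stops being accessed loses half its weight each period, so it cannot hold a cache slot forever — the resilience property the description says plain SLRU lacks.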
[jira] [Updated] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-15560: -- Fix Version/s: 1.5.0
[jira] [Updated] (HBASE-21414) StoreFileSize growth rate metric
[ https://issues.apache.org/jira/browse/HBASE-21414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommy Li updated HBASE-21414: - Attachment: HBASE-21414.master.003.patch
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711856#comment-16711856 ] Andrew Purtell commented on HBASE-15560: We could make it default in 3.0 for sure, possibly in 2.2 with a big fat release note? Because it is configurable an operator could switch away if they notice a problem after an upgrade. Although I think we might have a debate on compatibility semantics if done in a minor. I am winding down some internal stuff at work and will have more time to work on open source very soon, with the intent to branch for 1.5 and make a series of 1.5 releases. For what it's worth we could try tiny-LFU as default in 1.5 should a branch-1 patch be made available and committed prior to starting that. Expecting to start the 1.5 stuff next month, January 2019. Part of the release work for a new minor would be a lot more perf testing than usual, although with the usual set of crappy tools (PE, YCSB, etc.) > TinyLFU-based BlockCache > > > Key: HBASE-15560 > URL: https://issues.apache.org/jira/browse/HBASE-15560 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Affects Versions: 2.0.0 >Reporter: Ben Manes >Assignee: Ben Manes >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, > bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, run_ycsb_c.sh, > run_ycsb_loading.sh, tinylfu.patch > > > LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and > recency of the working set. It achieves concurrency by using an O( n ) > background thread to prioritize the entries and evict. Accessing an entry is > O(1) by a hash table lookup, recording its logical access time, and setting a > frequency flag. A write is performed in O(1) time by updating the hash table > and triggering an async eviction thread. 
This provides ideal concurrency and > minimizes the latencies by penalizing the thread instead of the caller. > However the policy does not age the frequencies and may not be resilient to > various workload patterns. > W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the > frequency in a counting sketch, ages periodically by halving the counters, > and orders entries by SLRU. An entry is discarded by comparing the frequency > of the new arrival (candidate) to the SLRU's victim, and keeping the one with > the highest frequency. This allows the operations to be performed in O(1) > time and, though the use of a compact sketch, a much larger history is > retained beyond the current working set. In a variety of real world traces > the policy had [near optimal hit > rates|https://github.com/ben-manes/caffeine/wiki/Efficiency]. > Concurrency is achieved by buffering and replaying the operations, similar to > a write-ahead log. A read is recorded into a striped ring buffer and writes > to a queue. The operations are applied in batches under a try-lock by an > asynchronous thread, thereby track the usage pattern without incurring high > latencies > ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]). > In YCSB benchmarks the results were inconclusive. For a large cache (99% hit > rates) the two caches have near identical throughput and latencies with > LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a > 1-4% hit rate improvement and therefore lower latencies. The lack luster > result is because a synthetic Zipfian distribution is used, which SLRU > performs optimally. In a more varied, real-world workload we'd expect to see > improvements by being able to make smarter predictions. 
> The provided patch implements BlockCache using the > [Caffeine|https://github.com/ben-manes/caffeine] caching library (see > HighScalability > [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]). > Edward Bortnikov and Eshcar Hillel have graciously provided guidance for > evaluating this patch ([github > branch|https://github.com/ben-manes/hbase/tree/tinylfu]). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
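The admission scheme described in the quoted writeup (a counting sketch whose counters are periodically halved for aging, used to decide between the new candidate and the SLRU eviction victim) can be sketched roughly as follows. This is a hedged illustration of the W-TinyLFU idea only; the class and method names here are invented and are not the actual Caffeine or HBase code.

```java
import java.util.Random;

// Illustrative W-TinyLFU-style frequency sketch: a count-min sketch
// records access frequencies, all counters are halved periodically to
// age out stale history, and admission keeps whichever of the
// candidate/victim pair has the higher estimated frequency.
class FrequencySketch {
  private final int[][] counts;      // depth x width count-min sketch
  private final int[] seeds;         // one odd hash seed per row
  private final int resetThreshold;  // halve counters after this many adds
  private int additions;

  FrequencySketch(int depth, int width, int resetThreshold) {
    this.counts = new int[depth][width];
    this.seeds = new int[depth];
    Random r = new Random(42);
    for (int i = 0; i < depth; i++) seeds[i] = r.nextInt() | 1;
    this.resetThreshold = resetThreshold;
  }

  void increment(int hash) {
    for (int i = 0; i < counts.length; i++) {
      counts[i][index(hash, i)]++;
    }
    if (++additions >= resetThreshold) {
      halve();                       // periodic aging step
      additions /= 2;
    }
  }

  int estimate(int hash) {
    int min = Integer.MAX_VALUE;     // count-min: smallest row count
    for (int i = 0; i < counts.length; i++) {
      min = Math.min(min, counts[i][index(hash, i)]);
    }
    return min;
  }

  // TinyLFU admission: admit the candidate only if it is estimated to
  // be accessed more frequently than the cache's eviction victim.
  boolean admit(int candidateHash, int victimHash) {
    return estimate(candidateHash) > estimate(victimHash);
  }

  private void halve() {
    for (int[] row : counts)
      for (int j = 0; j < row.length; j++) row[j] >>>= 1;
  }

  private int index(int hash, int row) {
    int h = hash * seeds[row];
    return (h & 0x7fffffff) % counts[row].length;
  }
}
```

The real implementations use 4-bit packed counters and a small LRU "window" in front of the sketch; this sketch omits both for brevity.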
[jira] [Commented] (HBASE-21413) Empty meta log doesn't get split when restart whole cluster
[ https://issues.apache.org/jira/browse/HBASE-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711815#comment-16711815 ] Andrew Purtell commented on HBASE-21413: This is an issue for branch-1 too, but the patch as-is can't be applied; it includes Java 8 constructs in the test. I will open a subtask for the backport. No need to take it up if you don't want to [~allan163], although a backport would certainly be appreciated. > Empty meta log doesn't get split when restart whole cluster > --- > > Key: HBASE-21413 > URL: https://issues.apache.org/jira/browse/HBASE-21413 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.1.1, 2.0.2 >Reporter: Jingyun Tian >Assignee: Allan Yang >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21413.branch-2.1.001.patch, > HBASE-21413.branch-2.1.002.patch, Screenshot from 2018-10-31 18-11-02.png, > Screenshot from 2018-10-31 18-11-11.png > > > After I restarted the whole cluster, a splitting directory still exists on > HDFS. Then I found there is only an empty meta WAL file in it. I'll dig into > this later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711823#comment-16711823 ] stack commented on HBASE-15560: --- Thanks [~apurtell]. Was just hoping it was better in most cases so we would just enable it as default. Was trying to avoid adding code and options that might go unexercised. Let's see if we get an uptake on our call for a volunteer? If nought, can commit (I've scheduled this against 2.2/3.0 so it will at least get consideration before we make those releases). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21561) Backport HBASE-21413 (Empty meta log doesn't get split when restart whole cluster) to branch-1
Andrew Purtell created HBASE-21561: -- Summary: Backport HBASE-21413 (Empty meta log doesn't get split when restart whole cluster) to branch-1 Key: HBASE-21561 URL: https://issues.apache.org/jira/browse/HBASE-21561 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 1.5.0, 1.3.3, 1.4.10 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reassigned HBASE-21553: -- Assignee: Xu Cang > schedLock not released in MasterProcedureScheduler > -- > > Key: HBASE-21553 > URL: https://issues.apache.org/jira/browse/HBASE-21553 > Project: HBase > Issue Type: Improvement >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > > https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L749 > As shown above, we didn't unlock schedLock, which can cause deadlock. > Besides this, there are other places where this class handles schedLock.unlock > in a risky manner. I'd like to move them to a finally block to improve the > robustness of handling locks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
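The fix described in the quoted report (releasing schedLock in a finally block) is the standard try/finally locking idiom. A minimal schematic follows; the class and field names are invented for illustration and are not the actual MasterProcedureScheduler code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Schematic of the proposed fix: every path that acquires the lock
// releases it in a finally block, so an early return or an exception
// cannot leak the lock and deadlock later callers.
class Scheduler {
  private final ReentrantLock schedLock = new ReentrantLock();
  private int queued;

  // Risky shape (the bug pattern in the report):
  //   schedLock.lock();
  //   if (queued == 0) return -1;   // early return: lock never released!
  //   ...
  //   schedLock.unlock();

  int poll() {
    schedLock.lock();
    try {
      if (queued == 0) {
        return -1;                   // finally still runs: lock released
      }
      return queued--;
    } finally {
      schedLock.unlock();
    }
  }

  void add() {
    schedLock.lock();
    try {
      queued++;
    } finally {
      schedLock.unlock();
    }
  }

  boolean isLocked() {
    return schedLock.isLocked();
  }
}
```

The key property is that `finally` executes on every exit path, including the early `return -1`, which is exactly the path the original code leaked the lock on.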
[jira] [Commented] (HBASE-15560) TinyLFU-based BlockCache
[ https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711804#comment-16711804 ] Andrew Purtell commented on HBASE-15560: I came to say the feature is additive (modulo changes to blockcache to enable the tiny-LFU policy to be an optional feature) and optional, so why not put it in and allow people to try it out at their option. However then I see above [~stack] wants it to be default out of a well-intentioned goal to hold down further growth of the state space of our optional configurations. Unfortunately the reason for the growth over time of our suite of configuration options is the IMHO unresolvable tension between the desire to ship new and beneficial features to the user community and the desire of others to acquire bug fixes from upgrades without taking on default-on changes that might destabilize current operations. There is no way to resolve this tension so over time the suite of optional configurations for a mature product grows. I think that is fine. So why not commit this and let people try it out at their option? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711805#comment-16711805 ] Xu Cang commented on HBASE-21553: - Yes. Will upload a patch today. [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711798#comment-16711798 ] Hadoop QA commented on HBASE-21505: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 7s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} hbase-hadoop2-compat in master has 18 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 30s{color} | {color:red} hbase-server: The patch generated 3 new + 85 unchanged - 3 fixed = 88 total (was 88) {color} | | {color:red}-1{color} | {color:red} rubocop {color} | {color:red} 0m 9s{color} | {color:red} The patch generated 55 new + 405 unchanged - 9 fixed = 460 total (was 414) {color} | | {color:orange}-0{color} | {color:orange} ruby-lint {color} | {color:orange} 0m 4s{color} | {color:orange} The patch generated 3 new + 748 unchanged - 1 fixed = 751 total (was 749) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 56s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 3m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 43s{color} | {color:red} hbase-server generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s{color} | {color:green} hbase-hadoop-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s{color} | {color:green} hbase-hadoop2-compat in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hbase-protocol
[jira] [Commented] (HBASE-21553) schedLock not released in MasterProcedureScheduler
[ https://issues.apache.org/jira/browse/HBASE-21553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711781#comment-16711781 ] Andrew Purtell commented on HBASE-21553: Are you planning to provide a patch [~xucang]? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21283) Add new shell command 'rit' for listing regions in transition
[ https://issues.apache.org/jira/browse/HBASE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-21283: Component/s: Operability > Add new shell command 'rit' for listing regions in transition > - > > Key: HBASE-21283 > URL: https://issues.apache.org/jira/browse/HBASE-21283 > Project: HBase > Issue Type: Improvement > Components: Operability, shell >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21283-branch-1.patch, HBASE-21283-branch-1.patch, > HBASE-21283-branch-1.patch, HBASE-21283.patch, HBASE-21283.patch, > HBASE-21283.patch > > > The 'status' shell command shows regions in transition but sometimes an > operator may want to retrieve a simple list of regions in transition. Here's > a patch that adds a new 'rit' command to the TOOLS group that does just that. > No test, because it seems hard to mock RITs from the ruby test code, but I > have run TestShell and it passes, so the command is verified to meet minimum > requirements, like help text, and manually verified with branch-1 (shell in > branch-2 and up doesn't return until TransitRegionProcedure has completed so > by that time no RIT): > {noformat} > HBase Shell > Use "help" to get list of supported commands. > Use "exit" to quit this interactive shell. > Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct 8 > 21:05:50 UTC 2018 > hbase(main):001:0> help 'rit' > List all regions in transition. > Examples: > hbase> rit > hbase(main):002:0> create ... > 0 row(s) in 2.5150 seconds > => Hbase::Table - IntegrationTestBigLinkedList > hbase(main):003:0> rit > 0 row(s) in 0.0340 seconds > hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1' > 0 row(s) in 0.0540 seconds > hbase(main):005:0> rit > IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1. 
> state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null > > > > 1 row(s) in 0.0170 seconds > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Status: Patch Available (was: In Progress) > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to the source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in the target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in the source, the reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to the target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running the status 'replication' command does show > the lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described in bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to the target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted the target cluster RS and verified the new row appeared there. No > new edit added, but the status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > the target and the time it was added in the source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if the RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial Java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there’s only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no ed
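The metric semantics argued for in the report above can be sketched as follows: AgeOfLastShippedOp is frozen at the moment of shipping (ship time minus edit time) and does not drift afterwards, while Replication Lag grows only while an unshipped edit is pending. This is a hedged illustration only; all names are invented and this is not the actual ReplicationSource metrics code.

```java
// Illustrative semantics for the two replication metrics discussed above.
class SourceMetrics {
  private long lastShippedAgeMs = 0;      // frozen at the moment of shipping
  private long oldestPendingEditTs = -1;  // -1 means nothing is pending

  void onEditAppended(long editTsMs) {
    if (oldestPendingEditTs < 0) {
      oldestPendingEditTs = editTsMs;     // remember the oldest unshipped edit
    }
  }

  void onBatchShipped(long editTsMs, long shipTsMs) {
    lastShippedAgeMs = shipTsMs - editTsMs; // age = ship time minus edit time
    oldestPendingEditTs = -1;               // queue drained in this sketch
  }

  long ageOfLastShippedOp() {
    return lastShippedAgeMs;              // does NOT keep growing while stuck
  }

  long replicationLag(long nowMs) {
    // grows only while an edit is pending; 0 when fully caught up
    return oldestPendingEditTs < 0 ? 0 : nowMs - oldestPendingEditTs;
  }
}
```

Under these semantics, bullets 2 and 3 above are resolved (age stops tracking lag, and a late ship reports its true age), and a recovered queue with no pending edit would report lag 0 instead of Long.MAX_VALUE.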
[jira] [Commented] (HBASE-21526) Use AsyncClusterConnection in ServerManager for getRsAdmin
[ https://issues.apache.org/jira/browse/HBASE-21526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711660#comment-16711660 ] Hadoop QA commented on HBASE-21526: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} HBASE-21512 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 53s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 0s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 31s{color} | {color:green} HBASE-21512 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} HBASE-21512 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} The patch passed checkstyle in hbase-common {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} The patch passed checkstyle in hbase-client {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} hbase-server: The patch generated 0 new + 167 unchanged - 4 fixed = 167 total (was 171) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 47s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 42s{color} | {color:green} hbase-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 14s{color} | {color:green} hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}125m 8s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}177m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yet
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Status: In Progress (was: Patch Available) > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing hbase shell status 'replication' command, noticed the > following issues related to replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > is wrongly showing same value as "Replication Lag". This is incorrect, > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted target cluster RS and verified the new row appeared there. No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there's only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no ed
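All four symptoms come down to when the source updates its shipped-edit metrics. Below is a minimal model of the semantics argued for above — hypothetical class and method names, not the actual HBase ReplicationSource code: the age and timestamp move only when an edit really ships, the age is fixed at shipment time rather than tracking the lag, and an unknown last-shipped timestamp is reported as unknown instead of being computed against the 1970 epoch.

```java
// Hypothetical model of the metric semantics argued for in this ticket;
// illustrative only, not the actual HBase ReplicationSource implementation.
public class ReplicationMetricsModel {
    public static final long UNKNOWN = -1L;

    private long lastShippedTs = UNKNOWN;  // wall clock of the last real shipment
    private long ageOfLastShippedOp = 0L;  // edit age at the moment it shipped

    // Called only when an edit actually reaches the target (addresses #1 and
    // #3: the timestamp and the age move here, and nowhere else).
    public void onShipped(long editCreateTs, long shipTs) {
        lastShippedTs = shipTs;
        ageOfLastShippedOp = shipTs - editCreateTs;  // fixed value, never reset to 0
    }

    // Addresses #2: the age does not keep growing while nothing new ships.
    public long getAgeOfLastShippedOp() {
        return ageOfLastShippedOp;
    }

    // Addresses #4: an unknown last-shipped timestamp yields "unknown"
    // rather than a lag measured from the epoch (Long.MAX_VALUE in the log).
    public long getReplicationLag(long now, boolean editsPending) {
        if (!editsPending) {
            return 0L;  // fully caught up
        }
        return lastShippedTs == UNKNOWN ? UNKNOWN : now - lastShippedTs;
    }
}
```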
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711632#comment-16711632 ] Pankaj Kumar commented on HBASE-21217: -- Got it [~allan163]... thanks for the Jira pointer. > Revisit the executeProcedure method for open/close region > - > > Key: HBASE-21217 > URL: https://issues.apache.org/jira/browse/HBASE-21217 > Project: HBase > Issue Type: Sub-task > Components: amv2, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, > HBASE-21217.patch > > > Currently we just call openRegion and closeRegion directly, which is a bit > buggy. For example, in order to not fail all the open region requests while > there is only one failure, we will catch the exception and set a flag in the > return value. But for the executeProcedures call, the return value will be > ignored, and we expect the openRegion method will always call > reportRegionStateTransition to report the failure, but in fact it does not... > And after HBASE-20881, we can confirm that the race could happen, where we > send a close request to a region which is opening (HBASE-21199), and vice > versa. So I think here we need to revisit the implementation of > executeProcedures to make it more stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
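The return-value problem described in the issue can be sketched as follows — illustrative names, not the real HBase signatures: a failure recorded only in an ignored return value never reaches the master, while pushing every outcome through reportRegionStateTransition does.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the failure-reporting gap described above; the method
// names echo the discussion but the signatures are illustrative.
public class ExecuteProceduresSketch {
    public final List<String> reportedToMaster = new ArrayList<>();

    public void reportRegionStateTransition(String msg) {
        reportedToMaster.add(msg);
    }

    // Buggy shape: the failure lives only in a return value that the
    // executeProcedures caller ignores, so the master never hears about it.
    public boolean openRegionReturningFlag(String region, boolean fail) {
        if (fail) {
            return false;  // silently dropped by the caller
        }
        reportRegionStateTransition("OPENED " + region);
        return true;
    }

    // Revisited shape: every outcome, success or failure, goes back to the
    // master through reportRegionStateTransition.
    public void openRegionAlwaysReporting(String region, boolean fail) {
        reportRegionStateTransition((fail ? "FAILED_OPEN " : "OPENED ") + region);
    }
}
```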
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711604#comment-16711604 ] Hadoop QA commented on HBASE-21559: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 13s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 27s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}131m 44s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}171m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950827/HBASE-21559.v1.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 60403a0126a1 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15208/testReport/ | | Max. process+thread count | 4432 (vs. ulimit of 1) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE
[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
[ https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711569#comment-16711569 ] Hudson commented on HBASE-21512: Results for branch HBASE-21512 [build #8 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/8//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Introduce an AsyncClusterConnection and replace the usage of ClusterConnection > -- > > Key: HBASE-21512 > URL: https://issues.apache.org/jira/browse/HBASE-21512 > Project: HBase > Issue Type: Umbrella >Reporter: Duo Zhang >Priority: Major > Fix For: 3.0.0 > > > At least for the RSProcedureDispatcher, with CompletableFuture we do not need > to set a delay and use a thread pool any more, which could reduce the > resource usage and also the latency. > Once this is done, I think we can remove the ClusterConnection completely, > and start to rewrite the old sync client based on the async client, which > could reduce the code base a lot for our client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
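The RSProcedureDispatcher point above can be illustrated with a hedged retry sketch (hypothetical helper, Java 9+ `CompletableFuture.delayedExecutor`): chaining on a CompletableFuture lets a retry fire after a delay without a thread parked in a dedicated pool for each pending call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hedged sketch of the dispatch pattern this umbrella argues for; names
// are illustrative, not the real RSProcedureDispatcher API.
public class AsyncRetrySketch {
    public static <T> CompletableFuture<T> withRetries(
            Supplier<CompletableFuture<T>> call, int retriesLeft, long delayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        call.get().whenComplete((v, err) -> {
            if (err == null) {
                result.complete(v);
            } else if (retriesLeft == 0) {
                result.completeExceptionally(err);
            } else {
                // No thread sleeps here; the shared scheduler fires the retry.
                Executor delayed =
                    CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
                delayed.execute(() ->
                    withRetries(call, retriesLeft - 1, delayMs)
                        .whenComplete((v2, err2) -> {
                            if (err2 == null) result.complete(v2);
                            else result.completeExceptionally(err2);
                        }));
            }
        });
        return result;
    }
}
```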
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711538#comment-16711538 ] Zheng Hu commented on HBASE-21559: -- bq. But at t8, the TakeSnapshotHandler is already in the map right? Thinking about the above case again, there should be no problem if we move the v != null && v.isFinished() check out of the computeIfPresent, because the status of v only transforms from not finished to finished. If we get a not-finished state, then the STRP won't proceed, so it's OK even if someone changes it from not finished to finished. bq. so the problem here is that the state should be volatile. The state is volatile now. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
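A stand-in model of the check being discussed (the real classes are SnapshotManager and TakeSnapshotHandler; these names are illustrative): because the handler's state only ever moves from not-finished to finished and the field is volatile, the check can safely run outside the map's per-key lock.

```java
import java.util.concurrent.ConcurrentHashMap;

// Stand-in model of the check under discussion; illustrative names, not
// the actual SnapshotManager / TakeSnapshotHandler code.
public class SnapshotCheckSketch {
    public static class Handler {
        // One-way transition: false -> true, never back. Declaring it
        // volatile is what makes the out-of-lock read safe.
        public volatile boolean finished = false;
    }

    public final ConcurrentHashMap<String, Handler> handlers = new ConcurrentHashMap<>();

    // Check pulled out of computeIfPresent: a transition racing with the
    // read can only make us see "still taking a snapshot" a moment longer
    // than necessary, which merely delays the split procedure -- the safe
    // direction.
    public boolean isTakingSnapshot(String table) {
        Handler h = handlers.get(table);
        return h != null && !h.finished;
    }
}
```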
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Attachment: HBASE-21559.v2.patch > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, HBASE-21559.v2.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711506#comment-16711506 ] Duo Zhang commented on HBASE-21559: --- But at t8, the TakeSnapshotHandler is already in the map right? We will not hold the map lock when changing its state, so the problem here is that the state should be volatile. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711503#comment-16711503 ] Zheng Hu edited comment on HBASE-21559 at 12/6/18 2:25 PM: --- bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. release the table x-lock; t4. downgrade to slock because table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure (STRP) submitted . t7. STRP hold the table s-lock t8. STRP check isTakingSnapshot . Then at t8, the SnapshotManager may update the status of handler at any time , I think. was (Author: openinx): bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. rease the table x-lock; t4. downgrade to slock because table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure start . t7. STRP hold the table s-lock t8. check isTakingSnapshot . Then at t8, the SnapshotManager may update the status of handler at any time , I think. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. 
> Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711503#comment-16711503 ] Zheng Hu commented on HBASE-21559: -- bq. But they will not hold the map lock when modifying right? Assume the case: t1. start snapshot t2. hold the table x-lock t3. release the table x-lock; t4. downgrade to s-lock because the table is enabled; t5. start snapshot on RS... t6. SplitTableRegionProcedure starts. t7. STRP holds the table s-lock t8. check isTakingSnapshot. Then at t8, the SnapshotManager may update the status of the handler at any time, I think. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, > TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, > > org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is: a dead lock between SplitTableRegionProcedure > and SnapshotProcedure.. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711499#comment-16711499 ] Wellington Chevreuil commented on HBASE-21505: -- Attaching a new patch version. The previous one had some unused imports that caused compilation errors on the build; not sure why those didn't happen locally for me. There were also some uncaught checkstyle violations, now addressed. > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. 
Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. > 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it got finally shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restart target cluster RS and verified the new row appeared there. 
No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. > Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format
[jira] [Updated] (HBASE-21505) Several inconsistencies on information reported for Replication Sources by hbase shell status 'replication' command.
[ https://issues.apache.org/jira/browse/HBASE-21505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-21505: - Attachment: HBASE-21505-master.002.patch > Several inconsistencies on information reported for Replication Sources by > hbase shell status 'replication' command. > > > Key: HBASE-21505 > URL: https://issues.apache.org/jira/browse/HBASE-21505 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Attachments: > 0001-HBASE-21505-initial-version-for-more-detailed-report.patch, > HBASE-21505-master.001.patch, HBASE-21505-master.002.patch > > > While reviewing the hbase shell status 'replication' command, I noticed the > following issues related to the replication source section: > 1) TimeStampsOfLastShippedOp keeps getting updated and increasing even when > no new edits were added to source, so nothing was really shipped. Test steps > performed: > 1.1) Source cluster with only one table targeted to replication; > 1.2) Added a new row, confirmed the row appeared in Target cluster; > 1.3) Issued status 'replication' command in source, TimeStampsOfLastShippedOp > shows current timestamp T1. > 1.4) Waited 30 seconds, no new data added to source. Issued status > 'replication' command, now shows timestamp T2. > 2) When replication is stuck due to some connectivity issues or target > unavailability, if new edits are added in source, reported AgeOfLastShippedOp > wrongly shows the same value as "Replication Lag". This is incorrect; > AgeOfLastShippedOp should not change until there's indeed another edit > shipped to target. Test steps performed: > 2.1) Source cluster with only one table targeted to replication; > 2.2) Stopped target cluster RS; > 2.3) Put a new row on source. Running status 'replication' command does show > lag increasing. TimeStampsOfLastShippedOp seems correct also, no further > updates as described on bullet #1 above. 
> 2.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3) AgeOfLastShippedOp gets set to 0 even when a given edit had taken some > time before it finally got shipped to target. Test steps performed: > 3.1) Source cluster with only one table targeted to replication; > 3.2) Stopped target cluster RS; > 3.3) Put a new row on source. > 3.4) AgeOfLastShippedOp keeps increasing together with Replication Lag, even > though there's no new edit shipped to target: > {noformat} > T1: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=5581, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=5581 > ... > T2: > ... > SOURCE: PeerID=1, AgeOfLastShippedOp=8586, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=8586 > ... > {noformat} > 3.5) Restarted target cluster RS and verified the new row appeared there. No > new edit added, but status 'replication' command reports AgeOfLastShippedOp > as 0, while it should be the diff between the time it concluded shipping at > target and the time it was added in source: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Wed Nov 21 02:50:23 GMT 2018, Replication Lag=0 > {noformat} > 4) When replication is stuck due to some connectivity issues or target > unavailability, if RS is restarted, once the recovered queue source is started, > TimeStampsOfLastShippedOp is set to the initial java date (Thu Jan 01 01:00:00 > GMT 1970, for example), thus "Replication Lag" also gives a completely > inaccurate value. 
> Tests performed: > 4.1) Source cluster with only one table targeted to replication; > 4.2) Stopped target cluster RS; > 4.3) Put a new row on source, restarted RS on source, waited a few seconds for > the recovery queue source to start up, then it gives: > {noformat} > SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=1, > TimeStampsOfLastShippedOp=Thu Jan 01 01:00:00 GMT 1970, Replication > Lag=9223372036854775807 > {noformat} > Also, we should report status for all running sources; the current output format > gives the impression there's only one, even when there are recovery queues, > for instance. > Here is a list of ideas on how the command should report under different > states of replication: > a) Source started, target stopped, no edits
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711498#comment-16711498 ] Allan Yang commented on HBASE-21217: We have HBASE-21237 for branch-2.0/2.1. [~pankaj2461] > Revisit the executeProcedure method for open/close region > - > > Key: HBASE-21217 > URL: https://issues.apache.org/jira/browse/HBASE-21217 > Project: HBase > Issue Type: Sub-task > Components: amv2, proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Critical > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, > HBASE-21217.patch > > > Currently we just call openRegion and closeRegion directly, which is a bit > buggy. For example, in order to not fail all the open region requests while > there is only one failure, we will catch the exception and set a flag in the > return value. But for the executeProcedures call, the return value will be > ignored, and we expect the openRegion method will always call > reportRegionStateTransition to report the failure, but in fact it does not... > And after HBASE-20881, we can confirm that the race could happen, where we > send a close request to a region which is opening (HBASE-21199), and vice > versa. So I think here we need to revisit the implementation of > executeProcedures to make it more stable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-19830) [AMv2] RPCs while holding (Region) Locks (to update hbase:meta with region state)
[ https://issues.apache.org/jira/browse/HBASE-19830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-19830: - Component/s: amv2 > [AMv2] RPCs while holding (Region) Locks (to update hbase:meta with region > state) > - > > Key: HBASE-19830 > URL: https://issues.apache.org/jira/browse/HBASE-19830 > Project: HBase > Issue Type: Bug > Components: amv2 >Reporter: stack >Assignee: stack >Priority: Major > > Do we have to? It's a problem if we want Master to host regions and it's just a > problem anyways. See HBASE-19828 for scenarios mostly around cluster shutdown > where the order in which processes go down is hard to control and it happens > that a server-hosted-client is trying to rpc a missing hbase:meta. > Eventually, we'll time out (server-hosted-clients have their retries upped), > but that's bad for MTTR and unit tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21549) Add shell command for serial replication peer
[ https://issues.apache.org/jira/browse/HBASE-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711472#comment-16711472 ] Peter Somogyi commented on HBASE-21549: --- Thanks for addressing my review comments. +1 > Add shell command for serial replication peer > - > > Key: HBASE-21549 > URL: https://issues.apache.org/jira/browse/HBASE-21549 > Project: HBase > Issue Type: Improvement >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Attachments: HBASE-21549.master.001.patch, > HBASE-21549.master.002.patch, HBASE-21549.master.003.patch > > > add_peer supports adding a serial replication peer directly. > set_peer_serial supports changing a replication peer's serial flag. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
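As a usage sketch of the two commands this issue adds — the syntax is inferred from the issue summary, and the ZooKeeper cluster key is a placeholder:

```ruby
# Hedged hbase shell usage sketch; 'zk1:2181:/hbase' is a placeholder cluster key.
# Create a peer that replicates serially from the start:
add_peer '1', CLUSTER_KEY => 'zk1:2181:/hbase', SERIAL => true
# Flip an existing peer's serial flag:
set_peer_serial '1', true
```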
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711455#comment-16711455 ] Pankaj Kumar commented on HBASE-21217: -- Do you mean this bug is not applicable to branch-2.0/2.1? Pardon me if I'm wrong. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711436#comment-16711436 ] Zheng Hu commented on HBASE-21559: -- bq. Do we really need to test the isFinished under the map lock? Just get it out and check? I thought it better to do this, in case we get a handler and someone updates the handler's status at that very moment. > The RestoreSnapshotFromClientTestBase related UT are flaky > -- > > Key: HBASE-21559 > URL: https://issues.apache.org/jira/browse/HBASE-21559 > Project: HBase > Issue Type: Bug > Reporter: Zheng Hu > Assignee: Zheng Hu > Priority: Major > Fix For: 3.0.0, 2.1.2, 2.0.4, 2.0.5 > > Attachments: HBASE-21559.v1.patch, TEST-org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.xml, org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions-output.txt, org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientAfterSplittingRegions.txt > > > The related UT are: > * TestRestoreSnapshotFromClientAfterSplittingRegions > * TestRestoreSnapshotFromClientWithRegionReplicas > * TestMobRestoreSnapshotFromClientAfterSplittingRegions > I guess the main problem is a deadlock between SplitTableRegionProcedure and SnapshotProcedure. > Attached logs from the failed UT. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
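The question being debated above (check a handler's status while holding the lock that guards the handler map, or fetch the handler under the lock and check it afterwards) can be sketched as follows. The names are illustrative; the real SnapshotManager code differs.

```java
import java.util.HashMap;
import java.util.Map;

// Two ways of answering "is this handler finished?" when a lock guards the
// map of handlers but the handler's own status can change concurrently.
class HandlerCheckSketch {
    static class Handler {
        volatile boolean finished;
        boolean isFinished() { return finished; }
    }

    private final Map<String, Handler> handlers = new HashMap<>();
    private final Object mapLock = new Object();

    void register(String name, Handler h) {
        synchronized (mapLock) { handlers.put(name, h); }
    }

    // Option A: check the status while holding the map lock. Guarantees the
    // lookup and the check see a consistent moment relative to anyone else
    // who takes mapLock, at the cost of holding the lock longer (which is
    // what raises the deadlock concern in this thread).
    boolean isFinishedUnderLock(String name) {
        synchronized (mapLock) {
            Handler h = handlers.get(name);
            return h != null && h.isFinished();
        }
    }

    // Option B: copy the handler out under the lock, check it outside. The
    // answer can be stale by the time it is used, but the lock is held only
    // for the map lookup.
    boolean isFinishedOutsideLock(String name) {
        Handler h;
        synchronized (mapLock) { h = handlers.get(name); }
        return h != null && h.isFinished();
    }
}
```

Zheng Hu's position corresponds to Option A; Duo Zhang's suggestion is Option B, on the grounds that the status is not modified under the map lock anyway.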
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711430#comment-16711430 ] Duo Zhang commented on HBASE-21559: --- Do we really need to test the isFinished under the map lock? Just get it out and check? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711441#comment-16711441 ] Duo Zhang commented on HBASE-21217: --- For branch-2.1/branch-2.0, we will just call openRegion and closeRegion directly, IIRC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711439#comment-16711439 ] Duo Zhang commented on HBASE-21559: --- But they will not hold the map lock when modifying right? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21526) Use AsyncClusterConnection in ServerManager for getRsAdmin
[ https://issues.apache.org/jira/browse/HBASE-21526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-21526: -- Attachment: HBASE-21526-HBASE-21512-v2.patch > Use AsyncClusterConnection in ServerManager for getRsAdmin > -- > > Key: HBASE-21526 > URL: https://issues.apache.org/jira/browse/HBASE-21526 > Project: HBase > Issue Type: Sub-task >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-21526-HBASE-21512-v1.patch, > HBASE-21526-HBASE-21512-v2.patch, HBASE-21526-HBASE-21512.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region
[ https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711438#comment-16711438 ] Pankaj Kumar commented on HBASE-21217: -- Ping [~allan163], any plan to backport this fix to the 2.0/2.1 branches? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711429#comment-16711429 ] Zheng Hu commented on HBASE-21559: -- BTW, I think we can move the snapshot feature from procedure.v1 to procedure.v2 in the future. So I assigned HBASE-14413 to myself. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-14413) Procedure V2 - Snapshot V2
[ https://issues.apache.org/jira/browse/HBASE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711422#comment-16711422 ] Zheng Hu commented on HBASE-14413: -- Not in progress. Let me move this forward. Hope you don't mind. [~vrodionov], [~mbertozzi] > Procedure V2 - Snapshot V2 > -- > > Key: HBASE-14413 > URL: https://issues.apache.org/jira/browse/HBASE-14413 > Project: HBase > Issue Type: Sub-task > Reporter: Vladimir Rodionov > Assignee: Zheng Hu > Priority: Major > > We need a new implementation of the snapshot feature that is more robust and performant. Ideally, it will work with multiple tables as well. The possible areas of improvement: > # It must be flushless. Coordinated memstore flushes across a cluster are bad. > # The verification phase must be distributed, done in parallel, and not on the Master. > In theory, the only info we need to record a snapshot of a table is: the list of WAL files, the list of HFiles, and the max sequence id of an edit which has been flushed per Region. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711425#comment-16711425 ] Zheng Hu commented on HBASE-21559: -- Ran the UT on my local machine 5 times; seems OK now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21560) Return a new TableDescriptor for MasterObserver#preModifyTable to allow coprocessor modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711421#comment-16711421 ] Duo Zhang commented on HBASE-21560: --- +1. > Return a new TableDescriptor for MasterObserver#preModifyTable to allow > coprocessor modify the TableDescriptor > -- > > Key: HBASE-21560 > URL: https://issues.apache.org/jira/browse/HBASE-21560 > Project: HBase > Issue Type: Improvement > Reporter: Guanghao Zhang > Priority: Major > > Same as HBASE-21550. The TableDescriptor is immutable in 2.0+, but in our use case the coprocessor may change the TableDescriptor in preModifyTable; that was allowed before 2.0. For 2.0+, we can return a new TableDescriptor from MasterObserver#preModifyTable to allow this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
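The pattern proposed in HBASE-21560 (a hook that cannot mutate an immutable descriptor and therefore returns a replacement) can be sketched minimally. `TableDescriptorSketch` and its `withMaxVersions` copy method below are hypothetical stand-ins, not the real HBase `TableDescriptor` API.

```java
// With an immutable descriptor, a pre-hook that wants to change the table
// schema must build and return a new instance; the framework then uses the
// returned descriptor instead of the one it passed in.
class PreModifySketch {
    static final class TableDescriptorSketch {
        final String name;
        final int maxVersions;
        TableDescriptorSketch(String name, int maxVersions) {
            this.name = name;
            this.maxVersions = maxVersions;
        }
        // Copy-on-write style modification: never mutate, always copy.
        TableDescriptorSketch withMaxVersions(int v) {
            return new TableDescriptorSketch(name, v);
        }
    }

    // A coprocessor-style hook: returns either the descriptor unchanged or
    // a replacement the caller should use.
    static TableDescriptorSketch preModifyTable(TableDescriptorSketch requested) {
        if (requested.maxVersions > 10) {
            return requested.withMaxVersions(10); // cap via a new descriptor
        }
        return requested;
    }
}
```

Returning the (possibly same) instance keeps the hook signature simple: callers always use the return value, and a no-op hook just echoes its argument.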
[jira] [Assigned] (HBASE-14413) Procedure V2 - Snapshot V2
[ https://issues.apache.org/jira/browse/HBASE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu reassigned HBASE-14413: Assignee: Zheng Hu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21514) Refactor CacheConfig
[ https://issues.apache.org/jira/browse/HBASE-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711416#comment-16711416 ] Hadoop QA commented on HBASE-21514: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 30 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 48s{color} | {color:red} hbase-server generated 4 new + 184 unchanged - 4 fixed = 188 total (was 188) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} hbase-server: The patch generated 0 new + 868 unchanged - 58 fixed = 868 total (was 926) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 51s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 35s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}127m 46s{color} | {color:green} hbase-server in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}164m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b | | JIRA Issue | HBASE-21514 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12950812/HBASE-21514.master.009.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux d903706a348b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 12e75a8a63 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | javac | https://builds.apache.org/job/PreCommit-HBASE-Build/15207/artifact/patchprocess/diff-compile-javac-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15207/testReport/ | | Max. process+thread count | 4906 (vs. ulimit of 1
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711397#comment-16711397 ] Zheng Hu commented on HBASE-21559: -- Currently, the SnapshotManager grabs the object lock in many methods. This is a very coarse way of locking. I think we should change the locking in SnapshotManager: not just synchronize on the big SnapshotManager object, but use a more concrete lock (to avoid deadlock). Anyway, let me fix this deadlock first, so I've uploaded patch v1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
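The locking change suggested in this comment (stop synchronizing every method on the one big manager object and guard each piece of state with its own narrow lock) looks roughly like the sketch below. Both classes are illustrative only, not the real SnapshotManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Coarse vs. fine-grained locking. With the coarse style every method
// competes for the same monitor, so unrelated operations serialize and the
// monitor can participate in lock-ordering deadlocks with other subsystems.
class SnapshotLockSketch {
    static class CoarseManager {
        private final Map<String, String> snapshots = new HashMap<>();
        synchronized void start(String name) { snapshots.put(name, "RUNNING"); }
        synchronized boolean isRunning(String name) { return "RUNNING".equals(snapshots.get(name)); }
    }

    // Finer style: a dedicated lock that guards only the snapshot map, so a
    // caller never holds the whole manager while doing unrelated work.
    static class FineManager {
        private final ReentrantLock snapshotLock = new ReentrantLock();
        private final Map<String, String> snapshots = new HashMap<>();
        void start(String name) {
            snapshotLock.lock();
            try { snapshots.put(name, "RUNNING"); } finally { snapshotLock.unlock(); }
        }
        boolean isRunning(String name) {
            snapshotLock.lock();
            try { return "RUNNING".equals(snapshots.get(name)); } finally { snapshotLock.unlock(); }
        }
    }
}
```

The fine-grained version also makes the critical sections visible at each call site, which helps when auditing for the kind of SplitTableRegionProcedure/SnapshotProcedure deadlock described in this issue.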
[jira] [Updated] (HBASE-21559) The RestoreSnapshotFromClientTestBase related UT are flaky
[ https://issues.apache.org/jira/browse/HBASE-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Hu updated HBASE-21559: - Attachment: HBASE-21559.v1.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711380#comment-16711380 ] Hudson commented on HBASE-21550: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a new method preCreateTableRegionInfos for MasterObserver which allows > CPs to modify the TableDescriptor > > > Key: HBASE-21550 > URL: https://issues.apache.org/jira/browse/HBASE-21550 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21550.patch > > > Before 2.0, we will pass a HTableDescriptor and the CPs can modify the schema > of a table, but now we will pass a TableDescriptor, which is immutable. I > think it is correct to pass an immutable instance here, but we should have a > return value for this method to allow CPs to return a new TableDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711382#comment-16711382 ] Hudson commented on HBASE-21551: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. 
> So if we run stream scans many times, the streamReaders hash map > will grow without bound. We can see the heap dump in the attached heap-dump.jpg. > I found this bug because, when I benchmarked scan performance using YCSB > in a cluster (RS heap size of 50g), the RS easily ran into long full GCs > (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
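The leak pattern described in HBASE-21551 (an entry added to a map per stream scan but removed only when the whole store file closes) can be sketched as below. Names are illustrative stand-ins, not the actual HBase classes; the fix direction is simply to remove the entry when the individual reader is released.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the leak: every stream scan registers a reader in a map on the
// store file. If the entry is removed only when the file itself is closed,
// repeated scans grow the map without bound, which is what showed up as the
// exploding streamReaders map in the heap dump.
class StreamReaderLeakSketch {
    static class StoreFileSketch {
        private final Map<Integer, Object> streamReaders = new ConcurrentHashMap<>();
        private int nextId = 0; // single-threaded here; a sketch, not production code

        // Each stream scan registers a reader and gets a handle back.
        int openStreamReader() {
            int id = nextId++;
            streamReaders.put(id, new Object());
            return id;
        }

        // Leak fix: release the reader as soon as its scan finishes rather
        // than waiting for the store file itself to be closed.
        void closeStreamReader(int id) {
            streamReaders.remove(id);
        }

        int liveReaders() {
            return streamReaders.size();
        }
    }
}
```

With the per-scan removal in place, the map size tracks the number of scans currently in flight instead of the number of scans ever run.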
[jira] [Commented] (HBASE-21534) TestAssignmentManager is flakey
[ https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711381#comment-16711381 ] Hudson commented on HBASE-21534: Results for branch branch-2 [build #1542 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1542//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > TestAssignmentManager is flakey > --- > > Key: HBASE-21534 > URL: https://issues.apache.org/jira/browse/HBASE-21534 > Project: HBase > Issue Type: Task > Components: test > Reporter: Duo Zhang > Assignee: Duo Zhang > Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21534-addendum-v1.patch, HBASE-21534-addendum.patch, HBASE-21534.patch > > > See this in the output and then the test hangs: {noformat} > 2018-11-29 20:47:50,061 WARN [MockRSProcedureDispatcher-pool5-t10] > assignment.AssignmentManager(894): The region server localhost,102,1 is > already dead, skip reportRegionStateTransition call > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21557) Set version to 2.0.4 on branch-2.0 so can cut an RC
[ https://issues.apache.org/jira/browse/HBASE-21557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711336#comment-16711336 ] Hudson commented on HBASE-21557: Results for branch branch-2.0 [build #1141 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Set version to 2.0.4 on branch-2.0 so can cut an RC > --- > > Key: HBASE-21557 > URL: https://issues.apache.org/jira/browse/HBASE-21557 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.4 > > > $ mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.0.4 > $ find . -name pom.xml -exec git add {} \; > $ git commit ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21126) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes
[ https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711337#comment-16711337 ] Hudson commented on HBASE-21126: Results for branch branch-2.0 [build #1141 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1141//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add ability for HBase Canary to ignore a configurable number of ZooKeeper > down nodes > > > Key: HBASE-21126 > URL: https://issues.apache.org/jira/browse/HBASE-21126 > Project: HBase > Issue Type: Improvement > Components: canary, Zookeeper >Affects Versions: 1.0.0, 3.0.0, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21126.branch-1.001.patch, > HBASE-21126.master.001.patch, HBASE-21126.master.002.patch, > HBASE-21126.master.003.patch, zookeeperCanaryLocalTestValidation.txt > > Original Estimate: 48h > Remaining Estimate: 48h > > When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper > -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper > server in the ensemble. If any server is unavailable or unresponsive, the > canary will exit with a failure code. 
> If we use the Canary to gauge server health, and alert accordingly, this can > be too strict. For example, in a 5-node ZooKeeper cluster, having one node > down is safe and expected in rolling upgrades/patches. > This is a request to allow the Canary to take another parameter > {code:java} > -permittedZookeeperFailures {code} > If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still > pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable. > (This is my first Jira posting... sorry if I messed anything up.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
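The pass/fail rule requested above (tolerate up to N unreachable ensemble members) reduces to a simple threshold check. The sketch below is a hypothetical helper for illustration, not the actual Canary code; the class and method names are assumptions:

```java
public class ZkCanaryCheck {
    /**
     * Returns true if the canary should pass: the number of unreachable
     * ZooKeeper servers must not exceed the permitted failure budget
     * (the proposed -permittedZookeeperFailures value).
     */
    static boolean canaryPasses(int ensembleSize, int reachable, int permittedFailures) {
        int down = ensembleSize - reachable;
        return down <= permittedFailures;
    }

    public static void main(String[] args) {
        // 5-node ensemble with -permittedZookeeperFailures=1, as in the example above.
        System.out.println(canaryPasses(5, 4, 1)); // true: one node down is tolerated
        System.out.println(canaryPasses(5, 3, 1)); // false: two nodes down exceeds the budget
    }
}
```

With the default budget of 0 this reduces to the current strict behavior, so the parameter is backward compatible.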
[jira] [Commented] (HBASE-21126) Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes
[ https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711304#comment-16711304 ] Hudson commented on HBASE-21126: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add ability for HBase Canary to ignore a configurable number of ZooKeeper > down nodes > > > Key: HBASE-21126 > URL: https://issues.apache.org/jira/browse/HBASE-21126 > Project: HBase > Issue Type: Improvement > Components: canary, Zookeeper >Affects Versions: 1.0.0, 3.0.0, 2.0.0 >Reporter: David Manning >Assignee: David Manning >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0 > > Attachments: HBASE-21126.branch-1.001.patch, > HBASE-21126.master.001.patch, HBASE-21126.master.002.patch, > HBASE-21126.master.003.patch, zookeeperCanaryLocalTestValidation.txt > > Original Estimate: 48h > Remaining Estimate: 48h > > When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper > -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper > server in the ensemble. If any server is unavailable or unresponsive, the > canary will exit with a failure code. 
> If we use the Canary to gauge server health, and alert accordingly, this can > be too strict. For example, in a 5-node ZooKeeper cluster, having one node > down is safe and expected in rolling upgrades/patches. > This is a request to allow the Canary to take another parameter > {code:java} > -permittedZookeeperFailures {code} > If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still > pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable. > (This is my first Jira posting... sorry if I messed anything up.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711302#comment-16711302 ] Hudson commented on HBASE-21551: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. 
> So if we scan with STREAM many times, the streamReaders hash map > will grow without bound; we can see this in the attached heap-dump.jpg. > I found this bug while benchmarking scan performance with YCSB > on a cluster (RS heap size 50g): the RS was prone to long > full GC pauses (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
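The leak pattern described in the report, and the shape of the fix (release the reader when the scan finishes rather than when the store file closes), can be illustrated with a minimal stand-alone sketch. `StreamReaderCache`, `Reader`, and the method names here are illustrative stand-ins, not HBase's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class StreamReaderCache {
    // Stand-in for StoreFileReader; the real class lives in hbase-server.
    static class Reader {}

    private final Map<Long, Reader> streamReaders = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /**
     * Leaky pattern: every STREAM scan registers a reader in the map. If
     * entries are only cleared when the store file itself closes, repeated
     * scans accumulate readers indefinitely.
     */
    long openStreamReader() {
        long id = nextId.getAndIncrement();
        streamReaders.put(id, new Reader());
        return id;
    }

    /** The fix: release the reader when the scanner closes. */
    void closeStreamReader(long id) {
        streamReaders.remove(id);
    }

    int cachedReaders() {
        return streamReaders.size();
    }

    public static void main(String[] args) {
        StreamReaderCache cache = new StreamReaderCache();
        long a = cache.openStreamReader();
        long b = cache.openStreamReader();
        System.out.println(cache.cachedReaders()); // 2: both scans hold readers
        cache.closeStreamReader(a);
        cache.closeStreamReader(b);
        System.out.println(cache.cachedReaders()); // 0: readers released per scan
    }
}
```

Without the per-scan removal, each entry pins its reader (and any buffers it holds) for the lifetime of the store file, which matches the unbounded heap growth seen in the dump.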
[jira] [Commented] (HBASE-21558) Set version to 2.1.2 on branch-2.1 so can cut an RC
[ https://issues.apache.org/jira/browse/HBASE-21558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711303#comment-16711303 ] Hudson commented on HBASE-21558: Results for branch branch-2.1 [build #662 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/662//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Set version to 2.1.2 on branch-2.1 so can cut an RC > --- > > Key: HBASE-21558 > URL: https://issues.apache.org/jira/browse/HBASE-21558 > Project: HBase > Issue Type: Sub-task > Components: release >Reporter: stack >Assignee: stack >Priority: Major > > mvn clean org.codehaus.mojo:versions-maven-plugin:2.5:set -DnewVersion=2.1.2 > $ find . -name pom.xml -exec git add {} \; > $ git commit ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21551) Memory leak when use scan with STREAM at server side
[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711284#comment-16711284 ] Hudson commented on HBASE-21551: Results for branch master [build #648 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/648/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when use scan with STREAM at server side > > > Key: HBASE-21551 > URL: https://issues.apache.org/jira/browse/HBASE-21551 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Blocker > Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4 > > Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, > HBASE-21551.v3.patch, heap-dump.jpg > > > We open the RegionServerScanner with STREAM as following: > {code} > RegionScannerImpl#initializeScanners > |---> HStore#getScanner > |--> StoreScanner() > |---> > StoreFileScanner#getScannersForStoreFiles > |--> > HStoreFile#getStreamScanner #1 > {code} > In #1, we put the StoreFileReader into a concurrent hash map streamReaders, > but not remove the StreamReader from streamReaders until closing the store > file. > So if we scan with stream with so many times, the streamReaders hash map > will be exploded. 
we can see the heap dump in the attached heap-dump.jpg. > I found this bug while benchmarking scan performance with YCSB > on a cluster (RS heap size 50g): the RS was prone to long > full GC pauses (~110 sec). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21550) Add a new method preCreateTableRegionInfos for MasterObserver which allows CPs to modify the TableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-21550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711283#comment-16711283 ] Hudson commented on HBASE-21550: Results for branch master [build #648 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/648/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/648//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Add a new method preCreateTableRegionInfos for MasterObserver which allows > CPs to modify the TableDescriptor > > > Key: HBASE-21550 > URL: https://issues.apache.org/jira/browse/HBASE-21550 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: HBASE-21550.patch > > > Before 2.0, we will pass a HTableDescriptor and the CPs can modify the schema > of a table, but now we will pass a TableDescriptor, which is immutable. I > think it is correct to pass an immutable instance here, but we should have a > return value for this method to allow CPs to return a new TableDescriptor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
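The API change proposed in HBASE-21550 above (immutable input, new instance as the return value) can be sketched as follows. The types here are simplified stand-ins for HBase's TableDescriptor and MasterObserver, not the real signatures:

```java
public class ObserverSketch {
    /** Simplified stand-in for HBase's immutable TableDescriptor. */
    static final class TableDescriptor {
        final String name;
        final int maxVersions;
        TableDescriptor(String name, int maxVersions) {
            this.name = name;
            this.maxVersions = maxVersions;
        }
    }

    /**
     * Hypothetical hook shape: since the descriptor is immutable, the CP
     * cannot mutate it in place, so the hook returns a (possibly new)
     * descriptor that the master uses going forward.
     */
    interface MasterObserver {
        TableDescriptor preCreateTableRegionInfos(TableDescriptor desc);
    }

    /** Runs a CP that bumps maxVersions by returning a fresh descriptor. */
    static int demo() {
        MasterObserver cp = d -> new TableDescriptor(d.name, 3);
        TableDescriptor out = cp.preCreateTableRegionInfos(new TableDescriptor("t1", 1));
        return out.maxVersions;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3: the CP replaced the descriptor
    }
}
```

A CP that wants no change simply returns its argument unchanged, so existing observers keep working with a one-line implementation.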
[jira] [Commented] (HBASE-21464) Splitting blocked with meta NSRE during split transaction
[ https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711256#comment-16711256 ] Hudson commented on HBASE-21464: Results for branch branch-1.4 [build #577 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/577//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Splitting blocked with meta NSRE during split transaction > - > > Key: HBASE-21464 > URL: https://issues.apache.org/jira/browse/HBASE-21464 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Blocker > Fix For: 1.5.0, 1.4.9 > > Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, > HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch > > > Splitting is blocked during split transaction. 
The split worker is trying to > update meta but isn't able to relocate it after NSRE: > {noformat} > 2018-11-09 17:50:45,277 INFO > [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434] > client.RpcRetryingCaller: Call exception, tries=13, retries=350, > started=88590 ms ago, cancelled=false, > msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 > is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row > 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table > 'hbase:meta' at region=hbase:meta,1.1588230740, > hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, > seqNum=0{noformat} > Clients, in this case YCSB, are hung with part of the keyspace missing: > {noformat} > 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] > client.ConnectionManager$HConnectionImplementation: locateRegionInMeta > parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying > after sleep of 20158 because: No server address listed in hbase:meta for > region > test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat} > Balancing cannot run indefinitely because the split transaction is stuck > {noformat} > 2018-11-09 17:49:55,478 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: > Not running balancer because 3 region(s) in transition: > [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, > {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21464) Splitting blocked with meta NSRE during split transaction
[ https://issues.apache.org/jira/browse/HBASE-21464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711253#comment-16711253 ] Hudson commented on HBASE-21464: Results for branch branch-1 [build #580 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/580//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 source release artifact{color} -- See build output for details. > Splitting blocked with meta NSRE during split transaction > - > > Key: HBASE-21464 > URL: https://issues.apache.org/jira/browse/HBASE-21464 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.8, 1.4.7 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Blocker > Fix For: 1.5.0, 1.4.9 > > Attachments: HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch, > HBASE-21464-branch-1.patch, HBASE-21464-branch-1.patch > > > Splitting is blocked during split transaction. 
The split worker is trying to > update meta but isn't able to relocate it after NSRE: > {noformat} > 2018-11-09 17:50:45,277 INFO > [regionserver/ip-172-31-5-92.us-west-2.compute.internal/172.31.5.92:8120-splits-1541785709434] > client.RpcRetryingCaller: Call exception, tries=13, retries=350, > started=88590 ms ago, cancelled=false, > msg=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 > is not online on ip-172-31-13-83.us-west-2.compute.internal,8120,1541785618832 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3088) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2198) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36617) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2396) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)row > 'test,,1541785709452.5ba6596f0050c2dab969d152829227c6.44' on table > 'hbase:meta' at region=hbase:meta,1.1588230740, > hostname=ip-172-31-15-225.us-west-2.compute.internal,8120,1541785640586, > seqNum=0{noformat} > Clients, in this case YCSB, are hung with part of the keyspace missing: > {noformat} > 2018-11-09 17:51:06,033 DEBUG [hconnection-0x5739e567-shared--pool1-t165] > client.ConnectionManager$HConnectionImplementation: locateRegionInMeta > parentTable=hbase:meta, metaLocation=, attempt=14 of 35 failed; retrying > after sleep of 20158 because: No server address listed in hbase:meta for > region > test,user307326104267982763,1541785754600.ef90030b05cb02305b75e9bfbc3ee081. 
> containing row user3301635648728421323{noformat} > Balancing cannot run indefinitely because the split transaction is stuck > {noformat} > 2018-11-09 17:49:55,478 DEBUG > [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=8100] master.HMaster: > Not running balancer because 3 region(s) in transition: > [{ef90030b05cb02305b75e9bfbc3ee081 state=SPLITTING_NEW, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute.internal,8120,1541785626417}, > {5ba6596f0050c2dab969d152829227c6 state=SPLITTING, ts=1541785754606, > server=ip-172-31-5-92.us-west-2.compute{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)