[jira] [Commented] (HBASE-12331) Shorten the mob snapshot unit tests
[ https://issues.apache.org/jira/browse/HBASE-12331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201790#comment-14201790 ]

Li Jiajia commented on HBASE-12331:
-----------------------------------

Hi [~jmhsieh], HBASE-12332 brings some improvements to reading cloned cells and cuts the running time of these unit tests. We can write less data so they finish sooner; most of them (except the export-snapshot case) now finish within 100 seconds. TestExportSnapshot still takes a long time, though, so do we still need to move these unit tests to integration tests? Please advise. Thanks.

Shorten the mob snapshot unit tests
---

Key: HBASE-12331
URL: https://issues.apache.org/jira/browse/HBASE-12331
Project: HBase
Issue Type: Sub-task
Components: mob
Affects Versions: hbase-11339
Reporter: Jonathan Hsieh
Fix For: hbase-11339
Attachments: HBASE-12331-V1.diff

The mob snapshot patch introduced a whole lot of tests that take a long time to run and would be better as integration tests.

{code}
---
 T E S T S
---
Running org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClientWithRegionReplicas
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 394.803 sec - in org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClientWithRegionReplicas
Running org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClient
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 212.377 sec - in org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClient
Running org.apache.hadoop.hbase.client.TestMobSnapshotFromClientWithRegionReplicas
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.463 sec - in org.apache.hadoop.hbase.client.TestMobSnapshotFromClientWithRegionReplicas
Running org.apache.hadoop.hbase.client.TestMobSnapshotFromClient
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.724 sec - in org.apache.hadoop.hbase.client.TestMobSnapshotFromClient
Running org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 204.03 sec - in org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient
Running org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClientWithRegionReplicas
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 214.052 sec - in org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClientWithRegionReplicas
Running org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 105.139 sec - in org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence
Running org.apache.hadoop.hbase.regionserver.TestMobStoreScanner
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.42 sec - in org.apache.hadoop.hbase.regionserver.TestMobStoreScanner
Running org.apache.hadoop.hbase.regionserver.TestDeleteMobTable
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.136 sec - in org.apache.hadoop.hbase.regionserver.TestDeleteMobTable
Running org.apache.hadoop.hbase.regionserver.TestHMobStore
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.09 sec - in org.apache.hadoop.hbase.regionserver.TestHMobStore
Running org.apache.hadoop.hbase.regionserver.TestMobCompaction
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.629 sec - in org.apache.hadoop.hbase.regionserver.TestMobCompaction
Running org.apache.hadoop.hbase.mob.TestCachedMobFile
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.301 sec - in org.apache.hadoop.hbase.mob.TestCachedMobFile
Running org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepJob
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.752 sec - in org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepJob
Running org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepReducer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.276 sec - in org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepReducer
Running org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepMapper
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.46 sec - in org.apache.hadoop.hbase.mob.mapreduce.TestMobSweepMapper
Running org.apache.hadoop.hbase.mob.mapreduce.TestMobSweeper
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 173.05 sec - in org.apache.hadoop.hbase.mob.mapreduce.TestMobSweeper
Running org.apache.hadoop.hbase.mob.TestMobDataBlockEncoding
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.86 sec - in org.apache.hadoop.hbase.mob.TestMobDataBlockEncoding
Running
[jira] [Updated] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated HBASE-12279:
---------------------------------

Status: Open (was: Patch Available)

Generated thrift files were generated with the wrong parameters
---

Key: HBASE-12279
URL: https://issues.apache.org/jira/browse/HBASE-12279
Project: HBase
Issue Type: Bug
Affects Versions: 0.99.0, 0.98.0, 0.94.0
Reporter: Niels Basjes
Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2
Attachments: HBASE-12279-2014-10-16-v1.patch

It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html])
{code}
thrift -strict --gen java:hashcode
{code}
the current files seem to have been generated with
{code}
thrift -strict --gen java
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
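The practical difference between the two invocations is the `java:hashcode` generator option, which makes Thrift emit a field-based hashCode() on the generated structs. As a rough illustration (a hand-written value class, not actual Thrift-generated code), a hashCode() consistent with equals() is what lets such structs behave correctly as HashMap or HashSet keys:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class standing in for a Thrift-generated struct.
final class RowKey {
    final String table;
    final long ts;
    RowKey(String table, long ts) { this.table = table; this.ts = ts; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof RowKey)) return false;
        RowKey r = (RowKey) o;
        return ts == r.ts && table.equals(r.table);
    }

    // Field-based hashCode consistent with equals -- the kind of method
    // the "java:hashcode" option adds to generated structs.
    @Override public int hashCode() { return Objects.hash(table, ts); }
}

public class HashCodeDemo {
    public static void main(String[] args) {
        Map<RowKey, String> m = new HashMap<>();
        m.put(new RowKey("t1", 1L), "a");
        m.put(new RowKey("t1", 1L), "b"); // equal key: overwrites "a"
        m.put(new RowKey("t2", 2L), "c");
        System.out.println(m.size());                    // 2
        System.out.println(m.get(new RowKey("t1", 1L))); // b
    }
}
```

Without a field-based hashCode(), two equal structs land in different buckets and the overwrite/lookup behavior above silently breaks, which is why the generator option matters beyond cosmetics.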
[jira] [Updated] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated HBASE-12279:
---------------------------------

Attachment: HBASE-12279-2014-11-07-v2.patch

Because HBASE-12272 has been committed, this patch can now be created simply by running
{code}
mvn generate-sources -Pcompile-thrift
{code}
on a clean checkout of the source tree. To ensure this doesn't break any existing tests, I've attached the patch for the current master branch so Jenkins can do a verification run. I don't think this patch file should be used for the actual commit; the command above should be used instead, as it is much easier to do the same on all branches.

Generated thrift files were generated with the wrong parameters
---

Key: HBASE-12279
URL: https://issues.apache.org/jira/browse/HBASE-12279
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.98.0, 0.99.0
Reporter: Niels Basjes
Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2
Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch

It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html])
{code}
thrift -strict --gen java:hashcode
{code}
the current files seem to have been generated with
{code}
thrift -strict --gen java
{code}
[jira] [Updated] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated HBASE-12279:
---------------------------------

Status: Patch Available (was: Open)

Generated thrift files were generated with the wrong parameters
---

Key: HBASE-12279
URL: https://issues.apache.org/jira/browse/HBASE-12279
Project: HBase
Issue Type: Bug
Affects Versions: 0.99.0, 0.98.0, 0.94.0
Reporter: Niels Basjes
Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2
Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch

It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html])
{code}
thrift -strict --gen java:hashcode
{code}
the current files seem to have been generated with
{code}
thrift -strict --gen java
{code}
[jira] [Updated] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virag Kothari updated HBASE-12440:
----------------------------------

Attachment: HBASE-12440-0.98_v2.patch
            HBASE-12440-branch-1.patch

Thanks for the review [~apurtell]. v2 removes the changes in ServerManager; I had overthought the test before. Also, on branch-1 one of the tests started failing with v1, because the case where the table is disabled before SSH tries to do the assign was not handled. v2 adds a check for that.

Region may remain offline on clean startup under certain race condition
---

Key: HBASE-12440
URL: https://issues.apache.org/jira/browse/HBASE-12440
Project: HBase
Issue Type: Bug
Reporter: Virag Kothari
Assignee: Virag Kothari
Fix For: 0.98.8, 0.99.1
Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch

Saw this in prod some time back with zk assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable. The AssignCallable does a forceStateToOffline() and skips assigning, because it wants the SSH to do the assignment:
{code}
2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
But the SSH won't assign, because the region is offline but not in transition:
{code}
2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
In zk-less assignment, both the bulk assigner (via AssignCallable) and the SSH may try to assign the region, but since they go through a lock, only one will succeed, so that doesn't seem to be an issue.
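The locking argument in the last paragraph can be sketched as follows (hypothetical standalone names, not HBase's actual AssignmentManager classes): two racing paths try to assign the same region, and the per-region lock guarantees exactly one of them performs the assignment.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of the per-region lock described above. When both the bulk
// assigner's AssignCallable and the ServerShutdownHandler race to assign the
// same region, only the first caller to take the lock does the assignment.
public class AssignRaceSketch {
    private final ReentrantLock regionLock = new ReentrantLock();
    private boolean assigned = false;

    // Returns true only for the caller that actually performs the assignment.
    boolean tryAssign() {
        regionLock.lock();
        try {
            if (assigned) {
                return false;      // the other path already assigned it
            }
            assigned = true;       // perform the (simulated) assignment
            return true;
        } finally {
            regionLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AssignRaceSketch region = new AssignRaceSketch();
        // Two racing paths; exactly one prints "won: true".
        Thread ssh = new Thread(() ->
            System.out.println("SSH won: " + region.tryAssign()));
        Thread callable = new Thread(() ->
            System.out.println("AssignCallable won: " + region.tryAssign()));
        ssh.start(); callable.start();
        ssh.join(); callable.join();
    }
}
```

The bug in the description is precisely the case this sketch cannot show: one path declines to assign (expecting the other to do it) while the other sees nothing to do, so neither takes the "winner" branch.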
[jira] [Updated] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virag Kothari updated HBASE-12440:
----------------------------------

Component/s: Region Assignment

Region may remain offline on clean startup under certain race condition
---

Key: HBASE-12440
URL: https://issues.apache.org/jira/browse/HBASE-12440
Project: HBase
Issue Type: Bug
Components: Region Assignment
Reporter: Virag Kothari
Assignee: Virag Kothari
Fix For: 0.98.8, 0.99.1
Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch

Saw this in prod some time back with zk assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable. The AssignCallable does a forceStateToOffline() and skips assigning, because it wants the SSH to do the assignment:
{code}
2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
But the SSH won't assign, because the region is offline but not in transition:
{code}
2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
In zk-less assignment, both the bulk assigner (via AssignCallable) and the SSH may try to assign the region, but since they go through a lock, only one will succeed, so that doesn't seem to be an issue.
[jira] [Commented] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201905#comment-14201905 ]

Hadoop QA commented on HBASE-12279:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12680115/HBASE-12279-2014-11-07-v2.patch
  against trunk revision .

ATTACHMENT ID: 12680115

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}. The applied patch generated 108 javac compiler warnings (more than the trunk's current 102 warnings).

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+lastComparison = Boolean.valueOf(isSetAuthorizations()).compareTo(typedOther.isSetAuthorizations());
+lastComparison = Boolean.valueOf(isSetCellVisibility()).compareTo(typedOther.isSetCellVisibility());
+lastComparison = Boolean.valueOf(isSetAuthorizations()).compareTo(typedOther.isSetAuthorizations());

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11613//console

This message is automatically generated.

Generated thrift files were generated with the wrong parameters
---

Key: HBASE-12279
URL: https://issues.apache.org/jira/browse/HBASE-12279
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.98.0, 0.99.0
Reporter: Niels Basjes
Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2
Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch

It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html])
{code}
thrift -strict --gen java:hashcode
{code}
the
[jira] [Updated] (HBASE-10483) Provide API for retrieving info port when hbase.master.info.port is set to 0
[ https://issues.apache.org/jira/browse/HBASE-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liu Shaohui updated HBASE-10483:
--------------------------------

Attachment: HBASE-10483-v3.diff

A new patch for the hbase master:
- Add an info port field to the master pb data in zk.
- Clients, RegionServers and backup masters get the active master's info port through the MasterAddressTracker.

[~stack] [~tedyu] [~enis] Please help to review this patch, thx.

Provide API for retrieving info port when hbase.master.info.port is set to 0
---

Key: HBASE-10483
URL: https://issues.apache.org/jira/browse/HBASE-10483
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Liu Shaohui
Attachments: HBASE-10483-trunk-v1.diff, HBASE-10483-trunk-v2.diff, HBASE-10483-v3.diff

When hbase.master.info.port is set to 0, the info port is dynamically determined. An API should be provided so that clients can retrieve the actual info port.
[jira] [Commented] (HBASE-12443) After increasing the TTL value of a Hbase Table , table gets inaccessible. Scan table not working.
[ https://issues.apache.org/jira/browse/HBASE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202175#comment-14202175 ]

Lars Hofhansl commented on HBASE-12443:
---------------------------------------

If this is not the same issue, feel free to reopen of course.

After increasing the TTL value of a Hbase Table , table gets inaccessible. Scan table not working.
---

Key: HBASE-12443
URL: https://issues.apache.org/jira/browse/HBASE-12443
Project: HBase
Issue Type: Bug
Components: HFile
Reporter: Prabhu Joseph
Priority: Blocker
Fix For: 2.0.0

After increasing the TTL value of an HBase table, the table becomes inaccessible; scanning the table does not work. A scan in the hbase shell throws:
{code}
java.lang.IllegalStateException: Block index not loaded
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.blockContainingKey(HFileReaderV1.java:181)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV1$AbstractScannerV1.seekTo(HFileReaderV1.java:426)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:131)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2015)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3706)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1761)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1753)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1730)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2409)
    at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
{code}

STEPS to Reproduce:
{code}
create 'debugger', {NAME => 'd', TTL => 15552000}
put 'debugger', 'jdb', 'd:desc', 'Java debugger', 1399699792000
disable 'debugger'
alter 'debugger', {NAME => 'd', TTL => 6912}
enable 'debugger'
scan 'debugger'
{code}

Reason for the issue: when already-expired data is inserted into the debugger table, HBase creates an hfile with empty data and index blocks. On scanning the table, StoreFile.Reader checks whether the TimeRangeTracker's maximum timestamp still falls within the TTL; since it does not, the empty file is skipped. But when the TTL is changed, the maximum timestamp falls back within the TTL, so StoreFile.Reader tries to read the index block from the HFile, leading to java.lang.IllegalStateException: Block index not loaded.

SOLUTION: StoreFile.java
{code}
boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
  if (timeRangeTracker == null) {
    return true;
  } else {
    return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
        timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS;
  }
}
{code}
In the above method, by checking whether there are entries in the hfile (via the FixedFileTrailer block) we can skip scanning the empty hfile:
{code}
// changed code will solve the issue
boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
  if (timeRangeTracker == null) {
    return true;
  } else {
    return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
        timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS &&
        reader.getEntries() > 0;
  }
}
{code}
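The reporter's proposed entry-count guard can be illustrated with a standalone sketch (a hypothetical method extracted from the patch idea above; `entries` stands in for reader.getEntries(), and the timestamps mirror the repro steps):

```java
// Standalone sketch of the proposed check: an empty hfile whose max timestamp
// falls back inside a newly lengthened TTL would pass the time-range test
// alone, so the entries > 0 guard is what keeps it skipped.
public class TimerangeFilterSketch {

    // Simplified stand-in for StoreFile.passesTimerangeFilter: the file is
    // worth scanning only if its newest cell is still unexpired AND it
    // actually contains entries.
    static boolean passesTimerangeFilter(long maxTimestamp, long entries,
                                         long oldestUnexpiredTS) {
        return maxTimestamp >= oldestUnexpiredTS && entries > 0;
    }

    public static void main(String[] args) {
        long writeTs = 1_399_699_792_000L; // the put's timestamp from the repro
        long now = 1_415_000_000_000L;     // assumed "current time" for the demo

        // Short TTL (6912 s): data is expired, empty file skipped either way.
        long oldestShortTtl = now - 6_912L * 1000;
        System.out.println(passesTimerangeFilter(writeTs, 0, oldestShortTtl));

        // Long TTL (15552000 s, 180 days): the timestamp is back in range,
        // but the entries > 0 guard still skips the empty file.
        long oldestLongTtl = now - 15_552_000L * 1000;
        System.out.println(passesTimerangeFilter(writeTs, 0, oldestLongTtl));
    }
}
```

Both calls print false: the first because the cell is older than the TTL cutoff, the second purely because of the entry-count guard, which is exactly the case that crashed before the fix.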
[jira] [Commented] (HBASE-12443) After increasing the TTL value of a Hbase Table , table gets inaccessible. Scan table not working.
[ https://issues.apache.org/jira/browse/HBASE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202508#comment-14202508 ]

Lars Hofhansl commented on HBASE-12443:
---------------------------------------

And if you have a patch please post it. :)

After increasing the TTL value of a Hbase Table , table gets inaccessible. Scan table not working.
---

Key: HBASE-12443
URL: https://issues.apache.org/jira/browse/HBASE-12443
Project: HBase
Issue Type: Bug
Components: HFile
Reporter: Prabhu Joseph
Priority: Blocker
Fix For: 2.0.0

After increasing the TTL value of an HBase table, the table becomes inaccessible; scanning the table does not work. A scan in the hbase shell throws:
{code}
java.lang.IllegalStateException: Block index not loaded
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.blockContainingKey(HFileReaderV1.java:181)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV1$AbstractScannerV1.seekTo(HFileReaderV1.java:426)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:131)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2015)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3706)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1761)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1753)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1730)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2409)
    at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
{code}

STEPS to Reproduce:
{code}
create 'debugger', {NAME => 'd', TTL => 15552000}
put 'debugger', 'jdb', 'd:desc', 'Java debugger', 1399699792000
disable 'debugger'
alter 'debugger', {NAME => 'd', TTL => 6912}
enable 'debugger'
scan 'debugger'
{code}

Reason for the issue: when already-expired data is inserted into the debugger table, HBase creates an hfile with empty data and index blocks. On scanning the table, StoreFile.Reader checks whether the TimeRangeTracker's maximum timestamp still falls within the TTL; since it does not, the empty file is skipped. But when the TTL is changed, the maximum timestamp falls back within the TTL, so StoreFile.Reader tries to read the index block from the HFile, leading to java.lang.IllegalStateException: Block index not loaded.

SOLUTION: StoreFile.java
{code}
boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
  if (timeRangeTracker == null) {
    return true;
  } else {
    return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
        timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS;
  }
}
{code}
In the above method, by checking whether there are entries in the hfile (via the FixedFileTrailer block) we can skip scanning the empty hfile:
{code}
// changed code will solve the issue
boolean passesTimerangeFilter(Scan scan, long oldestUnexpiredTS) {
  if (timeRangeTracker == null) {
    return true;
  } else {
    return timeRangeTracker.includesTimeRange(scan.getTimeRange()) &&
        timeRangeTracker.getMaximumTimestamp() >= oldestUnexpiredTS &&
        reader.getEntries() > 0;
  }
}
{code}
[jira] [Commented] (HBASE-12012) Improve cancellation for the scan RPCs
[ https://issues.apache.org/jira/browse/HBASE-12012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202514#comment-14202514 ]

Devaraj Das commented on HBASE-12012:
-------------------------------------

[~stack] this patch mostly refactors the scan code to use the (improved client-side) cancellation that is used in the other RPC parts of the code. The tests on HBASE-11564 were with this, yes. I need to update the patch a little to take into account some fixes that I made after cluster testing; will post one soon.

Improve cancellation for the scan RPCs
---

Key: HBASE-12012
URL: https://issues.apache.org/jira/browse/HBASE-12012
Project: HBase
Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Fix For: 2.0.0, 0.99.2
Attachments: 12012-1.txt

Similar to HBASE-11564 but for scans.
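As background for what client-side cancellation of an in-flight call looks like in plain java.util.concurrent terms (illustrative only, not HBase's actual RPC classes):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch of client-side cancellation: the caller holds a Future for an
// in-flight "scan" call and cancels it, interrupting the worker instead of
// letting the slow call run to completion. Names are illustrative.
public class CancelScanSketch {

    // Submit a slow "scan", then cancel it mid-flight; returns true when the
    // call ended up cancelled rather than completed.
    static boolean cancelInFlight() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> scan = pool.submit(() -> {
                Thread.sleep(10_000);   // stand-in for a slow scan RPC
                return "rows";
            });
            Thread.sleep(100);          // let the call get in flight
            scan.cancel(true);          // interrupt the worker thread
            return scan.isCancelled();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(cancelInFlight());
    }
}
```

The value of cancellation here is latency, not correctness: the caller stops waiting immediately instead of blocking for the full duration of a call whose result it no longer wants.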
[jira] [Commented] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202520#comment-14202520 ]

Andrew Purtell commented on HBASE-12440:
----------------------------------------

The v2 patch lgtm. Let me check for a bit that it doesn't cause any tests to flap or anything like that, and I will then commit the latest 0.98 and branch-1 patches on this issue. Thanks Virag.

Region may remain offline on clean startup under certain race condition
---

Key: HBASE-12440
URL: https://issues.apache.org/jira/browse/HBASE-12440
Project: HBase
Issue Type: Bug
Components: Region Assignment
Reporter: Virag Kothari
Assignee: Virag Kothari
Fix For: 0.98.8, 0.99.1
Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch

Saw this in prod some time back with zk assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable. The AssignCallable does a forceStateToOffline() and skips assigning, because it wants the SSH to do the assignment:
{code}
2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
But the SSH won't assign, because the region is offline but not in transition:
{code}
2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
{code}
In zk-less assignment, both the bulk assigner (via AssignCallable) and the SSH may try to assign the region, but since they go through a lock, only one will succeed, so that doesn't seem to be an issue.
[jira] [Updated] (HBASE-12272) Generate Thrift code through maven
[ https://issues.apache.org/jira/browse/HBASE-12272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-12272: -- Fix Version/s: (was: 0.94.26) 0.94.25 Generate Thrift code through maven -- Key: HBASE-12272 URL: https://issues.apache.org/jira/browse/HBASE-12272 Project: HBase Issue Type: Improvement Components: build, documentation, Thrift Reporter: Niels Basjes Assignee: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.25, 0.99.2 Attachments: HBASE-12272-2014-10-15-v1-PREVIEW.patch, HBASE-12272-2014-10-16-v2.patch, HBASE-12272-2014-10-16-v3.patch, HBASE-12272-2014-10-16-v4.patch, HBASE-12272-2014-11-04-v5.patch, HBASE-12272-2014-11-05-v5.patch, HBASE-12272-2014-11-05-v5.patch The generated Thrift code is currently under source control, but the instructions for rebuilding it are buried in package javadocs. We should have a simple maven command to rebuild it, similar to what we have for protobufs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
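For illustration, a maven-driven Thrift codegen step might look like the fragment below. This is a sketch only: the plugin coordinates, version, and source path are assumptions for this example, not taken from the HBASE-12272 patch (which may instead wire up the `thrift` executable through a profile).

```xml
<!-- Illustrative only; coordinates/paths are assumptions, not the actual patch. -->
<plugin>
  <groupId>org.apache.thrift.tools</groupId>
  <artifactId>maven-thrift-plugin</artifactId>
  <version>0.1.11</version>
  <configuration>
    <thriftExecutable>${thrift.path}</thriftExecutable>
    <thriftSourceRoot>src/main/resources/org/apache/hadoop/hbase/thrift</thriftSourceRoot>
  </configuration>
  <executions>
    <execution>
      <id>generate-thrift-sources</id>
      <phase>generate-sources</phase>
      <goals><goal>compile</goal></goals>
    </execution>
  </executions>
</plugin>
```

Either way, the point of the issue stands: one `mvn` invocation should regenerate the sources instead of a procedure buried in javadocs.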
[jira] [Updated] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-5162: --- Status: Open (was: Patch Available) Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jesse Yates Fix For: 1.0.0 Attachments: hbase-5162-trunk-v0.patch, hbase-5162-trunk-v1.patch, hbase-5162-trunk-v2.patch, hbase-5162-trunk-v3.patch, hbase-5162-trunk-v4.patch, hbase-5162-trunk-v5.patch, java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, and we quickly came up with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. Should be easy to implement, but we'll be using a lot more IO to send the payload over and over again (though at least it wouldn't sit in the RS's memory). - Send a message alongside a successful put or delete to tell the client to slow down a little; this way we don't have to do back and forth with the payload between the client and the server. It's cleaner (I think) but a more involved solution. In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
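The first proposal above (a retryable OverloadedException with client-side backoff) can be sketched as follows. This is a hypothetical illustration of the idea under discussion, not the committed design: the exception name comes from the comment, while the backoff policy and helper names are assumptions.

```java
// Hypothetical sketch of client pushback via a retryable exception.
class OverloadedException extends RuntimeException {}

class PushbackClient {
    // Capped exponential backoff; attempt is 0-based.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        return Math.min(baseMillis * (1L << attempt), capMillis);
    }

    interface Put { void run() throws OverloadedException; }

    // Retry the operation without re-checking .META., as suggested above.
    // Returns the number of retries that were needed.
    static int putWithPushback(Put put, int maxAttempts) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                put.run();
                return attempt;
            } catch (OverloadedException e) {
                Thread.sleep(delayMillis(attempt, 10, 1000));
            }
        }
        throw new OverloadedException();         // give up after maxAttempts
    }
}
```

The second proposal avoids the repeated payload IO entirely by piggybacking a slow-down hint on successful responses; the trade-off named in the description is exactly this extra IO versus a more involved protocol change.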
[jira] [Updated] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-5162: --- Attachment: hbase-5162-trunk-v6.patch Updated patch on latest master (we were getting a little behind) and hopefully fixing the checkstyle and findbugs issues. Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jesse Yates Fix For: 1.0.0 Attachments: hbase-5162-trunk-v0.patch, hbase-5162-trunk-v1.patch, hbase-5162-trunk-v2.patch, hbase-5162-trunk-v3.patch, hbase-5162-trunk-v4.patch, hbase-5162-trunk-v5.patch, hbase-5162-trunk-v6.patch, java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, and we quickly came up with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. Should be easy to implement, but we'll be using a lot more IO to send the payload over and over again (though at least it wouldn't sit in the RS's memory). - Send a message alongside a successful put or delete to tell the client to slow down a little; this way we don't have to do back and forth with the payload between the client and the server. It's cleaner (I think) but a more involved solution. In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-5162: --- Status: Patch Available (was: Open) Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jesse Yates Fix For: 1.0.0 Attachments: hbase-5162-trunk-v0.patch, hbase-5162-trunk-v1.patch, hbase-5162-trunk-v2.patch, hbase-5162-trunk-v3.patch, hbase-5162-trunk-v4.patch, hbase-5162-trunk-v5.patch, hbase-5162-trunk-v6.patch, java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, and we quickly came up with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but ideally wouldn't check .META. for a new location. It could be fancy and have multiple levels of pushback, like send the exception to 25% of the clients, and then go up if the situation persists. Should be easy to implement, but we'll be using a lot more IO to send the payload over and over again (though at least it wouldn't sit in the RS's memory). - Send a message alongside a successful put or delete to tell the client to slow down a little; this way we don't have to do back and forth with the payload between the client and the server. It's cleaner (I think) but a more involved solution.
In every case the RS should do very obvious things to notify the operators of this situation, through logs, web UI, metrics, etc. Other ideas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10201: --- Fix Version/s: 0.98.9 Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
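The per-CF flush idea can be sketched as a selection policy: the flush is still triggered by the aggregate memstore size, but only families whose own memstore is large enough get flushed. This is an illustrative sketch with invented names and thresholds, not the HBASE-10201 patch itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical per-column-family flush selection (names are illustrative).
class PerFamilyFlushPolicy {
    // Trigger on aggregate size; select only families above perFamilyMin,
    // sparing small CFs from constant tiny flushes.
    static List<String> familiesToFlush(Map<String, Long> memstoreBytes,
                                        long aggregateFlushSize,
                                        long perFamilyMin) {
        long total = memstoreBytes.values().stream().mapToLong(Long::longValue).sum();
        List<String> selected = new ArrayList<>();
        if (total < aggregateFlushSize) {
            return selected;                       // no flush needed yet
        }
        for (Map.Entry<String, Long> e : memstoreBytes.entrySet()) {
            if (e.getValue() >= perFamilyMin) {
                selected.add(e.getKey());
            }
        }
        // Fall back to flushing everything if no single family qualifies.
        if (selected.isEmpty()) {
            selected.addAll(memstoreBytes.keySet());
        }
        return selected;
    }
}
```

With a 120-byte "big" CF and a 5-byte "small" CF and a 100-byte trigger, only "big" is flushed, which is exactly the behavior the issue asks for.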
[jira] [Commented] (HBASE-12346) Scan's default auths behavior under Visibility labels
[ https://issues.apache.org/jira/browse/HBASE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202582#comment-14202582 ] Andrew Purtell commented on HBASE-12346: Yes, a new SLG. The proposed change to EnforcingScanLabelGenerator would remove its essential feature. It doesn't have to be complex to configure from the user's perspective; we could provide canned shortcut configuration strings that expand into SLG stacks. Documentation would be good. Scan's default auths behavior under Visibility labels - Key: HBASE-12346 URL: https://issues.apache.org/jira/browse/HBASE-12346 Project: HBase Issue Type: Bug Components: API, security Affects Versions: 0.98.7, 0.99.1 Reporter: Jerry He Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12346-master-v2.patch, HBASE-12346-master-v3.patch, HBASE-12346-master.patch In Visibility Labels security, a set of labels (auths) is administered and associated with a user. A user can normally only see cell data during a scan that is part of the user's label set (auths). Scan uses setAuthorizations to indicate it wants to use those auths to access the cells. Similarly in the shell:
{code}
scan 'table1', AUTHORIZATIONS => ['private']
{code}
But it is a surprise to find that setAuthorizations seems to be 'mandatory' in the default visibility label security setting. Every scan needs to call setAuthorizations before it can get any cells, even when the cells are under labels the requesting user holds. The following steps illustrate the issue. Run as superuser:
{code}
1. create a visibility label called 'private'
2. create 'table1'
3. put into 'table1' data and label the data as 'private'
4. set_auths 'user1', 'private'
5. grant 'user1', 'RW', 'table1'
{code}
Run as 'user1':
{code}
1. scan 'table1'
   This shows no cells.
2. scan 'table1', AUTHORIZATIONS => ['private']
   This will show all the data.
{code}
I am not sure if this is expected by design or a bug.
But a more reasonable, more backward-compatible, and less surprising default behavior should probably look like this: a scan's default auths, if its Authorizations attribute is not set explicitly, should be all the auths the requesting user has been administered and allowed on the server. If scan.setAuthorizations is used, then the server further filters the auths during the scan: it uses the input auths minus whatever is not in the user's label set on the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
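The proposed default can be expressed as a small resolution rule. This is a hypothetical helper illustrating the semantics described above, not the HBase VisibilityController implementation:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the proposed default-auths behavior.
class EffectiveAuths {
    // If the scan sets no Authorizations, default to everything the user
    // holds; otherwise keep only the intersection with the user's labels.
    static Set<String> resolve(Set<String> userLabels, Set<String> scanAuths) {
        if (scanAuths == null) {
            return new LinkedHashSet<>(userLabels);
        }
        Set<String> effective = new LinkedHashSet<>(scanAuths);
        effective.retainAll(userLabels);   // drop auths the user lacks
        return effective;
    }
}
```

Under this rule, the bare `scan 'table1'` in the reproduction steps would see the 'private' cells, because user1 holds that label.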
[jira] [Updated] (HBASE-12424) Finer grained logging and metrics for split transactions
[ https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12424: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to 0.98+. Thanks for the review [~jesse_yates] Finer grained logging and metrics for split transactions Key: HBASE-12424 URL: https://issues.apache.org/jira/browse/HBASE-12424 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: 0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, HBASE-12424-0.98.patch, HBASE-12424.patch, HBASE-12424.patch, HBASE-12424.patch, HowHBaseRegionSplitsareImplemented.pdf A split transaction is a complex orchestration of activity between the RegionServer, Master, ZooKeeper, and HDFS NameNode. We have some visibility into the time taken by various phases of the split transaction in the logs. We will see Starting split of region $PARENT before the transaction begins, before the parent is offlined. Later we will see Opening $DAUGHTER as one of the last steps in the transaction, this is after the parent has been flushed, offlined, and closed. Finally Region split, hbase:meta updated, and report to master ... Split took $TIME after all steps are complete and including the total running time of the transaction. For debugging the cause(s) of long running split transactions it would be useful to know the distribution of time spent in all of the phases of the split transaction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
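Finer-grained split timing amounts to recording a duration per phase so a long-running transaction can be attributed to a specific step. The sketch below is illustrative: the phase names are loosely derived from the description, and the class is not the committed metrics code.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical per-phase timer for a split transaction (names illustrative).
class SplitPhaseTimer {
    enum Phase { PREPARE, CLOSE_PARENT, CREATE_DAUGHTERS, UPDATE_META, OPEN_DAUGHTERS }

    private final Map<Phase, Long> elapsedMillis = new EnumMap<>(Phase.class);

    void record(Phase phase, long millis) {
        elapsedMillis.merge(phase, millis, Long::sum);
    }

    long totalMillis() {
        return elapsedMillis.values().stream().mapToLong(Long::longValue).sum();
    }

    // The slowest phase is the first place to look for a slow split.
    Phase slowestPhase() {
        return elapsedMillis.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

Emitting these per-phase durations as log lines and metrics gives exactly the distribution of time the issue asks for, instead of only the total "Split took $TIME".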
[jira] [Updated] (HBASE-12336) RegionServer failed to shutdown for NodeFailoverWorker thread
[ https://issues.apache.org/jira/browse/HBASE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12336: --- Fix Version/s: (was: 0.98.9) 0.98.8 RegionServer failed to shutdown for NodeFailoverWorker thread - Key: HBASE-12336 URL: https://issues.apache.org/jira/browse/HBASE-12336 Project: HBase Issue Type: Bug Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 0.98.8, 0.94.25, 0.99.2 Attachments: HBASE-12336-trunk-v1.diff, stack After enabling hbase.zookeeper.useMulti in an hbase cluster, we found that the regionserver failed to shut down. All other threads had exited except a NodeFailoverWorker thread:
{code}
ReplicationExecutor-0 prio=10 tid=0x7f0d40195ad0 nid=0x73a in Object.wait() [0x7f0dc8fe6000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
 - locked 0x0005a16df080 (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
 at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
 at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
 at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
{code}
We verified that the shutdown method of the executor is called in ReplicationSourceManager#join. I am looking for the root cause; suggestions are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
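A common defensive pattern for a pool stuck like this is a bounded wait followed by `shutdownNow()` to interrupt the workers. This is an illustrative pattern only, not the HBASE-12336 fix, and note the caveat that a thread blocked in ZooKeeper's `Object.wait()` internals may still need the ZK client itself to be closed before it unblocks:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative graceful-then-forced executor shutdown.
class GracefulShutdown {
    static boolean stop(ExecutorService executor, long waitSeconds)
            throws InterruptedException {
        executor.shutdown();                          // stop accepting new work
        if (executor.awaitTermination(waitSeconds, TimeUnit.SECONDS)) {
            return true;                              // clean exit
        }
        executor.shutdownNow();                       // interrupt stuck workers
        return executor.awaitTermination(waitSeconds, TimeUnit.SECONDS);
    }
}
```

If the worker does not respond to interruption, the remaining suspects are the blocking call itself (the ZK multi) and whatever resource it is waiting on.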
[jira] [Updated] (HBASE-12381) Add maven enforcer rules for build assumptions
[ https://issues.apache.org/jira/browse/HBASE-12381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12381: --- Fix Version/s: (was: 0.98.9) 0.98.8 Add maven enforcer rules for build assumptions -- Key: HBASE-12381 URL: https://issues.apache.org/jira/browse/HBASE-12381 Project: HBase Issue Type: Task Components: build Reporter: Sean Busbey Assignee: Sean Busbey Priority: Minor Fix For: 2.0.0, 0.98.8, 0.94.25, 0.99.2 Attachments: HBASE-12381.1.patch.txt Our ref guide says that you need Maven 3 to build. Add an enforcer rule so that people find out early that they have the wrong Maven version, rather than having things fall over in some arbitrary way if someone tries to build with Maven 2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
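Such a rule would use the maven-enforcer-plugin's standard `requireMavenVersion` check. The fragment below is a sketch of that mechanism; the version range is an assumption for illustration and may differ from the attached patch:

```xml
<!-- Sketch; the exact version range in the patch may differ. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-maven-version</id>
      <goals><goal>enforce</goal></goals>
      <configuration>
        <rules>
          <requireMavenVersion>
            <version>[3.0.0,)</version>
          </requireMavenVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, a Maven 2 build fails immediately with a clear message instead of failing part-way through in a confusing way.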
[jira] [Commented] (HBASE-12424) Finer grained logging and metrics for split transactions
[ https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202590#comment-14202590 ] Hudson commented on HBASE-12424: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #630 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/630/]) HBASE-12424 Finer grained logging and metrics for split transactions (apurtell: rev 60fb3530364364202235b3c40bdf55ff1ea459a8) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServer.java Finer grained logging and metrics for split transactions Key: HBASE-12424 URL: https://issues.apache.org/jira/browse/HBASE-12424 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: 0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, HBASE-12424-0.98.patch, HBASE-12424.patch, HBASE-12424.patch, HBASE-12424.patch, HowHBaseRegionSplitsareImplemented.pdf A split transaction is a complex orchestration of activity between the RegionServer, Master, ZooKeeper, and HDFS NameNode. We have some visibility into the time taken by various phases of the split transaction in the logs. 
We will see "Starting split of region $PARENT" before the transaction begins, before the parent is offlined. Later we will see "Opening $DAUGHTER" as one of the last steps in the transaction; this is after the parent has been flushed, offlined, and closed. Finally "Region split, hbase:meta updated, and report to master ... Split took $TIME" after all steps are complete, including the total running time of the transaction. For debugging the cause(s) of long running split transactions it would be useful to know the distribution of time spent in all of the phases of the split transaction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12375) LoadIncrementalHFiles fails to load data in table when CF name starts with '_'
[ https://issues.apache.org/jira/browse/HBASE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12375: --- Fix Version/s: (was: 0.98.9) 0.98.8 LoadIncrementalHFiles fails to load data in table when CF name starts with '_' -- Key: HBASE-12375 URL: https://issues.apache.org/jira/browse/HBASE-12375 Project: HBase Issue Type: Bug Affects Versions: 0.98.5 Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12375-0.98.patch, HBASE-12375-v2.patch, HBASE-12375.patch We do not restrict users from creating a table with a column family name starting with '_', so when a user creates such a table, LoadIncrementalHFiles will skip loading that family's data into the table:
{code}
// Skip _logs, etc
if (familyDir.getName().startsWith("_")) continue;
{code}
I think we should remove that check, as I do not see any _logs directory being created by the bulkload tool in the output directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
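An alternative to removing the check outright is narrowing it to the known non-family artifacts, so a legitimate family name like "_meta" still loads. The artifact names below are common Hadoop job outputs used as illustrative assumptions; this is a sketch, not the attached patch:

```java
import java.util.Set;

// Illustrative narrower skip rule than startsWith("_").
class FamilyDirFilter {
    private static final Set<String> NON_FAMILY_DIRS =
            Set.of("_logs", "_SUCCESS", "_temporary");

    static boolean isFamilyDir(String dirName) {
        return !NON_FAMILY_DIRS.contains(dirName);
    }
}
```

Either approach fixes the reported bug; the blacklist keeps a safety net in case some job configuration does emit marker directories into the output path.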
[jira] [Updated] (HBASE-12376) HBaseAdmin leaks ZK connections if failure starting watchers (ConnectionLossException)
[ https://issues.apache.org/jira/browse/HBASE-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12376: --- Fix Version/s: (was: 0.98.9) 0.98.8 HBaseAdmin leaks ZK connections if failure starting watchers (ConnectionLossException) -- Key: HBASE-12376 URL: https://issues.apache.org/jira/browse/HBASE-12376 Project: HBase Issue Type: Bug Components: Zookeeper Affects Versions: 0.98.7, 0.94.24 Reporter: stack Assignee: stack Priority: Critical Fix For: 0.98.8, 0.94.25 Attachments: 0001-12376-HBaseAdmin-leaks-ZK-connections-if-failure-sta.patch, 0001-12376-HBaseAdmin-leaks-ZK-connections-if-failure-sta.version2.patch This is a 0.98 issue that some users have been running into, mostly when running Canary: for whatever reason, setup of the zk connection fails, usually with a ConnectionLossException. The end result is an ugly leak of zk connections; created ZKWatcher instances are just left hanging around. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202596#comment-14202596 ] Andrew Purtell commented on HBASE-12432: I'm going to commit this momentarily RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
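The behavior being ported from AsyncProcess is simply "stay quiet for the first N retries, then log every subsequent failure." A minimal sketch of that gate, with a hypothetical class name (not the actual RpcRetryingCaller code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of retry logging gated on a start counter.
class RetryLogger {
    final int startLogErrorsCnt;            // cf. hbase.client.start.log.errors.counter
    final List<String> logged = new ArrayList<>();

    RetryLogger(int startLogErrorsCnt) {
        this.startLogErrorsCnt = startLogErrorsCnt;
    }

    void onRetryFailure(int attempt, String error) {
        if (attempt >= startLogErrorsCnt) {  // only log once retries pile up
            logged.add("attempt=" + attempt + " " + error);
        }
    }
}
```

This keeps transient single-retry blips out of the logs while still surfacing scanners that are genuinely struggling, which is what made it useful when debugging IntegrationTestMTTR.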
[jira] [Updated] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12432: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to 0.98+ RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202614#comment-14202614 ] Nick Dimiduk commented on HBASE-12432: -- Ran out of time yesterday and I'm still catching up with today. Thanks Andrew. RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202626#comment-14202626 ] Nick Dimiduk commented on HBASE-12432: -- While I'm at it, both this and AsyncProcess should be emitting this log at debug, not info. We should fold that into Sean's work. RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12441) Export and CopyTable need to be able to keep tags/labels in cells
[ https://issues.apache.org/jira/browse/HBASE-12441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-12441: - Issue Type: Improvement (was: Bug) Export and CopyTable need to be able to keep tags/labels in cells - Key: HBASE-12441 URL: https://issues.apache.org/jira/browse/HBASE-12441 Project: HBase Issue Type: Improvement Components: mapreduce, security Affects Versions: 0.98.7, 0.99.3 Reporter: Jerry He Export and CopyTable (and possibly other MR tools) currently do not carry over tags/labels in cells. These tools should be able to keep tags/labels in cells when they back up the table cells. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12439) Procedure V2
[ https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202666#comment-14202666 ] stack commented on HBASE-12439: --- Doc is great. When you have a chance, a few examples would help. Procedure V2 Key: HBASE-12439 URL: https://issues.apache.org/jira/browse/HBASE-12439 Project: HBase Issue Type: New Feature Components: master Affects Versions: 2.0.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: ProcedureV2.pdf Procedure v2 (aka Notification Bus) aims to provide a unified way to build: * multi-steps procedure with a rollback/rollforward ability in case of failure (e.g. create/delete table) ** HBASE-12070 * notifications across multiple machines (e.g. ACLs/Labels/Quotas cache updates) ** Make sure that every machine has the grant/revoke/label ** Enforce space limit quota across the namespace ** HBASE-10295 eliminate permanent replication zk node * procedures across multiple machines (e.g. Snapshots) * coordinated long-running procedures (e.g. compactions, splits, ...) * Synchronous calls, with the ability to see the state/result in case of failure. ** HBASE-11608 sync split still work in progress/initial prototype: https://reviews.apache.org/r/27703/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
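The first bullet in the Procedure V2 list, a multi-step procedure with rollback ability, boils down to: execute steps in order, and on failure undo the completed steps in reverse. The sketch below illustrates that shape with invented names; it is not the HBase procedure framework API described in the attached doc:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative multi-step procedure with reverse-order rollback.
class ProcedureRunner {
    interface Step {
        void execute() throws Exception;
        void rollback();
    }

    // Returns true if all steps completed, false if a failure was rolled back.
    static boolean run(List<Step> steps) {
        List<Step> done = new ArrayList<>();
        for (Step s : steps) {
            try {
                s.execute();
                done.add(s);
            } catch (Exception e) {
                for (int i = done.size() - 1; i >= 0; i--) {
                    done.get(i).rollback();   // undo in reverse order
                }
                return false;
            }
        }
        return true;
    }
}
```

The harder parts the doc addresses are on top of this core: persisting the step state so rollback survives a master restart, and distributing steps and notifications across machines.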
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202680#comment-14202680 ] Andrew Purtell commented on HBASE-12432: Flushing pending work for 0.98 RC tonight RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-5162) Basic client pushback mechanism
[ https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202694#comment-14202694 ] Hadoop QA commented on HBASE-5162: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680247/hbase-5162-trunk-v6.patch against trunk revision . ATTACHMENT ID: 12680247 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.http.TestHttpServerLifecycle.testStartedServerIsAlive(TestHttpServerLifecycle.java:71) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11615//artifact/patchprocess/checkstyle-aggregate.html Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11615//console This message is automatically generated. Basic client pushback mechanism --- Key: HBASE-5162 URL: https://issues.apache.org/jira/browse/HBASE-5162 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Jesse Yates Fix For: 1.0.0 Attachments: hbase-5162-trunk-v0.patch, hbase-5162-trunk-v1.patch, hbase-5162-trunk-v2.patch, hbase-5162-trunk-v3.patch, hbase-5162-trunk-v4.patch, hbase-5162-trunk-v5.patch, hbase-5162-trunk-v6.patch, java_HBASE-5162.patch The current blocking we do when we are close to some limits (memstores over the multiplier factor, too many store files, global memstore memory) is bad, too coarse and confusing. After hitting HBASE-5161, it really becomes obvious that we need something better. I did a little brainstorm with Stack, we came up quickly with two solutions: - Send some exception to the client, like OverloadedException, that's thrown when some situation happens like getting past the low memory barrier. It would be thrown when the client gets a handler and does some check while putting or deleting. The client would treat this as a retryable exception but
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202703#comment-14202703 ] Hudson commented on HBASE-12432: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #631 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/631/]) HBASE-12432 RpcRetryingCaller should log after fixed number of retries like AsyncProcess (apurtell: rev 2d9bb9d340eeef468f74500209ea2324d5988bb8) * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerFactory.java * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
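The behavior being ported here can be sketched in plain Java: a retry loop stays quiet for the first N attempts and only starts logging once a threshold (analogous to hbase.client.start.log.errors.counter) is exceeded. The class and method names below are invented for illustration; this is not HBase's actual RpcRetryingCaller code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Toy model of "log only after N retries": early retries are silent,
// later ones are logged so persistent failures become visible.
public class RetrySketch {
    static final int START_LOG_ERRORS_AFTER = 2; // stand-in for hbase.client.start.log.errors.counter

    // Runs the callable up to maxAttempts times; returns the log lines emitted.
    static List<String> callWithRetries(Callable<Boolean> op, int maxAttempts) throws Exception {
        List<String> log = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (op.call()) {
                return log; // success: stop retrying
            }
            if (attempt > START_LOG_ERRORS_AFTER) {
                log.add("retry #" + attempt + " failed");
            }
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        // An op that fails 4 times, then succeeds on the 5th attempt.
        int[] calls = {0};
        List<String> log = callWithRetries(() -> ++calls[0] >= 5, 10);
        // Attempts 1-2 are silent, attempts 3-4 are logged, attempt 5 succeeds.
        System.out.println(log); // prints [retry #3 failed, retry #4 failed]
    }
}
```

The point of the threshold is exactly what the issue describes: routine transient retries stay out of the logs, while long retry sequences surface.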
[jira] [Commented] (HBASE-12424) Finer grained logging and metrics for split transactions
[ https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202725#comment-14202725 ] Hudson commented on HBASE-12424: FAILURE: Integrated in HBase-1.0 #444 (See [https://builds.apache.org/job/HBase-1.0/444/]) HBASE-12424 Finer grained logging and metrics for split transactions (apurtell: rev 3eed03268ff73fb67a674bcab6102d3224d44316) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java Finer grained logging and metrics for split transactions Key: HBASE-12424 URL: https://issues.apache.org/jira/browse/HBASE-12424 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: 0001-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0002-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, 0003-HBASE-12424-Finer-grained-logging-and-metrics-for-sp.patch, HBASE-12424-0.98.patch, HBASE-12424.patch, HBASE-12424.patch, HBASE-12424.patch, HowHBaseRegionSplitsareImplemented.pdf A split transaction is a complex orchestration of activity between the RegionServer, Master, ZooKeeper, and HDFS NameNode. We have some visibility into the time taken by various phases of the split transaction in the logs. 
We will see "Starting split of region $PARENT" before the transaction begins, before the parent is offlined. Later we will see "Opening $DAUGHTER" as one of the last steps in the transaction; this is after the parent has been flushed, offlined, and closed. Finally "Region split, hbase:meta updated, and report to master ... Split took $TIME" after all steps are complete, including the total running time of the transaction. For debugging the cause(s) of long running split transactions it would be useful to know the distribution of time spent in all of the phases of the split transaction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
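One way to get the per-phase distribution the issue asks for is to timestamp each phase boundary and report the deltas between consecutive boundaries. A minimal sketch (class and phase names are invented, not the actual SplitTransaction code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy phase timer: records a timestamp at each phase boundary and
// reports the time spent in each phase as deltas between boundaries.
public class PhaseTimer {
    private final Map<String, Long> marks = new LinkedHashMap<>();

    void mark(String phase, long nanos) {
        marks.put(phase, nanos);
    }

    // Returns phase -> elapsed time, computed from consecutive marks;
    // the last mark only terminates the previous phase.
    Map<String, Long> report() {
        Map<String, Long> out = new LinkedHashMap<>();
        String prev = null;
        long prevT = 0;
        for (Map.Entry<String, Long> e : marks.entrySet()) {
            if (prev != null) {
                out.put(prev, e.getValue() - prevT);
            }
            prev = e.getKey();
            prevT = e.getValue();
        }
        return out;
    }

    public static void main(String[] args) {
        PhaseTimer t = new PhaseTimer();
        t.mark("start", 0);           // "Starting split of region $PARENT"
        t.mark("parent-closed", 40);  // parent flushed, offlined, closed
        t.mark("daughters-open", 90); // "Opening $DAUGHTER"
        t.mark("done", 100);          // "Split took $TIME"
        System.out.println(t.report()); // prints {start=40, parent-closed=50, daughters-open=10}
    }
}
```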
[jira] [Commented] (HBASE-12424) Finer grained logging and metrics for split transactions
[ https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202785#comment-14202785 ] Hudson commented on HBASE-12424: FAILURE: Integrated in HBase-0.98 #661 (See [https://builds.apache.org/job/HBase-0.98/661/]) HBASE-12424 Finer grained logging and metrics for split transactions (apurtell: rev 60fb3530364364202235b3c40bdf55ff1ea459a8) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java Finer grained logging and metrics for split transactions Key: HBASE-12424 URL: https://issues.apache.org/jira/browse/HBASE-12424 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12445) hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT
sri created HBASE-12445: --- Summary: hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT Key: HBASE-12445 URL: https://issues.apache.org/jira/browse/HBASE-12445 Project: HBase Issue Type: Bug Reporter: sri Code executed:
{code}
@Test
public void testHbasePutDeleteCell() throws Exception {
    TableName tableName = TableName.valueOf("my_test");
    Configuration configuration = HBaseConfiguration.create();
    HTableInterface table = new HTable(configuration, tableName);
    final byte[] rowKey = Bytes.toBytes("12345");
    final byte[] family = Bytes.toBytes("default");
    // put one row
    Put put = new Put(rowKey);
    put.add(family, Bytes.toBytes("A"), Bytes.toBytes("a"));
    put.add(family, Bytes.toBytes("B"), Bytes.toBytes("b"));
    put.add(family, Bytes.toBytes("C"), Bytes.toBytes("c"));
    put.add(family, Bytes.toBytes("D"), Bytes.toBytes("d"));
    table.put(put);
    // get row back and assert the values
    Get get = new Get(rowKey);
    Result result = table.get(get);
    assertTrue("Column A value should be a", Bytes.toString(result.getValue(family, Bytes.toBytes("A"))).equals("a"));
    assertTrue("Column B value should be b", Bytes.toString(result.getValue(family, Bytes.toBytes("B"))).equals("b"));
    assertTrue("Column C value should be c", Bytes.toString(result.getValue(family, Bytes.toBytes("C"))).equals("c"));
    assertTrue("Column D value should be d", Bytes.toString(result.getValue(family, Bytes.toBytes("D"))).equals("d"));
    // put the same row again with C column deleted
    put = new Put(rowKey);
    put.add(family, Bytes.toBytes("A"), Bytes.toBytes("a1"));
    put.add(family, Bytes.toBytes("B"), Bytes.toBytes("b1"));
    KeyValue marker = new KeyValue(rowKey, family, Bytes.toBytes("C"), HConstants.LATEST_TIMESTAMP, KeyValue.Type.DeleteColumn);
    put.add(marker);
    put.add(family, Bytes.toBytes("D"), Bytes.toBytes("d1"));
    table.put(put);
    // get row back and assert the values
    get = new Get(rowKey);
    result = table.get(get);
    assertTrue("Column A value should be a1", Bytes.toString(result.getValue(family, Bytes.toBytes("A"))).equals("a1"));
    assertTrue("Column B value should be b1", Bytes.toString(result.getValue(family, Bytes.toBytes("B"))).equals("b1"));
    assertTrue("Column C should not exist", result.getValue(family, Bytes.toBytes("C")) == null);
    assertTrue("Column D value should be d1", Bytes.toString(result.getValue(family, Bytes.toBytes("D"))).equals("d1"));
}
{code}
The last assertion fails; the cell for column D is also deleted -- This message was sent by Atlassian JIRA (v6.3.4#6332)
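For context, the intended DeleteColumn semantics: a marker for column C should hide only C cells at or below the marker's timestamp, leaving other columns such as D untouched. A toy in-memory model of that rule (not HBase's scanner code; a single version per column is kept for simplicity) shows the result the failing assertion expects:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of DeleteColumn semantics: a delete marker for a column hides
// cells of that column with timestamp <= the marker's timestamp, and
// must not affect any other column in the row.
public class DeleteColumnModel {
    private final Map<String, Long> cellTs = new HashMap<>();    // column -> cell timestamp
    private final Map<String, String> cellVal = new HashMap<>(); // column -> value
    private final Map<String, Long> deleteTs = new HashMap<>();  // column -> marker timestamp

    void put(String col, long ts, String val) { cellTs.put(col, ts); cellVal.put(col, val); }
    void deleteColumn(String col, long ts) { deleteTs.put(col, ts); }

    String get(String col) {
        Long ts = cellTs.get(col);
        if (ts == null) return null;
        Long del = deleteTs.get(col);
        if (del != null && ts <= del) return null; // masked by the marker
        return cellVal.get(col);
    }

    public static void main(String[] args) {
        DeleteColumnModel row = new DeleteColumnModel();
        long now = 100;
        row.put("A", now, "a1");
        row.put("B", now, "b1");
        row.deleteColumn("C", now); // marker targets C only
        row.put("D", now, "d1");
        // The behavior the bug report's assertions encode:
        System.out.println(row.get("C")); // prints null: C is masked
        System.out.println(row.get("D")); // prints d1: D must survive
    }
}
```

The reported bug is that D does not survive in practice, i.e. cells after the marker within the same Put get dropped too.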
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202818#comment-14202818 ] Hudson commented on HBASE-12432: SUCCESS: Integrated in HBase-TRUNK #5754 (See [https://builds.apache.org/job/HBase-TRUNK/5754/]) HBASE-12432 RpcRetryingCaller should log after fixed number of retries like AsyncProcess (apurtell: rev fb1af86ee1700ca1e6817c0c988ec9d5da1215d2) * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestFastFailWithoutTestUtil.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerFactory.java RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12424) Finer grained logging and metrics for split transactions
[ https://issues.apache.org/jira/browse/HBASE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202819#comment-14202819 ] Hudson commented on HBASE-12424: SUCCESS: Integrated in HBase-TRUNK #5754 (See [https://builds.apache.org/job/HBase-TRUNK/5754/]) HBASE-12424 Finer grained logging and metrics for split transactions (apurtell: rev 7718390703fb3c193ea58d5287250e14002e9852) * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerSourceImpl.java Finer grained logging and metrics for split transactions Key: HBASE-12424 URL: https://issues.apache.org/jira/browse/HBASE-12424 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12445) hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT
[ https://issues.apache.org/jira/browse/HBASE-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sri updated HBASE-12445: Attachment: TestPutAfterDeleteColumn.java hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT Key: HBASE-12445 URL: https://issues.apache.org/jira/browse/HBASE-12445 Project: HBase Issue Type: Bug Reporter: sri Attachments: TestPutAfterDeleteColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11788) hbase is not deleting the cell when a Put with a KeyValue, KeyValue.Type.Delete is submitted
[ https://issues.apache.org/jira/browse/HBASE-11788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202823#comment-14202823 ] sri commented on HBASE-11788: - Andrew, Created a new issue for this and linked it to this bug: HBASE-12445 hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT. Thanks Sri Bora hbase is not deleting the cell when a Put with a KeyValue, KeyValue.Type.Delete is submitted Key: HBASE-11788 URL: https://issues.apache.org/jira/browse/HBASE-11788 Project: HBase Issue Type: Bug Affects Versions: 0.99.0, 0.96.1.1, 0.98.5, 2.0.0 Environment: Cloudera CDH 5.1.x Reporter: Cristian Armaselu Assignee: Srikanth Srungarapu Fix For: 0.99.0, 2.0.0, 0.98.6 Attachments: HBASE-11788-master.patch, HBASE-11788-master_v2.patch, TestPutAfterDeleteColumn.java, TestPutWithDelete.java Code executed:
{code}
@Test
public void testHbasePutDeleteCell() throws Exception {
    TableName tableName = TableName.valueOf("my_test");
    Configuration configuration = HBaseConfiguration.create();
    HTableInterface table = new HTable(configuration, tableName);
    final String rowKey = "12345";
    final byte[] familly = Bytes.toBytes("default");
    // put one row
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(familly, Bytes.toBytes("A"), Bytes.toBytes("a"));
    put.add(familly, Bytes.toBytes("B"), Bytes.toBytes("b"));
    put.add(familly, Bytes.toBytes("C"), Bytes.toBytes("c"));
    table.put(put);
    // get row back and assert the values
    Get get = new Get(Bytes.toBytes(rowKey));
    Result result = table.get(get);
    Assert.isTrue(Bytes.toString(result.getValue(familly, Bytes.toBytes("A"))).equals("a"), "Column A value should be a");
    Assert.isTrue(Bytes.toString(result.getValue(familly, Bytes.toBytes("B"))).equals("b"), "Column B value should be b");
    Assert.isTrue(Bytes.toString(result.getValue(familly, Bytes.toBytes("C"))).equals("c"), "Column C value should be c");
    // put the same row again with C column deleted
    put = new Put(Bytes.toBytes(rowKey));
    put.add(familly, Bytes.toBytes("A"), Bytes.toBytes("a"));
    put.add(familly, Bytes.toBytes("B"), Bytes.toBytes("b"));
    put.add(new KeyValue(Bytes.toBytes(rowKey), familly, Bytes.toBytes("C"), HConstants.LATEST_TIMESTAMP, KeyValue.Type.DeleteColumn));
    table.put(put);
    // get row back and assert the values
    get = new Get(Bytes.toBytes(rowKey));
    result = table.get(get);
    Assert.isTrue(Bytes.toString(result.getValue(familly, Bytes.toBytes("A"))).equals("a"), "Column A value should be a");
    Assert.isTrue(Bytes.toString(result.getValue(familly, Bytes.toBytes("B"))).equals("b"), "Column B value should be b");
    Assert.isTrue(result.getValue(familly, Bytes.toBytes("C")) == null, "Column C should not exist");
}
{code}
The last assertion fails; the cell is not deleted but rather the value is empty:
{code}
hbase(main):029:0> scan 'my_test'
ROW    COLUMN+CELL
 12345 column=default:A, timestamp=1408473082290, value=a
 12345 column=default:B, timestamp=1408473082290, value=b
 12345 column=default:C, timestamp=1408473082290, value=
{code}
This behavior is different from the previous Cloudera 4.8.x version and is currently corrupting all Hive queries involving "is null" or "is not null" operators on the columns mapped to hbase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202825#comment-14202825 ] Niels Basjes commented on HBASE-12279: -- Ehhh, does it or does it not increase the javac compiler warnings? Hiccup in Jenkins? {code}-1 javac. The applied patch generated 108 javac compiler warnings (more than the trunk's current 102 warnings). +1 javac. The applied patch does not increase the total number of javac compiler warnings.{code} Generated thrift files were generated with the wrong parameters --- Key: HBASE-12279 URL: https://issues.apache.org/jira/browse/HBASE-12279 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.98.0, 0.99.0 Reporter: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2 Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html]) {code} thrift -strict --gen java:hashcode {code} the current files seem to have been generated with {code} thrift -strict --gen java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12438) Add -Dsurefire.rerunFailingTestsCount=2 to patch build runs so flakies get rerun
[ https://issues.apache.org/jira/browse/HBASE-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202845#comment-14202845 ] Manukranth Kolloju commented on HBASE-12438: Does Hudson report that it has rerun the tests, or does it just give us a blue build without any hint of failing tests? Add -Dsurefire.rerunFailingTestsCount=2 to patch build runs so flakies get rerun - Key: HBASE-12438 URL: https://issues.apache.org/jira/browse/HBASE-12438 Project: HBase Issue Type: Task Components: test Reporter: stack Assignee: stack Fix For: 2.0.0 Attachments: 12438.txt Tripped over this config today: -Dsurefire.rerunFailingTestsCount= I made a test fail, then pass, and I got this output: {code} Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Flakes: 1 {code} Notice the 'Flakes' addition on the far right. Let me enable this on hadoopqa builds. Hopefully it will help make it so new contribs are not frightened off by flakies thinking their patch is the cause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
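The surefire flag reruns a failing test up to N more times and counts the test as a flake (rather than a failure) if any rerun passes, which is how the "Flakes: 1" line above arises. The bookkeeping can be sketched as follows; the names are invented, this is not Surefire's actual code.

```java
// Toy model of -Dsurefire.rerunFailingTestsCount=N: a test that fails and
// then passes within the allowed reruns is a FLAKE, not a FAIL.
public class RerunSketch {
    enum Outcome { PASS, FLAKE, FAIL }

    // results[i] says whether run i of the test passed; rerunCount is the
    // configured number of extra attempts after the first failure.
    static Outcome classify(boolean[] results, int rerunCount) {
        if (results[0]) return Outcome.PASS;
        int attempts = Math.min(results.length, 1 + rerunCount);
        for (int i = 1; i < attempts; i++) {
            if (results[i]) return Outcome.FLAKE; // failed, then passed on a rerun
        }
        return Outcome.FAIL; // never passed within the rerun budget
    }

    public static void main(String[] args) {
        // fail, then pass -> counted as a flake, so the build stays green
        System.out.println(classify(new boolean[]{false, true}, 2));         // prints FLAKE
        System.out.println(classify(new boolean[]{false, false, false}, 2)); // prints FAIL
    }
}
```

This also answers the question in the comment: a genuinely flaky test no longer fails the build, so without extra reporting it would indeed look like a plain blue build.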
[jira] [Commented] (HBASE-12438) Add -Dsurefire.rerunFailingTestsCount=2 to patch build runs so flakies get rerun
[ https://issues.apache.org/jira/browse/HBASE-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202849#comment-14202849 ] Dima Spivak commented on HBASE-12438: - Out of the box, Jenkins doesn't treat flakey tests in a special way, but [there is a plugin|https://wiki.jenkins-ci.org/display/JENKINS/Flaky+Test+Handler+Plugin] that can change this. Perhaps worth checking with our friends at b.o.a. to get this set up, [~stack]? Add -Dsurefire.rerunFailingTestsCount=2 to patch build runs so flakies get rerun Key: HBASE-12438 URL: https://issues.apache.org/jira/browse/HBASE-12438 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12446) [list | abort] Compactions
Manukranth Kolloju created HBASE-12446: -- Summary: [list | abort] Compactions Key: HBASE-12446 URL: https://issues.apache.org/jira/browse/HBASE-12446 Project: HBase Issue Type: New Feature Affects Versions: 1.0.0 Reporter: Manukranth Kolloju Fix For: 1.0.0 In some cases, we would need to quickly reduce load on a server without killing it. Compaction is one of the critical processes that takes up a lot of CPU and disk IOPS. We should have a way to list the compactions running on a regionserver, abort a compaction given a regionserver and compaction id, and additionally abort all compactions. Pardon me if there was already a similar Jira; I'd be happy to merge this there. The current code handles interrupts, so we should be able to interrupt the thread that is performing the compaction and abort it from either the UI or the command line. This Jira is targeted at exposing an admin function to perform such a task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
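Since the compaction code already handles interrupts, the list/abort mechanism the issue proposes amounts to tracking running compaction threads by id and interrupting one on request. A minimal sketch with invented names (not HBase's actual compaction machinery):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy registry of running "compactions": list them by id, abort one by
// interrupting its thread, relying on the task cooperatively checking
// its interrupt status (as the issue notes the current code does).
public class CompactionRegistry {
    private final Map<Long, Thread> running = new ConcurrentHashMap<>();
    private long nextId = 0;

    synchronized long register(Thread t) { long id = nextId++; running.put(id, t); return id; }
    void finished(long id) { running.remove(id); }
    Set<Long> list() { return running.keySet(); }

    boolean abort(long id) {
        Thread t = running.get(id);
        if (t == null) return false;
        t.interrupt(); // cooperative cancellation: the loop below checks the flag
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        CompactionRegistry reg = new CompactionRegistry();
        Thread worker = new Thread(() -> {
            // simulated compaction loop: exits once interrupted
            while (!Thread.currentThread().isInterrupted()) { }
        });
        long id = reg.register(worker);
        worker.start();
        System.out.println(reg.list().size()); // prints 1
        reg.abort(id);
        worker.join(5000);
        reg.finished(id);
        System.out.println(worker.isAlive()); // prints false
    }
}
```

An admin API or shell command could then expose `list()` and `abort(id)` per regionserver, which is the shape of the feature being requested.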
[jira] [Commented] (HBASE-12445) hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT
[ https://issues.apache.org/jira/browse/HBASE-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202870#comment-14202870 ] Ted Yu commented on HBASE-12445: Can you formulate the new test as a patch? hbase is removing all remaining cells immediately after the cell marked with marker = KeyValue.Type.DeleteColumn via PUT Key: HBASE-12445 URL: https://issues.apache.org/jira/browse/HBASE-12445 Project: HBase Issue Type: Bug Reporter: sri Attachments: TestPutAfterDeleteColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202878#comment-14202878 ] Hudson commented on HBASE-12432: SUCCESS: Integrated in HBase-1.0 #445 (See [https://builds.apache.org/job/HBase-1.0/445/]) HBASE-12432 RpcRetryingCaller should log after fixed number of retries like AsyncProcess (apurtell: rev df3ba6ea4b33962145803678d369c476b6ba5817) * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestFastFailWithoutTestUtil.java * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerFactory.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12447) Add support for setTimeRange for CopyTable, RowCounter and CellCounter
Esteban Gutierrez created HBASE-12447: - Summary: Add support for setTimeRange for CopyTable, RowCounter and CellCounter Key: HBASE-12447 URL: https://issues.apache.org/jira/browse/HBASE-12447 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12447) Add support for setTimeRange for CopyTable, RowCounter and CellCounter
[ https://issues.apache.org/jira/browse/HBASE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez reassigned HBASE-12447: - Assignee: Esteban Gutierrez Add support for setTimeRange for CopyTable, RowCounter and CellCounter -- Key: HBASE-12447 URL: https://issues.apache.org/jira/browse/HBASE-12447 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Minor It would be nice to copy a subset of data to a remote cluster based on time range or just count the rows/cells also for a time range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
Andrew Purtell created HBASE-12448: -- Summary: Fix rate reporting in compaction progress DEBUG logging Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202916#comment-14202916 ] Andrew Purtell commented on HBASE-12440: All o.a.h.h.master.** and o.a.h.h.regionserver.** tests pass on 0.98 and branch-1. TestAssignmentManagerOnCluster passes 10 out of 10 times on 0.98 and branch-1. Going to push this to both branches shortly. Region may remain offline on clean startup under certain race condition --- Key: HBASE-12440 URL: https://issues.apache.org/jira/browse/HBASE-12440 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 0.98.8, 0.99.1 Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch Saw this in prod some time back with ZK assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable.
The AssignCallable does a forceStateToOffline() and skips the assignment, as it wants the SSH to do it: {code} 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} 2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} But the SSH won't assign, as the region is offline but not in transition: {code} 2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server) 2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} In ZK-less assignment, both the bulk assigner (invoking AssignCallable) and the SSH may try to assign the region, but as they go through a lock, only one will succeed, so this doesn't seem to be an issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12447) Add support for setTimeRange for RowCounter and CellCounter
[ https://issues.apache.org/jira/browse/HBASE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-12447: -- Description: It would be nice to count the rows/cells also for a time range. CopyTable already supports that. (was: It would be nice to copy a subset of data to a remote cluster based on time range or just count the rows/cells also for a time range.) Summary: Add support for setTimeRange for RowCounter and CellCounter (was: Add support for setTimeRange for CopyTable, RowCounter and CellCounter) Add support for setTimeRange for RowCounter and CellCounter --- Key: HBASE-12447 URL: https://issues.apache.org/jira/browse/HBASE-12447 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Minor It would be nice to count the rows/cells also for a time range. CopyTable already supports that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12447) Add support for setTimeRange for CopyTable, RowCounter and CellCounter
[ https://issues.apache.org/jira/browse/HBASE-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-12447: -- Description: It would be nice to copy a subset of data to a remote cluster based on time range or just count the rows/cells also for a time range. Add support for setTimeRange for CopyTable, RowCounter and CellCounter -- Key: HBASE-12447 URL: https://issues.apache.org/jira/browse/HBASE-12447 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Priority: Minor It would be nice to copy a subset of data to a remote cluster based on time range or just count the rows/cells also for a time range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12432) RpcRetryingCaller should log after fixed number of retries like AsyncProcess
[ https://issues.apache.org/jira/browse/HBASE-12432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202944#comment-14202944 ] Hudson commented on HBASE-12432: FAILURE: Integrated in HBase-0.98 #662 (See [https://builds.apache.org/job/HBase-0.98/662/]) HBASE-12432 RpcRetryingCaller should log after fixed number of retries like AsyncProcess (apurtell: rev 2d9bb9d340eeef468f74500209ea2324d5988bb8) * hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java * hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerFactory.java RpcRetryingCaller should log after fixed number of retries like AsyncProcess Key: HBASE-12432 URL: https://issues.apache.org/jira/browse/HBASE-12432 Project: HBase Issue Type: Improvement Components: Client Reporter: Nick Dimiduk Assignee: Nick Dimiduk Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12432.00-0.98.patch, HBASE-12432.00.patch, HBASE-12432.01-0.98.patch, HBASE-12432.01.patch Scanner retry is handled by RpcRetryingCaller. This is different from multi, which is handled by AsyncProcess. AsyncProcess will start logging operation status after hbase.client.start.log.errors.counter retries have been attempted. Let's bring the same functionality over to Scanner path. Noticed this while debugging IntegrationTestMTTR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-12440. Resolution: Fixed Fix Version/s: (was: 0.99.1) 0.99.2 Hadoop Flags: Reviewed Region may remain offline on clean startup under certain race condition --- Key: HBASE-12440 URL: https://issues.apache.org/jira/browse/HBASE-12440 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch Saw this in prod some time back with ZK assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable. The AssignCallable does a forceStateToOffline() and skips the assignment, as it wants the SSH to do it: {code} 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} 2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server:
gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} But the SSH won't assign, as the region is offline but not in transition: {code} 2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server) 2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} In ZK-less assignment, both the bulk assigner (invoking AssignCallable) and the SSH may try to assign the region, but as they go through a lock, only one will succeed, so this doesn't seem to be an issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12449) Use the max timestamp of current or old cell's timestamp in HRegion.append()
Enis Soztutar created HBASE-12449: - Summary: Use the max timestamp of current or old cell's timestamp in HRegion.append() Key: HBASE-12449 URL: https://issues.apache.org/jira/browse/HBASE-12449 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 0.98.8, 0.99.2 We have observed an issue in SLES clusters where the system timestamp regularly goes back in time. This happens frequently enough to cause test failures when LTT is used with the updater. Every time a mutation is performed, the updater creates a string in the form #column:mutation_type and appends it to the column mutate_info. It seems that when the test fails, it is always the case that the entry for the reported column is missing from mutate_info. However, according to the MultiThreadedUpdater source code, if a row gets updated, all the columns will be mutated. So if a row contains 15 columns, all 15 should appear in mutate_info. When the test fails though, we get an exception like: {code} 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column family [test_cf], column [8], mutation [null]; value of length 818 {code} For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it should: {code} test_cf:mutate_info timestamp=1414902651388, value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0 {code} Further debugging led to the root cause: it seems that on SUSE, System.currentTimeMillis() can go back in time freely (especially when run in a virtualized environment like EC2), and this actually happens very frequently.
This is from a debug log that was put in place: {code} 2014-11-04 01:16:05,025 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 2014-11-04 01:16:05,038 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 2014-11-04 01:16:05,047 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 2014-11-04 01:16:05,057 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 2014-11-04 01:16:05,061 INFO [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 2014-11-04 01:16:05,070 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 2014-11-04 01:16:05,076 INFO [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 2014-11-04 01:16:05,084 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 2014-11-04 01:16:05,090 INFO [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 2014-11-04 01:16:05,097 INFO [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0 2014-11-04 01:16:05,100 INFO [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0 2014-11-04 01:16:05,103 INFO [B.DefaultRpcServer.handler=11,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765103/Put/mvcc=8305/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0 2014-11-04 01:16:05,110 INFO
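The fix named in the issue summary reduces to a one-line invariant: an append's timestamp is the maximum of the current clock reading and the old cell's timestamp, so a column's timestamp never moves backwards even when the clock does. A minimal standalone sketch (helper names are illustrative, not HRegion's actual code):

```java
public class AppendTimestamp {
    // Never let an appended cell's timestamp go below the old cell's
    // timestamp, even if System.currentTimeMillis() jumped backwards.
    static long nextTimestamp(long now, long oldCellTs) {
        return Math.max(now, oldCellTs);
    }

    // Demo: a sequence of clock readings that goes backwards still yields
    // nondecreasing cell timestamps.
    static long[] applyClock(long[] clockReadings) {
        long[] out = new long[clockReadings.length];
        long prev = Long.MIN_VALUE;
        for (int i = 0; i < clockReadings.length; i++) {
            prev = nextTimestamp(clockReadings[i], prev);
            out[i] = prev;
        }
        return out;
    }
}
```

This matches the failure mode in the logs above: successive upserts a few milliseconds apart must never produce a cell that sorts behind the one it replaces.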
[jira] [Updated] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12448: --- Attachment: HBASE-12448-0.98.patch Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12448: --- Attachment: HBASE-12448.patch Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12448: --- Status: Patch Available (was: Open) Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202964#comment-14202964 ] Andrew Purtell commented on HBASE-12448: I was trying to save a local variable before but messed up. Just add one for tracking bytes written for the compaction progress report, if DEBUG logging is enabled. Also use EnvironmentEdgeManager#currentTime instead of System#currentTimeMillis. Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
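The fix described in the comment above can be sketched independently of HBase: keep a counter of bytes written as of the last report, so the rate printed for each DEBUG interval reflects only that interval rather than the whole compaction. Class and field names here are hypothetical, not the patch's actual code.

```java
public class CompactionProgressLog {
    // State as of the previous report, so each report covers one interval.
    private long lastReportTime;
    private long lastReportBytes;

    CompactionProgressLog(long startTime) {
        this.lastReportTime = startTime;
    }

    // Returns the write rate (bytes/ms) over the interval since the last
    // report. Aligning intervalBytes with the interval is the bug fix:
    // dividing the running total by the interval length overstates the rate.
    double report(long now, long totalBytesWritten) {
        long intervalBytes = totalBytesWritten - lastReportBytes;
        long intervalMs = Math.max(1, now - lastReportTime);
        lastReportBytes = totalBytesWritten;
        lastReportTime = now;
        return (double) intervalBytes / intervalMs;
    }
}
```

In the real patch the clock reads come from EnvironmentEdgeManager#currentTime, per the comment above, rather than System#currentTimeMillis.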
[jira] [Created] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
Virag Kothari created HBASE-12450: - Summary: Unbalance chaos monkey might kill all region servers without starting them back Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor UnbalanceKillAndRebalanceAction kills region servers, balances, and then starts them again. But if the balance fails, an exception is thrown, causing the region servers to never be started. For me, the balance always kept failing with a socket timeout (default 1 min), as the master runs one iteration of the balance for 5 mins (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12449) Use the max timestamp of current or old cell's timestamp in HRegion.append()
[ https://issues.apache.org/jira/browse/HBASE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12449: --- Fix Version/s: (was: 0.98.8) 0.98.9 Use the max timestamp of current or old cell's timestamp in HRegion.append() Key: HBASE-12449 URL: https://issues.apache.org/jira/browse/HBASE-12449 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 0.98.9, 0.99.2 We have observed an issue in SLES clusters where the system timestamp regularly goes back in time. This happens frequently enough to cause test failures when LTT is used with the updater. Every time a mutation is performed, the updater creates a string in the form #column:mutation_type and appends it to the column mutate_info. It seems that when the test fails, it is always the case that the entry for the reported column is missing from mutate_info. However, according to the MultiThreadedUpdater source code, if a row gets updated, all the columns will be mutated. So if a row contains 15 columns, all 15 should appear in mutate_info. When the test fails though, we get an exception like: {code} 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column family [test_cf], column [8], mutation [null]; value of length 818 {code} For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it should: {code} test_cf:mutate_info timestamp=1414902651388, value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0 {code} Further debugging led to the root cause: it seems that on SUSE, System.currentTimeMillis() can go back in time freely (especially when run in a virtualized environment like EC2), and this actually happens very frequently.
This is from a debug log that was put in place: {code} 2014-11-04 01:16:05,025 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 2014-11-04 01:16:05,038 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 2014-11-04 01:16:05,047 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 2014-11-04 01:16:05,057 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 2014-11-04 01:16:05,061 INFO [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 2014-11-04 01:16:05,070 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 2014-11-04 01:16:05,076 INFO [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 2014-11-04 01:16:05,084 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 2014-11-04 01:16:05,090 INFO [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 2014-11-04 01:16:05,097 INFO [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0 2014-11-04 01:16:05,100 INFO [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0 2014-11-04
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12450: -- Attachment: HBASE-12450.patch Attached is a patch for master which just logs a warning if the balance fails, plus one unrelated log statement change. Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12450.patch UnbalanceKillAndRebalanceAction kills region servers, balances, and then starts them again. But if the balance fails, an exception is thrown, causing the region servers to never be started. For me, the balance always kept failing with a socket timeout (default 1 min), as the master runs one iteration of the balance for 5 mins (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
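The shape of the fix in the attached patch can be sketched without the chaos-monkey framework: catch the balance failure, log a warning, and always proceed to restart the killed servers. The Cluster interface and method names below are hypothetical stand-ins, not the actual UnbalanceKillAndRebalanceAction code.

```java
import java.util.ArrayList;
import java.util.List;

public class UnbalanceKillAndRebalanceSketch {
    // Hypothetical stand-in for the cluster manager the action drives.
    interface Cluster {
        void killServer(String server) throws Exception;
        void balance() throws Exception;
        void startServer(String server) throws Exception;
    }

    final List<String> warnings = new ArrayList<>(); // stands in for LOG.warn

    void perform(Cluster cluster, List<String> victims) throws Exception {
        for (String s : victims) {
            cluster.killServer(s);
        }
        try {
            cluster.balance();
        } catch (Exception e) {
            // Before the patch this exception escaped, so the restarts
            // below were skipped and the servers stayed down.
            warnings.add("balance failed: " + e.getMessage());
        }
        for (String s : victims) {
            cluster.startServer(s);
        }
    }
}
```

The design point is simply that the restart loop lives outside the try block guarding the balance, so a socket timeout on the balance RPC can no longer leave the cluster with every victim server dead.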
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12450: -- Fix Version/s: 0.99.2 0.98.8 Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12450.patch UnbalanceKillAndRebalanceAction kills region servers, balances, and then starts them again. But if the balance fails, an exception is thrown, causing the region servers to never be started. For me, the balance always kept failing with a socket timeout (default 1 min), as the master runs one iteration of the balance for 5 mins (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202982#comment-14202982 ] Andrew Purtell commented on HBASE-12450: +1 Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12450.patch UnbalanceKillAndRebalanceAction kills region servers, balances, and then starts them again. But if the balance fails, an exception is thrown, causing the region servers to never be started. For me, the balance always kept failing with a socket timeout (default 1 min), as the master runs one iteration of the balance for 5 mins (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12450: --- Fix Version/s: 2.0.0 Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12450.patch UnbalanceKillAndRebalanceAction kills region servers, balances, and then starts them again. But if the balance fails, an exception is thrown, causing the region servers to never be started. For me, the balance always kept failing with a socket timeout (default 1 min), as the master runs one iteration of the balance for 5 mins (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12346) Scan's default auths behavior under Visibility labels
[ https://issues.apache.org/jira/browse/HBASE-12346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12346: --- Fix Version/s: (was: 0.98.8) 0.98.9 2.0.0 Scan's default auths behavior under Visibility labels - Key: HBASE-12346 URL: https://issues.apache.org/jira/browse/HBASE-12346 Project: HBase Issue Type: Bug Components: API, security Affects Versions: 0.98.7, 0.99.1 Reporter: Jerry He Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12346-master-v2.patch, HBASE-12346-master-v3.patch, HBASE-12346-master.patch In Visibility Labels security, a set of labels (auths) is administered and associated with a user. A user can normally only see cell data during a scan that is part of the user's label set (auths). Scan uses setAuthorizations to indicate that it wants to use the auths to access the cells. Similarly in the shell: {code} scan 'table1', AUTHORIZATIONS => ['private'] {code} But it is a surprise to find that setAuthorizations seems to be 'mandatory' in the default visibility label security setting. Every scan needs to call setAuthorizations before it can get any cells, even when the cells are under labels the requesting user is part of. The following steps illustrate the issue: Run as superuser. {code} 1. create a visibility label called 'private' 2. create 'table1' 3. put into 'table1' data and label the data as 'private' 4. set_auths 'user1', 'private' 5. grant 'user1', 'RW', 'table1' {code} Run as 'user1': {code} 1. scan 'table1' This shows no cells. 2. scan 'table1', AUTHORIZATIONS => ['private'] This shows all the data. {code} I am not sure if this is expected by design or a bug. But a more reasonable, more backward-compatible for client applications, and less surprising default behavior should probably look like this: a scan's default auths, if its Authorizations attribute is not set explicitly, should be all the auths the requesting user is administered and allowed on the server.
If scan.setAuthorizations is used, then the server further filters the auths during the scan: use the input auths minus whatever is not in the user's label set on the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
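The proposed default-and-filter behavior reduces to a simple set computation. A minimal sketch, assuming the semantics described in the report (the helper name is hypothetical, not HBase's VisibilityController code): no explicit authorizations means all of the user's administered labels; explicit authorizations are intersected with what the user actually holds.

```java
import java.util.HashSet;
import java.util.Set;

public class EffectiveAuths {
    // requested == null models a scan that never called setAuthorizations.
    static Set<String> effectiveAuths(Set<String> requested, Set<String> userLabels) {
        if (requested == null) {
            // Proposed default: all auths the user is administered on the server.
            return new HashSet<>(userLabels);
        }
        // Otherwise keep only the requested auths the user actually holds.
        Set<String> result = new HashSet<>(requested);
        result.retainAll(userLabels);
        return result;
    }
}
```

Under this sketch the surprising step 1 above ("scan 'table1'" showing no cells) would instead behave like step 2, because the user's 'private' label applies by default.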
[jira] [Updated] (HBASE-12431) Use of getColumnLatestCell(byte[], int, int, byte[], int, int) is Not Thread Safe
[ https://issues.apache.org/jira/browse/HBASE-12431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12431: --- Fix Version/s: (was: 0.98.8) 0.98.9 Use of getColumnLatestCell(byte[], int, int, byte[], int, int) is Not Thread Safe - Key: HBASE-12431 URL: https://issues.apache.org/jira/browse/HBASE-12431 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.1 Reporter: Jonathan Jarvis Assignee: Jingcheng Du Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12431-V2.diff, HBASE-12431-V3.diff, HBASE-12431.diff Result declares that it is NOT THREAD SAFE at the top of the source code, but one would assume that refers to many different threads accessing the same Result object. I've run into an issue when several different threads each access their own Result object, because of the use of a common static member variable. I noticed the problem when I switched from getColumnLatestCell(byte[], byte[]) to getColumnLatestCell(byte[], int, int, byte[], int, int). These methods call different binarySearch methods, the latter invoking: protected int binarySearch(final Cell [] kvs, final byte [] family, final int foffset, final int flength, final byte [] qualifier, final int qoffset, final int qlength) { This method utilizes a private static member variable called buffer. If more than one thread is using buffer, you'll see unpredictable behavior unless you synchronize(Result.class) {}. If buffer is to remain a static variable, I would recommend changing it to a ThreadLocal<byte[]> instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
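The remedy the reporter suggests can be shown with a self-contained sketch: replace the shared static scratch buffer with a ThreadLocal so each thread gets its own copy. The class and helper below are illustrative, not Result's actual code.

```java
public class ThreadSafeBuffer {
    // A shared static byte[] here would be the race described above:
    // concurrent binarySearch calls from different threads would scribble
    // over each other's scratch space. ThreadLocal gives each thread its own.
    private static final ThreadLocal<byte[]> localBuffer =
        ThreadLocal.withInitial(() -> new byte[128]);

    // Return this thread's scratch buffer, growing it if needed.
    static byte[] getBuffer(int minLength) {
        byte[] buf = localBuffer.get();
        if (buf.length < minLength) {
            buf = new byte[minLength];
            localBuffer.set(buf);
        }
        return buf;
    }
}
```

Two threads calling getBuffer concurrently receive distinct arrays, so no synchronization on Result.class is needed; the trade-off is one buffer's worth of memory per thread that touches the code path.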
[jira] [Updated] (HBASE-12425) Document the phases of the split transaction
[ https://issues.apache.org/jira/browse/HBASE-12425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12425: --- Fix Version/s: (was: 0.99.2) (was: 0.98.8) Document the phases of the split transaction Key: HBASE-12425 URL: https://issues.apache.org/jira/browse/HBASE-12425 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Andrew Purtell Assignee: Misty Stanley-Jones Fix For: 2.0.0 See PDF document attached to parent issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12319: --- Fix Version/s: (was: 0.98.8) 0.98.9 2.0.0 Inconsistencies during region recovery due to close/open of a region during recovery Key: HBASE-12319 URL: https://issues.apache.org/jira/browse/HBASE-12319 Project: HBase Issue Type: Bug Affects Versions: 0.98.7, 0.99.1 Reporter: Devaraj Das Assignee: Jeffrey Zhong Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12319.patch In one of my test runs, I saw the following: {noformat} 2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d . . 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0198080 {noformat} The logs above are from a regionserver, say RS2. From the initial analysis, it seemed like the master asked a certain regionserver (let's say RS1) to open the region and for some reason asked it to close soon after. The open was still proceeding on RS1 when the master reassigned the region to RS2. RS2 also started the recovery, but it ended up seeing an inconsistent view of the recovered-edits files (it reports missing files, as per the logs above) since the first regionserver (RS1) deleted some files after it completed the recovery. 
When RS2 really opens the region, it might not see the recent data that was written by flushes on hor9n10 during the recovery process. Reads of that data would have inconsistencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202990#comment-14202990 ] Lars Hofhansl commented on HBASE-12448: --- Can we write more than 2 GB in one minute? That'd just be 35.8 MB/s, so I guess the answer is yes. So bytesWrittenInProgress should be a long. Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long-running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
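The overflow concern behind making bytesWrittenInProgress a long can be demonstrated in plain Java; rateMBPerSec is an illustrative helper, not the actual compactor code:

```java
public class RateOverflow {
    // Rate in (decimal) MB/s from a byte count and an elapsed interval.
    // Taking the counter as a long avoids the wraparound shown in main().
    static double rateMBPerSec(long bytesWritten, long elapsedMillis) {
        return bytesWritten / 1_000_000.0 / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        int intCounter = Integer.MAX_VALUE;  // ~2 GiB written so far
        intCounter += 1;                     // one more byte wraps it negative
        System.out.println(intCounter);      // prints -2147483648

        long longCounter = Integer.MAX_VALUE + 1L;
        System.out.println(rateMBPerSec(longCounter, 60_000)); // ~35.8 MB/s
    }
}
```

Once the int counter wraps, any rate derived from it goes negative or nonsensical, which is exactly why a long is needed for compactions writing more than 2 GB per reporting interval.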
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12450: -- Status: Patch Available (was: Open) Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12450-0.98.patch, HBASE-12450.patch UnbalanceKillAndRebalanceAction does a kill, a balance, and then a start of region servers. But if the balance fails, an exception is thrown, causing the region servers not to be started. For me, the balance always kept failing with a socket timeout (default 1 min) because the master runs one balance iteration for 5 minutes (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12450: -- Attachment: HBASE-12450-0.98.patch Thanks for the quick review, Andrew. Attached is the patch for 0.98. The master patch applies cleanly to branch-1. Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12450-0.98.patch, HBASE-12450.patch UnbalanceKillAndRebalanceAction does a kill, a balance, and then a start of region servers. But if the balance fails, an exception is thrown, causing the region servers not to be started. For me, the balance always kept failing with a socket timeout (default 1 min) because the master runs one balance iteration for 5 minutes (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202993#comment-14202993 ] Andrew Purtell commented on HBASE-12279: Running the commands listed by [~nielsbasjes] above and committing to 0.98+ now. Generated thrift files were generated with the wrong parameters --- Key: HBASE-12279 URL: https://issues.apache.org/jira/browse/HBASE-12279 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.98.0, 0.99.0 Reporter: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2 Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html]) {code} thrift -strict --gen java:hashcode {code} the current files seem to have been generated with {code} thrift -strict --gen java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12279: --- Assignee: Niels Basjes Generated thrift files were generated with the wrong parameters --- Key: HBASE-12279 URL: https://issues.apache.org/jira/browse/HBASE-12279 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.98.0, 0.99.0 Reporter: Niels Basjes Assignee: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2 Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html]) {code} thrift -strict --gen java:hashcode {code} the current files seem to have been generated with {code} thrift -strict --gen java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202994#comment-14202994 ] Hudson commented on HBASE-12440: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #632 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/632/]) HBASE-12440 Region may remain offline on clean startup under certain race condition (Virag Kothari) (apurtell: rev d2eb3cf3fa4897333f08dc87e6b830cca5d375ad) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java Region may remain offline on clean startup under certain race condition --- Key: HBASE-12440 URL: https://issues.apache.org/jira/browse/HBASE-12440 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch Saw this in prod some time back with zk assignment On clean startup, while master was doing bulk assign while one of the region servers dies. The bulk assigner then tried to assign it individually using AssignCallable. 
The AssignCallable does a forceStateToOffline() and skips assigning as it wants the SSH to do the assignment {code} 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} 2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} But the SSH wont assign as the region is offline but not in transition {code} 2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server) 2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} In zk-less assignment, the bulk assigner invoking AssignCallable and the SSH may try to assign the region. But as they go through lock, only one will succeed and doesn't seem to be an issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203001#comment-14203001 ] Hadoop QA commented on HBASE-12450: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680323/HBASE-12450-0.98.patch against trunk revision . ATTACHMENT ID: 12680323 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11617//console This message is automatically generated. Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12450-0.98.patch, HBASE-12450.patch UnbalanceKillAndRebalanceAction does a kill, a balance, and then a start of region servers. But if the balance fails, an exception is thrown, causing the region servers not to be started. For me, the balance always kept failing with a socket timeout (default 1 min) because the master runs one balance iteration for 5 minutes (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202993#comment-14202993 ] Andrew Purtell edited comment on HBASE-12279 at 11/8/14 12:22 AM: -- Running the commands listed by [~nielsbasjes] above and committing to 0.94+ now. was (Author: apurtell): Running the commands listed by [~nielsbasjes] above and committing to 0.98+ now. Generated thrift files were generated with the wrong parameters --- Key: HBASE-12279 URL: https://issues.apache.org/jira/browse/HBASE-12279 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.98.0, 0.99.0 Reporter: Niels Basjes Assignee: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2 Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch It turns out that the Java code generated from the thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html]) {code} thrift -strict --gen java:hashcode {code} the current files seem to have been generated with {code} thrift -strict --gen java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12448: --- Status: Open (was: Patch Available) Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12448) Fix rate reporting in compaction progress DEBUG logging
[ https://issues.apache.org/jira/browse/HBASE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12448: --- Attachment: HBASE-12448.patch HBASE-12448-0.98.patch Updated patches. Made 'bytesWritten' a long too. Fix rate reporting in compaction progress DEBUG logging --- Key: HBASE-12448 URL: https://issues.apache.org/jira/browse/HBASE-12448 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12448-0.98.patch, HBASE-12448-0.98.patch, HBASE-12448.patch, HBASE-12448.patch HBASE-11702 introduced rate reporting at DEBUG level for long running compactions but failed to align bytesWritten with the reporting interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12449) Use the max timestamp of current or old cell's timestamp in HRegion.append()
[ https://issues.apache.org/jira/browse/HBASE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-12449: -- Attachment: hbase-12449.patch hbase-12449-0.98.patch Here is a simple patch which ensures that on append() the new cell's ts is the max of the current time and the old cell's time. If they are equal, the new cell will always sort first due to its seqId being higher. Use the max timestamp of current or old cell's timestamp in HRegion.append() Key: HBASE-12449 URL: https://issues.apache.org/jira/browse/HBASE-12449 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: hbase-12449-0.98.patch, hbase-12449.patch We have observed an issue in SLES clusters where the system timestamp regularly goes back in time. This happens frequently enough to cause test failures when LTT is used with the updater. Every time a mutation is performed, the updater creates a string of the form #column:mutation_type and appends it to the column mutate_info. When the test fails, it is always the case that the mutate_info entry for the particular column reported is missing from mutate_info. However, according to the MultiThreadedUpdater source code, if a row gets updated, all the columns will be mutated. So if a row contains 15 columns, all 15 should appear in mutate_info. 
When the test fails though, we get an exception like: {code} 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column family [test_cf], column [8], mutation [null]; value of length 818 {code} For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it should: {code} test_cf:mutate_info timestamp=1414902651388, value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0 {code} Further debugging led to finding the root cause where It seems that on SUSE System.currentTimeMillis() can go back in time freely (especially when run in a virtualized env like EC2), and actually happens very frequently. This is from a debug log that was put in place: {code} 2014-11-04 01:16:05,025 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 2014-11-04 01:16:05,038 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 2014-11-04 01:16:05,047 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 2014-11-04 01:16:05,057 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 2014-11-04 01:16:05,061 INFO [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 2014-11-04 01:16:05,070 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] 
regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 2014-11-04 01:16:05,076 INFO [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 2014-11-04 01:16:05,084 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 2014-11-04 01:16:05,090 INFO [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 2014-11-04 01:16:05,097 INFO [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0
[jira] [Updated] (HBASE-12449) Use the max timestamp of current or old cell's timestamp in HRegion.append()
[ https://issues.apache.org/jira/browse/HBASE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-12449: -- Status: Patch Available (was: Open) Use the max timestamp of current or old cell's timestamp in HRegion.append() Key: HBASE-12449 URL: https://issues.apache.org/jira/browse/HBASE-12449 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: hbase-12449-0.98.patch, hbase-12449.patch We have observed an issue in SLES clusters where the system timestamp regularly goes back in time. This happens frequently enough to cause test failures when LTT is used with the updater. Every time a mutation is performed, the updater creates a string of the form #column:mutation_type and appends it to the column mutate_info. When the test fails, it is always the case that the mutate_info entry for the particular column reported is missing from mutate_info. However, according to the MultiThreadedUpdater source code, if a row gets updated, all the columns will be mutated. So if a row contains 15 columns, all 15 should appear in mutate_info. When the test fails though, we get an exception like: {code} 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column family [test_cf], column [8], mutation [null]; value of length 818 {code} For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it should: {code} test_cf:mutate_info timestamp=1414902651388, value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0 {code} Further debugging led to the root cause: it seems that on SUSE, System.currentTimeMillis() can go back in time freely (especially when run in a virtualized env like EC2), and this actually happens very frequently. 
This is from a debug log that was put in place: {code} 2014-11-04 01:16:05,025 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 2014-11-04 01:16:05,038 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 2014-11-04 01:16:05,047 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 2014-11-04 01:16:05,057 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 2014-11-04 01:16:05,061 INFO [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 2014-11-04 01:16:05,070 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 2014-11-04 01:16:05,076 INFO [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 2014-11-04 01:16:05,084 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 2014-11-04 01:16:05,090 INFO [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 2014-11-04 01:16:05,097 INFO [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0 2014-11-04 01:16:05,100 INFO [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: upserting:
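The fix described in this issue, taking the max of the current clock and the old cell's timestamp on append(), can be sketched in plain Java; AppendTimestamp is illustrative, not the actual HRegion.append() code:

```java
public class AppendTimestamp {
    // The new cell's ts is the max of the current clock and the old cell's
    // ts, so a clock that moved backwards cannot make the new version sort
    // behind the old one. On a tie, the new cell still sorts first because
    // its seqId is higher (per the patch description above).
    static long nextAppendTs(long currentTimeMillis, long oldCellTs) {
        return Math.max(currentTimeMillis, oldCellTs);
    }

    public static void main(String[] args) {
        // Normal case: the clock is ahead of the previous cell.
        System.out.println(nextAppendTs(1415063765090L, 1415063765025L));
        // Clock went backwards: reuse the old cell's timestamp.
        System.out.println(nextAppendTs(1415063765025L, 1415063765090L));
    }
}
```

This monotonicity guard is what keeps the mutate_info column from losing appended entries when System.currentTimeMillis() steps backwards mid-run.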
[jira] [Updated] (HBASE-12254) Document limitations related to pluggable replication endpoint feature usage in 0.98
[ https://issues.apache.org/jira/browse/HBASE-12254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12254: --- Fix Version/s: (was: 0.98.8) 0.98.9 Document limitations related to pluggable replication endpoint feature usage in 0.98 Key: HBASE-12254 URL: https://issues.apache.org/jira/browse/HBASE-12254 Project: HBase Issue Type: Sub-task Components: documentation Affects Versions: 0.98.7 Reporter: ramkrishna.s.vasudevan Fix For: 0.98.9 The pluggable replication endpoint in 0.98 will need to be documented as to how exactly it can be used, because of limitations we may have due to mixed-version compatibility, where the peers may be on an older version of 0.98 that does not have the pluggable replication endpoint. Also, this feature adds some more data to the znodes, like the name of the Endpoint impl, data, and the replication config. A peer cluster on the older version will not be able to read this data, particularly when a custom replication is configured. This JIRA aims at documenting such cases for the ease of the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12450) Unbalance chaos monkey might kill all region servers without starting them back
[ https://issues.apache.org/jira/browse/HBASE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virag Kothari updated HBASE-12450: -- Attachment: HBASE-12450.patch Reattaching the master patch, as precommit ran against the 0.98 patch. Unbalance chaos monkey might kill all region servers without starting them back --- Key: HBASE-12450 URL: https://issues.apache.org/jira/browse/HBASE-12450 Project: HBase Issue Type: Bug Reporter: Virag Kothari Assignee: Virag Kothari Priority: Minor Fix For: 2.0.0, 0.98.8, 0.99.2 Attachments: HBASE-12450-0.98.patch, HBASE-12450.patch, HBASE-12450.patch UnbalanceKillAndRebalanceAction does a kill, a balance, and then a start of region servers. But if the balance fails, an exception is thrown, causing the region servers not to be started. For me, the balance always kept failing with a socket timeout (default 1 min) because the master runs one balance iteration for 5 minutes (default config). Eventually all servers are killed but never started back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12168) Document Rest gateway SPNEGO-based authentication for client
[ https://issues.apache.org/jira/browse/HBASE-12168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12168: --- Fix Version/s: (was: 0.99.2) (was: 0.98.8) 2.0.0 Document Rest gateway SPNEGO-based authentication for client Key: HBASE-12168 URL: https://issues.apache.org/jira/browse/HBASE-12168 Project: HBase Issue Type: Task Components: documentation, REST, security Reporter: Jerry He Fix For: 2.0.0 After HBASE-5050, we seem to support SPNEGO-based authentication from client on Rest gateway. But I had a tough time finding the info. The support is not mentioned in Security book. In the security book, we still have: bq. It should be possible for clients to authenticate with the HBase cluster through the REST gateway in a pass-through manner via SPEGNO HTTP authentication. This is future work. The release note in HBASE-5050 seems to be obsolete as well. e.g. hbase.rest.kerberos.spnego.principal seems to be obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11979) Compaction progress reporting is wrong
[ https://issues.apache.org/jira/browse/HBASE-11979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11979: --- Fix Version/s: (was: 0.98.8) 0.98.9 Compaction progress reporting is wrong -- Key: HBASE-11979 URL: https://issues.apache.org/jira/browse/HBASE-11979 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Esteban Gutierrez Priority: Minor Fix For: 2.0.0, 0.98.9, 0.99.2 This is a long-standing problem that previously could be observed in regionserver metrics, but we recently added logging for long-running compactions, and this has exposed the issue in a new way, e.g. {noformat} 2014-09-15 14:20:59,450 DEBUG [regionserver8120-largeCompactions-1410813534627] compactions.Compactor: Compaction progress: 22683625/6808179 (333.18%), rate=162.08 kB/sec {noformat} The 'rate' reported in such logging is consistent and what we were really after, but the progress indication is clearly broken and should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12148) Remove TimeRangeTracker as point of contention when many threads writing a Store
[ https://issues.apache.org/jira/browse/HBASE-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12148: --- Fix Version/s: (was: 0.98.8) 0.98.9 Remove TimeRangeTracker as point of contention when many threads writing a Store Key: HBASE-12148 URL: https://issues.apache.org/jira/browse/HBASE-12148 Project: HBase Issue Type: Sub-task Components: Performance Affects Versions: 2.0.0, 0.99.1 Reporter: stack Assignee: stack Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: 0001-In-AtomicUtils-change-updateMin-and-updateMax-to-ret.patch, 12148.addendum.txt, 12148.txt, 12148.txt, 12148v2.txt, 12148v2.txt, Screen Shot 2014-10-01 at 3.39.46 PM.png, Screen Shot 2014-10-01 at 3.41.07 PM.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12442) Bring KeyValue#createFirstOnRow() back to branch-1 as deprecated methods
[ https://issues.apache.org/jira/browse/HBASE-12442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203059#comment-14203059 ] Enis Soztutar commented on HBASE-12442: --- +1 if Phoenix does not compile with 0.99. Bring KeyValue#createFirstOnRow() back to branch-1 as deprecated methods Key: HBASE-12442 URL: https://issues.apache.org/jira/browse/HBASE-12442 Project: HBase Issue Type: Task Affects Versions: 0.99.0 Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.99.2 Attachments: 12442-v1.patch, 12442-v2.patch KeyValue.createFirstOnRow() methods are used by downstream projects such as Phoenix. They haven't been deprecated in 0.98 branch. This JIRA brings KeyValue.createFirstOnRow() back to branch as deprecated methods. They are removed in master branch (hbase 2.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-10919) [VisibilityController] ScanLabelGenerator using LDAP
[ https://issues.apache.org/jira/browse/HBASE-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10919: --- Fix Version/s: (was: 0.98.8) 0.98.9 2.0.0 [VisibilityController] ScanLabelGenerator using LDAP Key: HBASE-10919 URL: https://issues.apache.org/jira/browse/HBASE-10919 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: slides-10919.pdf A ScanLabelGenerator that queries an external service, using the LDAP protocol, for a set of attributes corresponding to the principal represented by the request UGI, and converts any returned in the response to additional auths in the effective set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11639) [Visibility controller] Replicate the visibility of Cells as strings
[ https://issues.apache.org/jira/browse/HBASE-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11639: --- Fix Version/s: (was: 0.98.8) 0.98.9 [Visibility controller] Replicate the visibility of Cells as strings Key: HBASE-11639 URL: https://issues.apache.org/jira/browse/HBASE-11639 Project: HBase Issue Type: Improvement Components: Replication, security Affects Versions: 0.98.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Labels: VisibilityLabels Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-11639_v2.patch, HBASE-11639_v2.patch, HBASE-11639_v3.patch, HBASE-11639_v3.patch, HBASE-11639_v5.patch This issue is aimed at persisting the visibility labels as strings in the WAL rather than as label ordinals. This would allow the labels to be replicated to the peer cluster directly as strings, and after HBASE-11553 it would also help because the replication cluster could have a string-based visibility label implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
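As a rough sketch of the idea (the class and method names below are hypothetical, not HBase's actual visibility code), the ordinal-to-string translation needed on the replication path might look like this: the source cluster resolves its private ordinals against its label dictionary so the peer only ever sees label strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (illustrative names, not the real HBase classes):
// translate label ordinals to their string form before a WAL entry ships to
// a peer, so the peer needs no knowledge of the source's ordinal dictionary.
final class LabelOrdinalTranslator {
    private final Map<Integer, String> ordinalToLabel;

    LabelOrdinalTranslator(Map<Integer, String> ordinalToLabel) {
        this.ordinalToLabel = ordinalToLabel;
    }

    List<String> toStrings(List<Integer> ordinals) {
        List<String> out = new ArrayList<>();
        for (int ordinal : ordinals) {
            String label = ordinalToLabel.get(ordinal);
            if (label == null) {
                // an unknown ordinal means the dictionaries are out of sync
                throw new IllegalStateException("unknown label ordinal " + ordinal);
            }
            out.add(label);
        }
        return out;
    }
}
```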
[jira] [Updated] (HBASE-12128) Cache configuration and RpcController selection for Table in Connection
[ https://issues.apache.org/jira/browse/HBASE-12128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12128: --- Fix Version/s: (was: 0.98.8) 0.98.9 Cache configuration and RpcController selection for Table in Connection --- Key: HBASE-12128 URL: https://issues.apache.org/jira/browse/HBASE-12128 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 2.0.0, 0.98.9, 0.99.2 Creating Table instances should be lightweight. Apps that manage their own Connections are expected to create Tables on demand for each interaction. However, we look up values from the Hadoop Configuration when constructing Table objects in order to populate some of their fields. Configuration is a heavyweight registry that does a lot of string operations and regex matching. Method calls into Configuration account for 48.25% of CPU time when creating the HTable object in 0.98. Another ~48% of CPU is spent constructing the desired RpcController object via reflection in 0.98. Together this can account for ~20% of total on-CPU time of the client. See the parent issue for more detail. We are using Connection like a factory for Table. We should cache the configuration for Table in Connection. We should also create the desired RpcController by reflection once, cache it, and clone it for new Tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
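The "create by reflection once and cache" part of the proposal can be sketched as follows. This is a minimal, hedged illustration (the factory class and its use of `StringBuilder` in the test are assumptions for demonstration, not the actual HBase client code): the reflective lookup happens once in the constructor, and each subsequent instantiation reuses the cached `Constructor` instead of repeating `Class.forName` and method resolution.

```java
import java.lang.reflect.Constructor;

// Sketch of the proposed caching pattern (illustrative, not HBase's actual
// client code): resolve the controller class reflectively exactly once, then
// reuse the cached Constructor for every new Table-scoped instance.
final class ControllerFactory {
    private final Constructor<?> ctor; // resolved once, reused thereafter

    ControllerFactory(String className) throws ReflectiveOperationException {
        // the expensive part: class lookup + constructor resolution
        ctor = Class.forName(className).getDeclaredConstructor();
    }

    Object newController() throws ReflectiveOperationException {
        // the cheap part: direct invocation of the cached constructor
        return ctor.newInstance();
    }
}
```

One such factory held by the Connection would let each Table creation skip both the Configuration lookups and the repeated reflective resolution that together dominate client CPU in the profile above.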
[jira] [Commented] (HBASE-12279) Generated thrift files were generated with the wrong parameters
[ https://issues.apache.org/jira/browse/HBASE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203080#comment-14203080 ] Andrew Purtell commented on HBASE-12279: 'mvn generate-sources -Pcompile-thrift' works for 0.98 and higher. We are missing this for 0.94. I regenerated files for 0.94 using version 0.8.0 of the compiler by hand. Regenerated Thrift for 0.98+ with compiler Thrift version 0.9.0 *0.98 tests* {noformat} Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 68.372 sec Running org.apache.hadoop.hbase.thrift.TestThriftServer Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 58.219 sec Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.955 sec Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandlerWithLabels Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.931 sec Results : Tests run: 59, Failures: 0, Errors: 0, Skipped: 0 {noformat} *branch-1 tests* {noformat} Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 60.84 sec - in org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Running org.apache.hadoop.hbase.thrift.TestThriftServer Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 51.037 sec - in org.apache.hadoop.hbase.thrift.TestThriftServer Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.799 sec - in org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandlerWithLabels Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.776 sec - in org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandlerWithLabels Results : Tests run: 60, Failures: 
0, Errors: 0, Skipped: 0 {noformat} *master tests* {noformat} Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 59.82 sec - in org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Running org.apache.hadoop.hbase.thrift.TestThriftServer Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.724 sec - in org.apache.hadoop.hbase.thrift.TestThriftServer Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.98 sec - in org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandlerWithLabels Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.261 sec - in org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandlerWithLabels Results : Tests run: 60, Failures: 0, Errors: 0, Skipped: 0 {noformat} Regenerated Thrift for 0.94 with compiler Thrift version 0.8.0. Built this version of the compiler from the Thrift 0.8.0 distribution tarball downloaded from archive.apache.org. *0.94 tests* {noformat} Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 317.191 sec Running org.apache.hadoop.hbase.thrift.TestThriftServer Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.331 sec Running org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.159 sec Results : Tests run: 40, Failures: 0, Errors: 0, Skipped: 0 {noformat} Going to commit 0.98+ shortly unless objection. 
Going to commit 0.94 over the weekend probably, ping [~lhofhansl] Generated thrift files were generated with the wrong parameters --- Key: HBASE-12279 URL: https://issues.apache.org/jira/browse/HBASE-12279 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.98.0, 0.99.0 Reporter: Niels Basjes Assignee: Niels Basjes Fix For: 2.0.0, 0.98.8, 0.94.26, 0.99.2 Attachments: HBASE-12279-2014-10-16-v1.patch, HBASE-12279-2014-11-07-v2.patch It turns out that the Java code generated from the Thrift files has been generated with the wrong settings. Instead of the documented ([thrift|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift/package-summary.html], [thrift2|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html]) {code} thrift -strict --gen java:hashcode {code} the current files seem to have been generated instead with {code} thrift -strict --gen java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11819) Unit test for CoprocessorHConnection
[ https://issues.apache.org/jira/browse/HBASE-11819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11819: --- Fix Version/s: (was: 0.98.8) 0.98.9 Unit test for CoprocessorHConnection - Key: HBASE-11819 URL: https://issues.apache.org/jira/browse/HBASE-11819 Project: HBase Issue Type: Test Reporter: Andrew Purtell Assignee: Talat UYARER Priority: Minor Labels: newbie++ Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-11819.patch, HBASE-11819v2.patch, HBASE-11819v3.patch, HBASE-11819v4-0.98.patch, HBASE-11819v4-branch-1.patch, HBASE-11819v4-master.patch, HBASE-11819v4-master.patch Add a unit test to hbase-server that exercises CoprocessorHConnection . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12173) Backport: [PE] Allow random value size
[ https://issues.apache.org/jira/browse/HBASE-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12173: --- Fix Version/s: (was: 0.98.8) 0.98.9 Backport: [PE] Allow random value size -- Key: HBASE-12173 URL: https://issues.apache.org/jira/browse/HBASE-12173 Project: HBase Issue Type: Sub-task Components: Performance Reporter: Lars Hofhansl Fix For: 0.94.26, 0.98.9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12053) SecurityBulkLoadEndPoint set 777 permission on input data files
[ https://issues.apache.org/jira/browse/HBASE-12053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12053: --- Fix Version/s: (was: 0.98.8) 0.98.9 SecurityBulkLoadEndPoint set 777 permission on input data files Key: HBASE-12053 URL: https://issues.apache.org/jira/browse/HBASE-12053 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12053.patch We have code in SecureBulkLoadEndpoint#secureBulkLoadHFiles {code} LOG.trace("Setting permission for: " + p); fs.setPermission(p, PERM_ALL_ACCESS); {code} This defeats the purpose of using a staging folder for secure bulk load. Currently we create a hidden staging folder which has ALL_ACCESS permission, and we use doAs to move input files into the staging folder. Therefore, we should not set 777 permission on the original input data files but on the files in the staging folder after the move. This may compromise security settings, especially when there is an error and we move the file back with 777 permission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
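The intended ordering can be sketched as below, using `java.nio` as a stand-in for Hadoop's `FileSystem` API (illustrative only; the class and method names are assumptions): the 777 permission is applied only to the staged copy after the move, never to the file at its original location.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch of the proposed fix (java.nio stand-in for Hadoop's FileSystem):
// move the input file into the staging directory first, then open up
// permissions on the staged copy only, matching the PERM_ALL_ACCESS intent
// without ever chmod-ing the caller's original input path.
final class StagingMove {
    static Path moveIntoStaging(Path input, Path stagingDir) throws IOException {
        Path staged = stagingDir.resolve(input.getFileName());
        Files.move(input, staged);           // move first...
        Set<PosixFilePermission> all =
            PosixFilePermissions.fromString("rwxrwxrwx"); // ...then 777
        Files.setPosixFilePermissions(staged, all);       // on the staged copy
        return staged;
    }
}
```

On an error path, reverting the move would then restore the file with its original permissions, since those were never touched.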
[jira] [Commented] (HBASE-12440) Region may remain offline on clean startup under certain race condition
[ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203084#comment-14203084 ] Hudson commented on HBASE-12440: SUCCESS: Integrated in HBase-1.0 #446 (See [https://builds.apache.org/job/HBase-1.0/446/]) HBASE-12440 Region may remain offline on clean startup under certain race condition (Virag Kothari) (apurtell: rev 87fb974765f4241026ef23f2abf4622ba372ffa9) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java Region may remain offline on clean startup under certain race condition --- Key: HBASE-12440 URL: https://issues.apache.org/jira/browse/HBASE-12440 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Virag Kothari Assignee: Virag Kothari Fix For: 0.98.8, 0.99.2 Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch Saw this in prod some time back with zk assignment. On clean startup, while the master was doing a bulk assign, one of the region servers died. The bulk assigner then tried to assign the region individually using AssignCallable.
The AssignCallable does a forceStateToOffline() and skips assigning as it wants the SSH to do the assignment {code} 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 2014-10-16 16:05:23,593 INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} 2014-10-16 16:05:23,598 INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} But the SSH won't assign, as the region is offline but not in transition {code} 2014-10-16 16:05:24,606 INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server) 2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 {code} In zk-less assignment, both the bulk assigner (invoking AssignCallable) and the SSH may try to assign the region, but as they go through a lock, only one will succeed, so this doesn't seem to be an issue there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12163) Move test annotation classes to the same package as in master
[ https://issues.apache.org/jira/browse/HBASE-12163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12163: --- Fix Version/s: (was: 0.98.8) 0.98.9 Move test annotation classes to the same package as in master - Key: HBASE-12163 URL: https://issues.apache.org/jira/browse/HBASE-12163 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Trivial Fix For: 0.98.9, 0.99.2 Test class annotations (SmallTests, etc.) are in different packages in master vs 0.98 and branch-1, making backporting difficult. Let's move them to the same package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12223) MultiTableInputFormatBase.getSplits is too slow
[ https://issues.apache.org/jira/browse/HBASE-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-12223: --- Fix Version/s: (was: 0.98.8) 0.98.9 MultiTableInputFormatBase.getSplits is too slow --- Key: HBASE-12223 URL: https://issues.apache.org/jira/browse/HBASE-12223 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 0.94.15 Reporter: shanwen Assignee: YuanBo Peng Priority: Minor Fix For: 2.0.0, 0.94.26, 0.98.9, 0.99.2 Attachments: HBASE-12223.patch When using multiple scans, getSplits is too slow; 800 scans take five minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11996) Add Table Creator to the HTD
[ https://issues.apache.org/jira/browse/HBASE-11996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-11996: --- Fix Version/s: (was: 0.98.8) 0.98.9 Add Table Creator to the HTD -- Key: HBASE-11996 URL: https://issues.apache.org/jira/browse/HBASE-11996 Project: HBase Issue Type: New Feature Components: Admin, master, Operability Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 0.98.9, 0.99.2 It would be nice to store the user who created the table. It is useful in situations where you want to remove a table but you don't know who asked for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-11962) Port HBASE-11897 Add append and remove peer table-cfs cmds for replication to 0.98
[ https://issues.apache.org/jira/browse/HBASE-11962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-11962. Resolution: Not a Problem Fix Version/s: (was: 0.98.8) No progress, resolving as NaP Port HBASE-11897 Add append and remove peer table-cfs cmds for replication to 0.98 Key: HBASE-11962 URL: https://issues.apache.org/jira/browse/HBASE-11962 Project: HBase Issue Type: Improvement Reporter: Ted Yu Priority: Minor This issue is to backport the commands for appending and removing peer table-cfs for replication to 0.98 Two new commands, append_peer_tableCFs and remove_peer_tableCFs, are added to do the operation of adding and removing a table/table-column family. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-9531) a command line (hbase shell) interface to retrieve the replication metrics and show replication lag
[ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-9531: -- Fix Version/s: (was: 0.98.8) 0.98.9 a command line (hbase shell) interface to retrieve the replication metrics and show replication lag --- Key: HBASE-9531 URL: https://issues.apache.org/jira/browse/HBASE-9531 Project: HBase Issue Type: New Feature Components: Replication Affects Versions: 0.99.0 Reporter: Demai Ni Assignee: Demai Ni Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v1.patch, HBASE-9531-master-v2.patch, HBASE-9531-master-v3.patch, HBASE-9531-master-v4.patch, HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch This jira is to provide a command line (hbase shell) interface to retrieve replication metrics info such as: ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp, and also to provide point-in-time info on the replication lag (source only). Understand that HBase is using Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is to serve as a lightweight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have made the code work on 0.94.9, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshops. If so, I will build a trunk patch. All inputs are greatly appreciated. Thank you! The overall design is to reuse the existing logic which supports the hbase shell command 'status', and invent a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), use the local replication service objects to get their loads, wrap them in a ReplicationLoad object, and simply pass it to the ServerLoad.
In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a ReplicationLoad. (Many thanks to Jean-Daniel for his kind suggestions through the dev email list.) The replication lag will be calculated for the source only, using this formula:
{code:title=Replication lag|borderStyle=solid}
if sizeOfLogQueue != 0 then
  lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then
  lag = ageOfLastShippedOp // last shipped happened recently
else
  lag = 0 // last shipped may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
{code}
External output will look something like:
{code:title=status 'replication'|borderStyle=solid} hbase(main):001:0> status 'replication' version 0.94.9 3 live servers hdtest017.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013 hdtest018.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013 hdtest015.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013 hbase(main):002:0> status 'replication','source' version 0.94.9 3 live servers hdtest017.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013 hdtest018.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 hdtest015.svl.ibm.com: SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013 hbase(main):003:0> status 'replication','sink' version 0.94.9 3 live servers hdtest017.svl.ibm.com: SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013 hdtest018.svl.ibm.com: SINK :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013 hdtest015.svl.ibm.com: SINK :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013 hbase(main):003:0> status 'replication','lag' version 0.94.9 3 live servers hdtest017.svl.ibm.com: lag = 0