[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4308:
    Status: Open (was: Patch Available)

Race between RegionOpenedHandler and AssignmentManager
    Key: HBASE-4308
    URL: https://issues.apache.org/jira/browse/HBASE-4308
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.92.0
    Reporter: Todd Lipcon
    Assignee: ramkrishna.s.vasudevan
    Fix For: 0.92.0
    Attachments: HBASE-4308.patch, HBASE-4308_1.patch

When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like:

    2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840

Not certain if it causes issues, but it's a concerning log message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
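The race reported in HBASE-4308 comes down to ordering: the znode is deleted while the region is still listed in RegionsInTransition, so the delete notification can observe a stale entry. The following is a minimal single-threaded sketch of that ordering, not HBase code. The class, its fields, and its method names are all hypothetical stand-ins, and the callback is invoked synchronously to model the worst-case interleaving. It is not necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-threaded model of the ordering problem; not HBase code.
final class RitOrderingSketch {
    // Stand-in for RegionsInTransition: region name -> state.
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    // Stand-in for the znodes kept in ZooKeeper.
    final Set<String> znodes = new HashSet<>();
    // Counts how often the "Node deleted but still in RIT" case is hit.
    int staleWarnings = 0;

    void regionOpened(String region) {
        regionsInTransition.put(region, "OPEN");
        znodes.add(region);
    }

    // Stand-in for the watcher callback that fires after a znode delete.
    void nodeDeleted(String region) {
        if (regionsInTransition.containsKey(region)) {
            staleWarnings++;
        }
    }

    // Order from the report: delete the znode, then clear RIT.
    void deleteZnodeThenClearRit(String region) {
        znodes.remove(region);
        nodeDeleted(region); // worst case: notification lands before RIT is cleared
        regionsInTransition.remove(region);
    }

    // Alternative order: clear RIT, then delete the znode.
    void clearRitThenDeleteZnode(String region) {
        regionsInTransition.remove(region);
        znodes.remove(region);
        nodeDeleted(region);
    }

    public static void main(String[] args) {
        RitOrderingSketch reported = new RitOrderingSketch();
        reported.regionOpened("regionA");
        reported.deleteZnodeThenClearRit("regionA");
        System.out.println("delete first, stale warnings: " + reported.staleWarnings);

        RitOrderingSketch reordered = new RitOrderingSketch();
        reordered.regionOpened("regionA");
        reordered.clearRitThenDeleteZnode("regionA");
        System.out.println("RIT first, stale warnings: " + reordered.staleWarnings);
    }
}
```

In the real master the notification arrives on a separate event thread, so the stale case shows up only intermittently; the sketch forces it to make the ordering visible.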
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4308:
    Status: Patch Available (was: Open)

Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4308:
    Attachment: HBASE-4308_2.patch

Updated patch addressing Stack's comments.
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155762#comment-13155762 ]

stack commented on HBASE-4308:

Is this check the wrong way round, Ram?

{code}
+    if (!openedNodeDeleted) {
+      if (this.assignmentManager.getZKTable().isDisablingOrDisabledTable(
+          regionInfo.getTableNameAsString())) {
+        debugLog(regionInfo, "Opened region "
+            + regionInfo.getRegionNameAsString() + " but "
+            + "this table is disabled, triggering close of region");
+        assignmentManager.unassign(regionInfo);
+      }
     }
{code}

If we failed to delete the znode, only then do you check whether the table is disabled? Won't openedNodeDeleted be true if all goes well, and isn't that when you want to check whether the region belongs to a disabling table? It looks like the old code checked table disabling whether or not the znode delete succeeded?

Otherwise, I'm +1 on this patch. (You can do the fixup if I'm right and go ahead and commit.)
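Stack's question about HBASE-4308 turns on which guard controls the disabled-table check. The sketch below compares the guard as quoted in the review with the behavior Stack attributes to the old code, looking at the quoted snippet in isolation. Both method names are hypothetical, and the sketch ignores any handling that happens elsewhere, such as in a delete callback.

```java
// Hypothetical comparison of two guards; not HBase code.
final class DisabledTableCheckSketch {
    // Guard as quoted in the review: the table check runs only when the
    // znode delete did NOT succeed.
    static boolean unassignAsQuoted(boolean openedNodeDeleted, boolean tableDisabled) {
        return !openedNodeDeleted && tableDisabled;
    }

    // Behavior Stack attributes to the old code: the table check runs
    // whether or not the znode delete succeeded.
    static boolean unassignRegardless(boolean openedNodeDeleted, boolean tableDisabled) {
        return tableDisabled;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean deleted : values) {
            for (boolean disabled : values) {
                System.out.println("deleted=" + deleted + " disabled=" + disabled
                    + " quoted=" + unassignAsQuoted(deleted, disabled)
                    + " regardless=" + unassignRegardless(deleted, disabled));
            }
        }
    }
}
```

The two guards differ only when the delete succeeded and the table is disabled, which is the case Stack asks about.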
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4853:
    Attachment: 4853-v4.txt

Working patch. Not done yet. Also has unit test to show hole (an edit is getting in and its seqid is sticking around in lastSeqid for the region w/o being cleared).

HBASE-4789 does overzealous pruning of seqids
    Key: HBASE-4853
    URL: https://issues.apache.org/jira/browse/HBASE-4853
    Project: HBase
    Issue Type: Bug
    Reporter: stack
    Priority: Critical
    Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853.txt

Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'

At a minimum, these lines need removing:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}

... but above is no good w/o figuring why WALs are not being rotated off.
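Both symptoms mentioned for HBASE-4853 (old logs deleted because the map is empty, and WALs not being rotated off because an entry sticks around) can be pictured with a simplified model of how outstanding sequence ids gate log cleanup. This is a sketch under assumptions: the class, its fields, and the deletion rule are hypothetical simplifications, not the HLog implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of seqid-gated WAL cleanup; not HLog code.
final class SeqIdPruningSketch {
    // Region name -> oldest sequence id that has not been flushed yet.
    final Map<String, Long> lastSeqWritten = new TreeMap<>();
    // WAL file name -> highest sequence id contained in that file.
    final Map<String, Long> walMaxSeqId = new TreeMap<>();

    // A WAL is treated as deletable when every edit in it is older than
    // the oldest outstanding (unflushed) sequence id.
    List<String> deletableWals() {
        List<String> deletable = new ArrayList<>();
        if (lastSeqWritten.isEmpty()) {
            // Nothing outstanding: every old log looks safe to delete.
            deletable.addAll(walMaxSeqId.keySet());
            return deletable;
        }
        long oldestOutstanding = Long.MAX_VALUE;
        for (long seqId : lastSeqWritten.values()) {
            oldestOutstanding = Math.min(oldestOutstanding, seqId);
        }
        for (Map.Entry<String, Long> wal : walMaxSeqId.entrySet()) {
            if (wal.getValue() < oldestOutstanding) {
                deletable.add(wal.getKey());
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        SeqIdPruningSketch sketch = new SeqIdPruningSketch();
        sketch.walMaxSeqId.put("wal1", 10L);
        sketch.walMaxSeqId.put("wal2", 20L);

        sketch.lastSeqWritten.put("regionA", 15L);
        System.out.println("normal: " + sketch.deletableWals());

        // Over-pruning: the entry is removed although edits are outstanding.
        sketch.lastSeqWritten.remove("regionA");
        System.out.println("over-pruned: " + sketch.deletableWals());

        // Stale entry: an old seqid is never cleared, so nothing is deletable.
        sketch.lastSeqWritten.put("regionB", 1L);
        System.out.println("stale entry: " + sketch.deletableWals());
    }
}
```

In this model a wrongly removed entry makes logs look deletable too early, while an entry that is never cleared pins every log, which is one way to read the two observations in the report.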
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155771#comment-13155771 ]

Hadoop QA commented on HBASE-4853:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504860/4853-v4.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/346//console

This message is automatically generated.
[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one
[ https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155783#comment-13155783 ]

Hudson commented on HBASE-4851:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4851 hadoop maven dependency needs to be an optional one

stack :
Files :
* /hbase/trunk/pom.xml

hadoop maven dependency needs to be an optional one
    Key: HBASE-4851
    URL: https://issues.apache.org/jira/browse/HBASE-4851
    Project: HBase
    Issue Type: Improvement
    Components: build
    Affects Versions: 0.92.0, 0.94.0, 0.92.1
    Reporter: Roman Shaposhnik
    Assignee: Roman Shaposhnik
    Fix For: 0.92.0
    Attachments: HBASE-4851.92.patch.txt, HBASE-4851.trunk.patch.txt

Given that HBase 0.92/0.94 is likely to be used with at least 3 different versions of Hadoop (0.20, 0.22 and 0.23) it seems appropriate to make hadoop maven dependencies into optional ones (IOW, the build of HBase will see NO changes in behavior, but any component that has HBase as a dependency will be in control of what version of Hadoop gets used).
[jira] [Commented] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)
[ https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155785#comment-13155785 ]

Hudson commented on HBASE-4825:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4825 TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large); ADDENDUM; PARTIAL REVERT; MISTAKENLY COMMITTED TestCatalogTracker change
    HBASE-4825 TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)

stack :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java

stack :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKLeaderManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java

TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)
    Key: HBASE-4825
    URL: https://issues.apache.org/jira/browse/HBASE-4825
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Environment: all
    Reporter: nkeywal
    Assignee: nkeywal
    Fix For: 0.94.0
    Attachments: 4825_trunk_java.patch

see title
[jira] [Commented] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running
[ https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155787#comment-13155787 ]

Hudson commented on HBASE-4849:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running

stack :
Files :
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java

TestCatalogTracker can fail if an existing zookeeper running
    Key: HBASE-4849
    URL: https://issues.apache.org/jira/browse/HBASE-4849
    Project: HBase
    Issue Type: Bug
    Reporter: stack
    Assignee: stack
    Fix For: 0.92.0
    Attachments: 4849.txt

This fact sunk my attempt at building an RC. Fix.
[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
[ https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155784#comment-13155784 ]

Hudson commented on HBASE-4854:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

stack :
Files :
* /hbase/trunk/bin/hbase

it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
    Key: HBASE-4854
    URL: https://issues.apache.org/jira/browse/HBASE-4854
    Project: HBase
    Issue Type: Bug
    Components: shell
    Affects Versions: 0.92.0
    Reporter: Roman Shaposhnik
    Assignee: Roman Shaposhnik
    Fix For: 0.92.0
    Attachments: HBASE-4854.patch.txt

It looks like HBASE-3465 introduced a slight change in behavior. The ordering of classpath elements makes the Hadoop ones go before the HBase ones, which leads to log4j properties being picked up from the wrong place, etc. It seems that the easiest way to fix that would be to revert the ordering of the classpath.
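The effect described in HBASE-4854 follows from first-match lookup: when the same resource name exists in several classpath entries, the entry listed first typically wins. The sketch below models that with an ordered map; the directory names and the helper method are hypothetical, and no real class loader is involved.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of first-match resource lookup; not a real class loader.
final class ClasspathOrderSketch {
    // Returns the first classpath entry (in iteration order) that contains
    // the resource, or null if no entry contains it.
    static String firstProvider(Map<String, Set<String>> classpath, String resource) {
        for (Map.Entry<String, Set<String>> entry : classpath.entrySet()) {
            if (entry.getValue().contains(resource)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, standing in for classpath order.
        Map<String, Set<String>> hadoopFirst = new LinkedHashMap<>();
        hadoopFirst.put("hadoop/conf", Collections.singleton("log4j.properties"));
        hadoopFirst.put("hbase/conf", Collections.singleton("log4j.properties"));
        System.out.println("hadoop first: " + firstProvider(hadoopFirst, "log4j.properties"));

        Map<String, Set<String>> hbaseFirst = new LinkedHashMap<>();
        hbaseFirst.put("hbase/conf", Collections.singleton("log4j.properties"));
        hbaseFirst.put("hadoop/conf", Collections.singleton("log4j.properties"));
        System.out.println("hbase first: " + firstProvider(hbaseFirst, "log4j.properties"));
    }
}
```

Swapping the order of the two entries is enough to change which log4j.properties is found, which is why reverting the ordering is proposed as the simplest fix.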
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155786#comment-13155786 ]

Hudson commented on HBASE-4842:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

stack :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java

[hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
    Key: HBASE-4842
    URL: https://issues.apache.org/jira/browse/HBASE-4842
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.90.4, 0.92.0, 0.94.0
    Reporter: Jonathan Hsieh
    Assignee: Jonathan Hsieh
    Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch

It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is intermittently failing. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still being inconsistent with ZK. The RS in ZK sometimes points to the same RS, but sometimes it moves to another RS.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155788#comment-13155788 ]

Hudson commented on HBASE-4797:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4797 [availability] Skip recovered.edits files with edits we know older than what region currently has (Jimmy Jiang)

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

[availability] Skip recovered.edits files with edits we know older than what region currently has
    Key: HBASE-4797
    URL: https://issues.apache.org/jira/browse/HBASE-4797
    Project: HBase
    Issue Type: Bug
    Components: performance
    Reporter: stack
    Assignee: Jimmy Xiang
    Priority: Critical
    Labels: noob
    Fix For: 0.94.0
    Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch

Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned so I had 7000 regions to replay. The distributed split code did a nice job and cluster came back but interesting is that some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though only about 30 odd edits per file it seems). The region is unavailable during this time.
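The idea in the HBASE-4797 title, skipping recovered.edits files whose edits are all older than what the region already has, can be sketched as a simple filter. This is a sketch under assumptions: it presumes the highest sequence id of each file is known up front and passes it in directly, and the class and method names are hypothetical. How HRegion actually obtains and compares these values is not shown in the messages above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical filter over recovered.edits files; not HRegion code.
final class RecoveredEditsSkipSketch {
    // Keeps only the files that may contain edits newer than the sequence id
    // the region has already persisted. Files whose highest sequence id is
    // less than or equal to regionSeqId are skipped without being read.
    static List<String> filesToReplay(Map<String, Long> maxSeqIdByFile, long regionSeqId) {
        List<String> toReplay = new ArrayList<>();
        for (Map.Entry<String, Long> file : maxSeqIdByFile.entrySet()) {
            if (file.getValue() > regionSeqId) {
                toReplay.add(file.getKey());
            }
        }
        return toReplay;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("edits-a", 100L);
        files.put("edits-b", 200L);
        files.put("edits-c", 300L);
        // With the region already at sequence id 200, only edits-c is replayed.
        System.out.println(filesToReplay(files, 200L));
    }
}
```

With roughly a second of processing per file, as reported above, every skipped file shortens the time the region stays unavailable.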
[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155789#comment-13155789 ]

Hudson commented on HBASE-4848:

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
    HBASE-4848 TestScanner failing because hostname can't be null

stack :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java

TestScanner failing because hostname can't be null
    Key: HBASE-4848
    URL: https://issues.apache.org/jira/browse/HBASE-4848
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.90.5
    Reporter: stack
    Assignee: stack
    Fix For: 0.90.5
    Attachments: 4848-092.txt, 4848.txt
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155792#comment-13155792 ]

ramkrishna.s.vasudevan commented on HBASE-4308:

@Stack
Thanks for your review.

{code}
+  private void makeRegionOnline(RegionState rs, HRegionInfo regionInfo) {
+    regionOnline(regionInfo, rs.serverName);
+    LOG.info("The master has opened the region "
+        + regionInfo.getRegionNameAsString() + " that was online on "
+        + rs.serverName);
+    if (this.getZKTable().isDisablingOrDisabledTable(
+        regionInfo.getTableNameAsString())) {
+      debugLog(regionInfo, "Opened region "
+          + regionInfo.getRegionNameAsString() + " but "
+          + "this table is disabled, triggering close of region");
+      unassign(regionInfo);
+    }
+  }
{code}

I have not broken the logic of unassigning if the table is disabled. In OpenedRegionHandler the same code is present even if deletion of the node fails. In the same way, if the callback comes on successful deletion, this code is present there too.
Is it ok, Stack? I will commit after your confirmation :)
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155795#comment-13155795 ]

Hadoop QA commented on HBASE-4308:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504855/HBASE-4308_2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
                  org.apache.hadoop.hbase.client.TestInstantSchemaChange
                  org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/345//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/345//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/345//console

This message is automatically generated.
[jira] [Commented] (HBASE-4519) 25s sleep when expiring sessions in tests
[ https://issues.apache.org/jira/browse/HBASE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155815#comment-13155815 ]

nkeywal commented on HBASE-4519:

Fixed in HBASE-4798. We now set a ZooKeeper timeout of 0.5s, then we wait 7 seconds. It works.

25s sleep when expiring sessions in tests
    Key: HBASE-4519
    URL: https://issues.apache.org/jira/browse/HBASE-4519
    Project: HBase
    Issue Type: Improvement
    Affects Versions: 0.90.4
    Reporter: Jean-Daniel Cryans
    Assignee: nkeywal
    Fix For: 0.92.0

There's a hardcoded 25-second sleep in HBaseTestingUtility.expireSession:

{code}
int sessionTimeout = 5 * 1000; // 5 seconds
...
final long sleep = sessionTimeout * 5L;
LOG.info("ZK Closed Session 0x" + Long.toHexString(sessionID) +
  "; sleeping=" + sleep);
Thread.sleep(sleep);
{code}

I'm pretty sure this can be lowered a lot, and it would speed up a couple of tests. The only thing I'm afraid of is if this was made to accommodate flaky tests.
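The 25 seconds in HBASE-4519 come straight from the arithmetic in the quoted snippet: a 5000 ms session timeout multiplied by 5. The sketch below reproduces that calculation and, as a swapped-in alternative that is not taken from HBASE-4798, shows a bounded polling wait that returns as soon as a condition holds. All names here are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helpers; not HBaseTestingUtility code.
final class ExpireWaitSketch {
    // The fixed sleep from the quoted snippet: five times the session timeout.
    static long fixedSleepMillis(int sessionTimeoutMillis) {
        return sessionTimeoutMillis * 5L;
    }

    // Alternative (an assumption, not the committed fix): poll a condition
    // until it holds or maxWaitMillis elapses. Returns true if the condition
    // was observed, false on timeout or interruption.
    static boolean waitFor(BooleanSupplier done, long maxWaitMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("fixed sleep (ms): " + fixedSleepMillis(5 * 1000));
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 100 ms in this demo.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 100, 2000, 10);
        System.out.println("condition observed: " + ok);
    }
}
```

A polling wait pays only for the time the condition actually takes, whereas a fixed sleep always pays the full amount; whether polling is safe for these tests depends on having a reliable condition to check, which the messages above do not specify.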
[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one
[ https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155823#comment-13155823 ]

Hudson commented on HBASE-4851:

Integrated in HBase-0.92-security #8 (See [https://builds.apache.org/job/HBase-0.92-security/8/])
    HBASE-4851 hadoop maven dependency needs to be an optional one

stack :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml
[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
[ https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155824#comment-13155824 ]

Hudson commented on HBASE-4854:

Integrated in HBase-0.92-security #8 (See [https://builds.apache.org/job/HBase-0.92-security/8/])
    HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

stack :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/bin/hbase
[jira] [Commented] (HBASE-4851) hadoop maven dependency needs to be an optional one
[ https://issues.apache.org/jira/browse/HBASE-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155867#comment-13155867 ]

Hudson commented on HBASE-4851:

Integrated in HBase-TRUNK #2474 (See [https://builds.apache.org/job/HBase-TRUNK/2474/])
    HBASE-4851 hadoop maven dependency needs to be an optional one

stack :
Files :
* /hbase/trunk/pom.xml
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155869#comment-13155869 ] Hudson commented on HBASE-4797: --- Integrated in HBase-TRUNK #2474 (See [https://builds.apache.org/job/HBase-TRUNK/2474/]) HBASE-4797 [availability] Skip recovered.edits files with edits we know older than what region currently has (Jimmy Jiang) tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned so I had 7000 regions to replay. The distributed split code did a nice job and cluster came back but interesting is that some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though only about 30 odd edits per file it seems). The region is unavailable during this time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
[ https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155868#comment-13155868 ] Hudson commented on HBASE-4854: --- Integrated in HBase-TRUNK #2474 (See [https://builds.apache.org/job/HBase-TRUNK/2474/]) HBASE-4854 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour stack : Files : * /hbase/trunk/bin/hbase it seems that CLASSPATH elements coming from Hadoop change HBase behaviour -- Key: HBASE-4854 URL: https://issues.apache.org/jira/browse/HBASE-4854 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Fix For: 0.92.0 Attachments: HBASE-4854.patch.txt It looks like HBASE-3465 introduced a slight change in behavior. The ordering of classpath elements makes the Hadoop ones go before the HBase ones, which leads to log4j properties being picked up from the wrong place, etc. It seems that the easiest way to fix that would be to revert the ordering of the classpath.
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155919#comment-13155919 ] ramkrishna.s.vasudevan commented on HBASE-4855: --- Will dig in more tomorrow. SplitLogManager hangs on cluster restart. -- Key: HBASE-4855 URL: https://issues.apache.org/jira/browse/HBASE-4855 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Start a master and an RS. The RS goes down (kill -9). Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is available, they cannot be processed. Restart the master and bring up an RS. The master hangs in SplitLogManager.waitforTasks(). I suspect that batch.done is not getting incremented properly, but I have not dug in fully yet. This may be the reason for the occasional failure of TestDistributedLogSplitting.testWorkerAbort().
[jira] [Created] (HBASE-4855) SplitLogManager hangs on cluster restart.
SplitLogManager hangs on cluster restart. -- Key: HBASE-4855 URL: https://issues.apache.org/jira/browse/HBASE-4855 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Start a master and an RS. The RS goes down (kill -9). Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is available, they cannot be processed. Restart the master and bring up an RS. The master hangs in SplitLogManager.waitforTasks(). I suspect that batch.done is not getting incremented properly, but I have not dug in fully yet. This may be the reason for the occasional failure of TestDistributedLogSplitting.testWorkerAbort().
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155921#comment-13155921 ] ramkrishna.s.vasudevan commented on HBASE-4855: --- {code} java.lang.AssertionError at org.apache.hadoop.hbase.master.SplitLogManager.heartbeat(SplitLogManager.java:466) at org.apache.hadoop.hbase.master.SplitLogManager.getDataSetWatchSuccess(SplitLogManager.java:401) at org.apache.hadoop.hbase.master.SplitLogManager.access$14(SplitLogManager.java:388) at org.apache.hadoop.hbase.master.SplitLogManager$GetDataAsyncCallback.processResult(SplitLogManager.java:914) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:571) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) {code} Sometimes on restart I also get this log. SplitLogManager hangs on cluster restart. -- Key: HBASE-4855 URL: https://issues.apache.org/jira/browse/HBASE-4855 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Start a master and an RS. The RS goes down (kill -9). Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is available, they cannot be processed. Restart the master and bring up an RS. The master hangs in SplitLogManager.waitforTasks(). I suspect that batch.done is not getting incremented properly, but I have not dug in fully yet. This may be the reason for the occasional failure of TestDistributedLogSplitting.testWorkerAbort().
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155933#comment-13155933 ] stack commented on HBASE-4308: -- +1 on commit. I see now that the effect is the same. In ORH, we'd run the disabling code regardless of whether we deleted the znode and whether the region was in RIT. I see now that the disabling code will still work for all three possible conditions; it's just that one of the cases is now handled up in AM, and only two are handled in ORH. Good work, Ram. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Resolved] (HBASE-4519) 25s sleep when expiring sessions in tests
[ https://issues.apache.org/jira/browse/HBASE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4519. -- Resolution: Fixed Fix Version/s: (was: 0.92.0) 0.94.0 Fixed by hbase-4798 25s sleep when expiring sessions in tests - Key: HBASE-4519 URL: https://issues.apache.org/jira/browse/HBASE-4519 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: nkeywal Fix For: 0.94.0 There's a hardcoded 25-second sleep in HBaseTestingUtility.expireSession:
{code}
int sessionTimeout = 5 * 1000; // 5 seconds
...
final long sleep = sessionTimeout * 5L;
LOG.info("ZK Closed Session 0x" + Long.toHexString(sessionID) + "; sleeping=" + sleep);
Thread.sleep(sleep);
{code}
I'm pretty sure this can be lowered a lot, and it would speed up a couple of tests. The only thing I'm afraid of is that this was done to accommodate flaky tests.
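One common alternative to a fixed sleep in tests is to poll for the condition the test is actually waiting on, with an upper bound. The sketch below is only an illustration of that idea and is not what hbase-4798 did; `WaitUtil` and `waitFor` are hypothetical names that do not exist in HBaseTestingUtility.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (not part of HBase): bounded polling instead of a fixed sleep. */
final class WaitUtil {
    private WaitUtil() {}

    /**
     * Polls the condition every pollMs milliseconds until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interruption.
     */
    static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline, and sleep at least 1 ms.
                Thread.sleep(Math.max(1L, Math.min(pollMs, remainingMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

With a helper of this kind, a test returns as soon as the expected state is observed and only pays the full timeout in the failure case.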
[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4783: -- Status: Open (was: Patch Available) Improve RowCounter to count rows in a specific key range. - Key: HBASE-4783 URL: https://issues.apache.org/jira/browse/HBASE-4783 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: 4783.txt, HBASE-4783.patch Currently RowCounter in MR package is a very simple map only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range.
[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4783: -- Status: Patch Available (was: Open) Improve RowCounter to count rows in a specific key range. - Key: HBASE-4783 URL: https://issues.apache.org/jira/browse/HBASE-4783 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: 4783.txt, HBASE-4783.patch Currently RowCounter in MR package is a very simple map only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range.
[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4783: -- Attachment: 4783.txt Patch usable by HadoopQA Improve RowCounter to count rows in a specific key range. - Key: HBASE-4783 URL: https://issues.apache.org/jira/browse/HBASE-4783 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: 4783.txt, HBASE-4783.patch Currently RowCounter in MR package is a very simple map only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range.
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155961#comment-13155961 ] John Carrino commented on HBASE-4811: - Digging a little deeper, it appears that this was already planned when the V2 HFile format was written. The header of a block contains the offset of the previous block of the same type. I think this is currently used to support efficient lookups when seeking to a location, but it could also easily be used for reverse scans. Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.6 Reporter: John Carrino All the documentation I find about HBase says that if you want forward and reverse scans you should just build 2 tables, one ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support 2 tables. I am assuming this has been discussed before, but I can't find any discussion of it or of why it would be infeasible.
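To make the idea in the comment above concrete, here is a toy sketch of walking blocks backwards through a "previous block offset" stored in each block header. It is an assumption-laden model, not the HFile V2 format or any HBase API: `BackLinkedBlocks`, `Block`, `NO_PREVIOUS`, and the in-memory offset map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model (not HFile code): each block records the offset of the block before it. */
final class BackLinkedBlocks {
    /** Hypothetical sentinel meaning "this is the first block". */
    static final long NO_PREVIOUS = -1L;

    static final class Block {
        final long prevOffset;   // offset of the previous block, or NO_PREVIOUS
        final List<String> keys; // keys sorted ascending within the block

        Block(long prevOffset, List<String> keys) {
            this.prevOffset = prevOffset;
            this.keys = keys;
        }
    }

    /** Returns all keys in descending order, starting from the last block and following back-links. */
    static List<String> reverseScan(Map<Long, Block> blocksByOffset, long lastBlockOffset) {
        List<String> out = new ArrayList<>();
        long offset = lastBlockOffset;
        while (offset != NO_PREVIOUS) {
            Block block = blocksByOffset.get(offset);
            if (block == null) {
                throw new IllegalStateException("no block at offset " + offset);
            }
            // Emit this block's keys from last to first, then hop to the previous block.
            for (int i = block.keys.size() - 1; i >= 0; i--) {
                out.add(block.keys.get(i));
            }
            offset = block.prevOffset;
        }
        return out;
    }
}
```

The point of the sketch is only that a back-link per block is enough to iterate in reverse without a second, descending copy of the data.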
[jira] [Created] (HBASE-4856) Unit tests under security profile need more heap space
Unit tests under security profile need more heap space -- Key: HBASE-4856 URL: https://issues.apache.org/jira/browse/HBASE-4856 Project: HBase Issue Type: Task Reporter: Ted Yu In more than one 0.92-security builds (build #9, e.g.), we had the following: {code} Running org.apache.hadoop.hbase.master.TestDistributedLogSplitting Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390) at java.lang.StringBuffer.append(StringBuffer.java:224) at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201) at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139) at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:112) Running org.apache.hadoop.hbase.master.TestMasterFailover Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390) at java.lang.StringBuffer.append(StringBuffer.java:224) at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201) at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139) {code} We should increase maximum heap for tests under security profile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections
[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155973#comment-13155973 ] Terry Siu commented on HBASE-3792: -- Bryan, would you be able to post a patch of the changes you are using for 0.90.4? I applied the trunk patch to 0.90.4 and, aside from one minor flub, the patch applied very cleanly. I left my mapreduce jobs to run overnight and am seeing ZK connections accumulating again, but at a slower rate, so now I'm wondering what differences exist between the changes you made for 0.90.4 and the ones you posted. Thanks! TableInputFormat leaks ZK connections - Key: HBASE-3792 URL: https://issues.apache.org/jira/browse/HBASE-3792 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.1 Environment: Java 1.6.0_24, Mac OS X 10.6.7 Reporter: Bryan Keller Attachments: tableinput.patch The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader. The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate. I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. 
Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.
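The workaround described above (keep only the table name, open the table where it is needed, close it when done) can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached patch: `TableHandle` is a hypothetical stand-in for an HTable and its ZK connection, `OPEN_HANDLES` stands in for live connections, and the split strings are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: holds a table name and opens a short-lived handle per operation. */
final class ScopedTableAccess {
    /** Counts handles currently open; a stand-in for live ZK connections. */
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    /** Hypothetical stand-in for an HTable plus the connection it holds. */
    static final class TableHandle implements AutoCloseable {
        private boolean closed;

        TableHandle(String tableName) {
            OPEN_HANDLES.incrementAndGet(); // "connection" opened
        }

        /** Made-up region boundaries, just to have something to return. */
        List<String> regionStartKeys() {
            return Arrays.asList("", "m");
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN_HANDLES.decrementAndGet(); // "connection" released
            }
        }
    }

    private final String tableName; // only the name is kept as a field, never the handle

    ScopedTableAccess(String tableName) {
        this.tableName = tableName;
    }

    /** Opens the table, computes the splits, and always closes the table before returning. */
    List<String> getSplits() {
        try (TableHandle table = new TableHandle(tableName)) {
            List<String> splits = new ArrayList<>();
            for (String startKey : table.regionStartKeys()) {
                splits.add(tableName + ":" + startKey);
            }
            return splits;
        }
    }
}
```

Because the handle never outlives the method that opened it, a long-running job client can call `getSplits()` repeatedly without the open count growing.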
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Status: Patch Available (was: Open) It seems to be OK. I will change the category of the failing test from small to medium, and then we will be able to push it. Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 4847_pom.v2.patch This alone will not revolutionize performance: we will gain between 1 and 4 minutes. But there are other gains as well: - it's a step toward parallelizing the tests - new tests are less expensive, as they do not create a new JVM: it's a continuous win - it will allow pushing it to the dev env while having the same env on dev and on build, and 3 minutes are 10% of the small + medium tests execution time. I will submit the patch a few times to see if it works well before asking for the real commit.
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Attachment: 4847_pom.v2.patch Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 4847_pom.v2.patch This alone will not revolutionize performance: we will gain between 1 and 4 minutes. But there are other gains as well: - it's a step toward parallelizing the tests - new tests are less expensive, as they do not create a new JVM: it's a continuous win - it will allow pushing it to the dev env while having the same env on dev and on build, and 3 minutes are 10% of the small + medium tests execution time. I will submit the patch a few times to see if it works well before asking for the real commit.
[jira] [Updated] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4783: --- Resolution: Fixed Status: Resolved (was: Patch Available) Improve RowCounter to count rows in a specific key range. - Key: HBASE-4783 URL: https://issues.apache.org/jira/browse/HBASE-4783 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: 4783.txt, HBASE-4783.patch Currently RowCounter in MR package is a very simple map only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range.
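As a small illustration of what "count rows in a key range" means, the sketch below counts keys in a sorted in-memory map, treating the start row as inclusive and the stop row as exclusive (the convention HBase scans use). It is not the RowCounter MapReduce job and does not reflect the attached patch; `RangeRowCount` and `countRows` are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Sketch only: range counting over a sorted in-memory map instead of a table scan. */
final class RangeRowCount {
    private RangeRowCount() {}

    /** Counts row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    static long countRows(NavigableMap<String, ?> rows, String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false).size();
    }

    /** Builds a tiny sample "table" keyed by row; the values are irrelevant here. */
    static NavigableMap<String, Integer> sampleTable() {
        NavigableMap<String, Integer> rows = new TreeMap<>();
        for (String key : new String[] {"row1", "row2", "row3", "row4", "row5"}) {
            rows.put(key, 0);
        }
        return rows;
    }
}
```

For the sample table, `countRows(sampleTable(), "row2", "row4")` covers `row2` and `row3` only, since the stop row is excluded.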
[jira] [Created] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager - Key: HBASE-4857 URL: https://issues.apache.org/jira/browse/HBASE-4857 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.0, 0.94.0 Reporter: Gary Helmling Fix For: 0.92.0 Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered: {noformat} Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:302) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328) at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154) at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397) at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435) at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at 
org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) {noformat} The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}}, and so on... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections
[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155997#comment-13155997 ] Bryan Keller commented on HBASE-3792: - Sure, I'll post a patch for 0.90.4 in a bit. There have been quite a few changes to ZK connection handling in trunk (deep compare of configs, reference counting), so it is possible that the patch needs to be tweaked or that the leak is somewhere else. TableInputFormat leaks ZK connections - Key: HBASE-3792 URL: https://issues.apache.org/jira/browse/HBASE-3792 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.1 Environment: Java 1.6.0_24, Mac OS X 10.6.7 Reporter: Bryan Keller Attachments: tableinput.patch The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader. The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate. I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. 
For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.
[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HBASE-4857: - Attachment: HBASE-4857.patch The simple fix is to recognize when we are already stopping. Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager - Key: HBASE-4857 URL: https://issues.apache.org/jira/browse/HBASE-4857 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.0, 0.94.0 Reporter: Gary Helmling Fix For: 0.92.0 Attachments: HBASE-4857.patch Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered: {noformat} Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:302) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328) at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154) at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397) at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435) at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166) at 
org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) {noformat} The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}}, and so on... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
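The "recognize when we are already stopping" idea can be sketched with a one-shot flag. This is a simplified model and not the contents of HBASE-4857.patch: the `Elector` class below is hypothetical, and its `stepDownAsLeader()` simply calls `stop()` again to imitate the failure path seen in the stack trace.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: a re-entrancy guard that breaks a stop() <-> stepDownAsLeader() cycle. */
final class StopGuardDemo {
    private StopGuardDemo() {}

    static final class Elector {
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        int stopCalls;      // how many times stop() was entered
        int stepDownCalls;  // how many times stepDownAsLeader() actually ran

        void stop() {
            stopCalls++;
            // Only the first caller proceeds; any nested or repeated call returns immediately.
            if (!stopping.compareAndSet(false, true)) {
                return;
            }
            stepDownAsLeader();
        }

        void stepDownAsLeader() {
            stepDownCalls++;
            // Simulated failure path: the error handling ends up calling stop() again,
            // which without the guard would recurse until the stack overflows.
            stop();
        }
    }
}
```

In this model the nested `stop()` call sees the flag already set and returns, so the cycle ends after a single step-down attempt.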
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156007#comment-13156007 ]

Ted Yu commented on HBASE-4857:
-------------------------------
Good catch, Gary. +1 on patch.

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
--------------------------------------------------------------------------------------
Key: HBASE-4857
URL: https://issues.apache.org/jira/browse/HBASE-4857
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4857.patch
[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156011#comment-13156011 ]

Nicolas Spiegelberg commented on HBASE-4785:
--------------------------------------------
@stack : You're correct about the missing entrySet(). There was a previous commit in 89-fb (r1181942) that I could not find a use for. I guess it's this feature.

Improve recovery time of HBase client when a region server dies.
----------------------------------------------------------------
Key: HBASE-4785
URL: https://issues.apache.org/jira/browse/HBASE-4785
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4785.patch

When a region server dies, the HBase client waits until the RPC times out before learning that it needs to check META to find the new location of the region. And it incurs this *timeout* cost for every region being served by the dead region server. Remove this overhead by clearing the entries in cache that have the dead region server as their values.
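The idea in the HBASE-4785 description, dropping every cached location that points at the dead server in one pass, can be sketched as follows. This is a simplified model under stated assumptions: `LocationCacheSketch` and its methods are hypothetical, server names are plain strings, and the real client cache (which uses soft values) is not reproduced here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified region-location cache: region start key -> server name.
class LocationCacheSketch {
    private final Map<String, String> regionToServer = new TreeMap<>();

    void put(String regionStartKey, String server) {
        regionToServer.put(regionStartKey, server);
    }

    // Drops every cached location whose value is the dead server and
    // returns how many entries were removed.
    int clearServer(String deadServer) {
        int before = regionToServer.size();
        regionToServer.values().removeIf(deadServer::equals);
        return before - regionToServer.size();
    }

    // A null result means the caller has to look the region up in META again.
    String locate(String regionStartKey) {
        return regionToServer.get(regionStartKey);
    }
}
```

After `clearServer`, the first request for any region of the dead server goes straight to a META lookup instead of waiting for one RPC timeout per region.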
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156017#comment-13156017 ]

ramkrishna.s.vasudevan commented on HBASE-4857:
-----------------------------------------------
+1

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
--------------------------------------------------------------------------------------
Key: HBASE-4857
URL: https://issues.apache.org/jira/browse/HBASE-4857
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4857.patch
[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156018#comment-13156018 ]

Hadoop QA commented on HBASE-4783:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504889/4783.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestInstantSchemaChange
        org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/347//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/347//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/347//console

This message is automatically generated.

Improve RowCounter to count rows in a specific key range.
---------------------------------------------------------
Key: HBASE-4783
URL: https://issues.apache.org/jira/browse/HBASE-4783
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
Fix For: 0.94.0
Attachments: 4783.txt, HBASE-4783.patch

Currently RowCounter in the MR package is a very simple map-only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range.
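What "count the rows in a key range" means can be shown on an in-memory sorted map. This is a sketch only and does not use the MapReduce RowCounter or the HBase client API: `RangeRowCountSketch` is a hypothetical name, and the start-inclusive, stop-exclusive bounds are an assumption chosen for the example.

```java
import java.util.NavigableMap;

// Hypothetical model: a sorted map from row key to row data stands in for a table.
class RangeRowCountSketch {
    // Counts rows with startRow <= key < stopRow without touching rows outside the range.
    static int countRows(NavigableMap<String, String> table, String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false).size();
    }
}
```

The point of the range is the same as in the issue: only the keys between the two bounds are visited, rather than the whole table.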
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156030#comment-13156030 ]

Ted Yu commented on HBASE-4857:
-------------------------------
Since zookeeper 3.4 is released, should we change the following in pom.xml as well?
{code}
<zookeeper.version>3.4.0-SNAPSHOT</zookeeper.version>
{code}

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
--------------------------------------------------------------------------------------
Key: HBASE-4857
URL: https://issues.apache.org/jira/browse/HBASE-4857
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4857.patch
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156043#comment-13156043 ]

Todd Lipcon commented on HBASE-4820:
------------------------------------
Can we meet in the middle with this patch? A few suggestions that would make the patch more trivial to review:
- don't do the whitespace-only fixes in parts of the code you're not touching
- don't expand out the import foo.*s
- don't move the callback code to different parts of the file
- *do* fix variable names to conform to style, remove dead code, add javadoc, rename classes, etc.

This should make the patch very easy to look over and make sure it doesn't break anything. It'll then be easy for FB to pull it into their branch if they want.

Distributed log splitting coding enhancement to make it easier to understand, no semantics change
-------------------------------------------------------------------------------------------------
Key: HBASE-4820
URL: https://issues.apache.org/jira/browse/HBASE-4820
Project: HBase
Issue Type: Improvement
Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Labels: newbie
Fix For: 0.94.0
Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch

In reviewing the distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-4785:
---------------------------------------
Attachment: HBASE-4785.patch

Fixes SoftValueSortedMap. Internal comments: Currently SoftValueSortedMap.entrySet() tries to iterate through the entry set of the underlying map, and put all the values (SoftValue<K,V>) into a newly created TreeSet<Entry<K,V>>. The entry set of a SortedMap is already sorted, so it's not necessary to have a TreeSet sort those entries again upon adding. This gets rid of the runtime class cast exception because it does not require comparing anymore.

Improve recovery time of HBase client when a region server dies.
----------------------------------------------------------------
Key: HBASE-4785
URL: https://issues.apache.org/jira/browse/HBASE-4785
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4785.patch, HBASE-4785.patch
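The point of the internal comment above is that map entries are not `Comparable`, so adding them to a `TreeSet` fails with a `ClassCastException`, while an insertion-ordered set simply keeps the order the sorted map already provides. The following is a sketch under that reading, not the code in HBASE-4785.patch: `SoftValueEntrySetSketch` is a hypothetical name, and plain `SoftReference` values stand in for the map's internal soft-value wrapper.

```java
import java.lang.ref.SoftReference;
import java.util.AbstractMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

// Hypothetical illustration; the real SoftValueSortedMap is not reproduced here.
class SoftValueEntrySetSketch {
    // Copies the live entries in the order the sorted map already iterates in.
    // A LinkedHashSet preserves that order and never compares the entries.
    static <K, V> Set<Map.Entry<K, V>> entrySet(SortedMap<K, SoftReference<V>> internal) {
        Set<Map.Entry<K, V>> result = new LinkedHashSet<>();
        for (Map.Entry<K, SoftReference<V>> e : internal.entrySet()) {
            V value = e.getValue().get();
            if (value != null) { // skip values the garbage collector has already cleared
                result.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), value));
            }
        }
        return result;
    }
}
```

Iterating the returned set yields the keys in ascending order because the source map is sorted, which is the property the TreeSet was redundantly trying to enforce.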
[jira] [Updated] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints
[ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-4823:
-------------------------------
Attachment: HBASE-4823.D519.1.patch

aaiyer requested code review of "HBASE-4823 [jira] long running scans lose benefit of bloomfilters and timerange hints".
Reviewers: JIRA

Changes to the StoreScanner so that whenever we do a resetScannerStack we use the same getScanner() method as done in the constructor to ignore files that are not going to be touched by the scan. Includes a test to ensure correctness.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D519

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerResets.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java

long running scans lose benefit of bloomfilters and timerange hints
--------------------------------------------------------------------
Key: HBASE-4823
URL: https://issues.apache.org/jira/browse/HBASE-4823
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Amitanand Aiyer
Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt

When you have a long running scan due to say an MR job, you can lose the benefit of timerange hints and bloom filters midway if your scanner gets reset. [Note: The scanners can get reset say due to a flush or compaction]. In one of our workloads, we periodically want to do rollups on recent 15 minutes of data in a column family... but the timerange hint benefit is lost midway when this resetScannerStack (shown below) happens. And end result-- we end up reading all the old HFiles rather than just the recent HFiles.
{code}
private void resetScannerStack(KeyValue lastTopKey) throws IOException {
  if (heap != null) {
    throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
  }

  /* When we have the scan object, should we not pass it to getScanners()
   * to get a limited set of scanners? We did so in the constructor and we
   * could have done it now by storing the scan object from the constructor */
  List<KeyValueScanner> scanners = getScanners();
{code}
The comment in the code seems to be aware of this issue and even has the suggested fix!
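The change described above, using the same file selection on reset as in the constructor, can be modelled in a few lines. This is a sketch under stated assumptions and not the StoreScanner patch: `ScannerResetSketch`, `FileInfo`, and the timestamp fields are hypothetical, and only the time-range hint is modelled (bloom filters are left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a store scanner; names are illustrative, not HBase's API.
class ScannerResetSketch {
    // A store file together with the min/max timestamps of the cells it holds.
    static final class FileInfo {
        final String name;
        final long minTs;
        final long maxTs;

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    private final long scanMinTs; // inclusive lower bound of the scan's time range
    private final long scanMaxTs; // exclusive upper bound of the scan's time range
    private List<FileInfo> selected;

    ScannerResetSketch(List<FileInfo> files, long scanMinTs, long scanMaxTs) {
        this.scanMinTs = scanMinTs; // remember the range so a reset can reuse it
        this.scanMaxTs = scanMaxTs;
        this.selected = select(files);
    }

    // Called when a flush or compaction changes the set of files.
    // It applies the same filter as the constructor instead of taking every file.
    void reset(List<FileInfo> files) {
        this.selected = select(files);
    }

    // Keeps only files whose timestamp span overlaps the scan's time range.
    private List<FileInfo> select(List<FileInfo> files) {
        List<FileInfo> out = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.maxTs >= scanMinTs && f.minTs < scanMaxTs) {
                out.add(f);
            }
        }
        return out;
    }

    List<String> selectedNames() {
        List<String> names = new ArrayList<>();
        for (FileInfo f : selected) {
            names.add(f.name);
        }
        return names;
    }
}
```

Because the time range is stored as a field, a reset after a flush still skips old files that cannot contain matching cells.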
[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4856:
--------------------------
Description: Zookeeper 3.4.0 has been released. We should upgrade.

(was: In more than one 0.92-security build (build #9, e.g.), we had the following:
{code}
Running org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
    at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
    at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:112)
Running org.apache.hadoop.hbase.master.TestMasterFailover
Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at org.apache.maven.surefire.report.TestSetRunListener.getAsString(TestSetRunListener.java:201)
    at org.apache.maven.surefire.report.TestSetRunListener.testError(TestSetRunListener.java:139)
{code}
We should increase maximum heap for tests under security profile)

Summary: Upgrade zookeeper to 3.4.0 release (was: Unit tests under security profile need more heap space)

Upgrade zookeeper to 3.4.0 release
----------------------------------
Key: HBASE-4856
URL: https://issues.apache.org/jira/browse/HBASE-4856
Project: HBase
Issue Type: Task
Reporter: Ted Yu

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156201#comment-13156201 ]

Jean-Daniel Cryans commented on HBASE-4739:
-------------------------------------------
bq. Do we need make a patch for 0.90.5 ?

Like you said earlier:

bq. In 0.90 version, I think there is no this scenario, The closing zk node is only created by RS.

So we should be fine without it.

Master dying while going to close a region can leave it in transition forever
-----------------------------------------------------------------------------
Key: HBASE-4739
URL: https://issues.apache.org/jira/browse/HBASE-4739
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
Fix For: 0.92.0, 0.94.0, 0.90.5
Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch

I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet. When the master restarted it saw the znode and started printing this:
{quote}
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
{quote}
It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Assigned] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4856:
-----------------------------
Assignee: Ted Yu

Upgrade zookeeper to 3.4.0 release
----------------------------------
Key: HBASE-4856
URL: https://issues.apache.org/jira/browse/HBASE-4856
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4856:
--------------------------
Attachment: 4856.txt

Upgrade zookeeper to 3.4.0 release
----------------------------------
Key: HBASE-4856
URL: https://issues.apache.org/jira/browse/HBASE-4856
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.92.0
Attachments: 4856.txt

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Updated] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4856:
--------------------------
Fix Version/s: 0.92.0

Upgrade zookeeper to 3.4.0 release
----------------------------------
Key: HBASE-4856
URL: https://issues.apache.org/jira/browse/HBASE-4856
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.92.0
Attachments: 4856.txt

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Commented] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints
[ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156204#comment-13156204 ]

Phabricator commented on HBASE-4823:
------------------------------------
Kannan has accepted the revision "HBASE-4823 [jira] long running scans lose benefit of bloomfilters and timerange hints".

Super! +1 for commit.

REVISION DETAIL
  https://reviews.facebook.net/D519

long running scans lose benefit of bloomfilters and timerange hints
--------------------------------------------------------------------
Key: HBASE-4823
URL: https://issues.apache.org/jira/browse/HBASE-4823
Project: HBase
Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Amitanand Aiyer
Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt
[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4785:
--------------------------
Status: Patch Available (was: Open)

Improve recovery time of HBase client when a region server dies.
----------------------------------------------------------------
Key: HBASE-4785
URL: https://issues.apache.org/jira/browse/HBASE-4785
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4785.patch, HBASE-4785.patch
[jira] [Updated] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4785:
  Status: Open (was: Patch Available)

Improve recovery time of HBase client when a region server dies.
  Key: HBASE-4785
  URL: https://issues.apache.org/jira/browse/HBASE-4785
  Project: HBase
  Issue Type: Improvement
  Reporter: Nicolas Spiegelberg
  Assignee: Nicolas Spiegelberg
  Priority: Minor
  Fix For: 0.94.0
  Attachments: HBASE-4785.patch, HBASE-4785.patch

When a region server dies, the HBase client waits until the RPC times out before learning that it needs to check META to find the new location of the region. And it incurs this *timeout* cost for every region being served by the dead region server. Remove this overhead by clearing the entries in cache that have the dead region server as their values.
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156208#comment-13156208 ]

Todd Lipcon commented on HBASE-4856:

If we're separating a security build and non-security build, I'd recommend keeping the non-secure one at the 3.3 series. 3.4 has a lot of new features and my hunch is that there are going to be some bugs that shake out over the next few months.

Upgrade zookeeper to 3.4.0 release
  Key: HBASE-4856
  URL: https://issues.apache.org/jira/browse/HBASE-4856
  Project: HBase
  Issue Type: Task
  Reporter: Ted Yu
  Assignee: Ted Yu
  Fix For: 0.92.0
  Attachments: 4856.txt

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156209#comment-13156209 ]

Hadoop QA commented on HBASE-4847:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504894/4847_pom.v2.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.util.TestFSTableDescriptors

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/348//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/348//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/348//console

This message is automatically generated.

Activate single jvm for small tests on jenkins
  Key: HBASE-4847
  URL: https://issues.apache.org/jira/browse/HBASE-4847
  Project: HBase
  Issue Type: Improvement
  Components: build, test
  Affects Versions: 0.94.0
  Environment: build
  Reporter: nkeywal
  Assignee: nkeywal
  Priority: Minor
  Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 4847_pom.v2.patch

This will not revolutionize performance on its own: we will gain between 1 and 4 minutes. But we also gain the following:
- it's a step toward parallelizing the tests
- new tests are less expensive as they do not create a new jvm: it's a continuous win
- it will allow us to push it to the dev env while having the same env on dev and on build, and 3 minutes is 10% of the small + medium tests execution time.
I will submit the patch a few times to see if it works well before asking for the real commit.
[jira] [Assigned] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4857:
  Assignee: Gary Helmling

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
  Key: HBASE-4857
  URL: https://issues.apache.org/jira/browse/HBASE-4857
  Project: HBase
  Issue Type: Bug
  Components: security
  Affects Versions: 0.92.0, 0.94.0
  Reporter: Gary Helmling
  Assignee: Gary Helmling
  Fix For: 0.92.0
  Attachments: HBASE-4857.patch

Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered:
{noformat}
Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:302)
    at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
    at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
    at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
    at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
    at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{noformat}
The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}}, and so on...
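One common way to break a stop()/stepDown() cycle like the one in the trace is a re-entrancy guard, sketched below. This is an assumption-laden illustration and not the attached HBASE-4857.patch: `StopGuardSketch` is a hypothetical class, and its `stepDownAsLeader()` simply simulates a step-down that always fails and reports the failure by calling `stop()` again.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of guarding stop() against re-entry from its own failure path. */
public class StopGuardSketch {
  private final AtomicBoolean stopping = new AtomicBoolean(false);
  int stepDownAttempts = 0;

  /** Simulated step-down that always fails; the failure path calls stop() again. */
  void stepDownAsLeader() {
    stepDownAttempts++;
    stop();
  }

  void stop() {
    // Only the first caller gets past this point; nested calls return at once,
    // so a failing step-down cannot recurse back into another step-down.
    if (!stopping.compareAndSet(false, true)) {
      return;
    }
    stepDownAsLeader();
  }
}
```

Calling `stop()` on a fresh instance performs exactly one step-down attempt; without the guard the two methods would call each other until the stack overflows.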
[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156207#comment-13156207 ]

Ted Yu commented on HBASE-4785:

+1 on patch v2.

Improve recovery time of HBase client when a region server dies.
  Key: HBASE-4785
  URL: https://issues.apache.org/jira/browse/HBASE-4785
  Project: HBase
  Issue Type: Improvement
  Reporter: Nicolas Spiegelberg
  Assignee: Nicolas Spiegelberg
  Priority: Minor
  Fix For: 0.94.0
  Attachments: HBASE-4785.patch, HBASE-4785.patch

When a region server dies, the HBase client waits until the RPC times out before learning that it needs to check META to find the new location of the region. And it incurs this *timeout* cost for every region being served by the dead region server. Remove this overhead by clearing the entries in cache that have the dead region server as their values.
[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4857:
  Status: Patch Available (was: Open)

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
  Key: HBASE-4857
  URL: https://issues.apache.org/jira/browse/HBASE-4857
  Project: HBase
  Issue Type: Bug
  Components: security
  Affects Versions: 0.92.0, 0.94.0
  Reporter: Gary Helmling
  Fix For: 0.92.0
  Attachments: HBASE-4857.patch

Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered:
{noformat}
Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:302)
    at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
    at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
    at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
    at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
    at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{noformat}
The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}}, and so on...
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156220#comment-13156220 ]

Ted Yu commented on HBASE-4856:

My reasoning was that the 3.4.0 zookeeper release would be more stable than the 3.4.0-SNAPSHOT build, which would change after we release 0.92.
When I switched zookeeper to 3.3.3 for the non-secure build, I got:
{code}
[ERROR] /Users/zhihyu/92hbase/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java:[145,40] cannot find symbol
[ERROR] symbol  : class NIOServerCnxnFactory
[ERROR] location: class org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster
{code}

Upgrade zookeeper to 3.4.0 release
  Key: HBASE-4856
  URL: https://issues.apache.org/jira/browse/HBASE-4856
  Project: HBase
  Issue Type: Task
  Reporter: Ted Yu
  Assignee: Ted Yu
  Fix For: 0.92.0
  Attachments: 4856.txt

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4820: --- Status: Patch Available (was: Open) Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4820: --- Status: Open (was: Patch Available) Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4820: --- Attachment: 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156235#comment-13156235 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/ --- (Updated 2011-11-23 19:58:09.915833) Review request for hbase, Todd Lipcon and Jonathan Robie. Changes --- Per Todd's suggestion, the patch is enhanced for easy back porting. Summary --- Distributed log splitting coding enhancement to make it easier to understand, no semantics change. It is some issue raised during the code review in back porting this feature to CDH. This addresses bug HBASE-4820. https://issues.apache.org/jira/browse/HBASE-4820 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 2101054 src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java d7a648d src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec Diff: https://reviews.apache.org/r/2895/diff Testing --- Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change. 
Thanks, Jimmy Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156254#comment-13156254 ]

stack commented on HBASE-4856:

We can't do 3.3.3 zk and have a secure zk. See conversation over in tail of HBASE-2418.

Upgrade zookeeper to 3.4.0 release
  Key: HBASE-4856
  URL: https://issues.apache.org/jira/browse/HBASE-4856
  Project: HBase
  Issue Type: Task
  Reporter: Ted Yu
  Assignee: Ted Yu
  Fix For: 0.92.0
  Attachments: 4856.txt

Zookeeper 3.4.0 has been released. We should upgrade.
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156258#comment-13156258 ] Hudson commented on HBASE-4308: --- Integrated in HBase-TRUNK #2475 (See [https://builds.apache.org/job/HBase-TRUNK/2475/]) HBASE-4308 Race between RegionOpenedHandler and AssignmentManager(Ram) ramkrishna : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156259#comment-13156259 ] Hudson commented on HBASE-4783: --- Integrated in HBase-TRUNK #2475 (See [https://builds.apache.org/job/HBase-TRUNK/2475/]) HBASE-4783 Improve RowCounter to count rows in a specific key range. nspiegelberg : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java Improve RowCounter to count rows in a specific key range. - Key: HBASE-4783 URL: https://issues.apache.org/jira/browse/HBASE-4783 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: 4783.txt, HBASE-4783.patch Currently RowCounter in MR package is a very simple map only job that does a full scan of a table. Enhance the utility to let the user specify a key range and count the number of rows in this range. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
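Counting rows in a key range, as this issue proposes for RowCounter, can be illustrated on an in-memory sorted map. This sketch is not the RowCounter implementation: `RowRangeCountSketch` and `countRows` are hypothetical, and the range is assumed to follow the usual scan convention of an inclusive start row and an exclusive stop row.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: count rows whose key falls in [startRow, stopRow). */
public class RowRangeCountSketch {

  static int countRows(NavigableMap<String, String> table, String startRow, String stopRow) {
    // subMap gives a view of the sorted keys between the two bounds:
    // start inclusive, stop exclusive.
    return table.subMap(startRow, true, stopRow, false).size();
  }

  public static void main(String[] args) {
    NavigableMap<String, String> table = new TreeMap<>();
    for (String row : new String[] {"a", "b", "c", "d", "e"}) {
      table.put(row, "v");
    }
    System.out.println(countRows(table, "b", "d")); // prints 2 (rows "b" and "c")
  }
}
```

In the real utility the same bounds would restrict the scan itself, so rows outside the range are never read rather than read and discarded.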
[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156269#comment-13156269 ]

Hadoop QA commented on HBASE-4785:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504907/HBASE-4785.patch
  against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/349//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/349//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/349//console

This message is automatically generated.

Improve recovery time of HBase client when a region server dies.
  Key: HBASE-4785
  URL: https://issues.apache.org/jira/browse/HBASE-4785
  Project: HBase
  Issue Type: Improvement
  Reporter: Nicolas Spiegelberg
  Assignee: Nicolas Spiegelberg
  Priority: Minor
  Fix For: 0.94.0
  Attachments: HBASE-4785.patch, HBASE-4785.patch

When a region server dies, the HBase client waits until the RPC times out before learning that it needs to check META to find the new location of the region. And it incurs this *timeout* cost for every region being served by the dead region server. Remove this overhead by clearing the entries in cache that have the dead region server as their values.
[jira] [Created] (HBASE-4858) hbase-site.xml example in quickstart doesn't work in Linux
hbase-site.xml example in quickstart doesn't work in Linux
  Key: HBASE-4858
  URL: https://issues.apache.org/jira/browse/HBASE-4858
  Project: HBase
  Issue Type: Bug
  Components: documentation
  Environment: java version 1.6.0_23
    OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-1)
    OpenJDK Client VM (build 20.0-b11, mixed mode, sharing)
  Reporter: Bryce Allen
  Priority: Minor

Under Linux with OpenJDK 1.6, using a file:///XX URL in the config file creates a directory called 'file:' in the hbase root directory. If I use a standard Unix absolute path, it works as expected. This may work on other platforms, but it would be good to add a note in the example:
{code}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Depending on your platform, this may create a 'file:' directory in
         hbase home instead of the desired behavior. Try using a standard
         platform specific absolute path instead. -->
    <value>file:///DIRECTORY/hbase</value>
  </property>
</configuration>
{code}
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156280#comment-13156280 ]

Ted Yu commented on HBASE-4605:

@Jesse:
Patch v6 doesn't apply cleanly:
{code}
Hunk #13 FAILED at 1135.
1 out of 13 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java.rej
{code}
Do you mind uploading a patch (--no-prefix) which applies to TRUNK so that HadoopQA can run through it?
Thanks

Constraints
  Key: HBASE-4605
  URL: https://issues.apache.org/jira/browse/HBASE-4605
  Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
  Affects Versions: 0.94.0
  Reporter: Jesse Yates
  Assignee: Jesse Yates
  Attachments: constraint_as_cp.txt, java_Constraint_v2.patch

From Jesse's comment on dev:
{quote}
What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help ease of adoption by helping HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption.
Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not, people can throw an exception that gets propagated back to the client explaining why the put was invalid.
Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table.
Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics.
{quote}
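The proposal quoted above describes a minimal check-before-put interface. A rough sketch of that shape follows; every name here (`Constraint`, `ConstraintException`, `MaxValueLength`, `put`) is hypothetical and chosen for illustration, and none of it is taken from the attached patches or from any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-table constraint checked before each put. */
public class ConstraintSketch {

  /** Raised when a put violates a constraint; the message explains why. */
  static class ConstraintException extends Exception {
    ConstraintException(String message) {
      super(message);
    }
  }

  /** The minimal interface a user would implement. */
  interface Constraint {
    void check(byte[] row, byte[] value) throws ConstraintException;
  }

  /** Example constraint: reject values longer than a fixed number of bytes. */
  static final class MaxValueLength implements Constraint {
    private final int max;

    MaxValueLength(int max) {
      this.max = max;
    }

    @Override
    public void check(byte[] row, byte[] value) throws ConstraintException {
      if (value.length > max) {
        throw new ConstraintException("value longer than " + max + " bytes");
      }
    }
  }

  /** Runs every constraint; the write happens only if all of them pass. */
  static void put(List<Constraint> constraints, Map<String, byte[]> table,
                  String row, byte[] value) throws ConstraintException {
    for (Constraint c : constraints) {
      c.check(row.getBytes(StandardCharsets.UTF_8), value);
    }
    table.put(row, value);
  }
}
```

A valid put lands in the map; an invalid one leaves the table untouched and surfaces the exception to the caller, which mirrors the "propagated back to the client" behavior described in the quote.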
[jira] [Resolved] (HBASE-4722) TestGlobalMemStoreSize has started failing
[ https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4722. -- Resolution: Won't Fix This committed fix has done damage. See HBASE-4853. Closing as won't fix. TestGlobalMemStoreSize has started failing -- Key: HBASE-4722 URL: https://issues.apache.org/jira/browse/HBASE-4722 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Attachments: 4722.txt, logging-v2.txt, logging.txt I'm digging in. It fails occasionally for me locally too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156283#comment-13156283 ] stack commented on HBASE-4853: -- Looks like this commit by me broke our memstore sizing: HBASE-4722. It takes memstore flush size outside of an update lock (more edits may have come in in meantime). HBASE-4789 does overzealous pruning of seqids - Key: HBASE-4853 URL: https://issues.apache.org/jira/browse/HBASE-4853 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853.txt Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs' At a minimum, these lines need removing: {code} diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java index 623edbe..a0bbe01 100644 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java @@ -1359,11 +1359,6 @@ public class HLog implements Syncable { // Cleaning up of lastSeqWritten is in the finally clause because we // don't want to confuse getOldestOutstandingSeqNum() this.lastSeqWritten.remove(getSnapshotName(encodedRegionName)); - Long l = this.lastSeqWritten.remove(encodedRegionName); - if (l != null) { -LOG.warn(Why is there a raw encodedRegionName in lastSeqWritten? name= + - Bytes.toString(encodedRegionName) + , seqid= + l); - } this.cacheFlushLock.unlock(); } } {code} ... but above is no good w/o figuring why WALs are not being rotated off. -- This message is automatically generated by JIRA. 
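The memstore sizing problem mentioned in the comment above (the flush size being taken outside the update lock while more edits arrive) can be illustrated with a small sketch. This is not the HRegion code; the class `MemstoreAccounting` and its methods are hypothetical names chosen for the illustration. It only shows why the counter should be decremented by the amount captured when the flush was set up, rather than by whatever the size happens to be later.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of memstore size accounting; not the HRegion code. */
class MemstoreAccounting {
  private final AtomicLong memstoreSize = new AtomicLong();
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();

  /** Writers take the shared lock, so they are blocked only while a flush is being set up. */
  void addEdit(long bytes) {
    updatesLock.readLock().lock();
    try {
      memstoreSize.addAndGet(bytes);
    } finally {
      updatesLock.readLock().unlock();
    }
  }

  /** Capture the size to flush while holding the exclusive lock, so no edit slips in. */
  long prepareFlush() {
    updatesLock.writeLock().lock();
    try {
      return memstoreSize.get();
    } finally {
      updatesLock.writeLock().unlock();
    }
  }

  /** Decrement by what was flushed, not by the current size. */
  void completeFlush(long flushedBytes) {
    memstoreSize.addAndGet(-flushedBytes);
  }

  long size() {
    return memstoreSize.get();
  }
}
```

With this shape, an edit that arrives between `prepareFlush()` and `completeFlush()` stays accounted for after the flush completes.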
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156287#comment-13156287 ] Jesse Yates commented on HBASE-4605: Yeah, sure. I actually just ran into the same issue trying to work on the shell stuff. Pushing up new version shortly. Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: constraint_as_cp.txt, java_Constraint_v2.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not, people can throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
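To make the proposal quoted above more concrete, here is a minimal sketch of what a constraint check before a put could look like. It is not the interface from the attached patches: `PutConstraint`, `ConstraintViolation`, `IntegerOnlyConstraint` and `ConstrainedTable` are hypothetical names, and the single-method signature is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception type; the patch's ConstraintException may look different. */
class ConstraintViolation extends Exception {
  ConstraintViolation(String message) {
    super(message);
  }
}

/** Hypothetical minimal interface: inspect the bytes of a put before it is written. */
interface PutConstraint {
  void check(byte[] data) throws ConstraintViolation;
}

/** Example constraint: the bytes must parse as a decimal integer. */
class IntegerOnlyConstraint implements PutConstraint {
  @Override
  public void check(byte[] data) throws ConstraintViolation {
    String text = new String(data, StandardCharsets.UTF_8);
    try {
      Integer.parseInt(text);
    } catch (NumberFormatException e) {
      throw new ConstraintViolation("not an integer: " + text);
    }
  }
}

/** Stand-in for a table: runs every constraint, then stores the data. */
class ConstrainedTable {
  private final List<PutConstraint> constraints = new ArrayList<>();
  private final List<byte[]> rows = new ArrayList<>();

  void addConstraint(PutConstraint constraint) {
    constraints.add(constraint);
  }

  void put(byte[] data) throws ConstraintViolation {
    for (PutConstraint constraint : constraints) {
      constraint.check(data); // a failure propagates to the caller and nothing is stored
    }
    rows.add(data);
  }

  int size() {
    return rows.size();
  }
}
```

A caller would register constraints per table and catch `ConstraintViolation` on a rejected put, which mirrors the "exception propagated back to the client" behaviour described in the quote.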
[jira] [Commented] (HBASE-3792) TableInputFormat leaks ZK connections
[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156289#comment-13156289 ] Terry Siu commented on HBASE-3792: -- Thanks, Bryan, looking forward to getting the 0.90.4 patch. TableInputFormat leaks ZK connections - Key: HBASE-3792 URL: https://issues.apache.org/jira/browse/HBASE-3792 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.1 Environment: Java 1.6.0_24, Mac OS X 10.6.7 Reporter: Bryan Keller Attachments: tableinput.patch The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader. The leak occurs when the JobClient is initializing and needs to retrieve the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate. I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done. 
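The workaround described in HBASE-3792 above (keep only the table name, open the table where it is needed, close it when done) can be sketched as follows. `TrackedTable` is a hypothetical stand-in for HTable that merely counts open handles, and the region start keys are made-up data. The sketch uses try-with-resources, which needs Java 7 or later; on the Java 1.6 environment listed in the issue the same effect would need an explicit try/finally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for HTable: counts handles that are still open. */
class TrackedTable implements AutoCloseable {
  static final AtomicInteger OPEN = new AtomicInteger();
  private final String name;

  TrackedTable(String name) {
    this.name = name;
    OPEN.incrementAndGet(); // models opening a ZK connection
  }

  String name() {
    return name;
  }

  List<String> regionStartKeys() {
    return Arrays.asList("", "aaa"); // made-up data for illustration
  }

  @Override
  public void close() {
    OPEN.decrementAndGet(); // models releasing the connection
  }
}

/** Keeps only the table name and opens the table where it is needed. */
class NameOnlyInputFormat {
  private final String tableName;

  NameOnlyInputFormat(String tableName) {
    this.tableName = tableName;
  }

  List<String> getSplits() {
    try (TrackedTable table = new TrackedTable(tableName)) {
      return table.regionStartKeys();
    } // the table is closed here, even if regionStartKeys() throws
  }
}
```

Because no table handle is held as a member variable, a long-lived job client can call `getSplits()` repeatedly without accumulating connections.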
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156290#comment-13156290 ] jirapos...@reviews.apache.org commented on HBASE-4605: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2579/ --- (Updated 2011-11-23 21:19:56.263794) Review request for hbase. Changes --- Updating to current trunk to take into account changes in HTD and for Hadoop QA. Otherwise, no changes from last diff. Summary --- Most of the implementation for adding constraints as a coprocessor. Looking for general comments on style/structure, though nitpicks are ok too. Currently missing implementation for disableConstraints() since that will require adding removeCoprocessor() to HTD (also comments on if this is worth it would be good). This addresses bug HBASE-4605. https://issues.apache.org/jira/browse/HBASE-4605 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 84a0d1a src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/ConstraintException.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/ConstraintProcessor.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/IntegerConstraint.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/constraint/package-info.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/AllFailConstraint.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/AllPassConstraint.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java PRE-CREATION 
src/test/java/org/apache/hadoop/hbase/constraint/IntegrationTestConstraint.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/RuntimeFailConstraint.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/TestConstraints.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/TestIntegerConstraint.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/constraint/WorksConstraint.java PRE-CREATION Diff: https://reviews.apache.org/r/2579/diff Testing --- Adding IntegrationTestConstraint and unit tests for Constraints and IntegerConstraint. All of those pass. Thanks, Jesse Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: constraint_as_cp.txt, java_Constraint_v2.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not, people can throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156292#comment-13156292 ] Hadoop QA commented on HBASE-4857: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504898/HBASE-4857.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestInstantSchemaChange Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/350//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/350//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/350//console This message is automatically generated. 
Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager - Key: HBASE-4857 URL: https://issues.apache.org/jira/browse/HBASE-4857 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.92.0, 0.94.0 Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 0.92.0 Attachments: HBASE-4857.patch Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered: {noformat} Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:302) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328) at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154) at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397) at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435) at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at 
org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96) at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) {noformat} The {{KeeperException}}
[jira] [Updated] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4605: -- Attachment: 4605.v7 Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not, people can throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4605: -- Status: Patch Available (was: Open) Patch testing v7. Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table, but if not, people can throw an exception that gets propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156319#comment-13156319 ] Lars Hofhansl commented on HBASE-4838: -- With the above scenario what I found is this:
o the table is populated with only two KVs: aaa and aab.
o after the split there are two regions: ['', aaa) and [aaa, '')
x the client scanner first tries the 1st region
o then it tries the 2nd region
The X is where the difference is. In trunk (and unpatched 0.92), the region's internal scanner finds no KVs (as it should) and returns an empty result to the client scanner, which then proceeds to the next region. In 0.92 with this patch, the region's internal scanner actually finds both aaa and aab in the 1st region (which is wrong), and then again in the 2nd region (which is correct). I don't know yet why this is happening, though. Maybe the scanner picks up the wrong store files, or there is a problem with flushes or compactions.
Port 2856 (TestAcidGuarantee is failing) to 0.92 Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4838-v1.txt Moving the backport into a separate issue (as suggested by JonH), because this is not trivial. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
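The expected behaviour in the scenario above follows from the half-open region ranges: with regions ['', aaa) and [aaa, ''), both aaa and aab belong to the 2nd region only. The sketch below checks that with a hypothetical `KeyRange` class. It compares Java strings, which matches byte-wise ordering only for plain ASCII keys like the ones in the comment, and it treats the empty string as "unbounded", as in the range notation used there.

```java
/** Hypothetical half-open key range [start, end); "" means unbounded on that side. */
class KeyRange {
  private final String start; // inclusive
  private final String end;   // exclusive

  KeyRange(String start, String end) {
    this.start = start;
    this.end = end;
  }

  boolean contains(String row) {
    boolean atOrAfterStart = start.isEmpty() || row.compareTo(start) >= 0;
    boolean beforeEnd = end.isEmpty() || row.compareTo(end) < 0;
    return atOrAfterStart && beforeEnd;
  }
}
```

A scanner restricted to the 1st region should therefore return nothing for this table, which is what trunk and unpatched 0.92 are reported to do.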
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156323#comment-13156323 ] Ted Yu commented on HBASE-4855: --- The above assertion error meant there was a duplicate heartbeat:
{code}
assert false;
LOG.warn("got dup heartbeat for " + path + " ver = " + new_version);
{code}
We should either ignore the dup heartbeat or make the assertion message clearer.
SplitLogManager hangs on cluster restart. -- Key: HBASE-4855 URL: https://issues.apache.org/jira/browse/HBASE-4855 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan
Start a master and RS
RS goes down (kill -9)
Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is there it cannot be processed.
Restart both master and bring up an RS.
The master hangs in SplitLogManager.waitforTasks(). I feel that batch.done is not getting incremented properly. Not yet dug in fully. This may be the reason for occasional failure of TestDistributedLogSplitting.testWorkerAbort(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156326#comment-13156326 ] Ted Yu commented on HBASE-4855: --- @Ramkrishna: Can you post from the log file the following:
{code}
status.setStatus("Waiting for distributed tasks to finish. " + " scheduled=" + batch.installed + " done=" + batch.done + " error=" + batch.error);
{code}
It is interesting that neither done nor error counts increased. Or maybe their sum became greater than batch.installed?
SplitLogManager hangs on cluster restart. -- Key: HBASE-4855 URL: https://issues.apache.org/jira/browse/HBASE-4855 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan
Start a master and RS
RS goes down (kill -9)
Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is there it cannot be processed.
Restart both master and bring up an RS.
The master hangs in SplitLogManager.waitforTasks(). I feel that batch.done is not getting incremented properly. Not yet dug in fully. This may be the reason for occasional failure of TestDistributedLogSplitting.testWorkerAbort(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
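The second question in the comment above hints at one way such a wait could hang. If the loop exits only when done + error is exactly equal to installed, a double-counted completion pushes the sum past installed and the exit condition can never become true. Whether waitForTasks() actually uses an equality test is an assumption here, not something stated in this thread; `TaskBatchSketch` and its two predicates are hypothetical and only contrast the two exit conditions.

```java
/** Hypothetical counters modelled on the installed/done/error fields named in the comment. */
class TaskBatchSketch {
  int installed;
  int done;
  int error;

  /** Exit test using equality: an overshoot never counts as finished. */
  boolean finishedStrict() {
    return done + error == installed;
  }

  /** Exit test that tolerates double-counted completions. */
  boolean finishedTolerant() {
    return done + error >= installed;
  }
}
```

Logging the three counters, as requested in the comment, would show which of the two situations (no progress, or an overshoot) is actually occurring.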
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4853: - Attachment: 4853-v5.txt Here's a fix. I need a review given that this patch is actually a revert of two commits I've made -- one recent and another a couple of months ago. HBASE-4789 does overzealous pruning of seqids - Key: HBASE-4853 URL: https://issues.apache.org/jira/browse/HBASE-4853 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853.txt Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs' At a minimum, these lines need removing:
{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}
... but above is no good w/o figuring why WALs are not being rotated off. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4853: - Assignee: stack Status: Open (was: Patch Available) HBASE-4789 does overzealous pruning of seqids - Key: HBASE-4853 URL: https://issues.apache.org/jira/browse/HBASE-4853 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853.txt Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs' At a minimum, these lines need removing: {code} diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java index 623edbe..a0bbe01 100644 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java @@ -1359,11 +1359,6 @@ public class HLog implements Syncable { // Cleaning up of lastSeqWritten is in the finally clause because we // don't want to confuse getOldestOutstandingSeqNum() this.lastSeqWritten.remove(getSnapshotName(encodedRegionName)); - Long l = this.lastSeqWritten.remove(encodedRegionName); - if (l != null) { -LOG.warn(Why is there a raw encodedRegionName in lastSeqWritten? name= + - Bytes.toString(encodedRegionName) + , seqid= + l); - } this.cacheFlushLock.unlock(); } } {code} ... but above is no good w/o figuring why WALs are not being rotated off. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4853: - Status: Patch Available (was: Open) HBASE-4789 does overzealous pruning of seqids - Key: HBASE-4853 URL: https://issues.apache.org/jira/browse/HBASE-4853 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853.txt Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs' At a minimum, these lines need removing: {code} diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java index 623edbe..a0bbe01 100644 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java @@ -1359,11 +1359,6 @@ public class HLog implements Syncable { // Cleaning up of lastSeqWritten is in the finally clause because we // don't want to confuse getOldestOutstandingSeqNum() this.lastSeqWritten.remove(getSnapshotName(encodedRegionName)); - Long l = this.lastSeqWritten.remove(encodedRegionName); - if (l != null) { -LOG.warn(Why is there a raw encodedRegionName in lastSeqWritten? name= + - Bytes.toString(encodedRegionName) + , seqid= + l); - } this.cacheFlushLock.unlock(); } } {code} ... but above is no good w/o figuring why WALs are not being rotated off. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4857:
----------------------------------
Priority: Critical (was: Major)

Nice catch. +1 on commit and raise to Critical.

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
-------------------------------------------------------------------------------------
Key: HBASE-4857
URL: https://issues.apache.org/jira/browse/HBASE-4857
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
Fix For: 0.92.0
Attachments: HBASE-4857.patch

Looking through stack traces for {{TestMasterFailover}}, I see a case where the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop when a {{KeeperException}} is encountered:

{noformat}
Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 waiting on condition [0x7f9fab376000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:302)
    at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
    at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:154)
    at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
    at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
    at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
    at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
    at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
{noformat}

The {{KeeperException}} causes {{ZKLeaderManager}} to call {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another {{KeeperException}}, and so on...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
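The stop()/stepDownAsLeader() cycle described in HBASE-4857 can be illustrated with a small standalone model. This is only a sketch of one common way to break such a cycle (a run-once guard flag); it is not the attached HBASE-4857.patch, and the class name GuardedElectorSketch, the field stepDownCalls, and the simulated failure are all hypothetical stand-ins rather than HBase code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a leader elector; not HBase code. */
class GuardedElectorSketch {
  private final AtomicBoolean stopped = new AtomicBoolean(false);
  int stepDownCalls = 0;

  /** Stops the elector; only the first call does any work. */
  void stop() {
    // compareAndSet returns false for every caller after the first,
    // so a nested stop() triggered by the failure below returns at once.
    if (!stopped.compareAndSet(false, true)) {
      return;
    }
    stepDownAsLeader();
  }

  /** Simulates a step-down whose ZooKeeper call always fails. */
  void stepDownAsLeader() {
    stepDownCalls++;
    try {
      // Stand-in for the KeeperException seen in the stack trace.
      throw new IllegalStateException("simulated ZooKeeper failure");
    } catch (IllegalStateException e) {
      // Error handling calls back into stop(); without the guard this
      // would recurse until the stack overflows.
      stop();
    }
  }

  public static void main(String[] args) {
    GuardedElectorSketch elector = new GuardedElectorSketch();
    elector.stop();
    System.out.println("stepDownAsLeader calls: " + elector.stepDownCalls);
  }
}
```

With the guard in place the failure path is entered once and the nested stop() is a no-op; removing the compareAndSet check reproduces the unbounded recursion in this model.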
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156332#comment-13156332 ]

stack commented on HBASE-4853:
------------------------------

Here's some explanation:

M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  On flush of memstores, we were decrementing the global region memory size by the size of the global memstore AT THE TIME OF THE DECREMENT rather than by the flush size (some edits may very well have come in between the setup of the flush and decrement time). This change undoes a brain-dead change of mine in hbase-4722. That broke this.

M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
  Remove flagging of the original problem, our leaving an old edit id in lastSeqWritten for a region that was offline.

I tried to write a test but it's too tough at the moment. You need to get some edits into the memstore AFTER the update lock is freed down in internalFlushCache but BEFORE we decrement memstore size. The only way to make it work would be by modifying HRegion to insert a do-nothing method. Too dumb.

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853.txt

Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'

At a minimum, these lines need removing:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}

... but above is no good w/o figuring why WALs are not being rotated off.
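The accounting problem stack describes for HRegion.java (decrementing by the memstore size at decrement time instead of by the size captured when the flush was set up) can be shown with a deliberately simplified model. This is a sketch under stated assumptions: a single counter stands in for the region server's accounting, and FlushAccountingSketch, finishFlushBuggy, and finishFlush are hypothetical names that do not appear in HBase.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical accounting model; the names are illustrative, not HBase's. */
class FlushAccountingSketch {
  private final AtomicLong memstoreSize = new AtomicLong();

  void addEdit(long bytes) {
    memstoreSize.addAndGet(bytes);
  }

  long size() {
    return memstoreSize.get();
  }

  /** Buggy variant: subtracts whatever the counter holds at decrement time. */
  void finishFlushBuggy() {
    memstoreSize.addAndGet(-memstoreSize.get());
  }

  /** Intended variant: subtracts only the bytes captured at flush setup. */
  void finishFlush(long flushedBytes) {
    memstoreSize.addAndGet(-flushedBytes);
  }

  public static void main(String[] args) {
    FlushAccountingSketch region = new FlushAccountingSketch();
    region.addEdit(100);
    long flushedBytes = region.size(); // size captured when the flush is set up
    region.addEdit(40);                // edit arrives while the flush is running
    region.finishFlush(flushedBytes);
    System.out.println("bytes still accounted after flush: " + region.size());
  }
}
```

In this model the late 40-byte edit stays on the books after finishFlush, whereas finishFlushBuggy would zero the counter and silently lose track of it.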
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4853:
-------------------------
Status: Open (was: Patch Available)

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4853:
-------------------------
Attachment: 4853-v6.txt

Same patch with better variable naming.

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
[jira] [Updated] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4853:
-------------------------
Status: Patch Available (was: Open)

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156350#comment-13156350 ]

Todd Lipcon commented on HBASE-4838:
------------------------------------

Maybe a problem with the HalfHFile references? After a compaction of the split daughters, does the doubling persist?

Port 2856 (TestAcidGuarantee is failing) to 0.92
------------------------------------------------
Key: HBASE-4838
URL: https://issues.apache.org/jira/browse/HBASE-4838
Project: HBase
Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0
Attachments: 4838-v1.txt

Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156356#comment-13156356 ]

Ted Yu commented on HBASE-4853:
-------------------------------

With patch v5, I got the following:

{code}
testGlobalMemStore(org.apache.hadoop.hbase.TestGlobalMemStoreSize)  Time elapsed: 11.516 sec  <<< FAILURE!
java.lang.AssertionError: Server=10.246.204.31,62993,1322086547613, i=0 expected:<0> but was:<608>
{code}

Here is the tail of the test output:

{code}
2011-11-23 14:15:55,955 INFO [main] regionserver.Store(631): Added hdfs://localhost:62971/user/zhihyu/.META./1028785192/info/6d51d01d9498464eb025ca045e696ce4, entries=47, sequenceid=36, filesize=8.4k
2011-11-23 14:15:55,956 INFO [main] regionserver.HRegion(1396): Finished memstore flush of ~17.2k/17608 for region .META.,,1.1028785192 in 44ms, sequenceid=36, compaction requested=false
2011-11-23 14:15:55,956 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush .META.,,1.1028785192 on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,,1322086555196.e2b7276e785c7f6213a5bdd08a54cf8e. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,c,P\xE3+,1322086555201.2c847584e6af6e64f3bae631bd722934. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(99): Flush TestGlobalMemStoreSize,q\x83\xCC\xF1{,1322086555217.f5079469f9fa696de61b9db6364cd6e7. on 10.246.204.31,62993,1322086547613, false, size=608
2011-11-23 14:15:55,957 INFO [main] hbase.TestGlobalMemStoreSize(101): Post flush on 10.246.204.31,62993,1322086547613
{code}

Basically, there was no mention of flush completion for the TestGlobalMemStoreSize table. I think we should add a log before the assertion so that we know how long we spent waiting in the while loop:

{code}
assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
  server.getRegionServerAccounting().getGlobalMemstoreSize());
{code}

We should increase the wait time beyond 3 seconds.

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156355#comment-13156355 ]

Gary Helmling commented on HBASE-4857:
--------------------------------------

The TestMasterObserver failure from hadoopqa is odd, but doesn't seem to be caused by this patch. The TestAdmin failure is from exhausted file handles:

{noformat}
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.initPipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:49)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
    at java.nio.channels.Selector.open(Selector.java:209)
    at org.apache.zookeeper.ClientCnxnSocketNIO.init(ClientCnxnSocketNIO.java:42)
    at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1737)
    ... 55 more
{noformat}

Going to go ahead with commit.

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
-------------------------------------------------------------------------------------
Key: HBASE-4857
URL: https://issues.apache.org/jira/browse/HBASE-4857
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
Fix For: 0.92.0
Attachments: HBASE-4857.patch
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156357#comment-13156357 ]

stack commented on HBASE-4838:
------------------------------

Yeah, look see if TRUNK has a fix in Reference or HalfStoreFileReader.

Port 2856 (TestAcidGuarantee is failing) to 0.92
------------------------------------------------
Key: HBASE-4838
URL: https://issues.apache.org/jira/browse/HBASE-4838
Project: HBase
Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0
Attachments: 4838-v1.txt
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156359#comment-13156359 ]

Hadoop QA commented on HBASE-4820:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12504918/0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 8 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.client.TestAdmin
    org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/351//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/351//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/351//console

This message is automatically generated.

Distributed log splitting coding enhancement to make it easier to understand, no semantics change
-------------------------------------------------------------------------------------------------
Key: HBASE-4820
URL: https://issues.apache.org/jira/browse/HBASE-4820
Project: HBase
Issue Type: Improvement
Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Labels: newbie
Fix For: 0.94.0
Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch

In reviewing distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It will be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156366#comment-13156366 ]

stack commented on HBASE-4853:
------------------------------

hmm... that doesn't fail for me and the change shouldn't affect this test.

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156367#comment-13156367 ]

jirapos...@reviews.apache.org commented on HBASE-4820:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3492
-----------------------------------------------------------

src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
https://reviews.apache.org/r/2895/#comment7770

    put the edits where?

- Todd

On 2011-11-23 19:58:09, Jimmy Xiang wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2895/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-11-23 19:58:09)
bq.
bq. Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.
bq. Summary
bq. -------
bq.
bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change.
bq. It is some issue raised during the code review in back porting this feature to CDH.
bq.
bq. This addresses bug HBASE-4820.
bq. https://issues.apache.org/jira/browse/HBASE-4820
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 2101054
bq. src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java d7a648d
bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9
bq. src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f
bq. src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec
bq.
bq. Diff: https://reviews.apache.org/r/2895/diff
bq.
bq. Testing
bq. -------
bq.
bq. Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change.
bq.
bq. Thanks,
bq.
bq. Jimmy
bq.

Distributed log splitting coding enhancement to make it easier to understand, no semantics change
-------------------------------------------------------------------------------------------------
Key: HBASE-4820
URL: https://issues.apache.org/jira/browse/HBASE-4820
Project: HBase
Issue Type: Improvement
Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Labels: newbie
Fix For: 0.94.0
Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156368#comment-13156368 ]

Ted Yu commented on HBASE-4853:
-------------------------------

By increasing the timeout to 6 seconds (Pardon me, N), I wasn't able to reproduce the failure in TestGlobalMemStoreSize after 20 iterations:

{code}
Index: src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java
===================================================================
--- src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java (revision 1205638)
+++ src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java (working copy)
@@ -100,11 +100,12 @@
       }
       LOG.info("Post flush on " + server.getServerName());
       long now = System.currentTimeMillis();
-      long timeout = now + 3000;
+      long timeout = now + 6000;
       while(server.getRegionServerAccounting().getGlobalMemstoreSize() != 0 &&
           timeout > System.currentTimeMillis()) {
         Threads.sleep(10);
       }
+      LOG.info("About to check GlobalMemstoreSize");
       assertEquals("Server=" + server.getServerName() + ", i=" + i++, 0,
         server.getRegionServerAccounting().getGlobalMemstoreSize());
     }
{code}

HBASE-4789 does overzealous pruning of seqids
---------------------------------------------
Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt
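The wait loop in the TestGlobalMemStoreSize diff is a poll-until-deadline pattern, and Ted Yu's earlier request to know how long the loop waited fits naturally into it. The following is a minimal standalone sketch of that pattern, assuming a hypothetical helper named waitForZero in a hypothetical class WaitForZeroSketch; it is not part of HBase or its test utilities, and the 6000 ms and 10 ms values simply mirror the numbers in the diff.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper; not part of HBase or its test utilities. */
class WaitForZeroSketch {
  /**
   * Polls the supplier until it returns 0 or timeoutMs has elapsed.
   * Returns the elapsed milliseconds so the caller can log the wait.
   */
  static long waitForZero(LongSupplier value, long timeoutMs, long pollMs) {
    long start = System.currentTimeMillis();
    long deadline = start + timeoutMs;
    while (value.getAsLong() != 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and stop waiting early.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return System.currentTimeMillis() - start;
  }

  public static void main(String[] args) {
    long[] remaining = {3}; // pretend memstore size that drains by one per poll
    long waited = waitForZero(
        () -> remaining[0] == 0 ? 0 : remaining[0]--, 6000, 10);
    System.out.println("waited " + waited + " ms, final size " + remaining[0]);
  }
}
```

Returning the elapsed time lets the caller log it right before the assertion, which makes it easier to tell a slow flush from an accounting error when the test fails.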
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156369#comment-13156369 ]

Lars Hofhansl commented on HBASE-4838:
--------------------------------------

Reference.java and HalfStoreFileReader.java are identical between 0.92 and trunk (and neither Reference nor HalfStoreFileReader appears in this patch), so that is likely not the cause.

I also verified now that it picks up the correct store file (judged by the filename), which means the content of the store file is not correct.

I thought maybe it had to do with ignoring the version counts in the ColumnTrackers, but that does not appear to be the problem.

... going to have to shelve this for a bit to work on some other stuff.

Port 2856 (TestAcidGuarantee is failing) to 0.92
------------------------------------------------
Key: HBASE-4838
URL: https://issues.apache.org/jira/browse/HBASE-4838
Project: HBase
Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0
Attachments: 4838-v1.txt
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156370#comment-13156370 ]

Ted Yu commented on HBASE-4853:

We should let TestGlobalMemStoreSize pass consistently. HBASE-4722 tried to solve this issue.

HBASE-4789 does overzealous pruning of seqids

Key: HBASE-4853
URL: https://issues.apache.org/jira/browse/HBASE-4853
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853.txt

Working w/ J-D on failing replication test turned up hole in seqids made by the patch over in hbase-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'

At a minimum, these lines need removing:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}

... but above is no good w/o figuring why WALs are not being rotated off.
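To make the concern in HBASE-4853 concrete, here is a minimal sketch of why over-pruning a map of outstanding sequence ids can make every old log look disposable. It is a simplified model, not the actual HLog implementation: the class `WalCleanupSketch`, the field `closedLogs` and the method `logsSafeToArchive()` are hypothetical names, and only `lastSeqWritten` is borrowed from the diff above. The assumption is that a closed log can be archived only if its highest sequence id is below the oldest sequence id still waiting to be flushed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of WAL cleanup; not the real HLog code. */
class WalCleanupSketch {
    // Region name -> oldest sequence id written to the WAL but not yet flushed.
    final Map<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

    // Highest sequence id contained in each closed log file -> file name.
    final SortedMap<Long, String> closedLogs = new TreeMap<>();

    /** Returns the closed logs that no recorded unflushed edit depends on. */
    List<String> logsSafeToArchive() {
        if (lastSeqWritten.isEmpty()) {
            // Nothing is recorded as unflushed, so every closed log looks
            // disposable. If entries were pruned too eagerly, this is unsafe.
            return new ArrayList<>(closedLogs.values());
        }
        long oldestOutstanding = Collections.min(lastSeqWritten.values());
        // headMap is exclusive of its argument: a log whose highest sequence
        // id equals the oldest outstanding one still holds an unflushed edit.
        return new ArrayList<>(closedLogs.headMap(oldestOutstanding).values());
    }
}
```

In this model, removing a region's entry while it still has unflushed edits empties the map and makes all closed logs eligible for archiving, which would match the suspicious 'Deleting all old hlogs' message quoted in the issue.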
[jira] [Updated] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-4820:

Attachment: 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch

Distributed log splitting coding enhancement to make it easier to understand, no semantics change

Key: HBASE-4820
URL: https://issues.apache.org/jira/browse/HBASE-4820
Project: HBase
Issue Type: Improvement
Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Labels: newbie
Fix For: 0.94.0
Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch, 0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch

In reviewing the distributed log splitting feature, we found some cosmetic issues. They make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.