[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154933#comment-13154933 ] nkeywal commented on HBASE-4832: One small comment: the timeout on the method (@Test(timeout=timeout)) conflicts with the timeout of the sleep (Thread.sleep(timeout)). Since both are set to the same value (30 seconds), either one can fire first, which makes failure analysis more complex. I think we can remove the timeout on the method; the test itself ensures that it won't run forever. TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server keeps sleeping instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
With this change the region server stops immediately, 0.5s faster on average, which is quite useful for unit tests. However, with this fix TestRegionServerCoprocessorExceptionWithAbort fails, likely because the code does not expect the region server to stop that fast.
The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  ERROR!
java.lang.Exception: test timed out after 30000 milliseconds
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.<init>(Throwable.java:196)
    at java.lang.Exception.<init>(Exception.java:41)
    at java.lang.InterruptedException.<init>(InterruptedException.java:48)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
    at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
    at
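The following is a minimal, self-contained sketch of why the original stop() never woke the region server. It assumes, as the issue describes, that run() sleeps on a separate sleeper object. `SimpleSleeper` and `WakeupDemo` are hypothetical stand-ins, not HBase's actual Sleeper or HRegionServer classes. Calling notifyAll() on a monitor that nobody waits on is a no-op. Signaling the sleeper's own lock, as skipSleepCycle() does, wakes the sleeping thread at once:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo: compares the two wake-up strategies from the issue.
class WakeupDemo {
    public static void main(String[] args) {
        System.out.println("notifyAll() on server wakes sleeper: " + wakesWith(false));
        System.out.println("skipSleepCycle() wakes sleeper: " + wakesWith(true));
    }

    /**
     * Starts a thread that sleeps for 10 seconds, then tries to wake it.
     * Returns true if the thread finished within one second of the wake-up attempt.
     */
    static boolean wakesWith(boolean useSkipSleepCycle) {
        final Object server = new Object();                // plays the role of "this" in stop()
        final SimpleSleeper sleeper = new SimpleSleeper(10_000);
        Thread runLoop = new Thread(sleeper::sleep);
        runLoop.setDaemon(true);                            // don't keep the JVM alive if it never wakes
        runLoop.start();
        try {
            Thread.sleep(100);                              // let the thread start waiting
            if (useSkipSleepCycle) {
                sleeper.skipSleepCycle();                   // the fix: notify the sleeper's own lock
            } else {
                synchronized (server) {
                    server.notifyAll();                     // the bug: nobody waits on this monitor
                }
            }
            runLoop.join(TimeUnit.SECONDS.toMillis(1));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !runLoop.isAlive();
    }
}

// Simplified sleeper: waits on a private lock and can be told to skip the current cycle.
class SimpleSleeper {
    private final Object sleepLock = new Object();
    private final long periodMs;
    private boolean skip = false;                           // guarded by sleepLock

    SimpleSleeper(long periodMs) {
        this.periodMs = periodMs;
    }

    void sleep() {
        synchronized (sleepLock) {
            long deadline = System.currentTimeMillis() + periodMs;
            long remaining;
            // Loop to tolerate spurious wake-ups; exit early if a skip was requested.
            while (!skip && (remaining = deadline - System.currentTimeMillis()) > 0) {
                try {
                    sleepLock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            skip = false;                                   // consume the skip request
        }
    }

    void skipSleepCycle() {
        synchronized (sleepLock) {
            skip = true;
            sleepLock.notifyAll();
        }
    }
}
```

Running main should report `false` for the notifyAll() variant and `true` for the skipSleepCycle() variant. In this sketch the skip flag also covers a wake-up that arrives before the thread has started waiting, which a bare notifyAll() would miss.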
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154952#comment-13154952 ] Hudson commented on HBASE-4842: --- Integrated in HBase-0.92-security #6 (See [https://builds.apache.org/job/HBase-0.92-security/6/]) HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck fails intermittently. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment, but META is still inconsistent with ZK. The RS in ZK sometimes points to the same RS, but sometimes it moves to another RS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+
[ https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154951#comment-13154951 ] Hudson commented on HBASE-4830: --- Integrated in HBase-0.92-security #6 (See [https://builds.apache.org/job/HBase-0.92-security/6/]) HBASE-4830 Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+ stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/bin/hbase Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+ --- Key: HBASE-4830 URL: https://issues.apache.org/jira/browse/HBASE-4830 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 4830-v2.txt, 4830.txt, hbase-stack-regionserver-sv4r9s38.out Running 0.20.205.1 (I was not at the tip of the branch), I ran into the following hung regionserver:
{code}
"regionserver7003.logRoller" daemon prio=10 tid=0x7fd98028f800 nid=0x61af in Object.wait() [0x7fd987bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
    - locked 0xf8656788 (a java.util.LinkedList)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
    - locked 0xf8656458 (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
    - locked 0xf8655998 (a org.apache.hadoop.io.SequenceFile$Writer)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
    - locked 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
{code}
Other threads look like this (here's a sample):
{code}
"regionserver7003.logSyncer" daemon prio=10 tid=0x7fd98025e000 nid=0x61ae waiting for monitor entry [0x7fd987cfb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
    at java.lang.Thread.run(Thread.java:662)

"IPC Server handler 0 on 7003" daemon prio=10 tid=0x7fd98049b800 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
    at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
{code}
Looks like HDFS-1529? (Todd?)
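The dumps show one thread, logRoller, holding monitor 0xc443deb0 in HLog.rollWriter() while it waits for an HDFS ack. Every other thread that needs that monitor (logSyncer, the IPC handlers in HLog.append()) is therefore BLOCKED. The following sketch reproduces that shape with plain JDK classes. The lock, the never-released latch standing in for the missing ack, and the thread names are illustrative assumptions, not HBase code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BlockedOnLockDemo {
    public static void main(String[] args) {
        System.out.println("appender state: " + observeAppenderState());
    }

    /**
     * Starts a "roller" thread that holds updateLock while waiting for an ack that
     * never arrives, then an "appender" thread that needs the same lock.
     * Returns the appender's state once it is BLOCKED, or after at most 2 seconds.
     */
    static Thread.State observeAppenderState() {
        final Object updateLock = new Object();              // plays the role of monitor 0xc443deb0
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch ack = new CountDownLatch(1);    // never counted down: the missing ack

        Thread roller = new Thread(() -> {
            synchronized (updateLock) {
                lockHeld.countDown();
                try {
                    ack.await();                             // waits while still holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();      // interruption ends the demo
                }
            }
        }, "logRoller");

        Thread appender = new Thread(() -> {
            synchronized (updateLock) {                      // cannot enter while roller holds it
                // an append would happen here
            }
        }, "IPC handler");

        roller.setDaemon(true);
        appender.setDaemon(true);
        try {
            roller.start();
            lockHeld.await();                                // make sure the roller owns the lock
            appender.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            while (appender.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State observed = appender.getState();
        roller.interrupt();                                  // release the lock so both threads exit
        return observed;
    }
}
```

The sketch only explains why a single stuck writer holding the lock freezes the other WAL users. Why the ack never arrives is an HDFS-side question, which the reporter suspects is HDFS-1529.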
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154957#comment-13154957 ] Hadoop QA commented on HBASE-4120: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504706/TablePriority_v8.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -143 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 88 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.master.TestRestartCluster
    org.apache.hadoop.hbase.client.TestAdmin
    org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
    org.apache.hadoop.hbase.master.TestMasterFailover
    org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/330//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/330//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/330//console
This message is automatically generated.
isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch The HBase isolation and allocation tool is designed to help users manage cluster resources among different applications and tables. Running many applications on a large HBase cluster causes many problems. At Taobao there is a cluster that many departments use to test the performance of their HBase-based applications. On this 12-server cluster only one application could run at a time, with exclusive use of the servers, and every other application had to wait until the previous test finished. After we added allocation management to the cluster, applications can share it and run concurrently. If a test engineer wants to make sure there is no interference, he/she can move the other tables out of the group. Within a group, we use table priority to allocate resources: when the system is busy, we can make sure high-priority tables are not affected by lower-priority tables. Different groups can have different region server configurations: groups optimized for reading can have a large block cache, and groups optimized for writing can have a large memstore. Tables and region servers can easily be moved between groups, and after its configuration changes a group can be restarted alone instead of restarting the whole cluster. Git repository: https://github.com/ICT-Ope/HBase_allocation. We hope our work is helpful.
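One possible reading of "table priority" within a group is shown in the sketch below. Pending requests are ordered so that high-priority tables are always served before lower-priority ones, and requests are served FIFO within a priority. This is an illustrative assumption; the TablePriority patch and design documents may allocate resources differently. `Request`, `serviceOrder` and the table names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class TablePriorityDemo {
    public static void main(String[] args) {
        List<Request> arrivals = Arrays.asList(
                new Request("batch_log", 1),
                new Request("online_orders", 5),
                new Request("batch_log", 1),
                new Request("online_orders", 5));
        // High-priority requests jump ahead of earlier low-priority ones.
        System.out.println(serviceOrder(arrivals));
    }

    /** A pending request tagged with its table's priority (higher value = more important). */
    static final class Request implements Comparable<Request> {
        private static final AtomicLong SEQ = new AtomicLong();
        final String table;
        final int priority;
        final long seq = SEQ.getAndIncrement();  // arrival order, used as a tie-breaker

        Request(String table, int priority) {
            this.table = table;
            this.priority = priority;
        }

        @Override
        public int compareTo(Request other) {
            int byPriority = Integer.compare(other.priority, this.priority); // higher first
            return byPriority != 0 ? byPriority : Long.compare(this.seq, other.seq);
        }
    }

    /** Drains the queue in service order and returns the table of each request served. */
    static List<String> serviceOrder(List<Request> arrivals) {
        PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>(arrivals);
        List<String> served = new ArrayList<>();
        Request next;
        while ((next = queue.poll()) != null) {
            served.add(next.table);
        }
        return served;
    }
}
```

This prints `[online_orders, online_orders, batch_log, batch_log]`. Strict priority ordering like this can starve low-priority tables under sustained load, so a real allocator would likely need weighting or aging on top of it.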
[jira] [Created] (HBASE-4846) HBase can't find the native lib
HBase can't find the native lib --- Key: HBASE-4846 URL: https://issues.apache.org/jira/browse/HBASE-4846 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.4 Reporter: Aaron Guo The shell script ${HBASE_HOME}/bin/hbase loads the Hadoop native lib from a hard-coded path:
if [ -d /usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM} ] ; then
  JAVA_LIBRARY_PATH=$(append_path ${JAVA_LIBRARY_PATH} /usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM})
fi
It should use ${HADOOP_HOME} instead:
if [ -d ${HADOOP_HOME}/lib/native/${JAVA_PLATFORM} ] ; then
  JAVA_LIBRARY_PATH=$(append_path ${JAVA_LIBRARY_PATH} ${HADOOP_HOME}/lib/native/${JAVA_PLATFORM})
fi
[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4739: -- Attachment: HBASE-4739_trial6.patch Thanks for your review. I have addressed all the comments. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday. When the master died, it had just created the RIT znode for a region but had not yet told the RS to close it. When the master restarted, it saw the znode and started printing this:
{quote}
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
{quote}
The close is never going to complete, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying things out to root bugs out.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155048#comment-13155048 ] ramkrishna.s.vasudevan commented on HBASE-4739: --- +1 for HBASE-4739_trial6.patch. The state name change indicates that the node was created by the master.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155050#comment-13155050 ] gaojinchao commented on HBASE-4739: --- This patch is not compatible: I added M_ZK_REGION_CLOSING and deleted RS_ZK_REGION_CLOSING in EventHandler.java. I have another question: can I delete the code block below from unassign(HRegionInfo region, boolean force)?
} catch (NotServingRegionException nsre) {
  LOG.info("Server " + server + " returned " + nsre + " for " + region.getEncodedName());
  // Presume that master has stale data. Presume remote side just split.
  // Presume that the split message when it comes in will fix up the master's
  // in memory cluster state.
} catch (Throwable t)
I think we should check the exception wrapped in the RemoteException instead.
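The concern with that catch block is that a NotServingRegionException thrown by the region server may reach the master wrapped in a RemoteException over RPC. In that case a `catch (NotServingRegionException nsre)` clause would never match. Hadoop's org.apache.hadoop.ipc.RemoteException provides unwrapRemoteException(Class...) for this purpose. The sketch below instead uses self-contained stand-in classes (`WrappedRemoteException` and a local `NotServingRegionException`, both hypothetical) to show the effect; it is not the actual HBase code:

```java
import java.io.IOException;

class UnwrapDemo {
    public static void main(String[] args) {
        WrappedRemoteException fromWire = new WrappedRemoteException(
                NotServingRegionException.class.getName(), "region is not online");
        // Without unwrapping, the specific catch clause is skipped.
        System.out.println(handleUnassignFailure(fromWire));
        // After unwrapping, the NotServingRegionException branch runs.
        System.out.println(handleUnassignFailure(fromWire.unwrap()));
    }

    /** Local stand-in for HBase's NotServingRegionException. */
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String message) { super(message); }
    }

    /** Local stand-in for an RPC wrapper that only carries the remote class name and message. */
    static class WrappedRemoteException extends IOException {
        private final String className;

        WrappedRemoteException(String className, String message) {
            super(message);
            this.className = className;
        }

        String getClassName() { return className; }

        /** Rebuilds the original exception if its type is known here; otherwise returns this. */
        IOException unwrap() {
            if (NotServingRegionException.class.getName().equals(className)) {
                IOException original = new NotServingRegionException(getMessage());
                original.initCause(this);
                return original;
            }
            return this;
        }
    }

    /** Mimics the shape of the catch blocks in unassign(): a specific clause, then a general one. */
    static String handleUnassignFailure(IOException fromRpc) {
        try {
            throw fromRpc;
        } catch (NotServingRegionException nsre) {
            return "caught NotServingRegionException";
        } catch (IOException e) {
            return "caught " + e.getClass().getSimpleName();
        }
    }
}
```

This prints `caught WrappedRemoteException` followed by `caught NotServingRegionException`.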
[jira] [Commented] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)
[ https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155067#comment-13155067 ] nkeywal commented on HBASE-4825: This can be committed. There is also zookeeper.TestZooKeeperACL, which is not categorized either. TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large) - Key: HBASE-4825 URL: https://issues.apache.org/jira/browse/HBASE-4825 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Attachments: 4825_trunk_java.patch see title
[jira] [Updated] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)
[ https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4825: --- Attachment: 4825_trunk_java.patch The patch only adds the categories.
[jira] [Created] (HBASE-4847) Activate single jvm for small tests on jenkins
Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor This alone will not revolutionize performance: we will save between 1 and 4 minutes. But we also gain:
- it's a step toward parallelizing the tests
- new tests are cheaper, as they do not create a new JVM: it's a continuous win
- it will allow us to push it to the dev environment while keeping the same environment on dev and on the build, and 3 minutes is 10% of the small + medium test execution time.
I will submit a few patches to see if it works well before asking for the real commit.
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4308: -- Status: Patch Available (was: Open) Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like:
2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840
Not certain if it causes issues, but it's a concerning log message.
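The race can be made deterministic by delivering the delete notification immediately, which is the worst-case interleaving. The sketch below uses hypothetical names (`RitOrderingDemo`, `deleteZnode`, `nodeDeleted`) and a plain map in place of RegionsInTransition. It shows that removing the region from RIT before deleting the znode avoids the warning. This is one possible ordering, not necessarily what HBASE-4308.patch does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RitOrderingDemo {
    public static void main(String[] args) {
        String region = ".META.,,1.1028785192";

        RitOrderingDemo deleteFirst = new RitOrderingDemo();
        deleteFirst.addToRit(region, "OPEN");
        deleteFirst.handleOpenedDeleteFirst(region);
        System.out.println("delete first: " + deleteFirst.warnings);

        RitOrderingDemo removeFirst = new RitOrderingDemo();
        removeFirst.addToRit(region, "OPEN");
        removeFirst.handleOpenedRemoveFirst(region);
        System.out.println("remove first: " + removeFirst.warnings);
    }

    // Hypothetical stand-in for AssignmentManager's regions-in-transition map.
    private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    final List<String> warnings = new ArrayList<>();

    void addToRit(String region, String state) {
        regionsInTransition.put(region, state);
    }

    /** Watcher callback for a deleted znode: records a warning if the region is still in RIT. */
    void nodeDeleted(String region) {
        String state = regionsInTransition.get(region);
        if (state != null) {
            warnings.add("Node deleted but still in RIT: " + region + " state=" + state);
        }
    }

    /** Simulated ZK delete whose watch fires immediately (worst-case interleaving). */
    private void deleteZnode(String region) {
        nodeDeleted(region);
    }

    /** Order described in the issue: delete the znode, then clear RIT. */
    void handleOpenedDeleteFirst(String region) {
        deleteZnode(region);
        regionsInTransition.remove(region);
    }

    /** Alternative order: clear RIT first, so the delete notification finds nothing. */
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        deleteZnode(region);
    }
}
```

Reordering has a trade-off of its own. If the znode delete fails after the RIT entry is gone, the master still has to recover from that state, so a real fix would presumably need to handle that path as well.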
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4308: -- Attachment: HBASE-4308.patch Test cases are passing.
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Attachment: 4847_pom.patch Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch This alone will not revolutionize performance: we will save between 1 and 4 minutes. But we also gain:
- it's a step toward parallelizing the tests
- new tests are cheaper, as they do not create a new JVM: it's a continuous win
- it will allow us to push it to the dev environment while keeping the same environment on dev and on the build, and 3 minutes is 10% of the small + medium test execution time.
I will submit a few patches to see if it works well before asking for the real commit.
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155128#comment-13155128 ] Hudson commented on HBASE-4842: --- Integrated in HBase-0.92 #159 (See [https://builds.apache.org/job/HBase-0.92/159/]) HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155129#comment-13155129 ] Hadoop QA commented on HBASE-4847: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504745/4847_pom.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/333//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/333//console This message is automatically generated. Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155166#comment-13155166 ] Hadoop QA commented on HBASE-4739: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504734/HBASE-4739_trial6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestInstantSchemaChange org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/331//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/331//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/331//console This message is automatically generated. 
Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday. When the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
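The log above shows the timeout monitor choosing to do nothing for a CLOSING region whose server is alive and will therefore never expire. The following is a hypothetical sketch of one way such a decision could be structured so the region can make progress. The enums and `decide` are made up for illustration; they are not HBase APIs and not necessarily what the attached patches do.

```java
// Hypothetical sketch of how a timeout monitor could treat a region stuck in
// CLOSING after a master restart. Names are illustrative, not HBase APIs.
final class ClosingTimeoutSketch {

  enum RegionState { OPENING, OPEN, CLOSING, CLOSED }

  enum TimeoutAction { NONE, RESEND_CLOSE, WAIT_FOR_EXPIRY }

  // Decides what to do when a region has been in transition for too long.
  static TimeoutAction decide(RegionState state, boolean hostingServerOnline) {
    if (state != RegionState.CLOSING) {
      return TimeoutAction.NONE; // other states are outside the scope of this sketch
    }
    if (hostingServerOnline) {
      // The server will not expire, so "doing nothing" would leave the region in
      // transition forever; re-sending the close request lets it make progress.
      return TimeoutAction.RESEND_CLOSE;
    }
    // Assumption: a dead server eventually expires and its regions are handled
    // by server-shutdown processing, so waiting is enough in this case.
    return TimeoutAction.WAIT_FOR_EXPIRY;
  }

  public static void main(String[] args) {
    System.out.println(decide(RegionState.CLOSING, true));  // prints RESEND_CLOSE
    System.out.println(decide(RegionState.CLOSING, false)); // prints WAIT_FOR_EXPIRY
  }
}
```

A real fix would also have to cope with a region server that already received the close (for example after a partial failure), which this sketch ignores.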
[jira] [Assigned] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4841: - Assignee: ramkrishna.s.vasudevan If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: ramkrishna.s.vasudevan Priority: Critical Attachments: 1, log, log2 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear. The unit test gives you some flexibility over: - How many rows - How wide the rows are - The frequency of the split. The default settings crash the unit tests or cause them to fail on my laptop. On my MacBook Air, I could actually turn down the total number of rows and the frequency of the splits, which is surprising. I think this is because the MacBook Air has much better IO than my backup Acer.
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Status: Patch Available (was: Open) with the right version... Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Attachment: 4847_pom.v2.patch Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Status: Open (was: Patch Available) Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155198#comment-13155198 ] Hadoop QA commented on HBASE-4308: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504738/HBASE-4308.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestInstantSchemaChange Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/332//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/332//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/332//console This message is automatically generated. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. 
If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
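The race above depends only on the order of two steps: deleting the znode and removing the region from RegionsInTransition. The following hypothetical, single-threaded model makes the bad interleaving deterministic by delivering the delete notification synchronously. All class and method names are invented for illustration and are not HBase's AssignmentManager API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, single-threaded model of the REGION_OPENED race. The simulated
// znode delete notifies the watcher immediately (the worst-case interleaving).
final class OpenedHandlerOrderingSketch {
  private final Set<String> regionsInTransition = new HashSet<>();
  private final List<String> warnings = new ArrayList<>();

  void addToRit(String region) { regionsInTransition.add(region); }

  List<String> warnings() { return warnings; }

  // Stands in for the watcher callback that runs when the znode is deleted.
  private void nodeDeleted(String region) {
    if (regionsInTransition.contains(region)) {
      warnings.add("Node deleted but still in RIT: " + region);
    }
  }

  // Simulated znode delete whose notification arrives before the caller continues.
  private void deleteZnode(String region) { nodeDeleted(region); }

  // Order described in the issue: delete the znode, then clear RIT.
  void handleOpenedDeleteFirst(String region) {
    deleteZnode(region);
    regionsInTransition.remove(region);
  }

  // Alternative order: clear RIT first, so the notification never sees the region.
  void handleOpenedRitFirst(String region) {
    regionsInTransition.remove(region);
    deleteZnode(region);
  }

  public static void main(String[] args) {
    OpenedHandlerOrderingSketch racy = new OpenedHandlerOrderingSketch();
    racy.addToRit(".META.,,1");
    racy.handleOpenedDeleteFirst(".META.,,1");
    System.out.println(racy.warnings().size()); // prints 1

    OpenedHandlerOrderingSketch ordered = new OpenedHandlerOrderingSketch();
    ordered.addToRit(".META.,,1");
    ordered.handleOpenedRitFirst(".META.,,1");
    System.out.println(ordered.warnings().size()); // prints 0
  }
}
```

Swapping the order only removes the spurious warning in this model; whether it is safe in the real master (for example if the znode delete fails after RIT was already cleared) is a separate question the sketch does not answer.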
[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155232#comment-13155232 ] Hadoop QA commented on HBASE-4847: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504754/4847_pom.v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.util.TestFSTableDescriptors Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/334//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/334//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/334//console This message is automatically generated. Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Status: Open (was: Patch Available) Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4847: --- Attachment: 4847_pom.v2.patch Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch
[jira] [Created] (HBASE-4848) TestScanner failing because hostname can't be null
TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.90.5
[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155250#comment-13155250 ] stack commented on HBASE-4848: -- This happened last night for builds #346 and #347, and it's happening locally for me. https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/346/ TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.90.5 Attachments: 4848.txt
[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4848: - Attachment: 4848.txt Small fix. TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.90.5 Attachments: 4848.txt
[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4848: - Assignee: stack Affects Version/s: 0.90.5 Status: Patch Available (was: Open) Committed to trunk and branches. TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: stack Assignee: stack Fix For: 0.90.5 Attachments: 4848-092.txt, 4848.txt
[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155266#comment-13155266 ] Hadoop QA commented on HBASE-4848: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504770/4848-092.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/335//console This message is automatically generated. TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: stack Assignee: stack Fix For: 0.90.5 Attachments: 4848-092.txt, 4848.txt
[jira] [Resolved] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)
[ https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4825. -- Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Committed to trunk. Thanks for the patch, N. I mistakenly committed a change to TestCatalogTracker at the same time and then backed it out. TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large) - Key: HBASE-4825 URL: https://issues.apache.org/jira/browse/HBASE-4825 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Fix For: 0.94.0 Attachments: 4825_trunk_java.patch see title
[jira] [Updated] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running
[ https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4849: - Attachment: 4849.txt Minor fix; pass HTU.getConfiguration() TestCatalogTracker can fail if an existing zookeeper running Key: HBASE-4849 URL: https://issues.apache.org/jira/browse/HBASE-4849 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.92.0 Attachments: 4849.txt This fact sank my attempt at building an RC. Fix.
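The fix note above ("pass HTU.getConfiguration()") is about which configuration a test component is built from. The following hypothetical model illustrates the failure mode. It assumes the mini ZooKeeper ends up on a non-default port, for example because an existing ZooKeeper already holds 2181, and that the testing utility records that port in its own configuration. Only the property name `hbase.zookeeper.property.clientPort` and its default of 2181 come from HBase. The map-based configuration, `FakeTestingUtility` and `zooKeeperPort` are illustrative stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a configuration object and a testing utility; only
// the property name and its default value are taken from HBase.
final class TestConfigSketch {
  static final String CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort";

  // Stand-in for a freshly created configuration carrying the default client port.
  static Map<String, String> newDefaultConfiguration() {
    Map<String, String> conf = new HashMap<>();
    conf.put(CLIENT_PORT_KEY, "2181");
    return conf;
  }

  // Stand-in for a testing utility that records its mini ZooKeeper's port in its conf.
  static final class FakeTestingUtility {
    private final Map<String, String> conf = newDefaultConfiguration();

    void startMiniZooKeeper(int port) {
      conf.put(CLIENT_PORT_KEY, Integer.toString(port));
    }

    Map<String, String> getConfiguration() {
      return conf;
    }
  }

  // The ZooKeeper port a component built from this configuration would connect to.
  static int zooKeeperPort(Map<String, String> conf) {
    return Integer.parseInt(conf.get(CLIENT_PORT_KEY));
  }

  public static void main(String[] args) {
    FakeTestingUtility htu = new FakeTestingUtility();
    htu.startMiniZooKeeper(21818); // e.g. because 2181 is already taken
    System.out.println(zooKeeperPort(newDefaultConfiguration())); // prints 2181
    System.out.println(zooKeeperPort(htu.getConfiguration()));    // prints 21818
  }
}
```

In this model, a component built from a fresh default configuration talks to whatever ZooKeeper already listens on 2181, while one built from the utility's configuration reaches the test's own mini cluster.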
[jira] [Created] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running
TestCatalogTracker can fail if an existing zookeeper running Key: HBASE-4849 URL: https://issues.apache.org/jira/browse/HBASE-4849 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.92.0 Attachments: 4849.txt
[jira] [Resolved] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running
[ https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4849. -- Resolution: Fixed Assignee: stack Committed to branch and trunk. TestCatalogTracker can fail if an existing zookeeper running Key: HBASE-4849 URL: https://issues.apache.org/jira/browse/HBASE-4849 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 4849.txt
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Attachment: 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch This patch is done with --no-prefix. Please check it in. Thanks a lot! [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug prevents WALs from getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly, some hot regions ended up with loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file takes about a second to process (though it seems there are only about 30 odd edits per file). The region is unavailable during this time.
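The rule in the issue title can be stated as a simple filter: a recovered.edits file needs replaying only if it contains an edit newer than what the region already holds. The following hypothetical sketch assumes the highest sequence id in each file is known without opening it, for example because it can be derived from the file name. That is an assumption made here, not a description of HBase's actual file layout. `RecoveredEditsSkipSketch`, `filesToReplay` and the file names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the skip rule in the issue title. Assumes each file's
// highest sequence id is known up front, without opening the file.
final class RecoveredEditsSkipSketch {

  // maxSeqIdByFile: file name -> highest sequence id of any edit in that file.
  // regionMaxSeqId: highest sequence id already persisted in the region's store files.
  static List<String> filesToReplay(NavigableMap<String, Long> maxSeqIdByFile,
                                    long regionMaxSeqId) {
    List<String> toReplay = new ArrayList<>();
    for (Map.Entry<String, Long> entry : maxSeqIdByFile.entrySet()) {
      if (entry.getValue() > regionMaxSeqId) {
        toReplay.add(entry.getKey());
      }
      // Otherwise every edit in the file is already persisted, so the file can
      // be skipped without paying the roughly one-second per-file replay cost.
    }
    return toReplay;
  }

  public static void main(String[] args) {
    NavigableMap<String, Long> files = new TreeMap<>();
    files.put("edits-1", 130L);
    files.put("edits-2", 160L);
    files.put("edits-3", 190L);
    System.out.println(filesToReplay(files, 160L)); // prints [edits-3]
  }
}
```

With tens or hundreds of files per hot region, skipping whole files this way avoids opening them at all. Edits inside a file that must still be replayed would still need a per-edit sequence-id check.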
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4797: - Status: Open (was: Patch Available) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4797: - Status: Patch Available (was: Open) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155289#comment-13155289 ] stack commented on HBASE-4842: -- Committed to TRUNK for completeness' sake. I've messed up your issue, Jon. Please forgive me. Will we resolve this one and open a new one for the root issue? [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155292#comment-13155292 ] stack commented on HBASE-4797: -- @Jimmy Just FYI, since you are new: to trigger the build again, you need to re-upload the original patch or a new one (which you did), and then (I think) cancel and resubmit the patch. [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Hadoop Flags: Reviewed Status: Patch Available (was: Open) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Status: Open (was: Patch Available) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155298#comment-13155298 ] Jimmy Xiang commented on HBASE-4797: Thanks! I'll cancel and resubmit the patch. [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Attachment: 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Status: Patch Available (was: Open) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155299#comment-13155299 ] Ted Yu commented on HBASE-4308: --- Patch makes sense. Minor comment: {code} +boolean deleteOpenedNode = false; {code} I think openedNodeDeleted would be a better name. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
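For HBASE-4308, the race can be modeled with two in-memory maps. The Java sketch below is hypothetical: the maps stand in for the regions-in-transition state and the ZK znodes, and the watcher callback is invoked synchronously right after the delete to force the worst-case interleaving deterministically. Clearing RIT before deleting the znode is only one way to avoid the warning and is not necessarily what HBASE-4308.patch does (the patch introduces a flag that Ted suggests naming `openedNodeDeleted`). None of the class or method names are HBase APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the HBASE-4308 race; none of these names are
// HBase APIs. The "watcher" fires synchronously right after the znode delete to
// reproduce the worst-case interleaving deterministically.
public class OpenedNodeOrderingSketch {
  // Stand-in for AssignmentManager's regions-in-transition map.
  final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
  // Stand-in for the unassigned znodes in ZooKeeper.
  final Map<String, String> znodes = new ConcurrentHashMap<>();
  int warnings = 0;

  void seed(String region) {
    regionsInTransition.put(region, "OPEN");
    znodes.put(region, "RS_ZK_REGION_OPENED");
  }

  // Stand-in for the ZK watcher's node-deleted callback.
  void onNodeDeleted(String region) {
    if (regionsInTransition.containsKey(region)) {
      warnings++; // corresponds to "Node deleted but still in RIT"
    }
  }

  private void deleteZnode(String region) {
    znodes.remove(region);
    onNodeDeleted(region); // notification arrives before the caller continues
  }

  // Ordering described in the issue: delete the znode, then clear RIT.
  void handleOpenedDeleteFirst(String region) {
    deleteZnode(region);
    regionsInTransition.remove(region);
  }

  // One possible reordering: clear RIT first, so the callback finds nothing.
  void handleOpenedRitFirst(String region) {
    regionsInTransition.remove(region);
    deleteZnode(region);
  }

  public static void main(String[] args) {
    String region = ".META.,,1.1028785192";
    OpenedNodeOrderingSketch before = new OpenedNodeOrderingSketch();
    before.seed(region);
    before.handleOpenedDeleteFirst(region);

    OpenedNodeOrderingSketch after = new OpenedNodeOrderingSketch();
    after.seed(region);
    after.handleOpenedRitFirst(region);

    System.out.println(before.warnings + " " + after.warnings); // prints "1 0"
  }
}
```

Reordering has its own trade-off: if the znode delete fails after the region has already left RIT, the master no longer tracks it, so a real fix has to decide how to handle that case.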
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-4797: --- Status: Open (was: Patch Available) [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there seem to be only about 30-odd edits per file). The region is unavailable during this time.
[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155050#comment-13155050 ] Ted Yu edited comment on HBASE-4739 at 11/22/11 6:05 PM: - This patch is not compatible: I added M_ZK_REGION_CLOSING and deleted RS_ZK_REGION_CLOSING in EventHandler.java. I have another question: can I delete the code block below in unassign(HRegionInfo region, boolean force)? {code} } catch (NotServingRegionException nsre) { LOG.info("Server " + server + " returned " + nsre + " for " + region.getEncodedName()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. } catch (Throwable t) {code} I think we should handle it via the RemoteException wrapper instead. was (Author: sunnygao): This patch is not compatible: I added M_ZK_REGION_CLOSING and deleted RS_ZK_REGION_CLOSING in EventHandler.java. I have another question: can I delete the code block below in unassign(HRegionInfo region, boolean force)? } catch (NotServingRegionException nsre) { LOG.info("Server " + server + " returned " + nsre + " for " + region.getEncodedName()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. } catch (Throwable t) I think we should handle it via the RemoteException wrapper instead.
Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155309#comment-13155309 ] Ted Yu commented on HBASE-4739: --- All the test failures were caused by 'Too many open files'. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4308: -- Status: Patch Available (was: Open) Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4308: -- Attachment: HBASE-4308_1.patch Updated patch addressing Ted's comments. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4308: -- Status: Open (was: Patch Available) Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155319#comment-13155319 ] ramkrishna.s.vasudevan commented on HBASE-4773: --- @Xufeng Have you tested the patch on a real cluster? HBaseAdmin leaks ZooKeeper connections -- Key: HBASE-4773 URL: https://issues.apache.org/jira/browse/HBASE-4773 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: gaojinchao Priority: Critical Fix For: 0.90.5 Attachments: 4773.patch When the master crashes, HBaseAdmin leaks ZooKeeper connections. I think we should close the ZK connection when MasterNotRunningException is thrown: {code} public HBaseAdmin(Configuration c) throws MasterNotRunningException, ZooKeeperConnectionException { this.conf = HBaseConfiguration.create(c); this.connection = HConnectionManager.getConnection(this.conf); this.pause = this.conf.getLong("hbase.client.pause", 1000); this.numRetries = this.conf.getInt("hbase.client.retries.number", 10); this.retryLongerMultiplier = this.conf.getInt("hbase.client.retries.longer.multiplier", 10); // we should add this code and close the zk connection try { this.connection.getMaster(); } catch (MasterNotRunningException e) { HConnectionManager.deleteConnection(conf, false); throw e; } } {code}
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155320#comment-13155320 ] Ted Yu commented on HBASE-4739: --- Nice work, Jinchao. {code} //RS_ZK_REGION_CLOSING (1), // Master adds this region as closing in ZK {code} We should say that this event is replaced by M_ZK_REGION_CLOSING. {code} // RS is already processing this region, only need update the timestamp if(t instanceof RegionAlreadyInTransitionException){ {code} I think we should place else in front of the if above. The comment should read 'need to update'. Also leave a space between if and (, and between ) and {. Please also prepare a patch for 0.90.5. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing.
I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
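The review above asks for an `else if` rather than a second independent `if`, and an earlier comment on this issue proposes relying on the RemoteException wrapper instead of a dedicated `NotServingRegionException` catch. A hypothetical Java sketch of that control flow follows. Every exception class in it is a local stand-in rather than the HBase or Hadoop class, and the three outcomes are assumptions, since the thread does not show the full `unassign()` body; keeping a `NotServingRegionException` branch at all is also an assumption, because the thread only asks whether the dedicated catch can go.

```java
// Hypothetical sketch of the catch-block shape discussed for unassign() in
// HBASE-4739: unwrap the RPC wrapper first, then use one if / else-if chain.
// All exception classes here are local stand-ins, not HBase or Hadoop classes.
public class UnassignCatchSketch {
  static class RemoteWrapperException extends Exception {
    RemoteWrapperException(Throwable cause) { super(cause); }
  }
  static class NotServingRegionException extends Exception {}
  static class RegionAlreadyInTransitionException extends Exception {}

  // Assumed outcomes, for illustration only.
  enum Action { ASSUME_STALE_MASTER_STATE, UPDATE_RIT_TIMESTAMP, OTHER }

  static Action handle(Throwable t) {
    // Stand-in for unwrapping the remote exception (Hadoop's RemoteException
    // offers unwrapRemoteException() for this); modeled here with getCause().
    if (t instanceof RemoteWrapperException && t.getCause() != null) {
      t = t.getCause();
    }
    if (t instanceof NotServingRegionException) {
      // Presume that master has stale data, e.g. the remote side just split.
      return Action.ASSUME_STALE_MASTER_STATE;
    } else if (t instanceof RegionAlreadyInTransitionException) {
      // RS is already processing this region; only need to update the timestamp.
      return Action.UPDATE_RIT_TIMESTAMP;
    }
    return Action.OTHER;
  }

  public static void main(String[] args) {
    Throwable wrapped = new RemoteWrapperException(new RegionAlreadyInTransitionException());
    System.out.println(handle(wrapped)); // prints UPDATE_RIT_TIMESTAMP
  }
}
```

In this sketch every branch returns, so the `else` is mostly about readability; in code where branches fall through, it also keeps a later check from running on an exception an earlier branch already handled.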
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155328#comment-13155328 ] ramkrishna.s.vasudevan commented on HBASE-4739: --- @Ted In 0.90.5 the CLOSING node is created by the RS. Do we need to change that behaviour as well, or will just forcefully calling unassign() do? Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155330#comment-13155330 ] Ted Yu commented on HBASE-4739: --- We will keep the RS_ZK_REGION_CLOSING event for 0.90.x. invokeUnassign() should still be called. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155346#comment-13155346 ] Ted Yu commented on HBASE-4308: --- +1 on patch v2. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Commented] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777
[ https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155360#comment-13155360 ] Todd Lipcon commented on HBASE-3952: We should put this in 0.90 before 0.90.5. Guava snuck back in as a dependency via hbase-3777 -- Key: HBASE-3952 URL: https://issues.apache.org/jira/browse/HBASE-3952 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.92.0, 0.90.5 Attachments: hcm.txt Undo it as we did in HBASE-3264.
[jira] [Updated] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777
[ https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-3952: --- Fix Version/s: 0.90.5 0.92.0 Guava snuck back in as a dependency via hbase-3777 -- Key: HBASE-3952 URL: https://issues.apache.org/jira/browse/HBASE-3952 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.92.0, 0.90.5 Attachments: hcm.txt Undo it as we did in HBASE-3264.
[jira] [Resolved] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777
[ https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HBASE-3952. Resolution: Fixed Hadoop Flags: Reviewed Committed to 0.90 branch. Guava snuck back in as a dependency via hbase-3777 -- Key: HBASE-3952 URL: https://issues.apache.org/jira/browse/HBASE-3952 Project: HBase Issue Type: Task Reporter: stack Fix For: 0.92.0, 0.90.5 Attachments: hcm.txt Undo it as we did in HBASE-3264.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155369#comment-13155369 ] Hadoop QA commented on HBASE-4797: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504777/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestInstantSchemaChange org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/336//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/336//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/336//console This message is automatically generated. [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. 
Another bug means WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up with loads of recovered.edits files (tens if not hundreds) to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file takes about a second to process, though each seems to hold only 30-odd edits. The region is unavailable during this time.
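The optimization named in the title can be sketched as a pre-replay filter. The sketch rests on two hypothetical assumptions. First, the highest sequence id in each recovered.edits file is known without reading the file (the `maxSeqId` field below). Second, the region knows the highest sequence id already flushed to its store files. How the actual patch obtains these values is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor: a recovered.edits file plus the highest sequence id it holds.
final class RecoveredEditsFile {
  final String name;
  final long maxSeqId;

  RecoveredEditsFile(String name, long maxSeqId) {
    this.name = name;
    this.maxSeqId = maxSeqId;
  }
}

final class RecoveredEditsFilter {
  /**
   * Returns only the files that still need replaying. A file whose highest sequence id
   * is <= the highest sequence id already persisted in the region's store files holds
   * nothing new, so it can be skipped without being opened. This shortens the time the
   * region is unavailable.
   */
  static List<RecoveredEditsFile> filesToReplay(List<RecoveredEditsFile> files,
                                                long regionMaxSeqId) {
    List<RecoveredEditsFile> toReplay = new ArrayList<>();
    for (RecoveredEditsFile f : files) {
      if (f.maxSeqId > regionMaxSeqId) {
        // This file may still contain some old edits; those would have to be
        // skipped edit by edit during the replay itself.
        toReplay.add(f);
      }
    }
    return toReplay;
  }
}
```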
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155383#comment-13155383 ] stack commented on HBASE-4308: -- So, we are moving the call of regionOnline out of OpenRegionHandler and up as a reaction to the delete of the znode in AM? That looks like a good change. What is odd, though, is that the log message 'Node deleted but still in RIT:' gives the impression that something is wrong, when this is now the legitimate way of onlining a region in the master. I'd suggest that we change the log message to 'Node deleted ...'. Should this test, which is in makeRegionOnline, be up in the caller? (You test SPLIT and SPLITTING in the caller; it would make the code easier to read.) {code} if (rs.getState().equals(RegionState.State.OPEN)) {code} Why don't we do rs.isOpened() instead of the above check? Call the method makeRegionOnline instead of regionOnline? This log message seems extraneous given the above logging of the delete: {code} +debugLog(regionInfo, "The znode of region " + regionInfo.getRegionNameAsString() + " has been deleted."); {code} Otherwise the patch looks good. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition.
If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
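stack's review describes an approach where the znode delete becomes the normal trigger for onlining a region. Under that approach, an OPEN entry still in RIT at that moment is expected, not a reason to warn. Below is a simplified sketch in which hypothetical RegionState and AssignmentTracker classes stand in for the master's real ones. The isOpened() helper mirrors the check stack suggests in place of getState().equals(...).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the master's region state (not HBase's class).
final class RegionState {
  enum State { OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT }

  private final State state;

  RegionState(State state) { this.state = state; }

  State getState() { return state; }

  boolean isOpened() { return state == State.OPEN; }
}

// Hypothetical, minimal model of the assignment bookkeeping.
final class AssignmentTracker {
  final Map<String, RegionState> regionsInTransition = new ConcurrentHashMap<>();
  final Map<String, String> onlineRegions = new ConcurrentHashMap<>(); // region -> server

  /** Called from the ZK watcher when a region's unassigned znode is deleted. */
  void nodeDeleted(String region, String server) {
    RegionState rs = regionsInTransition.get(region);
    if (rs == null) {
      return; // nothing in transition, nothing to do
    }
    if (rs.isOpened()) {
      // Expected path: the RS finished opening, the OPENED handler deleted the
      // znode, and this delete notification is what onlines the region.
      regionOnline(region, server);
    }
    // Other states (e.g. SPLITTING/SPLIT) are left to other handlers; not modeled here.
  }

  private void regionOnline(String region, String server) {
    // Add to the online map before removing from RIT, so a reader never sees
    // the region as neither online nor in transition.
    onlineRegions.put(region, server);
    regionsInTransition.remove(region);
  }
}
```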
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155391#comment-13155391 ] Hadoop QA commented on HBASE-4797: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504779/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestMasterFailover org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/337//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/337//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/337//console This message is automatically generated. [availability] Skip recovered.edits files with edits we know older than what region currently has - Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.94.0 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch Testing 0.92, I crashed all servers out. 
Another bug means WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up with loads of recovered.edits files (tens if not hundreds) to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file takes about a second to process, though each seems to hold only 30-odd edits. The region is unavailable during this time.
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155394#comment-13155394 ] Lars Hofhansl commented on HBASE-4838: -- Still working on this. I have narrowed down the scenario (it also happens with just 2 cells that are split), but I don't have the root cause yet. I looked through the patch multiple times and I am pretty sure the changes are correct (in the sense that they faithfully represent the patch), so there must be an assumption about some behavior in trunk that is not present in 0.92. Port 2856 (TestAcidGuarantee is failing) to 0.92 Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4838-v1.txt Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155408#comment-13155408 ] Hadoop QA commented on HBASE-4308: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504781/HBASE-4308_1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.client.TestInstantSchemaChange Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/338//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/338//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/338//console This message is automatically generated. Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4308.patch, HBASE-4308_1.patch When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. 
If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4218: --- Attachment: D447.4.patch mbautin updated the revision [jira] [HBASE-4218] Delta encoding for keys in HFile. Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan Rebased on most recent changes in trunk, fixed conflicts. There are failing unit tests, and delta compression is not yet aware of the persistent memstore TS field added in 2856. REVISION DETAIL https://reviews.facebook.net/D447 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java src/main/java/org/apache/hadoop/hbase/KeyValue.java src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BitsetKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CompressionState.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CopyKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncodedBlock.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderAlgorithms.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderToSmallBufferException.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DiffKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/FastDiffDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java src/main/java/org/apache/hadoop/hbase/io/hfile/EmptyBlockDeltaEncoder.java 
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockDeltaEncoder.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java src/main/ruby/hbase/admin.rb src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java src/test/java/org/apache/hadoop/hbase/io/deltaencoder/RedundantKVGenerator.java src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestBufferedDeltaEncoder.java src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestDeltaEncoders.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockDeltaEncoder.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java src/test/java/org/apache/hadoop/hbase/regionserver/DeltaEncodingSeekPerformance.java src/test/java/org/apache/hadoop/hbase/regionserver/DeltaEncodingUtil.java 
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Labels: compression Attachments: D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, open-source.diff A compression for keys. Keys are
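The description above is cut off, but the basic idea behind prefix compression of sorted keys can be shown generically. Consecutive keys in a block tend to share long prefixes (same row, same family). Each key can therefore be stored as the length of the prefix it shares with the previous key, plus the remaining suffix. Below is a minimal round-trip sketch of that generic technique on plain byte[] keys. It does not reproduce the format of PrefixKeyDeltaEncoder or of any other encoder listed in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Generic prefix (delta) coding of sorted keys: each key is stored as
// (length of prefix shared with the previous key, suffix length, suffix bytes).
// Lengths are plain 4-byte ints for clarity; a real encoder would use compact varints.
final class PrefixKeyCodec {
  static byte[] encode(List<byte[]> sortedKeys) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      out.writeInt(sortedKeys.size());
      byte[] prev = new byte[0];
      for (byte[] key : sortedKeys) {
        int common = commonPrefixLength(prev, key);
        out.writeInt(common);
        out.writeInt(key.length - common);
        out.write(key, common, key.length - common); // only the differing suffix
        prev = key;
      }
      out.flush();
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for in-memory streams
    }
  }

  static List<byte[]> decode(byte[] encoded) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
      int count = in.readInt();
      List<byte[]> keys = new ArrayList<>(count);
      byte[] prev = new byte[0];
      for (int i = 0; i < count; i++) {
        int common = in.readInt();
        int suffixLength = in.readInt();
        // Copy the shared prefix from the previous key, then read the suffix after it.
        byte[] key = Arrays.copyOf(prev, common + suffixLength);
        in.readFully(key, common, suffixLength);
        keys.add(key);
        prev = key;
      }
      return keys;
    } catch (IOException e) {
      throw new UncheckedIOException(e); // truncated or corrupt input
    }
  }

  private static int commonPrefixLength(byte[] a, byte[] b) {
    int limit = Math.min(a.length, b.length);
    int i = 0;
    while (i < limit && a[i] == b[i]) {
      i++;
    }
    return i;
  }
}
```

The savings come from the suffixes being much shorter than the full keys when rows and families repeat. With 4-byte lengths, as here, very short keys can actually grow, which is why compact length encodings matter in practice.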
[jira] [Commented] (HBASE-4846) HBase can't find the native lib
[ https://issues.apache.org/jira/browse/HBASE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155420#comment-13155420 ] stack commented on HBASE-4846: -- Can you make a patch, Aaron, and demonstrate that your fix works in place of what is currently there? Thanks. HBase can't find the native lib --- Key: HBASE-4846 URL: https://issues.apache.org/jira/browse/HBASE-4846 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.4 Reporter: Aaron Guo The shell script ${HBASE_HOME}/bin/hbase loads the Hadoop native lib like this:
if [ -d /usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM} ] ; then
  JAVA_LIBRARY_PATH=$(append_path ${JAVA_LIBRARY_PATH} /usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM})
fi
It should work like this:
if [ -d ${HADOOP_HOME}/lib/native/${JAVA_PLATFORM} ] ; then
  JAVA_LIBRARY_PATH=$(append_path ${JAVA_LIBRARY_PATH} ${HADOOP_HOME}/lib/native/${JAVA_PLATFORM})
fi
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155422#comment-13155422 ] stack commented on HBASE-4832: -- I can address N's comment on commit? (Removing the @Test timeout.) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. The exception is: {noformat} testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 3 milliseconds at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.init(Throwable.java:196) at java.lang.Exception.init(Exception.java:41) at java.lang.InterruptedException.init(InterruptedException.java:48) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697) at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280) at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725) at
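nkeywal suggested removing the method-level @Test timeout. That works only if the test body bounds its own waiting. Below is a minimal sketch of such a bounded wait, built around a hypothetical waitFor helper and a caller-supplied condition (for example, "the region server has aborted"). It is not the code in the attached patches.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class BoundedWait {
  /**
   * Polls {@code condition} every {@code pollMs} until it holds or {@code timeoutMs}
   * elapses. Returns whether the condition held, so the test can fail with its own
   * message instead of racing a method-level JUnit timeout of the same length.
   */
  static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false; // timed out
      }
      try {
        Thread.sleep(Math.min(pollMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        return condition.getAsBoolean();
      }
    }
    return true;
  }
}
```

With a single bounded wait per step, a failure points at the step that stalled. Two identical 30-second timeouts, by contrast, leave it unclear which one fired.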
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155426#comment-13155426 ] stack commented on HBASE-4811: -- bq. Is there a fundamental reason that HBase only supports forward Scan? Yes. All the data is sorted in one direction only, and all Scan objects are written to go in the data's 'natural' direction. There is no native support for going backwards, whether that means reading files 'backwards' or getting a reverse-sorted view of our MemStore. To make it work, you'd have to write a lot of code and you'd always be going against the grain. It used to come up now and then in the early days, but variations on the above arguments would usually settle it. If you need more detail, ask. Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.6 Reporter: John Carrino All the documentation I find about HBase says that if you want forward and reverse scans you should build 2 tables, one ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space and coding overhead (to keep them in sync) to maintain 2 tables. I assume this has been discussed before, but I can't find any discussion of it or of why it would be infeasible.
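The two-table workaround the reporter mentions stores a second copy of each row under a key that sorts in the opposite order. For fixed-length keys, one common way to build that key is to invert every byte. HBase's ascending unsigned byte order then yields a descending scan. Plain inversion does not work for variable-length keys, because a key and its own prefixes keep their relative order. Below is a minimal sketch with hypothetical helper names.

```java
// Hypothetical helpers for a "descending" copy of a table with fixed-length row keys.
final class ReverseKeys {
  /** Maps a fixed-length key to one that sorts in the opposite unsigned byte order. */
  static byte[] invert(byte[] key) {
    byte[] out = new byte[key.length];
    for (int i = 0; i < key.length; i++) {
      out[i] = (byte) (0xFF - (key[i] & 0xFF)); // same as (byte) ~key[i]
    }
    return out;
  }

  /** Unsigned lexicographic comparison, the order in which HBase sorts row keys. */
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }
}
```

The cost is the one the reporter points out: every row is written twice, and the client must keep both copies in sync.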
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155453#comment-13155453 ] stack commented on HBASE-4832: -- And N, do you want to uncomment this section now? This patch would do it.
{code}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  //sleeper.skipSleepCycle(); //will be uncommented later, see discussion in jira 4798
}
{code}
TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. The exception is: {noformat} testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 3 milliseconds at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.init(Throwable.java:196) at java.lang.Exception.init(Exception.java:41) at java.lang.InterruptedException.init(InterruptedException.java:48) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697) at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280) at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892) at
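The bug in the description comes from a mismatch of monitors. `notifyAll()` is called on the region server's own monitor, while the run loop sleeps on a different object, so nothing is woken. Below is a minimal sketch of a sleeper whose wait and wake-up share one monitor, loosely modeled on the `sleep`/`skipSleepCycle` pair named in the issue. The class name and fields are hypothetical, and this is not HBase's `Sleeper` source.

```java
// Hypothetical sleeper: wait() and notifyAll() use the same monitor, so a wake-up
// request from another thread actually reaches the sleeping thread.
final class SimpleSleeper {
  private final Object sleepLock = new Object();
  private final long periodMs;
  private boolean wakeRequested = false; // guarded by sleepLock

  SimpleSleeper(long periodMs) {
    this.periodMs = periodMs;
  }

  /** Sleeps for up to periodMs, returning early once skipSleepCycle() has been called. */
  void sleep() throws InterruptedException {
    long deadline = System.currentTimeMillis() + periodMs;
    synchronized (sleepLock) {
      while (!wakeRequested) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          break; // full period elapsed
        }
        // Releases sleepLock while waiting; the loop also absorbs spurious wake-ups.
        sleepLock.wait(remaining);
      }
      wakeRequested = false; // consume the request so the next cycle sleeps normally
    }
  }

  /** Ends the current (or next) sleep early by notifying the monitor sleep() waits on. */
  void skipSleepCycle() {
    synchronized (sleepLock) {
      wakeRequested = true;
      sleepLock.notifyAll();
    }
  }
}
```

With this shape, a `stop()` that sets `stopped = true` and then calls `skipSleepCycle()` on the sleeper used by the run loop ends the loop's sleep at once. A request made before the loop reaches `sleep()` is not lost either, because it is recorded in the flag rather than relying on the notify alone.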
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155466#comment-13155466 ] Eugene Koontz commented on HBASE-4832: -- @stack, that is fine, thanks. -Eugene TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. The exception is: {noformat} testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 30000 milliseconds
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.<init>(Throwable.java:196)
    at java.lang.Exception.<init>(Exception.java:41)
    at java.lang.InterruptedException.<init>(InterruptedException.java:48)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
    at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
    at
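The wrong-object problem in the original `stop()` can be reproduced outside HBase. The sketch below is a hypothetical, simplified model: `SimpleSleeper` and `ToyServer` are illustrative assumptions, not HBase's `Sleeper` or `HRegionServer`. The run loop waits on the sleeper's private monitor, so `notifyAll()` on the server object wakes nothing, while a `skipSleepCycle()` that notifies the monitor actually being waited on ends the sleep at once.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sleeper: waits on its own private monitor, so only a notify on
// that monitor (via skipSleepCycle) can end a sleep early.
class SimpleSleeper {
    private final Object lock = new Object();
    private boolean skip = false;

    void sleep(long periodMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
        synchronized (lock) {
            while (!skip) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) break;   // full period elapsed
                lock.wait(remainingMs);        // releases lock until notified or timed out
            }
            skip = false;                      // consume the wake-up request
        }
    }

    void skipSleepCycle() {
        synchronized (lock) {
            skip = true;
            lock.notifyAll();
        }
    }
}

// Hypothetical server whose run() loop sleeps between iterations.
class ToyServer implements Runnable {
    final SimpleSleeper sleeper = new SimpleSleeper();
    volatile boolean stopped = false;

    public void run() {
        try {
            while (!stopped) sleeper.sleep(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Mirrors the broken stop(): notifies the server's own monitor, which no thread waits on.
    void stopBroken() {
        stopped = true;
        synchronized (this) { notifyAll(); }
    }

    // Mirrors the proposed fix: wakes the sleeper that run() is blocked in.
    void stopFixed() {
        stopped = true;
        sleeper.skipSleepCycle();
    }

    // Starts run() on a thread, stops it, and returns the ms until run() exits (join capped at 2 s).
    static long timeToStop(boolean useFix) {
        ToyServer server = new ToyServer();
        Thread t = new Thread(server);
        t.start();
        try {
            Thread.sleep(100);                 // let run() enter sleep()
            long start = System.nanoTime();
            if (useFix) server.stopFixed(); else server.stopBroken();
            t.join(2_000);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            t.interrupt();                     // unblocks the broken case so the thread ends
            t.join();
            return elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

On a typical machine `ToyServer.timeToStop(true)` returns within a few milliseconds, while `ToyServer.timeToStop(false)` only returns once the 2 s join cap expires, because nothing ever notifies the monitor the run loop is waiting on.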
[jira] [Created] (HBASE-4850) hbase tests need to be made Hadoop version agnostic
hbase tests need to be made Hadoop version agnostic --- Key: HBASE-4850 URL: https://issues.apache.org/jira/browse/HBASE-4850 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.92.0, 0.94.0, 0.92.1 Reporter: Roman Shaposhnik Currently it is possible to have a single hbase jar that can work with multiple versions of Hadoop. It would be nice if hbase-test.jar also followed suit. For now I'm aware of the following problems (but there could be more): 1. org.apache.hadoop.hbase.mapreduce.NMapInputFormat is failing because org.apache.hadoop.mapreduce.JobContext is either a class or an interface depending on which version of Hadoop you compile it against. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
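A call site compiled against a type that is a class emits `invokevirtual`; if that type is an interface at runtime, the JVM throws `IncompatibleClassChangeError`. One common workaround is to avoid compiling a direct call and invoke the method reflectively. The sketch below is hypothetical: `ContextAsClass`, `ContextAsInterface`, and `getJobName()` stand in for the two shapes of `JobContext` and are not the Hadoop API.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for the two shapes the same type takes across versions.
class ContextAsClass {
    public String getJobName() { return "job-from-class"; }
}

interface ContextAsInterface {
    String getJobName();
}

final class VersionAgnostic {
    private VersionAgnostic() {}

    // Looks the method up on whichever type is present at runtime, so no
    // invokevirtual/invokeinterface choice is baked in at compile time.
    static String jobName(Class<?> declaredType, Object context) {
        try {
            Method m = declaredType.getMethod("getJobName");
            return (String) m.invoke(context);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("getJobName() is not callable on " + declaredType, e);
        }
    }

    // Reports which shape the type has in the loaded version.
    static String kind(Class<?> type) {
        return type.isInterface() ? "interface" : "class";
    }
}
```

In real code the `Class` object would typically come from `Class.forName(...)` on the fully qualified name, and the `Method` lookup would be cached rather than repeated on every call.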
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155468#comment-13155468 ] Eugene Koontz commented on HBASE-4832: -- @stack, I tried with sleeper.skipSleepCycle() uncommented and commented; test consistently succeeded 30+ iterations in both cases.
[jira] [Created] (HBASE-4851) hadoop maven dependency needs to be an optional one
hadoop maven dependency needs to be an optional one --- Key: HBASE-4851 URL: https://issues.apache.org/jira/browse/HBASE-4851 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.92.0, 0.94.0, 0.92.1 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Given that HBase 0.92/0.94 is likely to be used with at least 3 different versions of Hadoop (0.20, 0.22 and 0.23) it seems appropriate to make hadoop maven dependencies into optional ones (IOW, the build of HBase will see NO changes in behavior, but any component that has HBase as a dependency will be in control of what version of Hadoop gets used).
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated HBASE-4832: - Attachment: HBASE-4832.patch - Removes the 30-second timeout from @Test, per nkeywal's suggestion. - Adds a LOG.debug() call noting where the interrupt occurs.
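Once the `@Test` timeout is gone, the test has to bound its own waiting, as nkeywal noted. The sketch below shows one way to do that. It assumes a hypothetical `WaitUtil.waitFor` helper that is not part of the patch, and it uses `java.util.logging` in place of HBase's logger. The helper polls a condition until a deadline and logs whether it gave up or was interrupted.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

// Hypothetical helper: a bounded wait the test controls itself, instead of a JUnit timeout.
final class WaitUtil {
    private static final Logger LOG = Logger.getLogger(WaitUtil.class.getName());

    private WaitUtil() {}

    // Polls `condition` every intervalMs until it holds or timeoutMs elapses.
    // Returns true if the condition became true; false on timeout or interrupt.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                LOG.fine("waitFor: gave up after " + timeoutMs + " ms");
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                LOG.fine("waitFor: interrupted while polling");  // records where the interrupt hit
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With a single bounded wait like this, a failure reports the condition that never became true rather than a generic "test timed out", which avoids the ambiguity nkeywal pointed out between two equal 30-second limits.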
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated HBASE-4832: - Attachment: HBASE-4832.patch git diff --no-prefix
[jira] [Updated] (HBASE-4850) hbase tests need to be made Hadoop version agnostic
[ https://issues.apache.org/jira/browse/HBASE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4850: - Priority: Critical (was: Major)
[jira] [Commented] (HBASE-3307) Add checkAndPut to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155503#comment-13155503 ] Bryce Allen commented on HBASE-3307: checkAndPut/checkAndDelete are among the key features that differentiate HBase from other key-value stores. It's a shame that they are only available to Java clients, since they are missing from both the Thrift and REST APIs. Add checkAndPut to the Thrift API - Key: HBASE-3307 URL: https://issues.apache.org/jira/browse/HBASE-3307 Project: HBase Issue Type: New Feature Components: thrift Affects Versions: 0.89.20100924 Reporter: Chris Tarnas Priority: Minor It would be very useful to have the checkAndPut method available via the Thrift API. This would both allow easier atomic updates and cut at least one Thrift round trip from quite a few common tasks.
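The requested operation compares one cell against an expected value and writes only if it matches, all as one atomic step. A client therefore needs a single call instead of a get followed by a put. The sketch below is a hypothetical in-memory model: `ToyTable` is not the HBase or Thrift API. As in HBase's Java client, a `null` expected value means "the cell must be absent".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory table used only to illustrate check-and-put semantics.
final class ToyTable {
    // Cell coordinates "row/family:qualifier" -> value
    private final Map<String, byte[]> cells = new HashMap<>();

    private static String cell(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }

    synchronized byte[] get(String row, String family, String qualifier) {
        return cells.get(cell(row, family, qualifier));
    }

    // Atomically: if the cell holds `expected` (null = cell must be absent),
    // replace it with `newValue` and return true; otherwise change nothing.
    synchronized boolean checkAndPut(String row, String family, String qualifier,
                                     byte[] expected, byte[] newValue) {
        String key = cell(row, family, qualifier);
        byte[] current = cells.get(key);
        boolean matches = (expected == null) ? current == null : Arrays.equals(current, expected);
        if (matches) {
            cells.put(key, newValue);
        }
        return matches;
    }
}
```

Because the compare and the write happen under one lock, two clients racing to move a cell from `v1` to different values cannot both succeed. A separate get and put over Thrift gives no such guarantee.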
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155506#comment-13155506 ] John Carrino commented on HBASE-4811: - Yeah, I'm not that familiar with the codebase, but I'd assume that in order to get forward scans you'd have to have the data sorted. From what I understand, it is stored internally as SSTables or HFiles. If it is sorted so you can scan in one direction, it seems pretty easy to go in the other direction. LevelDB uses SSTables and supports reverse ranges. The only thing I can think of in the design (at a high level) that might make reverse ranges difficult is splitting ranges when moving them from one region server to another. From a quick look at the MemStore you mention, it uses a KeyValueSkipListSet under the covers, which is a NavigableSet and supports descendingSet and descendingIterator. -jc Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.6 Reporter: John Carrino All the documentation I find about HBase says that if you want forward and reverse scans you should just build two tables, one ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? Supporting two tables seems like a lot of extra space overhead and coding overhead (to keep them in sync). I am assuming this has been discussed before, but I can't find any discussion of it or of why it would be infeasible.
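The observation about `NavigableSet` can be shown directly: the same skip-list-backed sorted set serves a forward range and, through `descendingSet()`, the reverse of that range with no second copy of the data. The sketch below is a hypothetical toy (`SortedRows` is not MemStore). It only models the in-memory part and says nothing about reverse scans across HFiles or region boundaries, which the comment identifies as the harder problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sorted row-key store backed by a skip list.
final class SortedRows {
    private final ConcurrentSkipListSet<String> rows = new ConcurrentSkipListSet<>();

    void add(String row) { rows.add(row); }

    // Forward scan over [startRow, stopRow).
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subSet(startRow, true, stopRow, false));
    }

    // Reverse scan over the same range: a descending view of the same data.
    List<String> reverseScan(String startRow, String stopRow) {
        NavigableSet<String> range = rows.subSet(startRow, true, stopRow, false);
        return new ArrayList<>(range.descendingSet());
    }
}
```

For rows `a` to `d`, `scan("b", "d")` yields `[b, c]` and `reverseScan("b", "d")` yields `[c, b]`. Both are views over one set, which is the point the comment makes about avoiding a second table.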
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155519#comment-13155519 ] John Carrino commented on HBASE-4811: - Also, to provide some context: the table we want to scan both ways is effectively an index, which will be relatively small and which we would like to pin in memory (as much as possible). It is also likely that this will run entirely on solid-state drives, so reverse reads won't be the performance hit they would be on spinning drives.
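For the two-table workaround described in the issue, one common way to build the descending copy is to store each row key with every byte complemented. HBase orders row keys by unsigned lexicographic byte comparison, so complementing reverses that order. This is an assumption about how such a table might be keyed, not something the reporter specified. It is also only correct for fixed-length keys: a key that is a prefix of another still sorts first after inversion.

```java
import java.util.Arrays;

// Hypothetical helper: derive a descending-order row key from a fixed-length key.
final class DescendingKeys {
    private DescendingKeys() {}

    // Complements every byte (0x00 <-> 0xFF), reversing unsigned byte order.
    static byte[] invert(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) ~key[i];
        }
        return out;
    }

    // Unsigned lexicographic comparison, the order row keys are sorted in (Java 9+).
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Note the unsigned comparison: a signed one would place `0x80` before `0x02` and break the ordering both before and after inversion.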
[jira] [Commented] (HBASE-3307) Add checkAndPut to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155524#comment-13155524 ] Ted Yu commented on HBASE-3307: --- @Bryce: Have you checked out HBASE-1744 ?
[jira] [Issue Comment Edited] (HBASE-3307) Add checkAndPut to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155524#comment-13155524 ] Ted Yu edited comment on HBASE-3307 at 11/22/11 10:46 PM: -- @Bryce: Have you checked out HBASE-1744? ThriftHBaseServiceHandler implements the two APIs you mentioned. It has been integrated into HBase TRUNK. Once the feature is validated, we can backport it to the 0.92 branch. was (Author: yuzhih...@gmail.com): @Bryce: Have you checked out HBASE-1744 ?
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155543#comment-13155543 ] Jonathan Hsieh commented on HBASE-4842: --- I'll file a new issue. The main issue isn't what is returned, but when. With the first 'hbck -fix', the master makes a call to the regionserver to issue a request to open the region (which adds data to meta). This call returns right away. The next hbck call causes the master to query meta again, and the result is used to check consistency. Sometimes the new meta entries are fixed before the second hbck call is done (failing the test); sometimes they are not (not failing). The slight delay allows the open request to finish and the meta entry to be updated before the subsequent 'hbck' call. [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Affects Versions: 0.90.4, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is intermittently failing. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still being inconsistent with ZK. The RS in ZK sometimes points to the same RS, but sometimes it moves to a different one.
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155544#comment-13155544 ] Jonathan Hsieh commented on HBASE-4842: --- Also, I don't think dist log splitting has anything to do with this failure.
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1314#comment-1314 ] stack commented on HBASE-4842: -- bq. Also, I don't think dist log splitting has anything to do with this failure. True. This was my misunderstanding of a comment J-D made on IRC.
[jira] [Created] (HBASE-4852) Tests that use RegionServer.openRegion such as TestHBaseFsck#testHBaseFsck should call openRegion synchronously
Tests that use RegionServer.openRegion such as TestHBaseFsck#testHBaseFsck should call openRegion synchronously --- Key: HBASE-4852 URL: https://issues.apache.org/jira/browse/HBASE-4852 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Certain test cases like HBaseFsck#testHBaseFsck make calls to assign region servers and then read meta. The tests or hbck should be modified to make the RegionServer.openRegion call act synchronously. The main issue isn't what is returned, but when. Specifically, in HBaseFsck#testHBaseFsck, during the first 'hbck -fix' the master makes a call to the regionserver to issue an asynchronous request to open the region (which adds data to meta). The regionserver returns right away. The next hbck call causes the master to query meta again, and the result is used to check consistency. This exposes a race: sometimes the new meta entries are fixed before the second hbck call is done (failing the test), and sometimes they are not (not failing). The hack in HBASE-4842 introduces a slight delay, which usually allows the open request to finish and the meta entry to be updated before the subsequent 'hbck' call.
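The difference between the asynchronous open and the proposed synchronous one can be shown with a toy model. The names `ToyRegionServer`, `openRegionAsync`, and `openRegionSync` are hypothetical and are not HBase's `openRegion` API. The async call returns before "meta" is updated, so an immediate read races with the background work. The sync variant waits on the returned `Future`, so a read after it returns is guaranteed to see the update.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model: opening a region records its location in "meta" on a background thread.
final class ToyRegionServer implements AutoCloseable {
    final Map<String, String> meta = new ConcurrentHashMap<>();
    private final ExecutorService opener = Executors.newSingleThreadExecutor();

    // Asynchronous open: returns as soon as the request is queued. A caller that
    // reads meta right away races with the background update.
    Future<?> openRegionAsync(String region, String server) {
        return opener.submit(() -> {
            try {
                Thread.sleep(50);              // simulated open latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            meta.put(region, server);
        });
    }

    // Synchronous open: blocks until the open has finished and meta is updated.
    void openRegionSync(String region, String server) {
        try {
            openRegionAsync(region, server).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        opener.shutdownNow();
    }
}
```

Waiting on the `Future` removes the timing dependency entirely, unlike the sleep in HBASE-4842, which only makes the race less likely. Because the executor here is single-threaded, opens complete in submission order, so once a synchronous open returns, any earlier asynchronous opens have finished too.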
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155568#comment-13155568 ] jirapos...@reviews.apache.org commented on HBASE-4120: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1421/#review3444 --- initial comments. haven't had time to look through the whole diff in detail. I'm guessing this was made against 0.90? This is missing integration with a lot of new functionality added a while ago into 92. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java https://reviews.apache.org/r/1421/#comment7687 should this functionality not be disabled by default? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7696 this pull model seems a little hacky. why can't we just push this information when a region comes online on this server? we have online schema updates, so we can update the table priority with minimal downtime. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7694 this pull model seems a little hacky. why can't we just push this information when a region comes online on this server? we have online schema updates, so we can update the table priority with minimal downtime. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7688 instead of this big switch statement, why don't you add a getPriority() function to the Operation base class? All database operations (gets, puts, deletes) are supposed to override this anyway, so we can print out fingerprint and other debug information. 
You can see its use in WritableRpcEngine.java http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7689 your intent is to lower the priority of multiputs, correct? I see that HIGHEST_PRI = -10. If a MultiAction has that priority, won't it make a MultiAction<T> have a higher priority than T? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7690 again, I'm confused about why we need to contact the master to get this information. we only care about the regions that are online for this server, correct? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java https://reviews.apache.org/r/1421/#comment7691 can you explain this more? what does this mapping look like? In other words thread pri 1-10 == system pri X-Y http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java https://reviews.apache.org/r/1421/#comment7700 why are all these accessors needed? Why can't you just use if (this.queue.size() < this.capacity) queueFull.signal() http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java https://reviews.apache.org/r/1421/#comment7698 this should be signal() instead of signalAll(). You will only decrease capacity by 1, so you should only wake 1 thread. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java https://reviews.apache.org/r/1421/#comment7692 so, if you've waited too long, you'll add the Job to the queue anyways? it's not a tryAdd() then, which would imply that the action could fail. 
It's just a normal add() http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java https://reviews.apache.org/r/1421/#comment7699 there is a race condition between (size = capacity) and addLock. Shouldn't you test the condition again after getting the lock? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java https://reviews.apache.org/r/1421/#comment7701 LOG.debug - Nicolas On 2011-11-22 06:06:45, Jia Liu wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1421/ bq. --- bq. bq. (Updated 2011-11-22 06:06:45) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. Patch used for table priority alone. In this patch, not only tables can have
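Several of the review comments on PriorityJobQueue (use signal() rather than signalAll(), re-test the capacity condition after taking the lock, and make a "try" add actually able to fail) fit one standard pattern. The following is a minimal sketch of a bounded priority queue built on java.util.concurrent locks; the class and method names are illustrative and are not the patch's code. As with HIGHEST_PRI = -10 in the patch, lower numbers come out first.

```java
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded priority queue (not PriorityJobQueue itself).
// Natural ordering: the smallest priority number is taken first.
final class BoundedPriorityQueue<E extends Comparable<E>> {
    private final PriorityQueue<E> queue = new PriorityQueue<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    BoundedPriorityQueue(int capacity) {
        this.capacity = capacity;
    }

    // Blocks until there is room. The capacity test runs while holding the lock
    // and is repeated after every wakeup, so there is no check-then-act race.
    void put(E job) throws InterruptedException {
        lock.lock();
        try {
            while (queue.size() >= capacity) {
                notFull.await();
            }
            queue.add(job);
            notEmpty.signal(); // one new element: wake at most one consumer
        } finally {
            lock.unlock();
        }
    }

    // A genuine "try": returns false if the queue is still full after the timeout.
    boolean offer(E job, long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (queue.size() >= capacity) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = notFull.awaitNanos(nanos);
            }
            queue.add(job);
            notEmpty.signal();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Removes the job with the lowest priority number, blocking while empty.
    E take() throws InterruptedException {
        lock.lock();
        try {
            while (queue.isEmpty()) {
                notEmpty.await();
            }
            E head = queue.poll();
            notFull.signal(); // exactly one slot freed: wake at most one producer
            return head;
        } finally {
            lock.unlock();
        }
    }
}
```

Using two conditions lets each operation wake only the kind of waiter that can make progress, which is what makes signal() sufficient here.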
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155570#comment-13155570 ] nkeywal commented on HBASE-4832: FYI, the patch for the region server itself is in HBASE-4833; if trunk has changed, I will update the patch. TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. The exception is: {noformat} testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 30000 milliseconds at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.<init>(Throwable.java:196) at java.lang.Exception.<init>(Exception.java:41) at java.lang.InterruptedException.<init>(InterruptedException.java:48) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697) at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280) at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725) at
[jira] [Commented] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155573#comment-13155573 ] stack commented on HBASE-4811: -- I'd suggest you spend more time w/ the code base to see how much effort would be required to do a reverse scan (superficially, yes, our MemStore is a NavigableSet, but that is not what the client interacts with; ditto our sstable-like hfile thing. IIRC leveldb counsels that the reverse range goes against the grain and at a minimum is much slower than the natural scan). Support reverse Scan Key: HBASE-4811 URL: https://issues.apache.org/jira/browse/HBASE-4811 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.20.6 Reporter: John Carrino All the documentation I find about HBase says that if you want forward and reverse scans you should just build two tables, one ascending and one descending. Is there a fundamental reason that HBase only supports forward Scan? It seems like a lot of extra space overhead and coding overhead (to keep them in sync) to support two tables. I am assuming this has been discussed before, but I can't find the discussions anywhere about it or why it would be infeasible.
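The two-table workaround described in this issue needs the second table's row keys to sort in the opposite order. A common trick, sketched here under the assumption that all row keys have the same length, is to store each key with every byte complemented: HBase orders row keys by unsigned lexicographic byte comparison, and complementing maps byte value x to 255 - x, so the order flips. `DescendingKeys` is a hypothetical helper, not an HBase class.

```java
// Hypothetical helper for a "descending" copy of a table with fixed-length row keys.
final class DescendingKeys {
    private DescendingKeys() {}

    // Complements every byte (unsigned x becomes 255 - x). Applying it twice
    // returns the original key, so the same method also decodes.
    static byte[] invert(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) ~key[i];
        }
        return out;
    }

    // Unsigned lexicographic comparison, the order HBase sorts row keys in.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

A forward scan over the inverted keys then visits the original keys from largest to smallest. For variable-length keys this is not enough, since a key that is a prefix of another still sorts first after inversion; such keys would need extra handling, for example fixed-width padding.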
[jira] [Updated] (HBASE-4811) Support reverse Scan
[ https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4811: --- Comment: was deleted (was: Yeah, I'm not that familiar with the codebase, but I'd assume that in order to get forward scans you'd have to have the data sorted. And from what I understand it is internally stored as sstables or HFiles. If you have it sorted to scan in one direction, it seems pretty easy to go the other direction. LevelDb uses ssTables and supports reverse ranges. The only thing that I could think of from the design (from a high level) that might make it difficult to do reverse ranges is dealing with splitting ranges when moving ranges from one region server to another. Just from a quick look at MemStore that you mention, it uses a KeyValueSkipListSet under the covers that is a NavigableSet and supports descendingSet and descendingIterator. -jc On Tue, Nov 22, 2011 at 12:52 PM, stack (Commented) (JIRA) )
[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155576#comment-13155576 ] Hudson commented on HBASE-4848: --- Integrated in HBase-0.92 #161 (See [https://builds.apache.org/job/HBase-0.92/161/]) HBASE-4848 TestScanner failing because hostname can't be null stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: stack Assignee: stack Fix For: 0.90.5 Attachments: 4848-092.txt, 4848.txt
[jira] [Commented] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running
[ https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155575#comment-13155575 ] Hudson commented on HBASE-4849: --- Integrated in HBase-0.92 #161 (See [https://builds.apache.org/job/HBase-0.92/161/]) HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running stack : Files : * /hbase/branches/0.92/CHANGES.txt stack : Files : * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java TestCatalogTracker can fail if an existing zookeeper running Key: HBASE-4849 URL: https://issues.apache.org/jira/browse/HBASE-4849 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 4849.txt This fact sunk my attempt at building an RC. Fix.
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155574#comment-13155574 ] Hadoop QA commented on HBASE-4832: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504809/HBASE-4832.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/339//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/339//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/339//console This message is automatically generated. 
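The HRegionServer#stop fix discussed in this issue rests on a basic Java monitor rule: notifyAll() only wakes threads waiting on the same object's monitor, so notifying `this` cannot wake a thread that sleeps by waiting on some other object. The sketch below is a hypothetical stand-in for a sleeper with a skip operation; it is not HBase's actual Sleeper class, and the names are illustrative.

```java
// Hypothetical sleeper: the sleeping thread waits on this object's private
// monitor, so a wake-up must notify that same monitor.
final class WakeableSleeper {
    private final Object lock = new Object();
    private boolean skipRequested = false;

    // Sleeps for up to periodMs, returning early if skipSleepCycle() is called.
    void sleep(long periodMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + periodMs;
        synchronized (lock) {
            while (!skipRequested) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;
                }
                lock.wait(remaining); // the loop also guards against spurious wakeups
            }
            skipRequested = false; // consume the request
        }
    }

    // Wakes a thread blocked in sleep(), or makes the next sleep() return at once.
    void skipSleepCycle() {
        synchronized (lock) {
            skipRequested = true;
            lock.notifyAll();
        }
    }
}
```

In the same spirit, stop() sets its flag and then wakes the object the run loop actually sleeps on. In this sketch a skip requested before sleep() starts is remembered, so a stop() that races ahead of the sleep is not lost; that is a design choice of the sketch, not a statement about HBase's implementation.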
[jira] [Created] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
HBASE-4789 does overzealous pruning of seqids - Key: HBASE-4853 URL: https://issues.apache.org/jira/browse/HBASE-4853 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Working w/ J-D on a failing replication test turned up a hole in seqids made by the patch over in HBASE-4789. With this patch in place we see lots of instances of the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs' At a minimum, these lines need removing:
{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}
... but the above is no good without figuring out why WALs are not being rotated off.
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155578#comment-13155578 ] Jean-Daniel Cryans commented on HBASE-4853: --- An example of data loss. First, we see a flush with seqid being 1389, but the message "Why is there a raw encodedRegionName in lastSeqWritten?" means it deleted 1389. {quote} 2011-11-22 11:21:08,526 WARN [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] wal.HLog(1364): Why is there a raw encodedRegionName in lastSeqWritten? name=a9b1d96a4554f545a74142bc42f8c48e, seqid=1389 2011-11-22 11:21:08,526 INFO [RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] regionserver.HRegion(1259): Finished memstore flush of ~654.3k for region test,,1321989625594.a9b1d96a4554f545a74142bc42f8c48e. in 493ms, sequenceid=1388, compaction requested=false {quote} An HLog roll happens; it sees 0 last seq ids, so it clears all the WALs. {quote} 2011-11-22 11:21:08,932 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(582): Roll /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490, entries=41, filesize=25596. for /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989668526 2011-11-22 11:21:08,932 DEBUG [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(593): Last sequenceid written is empty. Deleting all old hlogs {quote} The last one had a max seq id of 1424, which was never flushed. 
{quote} 2011-11-22 11:21:08,967 INFO [RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(823): moving old hlog file /user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490 whose highest sequenceid is 1424 to /user/jdcryans/.oldlogs/hbasedev%2C54530%2C1321989619241.1321989667490 {quote}
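The log excerpts above show the failure chain: the region's entry was removed from lastSeqWritten, the log roller saw nothing outstanding, and a WAL whose highest seqid (1424) had never been flushed was moved to .oldlogs. The rule being violated can be modeled in a few lines. This is a simplified, hypothetical model, not HLog's actual code: each region maps to its oldest unflushed seqid, and a WAL is safe to archive only if its highest seqid is below all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the log-roll archiving decision.
final class WalArchiveModel {
    private WalArchiveModel() {}

    static final class WalFile {
        final String name;
        final long maxSeqId; // highest sequence id written to this WAL
        WalFile(String name, long maxSeqId) {
            this.name = name;
            this.maxSeqId = maxSeqId;
        }
    }

    // Returns the old WALs whose edits are all flushed. An empty map reads as
    // "nothing outstanding", so every old WAL qualifies; that is why wrongly
    // pruning an entry from the map leads to data loss.
    static List<WalFile> archivable(List<WalFile> oldWals, Map<String, Long> oldestUnflushedSeqIdByRegion) {
        long oldestOutstanding = Long.MAX_VALUE;
        for (long seqId : oldestUnflushedSeqIdByRegion.values()) {
            oldestOutstanding = Math.min(oldestOutstanding, seqId);
        }
        List<WalFile> result = new ArrayList<>();
        for (WalFile wal : oldWals) {
            if (wal.maxSeqId < oldestOutstanding) {
                result.add(wal);
            }
        }
        return result;
    }
}
```

With the region's entry present, the WAL holding seqid 1424 is kept; with the map emptied, it is archived even though its edits were never flushed.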
[jira] [Created] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
it seems that CLASSPATH elements coming from Hadoop change HBase behaviour -- Key: HBASE-4854 URL: https://issues.apache.org/jira/browse/HBASE-4854 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik It looks like HBASE-3465 introduced a slight change in behavior. The ordering of classpath elements makes the Hadoop ones go before the HBase ones, which leads to log4j properties being picked up from the wrong place, etc. It seems that the easiest way to fix that would be to revert the ordering of the classpath.
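The symptom follows from how Java resolves resources: the first classpath entry that contains a resource wins. The sketch below shows this with URLClassLoader and two temporary directories standing in for the Hadoop and HBase conf directories; the helper names are hypothetical and the code does not use HBase's scripts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo of first-match resource lookup on a classpath.
final class ClasspathOrderDemo {
    private ClasspathOrderDemo() {}

    // Returns the contents of the first log4j.properties found in dirs, searched in order.
    static String firstLog4jProperties(Path... dirs) throws IOException {
        URL[] urls = new URL[dirs.length];
        for (int i = 0; i < dirs.length; i++) {
            // For an existing directory the URI ends with '/', which URLClassLoader
            // treats as a directory entry rather than a JAR.
            urls[i] = dirs[i].toUri().toURL();
        }
        // Parent null means only the bootstrap loader is consulted first,
        // so the demo's own classpath cannot interfere.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            URL found = loader.getResource("log4j.properties");
            if (found == null) {
                return null;
            }
            try (InputStream in = found.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    // Creates a temporary directory holding a log4j.properties with the given text.
    static Path confDir(String prefix, String contents) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        Files.write(dir.resolve("log4j.properties"), contents.getBytes(StandardCharsets.UTF_8));
        return dir;
    }
}
```

With the Hadoop stand-in listed first, the loader returns its log4j.properties; swapping the order returns the HBase one, which is the behaviour the issue wants restored.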
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155584#comment-13155584 ] Ted Yu commented on HBASE-4120: --- @Nicolas: Thanks for your valuable review. This feature was developed independently of HBASE-1730/HBASE-4213. Do you think priority refreshing can be done in a separate JIRA? isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch The HBase isolation and allocation tool is designed to help users manage cluster resources among different applications and tables. When a large HBase cluster has many applications running on it, lots of problems arise. At Taobao there is a cluster that many departments use to test the performance of their HBase-based applications. With one cluster of 12 servers, only one application can run on it exclusively, and many other applications must wait until the previous test finishes. After we added an allocation management function to the cluster, applications can share the cluster and run concurrently. Also, if the test engineer wants to make sure there is no interference, he/she can move other tables out of this group. 
Within groups, we use table priority to allocate resources: when the system is busy, we can make sure high-priority tables are not affected by lower-priority tables. Different groups can have different region server configurations: groups optimized for reading can have a large block cache, and groups optimized for writing can have a large memstore. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. Git repository: https://github.com/ICT-Ope/HBase_allocation We hope our work is helpful.
[jira] [Updated] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
[ https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated HBASE-4854: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
[ https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated HBASE-4854: Attachment: HBASE-4854.patch.txt
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155596#comment-13155596 ] Lars Hofhansl commented on HBASE-4838: -- This still fails:
{noformat}
testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide)  Time elapsed: 12.233 sec  FAILURE!
java.lang.AssertionError: expected:<17576> but was:<28064>
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.failNotEquals(Assert.java:647)
  at org.junit.Assert.assertEquals(Assert.java:128)
  at org.junit.Assert.assertEquals(Assert.java:472)
  at org.junit.Assert.assertEquals(Assert.java:456)
  at org.apache.hadoop.hbase.client.TestFromClientSide.assertRowCount(TestFromClientSide.java:528)
  at org.apache.hadoop.hbase.client.TestFromClientSide.testFilterAcrossMultipleRegions(TestFromClientSide.java:436)
{noformat}
I went through the entire diff between 0.92 with this patch and trunk. There is nothing that sticks out that could be causing this. This problem can be easily reproduced by reducing the number of rows generated in HBaseTestingUtility.loadTable(...) to (note: replace 'z' with 'a' and 'b'):
{noformat}
public int loadTable(final HTable t, final byte[] f) throws IOException {
  t.setAutoFlush(false);
  byte[] k = new byte[3];
  int rowCount = 0;
  for (byte b1 = 'a'; b1 <= 'a'; b1++) {
    for (byte b2 = 'a'; b2 <= 'a'; b2++) {
      for (byte b3 = 'a'; b3 <= 'b'; b3++) {
        k[0] = b1;
        k[1] = b2;
        k[2] = b3;
        Put put = new Put(k);
        put.add(f, null, k);
        t.put(put);
        rowCount++;
      }
    }
  }
  t.flushCommits();
  return rowCount;
}
{noformat}
This will only generate two rows for this test, which makes it easier to debug. The same test then fails with:
{noformat}
java.lang.AssertionError: expected:<2> but was:<4>
...
{noformat}
Somehow it scans more rows after a split than it is supposed to, repeating some of the same rows. If somebody else wants to have a look at it, I could use another pair of eyes.
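When chasing repeated rows after a split, one invariant worth checking is how the client moves from one region to the next. The sketch below is a hypothetical model, not HBase's ClientScanner: each region serves row keys in [startKey, endKey), with "" as an unbounded end key, and the client resumes each region at the previous region's end key, so every row is returned exactly once. Java String order stands in for HBase's byte order, which is the same for ASCII keys such as the 'aaa'/'aab' rows above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical model of a client scan that walks regions in key order.
final class RegionScanModel {
    private RegionScanModel() {}

    static final class Region {
        final String startKey;
        final String endKey; // exclusive; "" means "to the end of the table"
        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Rows of region r at or after startRow.
    static List<String> scanRegion(NavigableMap<String, String> table, Region r, String startRow) {
        String from = startRow.compareTo(r.startKey) > 0 ? startRow : r.startKey;
        NavigableMap<String, String> rows = r.endKey.isEmpty()
                ? table.tailMap(from, true)
                : table.subMap(from, true, r.endKey, false);
        return new ArrayList<>(rows.keySet());
    }

    // Scans regions in order, resuming each one at the previous region's end key.
    static List<String> fullScan(NavigableMap<String, String> table, List<Region> regions) {
        List<String> result = new ArrayList<>();
        String next = "";
        for (Region r : regions) {
            result.addAll(scanRegion(table, r, next));
            if (r.endKey.isEmpty()) {
                break; // last region reached
            }
            next = r.endKey;
        }
        return result;
    }
}
```

In this model, restarting a region from its own start key instead of the previous end key (for example, against a stale pre-split boundary) re-reads rows that were already returned. That is one pattern consistent with the symptom above; whether it is what happens in the 0.92 backport is not established here.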
Port 2856 (TestAcidGuarantee is failing) to 0.92 Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4838-v1.txt Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.