[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258662#comment-13258662 ] stack commented on HBASE-5833: -- Looking at this more, it seems the actual TMF tests run pretty fast, but the test then gets stuck for a long time finishing up at the end, spending something like 10x more time finishing than was spent running the test. This long cleanup seems to be our 'timeout'. Digging into the TMF case, there are orphan logsyncer threads outstanding trying to sync a filesystem that has since been closed. This seems to be part of the issue (there are other orphan threads too: executors). TMF does HRegion.createRegions but doesn't pass in a WAL to use in creation. In this case, the method makes an individual WAL for the create to use. In TMF, and in a bunch of other tests, this hlog is never closed, leaving logsyncer threads running (and some executors). Will attach a patch that goes through all tests that use this special createRegions method (there are a bunch) and closes and cleans up their WALs. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
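The leak described in the comment above (a per-test WAL whose syncer thread outlives the test because nobody closes it) can be illustrated with a minimal sketch. This is not HBase code: `SketchWal`, `createAndForget` and `createUseAndClose` are hypothetical stand-ins invented for illustration, and the only assumption carried over from the comment is that the log owns a background thread that stops only when the log is closed.

```java
import java.io.Closeable;

final class WalCleanupSketch {

    /** Hypothetical stand-in for a WAL that owns a background syncer thread. */
    static final class SketchWal implements Closeable {
        private final Thread syncer;
        private volatile boolean closed;

        SketchWal(String name) {
            syncer = new Thread(() -> {
                while (!closed) {
                    try {
                        Thread.sleep(10L); // pretend to sync periodically
                    } catch (InterruptedException e) {
                        return; // close() interrupts the sleep
                    }
                }
            }, name + ".logSyncer");
            syncer.setDaemon(true);
            syncer.start();
        }

        boolean syncerAlive() {
            return syncer.isAlive();
        }

        @Override
        public void close() {
            closed = true;
            syncer.interrupt();
            try {
                syncer.join(); // wait until the syncer thread has really gone
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Leaky pattern: the log is created for the test and then forgotten. */
    static SketchWal createAndForget() {
        return new SketchWal("leaky");
    }

    /** Tidy pattern: whoever creates the log closes it, even if the test body throws. */
    static SketchWal createUseAndClose(Runnable testBody) {
        SketchWal wal = new SketchWal("tidy");
        try {
            testBody.run();
        } finally {
            wal.close();
        }
        return wal;
    }

    public static void main(String[] args) {
        SketchWal leaky = createAndForget();
        System.out.println("leaky syncer alive: " + leaky.syncerAlive());
        leaky.close();
        SketchWal tidy = createUseAndClose(() -> { });
        System.out.println("tidy syncer alive: " + tidy.syncerAlive());
    }
}
```

The try/finally in `createUseAndClose` is the point of the sketch: cleanup is tied to the code that created the log, so a failing test body cannot leave the syncer thread behind.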
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: closehregions.txt This patch is for 0.92 only; that's where I'm doing investigation. Doesn't apply to trunk yet. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Reopened] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-5833: -- Reopen 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258796#comment-13258796 ] stack commented on HBASE-5844: -- If we get to a count of 100, do we just continue the startup? Is that what you want?
{code}
+while (!tracker.checkIfBaseNodeAvailable() && ++count < 100) {
+  Thread.sleep(100);
+}
{code}
Be like the rest of the code regarding spaces, i.e. spaces around operators:
+if (fileName==null){
Maybe you don't need deleteMyEphemeralNodeOnDisk if you instead use http://docs.oracle.com/javase/6/docs/api/java/io/File.html#deleteOnExit() inside writeMyEphemeralNodeOnDisk? Patch looks good N. We upped the timeout because noobs would install hbase then run big mapreduce jobs w/o tuning the JVM, and so hit big GCs. We figured they'd rather have their regionserver ride over the big pauses than have it be 'sensitive' out of the box. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5844.v1.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
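The review question above (what happens once the counter reaches 100?) comes down to the loop not telling its caller whether the condition was ever met. A minimal sketch of a bounded wait that reports the outcome follows. `BoundedWaitSketch` and `waitFor` are hypothetical names, not part of the patch under review, and the supplier stands in for a check such as `tracker.checkIfBaseNodeAvailable()`.

```java
import java.util.function.BooleanSupplier;

final class BoundedWaitSketch {

    /**
     * Polls the condition up to maxAttempts times, sleeping between polls,
     * and returns whether it became true. The caller decides what a
     * timeout means instead of silently continuing.
     */
    static boolean waitFor(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check after the final sleep
    }

    public static void main(String[] args) {
        long deadline = System.currentTimeMillis() + 50L;
        boolean ready = waitFor(() -> System.currentTimeMillis() >= deadline, 100, 10L);
        System.out.println(ready ? "condition met, continuing startup" : "timed out, aborting startup");
    }
}
```

Returning a boolean keeps the retry policy and the timeout policy separate: the loop only waits, and the caller chooses between continuing, logging, or failing the startup.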
[jira] [Updated] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5621: - Attachment: hbase_5621_v4.patch Retrying Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch
[jira] [Updated] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5621: - Status: Patch Available (was: Open) Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch
[jira] [Updated] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5621: - Status: Open (was: Patch Available) Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258988#comment-13258988 ] stack commented on HBASE-5844: -- Ok on your reasoning for not using deleteOnExit. Try to have the two methods share more code, such as getting the name of the file w/ the znode name in it. Otherwise, sounds good. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5844.v1.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Commented] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258994#comment-13258994 ] stack commented on HBASE-5621: -- We seem to have run a good few fewer tests, ~880 vs. 906 or so. Does it pass all tests for you, Jimmy? Thanks. Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch
[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13258998#comment-13258998 ] stack commented on HBASE-5833: -- More digging. The newest test added here, testShouldCheckMasterFailOverWhenMETAIsInOpenedState, is a little interesting. It was added by this commit: {code} r1172063 | tedyu | 2011-09-17 13:27:00 -0700 (Sat, 17 Sep 2011) | 3 lines HBASE-4400 .META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED (Ramkrishna) {code} The test is a bunch of copy/paste confirming stuff it's not using. It then does a cluster shutdown, but does it explicitly on a cluster object and not via HBaseTestingUtility, though it subsequently starts a cluster with HBaseTestingUtility. Not using HTU for both the shutdown and the startup can leave the HTU state confused about whether there is a master available, so we just wait forever. This seems to be responsible for the case where the test would time out after 15 minutes and report no tests run and none failed. I added a timeout of 3 minutes for this test. Other interesting stuff is that this TestMasterFailover starts clusters per method but shutdown leaves some threads around. I dug in some and was able to clean up an LruBlockCache eviction thread, but others persist and would take a little more work to undo. 
They seem harmless but I'll list them anyways: {code} TestMasterFailover [JUnit] org.eclipse.jdt.internal.junit.runner.RemoteTestRunner at localhost:54811 Thread [main] (Running) Thread [ReaderThread] (Running) Thread [Thread-2] (Suspended (breakpoint at line 587 in HBaseTestingUtility)) HBaseTestingUtility.shutdownMiniCluster() line: 587 TestMasterFailover.testSimpleMasterFailover() line: 178 NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method] NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39 DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25 Method.invoke(Object, Object...) line: 597 FrameworkMethod$1.runReflectiveCall() line: 45 FrameworkMethod$1(ReflectiveCallable).run() line: 15 FrameworkMethod.invokeExplosively(Object, Object...) line: 42 InvokeMethod.evaluate() line: 20 FailOnTimeout$StatementThread.run() line: 62 Daemon Thread [Poller SunPKCS11-Darwin] (Running) Thread [pool-1-thread-1] (Running) Thread [pool-2-thread-1] (Running) Thread [pool-3-thread-1] (Running) Thread [pool-4-thread-1] (Running) Daemon Thread [LeaseChecker] (Running) Daemon Thread [RegionServer:2;192.168.1.74,54842,1335066804457.decayingSampleTick.1] (Running) Daemon Thread [Master:2;192.168.1.74,54838,1335066803952-SendThread(fe80:0:0:0:0:0:0:1%1:21818)] (Running) Daemon Thread [Master:2;192.168.1.74,54838,1335066803952-EventThread] (Running) Daemon Thread [Master:1;192.168.1.74,54836,1335066798880-EventThread] (Running) Daemon Thread [Master:1;192.168.1.74,54836,1335066798880-SendThread(localhost:21818)] (Running) /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java (Apr 21, 2012 8:53:07 PM) {code} The thread names are enhanced -- v2 of this patch -- but things like decayingSampleTick are set in a static so hard to get rid of in test setup. The SendThread/EventThread are zk client hangouts. 
Not sure what threads like pool-4-thread-1 are (I've enhanced the HTable executor to include htable in its thread names so these are identifiable going forward, but the executor above does not seem to be HTable's). 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
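Anonymous names such as pool-4-thread-1 in the dump above are what `java.util.concurrent.Executors` produces when no `ThreadFactory` is supplied. A minimal sketch of giving an executor a recognizable prefix follows; it is not the code from the patch, and `NamedThreadsSketch`, `prefixed`, `nameOfWorker` and the "htable-pool" prefix are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadsSketch {

    /** Builds a factory whose threads are named prefix-thread-N, so the owner shows up in a thread dump. */
    static ThreadFactory prefixed(String prefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-thread-" + counter.getAndIncrement());
            t.setDaemon(true); // do not keep the JVM alive if a test forgets to shut down
            return t;
        };
    }

    /** Runs one task on a prefixed single-thread executor and returns the worker's name. */
    static String nameOfWorker(String prefix) {
        ExecutorService pool = Executors.newSingleThreadExecutor(prefixed(prefix));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get(5L, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("worker did not report its name", e);
        } finally {
            pool.shutdownNow(); // always release the worker thread
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOfWorker("htable-pool"));
    }
}
```

With a prefix per owner, a leftover thread in a dump points straight at the component that failed to shut its executor down.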
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833-v2.092.txt {code} M src/main/java/org/apache/hadoop/hbase/client/HTable.java Give the htable executor an htable prefix. M src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java Add a shutdown so I can close out the eviction thread. M src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java Add hosting thread prefix to the executor made thread names. M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java Add javadoc warning regions made must be closed when done and their logs closed too to avoid leaving threads hanging. Added utility closeHRegion to answer createHRegion. M src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java Closing out masters in cluster, close out the backup masters first. M src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java Javadoc that you have to close out regions made with createNewRegion and close their log. M src/test/java/org/apache/hadoop/hbase/filter/TestColumnPrefixFilter.java M src/test/java/org/apache/hadoop/hbase/filter/TestDependentColumnFilter.java M src/test/java/org/apache/hadoop/hbase/filter/TestFilter.java M src/test/java/org/apache/hadoop/hbase/filter/TestMultipleColumnPrefixFilter.java M src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java M src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java M src/test/java/org/apache/hadoop/hbase/regionserver/TestColumnSeeking.java M src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java M src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java Do close of region and close and delete of wal log when done w/ tests. M src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java Do close of region and close and delete of wal log when done w/ tests. 
Also, some cleanup in testShouldCheckMasterFailOverWhenMETAIsInOpenedState: removed useless code, made it use HBaseTestingUtility to do the cluster shutdown so HTU is clear on the state of the test, and added a timeout. {code} 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-v2.092.txt, 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833-trunk.txt Trunk patch 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Status: Patch Available (was: Reopened) Trying trunk patch against hadoopqa. Trunk patch is not as fat as the 0.92/0.90 patch because a bunch of the fixing has been done in trunk already. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v3092.txt Running tests, I found that I'd failed to convert a method in TestHRegion. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v4092.txt Address last of Ted's comments. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259355#comment-13259355 ] stack commented on HBASE-5833: -- I ran through complete test suite twice and got the following failures (I fixed the failed TestHRegion w/ last patch). This is on 092. {code} Results : Failed tests: testGetScanner_WithOkFamilies(org.apache.hadoop.hbase.regionserver.TestHRegion): Families could not be found in Region testCyclicReplication(org.apache.hadoop.hbase.replication.TestMasterReplication): Waited too much time for put replication testLeaderSelection(org.apache.hadoop.hbase.zookeeper.TestZKLeaderManager): New leader should exist after stop Tests in error: testTableOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver): org.apache.hadoop.hbase.InvalidFamilyOperationException: Column family 'fam2' does not exist testRegionTransitionOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver): org.apache.hadoop.hbase.TableExistsException: observed_table testShouldCheckMasterFailOverWhenMETAIsInOpenedState(org.apache.hadoop.hbase.master.TestMasterFailover): test timed out after 18 milliseconds testSimplePutDelete(org.apache.hadoop.hbase.replication.TestMasterReplication): Cluster already running at /Users/stack/checkouts/hbase/target/test-data/49fb7ad7-156f-4461-964b-7bb54d70db63/dfscluster_c7c5efc0-4085-4914-9be5-c2b3e1530af9 Tests run: 1074, Failures: 3, Errors: 4, Skipped: 8 Results : Failed tests: testGetScanner_WithOkFamilies(org.apache.hadoop.hbase.regionserver.TestHRegion): Families could not be found in Region testCyclicReplication(org.apache.hadoop.hbase.replication.TestMasterReplication): Waited too much time for put replication Tests in error: org.apache.hadoop.hbase.client.TestAdmin: No server address listed in -ROOT- for region .META.,,1.1028785192 testSimplePutDelete(org.apache.hadoop.hbase.replication.TestMasterReplication): Cluster already running at 
/Users/stack/checkouts/hbase/target/test-data/fcfffe27-e2ce-4f91-84fc-bd65521b6426/dfscluster_bd55ee62-0532-40d4-86e5-487bd45d9565 Tests run: 1075, Failures: 2, Errors: 2, Skipped: 8 {code} Let me commit this on 0.92. I will then work on getting it into 0.90, since the same breakage is there, though it will take some massaging and testing to get this patch in, and ditto on 0.94 and trunk (something is up w/ TestHRegion on trunk at the moment). 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259380#comment-13259380 ] stack commented on HBASE-5833: -- Committed 5833v4092.txt to 0.92 branch. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v4090.txt Here is patch for 0.90. Ran all tests and all pass. Will commit. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Commented] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259649#comment-13259649 ] stack commented on HBASE-5857: -- @Chinna Good find. Nice use of Mockito Whitebox too. Does your test confirm the fix though? Maybe I'm not reading it right. Should it check that the regionserver instance does not have the test region in its this.regionsInTransitionInRS Map? Thanks. RIT map in RS not getting cleared while region opening -- Key: HBASE-5857 URL: https://issues.apache.org/jira/browse/HBASE-5857 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5857_0.92.patch, HBASE-5857_94.patch, HBASE-5857_trunk.patch While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. So the next time the same RS tries to open the region, it will throw RegionAlreadyInTransitionException. If we swap the two statements below, this issue will not occur.
{code}
this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true);
HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
{code}
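The ordering problem in the description above can be reproduced with a plain `ConcurrentHashMap`. The sketch below is not the region server code: `RegionsInTransitionSketch`, `openLeaky`, `openSafe` and the `Supplier` standing in for `tableDescriptors.get()` are hypothetical, and an unchecked exception is used for brevity where the real lookup may throw a checked one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class RegionsInTransitionSketch {

    final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

    /** Reported ordering: mark the region first, then do the lookup that may throw. */
    void openLeaky(String region, Supplier<String> descriptorLookup) {
        if (regionsInTransition.putIfAbsent(region, Boolean.TRUE) != null) {
            throw new IllegalStateException("already in transition: " + region);
        }
        descriptorLookup.get(); // if this throws, the entry above is never removed
    }

    /** Swapped ordering: do the fallible lookup before touching shared state. */
    void openSafe(String region, Supplier<String> descriptorLookup) {
        descriptorLookup.get(); // a failure here leaves the map untouched
        if (regionsInTransition.putIfAbsent(region, Boolean.TRUE) != null) {
            throw new IllegalStateException("already in transition: " + region);
        }
    }

    public static void main(String[] args) {
        RegionsInTransitionSketch rs = new RegionsInTransitionSketch();
        Supplier<String> failing = () -> { throw new IllegalArgumentException("descriptor lookup failed"); };
        try {
            rs.openLeaky("region-a", failing);
        } catch (IllegalArgumentException expected) {
            System.out.println("leaky open failed, stale entry: " + rs.regionsInTransition.containsKey("region-a"));
        }
        try {
            rs.openSafe("region-b", failing);
        } catch (IllegalArgumentException expected) {
            System.out.println("safe open failed, stale entry: " + rs.regionsInTransition.containsKey("region-b"));
        }
    }
}
```

Swapping the statements is the fix proposed in the issue; an alternative with the same effect would be to remove the entry in a catch or finally block when the lookup fails. A regression test along the lines stack asks for would assert that the map does not contain the region after the failed open.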
[jira] [Commented] (HBASE-5861) Hadoop 23 compile broken due to tests introduced in HBASE-5064
[ https://issues.apache.org/jira/browse/HBASE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259773#comment-13259773 ] stack commented on HBASE-5861: -- @Jon Which code breaks the build? The hbase-5064 changes have been in there for a good while now. Hadoop 23 compile broken due to tests introduced in HBASE-5064 --- Key: HBASE-5861 URL: https://issues.apache.org/jira/browse/HBASE-5861 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Priority: Blocker Fix For: 0.94.0, 0.96.0 When attempting to compile HBase 0.94rc1 against hadoop 23, I got this set of compilation error messages:
{code}
jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=23 -DskipTests
...
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 18.926s
[INFO] Finished at: Mon Apr 23 10:38:47 PDT 2012
[INFO] Final Memory: 55M/555M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[147,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[153,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[194,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[206,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[213,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[226,29] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
[ERROR] -> [Help 1]
{code}
Upon further investigation this issue is due to code introduced in HBASE-5064 and is also present in trunk.
[jira] [Commented] (HBASE-5830) Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk.
[ https://issues.apache.org/jira/browse/HBASE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259784#comment-13259784 ] stack commented on HBASE-5830: -- Will this work for both hadoop 1.0.x and hadoop 2.x, Uma? Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk. --- Key: HBASE-5830 URL: https://issues.apache.org/jira/browse/HBASE-5830 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HBASE-5830.patch
[jira] [Commented] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259788#comment-13259788 ] stack commented on HBASE-5621: -- TestHRegion has a problem in the way it's written. Will fix it in the HBASE-5833 commit that is coming up. Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch, hbase_5621_v5.patch
[jira] [Updated] (HBASE-5621) Convert admin protocol of HRegionInterface to PB
[ https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5621: - Resolution: Fixed Status: Resolved (was: Patch Available) Applied to trunk. Thanks for the patch Jimmy. Convert admin protocol of HRegionInterface to PB Key: HBASE-5621 URL: https://issues.apache.org/jira/browse/HBASE-5621 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5621_v3.patch, hbase_5621_v4.patch, hbase_5621_v4.patch, hbase_5621_v5.patch
[jira] [Commented] (HBASE-5830) Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk.
[ https://issues.apache.org/jira/browse/HBASE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259802#comment-13259802 ] stack commented on HBASE-5830: -- +1 on patch then. Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk. --- Key: HBASE-5830 URL: https://issues.apache.org/jira/browse/HBASE-5830 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.96.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HBASE-5830.patch
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259805#comment-13259805 ] stack commented on HBASE-5851: -- This is secure hbase only? We should change the subject. Thanks for digging in Jimmy. Make it a blocker too? TestProcessBasedCluster sometimes fails --- Key: HBASE-5851 URL: https://issues.apache.org/jira/browse/HBASE-5851 Project: HBase Issue Type: Test Reporter: Zhihong Yu Assignee: Jimmy Xiang TestProcessBasedCluster failed in https://builds.apache.org/job/HBase-TRUNK-security/178 Looks like cluster failed to start: {code} 2012-04-21 14:22:32,666 INFO [Thread-1] util.ProcessBasedLocalHBaseCluster(176): Waiting for HBase to startup. Retries left: 2 java.io.IOException: Giving up trying to location region in meta: thread is interrupted. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1173) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:956) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:917) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:252) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:192) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:174) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:134) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:178) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259811#comment-13259811 ] stack commented on HBASE-5844: -- Should the master write out its znode name too? If it crashes, could this code bring on the second master faster? Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5844.v1.patch, 5844.v2.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Commented] (HBASE-5856) byte - String is not consistent between HBaseAdmin and HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259817#comment-13259817 ] stack commented on HBASE-5856: -- I think regionNameStr in HRI is wrong. We should only do toBytesBinary for the cases where we are outputting in shell or in ui; toBytesBinary is for human consumption. HBase should be about undoctored bytes. byte - String is not consistent between HBaseAdmin and HRegionInfo Key: HBASE-5856 URL: https://issues.apache.org/jira/browse/HBASE-5856 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1 Reporter: binlijin
In HBaseAdmin:
public void split(final String tableNameOrRegionName) throws IOException, InterruptedException {
  split(Bytes.toBytes(tableNameOrRegionName)); // string -> byte
}
In HRegionInfo:
this.regionNameStr = Bytes.toStringBinary(this.regionName); // byte -> string
Should we use Bytes.toBytesBinary in HBaseAdmin?
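To make the mismatch raised in this issue concrete, here is a minimal, self-contained sketch. The escape and unescape helpers are simplified, hypothetical stand-ins for Bytes.toStringBinary and Bytes.toBytesBinary, not the HBase implementations; the only assumption is that non-printable bytes are rendered as \xNN. The sketch shows that a plain String-to-bytes conversion does not invert the escaped form, while a matching unescape does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-ins for an escaping encoder/decoder pair; not HBase code.
class BinaryStringRoundTrip {

  // Printable ASCII (except backslash) is kept; every other byte becomes \xNN.
  static String escape(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      int ch = b & 0xFF;
      if (ch >= 0x20 && ch < 0x7F && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }

  // Inverse of escape(): turns each \xNN sequence back into a single byte.
  static byte[] unescape(String s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
        out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
        i += 3; // skip the rest of the escape sequence
      } else {
        out.write((byte) c);
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] regionName = {'t', 0x01, 'z'};          // contains a non-printable byte
    String shown = escape(regionName);             // what a user would see and pass back in
    byte[] plain = shown.getBytes(StandardCharsets.UTF_8);
    System.out.println("plain decode matches:   " + Arrays.equals(plain, regionName));
    System.out.println("unescape decode matches: " + Arrays.equals(unescape(shown), regionName));
  }
}
```

Whichever direction is chosen, the point of the sketch is only that the encode and decode sides need to be a matching pair.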
[jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259866#comment-13259866 ] stack commented on HBASE-5833: -- Two 0.92 builds in a row just passed (previously it failed four in a row...). Of the last three 0.90 builds, two passed. Before that, four failed in a row (the fail in the middle was a timed-out TestShell). 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it locally, it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v4trunk.txt Patch for trunk. It is smaller than the 0.92 patch because some of its work is already done in trunk. One extra in this patch is a fix for TestHRegion. The test testDataCorrectnessReplayingRecoveredEdits had a flaky way of calculating which regionserver to kill, so we'd end up stuck in an endless loop because we'd want to move a region back to where it was already running. I fixed the calculation and moved this test out to a new class, TestHRegionOnCluster, because it was the only test in TestHRegion that spun up a cluster. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, 5833v4trunk.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it locally, it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v5trunk.txt What I committed to trunk. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.2 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, 5833v4trunk.txt, 5833v5trunk.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Attachment: 5833v5094.txt What I applied to 0.94 branch. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.7, 0.92.2, 0.94.1 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, 5833v4trunk.txt, 5833v5094.txt, 5833v5trunk.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Updated] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
[ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5833: - Resolution: Fixed Fix Version/s: 0.94.1 0.90.7 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.90, 0.92, 0.94, and trunk. 0.92 build has been failing pretty consistently on TestMasterFailover - Key: HBASE-5833 URL: https://issues.apache.org/jira/browse/HBASE-5833 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.90.7, 0.92.2, 0.94.1 Attachments: 5833-trunk.txt, 5833-v2.092.txt, 5833.txt, 5833v3092.txt, 5833v4090.txt, 5833v4092.txt, 5833v4trunk.txt, 5833v5094.txt, 5833v5trunk.txt, closehregions.txt Trunk seems fine but 0.92 fails on this test pretty regularly. Running it local it seems to hang for me.
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259921#comment-13259921 ] stack commented on HBASE-5844: -- @N Ok. I'll commit this, then in a new JIRA move the common code out to util or some such place. I'll handle Ted's comment on commit. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 5844.v1.patch, 5844.v2.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Resolved] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5844. -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Committed to trunk. Thanks for the patch N. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5844: - Attachment: 5844.v3.patch What I committed. It's v2 plus addressing Ted's comment. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259928#comment-13259928 ] stack commented on HBASE-5849: -- Sounds good, Enis. What should the RS do then? On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar
When launching a fresh new cluster, the master has to be started first, which might create race conditions when starting the master and the RS at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shutdown.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
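stack asks what the RS should do instead of aborting. One option, sketched here under stated assumptions and not necessarily what the eventual patch does, is for the RS to poll for the root znode a bounded number of times before giving up. The class name, method name, and the BooleanSupplier standing in for a ZooKeeper existence check are all hypothetical; no ZooKeeper API is used.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: retry an existence check instead of aborting on the first miss.
class RootZNodeWait {

  // Returns true as soon as nodeExists reports true, false once maxAttempts
  // checks have failed or the waiting thread is interrupted.
  static boolean waitForNode(BooleanSupplier nodeExists, int maxAttempts, long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (nodeExists.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis); // give the master time to create /hbase
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] checks = {0};
    // Simulated check: the node "appears" on the third poll.
    boolean found = waitForNode(() -> ++checks[0] >= 3, 5, 10L);
    System.out.println("found=" + found + " after " + checks[0] + " checks");
  }
}
```

A bounded wait keeps the existing abort behaviour as the fallback while tolerating the master and RS being started at the same time on a fresh cluster.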
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5857: - Attachment: HBASE-5857_trunk.1.patch Retry against hadoopqa RIT map in RS not getting cleared while region opening -- Key: HBASE-5857 URL: https://issues.apache.org/jira/browse/HBASE-5857 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.patch While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. The next attempt to open the region on the same RS will then throw RegionAlreadyInTransitionException. If the two statements below are swapped, this issue won't occur. {code} this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true); HTableDescriptor htd = this.tableDescriptors.get(region.getTableName()); {code}
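The reordering suggested in the description can be illustrated with a small stand-alone sketch. Everything here is a hypothetical stand-in (a String-keyed map instead of regionsInTransitionInRS, a DescriptorSource interface instead of tableDescriptors); it is not the code in the attached patches. The point is only that doing the lookup that may throw before marking the region leaves nothing to clean up on failure.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the open-region path; not HBase code.
class RegionOpenSketch {

  // Stand-in for the table descriptor lookup, which may fail with an IOException.
  interface DescriptorSource {
    String get(String tableName) throws IOException;
  }

  final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

  String openRegion(String encodedName, String tableName, DescriptorSource descriptors)
      throws IOException {
    // Swapped order: the call that can throw runs first, so a failure here
    // leaves regionsInTransition untouched.
    String descriptor = descriptors.get(tableName);
    if (regionsInTransition.putIfAbsent(encodedName, Boolean.TRUE) != null) {
      throw new IllegalStateException("region already in transition: " + encodedName);
    }
    return descriptor;
  }

  public static void main(String[] args) throws IOException {
    RegionOpenSketch rs = new RegionOpenSketch();
    try {
      rs.openRegion("abc", "t1", table -> { throw new IOException("descriptor unavailable"); });
    } catch (IOException expected) {
      System.out.println("first open failed, in transition: "
          + rs.regionsInTransition.containsKey("abc"));
    }
    rs.openRegion("abc", "t1", table -> "descriptor-for-" + table);
    System.out.println("retry succeeded, in transition: "
        + rs.regionsInTransition.containsKey("abc"));
  }
}
```

An alternative with the original order would be to remove the map entry in a catch or finally block when the lookup fails; the swap is simply the smaller change the description proposes.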
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5857: - Status: Open (was: Patch Available) RIT map in RS not getting cleared while region opening -- Key: HBASE-5857 URL: https://issues.apache.org/jira/browse/HBASE-5857 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.patch While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. The next attempt to open the region on the same RS will then throw RegionAlreadyInTransitionException. If the two statements below are swapped, this issue won't occur. {code} this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true); HTableDescriptor htd = this.tableDescriptors.get(region.getTableName()); {code}
[jira] [Commented] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259932#comment-13259932 ] stack commented on HBASE-5857: -- +1 on patch. Formatting is off but we can fix that on commit. Will wait on the next hadoopqa run to see if the previous failure was real. RIT map in RS not getting cleared while region opening -- Key: HBASE-5857 URL: https://issues.apache.org/jira/browse/HBASE-5857 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.patch While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. The next attempt to open the region on the same RS will then throw RegionAlreadyInTransitionException. If the two statements below are swapped, this issue won't occur. {code} this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true); HTableDescriptor htd = this.tableDescriptors.get(region.getTableName()); {code}
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5857: - Status: Patch Available (was: Open) RIT map in RS not getting cleared while region opening -- Key: HBASE-5857 URL: https://issues.apache.org/jira/browse/HBASE-5857 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.patch While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. The next attempt to open the region on the same RS will then throw RegionAlreadyInTransitionException. If the two statements below are swapped, this issue won't occur. {code} this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true); HTableDescriptor htd = this.tableDescriptors.get(region.getTableName()); {code}
[jira] [Commented] (HBASE-5848) Create table with EMPTY_START_ROW passed as splitKey causes the HMaster to abort
[ https://issues.apache.org/jira/browse/HBASE-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259935#comment-13259935 ] stack commented on HBASE-5848: -- @Ram This is a nit comment for future patches, not for this one. I would suggest you avoid changes like the one below going forward:
{code}
- hRegionInfos = new HRegionInfo[]{
- new HRegionInfo(hTableDescriptor.getName(), null, null)};
+ hRegionInfos = new HRegionInfo[] { new HRegionInfo(hTableDescriptor
+ .getName(), null, null) };
{code}
The replacement is harder to read w/ its break in the middle of a phrase. Patch lgtm. Lars, does this fix the issue you saw? Create table with EMPTY_START_ROW passed as splitKey causes the HMaster to abort Key: HBASE-5848 URL: https://issues.apache.org/jira/browse/HBASE-5848 Project: HBase Issue Type: Bug Components: client Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Attachments: HBASE-5848.patch A coworker of mine just had this scenario. It does not make sense to pass EMPTY_START_ROW as a splitKey (since the region with the empty start key is implicit), but it should not cause the HMaster to abort. The abort happens because it tries to bulk assign the same region twice and then runs into race conditions with ZK. The same would (presumably) happen when two identical split keys are passed, but the client blocks that. The simplest solution here is to have the client also block null or EMPTY_START_ROW passed as a split key.
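The client-side check proposed in the description could look roughly like the sketch below. It is an illustration only: the class and method names are hypothetical, an empty byte array stands in for EMPTY_START_ROW, and it is not the code in HBASE-5848.patch.

```java
import java.util.Arrays;

// Hypothetical client-side validation of split keys; not HBase code.
class SplitKeyCheck {

  // Rejects null, empty, and duplicate split keys before any request is sent.
  static void validate(byte[][] splitKeys) {
    if (splitKeys == null) {
      return; // no pre-splitting requested
    }
    for (int i = 0; i < splitKeys.length; i++) {
      byte[] key = splitKeys[i];
      if (key == null || key.length == 0) {
        // The region with the empty start key is implicit, so this key is meaningless.
        throw new IllegalArgumentException("split key " + i + " is null or empty");
      }
      for (int j = 0; j < i; j++) {
        if (Arrays.equals(splitKeys[j], key)) {
          throw new IllegalArgumentException("split keys " + j + " and " + i + " are identical");
        }
      }
    }
  }

  public static void main(String[] args) {
    validate(new byte[][] {{'a'}, {'m'}}); // accepted
    try {
      validate(new byte[][] {{'a'}, {}});  // empty key, rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing fast on the client keeps the master from ever attempting to assign the same region twice.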
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5849: - Status: Patch Available (was: Open) On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5849_v1.patch
When launching a fresh new cluster, the master has to be started first, which might create race conditions when starting the master and the RS at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shutdown.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259937#comment-13259937 ] stack commented on HBASE-5849: -- Patch lgtm. On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5849_v1.patch
When launching a fresh new cluster, the master has to be started first, which might create race conditions when starting the master and the RS at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shutdown.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
[jira] [Commented] (HBASE-5861) Hadoop 23 compile broken due to tests introduced in HBASE-5064
[ https://issues.apache.org/jira/browse/HBASE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259950#comment-13259950 ]

stack commented on HBASE-5861:
------------------------------

@Jon Good stuff. Looks like our Andrew saw this before we did over in HBASE-5807: JobContext and TaskAttemptContext are only interfaces in 0.23+. TestHLogRecordReader should be reimplemented with a different approach. I'll mark that issue as a dup of this one.

Hadoop 23 compile broken due to tests introduced in HBASE-5064
--------------------------------------------------------------

Key: HBASE-5861
URL: https://issues.apache.org/jira/browse/HBASE-5861
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Blocker
Fix For: 0.94.0, 0.96.0

When attempting to compile HBase 0.94rc1 against hadoop 23, I got this set of compilation error messages:

{code}
jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=23 -DskipTests
...
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 18.926s
[INFO] Finished at: Mon Apr 23 10:38:47 PDT 2012
[INFO] Final Memory: 55M/555M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[147,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[153,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[194,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[206,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[213,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[226,29] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
[ERROR] -> [Help 1]
{code}

Upon further investigation this issue is due to code introduced in HBASE-5064 and is also present in trunk.
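The "is abstract; cannot be instantiated" errors above arise because the test calls `new JobContext(...)` directly, which only compiles where `JobContext` is a concrete class. One common way to stay source-compatible with both Hadoop lines is to pick the concrete type reflectively at run time. The following is a minimal sketch of that idea only, not the fix that was committed for this issue; the helper name `FirstAvailable` is hypothetical, and the assumption is that a test would pass candidates such as `org.apache.hadoop.mapreduce.task.JobContextImpl` (0.23+) followed by `org.apache.hadoop.mapreduce.JobContext` (1.x).

```java
import java.lang.reflect.Constructor;

/**
 * Hypothetical helper: tries candidate class names in order and returns an
 * instance of the first one that can actually be constructed.
 */
final class FirstAvailable {
    private FirstAvailable() {}

    static Object newInstance(String[] candidates, Class<?>[] paramTypes, Object... args) {
        for (String name : candidates) {
            try {
                Constructor<?> ctor = Class.forName(name).getConstructor(paramTypes);
                return ctor.newInstance(args);
            } catch (ReflectiveOperationException e) {
                // The class is missing, has no matching public constructor
                // (for example, it is an interface on this classpath), or is
                // abstract: fall through and try the next candidate.
            }
        }
        throw new IllegalStateException("none of the candidate classes could be instantiated");
    }
}
```

Because the concrete type is resolved by name, the calling test compiles against either Hadoop version; the cost is that a wrong class name or constructor signature only shows up at run time.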
[jira] [Resolved] (HBASE-5807) TestHLogRecordReader does not compile against Hadoop 2
[ https://issues.apache.org/jira/browse/HBASE-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-5807.
--------------------------

Resolution: Duplicate

Marking as dup of HBASE-5861 (You saw it first, Andrew, but Jon is going to work on this over in 5861.)

TestHLogRecordReader does not compile against Hadoop 2
------------------------------------------------------

Key: HBASE-5807
URL: https://issues.apache.org/jira/browse/HBASE-5807
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Andrew Purtell

See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-23/144/console

I see this trying to compile against branch 2 also.

{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:[1333,35] cannot find symbol
[ERROR] symbol : method getJobTracker()
[ERROR] location: class org.apache.hadoop.mapred.MiniMRCluster.JobTrackerRunner
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[147,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[153,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[194,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[206,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[213,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated
[ERROR]
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-TRUNK-on-Hadoop-23/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[226,29] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
{noformat}
[jira] [Updated] (HBASE-5830) Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk.
[ https://issues.apache.org/jira/browse/HBASE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5830:
-------------------------

Resolution: Fixed
Fix Version/s: 0.96.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks for the patch, Uma.

Cleanup SequenceFileLogWriter to use syncFs api from SequenceFile#Writer directly in trunk.
--------------------------------------------------------------------------------------------

Key: HBASE-5830
URL: https://issues.apache.org/jira/browse/HBASE-5830
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Fix For: 0.96.0
Attachments: HBASE-5830.patch
[jira] [Commented] (HBASE-5652) [findbugs] Fix lock release on all paths
[ https://issues.apache.org/jira/browse/HBASE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259957#comment-13259957 ]

stack commented on HBASE-5652:
------------------------------

I ran TestSplitLogManager ten times on trunk and it passed each time. FYI.

[findbugs] Fix lock release on all paths
----------------------------------------

Key: HBASE-5652
URL: https://issues.apache.org/jira/browse/HBASE-5652
Project: HBase
Issue Type: Sub-task
Components: scripts
Reporter: Jonathan Hsieh
Assignee: Gregory Chanan
Attachments: HBASE-5652-v0.patch, HBASE-5652-v1.patch

See https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html#Warnings_MT_CORRECTNESS Category UL
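For readers unfamiliar with the findbugs UL category: it flags code paths on which a `java.util.concurrent` lock may be acquired but never released, typically when an exception escapes between `lock()` and `unlock()`. The usual remedy is to release in a `finally` block. The sketch below illustrates only that general pattern; the class `GuardedCounter` is hypothetical and is not taken from the attached patches.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical example of releasing a lock on every path, including exceptions. */
final class GuardedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    int incrementOrFail(boolean fail) {
        lock.lock();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure while holding the lock");
            }
            return ++value;
        } finally {
            // Runs on both the normal return and the exception path.
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Acquiring the lock immediately before the `try` keeps `unlock()` from running when `lock()` itself did not succeed.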
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5857:
-------------------------

Attachment: HBASE-5857_trunk.2.patch

This is the version I applied. Same as trunk.1.patch but w/ the tabs replaced by two spaces.

RIT map in RS not getting cleared while region opening
------------------------------------------------------

Key: HBASE-5857
URL: https://issues.apache.org/jira/browse/HBASE-5857
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.2.patch, HBASE-5857_trunk.patch

While opening a region in the RS, if tableDescriptors.get() throws an exception after the region has been added to regionsInTransitionInRS, the region won't be cleared from regionsInTransitionInRS. The next attempt to open the region on the same RS will then throw RegionAlreadyInTransitionException. Swapping the two statements below avoids the issue.

{code}
this.regionsInTransitionInRS.putIfAbsent(region.getEncodedNameAsBytes(),true);
HTableDescriptor htd = this.tableDescriptors.get(region.getTableName());
{code}
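The ordering problem described in this issue can be reproduced outside HBase. The following is a simplified, self-contained sketch under stated assumptions: `RitSketch`, its method names, and the `String` keys are hypothetical stand-ins for the region server code, and the `Function` argument stands in for `tableDescriptors.get()`. It contrasts marking the region before the lookup (a failed lookup leaves a stale marker) with the swapped order suggested in the description.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical stand-in for the region server's in-transition bookkeeping. */
final class RitSketch {
    final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

    /** Original order: mark first, then run the lookup that may throw. */
    String openMarkFirst(String region, Function<String, String> descriptorLookup) {
        if (regionsInTransition.putIfAbsent(region, true) != null) {
            throw new IllegalStateException(region + " is already in transition");
        }
        // If this throws, the marker added above is never removed.
        return descriptorLookup.apply(region);
    }

    /** Swapped order: look up first, mark only once the lookup has succeeded. */
    String openLookupFirst(String region, Function<String, String> descriptorLookup) {
        String descriptor = descriptorLookup.apply(region);
        if (regionsInTransition.putIfAbsent(region, true) != null) {
            throw new IllegalStateException(region + " is already in transition");
        }
        return descriptor;
    }
}
```

An alternative with the same effect would be to keep the original order and remove the marker in a `catch` or `finally` block when the lookup fails; which of the two the attached patches use is not stated in this message.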
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5857:
-------------------------

Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Applied to 0.92, 0.94, and to trunk. Thanks for the patch, Chinna.

RIT map in RS not getting cleared while region opening
------------------------------------------------------

Key: HBASE-5857
URL: https://issues.apache.org/jira/browse/HBASE-5857
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: 0.92.2, 0.96.0, 0.94.1
Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.2.patch, HBASE-5857_trunk.patch
[jira] [Updated] (HBASE-5857) RIT map in RS not getting cleared while region opening
[ https://issues.apache.org/jira/browse/HBASE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5857:
-------------------------

Fix Version/s: (was: 0.92.2)

NOT applied to 0.92, just to 0.94 and trunk. Patch doesn't cleanly apply on 0.92.

RIT map in RS not getting cleared while region opening
------------------------------------------------------

Key: HBASE-5857
URL: https://issues.apache.org/jira/browse/HBASE-5857
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5857_0.92.1.patch, HBASE-5857_0.92.patch, HBASE-5857_0.94.1.patch, HBASE-5857_94.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.1.patch, HBASE-5857_trunk.2.patch, HBASE-5857_trunk.patch
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260005#comment-13260005 ]

stack commented on HBASE-5699:
------------------------------

@Ted Why delete a comment, especially someone else's?

Run with 1 WAL in HRegionServer
-------------------------------

Key: HBASE-5699
URL: https://issues.apache.org/jira/browse/HBASE-5699
Project: HBase
Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260020#comment-13260020 ]

stack commented on HBASE-5699:
------------------------------

@Ted Would suggest you just leave it. When you delete, we all get a message in our mailbox about the delete transaction. Then we start to wonder...

Run with 1 WAL in HRegionServer
-------------------------------

Key: HBASE-5699
URL: https://issues.apache.org/jira/browse/HBASE-5699
Project: HBase
Issue Type: Improvement
Reporter: binlijin
Assignee: Li Pi
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260024#comment-13260024 ]

stack commented on HBASE-5851:
------------------------------

It fails for me too. Will attach the two failure types I saw.

TestProcessBasedCluster sometimes fails
---------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: hbase-5851.patch

TestProcessBasedCluster failed in https://builds.apache.org/job/HBase-TRUNK-security/178

Looks like cluster failed to start:

{code}
2012-04-21 14:22:32,666 INFO [Thread-1] util.ProcessBasedLocalHBaseCluster(176): Waiting for HBase to startup. Retries left: 2
java.io.IOException: Giving up trying to location region in meta: thread is interrupted.
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1173)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:956)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:917)
    at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:252)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:192)
    at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:174)
    at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:134)
    at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:178)
    at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56)
{code}
[jira] [Updated] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5851:
-------------------------

Attachment: metahang.txt

Here we just kept scanning meta w/ no movement beyond that.

TestProcessBasedCluster sometimes fails
---------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Updated] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5851:
-------------------------

Attachment: zkfail.txt

Here is a case of zk not being able to make a connection. Goes on and on.

TestProcessBasedCluster sometimes fails
---------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Updated] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5851:
-------------------------

Attachment: disable.txt

Patch to disable the flaky test for now. This is not a critical piece of functionality and I would like to have passing tests for a while, so I will commit this until the issue is fixed.

TestProcessBasedCluster sometimes fails
---------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: disable.txt, hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260036#comment-13260036 ]

stack commented on HBASE-5851:
------------------------------

Committed to trunk the attached disable.txt to disable the failing test.

TestProcessBasedCluster sometimes fails
---------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: disable.txt, hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Updated] (HBASE-5851) TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5851:
-------------------------

Summary: TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled (was: TestProcessBasedCluster sometimes fails)

Updated the subject.

TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
----------------------------------------------------------------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: disable.txt, hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260054#comment-13260054 ]

stack commented on HBASE-5851:
------------------------------

@Jimmy Thanks. Add the patch and we'll commit.

TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
----------------------------------------------------------------------------------------------

Key: HBASE-5851
URL: https://issues.apache.org/jira/browse/HBASE-5851
Project: HBase
Issue Type: Test
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
Attachments: disable.txt, hbase-5851.patch, metahang.txt, zkfail.txt
[jira] [Commented] (HBASE-5805) TestServerCustomProtocol failing intermittently.
[ https://issues.apache.org/jira/browse/HBASE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260055#comment-13260055 ] stack commented on HBASE-5805: -- I ran this test ten times on trunk and it passes for me -- but I've seen it fail up on jenkins. TestServerCustomProtocol failing intermittently. Key: HBASE-5805 URL: https://issues.apache.org/jira/browse/HBASE-5805 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Reporter: Uma Maheswara Rao G Attachments: TestServerCustomProtocol.log Trace: java.lang.AssertionError: Results should contain region test,ccc,1334638013935.b9d77206f6eb226928b898e66fd1d508. for row 'ccc' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.verifyRegionResults(TestServerCustomProtocol.java:363) at org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.testNullReturn(TestServerCustomProtocol.java:330) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260066#comment-13260066 ] stack commented on HBASE-5851: -- Sorry Jimmy. Missed your patch upload. Why do we need to do this now? We've not had to up to now: {code} +conf.reloadConfiguration(); {code} And here, could we not just update the conf by putting the current conf into place? {code} +// copy some important settings from configuration from this.conf +if (conf.get("hbase.rpc.engine") != null) { + confMap.put("hbase.rpc.engine", conf.get("hbase.rpc.engine")); +} {code} Would the test pass if we did not do the reload and just updated the content of confMap w/ what the current Config is? BTW, this feels like the right solution. Good stuff Jimmy. TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled -- Key: HBASE-5851 URL: https://issues.apache.org/jira/browse/HBASE-5851 Project: HBase Issue Type: Test Reporter: Zhihong Yu Assignee: Jimmy Xiang Attachments: disable.txt, hbase-5851.patch, metahang.txt, zkfail.txt TestProcessBasedCluster failed in https://builds.apache.org/job/HBase-TRUNK-security/178 Looks like cluster failed to start: {code} 2012-04-21 14:22:32,666 INFO [Thread-1] util.ProcessBasedLocalHBaseCluster(176): Waiting for HBase to startup. Retries left: 2 java.io.IOException: Giving up trying to location region in meta: thread is interrupted. 
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1173) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:956) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:917) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:252) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:192) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:174) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:134) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:178) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
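The alternative raised in the comment above (skip the reload and copy the current settings into confMap) could look roughly like the sketch below. It is only an illustration: plain `Map`s stand in for the Hadoop `Configuration` object and for `confMap`, and the class name `ConfOverrides`, the method `copySettings`, and the `KEYS_TO_COPY` list are hypothetical names that do not come from the patch. Only the key `hbase.rpc.engine` appears in the quoted diff.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConfOverrides {
    // Keys to carry over. "hbase.rpc.engine" is the only key named in the
    // quoted patch excerpt; a real list might be longer (assumption).
    static final List<String> KEYS_TO_COPY = Arrays.asList("hbase.rpc.engine");

    // Copies each listed key that is set in 'current' into 'confMap'.
    // Keys that are unset in 'current' leave 'confMap' untouched.
    static void copySettings(Map<String, String> current,
                             Map<String, String> confMap,
                             List<String> keys) {
        for (String key : keys) {
            String value = current.get(key);
            if (value != null) {
                confMap.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("hbase.rpc.engine", "example.Engine"); // placeholder value
        Map<String, String> confMap = new HashMap<>();
        copySettings(current, confMap, KEYS_TO_COPY);
        System.out.println(confMap);
    }
}
```

Whether this is enough to make the test pass without `reloadConfiguration()` is exactly the open question in the comment; the sketch only shows the copying step.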
[jira] [Commented] (HBASE-5861) Hadoop 23 compile broken due to tests introduced in HBASE-5064
[ https://issues.apache.org/jira/browse/HBASE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260137#comment-13260137 ] stack commented on HBASE-5861: -- You uploaded wrong patch boss? Hadoop 23 compile broken due to tests introduced in HBASE-5064 --- Key: HBASE-5861 URL: https://issues.apache.org/jira/browse/HBASE-5861 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5782-v3.txt, 5861.txt, hbase-5861-jon.patch, hbase-5861-v2.patch When attempting to compile HBase 0.94rc1 against hadoop 23, I got this set of compilation error messages: {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=23 -DskipTests ... [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 18.926s [INFO] Finished at: Mon Apr 23 10:38:47 PDT 2012 [INFO] Final Memory: 55M/555M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure: [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[147,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[153,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[194,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[206,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] 
/home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[213,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[226,29] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated [ERROR] - [Help 1] {code} Upon further investigation this issue is due to code introduced in HBASE-5064 and is also present in trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
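The compile errors above come from calling `new JobContext(...)` where `JobContext` cannot be instantiated under the hadoop 23 profile. One common way to stay source-compatible across Hadoop versions is to pick a concrete class by reflection instead of naming it in a `new` expression. The sketch below is an assumption about how that could be done, not the content of the attached patches: `ConcreteClassPicker` and `firstConcrete` are hypothetical names, and `org.apache.hadoop.mapreduce.task.JobContextImpl` is the concrete implementation as I understand it in newer Hadoop lines.

```java
import java.lang.reflect.Modifier;

final class ConcreteClassPicker {
    // Returns the first class in 'classNames' that is on the classpath and is
    // neither abstract nor an interface; returns null if none qualifies.
    static Class<?> firstConcrete(String... classNames) {
        for (String name : classNames) {
            try {
                Class<?> candidate = Class.forName(name);
                // Interfaces also carry the ABSTRACT modifier bit.
                if (!Modifier.isAbstract(candidate.getModifiers())) {
                    return candidate;
                }
            } catch (ClassNotFoundException e) {
                // Not available in this Hadoop version; try the next name.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Class<?> c = firstConcrete(
            "org.apache.hadoop.mapreduce.task.JobContextImpl", // assumed newer location
            "org.apache.hadoop.mapreduce.JobContext");         // concrete in older versions
        System.out.println(c == null ? "no usable JobContext on classpath" : c.getName());
    }
}
```

A test would then look up the constructor on the returned class and call `newInstance` with the configuration and job id, at the cost of losing compile-time checking of that call.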
[jira] [Commented] (HBASE-5850) Backport HBASE-5454 to 90 and 92 Refuse operations from Admin before master is initialized
[ https://issues.apache.org/jira/browse/HBASE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260138#comment-13260138 ] stack commented on HBASE-5850: -- +1 Backport HBASE-5454 to 90 and 92 Refuse operations from Admin before master is initialized --- Key: HBASE-5850 URL: https://issues.apache.org/jira/browse/HBASE-5850 Project: HBase Issue Type: Bug Reporter: xufeng Assignee: xufeng Fix For: 0.90.7, 0.92.2, 0.94.0 Attachments: 5850-trunk.txt, No_patch_90_surefire-report.html, backport-5454(createTable)-to-94.patch, backport-5454(createTable)-to-94_surefire-report.html, backport-5454(createTable)-to-trunk.patch, backport-5454(createTable)-to-trunk_surefire-report.html, backport-5454-to-90-surefire-report.html, backport-5454-to-90.patch, backport-5454-to-92.patch, backport-5454-to-92_surefire-report.html This fix is needed in 0.90 and 0.92 also. It also updates the HBASE-5454 patch, adding checkInitialized() to HMaster#createTable().
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5849: - Attachment: 5849v3.txt Enis's v2 patch with this added to end of test: {code} + @org.junit.Rule + public org.apache.hadoop.hbase.ResourceCheckerJUnitRule cu = +new org.apache.hadoop.hbase.ResourceCheckerJUnitRule(); {code} Nice test. On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch
When launching a fresh new cluster, the master has to be started first, which can create a race condition when the master and RS are started at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shut down.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5849: - Resolution: Fixed Fix Version/s: 0.94.0 0.92.2 Release Note: Rather than exit, the regionserver will now wait when the root directory in zookeeper has yet to be created. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92, 0.94, and to trunk. Thanks for the patch Enis. On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.92.2, 0.94.0 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch
When launching a fresh new cluster, the master has to be started first, which can create a race condition when the master and RS are started at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shut down.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
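The behaviour change in the release note (wait for the root znode instead of exiting) can be pictured as a bounded polling loop. This is a minimal sketch under stated assumptions, not the committed patch: `WaitForZNode` and `waitFor` are hypothetical names, and the existence check is passed in as a `BooleanSupplier` so the example runs without a ZooKeeper client. In real code the supplier would presumably wrap a ZooKeeper existence check on `/hbase`.

```java
import java.util.function.BooleanSupplier;

final class WaitForZNode {
    // Polls 'exists' until it reports true or 'timeoutMs' elapses.
    // Returns true if the node showed up, false on timeout or interruption.
    static boolean waitFor(BooleanSupplier exists, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (exists.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the master has created /hbase": true after ~200 ms.
        boolean ready = waitFor(() -> System.currentTimeMillis() - start > 200, 5000, 50);
        System.out.println(ready ? "root znode available" : "gave up waiting");
    }
}
```

A watch-based wait would avoid polling altogether; the loop is used here only because it keeps the example self-contained.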
[jira] [Commented] (HBASE-5862) After Region Close remove the Operation Metrics.
[ https://issues.apache.org/jira/browse/HBASE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260148#comment-13260148 ] stack commented on HBASE-5862: -- @Elliott What happens? The region looks like it's on a regionserver it's no longer on? The counters just don't change? What's that mean? That our per-region metrics are going to be messy if regions move? Good on you. After Region Close remove the Operation Metrics. Key: HBASE-5862 URL: https://issues.apache.org/jira/browse/HBASE-5862 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5862-0.patch If a region is closed then Hadoop metrics shouldn't still be reporting about that region.
[jira] [Commented] (HBASE-5851) TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled
[ https://issues.apache.org/jira/browse/HBASE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260193#comment-13260193 ] stack commented on HBASE-5851: -- It failed on the 9th iteration (it used to fail every other time for me, so it's an improvement). {code} --- T E S T S --- Running org.apache.hadoop.hbase.util.TestProcessBasedCluster 2012-04-23 19:38:20.210 java[97418:a003] Unable to load realm info from SCDynamicStore Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 300.829 sec FAILURE! Results : Tests in error: testProcessBasedCluster(org.apache.hadoop.hbase.util.TestProcessBasedCluster): test timed out after 30 milliseconds Tests run: 2, Failures: 0, Errors: 1, Skipped: 0 {code} There is nothing in the .out... TestProcessBasedCluster sometimes fails; currently disabled -- needs to be fixed and reenabled -- Key: HBASE-5851 URL: https://issues.apache.org/jira/browse/HBASE-5851 Project: HBase Issue Type: Test Reporter: Zhihong Yu Assignee: Jimmy Xiang Attachments: disable.txt, hbase-5851.patch, hbase-5851_v2.patch, metahang.txt, zkfail.txt TestProcessBasedCluster failed in https://builds.apache.org/job/HBase-TRUNK-security/178 Looks like cluster failed to start: {code} 2012-04-21 14:22:32,666 INFO [Thread-1] util.ProcessBasedLocalHBaseCluster(176): Waiting for HBase to startup. Retries left: 2 java.io.IOException: Giving up trying to location region in meta: thread is interrupted. 
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1173) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:956) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:917) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:252) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:192) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:174) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:134) at org.apache.hadoop.hbase.util.ProcessBasedLocalHBaseCluster.startHBase(ProcessBasedLocalHBaseCluster.java:178) at org.apache.hadoop.hbase.util.TestProcessBasedCluster.testProcessBasedCluster(TestProcessBasedCluster.java:56) {code} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5861) Hadoop 23 compile broken due to tests introduced in HBASE-5064
[ https://issues.apache.org/jira/browse/HBASE-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260209#comment-13260209 ] stack commented on HBASE-5861: -- Does HBASE-5853 need to get fixed too? It looks like a 0.23 issue also? Hadoop 23 compile broken due to tests introduced in HBASE-5064 --- Key: HBASE-5861 URL: https://issues.apache.org/jira/browse/HBASE-5861 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5861.txt, hbase-5861-jon.patch, hbase-5861-v2.patch When attempting to compile HBase 0.94rc1 against hadoop 23, I got this set of compilation error messages: {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=23 -DskipTests ... [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 18.926s [INFO] Finished at: Mon Apr 23 10:38:47 PDT 2012 [INFO] Final Memory: 55M/555M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure: [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[147,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[153,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[194,46] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[206,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] 
/home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[213,29] org.apache.hadoop.mapreduce.JobContext is abstract; cannot be instantiated [ERROR] [ERROR] /home/jon/proj/hbase-0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java:[226,29] org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated [ERROR] - [Help 1] {code} Upon further investigation this issue is due to code introduced in HBASE-5064 and is also present in trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260208#comment-13260208 ] stack commented on HBASE-5849: -- I tried it before committing and it passed then. I just tried it on trunk now: {code} --- T E S T S --- Running org.apache.hadoop.hbase.TestClusterBootOrder 2012-04-23 21:27:45.213 java[97823:d007] Unable to load realm info from SCDynamicStore Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.727 sec Results : Tests run: 2, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase --- [INFO] Tests are skipped. [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 34.313s [INFO] Finished at: Mon Apr 23 21:28:02 PDT 2012 [INFO] Final Memory: 21M/81M [INFO] {code} On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.92.2, 0.94.0 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch
When launching a fresh new cluster, the master has to be started first, which can create a race condition when the master and RS are started at the same time.
Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master
Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shut down.
- wait for the master to create znodes /hbase/master
So the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts it does not matter in which order the servers are started. So this affects only first launches.
[jira] [Commented] (HBASE-5852) Standalone HBase-0.92.1 fails to start master when coexisting with Hadoop-1.0.1, unnecessarily trying connecting to namenode.
[ https://issues.apache.org/jira/browse/HBASE-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260211#comment-13260211 ] stack commented on HBASE-5852: -- You should ask about this issue on the mailing list first. Are you sure the hadoop classpath is not being picked up by hbase when it's started? Standalone HBase-0.92.1 fails to start master when coexisting with Hadoop-1.0.1, unnecessarily trying connecting to namenode. -- Key: HBASE-5852 URL: https://issues.apache.org/jira/browse/HBASE-5852 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Environment: Scientific Linux 6.2 x86_64 with Oracle JDK 1.6 Mac OS X Lion 10.7.1 with Oracle JDK 1.6 Reporter: Jianwen WEI Labels: hadoop, hbase, newbie Original Estimate: 168h Remaining Estimate: 168h
I want to run a standalone HBase instance for development and testing, which requires no HDFS. It works well on my server. However, when I add a hadoop directory to my server, HBase seems to notice the change and tries to connect to Hadoop's namenode. The failure to connect to the Hadoop NameNode causes HBase to fail to start.
I extracted HBase-0.92.1 to my home directory: ~/hbase -> ~/hbase-0.92.1
In configuration file ~/hbase/config/hbase-site.xml, I set HBase to standalone mode and specify the data directory it uses.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/jianwen/.hbase.data</value>
  </property>
</configuration>
Then I start the HBase service with start-hbase.sh and enter the HBase shell. Tests go well. But things change when I install Hadoop on the same server. Hadoop-1.0.1 lies in my home directory too: ~/hadoop -> ~/hadoop-1.0.1
In configuration file ~/hadoop/config/core-site.xml, I set Hadoop to run in a pseudo-distributed environment and specify the data directory for HDFS.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/jianwen/.hdfs.data</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Then I format the namenode, start the hadoop service, and run some MapReduce test programs, such as Pi, grep, and so on. Hadoop works in my pseudo-distributed environment. Then I stop the hadoop service. Since I added hadoop to my home directory, HBase fails to start. The HBase log shows that HBase tries to connect to the Hadoop NameNode when starting up, then fails. That's ridiculous because HBase in standalone mode should have NOTHING to do with NameNode and HDFS.
In summary, there may be two problems:
- Standalone HBase attempts to connect to the Hadoop NameNode at startup when a hadoop directory is co-located in home.
- Although nothing in HBase's configuration files says so, HBase seems to implicitly search for a hadoop directory nearby and read its configuration, such as the NameNode address in core-site.xml. This unclear behavior confuses me a lot.
Log for standalone HBase starting up: ... 2012-04-22 11:50:41,078 DEBUG org.apache.hadoop.hbase.master.LogCleaner: Add log cleaner in chain: org.apache.hadoop.hbase.master.TimeToLiveLogCleaner 2012-04-22 11:50:41,115 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2012-04-22 11:50:41,160 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 2012-04-22 11:50:41,165 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. 
Opening the listener on 60010 2012-04-22 11:50:41,165 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 60010 webServer.getConnectors()[0].getLocalPort() returned 60010 2012-04-22 11:50:41,165 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60010 2012-04-22 11:50:41,165 INFO org.mortbay.log: jetty-6.1.26 2012-04-22 11:50:41,548 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:60010 2012-04-22 11:50:41,548 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads 2012-04-22 11:50:41,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s). 2012-04-22 11:50:42,792 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s). 2012-04-22 11:50:43,050 INFO
[jira] [Commented] (HBASE-5829) Inconsistency between the regions map and the servers map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260238#comment-13260238 ] stack commented on HBASE-5829: -- Do you have a patch for us Maryann? The first at least seems legit (For the second, there is no associated server, right?) Inconsistency between the regions map and the servers map in AssignmentManager -- Key: HBASE-5829 URL: https://issues.apache.org/jira/browse/HBASE-5829 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1 Reporter: Maryann Xue There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause balancer to offline a region from the RS that already returned NotServingRegionException at a previous offline attempt. In AssignmentManager.unassign(HRegionInfo, boolean) try { // TODO: We should consider making this look more like it does for the // region open where we catch all throwables and never abort if (serverManager.sendRegionClose(server, state.getRegion(), versionOfClosingNode)) { LOG.debug(Sent CLOSE to + server + for region + region.getRegionNameAsString()); return; } // This never happens. Currently regionserver close always return true. LOG.warn(Server + server + region CLOSE RPC returned false for + region.getRegionNameAsString()); } catch (NotServingRegionException nsre) { LOG.info(Server + server + returned + nsre + for + region.getRegionNameAsString()); // Presume that master has stale data. Presume remote side just split. // Presume that the split message when it comes in will fix up the master's // in memory cluster state. 
} catch (Throwable t) { if (t instanceof RemoteException) { t = ((RemoteException)t).unwrapRemoteException(); if (t instanceof NotServingRegionException) { if (checkIfRegionBelongsToDisabling(region)) { // Remove from the regionsinTransition map LOG.info(While trying to recover the table + region.getTableNameAsString() + to DISABLED state the region + region + was offlined but the table was in DISABLING state); synchronized (this.regionsInTransition) { this.regionsInTransition.remove(region.getEncodedName()); } // Remove from the regionsMap synchronized (this.regions) { this.regions.remove(region); } deleteClosingOrClosedNode(region); } } // RS is already processing this region, only need to update the timestamp if (t instanceof RegionAlreadyInTransitionException) { LOG.debug(update + state + the timestamp.); state.update(state.getState()); } } In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean) synchronized (this.regions) { this.regions.put(plan.getRegionInfo(), plan.getDestination()); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
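The inconsistency described in this issue arises when the region-to-server map is updated without the matching update to the server-to-regions map. One way to rule that out is to route every change through a single method that touches both maps under one lock. The sketch below illustrates that idea only; it is not AssignmentManager code and not a proposed patch. `RegionMaps`, `assign`, `unassign`, `serverOf`, and `regionsOn` are hypothetical names, and plain strings stand in for `HRegionInfo` and server identities.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RegionMaps {
    private final Map<String, String> regions = new HashMap<>();      // region -> server
    private final Map<String, Set<String>> servers = new HashMap<>(); // server -> regions

    // Records that 'region' is hosted on 'server', first dropping any
    // previous placement so the two maps never disagree.
    synchronized void assign(String region, String server) {
        unassign(region);
        regions.put(region, server);
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    // Removes 'region' from both maps; a no-op if it was not assigned.
    synchronized void unassign(String region) {
        String server = regions.remove(region);
        if (server != null) {
            Set<String> hosted = servers.get(server);
            if (hosted != null) {
                hosted.remove(region);
            }
        }
    }

    synchronized String serverOf(String region) {
        return regions.get(region);
    }

    // Returns a snapshot copy so callers cannot mutate internal state.
    synchronized Set<String> regionsOn(String server) {
        Set<String> hosted = servers.get(server);
        return hosted == null ? Collections.<String>emptySet() : new HashSet<>(hosted);
    }

    public static void main(String[] args) {
        RegionMaps maps = new RegionMaps();
        maps.assign("regionA", "rs1");
        maps.assign("regionA", "rs2"); // move: rs1 no longer lists regionA
        System.out.println(maps.regionsOn("rs1") + " " + maps.regionsOn("rs2"));
    }
}
```

With this shape, the NotServingRegionException branch and the assign path in the quoted code would each call one method rather than updating `this.regions` on its own.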
[jira] [Commented] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260240#comment-13260240 ] stack commented on HBASE-5816: -- @Ram I think the aim would be a simplification; one queue to assign from rather than from multiple. Also as is, I think state is a little distributed across multiple variables and maps. We should coalesce if possible. I think the Maryann suggestion of trying a double or triple concurrent assign in a unit test is a good start. Balancer and ServerShutdownHandler concurrently reassigning the same region --- Key: HBASE-5816 URL: https://issues.apache.org/jira/browse/HBASE-5816 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Maryann Xue Assignee: ramkrishna.s.vasudevan Priority: Critical Attachments: HBASE-5816.patch The first assign thread exits with success after updating the RegionState to PENDING_OPEN, while the second assign follows immediately into assign and fails the RegionState check in setOfflineInZooKeeper(). This causes the master to abort. In the case below, the two concurrent assigns occurred when AM tried to assign a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to assign this region (from the region plan) spontaneously. 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
(offlining) 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING) 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253 2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0 java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020] at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283) at $Proxy7.openRegion(Unknown Source) at
[jira] [Updated] (HBASE-5831) hadoopqa builds not completing
[ https://issues.apache.org/jira/browse/HBASE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5831: - Resolution: Won't Fix Status: Resolved (was: Patch Available) hadoopqa has completed a few times since the work fixing up tests and changing the client retry count so that it no longer retries 100 times but defaults to 10. Closing as won't fix. hadoopqa builds not completing -- Key: HBASE-5831 URL: https://issues.apache.org/jira/browse/HBASE-5831 Project: HBase Issue Type: Bug Components: test Reporter: stack Assignee: stack Priority: Blocker Attachments: 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.TestLoadIncrementalHFilesSplitRecovery.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt, 5831.remove.all.mapreduce.txt No test failures but build complains it has failed. trunk build seems to have the same affliction: {code} Results : Tests run: 909, Failures: 0, Errors: 0, Skipped: 9 [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 41:19.273s [INFO] Finished at: Wed Apr 18 21:54:31 UTC 2012 [INFO] Final Memory: 59M/451M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (secondPartTestsExecution) on project hbase: Failure or timeout - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException -1 overall. 
Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12523250/5811+%281%29.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: {code} It's not apparent that any particular test is not finishing.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260246#comment-13260246 ] stack commented on HBASE-5849: -- There is something wrong now. This test won't complete for me (though it has previously). I thought it was the subsequent commit: {code} r1329555 | larsh | 2012-04-23 22:12:45 -0700 (Mon, 23 Apr 2012) | 1 line Refuse operations from Admin before master is initialized - fix for all branches {code} ..that was bringing on the problem, but after removing that it's still not completing. I poked around in the debugger and was getting an NPE in reportForDuty after master came up because this.hbaseMaster was null; we were failing allocating the Interface (hard to trace because toString would throw its own exception). For now backing this out. On first cluster startup, RS aborts if root znode is not available -- Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.92.2, 0.94.0 Attachments: 5849v3.txt, HBASE-5849_v1.patch, HBASE-5849_v2.patch When launching a fresh new cluster, the master has to be started first, which might create race conditions when starting master and rs at the same time. Master startup code is something like this: - establish zk connection - create root znodes in zk (/hbase) - create ephemeral node for master /hbase/master, Region server start up code is something like this: - establish zk connection - check whether the root znode (/hbase) is there. If not, shutdown. - wait for the master to create znodes /hbase/master So, the problem is that on the very first launch of the cluster, the RS aborts on startup since the /hbase znode might not have been created yet (only the master creates it if needed). 
Since /hbase/ is not deleted on cluster shutdown, on subsequent cluster starts, it does not matter in which order the servers are started. So this affects only first launches.
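The race described above (RS checks for /hbase once and shuts down if it is missing) could in principle be avoided by polling for the root znode with a timeout instead of aborting on the first miss. The following is only a sketch of that idea, not the HBASE-5849 patch: the class name RootZNodeWait, the method waitForZNode and the BooleanSupplier standing in for the ZooKeeper existence check are all hypothetical, chosen so the example runs without a ZooKeeper dependency.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: polls an "exists" check until it
// succeeds or a timeout elapses, instead of aborting on the first miss.
class RootZNodeWait {
    /**
     * @param exists    stand-in for a ZooKeeper existence check on /hbase (assumption)
     * @param timeoutMs how long to keep polling before giving up
     * @param pollMs    sleep between checks
     * @return true if the znode was seen before the timeout, false otherwise
     */
    static boolean waitForZNode(BooleanSupplier exists, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (exists.getAsBoolean()) {
                return true;            // master has created the root znode
            }
            if (System.nanoTime() >= deadline) {
                return false;           // caller decides whether to abort
            }
            Thread.sleep(pollMs);
        }
    }
}
```

With something like this, a region server started before the master would block in waitForZNode during first launch and only abort if the master never shows up within the timeout.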
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260247#comment-13260247 ] stack commented on HBASE-5849: -- I mean, it even passed hadoopqa above, apart from my testing. Backing it out though... it's an ugly hang when it happens.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260248#comment-13260248 ] stack commented on HBASE-5849: -- So, yes, I'm seeing what Ted reports above.
[jira] [Reopened] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-5849: -- Reopening. Backing out patch.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260255#comment-13260255 ] stack commented on HBASE-5849: -- I killed all running builds in case they'd run into this hang.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260256#comment-13260256 ] stack commented on HBASE-5849: -- Enis, mind taking a look at this?
[jira] [Commented] (HBASE-5842) Passing shell commands as an argument
[ https://issues.apache.org/jira/browse/HBASE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260267#comment-13260267 ] stack commented on HBASE-5842: -- We used to have it so you could pass a formatter class that the shell would use. Default is console formatting. We used to have an html output one which was useful when you could type in shell commands on the ui and get results as an html page. Another formatter would emit results per line so they were greppable. Passing shell commands as an argument - Key: HBASE-5842 URL: https://issues.apache.org/jira/browse/HBASE-5842 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.94.0 Reporter: Harsh J Priority: Minor Many times we've required scans of .META. to analyze issues with the cluster we work on, and to have the result in a file we can pass around we usually end up doing something like: {{echo "scan '.META.'" | hbase shell > meta-scan.txt}} This could be simplified to something like the following instead, with support for a command-reading argument: {{hbase shell -c "scan '.META.'"}} [Note though: File reading is possible already, i.e. {{hbase shell file.hs}}, but then that's two steps and we usually don't keep a file around for just a meta table scan.]
[jira] [Commented] (HBASE-5847) Support createTable(splitKeys) in Thrift
[ https://issues.apache.org/jira/browse/HBASE-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260270#comment-13260270 ] stack commented on HBASE-5847: -- Does the thrift2 package support this Nicolas? Support createTable(splitKeys) in Thrift Key: HBASE-5847 URL: https://issues.apache.org/jira/browse/HBASE-5847 Project: HBase Issue Type: Improvement Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial The Thrift API does not allow a user to create a table with multiple split keys. This is needed for a handful of new internal projects that are written in PHP/C++.
[jira] [Updated] (HBASE-5677) The master never does balance because duplicate openhandled the one region
[ https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5677: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Will be fixed over in HBASE-5850. The master never does balance because duplicate openhandled the one region -- Key: HBASE-5677 URL: https://issues.apache.org/jira/browse/HBASE-5677 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Environment: 0.90 Reporter: xufeng Assignee: xufeng Fix For: 0.90.7, 0.92.2 Attachments: 5677-proposal.txt, 5677-proposal.txt, Backport-HBASE-5454-to-90.patch, Backport-HBASE-5454-to-92.patch, HBASE-5677-90-v1.patch, surefire-report_no_patched_v1.html, surefire-report_patched_v1.html If a region is assigned while the master is initializing (before processFailover runs), the open of that region is handled twice, because the unassigned node in ZooKeeper is handled again in AssignmentManager#processFailover(). This leaves the region in RIT, so the master never balances.
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260282#comment-13260282 ] stack commented on HBASE-5564: -- @Laxman Any luck? Bulkload is discarding duplicate records Key: HBASE-5564 URL: https://issues.apache.org/jira/browse/HBASE-5564 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.96.0 Environment: HBase 0.92 Reporter: Laxman Assignee: Laxman Labels: bulkloader Fix For: 0.96.0 Attachments: 5564.lint, 5564v5.txt, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.3.patch, HBASE-5564_trunk.4_final.patch, HBASE-5564_trunk.patch Duplicate records are discarded when they exist in the same input file, and more specifically when they exist in the same split. Duplicate records are considered if the records are from different splits. Version under test: HBase 0.92
[jira] [Commented] (HBASE-5732) Remove the SecureRPCEngine and merge the security-related logic in the core engine
[ https://issues.apache.org/jira/browse/HBASE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260280#comment-13260280 ] stack commented on HBASE-5732: -- @Devaraj If this patch only works against 1.0.x hadoop, what do we do when we want to run hbase 0.96 on hadoop 2.0.x? Here is some more feedback on the posted patch: It's big! It's mostly deletes and generated code though, thankfully. So, you are going to move to UGI over User? Are you going to get rid of the security dir that is at top level in hbase? Is there anything left in it after this patch? What kind of hadoop will be required? One that supports security: i.e. apache hadoop 1.0.x, 2.0.x? And then for the others? They will need to have an answer for the security methods? Just remove rather than do this commenting out: {code} -builder.setError(error != null); +//builder.setStatus( {code} This is going to make a copy of the response? {code} + token = connection.saslServer.wrap(buf.array(), + buf.arrayOffset(), buf.remaining()); {code} Do we have to? Can't we feed it out on the output stream, first the wrapping, then the response? Any tests? Good stuff. Remove the SecureRPCEngine and merge the security-related logic in the core engine -- Key: HBASE-5732 URL: https://issues.apache.org/jira/browse/HBASE-5732 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Attachments: rpcengine-merge.patch Remove the SecureRPCEngine and merge the security-related logic in the core engine. Follow up to HBASE-5727.
[jira] [Updated] (HBASE-5672) TestLruBlockCache#testBackgroundEvictionThread fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5672: - Status: Patch Available (was: Open) +1 on patch. Submitting to hadoopqa. TestLruBlockCache#testBackgroundEvictionThread fails occasionally - Key: HBASE-5672 URL: https://issues.apache.org/jira/browse/HBASE-5672 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5672.patch, HBASE-5672v2.patch We find TestLruBlockCache#testBackgroundEvictionThread fails occasionally. I think the problem is in the test case, because runEviction() only calls evictionThread.evict(): {code} public void evict() { synchronized(this) { this.notify(); // FindBugs NN_NAKED_NOTIFY } } {code} However, when TestLruBlockCache#testBackgroundEvictionThread calls evictionThread.evict(), the evictionThread may not yet have entered run(), so the notify is lost. If we run the test many times, the failure is easy to reproduce.
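A common way to avoid this kind of lost wakeup is to pair notify with a guarded condition flag, so that a request made before the thread reaches wait() is still seen. The sketch below illustrates that general technique only; it is not the HBASE-5672 patch, and EvictionWorker, its fields and awaitRuns are hypothetical names (the counter stands in for real eviction work).

```java
// Hypothetical stand-in for an eviction thread; not HBase code.
class EvictionWorker extends Thread {
    private boolean pending = false;   // guarded by "this": an evict was requested
    private boolean shutdown = false;  // guarded by "this"
    private int runs = 0;              // guarded by "this": completed evictions

    @Override
    public void run() {
        synchronized (this) {
            while (true) {
                // Wait on a condition, not on a bare notify: a request that
                // arrived before we got here is still visible in "pending".
                while (!pending && !shutdown) {
                    try {
                        wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (!pending) {
                    return;            // shutdown requested and nothing left to do
                }
                pending = false;
                runs++;                // placeholder for the real eviction work
                notifyAll();           // wake callers blocked in awaitRuns
            }
        }
    }

    synchronized void evict() {
        pending = true;
        notifyAll();
    }

    synchronized void shutdown() {
        shutdown = true;
        notifyAll();
    }

    /** Blocks until at least n evictions have completed, or the timeout passes. */
    synchronized boolean awaitRuns(int n, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (runs < n) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
                return false;
            }
            wait(left);
        }
        return true;
    }
}
```

A test written against such a worker can call evict() even before start() and then block in awaitRuns(1, ...) rather than sleeping and hoping the thread was already waiting.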
[jira] [Commented] (HBASE-5672) TestLruBlockCache#testBackgroundEvictionThread fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260307#comment-13260307 ] stack commented on HBASE-5672: -- Sorry Chunhui, we let the patch rot. Mind updating it? Thanks.
[jira] [Updated] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4393: - Resolution: Fixed Fix Version/s: 0.96.0 Release Note: Tool to check cluster. See $ ./bin/hbase org.apache.hadoop.hbase.tool.Canary -help for how to use. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Matteo. I tried it out. Does the basics. Nice. Thanks. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Matteo Bertozzi Fix For: 0.96.0 Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
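The canary loop described in the issue (list regions, do one cheap operation per region, record latency, flag failures) can be sketched roughly as below. This is not the committed Canary tool: CanarySketch, RegionProbe and sniff are hypothetical names, and the probe is an abstract callback rather than a real HBase client call so that the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a canary loop; not the HBase Canary implementation.
class CanarySketch {
    /** Stand-in for a lightweight per-region operation such as a short scan (assumption). */
    interface RegionProbe {
        void probe(String regionName) throws Exception;
    }

    /**
     * Probes each region once, in order.
     * @return region name mapped to latency in milliseconds, or -1 if the probe failed
     */
    static Map<String, Long> sniff(List<String> regions, RegionProbe probe) {
        Map<String, Long> latencies = new LinkedHashMap<>();
        for (String region : regions) {
            long start = System.nanoTime();
            try {
                probe.probe(region);
                latencies.put(region, (System.nanoTime() - start) / 1_000_000L);
            } catch (Exception e) {
                latencies.put(region, -1L);   // availability problem: alert on this
            }
        }
        return latencies;
    }
}
```

A caller would feed the resulting map into whatever metrics or alerting system is in use, treating -1 entries as unavailable regions.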
[jira] [Updated] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4393: - Fix Version/s: 0.94.0 Committed to 0.94 (thought you might like this Lars).
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260645#comment-13260645 ] stack commented on HBASE-5864: -- Good find lads. I'm not sure I follow. Is it fixable? Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.0 Attachments: HBASE-5864_test.patch Got the following stacktrace during region split.
{noformat}
2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value
java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558
 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278)
 at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285)
 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402)
 at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638)
 at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943)
 at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77)
 at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901)
{noformat}
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260651#comment-13260651 ] stack commented on HBASE-4393: -- No. This patch has a license. The failure was because of the OOME. RAT complaint is this: {code} Unapproved licenses: hs_err_pid23951.log ... {code} Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Matteo Bertozzi Fix For: 0.94.0, 0.96.0 Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
[jira] [Commented] (HBASE-5856) byte - String is not consistent between HBaseAdmin and HRegionInfo
[ https://issues.apache.org/jira/browse/HBASE-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260661#comment-13260661 ] stack commented on HBASE-5856: -- Help me understand Binlijin. This does not seem right to me but you are obviously on to something. Is the String that is being passed in here coming from the shell? Is the jruby shell doing the escaping before it passes the String to HBaseAdmin? Thanks. byte - String is not consistent between HBaseAdmin and HRegionInfo Key: HBASE-5856 URL: https://issues.apache.org/jira/browse/HBASE-5856 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.6, 0.92.1 Reporter: binlijin Attachments: HBASE-5856-0.92.patch
In HBaseAdmin:
public void split(final String tableNameOrRegionName) throws IOException, InterruptedException {
  split(Bytes.toBytes(tableNameOrRegionName)); // string -> byte
}
In HRegionInfo:
this.regionNameStr = Bytes.toStringBinary(this.regionName); // byte -> string
Should we use Bytes.toBytesBinary in HBaseAdmin?
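To make the mismatch reported in HBASE-5856 concrete, here is a self-contained sketch. It does not call HBase: `toStringBinary` and `toBytesBinary` below are simplified stand-ins for the `Bytes` methods of the same names (assumed here to escape non-printable bytes as `\xHH` and to reverse that escaping), and the exact set of characters the real methods treat as printable may differ. A plain UTF-8 encode stands in for `Bytes.toBytes(String)`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryNameSketch {
    /** Simplified stand-in: printable ASCII is kept, other bytes become \xHH. */
    static String toStringBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if (ch >= 0x20 && ch < 0x7F && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    /** Simplified stand-in: turns each \xHH escape back into a single byte. */
    static byte[] toBytesBinary(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xHH" that was just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] regionName = {'t', ',', 0x00, (byte) 0xFF};
        String shown = toStringBinary(regionName);

        // Plain encoding keeps the escape text as literal characters.
        byte[] plain = shown.getBytes(StandardCharsets.UTF_8);
        // Un-escaping recovers the original bytes.
        byte[] unescaped = toBytesBinary(shown);

        System.out.println("plain matches:     " + Arrays.equals(plain, regionName));
        System.out.println("unescaped matches: " + Arrays.equals(unescaped, regionName));
    }
}
```

Under these assumptions, a region name that went through the escaping step only round-trips if the reverse step un-escapes it. That appears to be the point behind the reporter's question about using Bytes.toBytesBinary.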
[jira] [Updated] (HBASE-5863) Improve the graceful_stop.sh CLI help (especially about reloads)
[ https://issues.apache.org/jira/browse/HBASE-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5863: - Resolution: Fixed Fix Version/s: 0.94.1 0.92.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92, 0.94 and trunk. Thanks for the patch Harsh. Improve the graceful_stop.sh CLI help (especially about reloads) Key: HBASE-5863 URL: https://issues.apache.org/jira/browse/HBASE-5863 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.94.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.92.2, 0.94.1 Attachments: HBASE-5863.patch Right now, graceful_stop.sh prints:
{code}
Usage: graceful_stop.sh [--config conf-dir] [--restart] [--reload] [--thrift] [--rest] hostname
 thrift      If we should stop/start thrift before/after the hbase stop/start
 rest        If we should stop/start rest before/after the hbase stop/start
 restart     If we should restart after graceful stop
 reload      Move offloaded regions back on to the stopped server
 debug       Move offloaded regions back on to the stopped server
 hostname    Hostname of server we are to stop
{code}
This does not help us specify that reload is actually a sub/additive-option to restart. Also, the debug line seems to still have an old copy/paste mistake. I've updated these two in the patch here.
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260672#comment-13260672 ] stack commented on HBASE-5864: -- I'm not sure I follow what your patch is doing Ram. And maybe we need a test around split of hfile? What is this doing:
{code}
-final int ENTRY_COUNT = 1;
+final int ENTRY_COUNT = 5;
{code}
This is asking for too many entries? Good stuff. Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.0 Attachments: HBASE-5864_1.patch, HBASE-5864_test.patch Got the following stacktrace during region split.
{noformat}
2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value
java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558
 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278)
 at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285)
 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402)
 at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638)
 at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943)
 at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77)
 at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901)
{noformat}
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260726#comment-13260726 ] stack commented on HBASE-4393: -- @Ted FYI, if you go under the artifacts produced by the build into the target dir, you can see the rat.txt now. That's where I got the above from. Thinking on it, I also went to change the build order so site goes first and we fail fast on a RAT problem, but it seems the build is already this way; it must run the unit tests up front anyway. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Matteo Bertozzi Fix For: 0.94.0, 0.96.0 Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
[jira] [Created] (HBASE-5866) Canary in tool package but says its in tools.
stack created HBASE-5866: Summary: Canary in tool package but says its in tools. Key: HBASE-5866 URL: https://issues.apache.org/jira/browse/HBASE-5866 Project: HBase Issue Type: Bug Reporter: stack
[jira] [Updated] (HBASE-5866) Canary in tool package but says its in tools.
[ https://issues.apache.org/jira/browse/HBASE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5866: - Attachment: 5866.txt Spotted by Lars Franke. Moved Canary into tool package. Canary in tool package but says its in tools. - Key: HBASE-5866 URL: https://issues.apache.org/jira/browse/HBASE-5866 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.94.0, 0.96.0 Attachments: 5866.txt
[jira] [Resolved] (HBASE-5866) Canary in tool package but says its in tools.
[ https://issues.apache.org/jira/browse/HBASE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5866. -- Resolution: Fixed Fix Version/s: 0.96.0 0.94.0 Assignee: stack Committed trunk and 0.94. Canary in tool package but says its in tools. - Key: HBASE-5866 URL: https://issues.apache.org/jira/browse/HBASE-5866 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.94.0, 0.96.0 Attachments: 5866.txt
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260766#comment-13260766 ] stack commented on HBASE-4393: -- Let me take care of it Lars. I should have seen that. Thanks for pointing it out. Fixed over in HBASE-5866. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Matteo Bertozzi Fix For: 0.94.0, 0.96.0 Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
[jira] [Resolved] (HBASE-5865) test-util.sh broken with unittest updates
[ https://issues.apache.org/jira/browse/HBASE-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5865. -- Resolution: Fixed Hadoop Flags: Reviewed Tried it. Seems to work. Committed trunk and 0.94. Thanks for the patch Jesse. test-util.sh broken with unittest updates - Key: HBASE-5865 URL: https://issues.apache.org/jira/browse/HBASE-5865 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0, 0.94.1 Reporter: Jesse Yates Assignee: Jesse Yates Priority: Minor Fix For: 0.96.0, 0.94.1 Attachments: sh_HBASE-5865-v0.patch Since the default maven test is meant to be run on the server, this test script always fails. Needs to take into account the location of where the script is being run as well as some debugging options for future fixes.
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260774#comment-13260774 ] stack commented on HBASE-5840: -- Patch looks good. Is it just moving all of this (too long) method into a try block and then adding a finally that calls status.abort at the end? (FYI, the status.abort line needs spacing around the operators so it's the same code style as the rest of the file.) Do you have to convert the Exception to an IOE? Why is that? What does this method let out? IOEs only? If so, why do we catch Exception? In case it's an unchecked exception? The test looks good too, but in the finally you might want to use the new HRegion.closeHRegion(region) to clean up the WAL that gets made by the constructor. Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status -- Key: HBASE-5840 URL: https://issues.apache.org/jira/browse/HBASE-5840 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5840.patch The TaskMonitor status is not cleared when a region goes to FAILED_OPEN, so it keeps showing the old status. This misleads the user.
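The shape discussed in the HBASE-5840 review (method body in a try, status aborted in a finally when the open did not complete) could be sketched as below. This is not the attached patch: `OpenRegionSketch`, `Status` and `open` are hypothetical stand-ins and use no HBase classes. Wrapping the unchecked exception in an IOException is shown only as one possible answer to the question raised in the comment, not as what the patch does.

```java
import java.io.IOException;

final class OpenRegionSketch {
    /** Hypothetical stand-in for a monitored task status (not an HBase class). */
    static final class Status {
        private String state = "RUNNING";
        private String message = "";

        void setStatus(String msg) { message = msg; }
        void markComplete(String msg) { state = "COMPLETE"; message = msg; }
        void abort(String msg) { state = "ABORTED"; message = msg; }
        String state() { return state; }
        String message() { return message; }
    }

    /** Opens a (simulated) region, never leaving the status stuck at RUNNING. */
    static void open(Status status, boolean simulateFailure) throws IOException {
        boolean opened = false;
        try {
            status.setStatus("Opening region");
            if (simulateFailure) {
                throw new IllegalStateException("simulated open failure");
            }
            status.markComplete("Region opened");
            opened = true;
        } catch (RuntimeException e) {
            // Assumption: callers are meant to see IOExceptions only.
            throw new IOException(e);
        } finally {
            if (!opened) {
                status.abort("Exception during region open");
            }
        }
    }

    public static void main(String[] args) {
        Status status = new Status();
        try {
            open(status, true);
        } catch (IOException e) {
            System.out.println("open failed, status is " + status.state());
        }
    }
}
```

The `opened` flag keeps the finally block from overwriting a successful completion, so the abort only fires on the failure path.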