[jira] [Updated] (HBASE-4722) TestGlobalMemStoreSize has started failing
[ https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4722:
    Attachment: logging-v2.txt

A bit more logging. Since I added this, I can't make it fail locally. It looks like a timing issue where we seemingly skip flushing. Still digging.

TestGlobalMemStoreSize has started failing
    Key: HBASE-4722
    URL: https://issues.apache.org/jira/browse/HBASE-4722
    Project: HBase
    Issue Type: Bug
    Reporter: stack
    Priority: Critical
    Attachments: logging-v2.txt, logging.txt

I'm digging in. It fails occasionally for me locally too.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4722) TestGlobalMemStoreSize has started failing
[ https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141948#comment-13141948 ]

stack commented on HBASE-4722:

I committed logging-v2 so we can get more info when it fails on Jenkins, since I can't make it fail locally (and I'm going to bed...).
[jira] [Commented] (HBASE-4703) Improvements in tests
[ https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141957#comment-13141957 ]

nkeywal commented on HBASE-4703:

Yes, the joins are failing, and since the main thread does not end, the JVM does not end either. I pulled trunk again and I still have the issue with the RPC version.

Improvements in tests
    Key: HBASE-4703
    URL: https://issues.apache.org/jira/browse/HBASE-4703
    Project: HBase
    Issue Type: Improvement
    Components: test
    Affects Versions: 0.92.0
    Environment: all
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Minor
    Fix For: 0.92.0
    Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch

Global:
- When possible, make the tests use the default cluster configuration for the number of regions (1 instead of 2 or 3). This allows a faster stop/start, and is a step toward a shared cluster configuration.
- 'sleep': lower or remove the sleep-based synchronisation in the tests (in HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, TestServerCustomProtocol, TestReplicationSink).
- Optimize 'put' by setting setWriteToWAL to false when the 'put' is big or in a loop. Not done for tests that rely on the WAL.

Local issues:
- TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on tearDown, which makes it impossible to run in parallel with another test using this directory.
- TestIdLock logs too much (9000 lines per second). Test time lowered to 15 seconds to make it part of the small subset.
- TestMemoryBoundedLogMessageBuffer: useless System.out.println.
- io.hfile.TestReseekTo: useless System.out.println.
- TestTableInputFormat does not shut down the cluster.
- testGlobalMemStore does not shut down the cluster.
- rest.client.TestRemoteAdmin: simplified, does not use the local admin, single test instead of two.
- HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one server; it should start the number of missing servers instead.
- TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility.
[jira] [Created] (HBASE-4724) TestAdmin hangs randomly in trunk
TestAdmin hangs randomly in trunk
    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Priority: Critical

From the logs in my env:
{noformat}
2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}
Anyway, after this the log finishes with:
{noformat}
2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
{noformat}
It's in:
{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}
So that's at least why adding a timeout won't help, and maybe why it does not end at all.
Adding a maximum number of retries to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the hang is my modification to the WAL handling, with some threads surprised to find updates that were not written to the WAL.

Here is the full stack dump:
{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
State: TIMED_WAITING
Blocked count: 360
Waited count: 359
Stack:
java.lang.Object.wait(Native Method)
org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
State: WAITING
Blocked count: 0
Waited count: 4
Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
State: RUNNABLE
Blocked count: 2
Waited count: 0
Stack:
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
State: WAITING
Blocked count: 217
Waited count: 174
Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
java.lang.Thread.run(Thread.java:662)
Thread 165 (LruBlockCache.EvictionThread):
State: WAITING
Blocked count: 0
Waited count: 1
Waiting on org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread@3e9d7b56
Stack:
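The suggestion in HBASE-4724 to cap the retries in Threads#threadDumpingIsAlive could look roughly like the sketch below. This is not the HBase code: the class BoundedJoin, the method joinWithMaxAttempts and its parameters are hypothetical names chosen for illustration, and the real method also prints a full thread dump, which is left out here. It only shows the idea of giving up on a join after a fixed number of checks instead of waiting forever.

```java
public class BoundedJoin {
    /**
     * Waits for the thread to die, checking every intervalMs milliseconds,
     * and gives up after maxAttempts checks (hypothetical helper, not HBase API).
     *
     * @return true if the thread terminated, false if we gave up
     */
    public static boolean joinWithMaxAttempts(Thread t, long intervalMs, int maxAttempts)
            throws InterruptedException {
        if (intervalMs <= 0) {
            // join(0) would wait forever, which is exactly what we want to avoid
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            t.join(intervalMs);
            if (!t.isAlive()) {
                return true;
            }
            System.err.println("Thread " + t.getName() + " still alive after attempt "
                + attempt + " of " + maxAttempts);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // A thread that never finishes on its own, standing in for the stuck master
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // interrupted by the caller below, just exit
            }
        }, "stuck-master");
        stuck.setDaemon(true);
        stuck.start();

        boolean terminated = joinWithMaxAttempts(stuck, 50, 3);
        System.out.println("terminated = " + terminated);
        stuck.interrupt();
    }
}
```

With a bound like this, a tearDown that calls the join would return after a known delay even when the master is blocked in waitForRoot, so the test could fail instead of hanging the JVM.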
[jira] [Commented] (HBASE-4703) Improvements in tests
[ https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141961#comment-13141961 ] nkeywal commented on HBASE-4703: Created HBASE-4724 to track this.
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141962#comment-13141962 ] jirapos...@reviews.apache.org commented on HBASE-4536: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2178/#review3009 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java https://reviews.apache.org/r/2178/#comment6726 Hi Lars, Isn't this early-out problematic? It doesn't take into account min-versions. It doesn't take into account the newly introduced keepDeletedCells mode. - Prakash On 2011-10-18 21:43:38, Lars Hofhansl wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2178/ bq. --- bq. bq. (Updated 2011-10-18 21:43:38) bq. bq. bq. Review request for hbase, Ted Yu and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. HBase timerange Gets and Scans allow to do timetravel in HBase. I.e. look at the state of the data at any point in the past, provided the data is still around. bq. This did not work for deletes, however. Deletes would always mask all puts in the past. bq. This change adds a flag that can be on HColumnDescriptor to enable retention of deleted rows. bq. These rows are still subject to TTL and/or VERSIONS. bq. bq. This changes the following: bq. 1. There is a new flag on HColumnDescriptor enabling that behavior. bq. 2. Allow gets/scans with a timerange to retrieve rows hidden by a delete marker, if the timerange does not include the delete marker. bq. 3. Do not unconditionally collect all deleted rows during a compaction. bq. 4. Allow a raw Scan, which retrieves all delete markers and deleted rows. bq. bq. The change is small'ish, but the logic is intricate, so please review carefully. bq. bq. bq. This addresses bug HBASE-4536. bq. https://issues.apache.org/jira/browse/HBASE-4536 bq. bq. bq. Diffs bq. - bq. bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Attributes.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1185362 bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java PRE-CREATION bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 1185362 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1185362 bq.
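The behaviour described in the HBASE-4536 summary (a timerange scan can see a put hidden by a delete marker when the timerange does not include the marker) can be illustrated with a toy model. The sketch below is not ScanQueryMatcher and does not use any HBase API; the class KeepDeletedCellsSketch and its methods are hypothetical, it models a single cell, it treats the timerange as [minTs, maxTs), and it ignores TTL, VERSIONS and min-versions entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepDeletedCellsSketch {
    // Timestamps of puts and of delete markers for one hypothetical cell
    private final List<Long> puts = new ArrayList<>();
    private final List<Long> deleteMarkers = new ArrayList<>();
    private final boolean keepDeletedCells;

    public KeepDeletedCellsSketch(boolean keepDeletedCells) {
        this.keepDeletedCells = keepDeletedCells;
    }

    public void put(long ts) {
        puts.add(ts);
    }

    public void delete(long ts) {
        deleteMarkers.add(ts);
    }

    /** Returns the put timestamps visible to a scan over [minTs, maxTs). */
    public List<Long> scan(long minTs, long maxTs) {
        List<Long> visible = new ArrayList<>();
        for (long put : puts) {
            if (put < minTs || put >= maxTs) {
                continue; // put is outside the requested timerange
            }
            boolean masked = false;
            for (long marker : deleteMarkers) {
                boolean covers = marker >= put;
                boolean markerInRange = marker >= minTs && marker < maxTs;
                // Without the flag, any covering marker masks the put.
                // With the flag, only a marker inside the timerange does.
                if (covers && (!keepDeletedCells || markerInRange)) {
                    masked = true;
                    break;
                }
            }
            if (!masked) {
                visible.add(put);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        KeepDeletedCellsSketch cell = new KeepDeletedCellsSketch(true);
        cell.put(10);
        cell.delete(20);
        System.out.println(cell.scan(0, 15)); // timerange ends before the delete marker
        System.out.println(cell.scan(0, 25)); // timerange includes the delete marker
    }
}
```

In this model the first scan returns the put at timestamp 10 and the second returns nothing; with the flag off, both scans return nothing, which matches the "deletes would always mask all puts in the past" behaviour described in the summary.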
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141971#comment-13141971 ] nkeywal commented on HBASE-4724: I reproduce the issue 100% of the time on trunk, with TestAdmin#testCreateBadTables. I was not reproducing the error at all previously.
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141979#comment-13141979 ] nkeywal commented on HBASE-4724: I believe that this log: {noformat} org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29) [...] at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1150) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1145) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1821) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:522) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309) at java.lang.Thread.run(Thread.java:662) {noformat} could explain why we have this in the final threads dump: {noformat} Thread 149 (Master:0;localhost,36968,1320216715828): State: WAITING Blocked count: 148 Waited count: 148 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@1d4f0fb4 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104) org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277) org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523) org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468) org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309) java.lang.Thread.run(Thread.java:662) {noformat}
[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
[ https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142008#comment-13142008 ]

gaojinchao commented on HBASE-4577:

The test failed, but it does not seem to be a problem with the patch.

Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
    Key: HBASE-4577
    URL: https://issues.apache.org/jira/browse/HBASE-4577
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.92.0
    Reporter: Jean-Daniel Cryans
    Assignee: gaojinchao
    Priority: Minor
    Fix For: 0.92.0
    Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch

Minor issue while looking at the RS metrics:
bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, storefileSizeMB=2420, compressionRatio=1.0008
I guess there's a truncation somewhere when it's adding the numbers up. FWIW there's no compression on that table.
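The truncation guessed at in the HBASE-4577 description can be reproduced with plain integer arithmetic. The sketch below is only an illustration of that hypothesis, not the region server code: the class MetricTruncation, both method names and the file sizes are made up. It shows that truncating each file size to whole MB before summing can give a smaller total than summing the bytes first and truncating once, even though the inputs are identical.

```java
public class MetricTruncation {
    private static final long MB = 1024L * 1024L;

    /** Truncates each size to whole MB before summing (can lose up to 1 MB per file). */
    public static long sumOfTruncatedMB(long[] sizesInBytes) {
        long total = 0;
        for (long size : sizesInBytes) {
            total += size / MB; // integer division drops the fractional MB
        }
        return total;
    }

    /** Sums in bytes first and truncates to whole MB only once at the end. */
    public static long truncatedSumMB(long[] sizesInBytes) {
        long totalBytes = 0;
        for (long size : sizesInBytes) {
            totalBytes += size;
        }
        return totalBytes / MB;
    }

    public static void main(String[] args) {
        // Three hypothetical store files of 1.5 MB each
        long[] sizes = { MB + MB / 2, MB + MB / 2, MB + MB / 2 };
        System.out.println(sumOfTruncatedMB(sizes)); // prints 3
        System.out.println(truncatedSumMB(sizes));   // prints 4
    }
}
```

If the two metrics were computed along these two different paths, a gap of a few MB across 8 store files, like the 2418 versus 2420 reported above, would be expected even with no compression.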
[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
[ https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142007#comment-13142007 ] Ted Yu commented on HBASE-4577: --- I don't see 'Too many open files' for https://builds.apache.org/job/PreCommit-HBASE-Build/133//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4724: --- Attachment: 2002_4724_TestAdmin.patch
[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lucian George Iordache updated HBASE-4713:
------------------------------------------
    Affects Version/s: 0.90.4
    Status: Patch Available (was: Open)

Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
-----------------------------------------------------------------------------------------------

    Key: HBASE-4713
    URL: https://issues.apache.org/jira/browse/HBASE-4713
    Project: HBase
    Issue Type: Improvement
    Affects Versions: 0.90.4
    Reporter: Lucian George Iordache
    Attachments: HBASE-4713-patch.txt

The ExecutionException is logged at debug level, and it should be logged at warn level.
I hit the problem in the following case:
- hbase.rpc.timeout = 6
- lease time on region server = 24
- started a scan that takes more than 60 seconds on the region server
Result: SocketTimeoutException logged at debug level.
With the log level set to info, the exception was not visible on the client side, and it took me a while to figure out what was happening.
See also:
- https://issues.apache.org/jira/browse/HBASE-3154
- http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
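The effect described in HBASE-4713 can be reproduced in miniature. The following is only a sketch, not the HConnectionManager code or the attached patch: it uses java.util.logging as a stand-in for HBase's logging (FINE playing the role of debug, WARNING the role of warn), and the class name ScanFailureLogging, the method failureIsVisible and the simulated timeout are all hypothetical names chosen for illustration.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; not part of HBase.
class ScanFailureLogging {
    static final Logger LOG = Logger.getLogger("ScanFailureLogging");

    /**
     * Runs the task on a worker thread and logs an ExecutionException at the
     * requested level. Returns true if that record passes the logger's
     * current threshold, i.e. if an operator would actually see it.
     */
    static boolean failureIsVisible(Callable<String> task, Level level) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(task).get();
            return false; // nothing failed, so nothing was logged
        } catch (ExecutionException e) {
            // The real failure is the cause wrapped by the ExecutionException.
            LOG.log(level, "Task failed on the worker thread", e.getCause());
            return LOG.isLoggable(level);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // same situation as in the report: log level at info
        Callable<String> slowScan = () -> {
            throw new SocketTimeoutException("simulated timeout");
        };
        System.out.println(failureIsVisible(slowScan, Level.FINE));    // false: swallowed
        System.out.println(failureIsVisible(slowScan, Level.WARNING)); // true: reported
    }
}
```

With the threshold at info, the same exception is silently dropped when logged at the debug-equivalent level and shows up when logged at warn, which is the behavior the issue asks to change.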
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Assignee: nkeywal
    Status: Patch Available (was: Open)

TestAdmin hangs randomly in trunk
---------------------------------

    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Critical
    Attachments: 2002_4724_TestAdmin.patch

From the logs in my environment:
{noformat}
2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
  at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}
Anyway, after this the log finishes with:
{noformat}
2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
{noformat}
The main thread is in:
{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}
So that's at least why adding a timeout won't help, and maybe why it does not end at all. Adding a maximum retry count to Threads#threadDumpingIsAlive could help.
I also wonder if the root cause of the hang is my modification on the WAL, with some threads surprised to have updates that were not written to the WAL.
Here is the full stack dump:
{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
  State: TIMED_WAITING
  Blocked count: 360
  Waited count: 359
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
  State: WAITING
  Blocked count: 0
  Waited count: 4
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 0
  Stack:
    sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
  State: WAITING
  Blocked count: 217
  Waited count: 174
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
    org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
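The suggestion above, a maximum retry count for Threads#threadDumpingIsAlive, could look roughly like the following. This is a sketch under assumptions, not the HBase implementation or the attached patch: the class BoundedJoin, the method joinWithRetries and its parameters are hypothetical, and the dump only prints the stack of the awaited thread rather than calling ReflectionUtils.printThreadInfo.

```java
// Hypothetical illustration; not the HBase Threads class.
class BoundedJoin {
    /**
     * Waits for the thread to end, joining at most maxAttempts times for
     * waitMillis each and printing the thread's stack between attempts.
     * Returns true if the thread ended, false if the caller should give up.
     */
    static boolean joinWithRetries(Thread t, long waitMillis, int maxAttempts) {
        if (t == null) {
            return true;
        }
        if (waitMillis <= 0) {
            // Thread.join(0) waits forever, which is what we want to avoid.
            throw new IllegalArgumentException("waitMillis must be positive");
        }
        try {
            for (int attempt = 1; attempt <= maxAttempts && t.isAlive(); attempt++) {
                t.join(waitMillis);
                if (t.isAlive()) {
                    System.out.println("Still waiting on " + t.getName()
                        + " (attempt " + attempt + " of " + maxAttempts + ")");
                    for (StackTraceElement frame : t.getStackTrace()) {
                        System.out.println("    " + frame);
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return !t.isAlive();
    }
}
```

A teardown method could then fail the test with a clear message when joinWithRetries returns false, instead of dumping stacks every 60 seconds indefinitely.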
[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
[ https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142012#comment-13142012 ]

gaojinchao commented on HBASE-4577:
-----------------------------------
My local test result:

Running org.apache.hadoop.hbase.TestMultiVersions
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.045 sec

Results :

Failed tests:
  testHBaseFsck(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:0 but was:1

Tests in error:
  testMasterFailoverWithMockedRITOnDeadRS(org.apache.hadoop.hbase.master.TestMasterFailover): test timed out after 18 milliseconds
  testEnableTableRoundRobinAssignment(org.apache.hadoop.hbase.client.TestAdmin): org.apache.hadoop.hbase.TableNotEnabledException: testEnableTableAssignment
  testBadOriginalRootLocation(org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster): unknown host: example.org

Tests run: 1073, Failures: 1, Errors: 3, Skipped: 9

Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
-----------------------------------------------------------------------------

    Key: HBASE-4577
    URL: https://issues.apache.org/jira/browse/HBASE-4577
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.92.0
    Reporter: Jean-Daniel Cryans
    Assignee: gaojinchao
    Priority: Minor
    Fix For: 0.92.0
    Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch

Minor issue while looking at the RS metrics:
bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, storefileSizeMB=2420, compressionRatio=1.0008
I guess there's a truncation somewhere when it's adding the numbers up.
FWIW there's no compression on that table.
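The truncation guess in the HBASE-4577 description is easy to illustrate with plain integer arithmetic. The sketch below is hypothetical and is not the region server metrics code: the class StorefileSizeRounding and both method names are made up, and it only shows that truncating each file size to whole megabytes before summing can give a smaller total than summing bytes first and truncating once.

```java
// Hypothetical illustration; not the HBase metrics code.
class StorefileSizeRounding {
    static final long MB = 1024L * 1024L;

    /** Truncates every file to whole MB, then sums: loses up to 1 MB per file. */
    static long truncateThenSum(long[] sizesInBytes) {
        long totalMb = 0;
        for (long size : sizesInBytes) {
            totalMb += size / MB; // integer division drops the fractional MB
        }
        return totalMb;
    }

    /** Sums the raw byte counts, then truncates once: loses less than 1 MB overall. */
    static long sumThenTruncate(long[] sizesInBytes) {
        long totalBytes = 0;
        for (long size : sizesInBytes) {
            totalBytes += size;
        }
        return totalBytes / MB;
    }

    public static void main(String[] args) {
        long oneAndAHalfMb = 3 * MB / 2;
        long[] sizes = { oneAndAHalfMb, oneAndAHalfMb, oneAndAHalfMb, oneAndAHalfMb };
        System.out.println(truncateThenSum(sizes)); // 4
        System.out.println(sumThenTruncate(sizes)); // 6
    }
}
```

With 8 store files, as in the reported metrics, two counters computed in these two different ways could drift apart by up to just under 8 MB, so a 2 MB gap on an uncompressed table is at least consistent with the truncation guess.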
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142011#comment-13142011 ]

nkeywal commented on HBASE-4724:
--------------------------------
I don't see the same behavior in the global build. I am going to try removing the modifications linked to the WAL in admin.

TestAdmin hangs randomly in trunk
---------------------------------

    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Critical
    Attachments: 2002_4724_TestAdmin.patch
[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
[ https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142015#comment-13142015 ]

Ted Yu commented on HBASE-4577:
-------------------------------
@Jinchao: Please start with test output of TestHFileOutputFormat#testMRIncrementalLoadWithSplit and see why the test failed.

Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
-----------------------------------------------------------------------------

    Key: HBASE-4577
    URL: https://issues.apache.org/jira/browse/HBASE-4577
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.92.0
    Reporter: Jean-Daniel Cryans
    Assignee: gaojinchao
    Priority: Minor
    Fix For: 0.92.0
    Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142013#comment-13142013 ]

Hadoop QA commented on HBASE-4724:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12501929/2002_4724_TestAdmin.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/134//console

This message is automatically generated.

TestAdmin hangs randomly in trunk
---------------------------------

    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Critical
    Attachments: 2002_4724_TestAdmin.patch
[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142016#comment-13142016 ]

Hadoop QA commented on HBASE-4713:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12501928/HBASE-4713-patch.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/135//console

This message is automatically generated.

Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
-----------------------------------------------------------------------------------------------

    Key: HBASE-4713
    URL: https://issues.apache.org/jira/browse/HBASE-4713
    Project: HBase
    Issue Type: Improvement
    Affects Versions: 0.90.4
    Reporter: Lucian George Iordache
    Attachments: HBASE-4713-patch.txt
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Status: Open (was: Patch Available)

TestAdmin hangs randomly in trunk
---------------------------------

    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Critical
    Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4724:
---------------------------
    Status: Patch Available (was: Open)

TestAdmin hangs randomly in trunk
---------------------------------

    Key: HBASE-4724
    URL: https://issues.apache.org/jira/browse/HBASE-4724
    Project: HBase
    Issue Type: Bug
    Components: test
    Affects Versions: 0.94.0
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Critical
    Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch
[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4724: --- Attachment: 2002_4724_TestAdmin.v2.patch TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch fom the logs in my env {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) 
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104) org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
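The HBASE-4724 comment above suggests bounding the retries in Threads#threadDumpingIsAlive so that a stuck master thread cannot keep the test JVM alive forever. A minimal sketch of that idea follows. The class name BoundedJoinSketch, the method joinWithRetries and its parameters are hypothetical and are not the HBase API; the sketch prints a short notice where the real method dumps thread info.

```java
final class BoundedJoinSketch {

    /**
     * Waits for the thread to end, checking at most maxRetries times.
     * Hypothetical helper, not part of HBase. Returns true if the thread
     * has ended, false if it is still alive after the last attempt.
     */
    static boolean joinWithRetries(Thread t, long waitMillis, int maxRetries) {
        if (t == null) {
            return true;
        }
        if (waitMillis <= 0) {
            // join(0) would wait forever, which is exactly what we want to avoid.
            throw new IllegalArgumentException("waitMillis must be positive");
        }
        try {
            for (int attempt = 1; attempt <= maxRetries && t.isAlive(); attempt++) {
                t.join(waitMillis);
                if (t.isAlive()) {
                    // The real method prints a full thread dump at this point.
                    System.err.println("Still waiting on " + t.getName()
                        + " (attempt " + attempt + " of " + maxRetries + ")");
                }
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the current state.
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }
}
```

A caller such as a test tearDown could then fail fast when the method returns false instead of blocking in join indefinitely.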
[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142022#comment-13142022 ] Ted Yu commented on HBASE-4713: --- @Lucian: Your attachment is a diff file which is not recognized by HadoopQA. Can you generate a patch? See http://wiki.apache.org/hadoop/Hbase/HowToContribute Once you check out the source code and make the above modification, you can use 'svn diff > 4713.patch' to obtain the patch. Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Attachments: HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142026#comment-13142026 ] Ted Yu commented on HBASE-4716: --- closeBulkRegionOperation() is at the beginning of finally block. For the alternate code path, we only take one lock. The lock would be released in closeBulkRegionOperation() accordingly. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
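The locking rule discussed in HBASE-4716 (a read lock suffices for a single column family, and whichever lock was taken is released in the finally block) can be sketched as follows. This is only an illustration built on java.util.concurrent; the class BulkLoadLockSketch and its methods are hypothetical and do not reproduce the HRegion code or closeBulkRegionOperation().

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class BulkLoadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Runs the load under the read lock when only one family is involved,
     * and under the write lock otherwise. Returns which lock was used.
     */
    String bulkLoad(int familyCount, Runnable load) {
        boolean writeNeeded = familyCount > 1;
        // Exactly one lock is taken, so exactly one lock has to be released.
        Lock held = writeNeeded ? lock.writeLock() : lock.readLock();
        held.lock();
        try {
            load.run();
            return writeNeeded ? "write" : "read";
        } finally {
            // Release first thing in finally, even if the load threw.
            held.unlock();
        }
    }

    /** True when no reader or writer currently holds the lock. */
    boolean isIdle() {
        return !lock.isWriteLocked() && lock.getReadLockCount() == 0;
    }
}
```

Remembering the lock that was actually acquired keeps the release symmetric on both code paths.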
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142033#comment-13142033 ] Hudson commented on HBASE-1744: --- Integrated in HBase-TRUNK #2399 (See [https://builds.apache.org/job/HBase-TRUNK/2399/]) HBASE-1744 Thrift server to match the new java api (Tim Sell) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/bin/hbase * /hbase/trunk/src/examples/thrift2 * /hbase/trunk/src/examples/thrift2/DemoClient.java * /hbase/trunk/src/examples/thrift2/DemoClient.py * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2 * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftHBaseServiceHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftUtilities.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumn.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDelete.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDeleteType.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TGet.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIOError.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIllegalArgument.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIncrement.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TPut.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TResult.java * 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TScan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TTimeRange.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/package.html * /hbase/trunk/src/main/resources/org/apache/hadoop/hbase/thrift2 * /hbase/trunk/src/main/resources/org/apache/hadoop/hbase/thrift2/hbase.thrift * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2 * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like: void put(1:Bytes table, 2:TPut put) throws (1:IOError io) with: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp } struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values } This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-4722) TestGlobalMemStoreSize has started failing
[ https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142034#comment-13142034 ] Hudson commented on HBASE-4722: --- Integrated in HBase-TRUNK #2399 (See [https://builds.apache.org/job/HBase-TRUNK/2399/]) HBASE-4722 TestGlobalMemStoreSize has started failing; commit some extra logging to help debug whats going on up on jenkins stack : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java TestGlobalMemStoreSize has started failing -- Key: HBASE-4722 URL: https://issues.apache.org/jira/browse/HBASE-4722 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Attachments: logging-v2.txt, logging.txt I'm digging in. It fails occasionally for me locally too.
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142035#comment-13142035 ] Ted Yu commented on HBASE-4377: --- Integrated to 0.90, 0.92 and TRUNK. Thanks for the patch Jonathan. [hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-1744: -- Attachment: 1744.addendum Addendum that makes TestThriftHBaseServiceHandler immune to hanging minicluster. Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like: void put(1:Bytes table, 2:TPut put) throws (1:IOError io) with: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp } struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values } This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
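The TColumn/TPut layout proposed in the HBASE-1744 description keys each value by family, qualifier and timestamp together. A rough Java model of that shape is sketched below. The class TPutSketch and its helpers are hypothetical, are not the Thrift-generated classes committed under thrift2/generated, and use ByteBuffer to stand in for the Bytes type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class TPutSketch {

    /** Mirrors the proposed struct: family, qualifier, timestamp. */
    static final class TColumn {
        final ByteBuffer family;
        final ByteBuffer qualifier;
        final long timestamp;

        TColumn(String family, String qualifier, long timestamp) {
            this.family = bytes(family);
            this.qualifier = bytes(qualifier);
            this.timestamp = timestamp;
        }

        // Value-based equality so a TColumn can serve as a map key.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TColumn)) {
                return false;
            }
            TColumn other = (TColumn) o;
            return family.equals(other.family)
                && qualifier.equals(other.qualifier)
                && timestamp == other.timestamp;
        }

        @Override
        public int hashCode() {
            return Objects.hash(family, qualifier, timestamp);
        }
    }

    /** Mirrors the proposed struct: a row plus a map from column to value. */
    static final class TPut {
        final ByteBuffer row;
        final Map<TColumn, ByteBuffer> values = new LinkedHashMap<>();

        TPut(String row) {
            this.row = bytes(row);
        }

        TPut add(String family, String qualifier, long timestamp, String value) {
            values.put(new TColumn(family, qualifier, timestamp), bytes(value));
            return this;
        }
    }

    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the timestamp is part of the key, two versions of the same cell fit in one put, which is the point the description makes against the nested-map alternative.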
[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucian George Iordache updated HBASE-4713: -- Status: Open (was: Patch Available) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Attachments: HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucian George Iordache updated HBASE-4713: -- Status: Patch Available (was: Open) Try 4713.patch Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucian George Iordache updated HBASE-4713: -- Attachment: 4713.patch Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142071#comment-13142071 ] Hudson commented on HBASE-4377: --- Integrated in HBase-0.92 #98 (See [https://builds.apache.org/job/HBase-0.92/98/]) HBASE-4377 [hbck] Offline rebuild .META. from fs data only (Jonathan Hsieh) HBASE-4377 [hbck] Offline rebuild .META. from fs data only (Jonathan Hsieh) (detail) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java [hbck] Offline rebuild .META. from fs data only. 
Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Affects Versions: 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
[jira] [Updated] (HBASE-3601) TestMasterFailover broken in TRUNK
[ https://issues.apache.org/jira/browse/HBASE-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3601: -- Fix Version/s: 0.92.0 TestMasterFailover broken in TRUNK -- Key: HBASE-3601 URL: https://issues.apache.org/jira/browse/HBASE-3601 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 After HBASE-3573 went in, TestMasterFailover broke. The change in shutdown technique revealed an issue with our in-memory accounting when a master joins an already running cluster; we don't add .META. and -ROOT- to our set of online regions in the new master, so this could make for some interesting issues as the new master progressed. (The previous shutdown did a count of remaining servers; the new shutdown process looks at in-memory state to see if only catalog-carrying regionservers are online... this is what was going out of whack in the new master.)
[jira] [Updated] (HBASE-3605) Fix balancer log message
[ https://issues.apache.org/jira/browse/HBASE-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3605: -- Fix Version/s: 0.92.0 Fix balancer log message Key: HBASE-3605 URL: https://issues.apache.org/jira/browse/HBASE-3605 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.92.0 From Gaojinchao up on user list: In the balanceCluster function, it should be " leastloaded=" + serversByLoad.firstKey().getLoad().getNumberOfRegions())
{code}
if(serversByLoad.lastKey().getLoad().getNumberOfRegions() <= max &&
   serversByLoad.firstKey().getLoad().getNumberOfRegions() >= min) {
  // Skipped because no server outside (min,max) range
  LOG.info("Skipping load balancing. servers=" + numServers + " " +
    "regions=" + numRegions + " average=" + average + " " +
    "mostloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions() +
    " leastloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions());
  return null;
}
{code}
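The correction reported in HBASE-3605 is that the least-loaded figure must come from firstKey() of the sorted map, not lastKey(). A self-contained sketch of the corrected message follows. BalancerLogSketch is a hypothetical class, and the TreeMap here is keyed directly by region count, which is a simplification of the real serversByLoad map.

```java
import java.util.TreeMap;

final class BalancerLogSketch {

    /**
     * Builds the "skipping" message from a map sorted by region count.
     * The smallest key is the least loaded server, the largest key the
     * most loaded one.
     */
    static String skipMessage(TreeMap<Integer, String> serversByLoad, int numRegions) {
        int numServers = serversByLoad.size();
        float average = (float) numRegions / numServers;
        return "Skipping load balancing. servers=" + numServers
            + " regions=" + numRegions
            + " average=" + average
            + " mostloaded=" + serversByLoad.lastKey()
            + " leastloaded=" + serversByLoad.firstKey();
    }
}
```

With servers carrying 2, 5 and 9 regions, the message reports mostloaded=9 and leastloaded=2, whereas the code quoted in the report would print 9 for both.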
[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142119#comment-13142119 ] Hadoop QA commented on HBASE-4713: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501940/4713.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -165 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 42 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.master.TestMasterFailover Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/138//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/138//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/138//console This message is automatically generated. Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. 
I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4713: - Resolution: Fixed Fix Version/s: 0.92.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thank you for the patch Lucian George Iordache. I applied it to trunk and the 0.92 branch. Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Fix For: 0.92.0 Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged on debug level, and it should be logged on warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged on debug Having the log level on info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
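The change requested in HBASE-4713 amounts to reporting an ExecutionException at warn rather than debug, so that it stays visible when the client logs at info. The sketch below shows that pattern with java.util.logging, which is swapped in only to keep the example self-contained; it is not the logging API HBase uses. The class WarnOnExecutionFailureSketch and the method runAndReport are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

final class WarnOnExecutionFailureSketch {
    private static final Logger LOG =
        Logger.getLogger("WarnOnExecutionFailureSketch");

    /**
     * Runs the task on a worker thread and returns the level at which
     * its outcome was logged.
     */
    static Level runAndReport(Callable<?> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = pool.submit(task);
            result.get();
            LOG.log(Level.FINE, "task completed");
            return Level.FINE;
        } catch (ExecutionException e) {
            // WARNING is above the default INFO threshold, so the cause
            // (for example a SocketTimeoutException) reaches the operator.
            LOG.log(Level.WARNING, "task failed", e.getCause());
            return Level.WARNING;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            LOG.log(Level.WARNING, "interrupted while waiting for task", e);
            return Level.WARNING;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A task that throws a SocketTimeoutException is therefore reported at WARNING, while a successful task is only noted at FINE.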
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142172#comment-13142172 ] stack commented on HBASE-4716: -- Thanks. +1 on commit. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead
[ https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4609: -- Status: Open (was: Patch Available) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead - Key: HBASE-4609 URL: https://issues.apache.org/jira/browse/HBASE-4609 Project: HBase Issue Type: Bug Components: thrift Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4609-v1.patch ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't include start code. Need to fix.
[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
[ https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142187#comment-13142187 ] gaojinchao commented on HBASE-4577: --- Sorry, I am not familiar with MR. I will continue digging into this issue tomorrow. Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB - Key: HBASE-4577 URL: https://issues.apache.org/jira/browse/HBASE-4577 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch Minor issue while looking at the RS metrics: bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, storefileSizeMB=2420, compressionRatio=1.0008 I guess there's a truncation somewhere when it's adding the numbers up. FWIW there's no compression on that table.
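The HBASE-4577 description guesses at a truncation while the sizes are added up. One way such a gap can arise, shown purely as an illustration and not confirmed by the thread as the actual cause, is converting each file to whole megabytes before summing in one metric while summing bytes first in the other. StorefileSizeSketch and its methods are hypothetical.

```java
final class StorefileSizeSketch {
    static final long MB = 1024L * 1024L;

    /** Truncates every file to whole MB, then adds the results. */
    static int sumOfTruncatedMb(long[] sizesInBytes) {
        int total = 0;
        for (long size : sizesInBytes) {
            total += (int) (size / MB);   // fractional MB lost per file
        }
        return total;
    }

    /** Adds the byte counts first and truncates only once at the end. */
    static int truncatedMbOfSum(long[] sizesInBytes) {
        long totalBytes = 0;
        for (long size : sizesInBytes) {
            totalBytes += size;
        }
        return (int) (totalBytes / MB);   // fractional MB lost once
    }
}
```

For two files of 1.5 MB each, the first method yields 2 and the second yields 3, so two metrics over identical data can disagree by up to one MB per file.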
[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead
[ https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4609: -- Attachment: 4609-v2.txt ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead - Key: HBASE-4609 URL: https://issues.apache.org/jira/browse/HBASE-4609 Project: HBase Issue Type: Bug Components: thrift Affects Versions: 0.92.0, 0.94.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: 4609-v2.txt, HBASE-4609-v1.patch ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't include start code. Need to fix.
[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead
[ https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4609: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142197#comment-13142197 ] Ted Yu commented on HBASE-1744: --- From https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2399/artifact/trunk/target/surefire-reports/org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler-output.txt: {code} 2011-11-02 09:18:25,930 INFO [main] zookeeper.MiniZooKeeperCluster(141): Failed binding ZK Server to client port: 21818 2011-11-02 09:18:25,958 INFO [main] zookeeper.MiniZooKeeperCluster(164): Started MiniZK Cluster and connect 1 ZK server on client port: 21819 {code} Let's see what happens in build 2400. Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like: void put(1:Bytes table, 2:TPut put) throws (1:IOError io) with: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp } struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values } This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142209#comment-13142209 ] Hudson commented on HBASE-1744: --- Integrated in HBase-TRUNK #2400 (See [https://builds.apache.org/job/HBase-TRUNK/2400/]) HBASE-1744 HBaseAdmin ctor should obtain Configuration from HBaseTestingUtility tedyu : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142215#comment-13142215 ] stack commented on HBASE-4724: -- I applied your restore of wal behavior in v2. Let's see how it plays out on jenkins. TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.92.0 Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch From the logs in my env {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the log finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} It's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout won't help and maybe why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the hang is my modification to the WAL, with some threads surprised to have updates that were not written in the WAL. Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142217#comment-13142217 ] stack commented on HBASE-4724: -- Oh, you need to do --no-prefix when making patches for patch-build.
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142229#comment-13142229 ] stack commented on HBASE-4724: -- @N I don't have version mismatch when I run TestAdmin. It doesn't fail for me either. bq. Adding a maximum retry to Threads#threadDumpingIsAlive could help. Yes. We should do this. No point in going on after we've thread dumped three times I'd say.
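The bounded retry suggested for Threads#threadDumpingIsAlive could look roughly like the Java sketch below. This is an assumption about the shape of the fix, not the actual HBase code: BoundedJoin, joinWithDumps, and its parameters are hypothetical names, and the sketch prints only the awaited thread's stack rather than a full process dump.

```java
// Hypothetical sketch of a join that dumps the stack a limited number of times.
final class BoundedJoin {
    /**
     * Waits for t to finish. After each interval in which t is still alive,
     * prints t's current stack. Gives up after maxDumps dumps.
     *
     * @return true if t has terminated, false if we gave up waiting
     */
    static boolean joinWithDumps(Thread t, long intervalMs, int maxDumps)
            throws InterruptedException {
        if (t == null) {
            return true;
        }
        if (intervalMs <= 0) {
            // join(0) would wait forever, which is what we are trying to avoid.
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        for (int dumps = 0; dumps < maxDumps; dumps++) {
            t.join(intervalMs);
            if (!t.isAlive()) {
                return true;
            }
            System.out.println("Stack trace while waiting on " + t.getName());
            for (StackTraceElement frame : t.getStackTrace()) {
                System.out.println("  " + frame);
            }
        }
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException ignored) {
                // Exit quietly if interrupted.
            }
        }, "sleeper");
        sleeper.start();
        // Dump at most three times, 100 ms apart, then report whether it ended.
        System.out.println("terminated: " + joinWithDumps(sleeper, 100, 3));
    }
}
```

Returning a boolean instead of looping forever lets the caller, for example a test tearDown, decide whether to fail or to carry on once the limit of three dumps mentioned above is reached.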
[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations
[ https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142236#comment-13142236 ] stack commented on HBASE-4583: -- bq. There are various ways to produce serializable schedules (pessimistic locking, optimistic locking with rechecking of pre conditions, snapshot isolation, etc), all which will probably mean worse performance for both append and increment. Shouldn't we do it anyways (though big yuck on your list above -- it makes my brain hurt just thinking on it. Can you imagine rechecking pre-conditions and then replaying the failed transaction.. how much fun that'll be to code up!)? Shouldn't we be correct first and then performant? bq. As said above the current implementation sync's the WAL after the memstore is updated and the new values are visible to other threads, and after the locks are released. Sounds broke to me; sounds like big compromise for sake of better perf. Should we open new issue on this? bq. (1) and (2) together mean that the WAL needs to be sync'ed with the row lock held (which would be quite a performance degradation). Shouldn't we ship with this config. with options to run hbase otherwise (memstore put then sync, etc.) bq. Now, what we could do is use rwcc to make the changes to the CFs atomic, and still sync the WAL after all the locks are released (as we do now). With this compromise everything would be correct unless the sync'ing of WAL fails Sounds broke still? Thanks for the write up and for digging in here fellas. Integrate RWCC with Append and Increment operations --- Key: HBASE-4583 URL: https://issues.apache.org/jira/browse/HBASE-4583 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 4583-v2.txt, 4583-v3.txt, 4583-v4.txt, 4583.txt Currently Increment and Append operations do not work with RWCC and hence a client could see the results of multiple such operation mixed in the same Get/Scan. 
The semantics might be a bit more interesting here as upsert adds and removes to and from the memstore.
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142239#comment-13142239 ] stack commented on HBASE-4724: -- Before applying the patch, I got the below on random run: {code}Tests in error: testDisableAndEnableTable(org.apache.hadoop.hbase.client.TestAdmin): org.apache.hadoop.hbase.TableNotEnabledException: testDisableAndEnableTable{code} Trying w/ your v2 patch now.
[jira] [Commented] (HBASE-4480) Testing script to simplify local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142242#comment-13142242 ] stack commented on HBASE-4480: -- I like this script. We should check it in under new 'dev-support' dir? The usage is a bit off. It says '-n=N' when I think it means to say '-n N' Testing script to simplify local testing Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: HBASE-4480.patch, HBASE-4480_v2.patch, HBASE-4480_v3.patch, runtest-no-npe-check.sh, runtest.sh, runtest2.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142244#comment-13142244 ] stack commented on HBASE-4518: -- I shouldn't have said anything. My mention of this issue caused the test to start failing up on jenkins again. TestServerCustomProtocol is flaky - Key: HBASE-4518 URL: https://issues.apache.org/jira/browse/HBASE-4518 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.92.0 Reporter: Gary Helmling Fix For: 0.92.0 Attachments: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure: {noformat} Results : Failed tests: testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb' {noformat}
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142249#comment-13142249 ] nkeywal commented on HBASE-4724: I recloned the repo, and now it works all the time. I have this in the logs; it seems to be new to me: {noformat} 2011-11-02 03:36:17,271 DEBUG [main] zookeeper.ZKUtil(1034): hconnection Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; localhost,39688,1320229746489 2011-11-02 03:36:17,272 DEBUG [Finalizer] client.HConnectionManager$HConnectionImplementation(1715): The connection to null has been closed. 2011-11-02 03:36:17,272 DEBUG [Finalizer] client.HConnectionManager$HConnectionImplementation(1734): The connection to null was closed by the finalize method. 2011-11-02 03:36:17,272 DEBUG [Finalizer] client.HConnectionManager$HConnectionImplementation(1715): The connection to null has been closed. 2011-11-02 03:36:17,272 DEBUG [Finalizer] client.HConnectionManager$HConnectionImplementation(1734): The connection to null was closed by the finalize method. {noformat} Could it have a side effect in some cases? FWIW, I also got this once, but the test case succeeded anyway. It is something I have seen in the past already. {noformat} 2011-11-02 02:08:13,311 ERROR [MASTER_OPEN_REGION-localhost,40499,132022456-3] executor.EventHandler(171): Caught throwable while processing event RS_ZK_REGION_OPENED java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.updateTimers(AssignmentManager.java:1059) at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1033) at org.apache.hadoop.hbase.master.handler.OpenedRegionHandler.process(OpenedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-11-02 02:08:13,311 INFO [RS_OPEN_REGION-localhost,47967,132022999-2] regionserver.HRegion(402): Setting up tabledescriptor config now ... {noformat}
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142251#comment-13142251 ]

stack commented on HBASE-4724:
------------------------------

Let me look at the above. I reran TestAdmin a few times and got this again:

{code}
---
Test set: org.apache.hadoop.hbase.client.TestAdmin
---
Tests run: 33, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 509.79 sec FAILURE!
testEnableDisableAddColumnDeleteColumn(org.apache.hadoop.hbase.client.TestAdmin) Time elapsed: 0.841 sec ERROR!
org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin
{code}

Do you think some of the timer cuts are still too aggressive?

TestAdmin hangs randomly in trunk
---------------------------------

Key: HBASE-4724
URL: https://issues.apache.org/jira/browse/HBASE-4724
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
Fix For: 0.92.0
Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch

From the logs in my env
{noformat}
2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch.
(client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. 
Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
[jira] [Commented] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead
[ https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142255#comment-13142255 ]

Hadoop QA commented on HBASE-4609:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12501968/4609-v2.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -165 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 42 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
       org.apache.hadoop.hbase.master.TestDistributedLogSplitting
       org.apache.hadoop.hbase.master.TestMasterFailover

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/139//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/139//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/139//console

This message is automatically generated.

ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead
---------------------------------------------------------------------------------------------------------

Key: HBASE-4609
URL: https://issues.apache.org/jira/browse/HBASE-4609
Project: HBase
Issue Type: Bug
Components: thrift
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
Fix For: 0.92.0
Attachments: 4609-v2.txt, HBASE-4609-v1.patch

ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't include start code. Need to fix.
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142257#comment-13142257 ]

Ted Yu commented on HBASE-4716:
-------------------------------

Integrated to 0.92 and TRUNK.

Thanks for the review, Todd and Stack.

Improve locking for single column family bulk load
--------------------------------------------------

Key: HBASE-4716
URL: https://issues.apache.org/jira/browse/HBASE-4716
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
Fix For: 0.92.0
Attachments: 4716-v2.txt, 4716.txt

HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4716:
--------------------------

Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Improve locking for single column family bulk load
--------------------------------------------------

Key: HBASE-4716
URL: https://issues.apache.org/jira/browse/HBASE-4716
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
Fix For: 0.92.0
Attachments: 4716-v2.txt, 4716.txt

HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
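
The locking change described in HBASE-4716 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical (this is not the HRegion code), and it assumes that only a load touching several column families needs to exclude other operations, so a single-family load can share the lock.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region that guards bulk loads with a read/write lock.
final class BulkLoadLockSketch {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

    /**
     * Runs the load under the region lock and reports which mode was taken.
     * Assumption: only a multi-family load must block concurrent operations.
     */
    String bulkLoad(int familyCount, Runnable load) {
        boolean needsWriteLock = familyCount > 1;
        Lock lock = needsWriteLock ? regionLock.writeLock() : regionLock.readLock();
        lock.lock();
        try {
            load.run();
            return needsWriteLock ? "write" : "read";
        } finally {
            // Always release, even if the load throws.
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        BulkLoadLockSketch region = new BulkLoadLockSketch();
        System.out.println(region.bulkLoad(1, () -> { }));
        System.out.println(region.bulkLoad(2, () -> { }));
    }
}
```

With the read lock, a single-family load no longer blocks other operations that also hold the read lock, which is the point of the improvement.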
[jira] [Created] (HBASE-4725) NPE in AM#updateTimers
NPE in AM#updateTimers
----------------------

Key: HBASE-4725
URL: https://issues.apache.org/jira/browse/HBASE-4725
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.0
[jira] [Commented] (HBASE-4725) NPE in AM#updateTimers
[ https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142267#comment-13142267 ]

stack commented on HBASE-4725:
------------------------------

{code}
2011-11-02 02:08:13,311 ERROR [MASTER_OPEN_REGION-localhost,40499,132022456-3] executor.EventHandler(171): Caught throwable while processing event RS_ZK_REGION_OPENED
java.lang.NullPointerException
    at org.apache.hadoop.hbase.master.AssignmentManager.updateTimers(AssignmentManager.java:1059)
    at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1033)
    at org.apache.hadoop.hbase.master.handler.OpenedRegionHandler.process(OpenedRegionHandler.java:105)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-11-02 02:08:13,311 INFO [RS_OPEN_REGION-localho
{code}

NPE in AM#updateTimers
----------------------

Key: HBASE-4725
URL: https://issues.apache.org/jira/browse/HBASE-4725
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.0
Attachments: am.txt
[jira] [Updated] (HBASE-4725) NPE in AM#updateTimers
[ https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4725:
-------------------------

Status: Patch Available (was: Open)

NPE in AM#updateTimers
----------------------

Key: HBASE-4725
URL: https://issues.apache.org/jira/browse/HBASE-4725
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.0
Attachments: am.txt
[jira] [Updated] (HBASE-4725) NPE in AM#updateTimers
[ https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4725:
-------------------------

Attachment: am.txt

NPE in AM#updateTimers
----------------------

Key: HBASE-4725
URL: https://issues.apache.org/jira/browse/HBASE-4725
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.0
Attachments: am.txt
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142269#comment-13142269 ] stack commented on HBASE-4724: -- I made HBASE-4725 for the NPE. Looking at the null connection... TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.92.0 Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch fom the logs in my env {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) 
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142272#comment-13142272 ]

nkeywal commented on HBASE-4724:
--------------------------------

The timers I changed on the admin are all tied to a condition, so instead of checking once per second they check 5 times per second. This should not change the final behaviour, and I took care not to change the final timeout.

testEnableDisableAddColumnDeleteColumn has not been directly impacted by HBASE-4703; there is no timer there. Which line is throwing this exception?

TestAdmin hangs randomly in trunk
---------------------------------

Key: HBASE-4724
URL: https://issues.apache.org/jira/browse/HBASE-4724
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
Fix For: 0.92.0
Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch

From the logs in my env
{noformat}
2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch.
(client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. 
Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
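
The point nkeywal makes in the comment on this issue, that checking a condition 5 times per second instead of once per second does not change the overall timeout, can be sketched as below. This is a minimal illustration, not the test code from HBASE-4703; the class and method names are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition at a short interval under a fixed overall deadline.
final class PollingWaitSketch {
    /**
     * Returns true as soon as the condition holds, or false once timeoutMillis has elapsed.
     * A smaller intervalMillis only makes the check more frequent; the deadline is unchanged.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(intervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return condition.getAsBoolean();
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Same 2 second timeout, checked every 200 ms rather than every 1000 ms.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 300, 2000, 200);
        System.out.println("condition met: " + ok);
    }
}
```

A shorter interval lets a test return sooner once the condition becomes true, while a condition that never becomes true still fails after the same total wait.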
[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)
[ https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142284#comment-13142284 ]

stack commented on HBASE-4415:
------------------------------

Ditto on what Ted asks above. Why would we have this duplicated (and now behind) hbase-env.sh over in src/packages/templates/conf/hbase-env.sh? Why would we not copy it from the original location?

@Andrew I see this patch has:

{code}
+  <property>
+    <name>hbase.master.kerberos.principal</name>
+    <value>${HBASE_M_K_PRINCIPAL}</value>
+    <description></description>
+  </property>
{code}

We need more than that now?

Add configuration script for setup HBase (hbase-setup-conf.sh)
--------------------------------------------------------------

Key: HBASE-4415
URL: https://issues.apache.org/jira/browse/HBASE-4415
Project: HBase
Issue Type: New Feature
Components: scripts
Affects Versions: 0.90.4, 0.92.0
Environment: Java 6, Linux
Reporter: Eric Yang
Assignee: Eric Yang
Fix For: 0.90.4, 0.92.0
Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415-5.patch, HBASE-4415-6.patch, HBASE-4415.patch

The goal of this JIRA is to provide an installation script for configuring the HBase environment and configuration, using the same *-setup-conf.sh pattern as all Hadoop-related projects. For HBase, the usage of the script looks like this:

{noformat}
usage: ./hbase-setup-conf.sh <parameters>

  Optional parameters:
     --hadoop-conf=/etc/hadoop                 Set Hadoop configuration directory location
     --hadoop-home=/usr                        Set Hadoop directory location
     --hadoop-namenode=localhost               Set Hadoop namenode hostname
     --hadoop-replication=3                    Set HDFS replication
     --hbase-home=/usr                         Set HBase directory location
     --hbase-conf=/etc/hbase                   Set HBase configuration directory location
     --hbase-log=/var/log/hbase                Set HBase log directory location
     --hbase-pid=/var/run/hbase                Set HBase pid directory location
     --hbase-user=hbase                        Set HBase user
     --java-home=/usr/java/default             Set JAVA_HOME directory location
     --kerberos-realm=KERBEROS.EXAMPLE.COM     Set Kerberos realm
     --kerberos-principal-id=_HOST             Set Kerberos principal ID
     --keytab-dir=/etc/security/keytabs        Set keytab directory
     --regionservers=localhost                 Set regionservers hostnames
     --zookeeper-home=/usr                     Set ZooKeeper directory location
     --zookeeper-quorum=localhost              Set ZooKeeper Quorum
     --zookeeper-snapshot=/var/lib/zookeeper   Set ZooKeeper snapshot location
{noformat}
[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places
[ https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142285#comment-13142285 ]

stack commented on HBASE-4523:
------------------------------

This patch looks fine.

dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places
----------------------------------------------------------------------------------------------------

Key: HBASE-4523
URL: https://issues.apache.org/jira/browse/HBASE-4523
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Arpit Gupta
Assignee: Eric Yang
Fix For: 0.90.4, 0.92.0
Attachments: HBASE-4523.patch
[jira] [Commented] (HBASE-4535) hbase-env.sh in hbase rpm does not set HBASE_CONF_DIR
[ https://issues.apache.org/jira/browse/HBASE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142286#comment-13142286 ]

stack commented on HBASE-4535:
------------------------------

Patch looks fine. Should this be done over in the original hbase-env.sh?

hbase-env.sh in hbase rpm does not set HBASE_CONF_DIR
-----------------------------------------------------

Key: HBASE-4535
URL: https://issues.apache.org/jira/browse/HBASE-4535
Project: HBase
Issue Type: Bug
Components: build
Affects Versions: 0.90.3
Reporter: Ramya Sunil
Assignee: Eric Yang
Attachments: HBASE-4535.patch

After an hbase rpm install, hbase-env.sh does not define HBASE_CONF_DIR. This needs to be fixed.
[jira] [Commented] (HBASE-4635) Remove dependency of java for rpm/deb packaging
[ https://issues.apache.org/jira/browse/HBASE-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142288#comment-13142288 ]

stack commented on HBASE-4635:
------------------------------

This has this change:

{code}
 . /etc/default/hadoop-env.sh
-. /etc/default/zookeeper-env.sh
{code}

... which is a little unrelated. Are you trying to make this hbase-env.sh the same as the original?

Remove dependency of java for rpm/deb packaging
-----------------------------------------------

Key: HBASE-4635
URL: https://issues.apache.org/jira/browse/HBASE-4635
Project: HBase
Issue Type: Improvement
Components: build
Affects Versions: 0.92.0
Environment: Java, Ubuntu, RHEL
Reporter: Eric Yang
Assignee: Eric Yang
Attachments: HBASE-4635.patch

Comment from HBASE-3606:
Eric, it looks like the hbase rpm spec file sets a dependency on jdk. Can we remove the jdk dependency? Not everyone will be installing jdk through rpm.

There are multiple ways to install Java on Linux. It would be better to remove the Java dependency declaration for packaging.
[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-4518:
---------------------------------

Attachment: HBASE-4518.patch

This patch cleans up the config of the PingHandler endpoint and changes the mini cluster to use a single region server to avoid region transition issues.

With this patch I was able to run TestServerCustomProtocol in a batch of 50 runs with no failures.

TestServerCustomProtocol is flaky
---------------------------------

Key: HBASE-4518
URL: https://issues.apache.org/jira/browse/HBASE-4518
Project: HBase
Issue Type: Bug
Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4518.patch, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt

TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure:

{noformat}
Results :

Failed tests:
  testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
{noformat}
[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-4518:
---------------------------------

Assignee: Gary Helmling
Status: Patch Available (was: Open)

TestServerCustomProtocol is flaky
---------------------------------

Key: HBASE-4518
URL: https://issues.apache.org/jira/browse/HBASE-4518
Project: HBase
Issue Type: Bug
Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4518.patch, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt

TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure:

{noformat}
Results :

Failed tests:
  testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
{noformat}
[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142294#comment-13142294 ]

stack commented on HBASE-4518:
------------------------------

This line:

{code}
+    util.getConfiguration().set(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
+        PingHandler.class.getName());
{code}

does what

{code}
-    // TODO: use a test coprocessor for registration (once merged with CP code)
-    // sleep here is an ugly hack to allow region transitions to finish
-    Thread.sleep(5000);
-    for (JVMClusterUtil.RegionServerThread t :
-        cluster.getRegionServerThreads()) {
-      for (HRegionInfo r : t.getRegionServer().getOnlineRegions()) {
-        t.getRegionServer().getOnlineRegion(r.getRegionName())
-            .registerProtocol(PingProtocol.class, new PingHandler());
-      }
-    }
{code}

... used to do? If so, +1 on commit if patch-build gives back reasonable results.

TestServerCustomProtocol is flaky
---------------------------------

Key: HBASE-4518
URL: https://issues.apache.org/jira/browse/HBASE-4518
Project: HBase
Issue Type: Bug
Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Fix For: 0.92.0
Attachments: HBASE-4518.patch, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt

TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure:

{noformat}
Results :

Failed tests:
  testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
{noformat}
[jira] [Created] (HBASE-4726) RS should close region if it fails to mark it as 'OPENED'.
RS should close region if it fails to mark it as 'OPENED'. -- Key: HBASE-4726 URL: https://issues.apache.org/jira/browse/HBASE-4726 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.89.20100924 Reporter: Madhuwanti Vaidya Assignee: Madhuwanti Vaidya Priority: Minor Currently, if an RS fails to mark a region as 'OPENED', it only logs an error and leaves the region open. This has caused duplicate region assignments in one of our production clusters.
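A minimal sketch of the behaviour requested above, assuming nothing about the real region server code: OpenRegionSketch, OpenedMarker and markOpened are hypothetical names made up for illustration. The point is only that a failed transition to 'OPENED' should undo the local open instead of just being logged.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch only: these types are hypothetical stand-ins, not HBase classes. */
final class OpenRegionSketch {

  /** Hypothetical hook standing in for the transition of the region to 'OPENED'. */
  interface OpenedMarker {
    boolean markOpened(String regionName);
  }

  private final Set<String> onlineRegions = new HashSet<>();

  /** Opens a region locally, then closes it again if it cannot be marked 'OPENED'. */
  boolean openRegion(String regionName, OpenedMarker marker) {
    onlineRegions.add(regionName); // the region is now open on this server
    boolean marked;
    try {
      marked = marker.markOpened(regionName);
    } catch (RuntimeException e) {
      marked = false; // treat an exception like a failed transition
    }
    if (!marked) {
      // Only logging here would leave the region online while it may be assigned elsewhere.
      onlineRegions.remove(regionName);
      return false;
    }
    return true;
  }

  boolean isOnline(String regionName) {
    return onlineRegions.contains(regionName);
  }
}
```

With this shape, a caller can tell from the return value whether the region ended up online, and a region that could not be marked is never left open.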
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142299#comment-13142299 ] stack commented on HBASE-4724: -- @N I'm trying to reproduce while testing the NPE fix at the same time; it's not failing for me now. I'll let it run longer. TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.92.0 Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch From the logs in my env: {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. 
(client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. 
Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
[jira] [Updated] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename
[ https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4553: - Attachment: 4553-v11.txt Same as v10. The update of .tableinfo is not atomic; we remove then rename - Key: HBASE-4553 URL: https://issues.apache.org/jira/browse/HBASE-4553 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 4553-v9.txt, HBase-4553-TestAvroServer.patch This comes out of HBASE-4547. The rename in 0.20 HDFS fails if the file already exists. In 0.20+ it's better, but there are still 'some' issues if there is an existing reader when the file is renamed. This issue is about fixing this (though we depend on the fix first being in HDFS).
[jira] [Work started] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename
[ https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-4553 started by stack. The update of .tableinfo is not atomic; we remove then rename - Key: HBASE-4553 URL: https://issues.apache.org/jira/browse/HBASE-4553 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 4553-v9.txt, HBase-4553-TestAvroServer.patch This comes out of HBASE-4547. The rename in 0.20 HDFS fails if the file already exists. In 0.20+ it's better, but there are still 'some' issues if there is an existing reader when the file is renamed. This issue is about fixing this (though we depend on the fix first being in HDFS).
[jira] [Updated] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename
[ https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4553: - Status: Open (was: Patch Available) The update of .tableinfo is not atomic; we remove then rename - Key: HBASE-4553 URL: https://issues.apache.org/jira/browse/HBASE-4553 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 4553-v9.txt, HBase-4553-TestAvroServer.patch This comes out of HBASE-4547. The rename in 0.20 HDFS fails if the file already exists. In 0.20+ it's better, but there are still 'some' issues if there is an existing reader when the file is renamed. This issue is about fixing this (though we depend on the fix first being in HDFS).
[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142307#comment-13142307 ] Gary Helmling commented on HBASE-4518: -- @Stack, Yes, for some reason in the original code I was being a bit too clever and registering the protocol handler directly. Maybe it was before all the connecting bits had been filled in yet... In any case, the manual registration in the original code would mean that the PingHandler would not get re-registered if a region closed on one RS and was reopened on another. So that is a flaw. And the test code should really be doing what we tell people to do with endpoints, which is to configure them as coprocessors. TestServerCustomProtocol is flaky - Key: HBASE-4518 URL: https://issues.apache.org/jira/browse/HBASE-4518 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.92.0 Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 0.92.0 Attachments: HBASE-4518.patch, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure: {noformat} Results : Failed tests: testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb' {noformat}
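To illustrate the flaw Gary describes, here is a toy model, not HBase code: EndpointRegistrationSketch, Server and move are hypothetical names, and handlers are plain strings. It only shows why a handler registered by hand on one server is lost when the region reopens elsewhere, while a handler listed in configuration is attached again on every open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical stand-ins for region servers and endpoint handlers. */
final class EndpointRegistrationSketch {

  static final class Server {
    private final List<String> configuredHandlers;
    private final Map<String, List<String>> handlersByRegion = new HashMap<>();

    Server(List<String> configuredHandlers) {
      this.configuredHandlers = configuredHandlers;
    }

    /** Opening a region attaches whatever the configuration lists, every time. */
    void openRegion(String region) {
      handlersByRegion.put(region, new ArrayList<>(configuredHandlers));
    }

    void closeRegion(String region) {
      handlersByRegion.remove(region);
    }

    /** Manual registration only touches the region instance on this server. */
    void registerManually(String region, String handler) {
      handlersByRegion.get(region).add(handler);
    }

    boolean hasHandler(String region, String handler) {
      List<String> handlers = handlersByRegion.get(region);
      return handlers != null && handlers.contains(handler);
    }
  }

  /** Models a region closing on one server and reopening on another. */
  static void move(String region, Server from, Server to) {
    from.closeRegion(region);
    to.openRegion(region);
  }
}
```

In this model a manually registered "PingHandler" disappears after move(), whereas servers constructed with "PingHandler" in their configured list still have it after the region reopens.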
[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files
[ https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated HBASE-4719: Status: Patch Available (was: Open) HBase script assumes pre-Hadoop 0.21 layout of jar files Key: HBASE-4719 URL: https://issues.apache.org/jira/browse/HBASE-4719 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Attachments: HBASE-4719.patch.txt The following line in bin/hbase: {noformat} HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-core*.jar`) {noformat} assumes a pre-0.21 Hadoop layout. It would be better to dynamically account for either hadoop-core* or hadoop-common*, hadoop-hdfs*, and hadoop-mapreduce*.
[jira] [Created] (HBASE-4727) Don't unconditionally delete UNASSIGNED ZNode for a region.
Don't unconditionally delete UNASSIGNED ZNode for a region. --- Key: HBASE-4727 URL: https://issues.apache.org/jira/browse/HBASE-4727 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.89.20100924 Reporter: Madhuwanti Vaidya Assignee: Madhuwanti Vaidya Priority: Minor Unconditionally deleting an UNASSIGNED ZNode when master processes RS2ZK_REGION_OPENED (from the toDo queue) for a region has caused multiply assigned regions or unassigned regions. One proposed fix is to check whether the ZNode is actually in the state RS2ZK_REGION_OPENED before deleting it. Another fix is to not delete the ZNode at all.
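The first proposed fix (check the state before deleting) can be sketched roughly as follows. This is an in-memory illustration, not ZooKeeper or HBase code: the map stands in for the UNASSIGNED znodes, and SOME_OTHER_STATE is a placeholder state invented for the example. The check and the delete happen as one atomic step, so a node that has moved to another state in the meantime is left alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch only: an in-memory stand-in for the UNASSIGNED znodes, keyed by region name. */
final class ConditionalZNodeDeleteSketch {

  /** RS2ZK_REGION_OPENED comes from the issue text; SOME_OTHER_STATE is a placeholder. */
  enum State { RS2ZK_REGION_OPENED, SOME_OTHER_STATE }

  private final ConcurrentMap<String, State> unassigned = new ConcurrentHashMap<>();

  void setState(String region, State state) {
    unassigned.put(region, state);
  }

  State getState(String region) {
    return unassigned.get(region);
  }

  /**
   * Deletes the node only if it is still in RS2ZK_REGION_OPENED.
   * ConcurrentMap.remove(key, value) does the comparison and the removal atomically.
   */
  boolean deleteIfOpened(String region) {
    return unassigned.remove(region, State.RS2ZK_REGION_OPENED);
  }
}
```

Against a real ZooKeeper ensemble the analogous tool would presumably be a versioned delete (ZooKeeper.delete(path, version)) after reading the node's data and version, but how the master would wire that in is not stated in the issue.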
[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files
[ https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated HBASE-4719: Attachment: HBASE-4719.patch.txt HBase script assumes pre-Hadoop 0.21 layout of jar files Key: HBASE-4719 URL: https://issues.apache.org/jira/browse/HBASE-4719 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Attachments: HBASE-4719.patch.txt The following line in bin/hbase: {noformat} HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-core*.jar`) {noformat} assumes a pre-0.21 Hadoop layout. It would be better to dynamically account for either hadoop-core* or hadoop-common*, hadoop-hdfs*, and hadoop-mapreduce*.
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142308#comment-13142308 ] nkeywal commented on HBASE-4724: It happened once out of maybe 20 tries. For the null connection, there is at least one leak in {noformat} @Before public void setUp() throws Exception { this.admin = new HBaseAdmin(TEST_UTIL.getConfiguration()); } {noformat} But I guess it's not the only one, because I saw quite a lot of these lines in the logs from Jenkins. TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.92.0 Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch From the logs in my env: {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. 
(client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. 
Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107) Thread 152 (Master:0;localhost,39664,1320187706355): State: WAITING Blocked count: 217 Waited count: 174 Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c Stack: java.lang.Object.wait(Native Method)
[jira] [Updated] (HBASE-3716) Intermittent TestRegionRebalancing failure
[ https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3716: -- Fix Version/s: 0.92.0 Intermittent TestRegionRebalancing failure -- Key: HBASE-3716 URL: https://issues.apache.org/jira/browse/HBASE-3716 Project: HBase Issue Type: Bug Components: master Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.92.0 Attachments: 3716-addendum.txt, 3716.txt See HBase-TRUNK build #1820. This could be due to HBASE-3681. In trunk, the default value of hbase.regions.slop is 20%. It is possible for the load balancer to see a region distribution that falls within 20% of the optimal distribution. However, assertRegionsAreBalanced() uses a 10% slop. One solution is to align the slop in assertRegionsAreBalanced() with the hbase.regions.slop value.
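The mismatch described above can be shown with a small balance check that takes the slop as a parameter. This is a sketch under assumptions, not the actual assertRegionsAreBalanced() from the test: BalanceCheckSketch and regionsAreBalanced are hypothetical names, and the floor/ceiling rounding is one plausible choice.

```java
import java.util.List;

/** Sketch only: a balance check with a configurable slop, not the real test helper. */
final class BalanceCheckSketch {

  /** Returns true if every server's region count lies within avg * (1 +/- slop). */
  static boolean regionsAreBalanced(List<Integer> regionsPerServer, double slop) {
    int total = 0;
    for (int count : regionsPerServer) {
      total += count;
    }
    double avg = (double) total / regionsPerServer.size();
    int lowerBound = (int) Math.floor(avg * (1 - slop));
    int upperBound = (int) Math.ceil(avg * (1 + slop));
    for (int count : regionsPerServer) {
      if (count < lowerBound || count > upperBound) {
        return false;
      }
    }
    return true;
  }
}
```

For region counts of 115, 100 and 85 (average 100), the check passes with a slop of 0.2 but fails with a slop of 0.1. A balancer working to 20% can therefore stop at a distribution that a 10% assertion rejects, which is why aligning the two values removes the intermittent failure.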
[jira] [Created] (HBASE-4728) Clean up noisy HBaseAdmin#close messages
Clean up noisy HBaseAdmin#close messages Key: HBASE-4728 URL: https://issues.apache.org/jira/browse/HBASE-4728 Project: HBase Issue Type: Bug Reporter: stack See tail of HBASE-4724
[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk
[ https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142320#comment-13142320 ] stack commented on HBASE-4724: -- Ok. Want to make a new issue to fix the leak? It just failed for me with this: {code} Tests in error: testHundredsOfTable(org.apache.hadoop.hbase.client.TestAdmin): Call to sv4r11s38/10.4.11.38:46152 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.4.11.38:52051 remote=sv4r11s38/10.4.11.38:46152] {code} Is 1500ms not enough? Did you change that? On the message: {code} 2011-11-02 03:36:17,272 DEBUG [Finalizer] client.HConnectionManager$HConnectionImplementation(1715): The connection to null has been closed. {code} ... yeah, it looks like this test is making lots of instances of HBaseAdmin... ones it should be cleaning up but it does look like the message is harmless... a close of a closed connection. I made HBASE-4728 for this. TestAdmin hangs randomly in trunk - Key: HBASE-4724 URL: https://issues.apache.org/jira/browse/HBASE-4724 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.92.0 Attachments: 2002_4724_TestAdmin.patch, 2002_4724_TestAdmin.v2.patch From the logs in my env: {noformat} 2011-11-01 15:48:40,744 WARN [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. 
(client = 28, server = 29) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300) {noformat} Anyway, after this the logs finishes with: {noformat} 2011-11-01 15:54:35,132 INFO [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355 {noformat} it's in {noformat} sun.management.ThreadImpl.getThreadInfo1(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149) org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113) org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405) org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616) org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590) org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89) {noformat} So that's at least why adding a timeout wont help and may be why it does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help. I also wonder if the root cause of the non ending is my modif on the wal, with some threads surprised to have updates that were not written in the wal. 
Here is the full stack dump: {noformat} Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal): State: TIMED_WAITING Blocked count: 360 Waited count: 359 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702) org.apache.hadoop.ipc.Client$Connection.run(Client.java:744) Thread 272 (Master:0;localhost,39664,1320187706355-EventThread): State: WAITING Blocked count: 0 Waited count: 4 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502) Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)): State: RUNNABLE Blocked count: 2 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
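On the suggestion in the description of adding a maximum retry to Threads#threadDumpingIsAlive: a bounded wait could look roughly like the sketch below. It is not the HBase method. BoundedJoinSketch and joinWithLimit are hypothetical names, and the place where a real implementation would print a thread dump is only marked by a log line.

```java
/** Sketch only: a bounded variant of a "dump threads while waiting" join loop. */
final class BoundedJoinSketch {

  /**
   * Waits for the thread to finish, checking every intervalMs, at most maxAttempts times.
   * Returns true if the thread ended, false if the caller should give up.
   */
  static boolean joinWithLimit(Thread t, long intervalMs, int maxAttempts) {
    if (intervalMs <= 0) {
      // Thread.join(0) means "wait forever", which is what this loop is meant to avoid.
      throw new IllegalArgumentException("intervalMs must be > 0");
    }
    for (int attempt = 0; attempt < maxAttempts && t.isAlive(); attempt++) {
      try {
        t.join(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        break;
      }
      if (t.isAlive()) {
        // Assumption: a real implementation would print a thread dump at this point.
        System.err.println("Still waiting on " + t.getName() + ", attempt " + (attempt + 1));
      }
    }
    return !t.isAlive();
  }
}
```

With a limit in place, the tearDown path gets control back and can fail the test explicitly instead of dumping stacks every interval with no end.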
[jira] [Commented] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files
[ https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142322#comment-13142322 ] stack commented on HBASE-4719: -- Have you tried it on hbase trunk and then on an hbase with 0.21+ hadoop plugged in, Roman? HBase script assumes pre-Hadoop 0.21 layout of jar files Key: HBASE-4719 URL: https://issues.apache.org/jira/browse/HBASE-4719 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Attachments: HBASE-4719.patch.txt The following line in bin/hbase: {noformat} HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-core*.jar`) {noformat} assumes a pre-0.21 Hadoop layout. It would be better to dynamically account for either hadoop-core* or hadoop-common*, hadoop-hdfs*, and hadoop-mapreduce*.
[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142323#comment-13142323 ] Hudson commented on HBASE-4713: --- Integrated in HBase-0.92 #100 (See [https://builds.apache.org/job/HBase-0.92/100/]) HBASE-4713 Raise debug level to warn on ExecutionException in HConnectionManager stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Fix For: 0.92.0 Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged at debug level, but it should be logged at warn. I hit the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server == SocketTimeoutException logged on debug With the log level at info, the exception was not observable on the client side, and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142326#comment-13142326 ] Todd Lipcon commented on HBASE-4716: this was already committed, but I just got to my email for the day: - why does getColumnFamilyType return an enum when all we care about is hasMultipleColumnFamilies() (a boolean?) This makes it harder to understand. (this function doesn't return a type of column family.) - even if you use an enum, our style is not ALL_CAPS for enum class names. Only ALL_CAPS for the values. - why is the new function public? it should be private. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Created] (HBASE-4729) Race between online altering and splitting kills the master
Race between online altering and splitting kills the master --- Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.92.0, 0.94.0 I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad.
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142339#comment-13142339 ] Ted Yu commented on HBASE-4716: --- When creating a method for hasMultipleColumnFamilies(), I found that I should deal with the possibility of familyPaths being null. So I created an enum that can represent tri-state. I thought getColumnFamilyType() might be useful in other occasions. For now, I can change it to private. I will change spelling for the enum type as well. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Updated] (HBASE-3898) TestSplitTransactionOnCluster broke in TRUNK
[ https://issues.apache.org/jira/browse/HBASE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3898: -- Fix Version/s: 0.92.0 TestSplitTransactionOnCluster broke in TRUNK Key: HBASE-3898 URL: https://issues.apache.org/jira/browse/HBASE-3898 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 3898.txt It hangs for 15 minutes. I see a NPE trying to split a region. The splitKey passed is null. Looks to be by-product of recent compaction refactorings.
[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files
[ https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4719: - Fix Version/s: 0.92.0 Assignee: Roman Shaposhnik HBase script assumes pre-Hadoop 0.21 layout of jar files Key: HBASE-4719 URL: https://issues.apache.org/jira/browse/HBASE-4719 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.0 Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Fix For: 0.92.0 Attachments: HBASE-4719.patch.txt The following in the bin/hbase: {noformat} HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls ${HADOOP_HOME}/hadoop-core*.jar`) {noformat} assumes a pre-21 Hadoop layout. It'll be better to dynamically account for either hadoop-core* or hadoop-common*, hadoop-hdfs*, hadoop-mapreduce*
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142361#comment-13142361 ] Ted Yu commented on HBASE-4716: --- When familyPaths is null, hasMultipleColumnFamilies() returns false. Just want to confirm. Personally I think patch v1 looks cleaner. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
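The behaviour discussed in this comment (null input yields false, and a single-family load only needs a read lock) might look roughly like the sketch below. It is an illustration, not the attached patch: the class name, the `lockFor` helper and the use of plain family-name strings are assumptions made here, and the sketch assumes the multi-family case keeps the write lock.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BulkLoadLockSketch {
    // Hypothetical stand-in for a per-region read/write lock.
    private static final ReentrantReadWriteLock REGION_LOCK = new ReentrantReadWriteLock();

    /** Returns false for null or empty input, and when every entry names the same family. */
    static boolean hasMultipleColumnFamilies(List<String> families) {
        if (families == null || families.isEmpty()) {
            return false;
        }
        String first = families.get(0);
        for (String family : families) {
            if (!first.equals(family)) {
                return true;
            }
        }
        return false;
    }

    /** Read lock for a single family (or no input), write lock otherwise. */
    static Lock lockFor(List<String> families) {
        return hasMultipleColumnFamilies(families) ? REGION_LOCK.writeLock() : REGION_LOCK.readLock();
    }

    public static void main(String[] args) {
        List<String> single = Arrays.asList("cf", "cf");
        Lock lock = lockFor(single);
        lock.lock();
        try {
            System.out.println("single family, multiple=" + hasMultipleColumnFamilies(single));
        } finally {
            lock.unlock();
        }
        System.out.println("null input, multiple=" + hasMultipleColumnFamilies(null));
    }
}
```

A boolean with a null check covers the three cases the tri-state enum was meant to distinguish, as long as callers treat "no families" the same as "one family".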
[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4716: -- Attachment: (was: 4716.addendum) Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4716: -- Attachment: 4716.addendum Removed enum in addendum. TestLoadIncrementalHFilesSplitRecovery passes. Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-1744: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like: void put(1:Bytes table, 2:TPut put) throws (1:IOError io) with: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp } struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values } This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-4725) NPE in AM#updateTimers
[ https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142393#comment-13142393 ] Hadoop QA commented on HBASE-4725: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501982/am.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -165 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 42 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/140//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/140//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/140//console This message is automatically generated. NPE in AM#updateTimers -- Key: HBASE-4725 URL: https://issues.apache.org/jira/browse/HBASE-4725 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: am.txt
[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations
[ https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142398#comment-13142398 ] Lars Hofhansl commented on HBASE-4583: -- You make a good point. If people want performance they'd pass false as writeToWAL. Otherwise they will get correct and slow behavior. Integrate RWCC with Append and Increment operations --- Key: HBASE-4583 URL: https://issues.apache.org/jira/browse/HBASE-4583 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Attachments: 4583-v2.txt, 4583-v3.txt, 4583-v4.txt, 4583.txt Currently Increment and Append operations do not work with RWCC and hence a client could see the results of multiple such operations mixed in the same Get/Scan. The semantics might be a bit more interesting here as upsert adds and removes to and from the memstore.
[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load
[ https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142399#comment-13142399 ] Todd Lipcon commented on HBASE-4716: +1 on addendum, thanks Ted Improve locking for single column family bulk load -- Key: HBASE-4716 URL: https://issues.apache.org/jira/browse/HBASE-4716 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Fix For: 0.92.0 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt HBASE-4552 changed the locking behavior for single column family bulk load, namely we don't need to take write lock. A read lock would suffice in this scenario.
[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky
[ https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142400#comment-13142400 ] Hadoop QA commented on HBASE-4518: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501985/HBASE-4518.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -165 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 42 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.TestGlobalMemStoreSize org.apache.hadoop.hbase.master.TestMasterFailover Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/141//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/141//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/141//console This message is automatically generated. TestServerCustomProtocol is flaky - Key: HBASE-4518 URL: https://issues.apache.org/jira/browse/HBASE-4518 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.92.0 Reporter: Gary Helmling Assignee: Gary Helmling Fix For: 0.92.0 Attachments: HBASE-4518.patch, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt TestServerCustomProtocol has been showing some intermittent failures in Jenkins due to what looks like region transitions. Here is the most recent failure: {noformat} Results : Failed tests: testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): Results should contain region test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb' {noformat}
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142405#comment-13142405 ] jirapos...@reviews.apache.org commented on HBASE-4536: --
bq. On 2011-11-02 06:44:54, Prakash Khemani wrote:
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 210-212
bq. https://reviews.apache.org/r/2178/diff/12/?file=50954#file50954line210
bq.
bq. Hi Lars, Isn't this early-out problematic? It doesn't take into account min-versions. It doesn't take into account the newly introduced keepDeletedCells mode.

The early out only happens when min-versions is not set. Check out *ColumnTracker

- Lars

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2178/#review3009
---

On 2011-10-18 21:43:38, Lars Hofhansl wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2178/
bq. ---
bq.
bq. (Updated 2011-10-18 21:43:38)
bq.
bq. Review request for hbase, Ted Yu and Jonathan Gray.
bq.
bq. Summary
bq. ---
bq.
bq. HBase timerange Gets and Scans allow to do timetravel in HBase. I.e. look at the state of the data at any point in the past, provided the data is still around.
bq. This did not work for deletes, however. Deletes would always mask all puts in the past.
bq. This change adds a flag that can be on HColumnDescriptor to enable retention of deleted rows.
bq. These rows are still subject to TTL and/or VERSIONS.
bq.
bq. This changes the following:
bq. 1. There is a new flag on HColumnDescriptor enabling that behavior.
bq. 2. Allow gets/scans with a timerange to retrieve rows hidden by a delete marker, if the timerange does not include the delete marker.
bq. 3. Do not unconditionally collect all deleted rows during a compaction.
bq. 4. Allow a raw Scan, which retrieves all delete markers and deleted rows.
bq.
bq. The change is small'ish, but the logic is intricate, so please review carefully.
bq.
bq. This addresses bug HBASE-4536.
bq. https://issues.apache.org/jira/browse/HBASE-4536
bq.
bq. Diffs
bq. -
bq.
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Attributes.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java PRE-CREATION
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1185362
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 1185362
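Point 2 of the review summary (a time-range read may return a put hidden by a delete marker if the range does not include the marker) can be modelled for a single cell as below. This is a simplified sketch and not the ScanQueryMatcher logic under review: the class and method names are hypothetical, the time range is assumed to be half-open [minTs, maxTs), and TTL, VERSIONS and min-versions are ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class KeepDeletedSketch {
    /**
     * True if a put written at putTs is returned by a read over [minTs, maxTs),
     * given the timestamps of delete markers for the same cell.
     */
    static boolean visible(long putTs, List<Long> deleteMarkers, long minTs, long maxTs) {
        if (putTs < minTs || putTs >= maxTs) {
            return false; // the put itself is outside the requested range
        }
        for (long marker : deleteMarkers) {
            // A marker masks puts at or before its timestamp,
            // but only when the read's time range includes the marker.
            if (marker >= putTs && marker < maxTs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> markers = Arrays.asList(20L);
        // Put at ts=10, delete marker at ts=20.
        System.out.println("read [0,15): " + visible(10L, markers, 0L, 15L));
        System.out.println("read [0,30): " + visible(10L, markers, 0L, 30L));
        System.out.println("no markers:  " + visible(10L, Collections.<Long>emptyList(), 0L, 30L));
    }
}
```

In this model a read that ends before the delete was issued still sees the put, which is the "time travel" behaviour the summary describes.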
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Status: Patch Available (was: Open) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: 4213-Nov-2-2011_patch_.patch Patch from Subbu for TRUNK. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation
[ https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142418#comment-13142418 ] Hudson commented on HBASE-4713: --- Integrated in HBase-TRUNK #2401 (See [https://builds.apache.org/job/HBase-TRUNK/2401/]) HBASE-4713 Raise debug level to warn on ExecutionException in HConnectionManager stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation --- Key: HBASE-4713 URL: https://issues.apache.org/jira/browse/HBASE-4713 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Lucian George Iordache Fix For: 0.92.0 Attachments: 4713.patch, HBASE-4713-patch.txt The ExecutionException is logged at debug level, and it should be logged at warn. I ran into the problem in the following case: - hbase.rpc.timeout = 6 - lease time on region server = 24 - started a scan that takes more than 60 seconds on the region server ==> SocketTimeoutException logged at debug. With the log level set to info, the exception was not observable on the client side and it took me a while to figure out what was happening. See also: - https://issues.apache.org/jira/browse/HBASE-3154 - http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
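The change requested in HBASE-4713 amounts to logging a failed asynchronous call at a level that is visible with default client settings. A minimal sketch of that pattern follows. It uses java.util.logging and a plain ExecutorService rather than HConnectionManager, and the class and method names are hypothetical.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

class WarnOnExecutionExceptionSketch {
    private static final Logger LOG = Logger.getLogger("WarnOnExecutionExceptionSketch");

    /** Runs the task and returns its result, or null after logging the failure at WARNING. */
    static <T> T runAndWarn(ExecutorService pool, Callable<T> task) {
        Future<T> future = pool.submit(task);
        try {
            return future.get();
        } catch (ExecutionException e) {
            // WARNING is shown at the default level; a debug-level message would be hidden.
            LOG.log(Level.WARNING, "Task failed", e.getCause());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> failing = () -> {
                throw new SocketTimeoutException("simulated scanner timeout");
            };
            System.out.println("result=" + runAndWarn(pool, failing));
        } finally {
            pool.shutdown();
        }
    }
}
```

Logging the cause rather than the wrapping ExecutionException keeps the timeout itself at the top of the stack trace the operator sees.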