[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267217#comment-13267217 ]

Hadoop QA commented on HBASE-5444:
----------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525392/HBASE-5444-v10-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1740//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1740//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1740//console

This message is automatically generated.

Add PB-based calls to HMasterRegionInterface
--------------------------------------------
Key: HBASE-5444
URL: https://issues.apache.org/jira/browse/HBASE-5444
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Gregory Chanan
Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267218#comment-13267218 ]

chunhui shen commented on HBASE-5916:
-------------------------------------

We have also encountered this issue.
What about removing
{code}throw new PleaseHoldException(message);{code}
in ServerManager#checkAlreadySameHostPort?

RS restart just before master intialization we make the cluster non operative
-----------------------------------------------------------------------------
Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1

Consider a case where the master is being restarted. An RS that was alive when the master restart began is itself restarted before the master enables the ServerShutdownHandler:
{code}
serverShutdownHandlerEnabled = true;
{code}
When the RS then tries to register with the master, the master tries to expire the old server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled.
This can happen when a single RS is restarted or when all RSs are restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
[jira] [Commented] (HBASE-4990) Document secure HBase setup
[ https://issues.apache.org/jira/browse/HBASE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267219#comment-13267219 ]

Hudson commented on HBASE-4990:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-4990 Document secure HBase setup (Revision 1333212)

Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/docbkx/book.xml
* /hbase/trunk/src/docbkx/security.xml

Document secure HBase setup
---------------------------
Key: HBASE-4990
URL: https://issues.apache.org/jira/browse/HBASE-4990
Project: HBase
Issue Type: Sub-task
Affects Versions: 0.92.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 0.96.0
Attachments: 4990.txt, 4990v2.txt
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267221#comment-13267221 ]

Hudson commented on HBASE-5840:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-5840 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status (RajeshBabu) (Revision 1333124)
HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123)

Result = SUCCESS
ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
-------------------------------------------------------------------------------------------
Key: HBASE-5840
URL: https://issues.apache.org/jira/browse/HBASE-5840
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: rajeshbabu
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5840.patch, HBASE-5840_trunk.patch, HBASE-5840_v2.patch

The TaskMonitor status is not cleared when a region goes to FAILED_OPEN. The old status keeps being shown, which misleads the user.
[jira] [Commented] (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267220#comment-13267220 ] Hudson commented on HBASE-1996: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Configure scanner buffer in bytes instead of number of rows --- Key: HBASE-1996 URL: https://issues.apache.org/jira/browse/HBASE-1996 Project: HBase Issue Type: Improvement Reporter: Dave Latham Assignee: Dave Latham Fix For: 0.90.0 Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3-v3.patch, 1996-0.20.3.patch Currently, the default scanner fetches a single row at a 
time. This makes for very slow scans on tables where the rows are not large. You can change the setting for an HTable instance or for each Scan.
It would be better to have a default that performs reasonably well, so that people stop running into slow scans because they are evaluating HBase, aren't familiar with the setting, or simply forgot. Unfortunately, if we increase the value of the current setting, we run the risk of running out of memory (OOM) on tables with large rows.
Let's change the setting so that it works with a size in bytes rather than a number of rows. This will allow us to set a reasonable default so that tables with small rows scan quickly and tables with large rows do not run OOM.
Note that the case is very similar for table writes. When auto flush is disabled, we buffer a list of Puts to commit at once. That buffer is measured in bytes, so that a small number of large Puts or a large number of small Puts can each fit in a single flush. If that buffer were measured in number of Puts, it would have the same problem that we have for the scan buffer, and we wouldn't be able to set a good default value for tables with different row sizes. Configuring the scan buffer like the write buffer will make the two more consistent.
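The sizing argument in the description above can be illustrated with a small, self-contained sketch. It does not use the HBase client API: ByteBoundedBatcher, nextBatch, rowsOf, and the 1 KB limit are hypothetical names and values chosen only for illustration. The sketch shows how a buffer capped in bytes returns many small rows or very few large rows per fetch, which is the behavior the issue asks for.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; this is not HBase's scanner implementation. */
public class ByteBoundedBatcher {

    /**
     * Pulls rows from the iterator until the accumulated size reaches maxBytes.
     * At least one row is returned if any remain, so a single row larger than
     * the limit cannot stall the scan. The batch may overshoot by one row.
     */
    public static List<byte[]> nextBatch(Iterator<byte[]> rows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        while (rows.hasNext() && used < maxBytes) {
            byte[] row = rows.next();
            batch.add(row);
            used += row.length;
        }
        return batch;
    }

    /** Builds n rows of the given size, standing in for serialized rows. */
    public static List<byte[]> rowsOf(int n, int size) {
        List<byte[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new byte[size]);
        }
        return rows;
    }

    public static void main(String[] args) {
        long limit = 1024; // assumed 1 KB buffer, for illustration only
        // Small rows: many rows fit into one fetch.
        System.out.println(nextBatch(rowsOf(100, 16).iterator(), limit).size());   // prints 64
        // Large rows: the fetch stops after one row, bounding memory use.
        System.out.println(nextBatch(rowsOf(100, 4096).iterator(), limit).size()); // prints 1
    }
}
```

With a row-count setting, the same default would either fetch too few small rows or too many large ones; a byte limit adapts to both cases.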
[jira] [Commented] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267223#comment-13267223 ]

Hudson commented on HBASE-5919:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104)

Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java

Add fixes for Ted's review comments from HBASE-5869
---------------------------------------------------
Key: HBASE-5919
URL: https://issues.apache.org/jira/browse/HBASE-5919
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Ted Yu
Priority: Blocker
Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt

I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker.
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267222#comment-13267222 ]

Hudson commented on HBASE-5548:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123)

Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

Add ability to get a table in the shell
---------------------------------------
Key: HBASE-5548
URL: https://issues.apache.org/jira/browse/HBASE-5548
Project: HBase
Issue Type: Improvement
Components: shell
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.96.0
Attachments: ruby_HBASE-5528-v0.patch, ruby_HBASE-5548-addendum.patch, ruby_HBASE-5548-v1.patch, ruby_HBASE-5548-v2.patch, ruby_HBASE-5548-v3.patch, ruby_HBASE-5548-v5.patch

Currently, all the shell commands that operate on a table first have to take the table name as input.
There are two main considerations:
* It is annoying to have to write the table name every time, when you should just be able to get a reference to a table.
* The current implementation is very wasteful: it creates a new HTable for each call (but reuses the connection, since it uses the same configuration).
We should be able to get a handle to a single HTable and then operate on that.
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267225#comment-13267225 ]

Hudson commented on HBASE-5913:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-5913 Speed up the full scan of META (Chunhui) (Revision 1333283)

Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java

Speed up the full scan of META
------------------------------
Key: HBASE-5913
URL: https://issues.apache.org/jira/browse/HBASE-5913
Project: HBase
Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0, 0.94.1
Attachments: 5913-v2.txt, HBASE-5913.patch

In the master, we do a full scan of META in several situations, for example:
1. Master startup.
2. CatalogJanitor does a full scan every 5 minutes.
3. ServerShutdownHandler calls getServerUserRegions for a dead server.
For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3.
However, we found that MetaReader#getServerUserRegions takes 14 minutes for 100,000 (10w) regions in our production environment.
There are two causes:
First, we don't use caching, so we get one row per next() when fully scanning .META.
Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40 ms. (This is related to the length of the rows in .META.; anyone who sees the same behavior could try setting it to true.)
For this issue, I think we could set the caching when doing the full scan of META.
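The effect of the proposed caching can be estimated with simple arithmetic: if each next() call returns at most one row, a full scan of N rows needs N round trips, while a caching value of c reduces that to ceil(N / c). The sketch below is a hypothetical helper, not part of MetaReader; the class name ScanRoundTrips and the caching value of 100 are assumptions made for illustration, and the count ignores the final call that detects the end of the scan.

```java
/** Hypothetical helper for estimating scan round trips; not HBase code. */
public class ScanRoundTrips {

    /**
     * Number of next() round trips needed to read `rows` rows when each
     * round trip returns up to `caching` rows (ceiling division).
     */
    public static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long regions = 100_000; // the "10w" regions from the report
        System.out.println(roundTrips(regions, 1));   // prints 100000
        System.out.println(roundTrips(regions, 100)); // prints 1000 (assumed caching value)
    }
}
```

Because the per-call latency is paid once per round trip, cutting the number of round trips by two orders of magnitude should cut the network wait by roughly the same factor, whatever the actual per-call cost is.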
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267224#comment-13267224 ] Hudson commented on HBASE-5869: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104) HBASE-5869 Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb (Revision 1333099) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/DeserializationException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HBaseException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/RegionTransition.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogTask.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/trunk/src/main/protobuf/ZooKeeper.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestSerialization.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee:
[jira] [Commented] (HBASE-5625) Avoid byte buffer allocations when reading a value from a Result object
[ https://issues.apache.org/jira/browse/HBASE-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267226#comment-13267226 ]

Hudson commented on HBASE-5625:
-------------------------------

Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/])
HBASE-5625 Avoid byte buffer allocations when reading a value from a Result object (Tudor Scurtu) (Revision 1333159)

Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestResult.java

Avoid byte buffer allocations when reading a value from a Result object
-----------------------------------------------------------------------
Key: HBASE-5625
URL: https://issues.apache.org/jira/browse/HBASE-5625
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.92.1
Reporter: Tudor Scurtu
Assignee: Tudor Scurtu
Labels: patch
Fix For: 0.96.0
Attachments: 5625.txt, 5625v2.txt, 5625v3.txt, 5625v4.txt, 5625v5.txt, 5625v6.txt, 5625v7.txt, 5625v8.txt

When calling Result.getValue(), an extra dummy KeyValue and its associated underlying byte array are allocated, as well as a persistent buffer that will contain the returned value.
These allocations can be avoided by reusing a static array for the dummy object and by passing a ByteBuffer object as a value destination buffer to the read method.
The current functionality is maintained, and we have added a separate method call stack that employs the described changes. I will provide more details with the patch.
In tests run with a profiler, read time appears to drop by up to 40%.
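The idea of passing a destination buffer instead of returning a freshly allocated array can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: the class ValueReader and the methods copyValue and readValueInto are hypothetical names, and the cell layout (a value stored at a known offset and length inside a backing array) is simplified.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration; this is not the HBase Result or KeyValue API. */
public class ValueReader {
    private final byte[] backing; // serialized cell; value lives at [offset, offset + length)
    private final int offset;
    private final int length;

    public ValueReader(byte[] backing, int offset, int length) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    /** Allocating variant: returns a fresh copy of the value on every call. */
    public byte[] copyValue() {
        return Arrays.copyOfRange(backing, offset, offset + length);
    }

    /**
     * Non-allocating variant: writes the value into a caller-supplied buffer.
     * Returns false, leaving dst untouched, if dst does not have enough room.
     */
    public boolean readValueInto(ByteBuffer dst) {
        if (dst.remaining() < length) {
            return false;
        }
        dst.put(backing, offset, length);
        return true;
    }

    public static void main(String[] args) {
        byte[] cell = "rowkey:value-bytes".getBytes(StandardCharsets.US_ASCII);
        ValueReader reader = new ValueReader(cell, 7, 11);

        // One buffer is allocated up front and reused for every read.
        ByteBuffer reusable = ByteBuffer.allocate(64);
        for (int i = 0; i < 3; i++) {
            reusable.clear();
            reader.readValueInto(reusable);
            reusable.flip();
            System.out.println(StandardCharsets.US_ASCII.decode(reusable));
        }
    }
}
```

The caller owns the buffer's lifetime, so a read loop allocates once rather than once per value; the trade-off is that the caller must size the buffer and handle the case where a value does not fit.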
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267227#comment-13267227 ] Hudson commented on HBASE-2214: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Fix For: 0.96.0, 0.94.1 Attachments: HBASE-2214-0.94-v2.txt, HBASE-2214-0.94-v3.txt, HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, 
HBASE-2214-v6.txt, HBASE-2214-v7.txt, HBASE-2214_with_broken_TestShell.txt

The notion that you set a size, rather than a row count, to specify how much a scanner should return in each cycle was raised over in HBASE-1996. It's a good one, making HBase behave regularly even though the data under it may vary. HBASE-1996 was committed, but the patch was constrained by the fact that it could not change the RPC interface. This issue is about doing HBASE-1996 for 0.21 in a clean, unconstrained way.
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267228#comment-13267228 ]

stack commented on HBASE-5922:
------------------------------

@Anoop It's late, but what you say makes sense. Looking at our half-file test, TestHalfStoreFileReader, the coverage seems pretty poor. What do you think? It does not seem to test the boundary condition Nate ran into, or the one you reason about above.

HalfStoreFileReader seekBefore causes StackOverflowError
--------------------------------------------------------
Key: HBASE-5922
URL: https://issues.apache.org/jira/browse/HBASE-5922
Project: HBase
Issue Type: Bug
Components: client, io
Affects Versions: 0.90.0
Environment: HBase 0.90.4
Reporter: Nate Putnam
Assignee: Nate Putnam
Fix For: 0.90.0
Attachments: HBASE-5922.patch, HBASE-5922.patch

Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.

java.io.IOException: java.io.IOException: java.lang.StackOverflowError
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Updated] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5913:
--------------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed to 0.94 as well.

Speed up the full scan of META
------------------------------
Key: HBASE-5913
URL: https://issues.apache.org/jira/browse/HBASE-5913
Project: HBase
Issue Type: Improvement
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.96.0, 0.94.1
Attachments: 5913-v2.txt, HBASE-5913.patch

In the master, we do a full scan of META in several situations, for example:
1. Master startup.
2. CatalogJanitor does a full scan every 5 minutes.
3. ServerShutdownHandler calls getServerUserRegions for a dead server.
For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3.
However, we found that MetaReader#getServerUserRegions takes 14 minutes for 100,000 (10w) regions in our production environment.
There are two causes:
First, we don't use caching, so we get one row per next() when fully scanning .META.
Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40 ms. (This is related to the length of the rows in .META.; anyone who sees the same behavior could try setting it to true.)
For this issue, I think we could set the caching when doing the full scan of META.
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267232#comment-13267232 ]

stack commented on HBASE-5916:
------------------------------

Well, that's useful, right? It's useful in the case where a regionserver crashes and a new one comes up fast, before the original regionserver's znode has expired in zk. We shouldn't remove it.
On startup, you should not get this exception unless you have a condition like the one described above, where a regionserver on the same host and port was previously registered in the master and then a new regionserver comes in with the same host and port but a different startcode?

RS restart just before master intialization we make the cluster non operative
-----------------------------------------------------------------------------
Key: HBASE-5916
URL: https://issues.apache.org/jira/browse/HBASE-5916
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.94.1

Consider a case where the master is being restarted. An RS that was alive when the master restart began is itself restarted before the master enables the ServerShutdownHandler:
{code}
serverShutdownHandlerEnabled = true;
{code}
When the RS then tries to register with the master, the master tries to expire the old server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled.
This can happen when a single RS is restarted or when all RSs are restarted at the same time (before assignRootandMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
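The registration check discussed in this thread can be summarized in a small decision sketch. It is a simplified model built only from the quoted snippet and the comments above, not the actual ServerManager code: the class SameHostPortCheck, the Outcome enum, and the check method are hypothetical names, and a null existing startcode stands in for "no server registered on this host and port".

```java
/** Hypothetical model of the same-host-and-port check; not the HBase ServerManager. */
public class SameHostPortCheck {

    public enum Outcome {
        ACCEPT,              // nothing registered on this host:port, registration proceeds
        HOLD_AND_EXPIRE_OLD, // older registration looks stale: expire it, ask the newcomer to retry
        HOLD_ONLY            // existing registration is not older: just ask the newcomer to retry
    }

    /**
     * @param existingStartcode startcode of the server already registered on the
     *                          same host and port, or null if there is none
     * @param newStartcode      startcode of the server trying to register
     */
    public static Outcome check(Long existingStartcode, long newStartcode) {
        if (existingStartcode == null) {
            return Outcome.ACCEPT;
        }
        if (existingStartcode < newStartcode) {
            // Mirrors the quoted snippet: the old entry is expired first...
            return Outcome.HOLD_AND_EXPIRE_OLD;
        }
        // ...and in both remaining cases the newcomer is told to hold
        // (the PleaseHoldException mentioned in the comments).
        return Outcome.HOLD_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(check(null, 200L)); // prints ACCEPT
        System.out.println(check(100L, 200L)); // prints HOLD_AND_EXPIRE_OLD
        System.out.println(check(200L, 200L)); // prints HOLD_ONLY
    }
}
```

In this model, the problem reported in the issue corresponds to HOLD_AND_EXPIRE_OLD being returned while the expiry itself cannot complete, so the restarted RS keeps being told to hold.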
[jira] [Updated] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5444: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Gregory. Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267243#comment-13267243 ]
stack commented on HBASE-5918:
@Chunhui I suppose so. It's just odd to have flags set in such different locations... It makes tracking things difficult. At a minimum I'd think we'd make a single method that sets the flag and then calls expireDeadNotExpiredServers so they are grouped... What about the call to expireDeadNotExpiredServers that is done twice? On the first call, we'd possibly process the server that was carrying root. What happens when we call it again later in finishInitialization? Could we end up processing the same server twice? Thanks.
Master will block forever when startup if root server died between assign root and assign meta
Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch
While the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta:
{code}
assignmentManager.assignMeta();
this.catalogTracker.waitForMeta();
{code}
because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267253#comment-13267253 ]
ramkrishna.s.vasudevan commented on HBASE-5916:
We should not remove PleaseHoldException(message) directly.
{code}
if (services.isServerShutdownHandlerEnabled()) {
  // master has completed the initialization
  throw new PleaseHoldException(message);
}
{code}
This solved the actual problem. But the problem due to the FileNotFoundException should be addressed in a different way.
RS restart just before master intialization we make the cluster non operative
Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1
Consider a case where my master is getting restarted. An RS that was alive when the master restart started gets restarted before the master enables the ServerShutdownHandler:
{code}
serverShutdownHandlerEnabled = true;
{code}
In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This may happen when only one RS gets restarted or when all the RSs get restarted at the same time (before assignRootAndMeta).
{code}
LOG.info(message);
if (existingServer.getStartcode() < serverName.getStartcode()) {
  LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName);
  expireServer(existingServer);
}
{code}
If another RS is brought up then the cluster comes back to normalcy. May be a very corner case.
-- This message is automatically generated by JIRA.
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267254#comment-13267254 ]
Anoop Sam John commented on HBASE-5922:
@Stack, I have not gone through the test cases. It seems the boundary condition is not checked. By my analysis there are 2 bugs in this seekBefore(). I will take a look at the tests and the other methods of HalfStoreFileReader.
Bugs:
1. As in Nate's case, a StackOverflowError when seekBefore() is called with a key=splitKey on the bottom half file.
2. On the top half file, a seekBefore() call with a key = splitkey is supposed to return false, but that won't happen. It will try to seek into the bottom half, I fear.
HalfStoreFileReader seekBefore causes StackOverflowError
Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch
Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.
java.io.IOException: java.io.IOException: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
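The two boundary conditions discussed for HBASE-5922 can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HalfStoreFileReader code or the attached patch: the class `HalfSeekSketch`, the integer keys, and the `TreeSet` standing in for the parent store file are all hypothetical. It assumes the bottom half holds keys below the split key and the top half holds keys at or above it.

```java
import java.util.Arrays;
import java.util.OptionalInt;
import java.util.TreeSet;

/**
 * Toy model of a "half" view over a sorted parent file (hypothetical, not HBase code).
 * Assumption: bottom half = keys < splitKey, top half = keys >= splitKey.
 */
final class HalfSeekSketch {
    private final TreeSet<Integer> parentKeys;
    private final int splitKey;
    private final boolean top;

    HalfSeekSketch(TreeSet<Integer> parentKeys, int splitKey, boolean top) {
        this.parentKeys = parentKeys;
        this.splitKey = splitKey;
        this.top = top;
    }

    /** Returns the greatest key strictly less than key that belongs to this half. */
    OptionalInt seekBefore(int key) {
        if (top) {
            // Nothing in the top half sorts before splitKey, so key == splitKey
            // has to be rejected as well, not only key < splitKey.
            if (key <= splitKey) {
                return OptionalInt.empty();
            }
            Integer found = parentKeys.lower(key);
            // Guard against landing on a key that belongs to the bottom half.
            return (found != null && found >= splitKey)
                    ? OptionalInt.of(found) : OptionalInt.empty();
        }
        // Bottom half: clamp the probe to splitKey once, instead of calling
        // seekBefore again with splitKey, which would recurse without end.
        Integer found = parentKeys.lower(Math.min(key, splitKey));
        return found != null ? OptionalInt.of(found) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        TreeSet<Integer> keys = new TreeSet<>(Arrays.asList(10, 20, 30, 40));
        HalfSeekSketch bottom = new HalfSeekSketch(keys, 30, false);
        HalfSeekSketch topHalf = new HalfSeekSketch(keys, 30, true);
        System.out.println("bottom, key = split: " + bottom.seekBefore(30));
        System.out.println("top, key = split: " + topHalf.seekBefore(30));
    }
}
```

In this model the bottom-half lookup terminates because the probe is clamped rather than re-dispatched, and the top-half lookup at the split key reports "nothing before" instead of reaching into the other half.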
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267260#comment-13267260 ] Hadoop QA commented on HBASE-5923: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525400/5923-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1741//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1741//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1741//console This message is automatically generated. Cleanup checkAndXXX logic - Key: HBASE-5923 URL: https://issues.apache.org/jira/browse/HBASE-5923 Project: HBase Issue Type: Improvement Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.1 Attachments: 5923-trunk.txt 1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via HTable[Interface]. 2. there is unnecessary duplicate code in the check{Put|Delete} code in HRegionServer. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chunhui shen updated HBASE-5918:
Attachment: HBASE-5918V2.patch
In the v2 patch, I made a single method that sets the flag and then calls expireDeadNotExpiredServers. We now call this method only once.
Master will block forever when startup if root server died between assign root and assign meta
Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch, HBASE-5918V2.patch
While the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta:
{code}
assignmentManager.assignMeta();
this.catalogTracker.waitForMeta();
{code}
because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
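The idea of grouping the flag and the expiry call into one method can be sketched roughly as follows. This is not the v2 patch itself, which is not quoted in this thread. The names `serverShutdownHandlerEnabled` and `expireDeadNotExpiredServers` come from the discussion; the class `ShutdownHandlerGateSketch`, the method `enableServerShutdownHandler`, and the string server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a master that queues dead servers until expiry is enabled. */
final class ShutdownHandlerGateSketch {
    private boolean serverShutdownHandlerEnabled = false;
    private final List<String> deadNotExpired = new ArrayList<>();
    private final List<String> processed = new ArrayList<>();

    /** Expires a server now if allowed, otherwise remembers it for later. */
    void expireServer(String server) {
        if (!serverShutdownHandlerEnabled) {
            deadNotExpired.add(server);
            return;
        }
        processed.add(server);
    }

    /** Single entry point: sets the flag, then drains the queued dead servers. */
    void enableServerShutdownHandler() {
        if (serverShutdownHandlerEnabled) {
            return; // a second call is a no-op, so no server is processed twice
        }
        serverShutdownHandlerEnabled = true;
        expireDeadNotExpiredServers();
    }

    private void expireDeadNotExpiredServers() {
        processed.addAll(deadNotExpired);
        deadNotExpired.clear();
    }

    List<String> processedServers() {
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        ShutdownHandlerGateSketch master = new ShutdownHandlerGateSketch();
        master.expireServer("rs-carrying-root"); // queued: handler not yet enabled
        master.enableServerShutdownHandler();    // flag set and queue drained together
        master.enableServerShutdownHandler();    // harmless repeat
        System.out.println(master.processedServers());
    }
}
```

Keeping the flag change and the drain in one place makes it easier to reason about the "processed twice" concern raised earlier in the thread, since the drain can only run on the transition from disabled to enabled.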
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267263#comment-13267263 ]
Hudson commented on HBASE-5913:
Integrated in HBase-0.94 #174 (See [https://builds.apache.org/job/HBase-0.94/174/]) HBASE-5913 Speed up the full scan of META (Revision 115) Result = SUCCESS
larsh : Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
Speed up the full scan of META
Key: HBASE-5913 URL: https://issues.apache.org/jira/browse/HBASE-5913 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.1 Attachments: 5913-v2.txt, HBASE-5913.patch
In the master, we do a full scan of META in several situations, for example:
1. master startup
2. CatalogJanitor, which does a full scan every 5 mins
3. ServerShutdownHandler, getServerUserRegions for a dead server
For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3. However, we found that MetaReader#getServerUserRegions takes 14 mins for 100,000 (10w) regions in our production environment. There are two causes. First, we don't use caching, so we get one row per next() when fully scanning .META. Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40ms (this is related to the row length in .META.; if someone else sees this, they could try setting it to true). For this issue, I think we could set the caching when doing the full scan of META.
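To see why scanner caching matters for a full scan, a back-of-the-envelope model may help. The sketch below is an assumption-laden illustration, not the committed change to MetaReader: the class `ScanCachingSketch` and its methods are hypothetical, the row count is illustrative, and the model ignores everything except the per-call cost, so it is not meant to reproduce the 14 minutes reported above. In the HBase client, the rows-per-call setting is exposed as `Scan#setCaching`.

```java
/** Hypothetical cost model: one fixed cost per next() call, nothing else. */
final class ScanCachingSketch {

    /** Number of next() round trips needed to read rows, fetching caching rows per call. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    /** Estimated scan time if every round trip costs perCallMillis. */
    static long estimatedMillis(long rows, int caching, long perCallMillis) {
        return roundTrips(rows, caching) * perCallMillis;
    }

    public static void main(String[] args) {
        long rows = 100_000;     // illustrative number of rows in the catalog table
        long perCallMillis = 40; // per-next() cost quoted in the issue description
        for (int caching : new int[] {1, 10, 100, 1000}) {
            System.out.printf("caching=%d roundTrips=%d estimatedMillis=%d%n",
                    caching, roundTrips(rows, caching),
                    estimatedMillis(rows, caching, perCallMillis));
        }
    }
}
```

Under this model the number of round trips falls in proportion to the caching value, which is the effect the issue is after; larger values trade client memory for fewer calls.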
[jira] [Commented] (HBASE-5732) Remove the SecureRPCEngine and merge the security-related logic in the core engine
[ https://issues.apache.org/jira/browse/HBASE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267270#comment-13267270 ]
Devaraj Das commented on HBASE-5732:
bq. There is no corresponding remove of the /security directory. Should it be included here?
Yeah, it shouldn't be there. However, I generated the patch with --no-diff-deleted and hence these files still show up, but if you download the patch you will see a bunch of lines that say Index: security/... (deleted). The person who commits needs to be aware of this, I guess, and run the appropriate svn commands.
bq. I don't see you regenerating pb stuff after making these changes in this proto file
There is actually - RPCProtos.java.
bq. What is this? Mistake? (comment to do with the conf file change).
I merged in the stuff from hbase-site.xml from the security/src/test/resources into the src/test/resources one since the security one would go away (yeah, you won't know about it unless you do a manual diff of the two hbase-site.xml files).
I am in the process of setting up a secure cluster etc. for some manual testing. Fingers crossed.
Remove the SecureRPCEngine and merge the security-related logic in the core engine
Key: HBASE-5732 URL: https://issues.apache.org/jira/browse/HBASE-5732 Project: HBase Issue Type: Improvement Reporter: Devaraj Das Assignee: Devaraj Das Attachments: rpcengine-merge.3.patch, rpcengine-merge.4.patch, rpcengine-merge.patch
Remove the SecureRPCEngine and merge the security-related logic in the core engine. Follow up to HBASE-5727.
[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267284#comment-13267284 ] Hudson commented on HBASE-5444: --- Integrated in HBase-TRUNK #2842 (See [https://builds.apache.org/job/HBase-TRUNK/2842/]) HBASE-5444 Add PB-based calls to HMasterRegionInterface (Revision 119) Result = FAILURE stack : Files : * /hbase/trunk/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/trunk/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerLoad.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/RegionServerStatusProtocol.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MXBean.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RegionServerStatusProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/protobuf/RegionServerStatus.proto * 
/hbase/trunk/src/main/protobuf/hbase.proto * /hbase/trunk/src/main/resources/hbase-webapps/master/table.jsp * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMXBean.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267287#comment-13267287 ] Hadoop QA commented on HBASE-5918: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525408/HBASE-5918V2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1742//console This message is automatically generated. 
Master will block forever when startup if root server died between assign root and assign meta
Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch, HBASE-5918V2.patch
While the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta:
{code}
assignmentManager.assignMeta();
this.catalogTracker.waitForMeta();
{code}
because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
[jira] [Commented] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile
[ https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267310#comment-13267310 ]
Jonathan Hsieh commented on HBASE-5876:
This is related to HBASE-5697 -- it has to do with some sort of inconsistency between the old mr properties (fs.default.name) and the new mr properties (fs.defaultFS). When only the new hadoop 23 properties are used, this test passes consistently.
TestImportExport has been failing against hadoop 0.23 profile
Key: HBASE-5876 URL: https://issues.apache.org/jira/browse/HBASE-5876 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Jonathan Hsieh
TestImportExport has been failing against hadoop 0.23 profile
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5844:
Attachment: 5844.v3.patch
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5844:
Status: Open (was: Patch Available)
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5844:
Attachment: 5844.v4.patch
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5844:
Status: Patch Available (was: Reopened)
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5844:
Status: Patch Available (was: Open)
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267332#comment-13267332 ]
nkeywal commented on HBASE-5844:
v4 should be ok. I will open another jira for the master.
Delete the region servers znode after a regions server crash
Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process will start only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Created] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
nkeywal created HBASE-5924:
Summary: In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor
The client (in the function HConnectionManager#processBatchCallback) works in two steps:
- make the requests
- collect the failures and successes and prepare for retry
This means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to be executed before resubmitting the failed request. In a scenario where every request takes 5 seconds, the final execution time is: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing the results immediately. For the scenario above, this would bring us down to 6 seconds. So we could have a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution times differ.
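The 11s versus 6s arithmetic above can be reproduced with a small timing model. This is a sketch under stated assumptions, not HConnectionManager code: the class `RetryTimingSketch` and its methods are hypothetical, requests in a batch are assumed to run in parallel, each failed request is assumed to fail once and then succeed on retry, and the pause before a retry is a fixed value.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical timing model for a batch with one retry round. */
final class RetryTimingSketch {

    /** One request: when its first attempt returns, whether it failed, and how long the retry takes. */
    static final class Request {
        final double firstAttemptSeconds;
        final boolean fails;
        final double retrySeconds;

        Request(double firstAttemptSeconds, boolean fails, double retrySeconds) {
            this.firstAttemptSeconds = firstAttemptSeconds;
            this.fails = fails;
            this.retrySeconds = retrySeconds;
        }
    }

    /** Two-step behaviour: wait for every first attempt, pause, then run all retries. */
    static double twoStepSeconds(List<Request> batch, double pauseSeconds) {
        double firstPhase = 0;
        double retryPhase = 0;
        boolean anyFailure = false;
        for (Request r : batch) {
            firstPhase = Math.max(firstPhase, r.firstAttemptSeconds);
            if (r.fails) {
                anyFailure = true;
                retryPhase = Math.max(retryPhase, r.retrySeconds);
            }
        }
        return anyFailure ? firstPhase + pauseSeconds + retryPhase : firstPhase;
    }

    /** Proposed behaviour: each failure is retried as soon as it is seen. */
    static double immediateRetrySeconds(List<Request> batch, double pauseSeconds) {
        double finish = 0;
        for (Request r : batch) {
            double done = r.fails
                    ? r.firstAttemptSeconds + pauseSeconds + r.retrySeconds
                    : r.firstAttemptSeconds;
            finish = Math.max(finish, done);
        }
        return finish;
    }

    public static void main(String[] args) {
        // Two requests take 5s; a third fails immediately and needs 5s on retry.
        List<Request> batch = Arrays.asList(
                new Request(5, false, 0),
                new Request(5, false, 0),
                new Request(0, true, 5));
        System.out.println("two-step:        " + twoStepSeconds(batch, 1));
        System.out.println("immediate retry: " + immediateRetrySeconds(batch, 1));
    }
}
```

With these inputs the two-step model gives 5 + 1 + 5 = 11 seconds and the immediate-retry model gives max(5, 0 + 1 + 5) = 6 seconds, matching the figures in the issue description.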
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267346#comment-13267346 ]
Anoop Sam John commented on HBASE-5922:
I checked the other methods in HalfStoreFileReader. They look OK to me. As Stack also asked, how did you hit this issue in a cluster? [Functionally reproduce] Anyway, I feel the code is supposed to handle these cases and needs a fix.
HalfStoreFileReader seekBefore causes StackOverflowError
Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch
Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.
java.io.IOException: java.io.IOException: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Updated] (HBASE-5902) Some scripts are not executable
[ https://issues.apache.org/jira/browse/HBASE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5902:
---
Status: Patch Available (was: Open)

Some scripts are not executable
---
Key: HBASE-5902
URL: https://issues.apache.org/jira/browse/HBASE-5902
Project: HBase
Issue Type: Bug
Components: scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Attachments: 5902.v1.patch

-rw-rw-r-- graceful_stop.sh
-rw-rw-r-- hbase-config.sh
-rw-rw-r-- local-master-backup.sh
-rw-rw-r-- local-regionservers.sh
[jira] [Commented] (HBASE-5902) Some scripts are not executable
[ https://issues.apache.org/jira/browse/HBASE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267350#comment-13267350 ] Hadoop QA commented on HBASE-5902: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525056/5902.v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1744//console This message is automatically generated. Some scripts are not executable --- Key: HBASE-5902 URL: https://issues.apache.org/jira/browse/HBASE-5902 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 5902.v1.patch -rw-rw-r-- graceful_stop.sh -rw-rw-r-- hbase-config.sh -rw-rw-r-- local-master-backup.sh -rw-rw-r-- local-regionservers.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267352#comment-13267352 ]

Hadoop QA commented on HBASE-5844:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12525420/5844.v4.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1743//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1743//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1743//console

This message is automatically generated.

Delete the region servers znode after a regions server crash

Key: HBASE-5844
URL: https://issues.apache.org/jira/browse/HBASE-5844
Project: HBase
Issue Type: Improvement
Components: regionserver, scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Fix For: 0.96.0
Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch

Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
[jira] [Updated] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5883: Attachment: (was: HBASE-5883-94.patch) Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The active master node's network was down for some time (this node hosts Master, DN, ZK, and RS). The backup node got the notification and started to become active. Immediately afterwards, the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat}
[jira] [Updated] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5883: Attachment: HBASE-5883-90.patch HBASE-5883-92.patch HBASE-5883-94.patch Patches for all the branches. All test cases passed. Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The active master node's network was down for some time (this node hosts Master, DN, ZK, and RS). The backup node got the notification and started to become active. Immediately afterwards, the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat}
[jira] [Issue Comment Edited] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267392#comment-13267392 ]

ramkrishna.s.vasudevan edited comment on HBASE-5875 at 5/3/12 12:44 PM:

bq. What is the above referring to? Which part of the code?
In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager.processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}
bq. Can the master not detect this corner case just by looking at what's in zk?
By zk, do you mean the RS node or the ROOT region node?

was (Author: ram_krish):
bq. What is the above referring to? Which part of the code?
In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager.processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}

Process RIT and Master restart may remove an online server considering it as a dead server
--
Key: HBASE-5875
URL: https://issues.apache.org/jira/browse/HBASE-5875
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.1
Attachments: HBASE-5875.patch

If, on restart, the master finds ROOT/META in RIT state, it tries to assign the ROOT region through ProcessRIT. The master triggers the assignment and then tries to verify the root region location. That verification checks whether the RS has the region in its online list. If the assignment triggered by the master has not yet completed on the RS, the root region location verification fails. Because it failed,
{code}
splitLogAndExpireIfOnline(currentRootServer);
{code}
we split the log and also remove the server from the online server list. Ideally there is nothing to do in splitlog here, as no region server was restarted. So although the server is online, the master simply invalidates the region server. In the special case where I have only one RS, my cluster becomes non-operative.
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267392#comment-13267392 ]

ramkrishna.s.vasudevan commented on HBASE-5875:
---
bq. What is the above referring to? Which part of the code?
In assignRootAndMeta()
{code}
boolean rit = this.assignmentManager.processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
{code}

Process RIT and Master restart may remove an online server considering it as a dead server
--
Key: HBASE-5875
URL: https://issues.apache.org/jira/browse/HBASE-5875
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.1
Attachments: HBASE-5875.patch

If, on restart, the master finds ROOT/META in RIT state, it tries to assign the ROOT region through ProcessRIT. The master triggers the assignment and then tries to verify the root region location. That verification checks whether the RS has the region in its online list. If the assignment triggered by the master has not yet completed on the RS, the root region location verification fails. Because it failed,
{code}
splitLogAndExpireIfOnline(currentRootServer);
{code}
we split the log and also remove the server from the online server list. Ideally there is nothing to do in splitlog here, as no region server was restarted. So although the server is online, the master simply invalidates the region server. In the special case where I have only one RS, my cluster becomes non-operative.
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267395#comment-13267395 ] Hadoop QA commented on HBASE-5883: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525434/HBASE-5883-90.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//console This message is automatically generated. 
Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The active master node's network was down for some time (this node hosts Master, DN, ZK, and RS). The backup node got the notification and started to become active. Immediately afterwards, the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267424#comment-13267424 ] Todd Johnson commented on HBASE-5922: - I worked with Nate on this yesterday. We couldn't think of any reason you would want the delegate to search for the splitkey, nor any reason this method would need to recursively call itself. Our reading of the code was that there are two reasons to return false: if 'top' is true (you're in the top half of the split file) and the search key is greater than the splitkey (this works now) OR if 'top' is false (you're in the bottom half of the file) and the search key is less-than-or-equal-to the splitkey (presumably, the splitkey is stored in the top half, thus or-equal-to). If neither of those conditions exist, there is a possibility of finding the search key in the half-file you're looking at, so you call the delegate. HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. 
java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267453#comment-13267453 ]

Anoop Sam John commented on HBASE-5922:
---
In the case of the bottom file, if the passed key is >= splitkey, that is exactly the case where we need to work with the passed key; it is not a case for returning false. Here the scanner should ideally end up pointing to the last key in the bottom file. Yes, the bottom file will not have the split key in it. So we should change the key and seekBefore the splitKey, which in turn moves the pointer to the last key. I think it is clear to you why the stack overflow was happening: it is because of the >= check, which is unwanted.

HalfStoreFileReader seekBefore causes StackOverflowError

Key: HBASE-5922
URL: https://issues.apache.org/jira/browse/HBASE-5922
Project: HBase
Issue Type: Bug
Components: client, io
Affects Versions: 0.90.0
Environment: HBase 0.90.4
Reporter: Nate Putnam
Assignee: Nate Putnam
Fix For: 0.90.0
Attachments: HBASE-5922.patch, HBASE-5922.patch

Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.

java.io.IOException: java.io.IOException: java.lang.StackOverflowError
 at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
 at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
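The recursion discussed in this thread can be illustrated with a small stand-alone model. This is only a sketch and not the HBase source: it assumes the bottom half holds the keys strictly below the split key and the top half holds the rest, it uses a sorted set of strings in place of the store file, and the names HalfSeekSketch and delegateSeekBefore are hypothetical. It shows the point made in the comments: when a bottom-half seek is asked for a key at or past the split key, the bound is clamped to the split key and handed straight to the delegate, so the method never re-enters itself.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical model of a half-file scanner; NOT the real HalfStoreFileReader.
class HalfSeekSketch {
    private final TreeSet<String> keys; // stands in for the full, referenced store file
    private final String splitKey;
    private final boolean top;          // assumed convention: top holds keys >= splitKey, bottom holds keys < splitKey
    private String current;             // key the scanner is positioned on, or null if never positioned

    HalfSeekSketch(TreeSet<String> keys, String splitKey, boolean top) {
        this.keys = keys;
        this.splitKey = splitKey;
        this.top = top;
    }

    // Delegate seek over the whole file: greatest key strictly less than 'key', or null.
    private String delegateSeekBefore(String key) {
        return keys.lower(key);
    }

    // Positions the scanner on the last key before 'key' within this half.
    // Returns false (leaving the position unchanged) if this half has no such key.
    boolean seekBefore(String key) {
        String bound = key;
        if (!top && key.compareTo(splitKey) >= 0) {
            // Bottom half, key at or past the split: clamp the bound and go to
            // the delegate directly instead of calling seekBefore() again.
            bound = splitKey;
        }
        String found = delegateSeekBefore(bound);
        if (found == null || (top && found.compareTo(splitKey) < 0)) {
            return false; // nothing before the key inside this half
        }
        current = found;
        return true;
    }

    String current() {
        return current;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        HalfSeekSketch bottom = new HalfSeekSketch(keys, "c", false);
        System.out.println(bottom.seekBefore("e") + " " + bottom.current());
    }
}
```

With the sample keys and split key "c", a bottom-half seekBefore("e") lands on "b", the last key of the bottom half, which matches the behaviour described in the comment above.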
[jira] [Created] (HBASE-5925) Issue with only using the old config param hbase.hstore.compactionThreshold but not the corresponding new one
Anoop Sam John created HBASE-5925:
-
Summary: Issue with only using the old config param hbase.hstore.compactionThreshold but not the corresponding new one
Key: HBASE-5925
URL: https://issues.apache.org/jira/browse/HBASE-5925
Project: HBase
Issue Type: Bug
Reporter: Anoop Sam John
Priority: Minor

One observation while going through the code: in the MemStoreFlusher constructor
{code}
this.blockingStoreFilesNumber = conf.getInt("hbase.hstore.blockingStoreFiles", 7);
if (this.blockingStoreFilesNumber == -1) {
  this.blockingStoreFilesNumber = 1 + conf.getInt("hbase.hstore.compactionThreshold", 3);
}
{code}
As per this code, if hbase.hstore.blockingStoreFiles is configured as -1, the value becomes 1 + the minimum number of files to compact. But here we read only the old config item. We should read the new config item first and fall back to the old one if it is not set. Is this a miss? Something like
conf.getInt("hbase.hstore.compaction.min", conf.getInt("hbase.hstore.compactionThreshold", 3))
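The fallback proposed in this report can be tried out in isolation. The following is a minimal sketch, not a patch: a plain Map stands in for the Hadoop Configuration object, the helper getInt only mimics the shape of Configuration.getInt(name, defaultValue), and the class name BlockingStoreFilesSketch is hypothetical. The three config keys and the defaults 7 and 3 are the ones quoted in the report.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration of "new key first, then old key, then default".
class BlockingStoreFilesSketch {

    // Minimal stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    static int blockingStoreFiles(Map<String, String> conf) {
        int blocking = getInt(conf, "hbase.hstore.blockingStoreFiles", 7);
        if (blocking == -1) {
            // Prefer the new key; fall back to the old key, then to 3.
            int minFilesToCompact = getInt(conf, "hbase.hstore.compaction.min",
                    getInt(conf, "hbase.hstore.compactionThreshold", 3));
            blocking = 1 + minFilesToCompact;
        }
        return blocking;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.hstore.blockingStoreFiles", "-1");
        conf.put("hbase.hstore.compactionThreshold", "5"); // old key
        conf.put("hbase.hstore.compaction.min", "8");      // new key takes precedence
        System.out.println(blockingStoreFiles(conf));
    }
}
```

Nesting the old lookup as the default of the new lookup keeps existing deployments working: a site that sets only hbase.hstore.compactionThreshold still gets its value honoured.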
[jira] [Created] (HBASE-5926) Delete the master znode after a znode crash
nkeywal created HBASE-5926:
--
Summary: Delete the master znode after a znode crash
Key: HBASE-5926
URL: https://issues.apache.org/jira/browse/HBASE-5926
Project: HBase
Issue Type: Improvement
Components: master, scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and become the main master
7) Previous master script continues
8) Previous master script delete the master node in ZK
9) = issue: we deleted the node just created by the new master

This should not happen often (usually the znode will be delete soon enough), but it can happen.
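The scenario above can be replayed with a small in-memory model. This is a sketch under stated assumptions, not the approach taken in HBASE-5926: a ConcurrentHashMap stands in for ZooKeeper, the path "/hbase/master" and the server names are illustrative, and the class MasterZnodeSketch is hypothetical. It shows one possible guard, deleting the shared master znode only if it still names the process that is cleaning up, so that step 8 cannot remove the node created in step 6.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the single master znode; NOT ZooKeeper client code.
class MasterZnodeSketch {
    static final String MASTER_PATH = "/hbase/master"; // illustrative path
    private final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

    // A master registers itself; fails if another master already holds the znode.
    boolean createMaster(String serverName) {
        return znodes.putIfAbsent(MASTER_PATH, serverName) == null;
    }

    // Models ZK noticing the death of 'serverName' and dropping its znode.
    void sessionExpired(String serverName) {
        znodes.remove(MASTER_PATH, serverName);
    }

    // What a naive cleanup script would do: delete whatever is there.
    void unconditionalDelete() {
        znodes.remove(MASTER_PATH);
    }

    // Guarded cleanup: atomically delete only if the znode still names us.
    boolean deleteIfOwner(String serverName) {
        return znodes.remove(MASTER_PATH, serverName);
    }

    String currentMaster() {
        return znodes.get(MASTER_PATH);
    }

    public static void main(String[] args) {
        MasterZnodeSketch zk = new MasterZnodeSketch();
        zk.createMaster("master-1");       // 1) master starts
        zk.sessionExpired("master-1");     // 3-4) master dies, ZK detects it
        zk.createMaster("backup-1");       // 5-6) backup becomes the main master
        zk.deleteIfOwner("master-1");      // 7-8) old script runs, but the guard refuses
        System.out.println(zk.currentMaster());
    }
}
```

The check and the delete have to be one atomic step, which is why the model uses the two-argument remove. In real ZooKeeper the delete call takes an expected version, which could serve a similar purpose.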
[jira] [Updated] (HBASE-5926) Delete the master znode after a znode crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5926: --- Description: This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script deletes the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be deleted soon enough), but it can happen. was: This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script delete the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be delete soon enough), but it can happen. Delete the master znode after a znode crash --- Key: HBASE-5926 URL: https://issues.apache.org/jira/browse/HBASE-5926 Project: HBase Issue Type: Improvement Components: master, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor This is the continuation of the work done in HBASE-5844. 
But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while the master and the backup master share a single znode. So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and becomes the main master
7) Previous master script continues
8) Previous master script deletes the master node in ZK
9) = issue: we deleted the node just created by the new master
This should not happen often (usually the znode will be deleted soon enough), but it can happen.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
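The scenario above can be replayed in a small simulation. This is only a sketch under stated assumptions: `MasterZnodeSketch`, its methods, and the `/hbase/master` path are hypothetical stand-ins (an in-memory map, not the ZooKeeper or HBase API), and the guarded delete is one possible way to avoid step 9, not necessarily what this issue will implement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the single master znode; not a ZooKeeper API.
class MasterZnodeSketch {
    static final String MASTER_PATH = "/hbase/master"; // assumed path, for illustration only
    private final Map<String, String> znodes = new HashMap<>();

    // Steps 1 and 6: a master registers itself as the active master.
    void becomeActive(String serverName) {
        znodes.put(MASTER_PATH, serverName);
    }

    // Steps 3 and 4: the master dies; assumed here to make its znode go away.
    void masterDied() {
        znodes.remove(MASTER_PATH);
    }

    // Step 8 as described: the script deletes whatever sits at the master path.
    void deleteUnconditionally() {
        znodes.remove(MASTER_PATH);
    }

    // Guarded variant: delete only if the znode still names the crashed master.
    boolean deleteIfOwnedBy(String serverName) {
        return znodes.remove(MASTER_PATH, serverName);
    }

    String activeMaster() {
        return znodes.get(MASTER_PATH);
    }

    public static void main(String[] args) {
        MasterZnodeSketch zk = new MasterZnodeSketch();
        zk.becomeActive("master-1");
        zk.masterDied();
        zk.becomeActive("backup-1");
        boolean deleted = zk.deleteIfOwnedBy("master-1"); // the old script runs late
        System.out.println("deleted=" + deleted + ", active=" + zk.activeMaster());
    }
}
```

With a real ZooKeeper client the check and the delete would have to be atomic with respect to other writers; the ZooKeeper delete call accepts an expected version, which may be one way to get that.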
[jira] [Updated] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5919: -- Resolution: Fixed Status: Resolved (was: Patch Available) Add fixes for Ted's review comments from HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Assignee: Ted Yu Priority: Blocker Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267484#comment-13267484 ] Todd Johnson commented on HBASE-5922: - Yeah, that's not how it seems to me. But then, I didn't write the code originally, so perhaps I misunderstand it. We added a test case that causes infinite recursion with the old code, but appears to work with the patch. The test case searches for a key that is not equal to the split key. Given this, I don't see how the equals check could be the problem. HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. 
java.io.IOException: java.io.IOException: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
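The trace shows seekBefore re-entering itself at the same line until the stack is exhausted. The actual HalfStoreFileReader source is not quoted in this thread, so the following is only a hypothetical model of how such a trace could arise: a bottom-half reader that, for a key at or past the split key, calls itself with the split key instead of going to the underlying reader. All names here (`HalfReaderSketch`, `seekBeforeRecursive`, `seekBeforeClamped`) are illustrative assumptions, and integer keys stand in for row keys.

```java
import java.util.TreeSet;

// Hypothetical model of a "bottom half" reader over a sorted key set; not HBase code.
class HalfReaderSketch {
    private final TreeSet<Integer> keys = new TreeSet<>();
    private final int splitKey;

    HalfReaderSketch(int splitKey, int... storedKeys) {
        this.splitKey = splitKey;
        for (int k : storedKeys) {
            keys.add(k);
        }
    }

    // Underlying full-file seek: largest stored key strictly below the argument, or null.
    private Integer delegateSeekBefore(int key) {
        return keys.lower(key);
    }

    // Non-terminating shape: re-entering with the split key takes the same branch again.
    Integer seekBeforeRecursive(int key) {
        if (key >= splitKey) {
            return seekBeforeRecursive(splitKey); // splitKey >= splitKey holds on every call
        }
        return delegateSeekBefore(key);
    }

    // Terminating shape: clamp to the split key, then go straight to the delegate.
    Integer seekBeforeClamped(int key) {
        if (key >= splitKey) {
            return delegateSeekBefore(splitKey);
        }
        return delegateSeekBefore(key);
    }

    public static void main(String[] args) {
        HalfReaderSketch reader = new HalfReaderSketch(50, 10, 20, 60, 70);
        System.out.println(reader.seekBeforeClamped(65)); // key above, not equal to, the split key
        try {
            reader.seekBeforeRecursive(65);
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError");
        }
    }
}
```

In this model the overflow is triggered by any key at or above the split key, not only by a key equal to it, which would be consistent with the test case described in the comment above.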
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nkeywal updated HBASE-5926:
---
Summary: Delete the master znode after a master crash (was: Delete the master znode after a znode crash)

Delete the master znode after a master crash
Key: HBASE-5926
URL: https://issues.apache.org/jira/browse/HBASE-5926
Project: HBase
Issue Type: Improvement
Components: master, scripts
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while the master and the backup master share a single znode. So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and becomes the main master
7) Previous master script continues
8) Previous master script deletes the master node in ZK
9) = issue: we deleted the node just created by the new master
This should not happen often (usually the znode will be deleted soon enough), but it can happen.
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267502#comment-13267502 ]
Nate Putnam commented on HBASE-5922:

@Anoop as far as reproducing the issue, I'm not sure of the exact steps that would cause this in a production environment. The test case in the patch will reproduce the issue, though.

HalfStoreFileReader seekBefore causes StackOverflowError
Key: HBASE-5922
URL: https://issues.apache.org/jira/browse/HBASE-5922
Project: HBase
Issue Type: Bug
Components: client, io
Affects Versions: 0.90.0
Environment: HBase 0.90.4
Reporter: Nate Putnam
Assignee: Nate Putnam
Fix For: 0.90.0
Attachments: HBASE-5922.patch, HBASE-5922.patch

Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.
java.io.IOException: java.io.IOException: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Commented] (HBASE-5876) TestImportExport has been failing against hadoop 0.23 profile
[ https://issues.apache.org/jira/browse/HBASE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267513#comment-13267513 ] Jonathan Hsieh commented on HBASE-5876: --- Problems in previous code: # yarn execution framework not used because HBaseTestingUtility.startMiniCluster().getConfiguration() used instead of the HBaseTestingUtility.getConfiguration(). # hadoop 1's mapred.output.dir and hadoop 2's fileoutputformat.outputdir caused export job's data to get lost. Currently running full builds against hadoop 1.0 and hadoop 0.23.x. TestImportExport has been failing against hadoop 0.23 profile - Key: HBASE-5876 URL: https://issues.apache.org/jira/browse/HBASE-5876 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Jonathan Hsieh TestImportExport has been failing against hadoop 0.23 profile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267521#comment-13267521 ]
ramkrishna.s.vasudevan commented on HBASE-5875:
---
I have reproduced the scenario described in the title of this JIRA with a testcase. I have tried to follow the approach that Bijieshan suggested in https://issues.apache.org/jira/browse/HBASE-5875?focusedCommentId=13264874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13264874 to solve the problem. I can upload the testcase tomorrow.

Process RIT and Master restart may remove an online server considering it as a dead server
--
Key: HBASE-5875
URL: https://issues.apache.org/jira/browse/HBASE-5875
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.94.1
Attachments: HBASE-5875.patch

If, on restart, the master finds ROOT/META in RIT state, it tries to assign the ROOT region through ProcessRIT. The master triggers the assignment and then tries to verify the root region location. Verification checks whether the RS has the region in its online list. If the assignment triggered by the master has not yet completed on the RS, the verification fails. Because it failed, we run
{code}
splitLogAndExpireIfOnline(currentRootServer);
{code}
which splits the log and also removes the server from the online server list. Ideally there is nothing to do in splitlog here, as no region server was restarted. So although the server is online, the master invalidates it. In the special case where there is only one RS, the cluster becomes non-operative.
[jira] [Created] (HBASE-5927) AM#unassign should handle local exceptions after calling sendRegionClose
Jieshan Bean created HBASE-5927:
---
Summary: AM#unassign should handle local exceptions after calling sendRegionClose
Key: HBASE-5927
URL: https://issues.apache.org/jira/browse/HBASE-5927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.1, 0.96.0, 0.94.1
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Fix For: 0.92.2, 0.96.0, 0.94.1

A possible exception: if the related regionserver was just killed (but HMaster has not perceived that), then we will get a local exception "Connection reset by peer". What will happen if this region belongs to a disabling table? ServerShutdownHandler will remove this region from AM#regions. The region, however, still exists in RIT. TimeoutMonitor will take care of it once it times out, and will then invoke unassign again. But the region has been removed from AM#regions, so unassign will return directly due to the below code:

public void unassign(HRegionInfo region, boolean force) {
  // TODO: Method needs refactoring. Ugly buried returns throughout. Beware!
  LOG.debug("Starting unassignment of region " + region.getRegionNameAsString() + " (offlining)");
  synchronized (this.regions) {
    // Check if this region is currently assigned
    if (!regions.containsKey(region)) {
      LOG.debug("Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere");
      return;
    }
  }

This leads to an endless loop.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
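The loop described above can be modelled with two collections. This is a hedged sketch, not the AssignmentManager code: `UnassignLoopSketch` and its method names are hypothetical, regions are plain strings, and the second variant (dropping the stale RIT entry on the early return) is only an assumed remedy for illustration; the thread does not settle what the fix should be.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of AM#regions and RIT; not the AssignmentManager API.
class UnassignLoopSketch {
    final Map<String, String> regions = new HashMap<>();     // region name -> server
    final Set<String> regionsInTransition = new HashSet<>();  // RIT

    // Mirrors the quoted early return: nothing happens if the region is not in `regions`.
    void unassignAsQuoted(String region) {
        synchronized (regions) {
            if (!regions.containsKey(region)) {
                return; // the RIT entry is left untouched
            }
        }
        regions.remove(region);
        regionsInTransition.remove(region);
    }

    // Assumed variant: the early return also drops the stale RIT entry.
    void unassignClearingStaleRit(String region) {
        synchronized (regions) {
            if (!regions.containsKey(region)) {
                regionsInTransition.remove(region);
                return;
            }
        }
        regions.remove(region);
        regionsInTransition.remove(region);
    }

    public static void main(String[] args) {
        UnassignLoopSketch am = new UnassignLoopSketch();
        am.regionsInTransition.add("r1"); // in RIT, but already removed from regions
        for (int timeout = 0; timeout < 3; timeout++) {
            am.unassignAsQuoted("r1");    // each timeout retries and changes nothing
        }
        System.out.println("still in RIT: " + am.regionsInTransition.contains("r1"));
        am.unassignClearingStaleRit("r1");
        System.out.println("still in RIT: " + am.regionsInTransition.contains("r1"));
    }
}
```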
[jira] [Updated] (HBASE-5927) AM#unassign should handle local exceptions after calling sendRegionClose
[ https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5927: Description: A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. Since this region has been removed from AM#regions, it will return directly due to the below code: synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } Then it leads to an end-less loop. was: A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. But it has been removed from AM#regions, so it will return directly due to the below code: public void unassign(HRegionInfo region, boolean force) { // TODO: Method needs refactoring. Ugly buried returns throughout. Beware! LOG.debug(Starting unassignment of region + region.getRegionNameAsString() + (offlining)); synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } Then it leads to an end-less loop. 
AM#unassign should handle local exceptions after calling sendRegionClose Key: HBASE-5927 URL: https://issues.apache.org/jira/browse/HBASE-5927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.92.2, 0.96.0, 0.94.1 A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. Since this region has been removed from AM#regions, it will return directly due to the below code: synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } Then it leads to an end-less loop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5927) AM#unassign should handle local exceptions after calling sendRegionClose
[ https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267550#comment-13267550 ]
Zhihong Yu commented on HBASE-5927:
---
@Jieshan: Can you add a new test case to show this possibility?

AM#unassign should handle local exceptions after calling sendRegionClose
Key: HBASE-5927
URL: https://issues.apache.org/jira/browse/HBASE-5927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.1, 0.96.0, 0.94.1
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Fix For: 0.92.2, 0.96.0, 0.94.1

A possible exception: if the related regionserver was just killed (but HMaster has not perceived that), then we will get a local exception "Connection reset by peer". What will happen if this region belongs to a disabling table? ServerShutdownHandler will remove this region from AM#regions. The region, however, still exists in RIT. TimeoutMonitor will take care of it once it times out, and will then invoke unassign again. Since this region has been removed from AM#regions, unassign will return directly due to the below code:

synchronized (this.regions) {
  // Check if this region is currently assigned
  if (!regions.containsKey(region)) {
    LOG.debug("Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere");
    return;
  }
}

This leads to an endless loop.
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267560#comment-13267560 ]
Zhihong Yu commented on HBASE-5883:
---
Integrated to 0.94 and trunk.

Backup master is going down due to connection refused exception
---
Key: HBASE-5883
URL: https://issues.apache.org/jira/browse/HBASE-5883
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch

The active master node's network was down for some time (this node hosts Master, DN, ZK, and RS). The backup node got the notification and started to become active. Immediately, the backup node aborted with the exception below.
{noformat}
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy13.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
    ... 20 more
2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{noformat}
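In the trace above, a ConnectException raised while verifying the root region location propagates up to HMaster.run and aborts the backup master. One way to read the problem is that an unreachable server could be reported as "location not verified" rather than as a fatal error. The sketch below illustrates that idea only; `VerifyRootSketch` and `RootLocationCheck` are hypothetical names, and this is not the content of the attached patches.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch: treat "connection refused" as a failed verification, not a fatal error.
class VerifyRootSketch {
    // Stand-in for the remote call that asks a server whether it hosts the root region.
    interface RootLocationCheck {
        boolean serverHostsRoot() throws IOException;
    }

    static boolean verifyRootRegionLocation(RootLocationCheck check) throws IOException {
        try {
            return check.serverHostsRoot();
        } catch (IOException e) {
            // The trace shows the ConnectException wrapped in an IOException, so check both.
            if (e instanceof ConnectException || e.getCause() instanceof ConnectException) {
                return false; // server unreachable: location is simply not verified
            }
            throw e; // anything else is still surfaced to the caller
        }
    }

    public static void main(String[] args) throws IOException {
        boolean verified = verifyRootRegionLocation(() -> {
            throw new IOException(new ConnectException("Connection refused"));
        });
        System.out.println("verified=" + verified);
    }
}
```

A caller that receives `false` could then go on to reassign the root region instead of shutting down, which is the behaviour one would want from a backup master taking over from an unreachable node.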
[jira] [Updated] (HBASE-5927) AM#unassign should handle local exceptions after calling sendRegionClose
[ https://issues.apache.org/jira/browse/HBASE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5927: -- Description: A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. Since this region has been removed from AM#regions, it will return directly due to the below code: {code} synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } {code} Then it leads to an end-less loop. was: A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. Since this region has been removed from AM#regions, it will return directly due to the below code: synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } Then it leads to an end-less loop. 
AM#unassign should handle local exceptions after calling sendRegionClose Key: HBASE-5927 URL: https://issues.apache.org/jira/browse/HBASE-5927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.92.2, 0.96.0, 0.94.1 A possible exception: If the related regionserver was just killed(But HMaster has not perceived that), then we will get a local exception Connection reset by peer. If this region belongs to a disabling table. what will happen? ServerShutdownHandler will remove this region from AM#regions. So this region is still existing in RIT. TimeoutMonitor will take care of it after it got timeout. Then invoke unassign again. Since this region has been removed from AM#regions, it will return directly due to the below code: {code} synchronized (this.regions) { // Check if this region is currently assigned if (!regions.containsKey(region)) { LOG.debug(Attempted to unassign region + region.getRegionNameAsString() + but it is not + currently assigned anywhere); return; } } {code} Then it leads to an end-less loop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267568#comment-13267568 ]
stack commented on HBASE-5923:
--
This patch is great. Thanks for going back and doing the cleanup.

This class should not be in the filter package, should it?
+import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
Probably hard to move it now? It's part of a public API? We could deprecate it and replace it with a more generic, non-filter-specific class. Moving it should not be part of this patch. It's not so bad anyway having this filter package pollution, since it's in client-facing code and clients need access to filter stuff...

I would think this is pollution:
+import org.apache.hadoop.hbase.protobuf.generated.ClientProtos.Condition.CompareType;
We should be pulling a non-pb class into an Interface like this. Can we encapsulate these Client conditions in a non-pb class?

Cleanup checkAndXXX logic
-
Key: HBASE-5923
URL: https://issues.apache.org/jira/browse/HBASE-5923
Project: HBase
Issue Type: Improvement
Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
Fix For: 0.96.0, 0.94.1
Attachments: 5923-trunk.txt

1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via HTable[Interface].
2. there is unnecessary duplicate code in the check{Put|Delete} code in HRegionServer.
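The suggestion to keep generated protobuf types out of client-facing interfaces can be sketched as follows. Everything here is an assumption for illustration: `ConditionSketch`, the reduced set of operators, and `WireCompareType` (a stand-in for the generated enum, which is not reproduced here) are hypothetical, and the patch under review may look quite different.

```java
// Hypothetical sketch: a plain client-facing enum, converted to the wire type only at the RPC boundary.
class ConditionSketch {
    // Client-facing comparison operator; plain Java, no protobuf dependency.
    enum CompareOp { LESS, EQUAL, GREATER }

    // Stand-in for the generated protobuf enum; the real generated type is not reproduced here.
    enum WireCompareType { LESS, EQUAL, GREATER }

    // Conversion used only where a request is built; the interface never sees the wire type.
    static WireCompareType toWire(CompareOp op) {
        return WireCompareType.valueOf(op.name());
    }

    // Conversion used only where a request is decoded.
    static CompareOp fromWire(WireCompareType type) {
        return CompareOp.valueOf(type.name());
    }

    public static void main(String[] args) {
        WireCompareType wire = toWire(CompareOp.EQUAL);
        System.out.println(wire + " -> " + fromWire(wire));
    }
}
```

Mapping by name keeps the two enums decoupled at compile time, at the cost of a runtime failure if their constants ever diverge; an explicit switch would be the stricter alternative.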
[jira] [Updated] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5883: -- Comment: was deleted (was: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525434/HBASE-5883-90.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1745//console This message is automatically generated.) 
Backup master is going down due to connection refused exception
---
Key: HBASE-5883
URL: https://issues.apache.org/jira/browse/HBASE-5883
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch

The active master node's network was down for some time (this node hosts Master, DN, ZK, and RS). The backup node got the notification and started to become active. Immediately, the backup node aborted with the exception below.
{noformat}
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267574#comment-13267574 ]

Zhihong Yu commented on HBASE-5883:
---
Integrated to 0.92 and 0.90 as well.
Thanks for the patch, Jieshan.
Thanks for the review, Lars.

Backup master is going down due to connection refused exception
---
Key: HBASE-5883
URL: https://issues.apache.org/jira/browse/HBASE-5883
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch

The active master node's network was down for some time (this node hosts the Master, DN, ZK, and RS). The backup node was notified and started to become active. Immediately afterwards, the backup node aborted with the exception below.

{noformat}
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy13.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
	... 20 more
2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
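For illustration only: the trace above shows a java.io.IOException wrapping a java.net.ConnectException escaping from the root-location check and aborting the master. The sketch below shows one way such a check could treat "connection refused" as "server not reachable" instead of a fatal error. It is not the committed patch; RootLocationProbe and RootLocationVerifier are hypothetical names that do not exist in HBase.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical stand-in for "open a connection to the server hosting -ROOT-".
interface RootLocationProbe {
    void ping() throws IOException;
}

// Hypothetical helper; not part of HBase.
final class RootLocationVerifier {

    // Returns true if the server answered, false if the connection was refused.
    // Any other I/O failure is still propagated to the caller.
    static boolean verify(RootLocationProbe probe) throws IOException {
        try {
            probe.ping();
            return true;
        } catch (IOException e) {
            if (isConnectionRefused(e)) {
                // The server is gone; report "not verified" so the caller can reassign.
                return false;
            }
            throw e;
        }
    }

    // Walks the cause chain, because the ConnectException in the trace above
    // arrives wrapped in a plain IOException.
    static boolean isConnectionRefused(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }
}
```

Checking the cause chain rather than only the top-level exception type is the point of the sketch: a direct instanceof test on the caught exception would miss the wrapped case shown in the log.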
[jira] [Updated] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5923:
-
Status: Open (was: Patch Available)

Cleanup checkAndXXX logic
-
Key: HBASE-5923
URL: https://issues.apache.org/jira/browse/HBASE-5923
Project: HBase
Issue Type: Improvement
Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
Fix For: 0.96.0, 0.94.1
Attachments: 5923-trunk.txt

1. The checkAnd{Put|Delete} method that takes a CompareOp is not exposed via HTable[Interface].
2. There is unnecessary duplicate code in the check{Put|Delete} code in HRegionServer.
[jira] [Updated] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5923:
-
Attachment: 5923-0.94.txt

0.94 patch.
Looking at the two patches now, the PB stuff is leaking through. I.e. in trunk the generated CompareType is used by a client, whereas in 0.94 CompareFilter.CompareOp has to be used.
That also means that in 0.94 there would be a dependency on CompareFilter in HTableInterface.
Please let me know what you think.
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267593#comment-13267593 ]

stack commented on HBASE-5923:
--
bq. That also means that in 0.94 there would be a dependency on CompareFilter in HTableInterface.

That's better than a generated pb dependency IMO. If you'd like, I can make it so you can do the same or similar in trunk: i.e. not have to import generated pb but rather filter.CompareFilter or some similar class? Just say.
[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267594#comment-13267594 ]

Zhihong Yu commented on HBASE-3996:
---
There are a few suggestions from Stack pending.
@Stack: Can you take a look at Eran's comments from Apr 5th?

Support multiple tables and scanners as input to the mapper in map/reduce jobs
--
Key: HBASE-3996
URL: https://issues.apache.org/jira/browse/HBASE-3996
Project: HBase
Issue Type: Improvement
Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
Fix For: 0.96.0
Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, HBase-3996.patch

It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs. I propose a new MultiTableInputFormat class that would allow doing this.
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267597#comment-13267597 ]

Lars Hofhansl commented on HBASE-5923:
--
Thanks Stack. These are exactly the concerns I had. It becomes even more pronounced when looking at the 0.94 patch, which needs to have a slightly different client-facing API, since the PB stuff does not exist there.
I can see a few solutions:
* Only allow using WritableByteArrayComparable, i.e. make it implied and don't even pass it (and hence only create the dependency for HTable but not HTableInterface).
* As you said, have a separate CompareOp class that gets translated to the correct CompareType in HTable (again, this would allow only HTable to have the dependency, but not HTableInterface).
[jira] [Updated] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5886:
---
Attachment: HBASE-5886-v4.patch

Add new metric for possible data loss due to puts without WAL
--
Key: HBASE-5886
URL: https://issues.apache.org/jira/browse/HBASE-5886
Project: HBase
Issue Type: New Feature
Components: metrics, regionserver
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Labels: metrics
Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, HBASE-5886-v2.patch, HBASE-5886-v3.patch, HBASE-5886-v4.patch

Add a metric to keep track of puts without WAL and the possible data loss size.
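As a rough illustration of what such a metric could track, the sketch below counts puts that skip the WAL and accumulates their size for a single region. It is not taken from any of the attached patches: the class name, method names, and the idea of resetting the byte counter on flush are all assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-region counters; none of these names come from the HBASE-5886 patches.
final class NoWalPutMetrics {
    private final AtomicLong putsWithoutWal = new AtomicLong();
    private final AtomicLong bytesAtRisk = new AtomicLong();

    // Called once per put. Only puts that bypass the WAL are counted.
    void recordPut(boolean writeToWal, long putSizeBytes) {
        if (writeToWal) {
            return;
        }
        putsWithoutWal.incrementAndGet();
        bytesAtRisk.addAndGet(putSizeBytes);
    }

    // Assumption: once the memstore is flushed to disk, the data is durable,
    // so nothing recorded so far can be lost any more.
    void onFlush() {
        bytesAtRisk.set(0L);
    }

    long getPutsWithoutWal() {
        return putsWithoutWal.get();
    }

    long getBytesAtRisk() {
        return bytesAtRisk.get();
    }
}
```

The put counter only ever grows, while the byte counter reflects data currently held in memory without WAL protection; a real implementation would also have to decide how to aggregate these values across regions.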
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267604#comment-13267604 ]

Lars Hofhansl commented on HBASE-5923:
--
@Stack: You mean have a CompareFilter.CompareOp to o.a.h.h.p.g.ClientProtos.Condition.CompareType mapping? That'd be nice, as the client-facing interface would not change between 0.94 and trunk.
Or have a completely separate CompareOp/CompareType class?
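To make the mapping idea concrete, here is a minimal sketch of translating a client-facing comparison operator into a generated one, so that only the implementation class needs to know about the generated type. The two enums are local stand-ins, not the real CompareFilter.CompareOp or ClientProtos.Condition.CompareType, their constant lists are assumed to match, and CompareOpTranslator is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.Map;

// Stand-in for the client-facing CompareFilter.CompareOp (constants assumed).
enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP }

// Stand-in for the generated ClientProtos.Condition.CompareType (constants assumed).
enum CompareType { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP }

// Hypothetical translator; in the discussion above this logic would live in HTable,
// keeping HTableInterface free of any generated type.
final class CompareOpTranslator {
    private static final Map<CompareOp, CompareType> TO_PB = new EnumMap<>(CompareOp.class);

    static {
        TO_PB.put(CompareOp.LESS, CompareType.LESS);
        TO_PB.put(CompareOp.LESS_OR_EQUAL, CompareType.LESS_OR_EQUAL);
        TO_PB.put(CompareOp.EQUAL, CompareType.EQUAL);
        TO_PB.put(CompareOp.NOT_EQUAL, CompareType.NOT_EQUAL);
        TO_PB.put(CompareOp.GREATER_OR_EQUAL, CompareType.GREATER_OR_EQUAL);
        TO_PB.put(CompareOp.GREATER, CompareType.GREATER);
        TO_PB.put(CompareOp.NO_OP, CompareType.NO_OP);
    }

    // Fails loudly if a new CompareOp constant is added without a mapping.
    static CompareType toCompareType(CompareOp op) {
        CompareType type = TO_PB.get(op);
        if (type == null) {
            throw new IllegalArgumentException("No CompareType mapping for " + op);
        }
        return type;
    }
}
```

An explicit table is used here instead of a by-name lookup so that the two enums can evolve independently; the cost is that every new operator needs a new entry.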
[jira] [Updated] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-5889:
---
Status: Patch Available (was: Open)

Remove HRegionInterface
---
Key: HBASE-5889
URL: https://issues.apache.org/jira/browse/HBASE-5889
Project: HBase
Issue Type: Improvement
Components: client, ipc, regionserver
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: 0.96.0
Attachments: hbase_5889.patch

As a step towards moving internals to PB, so as to avoid the conversion for performance reasons, we should remove HRegionInterface. The region server then supports only ClientProtocol and AdminProtocol. Later on, HRegion can work with PB messages directly.
[jira] [Updated] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-5889:
---
Attachment: hbase_5889.patch
[jira] [Commented] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267613#comment-13267613 ]

jirapos...@reviews.apache.org commented on HBASE-5889:
--
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4993/
---

Review request for hbase.

Summary
---
Removed HRegionInterface, cleaned up HRegionServer, and moved the pb code from RegionServer back into HRegionServer.
The goal is to avoid maintaining two copies of the region server code, and to make it possible to avoid data type conversion on the server side.
Fixed some unit tests. Now all region server unit tests test the new pb functions.
Enhanced getServerInfo so that it returns the webui port too.

This addresses bug HBASE-5889.
https://issues.apache.org/jira/browse/HBASE-5889

Diffs
-
conf/hbase-policy.xml e45f23c
security/src/main/java/org/apache/hadoop/hbase/security/HBasePolicyProvider.java 0c4b4cb
src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon 87f04f4
src/main/java/org/apache/hadoop/hbase/HConstants.java a9d80a0
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java e3912c2
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java fc9176d
src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 757f98e
src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java cd9b528
src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java 79d5fdd
src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 212ee3e
src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java d1e0993
src/main/java/org/apache/hadoop/hbase/protobuf/ResponseConverter.java 81603af
src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java fbf0127
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java db1333b
src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java ae2094d
src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java 8b45f03
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 827fb23
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 96ac8bd
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java 4cb070e
src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java c2c89ea
src/main/protobuf/Admin.proto 2ad6fb0
src/main/protobuf/RPC.proto 105fb3f
src/main/resources/hbase-default.xml f54b345
src/main/resources/hbase-webapps/master/table.jsp ca7310c
src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java a1992c3
src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java ad77e0a
src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 5574b7f
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 3dfc94e
src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java 42092b7
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c270e28
src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java c36272f
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java bdec3ee
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 7dbba66
src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFilesSplitRecovery.java 3acb988
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java eb546a5
src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java ceba5cd
src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java ec08b17
src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java 30c6cf1
src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java 8c3f67e
src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java 7bfe4cd
src/test/java/org/apache/hadoop/hbase/regionserver/TestRSStatusServlet.java ffce7e8
src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java aa5ca37
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java 65fa948
src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fc4a0a7

Diff: https://reviews.apache.org/r/4993/diff

Testing
---
All regular and security profile tests were green before I rebased to the latest today.

Thanks,
Jimmy
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267624#comment-13267624 ]

Hudson commented on HBASE-5883:
---
Integrated in HBase-TRUNK #2843 (See [https://builds.apache.org/job/HBase-TRUNK/2843/])
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530)

Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267633#comment-13267633 ]

Hudson commented on HBASE-5883:
---
Integrated in HBase-0.94 #175 (See [https://builds.apache.org/job/HBase-0.94/175/])
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533)

Result = FAILURE
tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
[jira] [Created] (HBASE-5928) Hbck shouldn't npe when there are no tables.
Elliott Clark created HBASE-5928:

Summary: Hbck shouldn't npe when there are no tables.
Key: HBASE-5928
URL: https://issues.apache.org/jira/browse/HBASE-5928
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Minor

hbase fsck errors out when there are no tables.

Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560)
	at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346)
	at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
	at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
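The report does not say which reference is null, so the following is only a guess at the shape of the problem: a sketch of a report step that tolerates a missing table listing, assuming the listing comes back as null when the cluster has no tables. TablesInFluxReport, its methods, and the message text are hypothetical and are not HBaseFsck code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating a null-safe report; not HBaseFsck code.
final class TablesInFluxReport {

    // Normalizes the listing: a null array (assumed to mean "no tables")
    // becomes an empty list instead of triggering a NullPointerException later.
    static List<String> tablesInFlux(String[] tableNames) {
        if (tableNames == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(tableNames));
    }

    // Builds a one-line summary; the wording is made up for this example.
    static String summarize(String[] tableNames) {
        List<String> tables = tablesInFlux(tableNames);
        return "tables in flux: " + tables.size();
    }
}
```

Normalizing at the boundary keeps the null check in one place, so the rest of the report can iterate over the list without further guards.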
[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267642#comment-13267642 ] Hadoop QA commented on HBASE-5886: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525470/HBASE-5886-v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1746//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1746//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1746//console This message is automatically generated. Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, HBASE-5886-v2.patch, HBASE-5886-v3.patch, HBASE-5886-v4.patch Add a metrics to keep track of puts without WAL and possible data loss size. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267645#comment-13267645 ] Zhihong Yu commented on HBASE-5889: --- HRegionInterface is used by asynchbase: {code} writeHBaseString(buf, "org.apache.hadoop.hbase.ipc.HRegionInterface"); final String klass = "org.apache.hadoop.hbase.ipc.HRegionInterface"; ./src/RegionClient.java {code} Should we start a discussion on dev@hbase to get wider feedback about the roadmap for non-bundled (third-party) HBase client(s)? Remove HRegionInterface --- Key: HBASE-5889 URL: https://issues.apache.org/jira/browse/HBASE-5889 Project: HBase Issue Type: Improvement Components: client, ipc, regionserver Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase_5889.patch As a step to move internals to PB, so as to avoid the conversion for performance reason, we should remove the HRegionInterface. Therefore region server only supports ClientProtocol and AdminProtocol. Later on, HRegion can work with PB messages directly.
[jira] [Commented] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267654#comment-13267654 ] Hadoop QA commented on HBASE-5889: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525477/hbase_5889.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 60 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1747//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1747//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1747//console This message is automatically generated. Remove HRegionInterface --- Key: HBASE-5889 URL: https://issues.apache.org/jira/browse/HBASE-5889 Project: HBase Issue Type: Improvement Components: client, ipc, regionserver Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase_5889.patch As a step to move internals to PB, so as to avoid the conversion for performance reason, we should remove the HRegionInterface. Therefore region server only supports ClientProtocol and AdminProtocol. Later on, HRegion can work with PB messages directly. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5494: --- Attachment: D2997.3.patch avf requested code review of [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations.. Reviewers: Kannan, mbautin, Liyin, JIRA Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in /hbase/tableLock/). The locks implemented are not fair or re-entrant. RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). TEST PLAN Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in /hbase/tableLock/). The locks implemented are not fair or re-entrant. 
RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. 
For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add.
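The create-if-absent scheme described in the review request can be illustrated with a minimal in-memory sketch. This is not the patch's DistributedLock: a ConcurrentHashMap stands in for ZooKeeper, and the class name TableLockSketch, the methods tryAcquire and release, and the owner strings are all hypothetical names chosen for illustration. Only the /hbase/tableLock/ prefix and the "not re-entrant" behaviour come from the description above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory illustration only; this is not the patch's DistributedLock. */
final class TableLockSketch {
    // Stand-in for lock ZNodes: path -> plain-text owner metadata (an assumption).
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    private static String lockPath(String table) {
        return "/hbase/tableLock/" + table;
    }

    /** Create-if-absent: only the caller that creates the node gets the lock. */
    boolean tryAcquire(String table, String owner) {
        return znodes.putIfAbsent(lockPath(table), owner) == null;
    }

    /** Deletes the node; releasing a lock this owner does not hold is an error. */
    void release(String table, String owner) {
        if (!znodes.remove(lockPath(table), owner)) {
            throw new IllegalStateException("lock not held by " + owner + ": " + table);
        }
    }

    public static void main(String[] args) {
        TableLockSketch locks = new TableLockSketch();
        System.out.println(locks.tryAcquire("t1", "master-a")); // true
        System.out.println(locks.tryAcquire("t1", "master-a")); // false: not re-entrant
        locks.release("t1", "master-a");
        System.out.println(locks.tryAcquire("t1", "master-b")); // true
    }
}
```

Because the stand-in node is never removed unless release is called, the sketch also shows why a crashed holder leaves a stale lock behind, which is the manual-cleanup case the description mentions.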
[jira] [Updated] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-5928: - Attachment: HBASE-5928-0.patch Pretty small patch. HConnectionManager.getHTableDescriptors returns null when there are no tables. I assumed this was expected, so the null needs to be handled. Hbck shouldn't npe when there are no tables. Key: HBASE-5928 URL: https://issues.apache.org/jira/browse/HBASE-5928 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5928-0.patch hbase fsck errors out when there are no tables. Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560) at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
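As a rough illustration of the kind of null handling the comment describes, the sketch below treats a null lookup result as "no tables". It does not reproduce the attached patch or any HBase API: lookupTables is a hypothetical stand-in for a call that returns null when nothing exists, and tablesOrEmpty is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NullSafeTablesSketch {
    /** Hypothetical lookup that returns null, not an empty array, for "no tables". */
    static String[] lookupTables(List<String> known) {
        return known.isEmpty() ? null : known.toArray(new String[0]);
    }

    /** Treats a null result as an empty list instead of dereferencing it. */
    static List<String> tablesOrEmpty(List<String> known) {
        String[] found = lookupTables(known);
        if (found == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(found));
    }

    public static void main(String[] args) {
        System.out.println(tablesOrEmpty(new ArrayList<>()).size());          // 0
        System.out.println(tablesOrEmpty(Arrays.asList("t1", "t2")).size()); // 2
    }
}
```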
[jira] [Updated] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-5928: - Status: Patch Available (was: Open) Hbck shouldn't npe when there are no tables. Key: HBASE-5928 URL: https://issues.apache.org/jira/browse/HBASE-5928 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5928-0.patch hbase fsck errors out when there are no tables. Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560) at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
[jira] [Commented] (HBASE-5373) Table level lock to prevent the race of multiple table level operation
[ https://issues.apache.org/jira/browse/HBASE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267667#comment-13267667 ] Liyin Tang commented on HBASE-5373: --- Sure. I am glad that Alex is working on this JIRA right now, and I will help with the code review. Table level lock to prevent the race of multiple table level operation -- Key: HBASE-5373 URL: https://issues.apache.org/jira/browse/HBASE-5373 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang A table level lock can guarantee that only one table operation happens at a time for each table. The master should acquire and release these table locks correctly during failover. One proposal is to keep track of the lock and its corresponding operation in ZooKeeper. If there is a master failover, the secondary should have a way to check whether these operations have succeeded or not before releasing the lock.
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267681#comment-13267681 ] Alex Feinberg commented on HBASE-5494: -- This patch implements a ZK-hosted mutual exclusion lock (DistributedLock), and table level locks (TableLockManager), and ensures that all schema changing operations are serialized. Further work would be needed to add read-write locks to handle region splitting and merges. Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add.
[jira] [Commented] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267684#comment-13267684 ] Jimmy Xiang commented on HBASE-5889: @Ted, I posted a message to dev@hbase as suggested. I think it is to their benefit to migrate as well. Remove HRegionInterface --- Key: HBASE-5889 URL: https://issues.apache.org/jira/browse/HBASE-5889 Project: HBase Issue Type: Improvement Components: client, ipc, regionserver Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase_5889.patch As a step to move internals to PB, so as to avoid the conversion for performance reason, we should remove the HRegionInterface. Therefore region server only supports ClientProtocol and AdminProtocol. Later on, HRegion can work with PB messages directly.
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267689#comment-13267689 ] Phabricator commented on HBASE-5494: tedyu has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations.. I only reviewed part of the patch. Would this feature be refined in 0.89-fb branch before being ported to Apache HBase trunk ? INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/HConstants.java:98 Schema changes would always involve master. 'master.' can be omitted. src/main/java/org/apache/hadoop/hbase/HConstants.java:108 Is this value big enough in cluster testing ? src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java:2 No year is needed. src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 This lock is used to prevent two concurrent table creation attempts. tryLockTable() is more desirable here. src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1310 Can we add tryLockTable() ? It would be useful for the non-winning thread to exit quickly. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:2 No year, please. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Should Bytes.toStringBinary() be used here ? src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:53 Add 'be ' before 'released' src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 What if lock release fails ? src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 Can you tell me which zookeeper branch provides this lock ? In http://svn.apache.org/repos/asf/zookeeper/trunk, I don't seem to find this class. 
REVISION DETAIL https://reviews.facebook.net/D2997 Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add.
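The inline comments above ask for a tryLockTable() so that the non-winning thread can exit quickly. The difference between a non-blocking attempt and a blocking acquire with a timeout can be sketched in memory as follows. Everything here is an assumption for illustration: the class TryLockSketch, the lockTable and unlockTable signatures, the polling interval, and the use of a ConcurrentHashMap instead of ZooKeeper. The reviewed patch may differ on all of these.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

/** In-memory illustration only; not the reviewed TableLockManager. */
final class TryLockSketch {
    private final ConcurrentMap<String, String> held = new ConcurrentHashMap<>();

    /** Non-blocking: returns at once, true only for the caller that wins. */
    boolean tryLockTable(String table, String owner) {
        return held.putIfAbsent(table, owner) == null;
    }

    /** Blocking variant: polls until the lock is free or the timeout elapses. */
    boolean lockTable(String table, String owner, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!tryLockTable(table, owner)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // a real implementation might throw a timeout exception
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    void unlockTable(String table, String owner) {
        held.remove(table, owner);
    }

    public static void main(String[] args) {
        TryLockSketch locks = new TryLockSketch();
        System.out.println(locks.tryLockTable("t1", "creator-1"));  // true
        System.out.println(locks.tryLockTable("t1", "creator-2"));  // false, returns at once
        System.out.println(locks.lockTable("t1", "creator-2", 50)); // false after about 50 ms
    }
}
```

For two concurrent attempts to create the same table, the non-blocking form lets the loser fail fast instead of waiting out the timeout only to find the table already exists.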
[jira] [Created] (HBASE-5929) HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
Aravind Gottipati created HBASE-5929: Summary: HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions. Key: HBASE-5929 URL: https://issues.apache.org/jira/browse/HBASE-5929 Project: HBase Issue Type: Bug Components: client, shell Affects Versions: 0.92.1 Environment: Linux Ubuntu Lucid 64bit Reporter: Aravind Gottipati Priority: Minor I have been noticing that calls to HBaseAdmin.majorCompact throw exceptions randomly for some regions. I could not find a pattern to these exceptions. The code I have simply does this: admin.majorCompact(region.getRegionNameAsString()). admin is an instance of HBaseAdmin and region is an instance of HRegionInfo. The exception I get is org.apache.hadoop.hbase.TableNotFoundException: -ROOT-,,0 at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1] at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1] at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1] at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na] In this case it's the root region, but I get similar exceptions for other tables, like this.
2012-05-03 19:03:42,994 WARN [main] HBaseCompact: Could not compact: org.apache.hadoop.hbase.TableNotFoundException: ad_daily,49842:2009-07-10,1269763588508.1997607018 at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1] at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1] at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1] at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1196) ~[hbase-0.92.1.jar:0.92.1] at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na] at com.stumbleupon.hbaseadmin.HBaseCompact.main(Unknown Source) [hbase_compact.jar:na] I see this on hbase shell as well. However, I don't see these exceptions if I use admin.majorCompact(region.getRegionName()), so it looks like something gets lost when I use getRegionNameAsString(). Let me know if I can provide more information.
[jira] [Commented] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267713#comment-13267713 ] Hadoop QA commented on HBASE-5928: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525489/HBASE-5928-0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1748//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1748//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1748//console This message is automatically generated. Hbck shouldn't npe when there are no tables. Key: HBASE-5928 URL: https://issues.apache.org/jira/browse/HBASE-5928 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Priority: Minor Attachments: HBASE-5928-0.patch hbase fsck errors out when there are no tables. 
Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560) at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267714#comment-13267714 ] Hudson commented on HBASE-5883: --- Integrated in HBase-0.92 #396 (See [https://builds.apache.org/job/HBase-0.92/396/]) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537) Result = FAILURE tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The active master node's network was down for some time (this node contains Master, DN, ZK, RS). The backup node got the notification and started to become active. Immediately the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat}
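The trace in the HBASE-5883 description shows a java.net.ConnectException as the cause of a java.io.IOException. One way to recognise that situation without matching on the message text is to walk the cause chain, as in the sketch below. This is not the committed fix: the class ConnectRefusedSketch, the method causedByConnectException, and the depth limit of 32 are assumptions for illustration. It also only helps when the cause is still attached; if the exception was turned into a string somewhere along the way, the chain is gone and this check returns false.

```java
import java.io.IOException;
import java.net.ConnectException;

final class ConnectRefusedSketch {
    /** Returns true if t or one of its causes is a ConnectException. */
    static boolean causedByConnectException(Throwable t) {
        int depth = 0; // guard against unusually long or cyclic cause chains
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IOException wrapped = new IOException(new ConnectException("Connection refused"));
        System.out.println(causedByConnectException(wrapped));                  // true
        System.out.println(causedByConnectException(new IOException("other"))); // false
    }
}
```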
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267728#comment-13267728 ] Phabricator commented on HBASE-5494: avf has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations.. Thanks for the inline comments, @tedyu -- I've replied to a few quick ones inline. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 DistributedLock is implemented as part of the patch (see DistributedLock.java) src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Metadata for table level locks is stored as plain text -- this is to allow operations to view lock information from the zookeeper CLI: toStringBinary() would not be needed here. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 In this case, an IOException is thrown up to the caller: this is to indicate a non-recoverable ZooKeeper error (DistributedLock uses RecoverableZooKeeper class under the covers). .release() may also throw an IllegalStateException -- but this is essentially used as an assertion in this case (releasing a lock that isn't held). REVISION DETAIL https://reviews.facebook.net/D2997 Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running. 
We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add.
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267743#comment-13267743 ] stack commented on HBASE-5883: -- Can't we at least check the message to ensure it's what we expect? (See the second catch below where we look for connection reset). Can we be sure what comes up here is the ConnectException we set down in HBaseRPC? {code} + if (ioe instanceof ConnectException) { +// Catch. Connect refused. {code} This redoing of an exception seems problematic. Is it really necessary? {code} +} else if (ioex.getMessage().toLowerCase() +.contains("connection refused")) { + ce = new ConnectException(ioex.getMessage()); + ioe = ce; {code} I'd feel better about this fix if we could figure where the exception came from (It's not from the rpc stringifying of exceptions to pass them from server to client?) Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The active master node's network was down for some time (this node contains Master, DN, ZK, RS). The backup node got the notification and started to become active. Immediately the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.io.IOException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy13.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
    ... 20 more
2012-04-09 10:42:24,336 INFO
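
On the question in the HBASE-5883 comment about matching on the message text: the trace shows the ConnectException arriving as the cause of an IOException, so one alternative is to look through the cause chain before falling back to string matching. The following is only a sketch under that assumption; the class and method names (ConnectFailures, findConnectException) are hypothetical and are not taken from any of the attached patches. It would not help if the exception had been stringified on the server side, since the cause is then lost.

```java
import java.io.IOException;
import java.net.ConnectException;

final class ConnectFailures {
    private ConnectFailures() {}

    /**
     * Returns the first ConnectException found in the cause chain of t
     * (including t itself), or null if there is none.
     */
    static ConnectException findConnectException(Throwable t) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof ConnectException) {
                return (ConnectException) t;
            }
        }
        return null;
    }

    /** True when the IOException is, or wraps, a ConnectException. */
    static boolean isConnectFailure(IOException ioe) {
        return findConnectException(ioe) != null;
    }
}
```

A caller could test isConnectFailure(ioe) where the patch tests ioe instanceof ConnectException; an IOException that merely carries the text "Connection refused" without a cause would not match.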
[jira] [Commented] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267744#comment-13267744 ]

stack commented on HBASE-5928:
--

+1 on patch. Jon Hsieh?

Hbck shouldn't npe when there are no tables.
Key: HBASE-5928
URL: https://issues.apache.org/jira/browse/HBASE-5928
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Minor
Attachments: HBASE-5928-0.patch

hbase fsck errors out when there are no tables.

Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560)
    at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346)
    at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
    at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267746#comment-13267746 ]

Elliott Clark commented on HBASE-5928:
--

I looped TestAssignmentManager several times locally and it always passes.

Hbck shouldn't npe when there are no tables.
Key: HBASE-5928
URL: https://issues.apache.org/jira/browse/HBASE-5928
[jira] [Commented] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267747#comment-13267747 ]

Todd Lipcon commented on HBASE-5928:

Why not make it return an empty list instead? Returning null instead of empty collections is just begging for bugs like this.

Hbck shouldn't npe when there are no tables.
Key: HBASE-5928
URL: https://issues.apache.org/jira/browse/HBASE-5928
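
The suggestion above, returning an empty collection rather than null, can be illustrated with a small sketch. The names here (TableNames, asList, countTables) are hypothetical and are not the HBaseFsck code; the point is only that callers can iterate or call size() without a null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TableNames {
    private TableNames() {}

    /**
     * Hypothetical helper: converts a possibly-null array of table names
     * into a list that is never null.
     */
    static List<String> asList(String[] names) {
        if (names == null) {
            // "No tables" is represented as an empty list, not as null.
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(new ArrayList<String>(Arrays.asList(names)));
    }

    /** A caller needs no null check because asList never returns null. */
    static int countTables(String[] names) {
        return asList(names).size();
    }
}
```

With this shape, code such as a "tables in flux" report simply loops zero times when there are no tables.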
[jira] [Created] (HBASE-5931) HBase security profile doesn't compile
Jimmy Xiang created HBASE-5931:
--

Summary: HBase security profile doesn't compile
Key: HBASE-5931
URL: https://issues.apache.org/jira/browse/HBASE-5931
Project: HBase
Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

The compilation is broken.
[jira] [Created] (HBASE-5930) Periodically flush the Memstore?
Lars Hofhansl created HBASE-5930:

Summary: Periodically flush the Memstore?
Key: HBASE-5930
URL: https://issues.apache.org/jira/browse/HBASE-5930
Project: HBase
Issue Type: Improvement
Reporter: Lars Hofhansl
Priority: Minor

A colleague of mine ran into an interesting issue. He inserted some data with the WAL disabled, and the data happened to fit in the aggregate Memstore memory. Two weeks later he had a problem with the HDFS cluster, which caused the region servers to abort. He found that his data was lost. Looking at the log, we found that the Memstores were not flushed at all during these two weeks.

Should we have an option to flush memstores periodically? There are obvious downsides to this, like many small storefiles, etc.
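
One way to picture the option asked about above is an age-based check that a background chore could evaluate for each memstore. This is a sketch only: PeriodicFlushPolicy, its parameters, and the idea of tracking the timestamp of the oldest unflushed edit are all assumptions made for illustration, not existing HBase classes or settings.

```java
final class PeriodicFlushPolicy {
    private final long maxAgeMillis;

    /** maxAgeMillis is a hypothetical setting: the longest an edit may stay unflushed. */
    PeriodicFlushPolicy(long maxAgeMillis) {
        if (maxAgeMillis <= 0) {
            throw new IllegalArgumentException("maxAgeMillis must be positive");
        }
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Returns true when the memstore holds edits and the oldest of them
     * has been in memory for at least maxAgeMillis.
     */
    boolean shouldFlush(boolean hasUnflushedEdits, long oldestEditMillis, long nowMillis) {
        if (!hasUnflushedEdits) {
            // Nothing in memory, so a flush would only create an empty file.
            return false;
        }
        return nowMillis - oldestEditMillis >= maxAgeMillis;
    }
}
```

A large maxAgeMillis (hours rather than minutes) would limit the "many small storefiles" downside while still bounding how much WAL-less data can be lost.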
[jira] [Updated] (HBASE-5931) HBase security profile doesn't compile
[ https://issues.apache.org/jira/browse/HBASE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-5931:
---
Attachment: hbase-5931.patch

HBase security profile doesn't compile
Key: HBASE-5931
URL: https://issues.apache.org/jira/browse/HBASE-5931
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267751#comment-13267751 ]

nkeywal commented on HBASE-5877:

v12, should be final.
1) ServerName is used everywhere in the interface, thanks to protobuf.
2) hadoop.ipc serialization of exceptions is based on #getMessage, so we have to parse it internally. This is not visible to the user of the exception.
3) The code that manages the error in the client package is quite complex. We have the exception at the very beginning, and then it's checked again, but by then we no longer have the real exception. I used a new historyList to make it work. There is another JIRA for other improvements, in which I could get rid of this (HBASE-5924).
4) Generated with protobuf 2.4.1.
5) The destination in the closeRegion interface is a kind of interface hijacking. Other options would be:
- sharing the region state in zookeeper
- letting the regionserver call the master to get the new server. On paper this would be more efficient than a client - master call.
In both cases we could consider that the client should not connect to the master except for cluster administration (create table, split region, ...). That would increase global reliability. That's for another discussion as well, I think.
6) RegionServerServices has been modified to set a destination when removing a region from the online regions.
7) In another JIRA I will manage the case when the destination is not specified when calling the move function.

When a query fails because the region has moved, let the regionserver return the new address to the client
--
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch

This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So:
- it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed.
- during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.

The implementation in the first patch does:
- on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
- the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
- the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
- the client analyses the exceptions and updates its cache accordingly...
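
The "list of what was moved", with each entry kept for 100 seconds, can be sketched as a small map with a time-to-live. This is an illustration under assumptions, not the code in the attached patches: MovedRegions, recordMove, and destinationOf are hypothetical names, region and server are plain strings here, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class MovedRegions {
    /** Where a region went and when the move was recorded. */
    private static final class Move {
        final String destination;
        final long movedAtMillis;

        Move(String destination, long movedAtMillis) {
            this.destination = destination;
            this.movedAtMillis = movedAtMillis;
        }
    }

    private final ConcurrentMap<String, Move> moves = new ConcurrentHashMap<String, Move>();
    private final long ttlMillis;

    MovedRegions(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called when a region is closed because it is being moved to destination. */
    void recordMove(String regionName, String destination, long nowMillis) {
        moves.put(regionName, new Move(destination, nowMillis));
    }

    /** Returns the new location if the region moved within the last ttlMillis. */
    Optional<String> destinationOf(String regionName, long nowMillis) {
        Move move = moves.get(regionName);
        if (move == null) {
            return Optional.empty();
        }
        if (nowMillis - move.movedAtMillis > ttlMillis) {
            // Expired: drop this exact entry, but leave any newer one in place.
            moves.remove(regionName, move);
            return Optional.empty();
        }
        return Optional.of(move.destination);
    }
}
```

A regionserver that receives a query for a region it no longer hosts could consult destinationOf and, when a value is present, put that address into the exception it sends back to the client.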
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---
Attachment: 5877.v12.patch

When a query fails because the region has moved, let the regionserver return the new address to the client
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
[jira] [Updated] (HBASE-5931) HBase security profile doesn't compile
[ https://issues.apache.org/jira/browse/HBASE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-5931:
---
Status: Patch Available (was: Open)

HBase security profile doesn't compile
Key: HBASE-5931
URL: https://issues.apache.org/jira/browse/HBASE-5931
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---
Status: Open (was: Patch Available)

When a query fails because the region has moved, let the regionserver return the new address to the client
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5877:
---
Status: Patch Available (was: Open)

When a query fails because the region has moved, let the regionserver return the new address to the client
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
[jira] [Commented] (HBASE-5929) HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
[ https://issues.apache.org/jira/browse/HBASE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267753#comment-13267753 ]

stack commented on HBASE-5929:
--

This seems uninterpretable as a table name or region name: 'org.apache.hadoop.hbase.TableNotFoundException: ROOT,,0'... I'd have expected it to be -ROOT-,,0 if hbase was to have any chance? Is this coming in via jruby, mighty Aravind? Does 'ad_daily,49842:2009-07-10,1269763588508.1997607018' exist on the cluster? (I know I should look myself.)

HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
-
Key: HBASE-5929
URL: https://issues.apache.org/jira/browse/HBASE-5929
Project: HBase
Issue Type: Bug
Components: client, shell
Affects Versions: 0.92.1
Environment: Linux Ubuntu Lucid 64bit
Reporter: Aravind Gottipati
Priority: Minor

I have been noticing that calls to HBaseAdmin.majorCompact throw exceptions randomly for some regions. I could not find a pattern to these exceptions. The code I have simply calls admin.majorCompact(region.getRegionNameAsString()), where admin is an instance of HBaseAdmin and region is an instance of HRegionInfo. The exception I get is:

org.apache.hadoop.hbase.TableNotFoundException: -ROOT-,,0
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]

In this case it's the root region, but I get similar exceptions for other tables, like this:

2012-05-03 19:03:42,994 WARN [main] HBaseCompact: Could not compact: org.apache.hadoop.hbase.TableNotFoundException: ad_daily,49842:2009-07-10,1269763588508.1997607018
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1196) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]
    at com.stumbleupon.hbaseadmin.HBaseCompact.main(Unknown Source) [hbase_compact.jar:na]

I see this in the hbase shell as well. However, I don't see these exceptions if I use admin.majorCompact(region.getRegionName()), so it looks like something gets lost when I use getRegionNameAsString(). Let me know if I can provide more information.