[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266725#comment-13266725 ]

stack commented on HBASE-5548:

Hmm... did I? It doesn't list the files above. Let me check.

Add ability to get a table in the shell

Key: HBASE-5548
URL: https://issues.apache.org/jira/browse/HBASE-5548
Project: HBase
Issue Type: Improvement
Components: shell
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: 0.96.0
Attachments: ruby_HBASE-5528-v0.patch, ruby_HBASE-5548-addendum.patch, ruby_HBASE-5548-v1.patch, ruby_HBASE-5548-v2.patch, ruby_HBASE-5548-v3.patch, ruby_HBASE-5548-v5.patch

Currently, all the commands that operate on a table in the shell first have to take the table name as input. There are two main considerations:
* It is annoying to have to write the table name every time, when you should just be able to get a reference to a table
* The current implementation is very wasteful: it creates a new HTable for each call (but reuses the connection since it uses the same configuration)
We should be able to get a handle to a single HTable and then operate on that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266732#comment-13266732 ]

stack commented on HBASE-5548:

Ok. Over in hbase-5840 I say I'm going to leave it in, but instead I'm going to back it out so it's easier on the fellows who follow behind us trying to make sense of our actions.
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266735#comment-13266735 ]

stack commented on HBASE-5548:

Backed out the miscommit of hbase-5840 that went in with this. Sorry for the mess. Thanks Ram for fingering it.
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266736#comment-13266736 ]

stack commented on HBASE-5840:

Changed my mind. Backed out the miscommit of this trunk patch that went in w/ the commit of hbase-5548 by mistake. So, this patch is still to be committed on trunk and branch. Sorry for my mess.

Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status

Key: HBASE-5840
URL: https://issues.apache.org/jira/browse/HBASE-5840
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: rajeshbabu
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5840.patch, HBASE-5840_trunk.patch, HBASE-5840_v2.patch

The TaskMonitor status is not cleared when a region goes to FAILED_OPEN. It keeps showing the old status, which misleads the user.
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266751#comment-13266751 ]

ramkrishna.s.vasudevan commented on HBASE-5840:

Thanks Stack. Committed to trunk. Waiting for Lars to confirm on 0.94. Then will commit there.
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266765#comment-13266765 ] Hudson commented on HBASE-5869: --- Integrated in HBase-TRUNK #2836 (See [https://builds.apache.org/job/HBase-TRUNK/2836/]) HBASE-5869 Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb (Revision 1333099) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/DeserializationException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HBaseException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/RegionTransition.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogTask.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java * 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/trunk/src/main/protobuf/ZooKeeper.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestSerialization.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java * 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java

Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb

Key: HBASE-5869
URL: https://issues.apache.org/jira/browse/HBASE-5869
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Fix For: 0.96.0
Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, secondcut.txt, v10.txt, v11.txt, v12.txt, v13.txt, v13.txt, v4.txt, v5.txt, v6.txt
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266773#comment-13266773 ]

stack commented on HBASE-5869:

No. I committed the patch that passed hadoopqa. Will do new issue to address your comments.
[jira] [Created] (HBASE-5919) Add missing Ted review fixes for HBASE-5869
stack created HBASE-5919:

Summary: Add missing Ted review fixes for HBASE-5869
Key: HBASE-5919
URL: https://issues.apache.org/jira/browse/HBASE-5919
Project: HBase
Issue Type: Bug
Reporter: stack
Priority: Blocker

I missed addressing a few of Ted's comments at the tail end of getting the HBASE-5869 commit in. Fix them here. Making it a blocker.
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266816#comment-13266816 ]

Hudson commented on HBASE-5840:

Integrated in HBase-TRUNK #2837 (See [https://builds.apache.org/job/HBase-TRUNK/2837/])
HBASE-5840 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status (RajeshBabu) (Revision 1333124)
HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123)

Result = SUCCESS

ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266818#comment-13266818 ] Hudson commented on HBASE-2214: --- Integrated in HBase-TRUNK #2837 (See [https://builds.apache.org/job/HBase-TRUNK/2837/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Fix For: 0.96.0, 0.94.1 Attachments: HBASE-2214-0.94-v2.txt, HBASE-2214-0.94-v3.txt, HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, 
HBASE-2214-v6.txt, HBASE-2214-v7.txt, HBASE-2214_with_broken_TestShell.txt

The notion that you set a size, rather than a row count, to specify how much a scanner should return in each cycle was raised over in HBASE-1996. It's a good one, making HBase behave regularly though the data under it may vary. HBASE-1996 was committed, but the patch was constrained by the fact that it needed to not change the RPC interface. This issue is about doing HBASE-1996 for 0.21 in a clean, unconstrained way.
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266817#comment-13266817 ]

Hudson commented on HBASE-5548:

Integrated in HBase-TRUNK #2837 (See [https://builds.apache.org/job/HBase-TRUNK/2837/])
HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123)

Result = SUCCESS

stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266815#comment-13266815 ] Hudson commented on HBASE-1996: --- Integrated in HBase-TRUNK #2837 (See [https://builds.apache.org/job/HBase-TRUNK/2837/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Configure scanner buffer in bytes instead of number of rows --- Key: HBASE-1996 URL: https://issues.apache.org/jira/browse/HBASE-1996 Project: HBase Issue Type: Improvement Reporter: Dave Latham Assignee: Dave Latham Fix For: 0.90.0 Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3-v3.patch, 1996-0.20.3.patch Currently, the default scanner fetches a single row at a time. 
This makes for very slow scans on tables where the rows are not large. You can change the setting for an HTable instance or for each Scan. It would be better to have a default that performs reasonably well so that people stop running into slow scans because they are evaluating HBase, aren't familiar with the setting, or simply forgot. Unfortunately, if we increase the value of the current setting, then we run the risk of running out of memory (OOM) for tables with large rows.

Let's change the setting so that it works with a size in bytes, rather than in rows. This will allow us to set a reasonable default so that tables with small rows will scan efficiently and tables with large rows will not run OOM.

Note that the case is very similar to table writes as well. When disabling auto flush, we buffer a list of Puts to commit at once. That buffer is measured in bytes, so that a small number of large Puts or a lot of small Puts can each fit in a single flush. If that buffer were measured in number of Puts, it would have the same problem that we have for the scan buffer, and we wouldn't be able to set a good default value for tables with different size rows. Changing the scan buffer to be configured like the write buffer will make it more consistent.
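The trade-off described in HBASE-1996 can be illustrated with a small standalone sketch. It is not HBase code: the class `ScanBatchSketch` and both methods are hypothetical names, and rows are modelled as plain byte arrays. It only contrasts a row-count limit with a byte-size limit when grouping rows into fetches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: groups rows into fetch batches. */
class ScanBatchSketch {

    /** Row-count limit: every batch holds at most maxRows rows, whatever their size. */
    static List<List<byte[]>> batchByRowCount(List<byte[]> rows, int maxRows) {
        if (maxRows < 1) {
            throw new IllegalArgumentException("maxRows must be at least 1");
        }
        List<List<byte[]>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += maxRows) {
            batches.add(new ArrayList<>(rows.subList(i, Math.min(i + maxRows, rows.size()))));
        }
        return batches;
    }

    /**
     * Byte-size limit: a batch is closed when adding the next row would push it
     * past maxBytes. A batch always holds at least one row, so a single
     * oversized row is still returned on its own.
     */
    static List<List<byte[]>> batchByBytes(List<byte[]> rows, long maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] row : rows) {
            if (!current.isEmpty() && currentBytes + row.length > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> smallRows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            smallRows.add(new byte[100]);
        }
        List<byte[]> largeRows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            largeRows.add(new byte[1_000_000]);
        }
        long limit = 2_000_000L; // illustrative value, not an HBase default
        System.out.println("small rows: " + batchByBytes(smallRows, limit).size() + " fetch(es)");
        System.out.println("large rows: " + batchByBytes(largeRows, limit).size() + " fetch(es)");
    }
}
```

With one byte limit, many small rows travel together while large rows are split across several fetches, which is the property a single row-count default cannot provide.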
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266827#comment-13266827 ]

nkeywal commented on HBASE-5877:

I have the same locally, so it's likely my patch...

When a query fails because the region has moved, let the regionserver return the new address to the client

Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5877.v1.patch, 5877.v6.patch

This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load.
Note that a region is not immediately opened after a close. So:
- it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed.
- during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
The implementation in the first patch does:
- on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
- the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
- the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
- the client analyses the exceptions and updates its cache accordingly...
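The mechanism listed in the HBASE-5877 description (remember moves for 100 seconds, answer queries on a moved region with an exception carrying the new address, let the client update its cache) can be sketched roughly as follows. This is a standalone illustration, not the patch itself: `MovedRegionSketch`, `RegionMovedException`, `Server` and the host names are all made up for the example, and time is passed in explicitly so the expiry is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch, not HBase code. */
class MovedRegionSketch {

    /** Thrown for a query on a region that was moved; carries the new address. */
    static final class RegionMovedException extends Exception {
        final String regionName;
        final String newServer;

        RegionMovedException(String regionName, String newServer) {
            super(regionName + " moved to " + newServer);
            this.regionName = regionName;
            this.newServer = newServer;
        }
    }

    /** Server side: remembers where closed regions were sent, for a limited time. */
    static final class Server {
        /** Entries are kept 100 seconds, as stated in the issue description. */
        static final long TTL_MILLIS = 100_000L;

        private static final class Move {
            final String newServer;
            final long movedAtMillis;

            Move(String newServer, long movedAtMillis) {
                this.newServer = newServer;
                this.movedAtMillis = movedAtMillis;
            }
        }

        private final Map<String, Move> moved = new ConcurrentHashMap<>();

        /** Called when a region is closed with a known destination. */
        void recordMove(String regionName, String newServer, long nowMillis) {
            moved.put(regionName, new Move(newServer, nowMillis));
        }

        /** Returns the destination if the move is still remembered, dropping stale entries. */
        Optional<String> lookup(String regionName, long nowMillis) {
            Move move = moved.get(regionName);
            if (move == null) {
                return Optional.empty();
            }
            if (nowMillis - move.movedAtMillis > TTL_MILLIS) {
                moved.remove(regionName, move);
                return Optional.empty();
            }
            return Optional.of(move.newServer);
        }

        /** Rejects queries on recently moved regions; other cases are not modelled here. */
        void handleQuery(String regionName, long nowMillis) throws RegionMovedException {
            Optional<String> target = lookup(regionName, nowMillis);
            if (target.isPresent()) {
                throw new RegionMovedException(regionName, target.get());
            }
        }
    }

    public static void main(String[] args) {
        Server oldServer = new Server();
        Map<String, String> clientCache = new HashMap<>();
        clientCache.put("region-a", "rs1.example.org:60020");

        oldServer.recordMove("region-a", "rs2.example.org:60020", 0L);
        try {
            oldServer.handleQuery("region-a", 5_000L);
        } catch (RegionMovedException e) {
            // Client side: update the location cache from the exception.
            clientCache.put(e.regionName, e.newServer);
        }
        System.out.println("client now targets " + clientCache.get("region-a"));
    }
}
```

The sketch leaves out the wait-before-retry concern raised in the description; a client would still need some delay or heuristic before contacting the new server, since the region is not opened immediately after the close.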
[jira] [Commented] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface
[ https://issues.apache.org/jira/browse/HBASE-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266839#comment-13266839 ]

nkeywal commented on HBASE-5905:

This would make sense if we think that customers should/will use the protobuf interface.

Protobuf interface for Admin: split between the internal and the external/customer interface

Key: HBASE-5905
URL: https://issues.apache.org/jira/browse/HBASE-5905
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal

After a short discussion with Stack, I created a JIRA.
--
I am a little bit confused by the protobuf interface for closeRegion.
We have two types of closeRegion today:
1) The external ones, available in client.HBaseAdmin. They take the server and the region identifier as parameters and nothing else.
2) The internal ones, called for example by the master. They have more parameters (like versionOfClosingNode or transitionInZK).
When I look at protobuf.ProtobufUtil, I see:

  public static void closeRegion(final AdminProtocol admin,
      final byte[] regionName, final boolean transitionInZK) throws IOException {
    CloseRegionRequest closeRegionRequest =
      RequestConverter.buildCloseRegionRequest(regionName, transitionInZK);
    try {
      admin.closeRegion(null, closeRegionRequest);
    } catch (ServiceException se) {
      throw getRemoteException(se);
    }
  }

In other words, it seems that we merged the two interfaces into a single one. Is that the intent? I checked: the internal fields in closeRegionRequest are all optional (that's good). Still, it means that the end user could use them, or at least would need to distinguish between fields that are optional for functional reasons and fields that are optional because they should not be used.
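One way to picture the split discussed in HBASE-5905 is a pair of separate interfaces, one per audience. The sketch below is purely illustrative and does not reflect how HBase resolved the issue: `CloseRegionSplitSketch`, `ClientAdmin`, `InternalAdmin` and `RecordingRegionServer` are hypothetical names, and only the parameter names `versionOfClosingNode` and `transitionInZK` come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an external/internal split; not HBase code. */
class CloseRegionSplitSketch {

    /** What an end user would see: the server and the region, nothing else. */
    interface ClientAdmin {
        void closeRegion(String serverName, String regionName);
    }

    /** What the master would use: the extra coordination parameters. */
    interface InternalAdmin {
        void closeRegion(String regionName, int versionOfClosingNode, boolean transitionInZK);
    }

    /** Toy implementation that only records which entry point was used. */
    static final class RecordingRegionServer implements ClientAdmin, InternalAdmin {
        final List<String> calls = new ArrayList<>();

        @Override
        public void closeRegion(String serverName, String regionName) {
            // External callers cannot set the internal fields at all.
            calls.add("external close of " + regionName + " on " + serverName);
        }

        @Override
        public void closeRegion(String regionName, int versionOfClosingNode, boolean transitionInZK) {
            calls.add("internal close of " + regionName
                + " version=" + versionOfClosingNode
                + " transitionInZK=" + transitionInZK);
        }
    }

    public static void main(String[] args) {
        RecordingRegionServer regionServer = new RecordingRegionServer();
        ClientAdmin external = regionServer;
        InternalAdmin internal = regionServer;
        external.closeRegion("rs1.example.org", "region-a");
        internal.closeRegion("region-a", 3, true);
        regionServer.calls.forEach(System.out::println);
    }
}
```

With this shape the "optional, do not use" fields simply do not appear on the type handed to end users, at the cost of maintaining two request paths.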
[jira] [Commented] (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266870#comment-13266870 ]

Hudson commented on HBASE-1996:

Integrated in HBase-0.94 #170 (See [https://builds.apache.org/job/HBase-0.94/170/])
HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333157)

Result = FAILURE

tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266871#comment-13266871 ] Hudson commented on HBASE-2214: --- Integrated in HBase-0.94 #170 (See [https://builds.apache.org/job/HBase-0.94/170/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333157) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Fix For: 0.96.0, 0.94.1 Attachments: HBASE-2214-0.94-v2.txt, HBASE-2214-0.94-v3.txt, HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, HBASE-2214-v6.txt, HBASE-2214-v7.txt, HBASE-2214_with_broken_TestShell.txt The notion that you set size rather than row count specifying how many rows a scanner should return in each cycle was raised over in hbase-1966. Its a good one making hbase regular though the data under it may vary. HBase-1966 was committed but the patch was constrained by the fact that it needed to not change RPC interface. This issue is about doing hbase-1966 for 0.21 in a clean, unconstrained way. -- This message is automatically generated by JIRA. 
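A minimal sketch of the idea behind HBASE-2214, not the committed patch: bound each scanner round by accumulated bytes rather than by a row count. The names `SizeBoundedBatcher` and `nextBatch` are hypothetical, and rows are modelled as plain byte arrays.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; not an HBase class. */
final class SizeBoundedBatcher {
    private final Iterator<byte[]> rows;
    private final long maxBatchBytes;

    SizeBoundedBatcher(Iterator<byte[]> rows, long maxBatchBytes) {
        this.rows = rows;
        this.maxBatchBytes = maxBatchBytes;
    }

    /**
     * Collects rows until the byte budget is reached. The last row may push the
     * batch over the budget, so a batch always holds at least one row if any remain.
     */
    List<byte[]> nextBatch() {
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        while (rows.hasNext() && used < maxBatchBytes) {
            byte[] row = rows.next();
            batch.add(row);
            used += row.length;
        }
        return batch;
    }
}
```

With a size budget, the number of rows per round varies with row width, while the amount of data per round stays roughly constant.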
[jira] [Commented] (HBASE-5907) enhance HLog pretty printer to print additional useful stats
[ https://issues.apache.org/jira/browse/HBASE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266904#comment-13266904 ] Phabricator commented on HBASE-5907: mbautin has committed the revision [jira] [HBASE-5907] [89-fb] enhance HLog pretty printer to print additional useful stats. REVISION DETAIL https://reviews.facebook.net/D2979 COMMIT https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1333198 enhance HLog pretty printer to print additional useful stats Key: HBASE-5907 URL: https://issues.apache.org/jira/browse/HBASE-5907 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Priority: Minor Attachments: D2979.1.patch, D2979.2.patch It would be useful for analysis purposes to enhance the HLog pretty printer to optionally print a bunch of additional stats such as: 1) # of txns 2) # of KVs updated 3) avg size of txns 4) avg size of KVs 5) avg # of KVs written per txn 6) unique CF signatures involved in put/delete operations; and breakdown of some of the above metrics by these signatures, etc.
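The stats listed in the issue can be accumulated in a single pass over the log. The sketch below is an assumption-laden illustration, not the committed pretty-printer code: `WalStats` and `Kv` are hypothetical types, and a "CF signature" is taken to be the sorted set of column families touched by one transaction.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stats accumulator; the Kv shape is an assumption, not an HLog type. */
final class WalStats {
    /** One key-value edit: its column family and serialized size in bytes. */
    static final class Kv {
        final String family;
        final int sizeBytes;
        Kv(String family, int sizeBytes) {
            this.family = family;
            this.sizeBytes = sizeBytes;
        }
    }

    long txns;
    long kvs;
    long totalBytes;
    /** Transaction count per CF signature, for example "cf1+cf2". */
    final Map<String, Long> txnsBySignature = new TreeMap<>();

    /** Folds one transaction (a list of KVs) into the running totals. */
    void addTxn(List<Kv> txn) {
        txns++;
        TreeSet<String> families = new TreeSet<>();
        for (Kv kv : txn) {
            kvs++;
            totalBytes += kv.sizeBytes;
            families.add(kv.family);
        }
        txnsBySignature.merge(String.join("+", families), 1L, Long::sum);
    }

    double avgTxnBytes()  { return txns == 0 ? 0 : (double) totalBytes / txns; }
    double avgKvBytes()   { return kvs == 0 ? 0 : (double) totalBytes / kvs; }
    double avgKvsPerTxn() { return txns == 0 ? 0 : (double) kvs / txns; }
}
```

Keeping only counters and a per-signature map means memory use does not grow with the length of the log.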
[jira] [Commented] (HBASE-5879) Enable JMX metrics collection for the Thrift proxy
[ https://issues.apache.org/jira/browse/HBASE-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266903#comment-13266903 ] Phabricator commented on HBASE-5879: mbautin has committed the revision [jira] [HBASE-5879] [89-fb] Enable JMX metrics collection for the Thrift proxy. REVISION DETAIL https://reviews.facebook.net/D2955 COMMIT https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1333194 Enable JMX metrics collection for the Thrift proxy -- Key: HBASE-5879 URL: https://issues.apache.org/jira/browse/HBASE-5879 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Priority: Minor Fix For: 0.96.0 Attachments: 5879_trunk.txt, D2955.1.patch We need to enable JMX on the Thrift proxy on a separate port different from the JMX port used by regionserver. This is necessary for metrics collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2625) Make testDynamicBloom()'s randomness deterministic
[ https://issues.apache.org/jira/browse/HBASE-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266912#comment-13266912 ] Hudson commented on HBASE-2625: --- Integrated in HBase-TRUNK #2838 (See [https://builds.apache.org/job/HBase-TRUNK/2838/]) HBASE-2625 Avoid byte buffer allocations when reading a value from a Result object (Tudor Scurtu) (Revision 1333159) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestResult.java Make testDynamicBloom()'s randomness deterministic Key: HBASE-2625 URL: https://issues.apache.org/jira/browse/HBASE-2625 Project: HBase Issue Type: Test Components: test Affects Versions: 0.90.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.90.0 Attachments: hbase-2625.patch Had a failure with testDynamicBloom on Hudson today. Will investigate, however it would be nice to reproduce the problem to make sure it's not the fault of my test assumptions. I plan to seed the Random number generator with the current time and print that out for post-mortem analysis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
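The plan in the issue description (seed the random number generator from the clock and print the seed) might look like the sketch below. `SeededRandomDemo` and `newLoggedRandom` are hypothetical names, not taken from the attached patch.

```java
import java.util.Random;

/** Hypothetical test helper: log the seed so a failing run can be replayed. */
final class SeededRandomDemo {
    /** Creates a Random from the given seed and prints the seed for post-mortem analysis. */
    static Random newLoggedRandom(long seed) {
        System.out.println("random seed = " + seed);
        return new Random(seed);
    }

    public static void main(String[] args) {
        // Normal run: seed from the clock. To reproduce a failure, pass the logged seed instead.
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.currentTimeMillis();
        Random rnd = newLoggedRandom(seed);
        System.out.println("first value = " + rnd.nextInt(1000));
    }
}
```

Two `Random` instances built from the same seed produce the same sequence, which is what makes the printed seed enough to reproduce a failure.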
[jira] [Updated] (HBASE-4990) Document secure HBase setup
[ https://issues.apache.org/jira/browse/HBASE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4990: - Attachment: 4990v2.txt Version two; worked on it w/ Andrew sitting beside me. Document secure HBase setup --- Key: HBASE-4990 URL: https://issues.apache.org/jira/browse/HBASE-4990 Project: HBase Issue Type: Sub-task Affects Versions: 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: 4990.txt, 4990v2.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4990) Document secure HBase setup
[ https://issues.apache.org/jira/browse/HBASE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4990. -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Committed to trunk. Document secure HBase setup --- Key: HBASE-4990 URL: https://issues.apache.org/jira/browse/HBASE-4990 Project: HBase Issue Type: Sub-task Affects Versions: 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.96.0 Attachments: 4990.txt, 4990v2.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5919) Add missing Ted review fixes for HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5919: - Attachment: 5919.txt Address issues raised by Ted over in hbase-5869 that I did not want to address there because the criticisms were relatively minor nits for a patch of 300k. I'd gotten two clean hadoopqa builds on v13... and was afraid my patch would fail to apply if I did more hadoopqa cycles. Add missing Ted review fixes for HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Attachments: 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker.
[jira] [Updated] (HBASE-5919) Add missing Ted review fixes for HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5919: - Status: Patch Available (was: Open) Add missing Ted review fixes for HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Attachments: 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5919) Add missing Ted review fixes for HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266981#comment-13266981 ] stack commented on HBASE-5919: -- Review how this method is called. Look up in the file two methods. Add missing Ted review fixes for HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Attachments: 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4990) Document secure HBase setup
[ https://issues.apache.org/jira/browse/HBASE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266991#comment-13266991 ] Hudson commented on HBASE-4990: --- Integrated in HBase-TRUNK #2839 (See [https://builds.apache.org/job/HBase-TRUNK/2839/]) HBASE-4990 Document secure HBase setup (Revision 1333212) Result = SUCCESS stack : Files : * /hbase/trunk/src/docbkx/book.xml * /hbase/trunk/src/docbkx/security.xml Document secure HBase setup --- Key: HBASE-4990 URL: https://issues.apache.org/jira/browse/HBASE-4990 Project: HBase Issue Type: Sub-task Affects Versions: 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.96.0 Attachments: 4990.txt, 4990v2.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13266998#comment-13266998 ] stack commented on HBASE-5922: -- When does this issue arise? When we try to getclosest on split key? HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBase-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is 
automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267002#comment-13267002 ] stack commented on HBASE-5922: --
{code}
===
--- src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java (revision 1169834)
+++ src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java (revision )
@@ -146,7 +146,7 @@
 } else {
 if (getComparator().compare(key, offset, length, splitkey, 0, splitkey.length) >= 0) {
-return seekBefore(splitkey, 0, splitkey.length);
+return false;
 }
{code}
So, if the result is splitkey, we need to get something before the splitkey, not fail? Maybe we should check what comes out of the seekBefore: how is it that it's returning us splitkey again? HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBase-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.
java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
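The repeated `seekBefore` frames in the stack trace above are consistent with a method that retries with the split key and lands in the same branch again. The sketch below is a simplified, hypothetical model of that shape: integer keys and a `TreeSet` stand in for the real store file, and none of it is the HBase source. It contrasts the recursing variant with the "return false" variant from the patch under discussion.

```java
import java.util.TreeSet;

/** Simplified, hypothetical model of a bottom-half reader; not the HBase class. */
final class HalfReaderModel {
    private final TreeSet<Integer> keys = new TreeSet<>(); // keys in the parent file
    private final int splitKey;                            // bottom half covers keys < splitKey

    HalfReaderModel(int splitKey, int... parentKeys) {
        this.splitKey = splitKey;
        for (int k : parentKeys) {
            keys.add(k);
        }
    }

    /** Recursing shape: the retry passes splitKey, which again satisfies key >= splitKey. */
    boolean seekBeforeRecursing(int key) {
        if (key >= splitKey) {
            return seekBeforeRecursing(splitKey); // same branch every time, so it never returns
        }
        return delegateSeekBefore(key);
    }

    /** Shape of the proposed patch: report "nothing found" instead of retrying. */
    boolean seekBeforePatched(int key) {
        if (key >= splitKey) {
            return false;
        }
        return delegateSeekBefore(key);
    }

    /** Stand-in for the underlying scanner: true if some key sorts strictly before the argument. */
    private boolean delegateSeekBefore(int key) {
        return keys.lower(key) != null;
    }
}
```

Whether returning false is the right semantics for a key at or past the split key is the open question in the thread; the model only shows why the retry cannot terminate.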
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267015#comment-13267015 ] stack commented on HBASE-5922: -- Hmm... maybe your fix is right. There are other tests of half file, ones we added when we ran into issues w/ it in the past. Lets try your patch against hadoopqa. HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBase-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at 
org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267040#comment-13267040 ] stack commented on HBASE-5444: -- Is v6 what is up in rb? And you've submitted it to hadoopqa? Thanks Gregory. Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Attachments: HBASE-5444-v6-trunk.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reassigned HBASE-5919: Assignee: Ted Yu I thought this was my issue but you seem to have taken it over Ted. Assigning you. Add fixes for Ted's review comments from HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Assignee: Ted Yu Priority: Blocker Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5914) Bulk assign regions in the process of ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267154#comment-13267154 ] ramkrishna.s.vasudevan commented on HBASE-5914: --- @Chunhui Yes, in the SSH case retain assignment is not valid. But what if I don't need the round-robin assignment itself to be called, like how we have it in master startup? I just meant a configuration similar to 'hbase.master.startup.retainassign', e.g. 'hbase.master.servershutdown.roundrobinassign', something like this. Just my thoughts. Bulk assign regions in the process of ServerShutdownHandler --- Key: HBASE-5914 URL: https://issues.apache.org/jira/browse/HBASE-5914 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-5914.patch, HBASE-5914v2.patch ServerShutdownHandler currently assigns regions one at a time. In a large cluster a single regionserver usually carries many regions, so this is quite slow. What about bulk-assigning regions as we do at cluster startup? In the current logic, if assigning many regions to one destination server fails, we wait until timeout; in ServerShutdownHandler we should instead retry on another server.
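For readers unfamiliar with the round-robin assignment discussed here, a minimal sketch of building a bulk plan follows. It assumes regions and servers are plain strings; `RoundRobinPlan` is a hypothetical name and this is not the AssignmentManager code. The 'hbase.master.servershutdown.roundrobinassign' key is only a suggestion made in the comment, so it is deliberately left out of the code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; region and server names are plain strings, not HBase types. */
final class RoundRobinPlan {
    /** Spreads regions over servers in order, so each server receives one batch to open. */
    static Map<String, List<String>> plan(List<String> regions, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no live servers to assign to");
        }
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String server : servers) {
            byServer.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            byServer.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return byServer;
    }
}
```

Grouping by destination first is what lets each server be handed its regions in one request instead of one request per region.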
[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267156#comment-13267156 ] stack commented on HBASE-5444: -- Sorry Gregory, I meant to ask: would you like me to commit this and then work on the outstanding stuff in separate issues, or would you rather post an updated patch? Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Attachments: HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch
[jira] [Commented] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267163#comment-13267163 ] stack commented on HBASE-5919: -- None. Add fixes for Ted's review comments from HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Assignee: Ted Yu Priority: Blocker Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5914) Bulk assign regions in the process of ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267164#comment-13267164 ] ramkrishna.s.vasudevan commented on HBASE-5914: --- Fine Ted. I agree. Bulk assign regions in the process of ServerShutdownHandler --- Key: HBASE-5914 URL: https://issues.apache.org/jira/browse/HBASE-5914 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-5914.patch, HBASE-5914v2.patch ServerShutdownHandler currently assigns regions one at a time. In a large cluster a single regionserver usually carries many regions, so this is quite slow. What about bulk-assigning regions as we do at cluster startup? In the current logic, if assigning many regions to one destination server fails, we wait until timeout; in ServerShutdownHandler we should instead retry on another server.
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267175#comment-13267175 ] ramkrishna.s.vasudevan commented on HBASE-5918: --- +1 on patch. Master will block forever when startup if root server died between assign root and assign meta -- Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch During master initialization, if the server hosting root dies between assigning root and assigning meta, the master blocks in HMaster#assignRootAndMeta: {code}assignmentManager.assignMeta(); this.catalogTracker.waitForMeta();{code} because ServerShutdownHandler is disabled at that point. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta().
[jira] [Updated] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5922: - Attachment: HBASE-5922.patch Retry HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5922: - Status: Patch Available (was: Open) HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) -- This message is automatically generated by JIRA. 
[jira] [Updated] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5922: - Status: Open (was: Patch Available) HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom.
java.io.IOException: java.io.IOException: java.lang.StackOverflowError
 at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651)
 at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.StackOverflowError
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267178#comment-13267178 ] stack commented on HBASE-5913: -- +1 on patch Speed up the full scan of META -- Key: HBASE-5913 URL: https://issues.apache.org/jira/browse/HBASE-5913 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.1 Attachments: 5913-v2.txt, HBASE-5913.patch In the master, we do a full scan of META in several situations, for example: 1. master startup; 2. CatalogJanitor, which does a full scan every 5 minutes; 3. ServerShutdownHandler, which calls getServerUserRegions for the dead server. For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3. However, we found that MetaReader#getServerUserRegions takes 14 minutes for 100,000 regions in our production environment. There are two causes. First, we don't use scanner caching, so we fetch one row per next() when fully scanning .META. Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40 ms (this is related to the length of the rows in .META.; anyone who sees the same behavior could try setting it to true). For this issue, I think we could set the caching when doing the full scan of META.
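The caching argument in the HBASE-5913 description above can be sketched in a few lines of plain Java. This is only a simplified model, not HBase code: `ScanCostSketch`, `roundTrips`, and `estimatedMillis` are hypothetical names, the per-call latency is assumed constant, and transfer time is ignored. It counts how many next() round trips a full scan needs for a given caching value.

```java
public class ScanCostSketch {

    /**
     * Number of next() round trips needed to fetch `rows` rows when each
     * round trip returns at most `caching` rows (hypothetical helper).
     */
    public static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        // Ceiling division: a partially filled last batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    /** Rough latency estimate, assuming every round trip costs the same. */
    public static long estimatedMillis(long rows, int caching, long millisPerRoundTrip) {
        return roundTrips(rows, caching) * millisPerRoundTrip;
    }

    public static void main(String[] args) {
        long rows = 100000L; // size of the scanned table in this model
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching
                + " -> round trips=" + roundTrips(rows, caching));
        }
    }
}
```

In this model, scanning 100,000 rows with a caching value of 1 needs 100,000 round trips, while a caching value of 100 needs 1,000, which is why setting the caching for the full META scan is attractive.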
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267179#comment-13267179 ] Hudson commented on HBASE-5913: --- Integrated in HBase-TRUNK #2840 (See [https://builds.apache.org/job/HBase-TRUNK/2840/]) HBASE-5913 Speed up the full scan of META (Chunhui) (Revision 1333283) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java Speed up the full scan of META -- Key: HBASE-5913 URL: https://issues.apache.org/jira/browse/HBASE-5913 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.1 Attachments: 5913-v2.txt, HBASE-5913.patch In the master, we will do the full scan of META in some situations for example, 1.master start up 2.CatalogJanitor do the full scan per 5 mins 3.ServerShutdownHandler, getServerUserRegions for dead server. For the online applications, we should try the best to reduce the process time of ServerShutdownHandler in the situation 3. However, we found MetaReader#getServerUserRegions take 14mins for 10w regions in our production environment. And it is caused by two reasons: The first, we don't use cache and get one row per next() when fully scan .META. The second, hbase.ipc.client.tcpnodelay is false as default, and in our environment it take 40ms for per next() (It is related to the length of row in the .META. , if someone also found, could try to set it true) For this issue, I think we could set the caching when do the full scan of META -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5916: -- Priority: Critical (was: Major) Upped the priority of this defect. While the master is coming up, it does the following: {code} 1- Wait for region servers to register 2- Get the online server list 3- Start splitting the logs {code} If a new region server registers between steps 2 and 3, we split the logs of the new region server and in fact delete the HLog folder for that new region server. This seems critical. We ran into this problem while analysing the issue for which this JIRA was created. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is being restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when I have only one RS and it gets restarted, or when all the RSs get restarted at the same time (before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267187#comment-13267187 ] stack commented on HBASE-5918: -- Shouldn't we remove the setting of this flag that happens later in finishInitialization? +1 on the patch otherwise. This is good stuff. Any chance of a test? It looks like it'd be hard to get one in here, but it'd be good if you fellas at least said why a test is hard to squeeze in here, to show you at least tried figuring out how to test this stuff. The flag disabling the shutdown handler was only added recently, but here we find an issue w/ it already. {code} Author: Michael Stack st...@apache.org 2012-03-13 08:35:54 Committer: Michael Stack st...@apache.org 2012-03-13 08:35:54 Parent: fbd4bebd5cca129f49e91ec9936f604998a7025a (HBASE-5314 Gracefully rolling restart region servers in rolling-restart.sh) Child: 59e5460807a1dc0fb5763e4b12dda4be49ef3bb4 (HBASE-5574 DEFAULT_MAX_FILE_SIZE defaults to a negative value) Branches: 094.testfail, 5833trunk, hanging, pbwork, remotes/origin/instant_schema_alter, remotes/origin/trunk, v10, v4, v6 Follows: Precedes: HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1300194 13f79535-47bb-0310-9956-ffa450edef68 - src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java - inde {code} Master will block forever when startup if root server died between assign root and assign meta -- Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch While the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta: {code}assignmentManager.assignMeta(); this.catalogTracker.waitForMeta();{code} because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267189#comment-13267189 ] stack commented on HBASE-5444: -- @Gregory np Please file the other issues. Waiting on hadoopqa before committing. Good stuff. Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267192#comment-13267192 ] ramkrishna.s.vasudevan commented on HBASE-5918: --- HBASE-5916 is also due to this. I will try working on a test case. Master will block forever when startup if root server died between assign root and assign meta -- Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch While the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta: {code}assignmentManager.assignMeta(); this.catalogTracker.waitForMeta();{code} because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267198#comment-13267198 ] stack commented on HBASE-5916: -- Tell us more Ram. If a new regionserver comes in and we split its logs, so what? It is not carrying any regions, right? We'll assign it regions when done splitting? Or are you talking about a case where a regionserver is just very slow about checking in? RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is being restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when I have only one RS and it gets restarted, or when all the RSs get restarted at the same time (before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267205#comment-13267205 ] ramkrishna.s.vasudevan commented on HBASE-5916: --- Yes, the RS is slow in checking in. The problem is that the master will split the logs of the newly checked-in RS, because that RS is not in the online list.
{code}
Set<ServerName> onlineServers = new HashSet<ServerName>(serverManager.getOnlineServers().keySet());
// TODO: Should do this in background rather than block master startup
status.setStatus("Splitting logs after master startup");
splitLogAfterStartup(this.fileSystemManager, onlineServers);
{code}
During log splitting, if we find any log folder that does not belong to one of the servers in 'onlineServers' (the online list has already been fetched by then), we call split log and finally delete the log folder. So even though the server is online, it will not be able to use its HLog, and I get a FileNotFoundException. Sorry if I am not clear. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is being restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when I have only one RS and it gets restarted, or when all the RSs get restarted at the same time (before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
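The race described in the comment above (the online list is captured first, a region server registers afterwards, and its log folder is then treated as belonging to a dead server) can be modelled with plain Java collections. This is a simplified sketch, not the master's actual code: `LogSplitRaceSketch` and `foldersToSplit` are hypothetical names, and servers and log folders are represented by bare strings.

```java
import java.util.Set;
import java.util.TreeSet;

public class LogSplitRaceSketch {

    /**
     * Returns the log folders whose owner is missing from the given view of
     * online servers. In this model those folders are the ones that would be
     * split and then deleted (hypothetical helper).
     */
    public static Set<String> foldersToSplit(Set<String> onlineView, Set<String> logFolders) {
        Set<String> result = new TreeSet<String>(logFolders);
        result.removeAll(onlineView);
        return result;
    }

    public static void main(String[] args) {
        Set<String> liveServers = new TreeSet<String>();
        liveServers.add("rs1");

        // Step 2: the master captures the online server list.
        Set<String> snapshot = new TreeSet<String>(liveServers);

        // A slow region server checks in after the snapshot and creates its log folder.
        liveServers.add("rs2");

        Set<String> logFolders = new TreeSet<String>();
        logFolders.add("rs0-dead"); // genuinely dead server
        logFolders.add("rs1");
        logFolders.add("rs2");

        // Step 3: splitting based on the stale snapshot also picks up rs2.
        System.out.println("using stale snapshot: " + foldersToSplit(snapshot, logFolders));
        System.out.println("using current view:   " + foldersToSplit(liveServers, logFolders));
    }
}
```

With the stale snapshot, the late server's folder is selected for splitting even though the server is alive; with the current view, only the dead server's folder is selected. Whether the fix should re-check the online list, inspect region server znodes, or skip empty recent logs is left open in the thread.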
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267209#comment-13267209 ] stack commented on HBASE-5875: -- The patch looks dodgy -- saying a region is online, if it is root or meta, seems incorrect. bq. Consider the case where my ROOT node is found in RIT. Hence the processRIT will trigger the assignment. What is the above referring to? Which part of the code? bq. It so happened that when i try to verifyRootRegionLocation the root node is created but the OpenRegionHandler has not added the ROOT region in its memory(very very corner case and this happened once while testing). So the verifyRootRegionLocation returns false and hence the master thinks it an server to be expired. Can the master not detect this corner case just by looking at whats in zk? Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 Attachments: HBASE-5875.patch If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT. Master will trigger the assignment and next will try to verify the Root Region Location. Root region location verification is done seeing if the RS has the region in its online list. If the master triggered assignment has not yet been completed in RS then the verify root region location will fail. Because it failed {code} splitLogAndExpireIfOnline(currentRootServer); {code} we do split log and also remove the server from online server list. Ideally here there is nothing to do in splitlog as no region server was restarted. So master, though the server is online, master just invalidates the region server. 
In a special case, if I have only one RS, then my cluster will become non-operative.
[jira] [Commented] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267212#comment-13267212 ] Hudson commented on HBASE-5919: --- Integrated in HBase-TRUNK #2841 (See [https://builds.apache.org/job/HBase-TRUNK/2841/]) HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java Add fixes for Ted's review comments from HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Assignee: Ted Yu Priority: Blocker Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267213#comment-13267213 ] Hudson commented on HBASE-5869: --- Integrated in HBase-TRUNK #2841 (See [https://builds.apache.org/job/HBase-TRUNK/2841/]) HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.96.0 Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, secondcut.txt, v10.txt, v11.txt, v12.txt, v13.txt, v13.txt, v4.txt, v5.txt, v6.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267214#comment-13267214 ] stack commented on HBASE-5916: -- bq. So though the server is online i will not be able to use the hLog and i get filenotfoundException. Master should not take over hlogs that were created at about the same time as master start and that have no content in them? Should we check for regionserver znodes and if present, wait a little longer? Give them another chance? RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is being restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutDownHandler. {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master will try to expire the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled. This can happen when I have only one RS and it gets restarted, or when all the RSs get restarted at the same time (before assignRootandMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a very rare corner case.
[jira] [Commented] (HBASE-4990) Document secure HBase setup
[ https://issues.apache.org/jira/browse/HBASE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267219#comment-13267219 ] Hudson commented on HBASE-4990: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-4990 Document secure HBase setup (Revision 1333212) Result = SUCCESS stack : Files : * /hbase/trunk/src/docbkx/book.xml * /hbase/trunk/src/docbkx/security.xml Document secure HBase setup --- Key: HBASE-4990 URL: https://issues.apache.org/jira/browse/HBASE-4990 Project: HBase Issue Type: Sub-task Affects Versions: 0.92.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.96.0 Attachments: 4990.txt, 4990v2.txt
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267221#comment-13267221 ] Hudson commented on HBASE-5840: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5840 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status (RajeshBabu) (Revision 1333124) HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123) Result = SUCCESS ramkrishna : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status -- Key: HBASE-5840 URL: https://issues.apache.org/jira/browse/HBASE-5840 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5840.patch, HBASE-5840_trunk.patch, HBASE-5840_v2.patch The TaskMonitor status is not cleared when a region goes to FAILED_OPEN. It keeps showing the old status, which misleads the user.
[jira] [Commented] (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
[ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267220#comment-13267220 ] Hudson commented on HBASE-1996: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Configure scanner buffer in bytes instead of number of rows --- Key: HBASE-1996 URL: https://issues.apache.org/jira/browse/HBASE-1996 Project: HBase Issue Type: Improvement Reporter: Dave Latham Assignee: Dave Latham Fix For: 0.90.0 Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3-v3.patch, 1996-0.20.3.patch Currently, the default scanner fetches a single row at a 
time. This makes for very slow scans on tables where the rows are not large. You can change the setting for an HTable instance or for each Scan. It would be better to have a default that performs reasonably well so that people stop running into slow scans because they are evaluating HBase, aren't familiar with the setting, or simply forgot. Unfortunately, if we increase the value of the current setting, then we run the risk of running OOM for tables with large rows. Let's change the setting so that it works with a size in bytes, rather than in rows. This will allow us to set a reasonable default so that tables with small rows will scan performantly and tables with large rows will not run OOM. Note that the case is very similar to table writes as well. When disabling auto flush, we buffer a list of Puts to commit at once. That buffer is measured in bytes, so that a small number of large Puts or a lot of small Puts can each fit in a single flush. If that buffer were measured in number of Puts, it would have the same problem that we have for the scan buffer, and we wouldn't be able to set a good default value for tables with different size rows. Changing the scan buffer to be configured like the write buffer will make it more consistent.
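The idea in the HBASE-1996 description above, bounding each scanner batch by bytes rather than by row count, can be sketched as follows. This is a minimal illustration under stated assumptions, not the change that was committed: `ByteBoundedBatcher` and `batchRowCounts` are hypothetical names, rows are represented only by their sizes in bytes, and a batch is closed as soon as its accumulated size reaches the limit (so an oversized row still travels alone in its own batch).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteBoundedBatcher {

    /**
     * Groups rows (given by their sizes in bytes) into batches and returns
     * the number of rows in each batch. A batch is closed once its
     * accumulated size reaches maxBytes; every batch holds at least one row.
     */
    public static List<Integer> batchRowCounts(List<Integer> rowSizes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<Integer> counts = new ArrayList<Integer>();
        int rowsInBatch = 0;
        long bytesInBatch = 0;
        for (int size : rowSizes) {
            rowsInBatch++;
            bytesInBatch += size;
            if (bytesInBatch >= maxBytes) {
                counts.add(rowsInBatch);
                rowsInBatch = 0;
                bytesInBatch = 0;
            }
        }
        if (rowsInBatch > 0) {
            counts.add(rowsInBatch); // trailing, partially filled batch
        }
        return counts;
    }

    public static void main(String[] args) {
        // Small rows: many rows fit into one batch.
        System.out.println(batchRowCounts(Arrays.asList(10, 10, 10, 10, 10, 10), 30));
        // Large rows: each batch carries a single row, so memory stays bounded.
        System.out.println(batchRowCounts(Arrays.asList(100, 100, 100), 30));
    }
}
```

With the same byte limit, small rows are grouped several to a batch while large rows go one per batch, which is the behavior the description asks for from a single default setting.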
[jira] [Commented] (HBASE-5919) Add fixes for Ted's review comments from HBASE-5869
[ https://issues.apache.org/jira/browse/HBASE-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267223#comment-13267223 ] Hudson commented on HBASE-5919: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java Add fixes for Ted's review comments from HBASE-5869 --- Key: HBASE-5919 URL: https://issues.apache.org/jira/browse/HBASE-5919 Project: HBase Issue Type: Bug Reporter: stack Assignee: Ted Yu Priority: Blocker Attachments: 5919-v2.txt, 5919-v4.txt, 5919.txt I missed addressing a few of Ted's comments on the end of my navigating HBASE-5869 commit. Fix here. Make it a blocker. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5548) Add ability to get a table in the shell
[ https://issues.apache.org/jira/browse/HBASE-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267222#comment-13267222 ] Hudson commented on HBASE-5548: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5548 Add ability to get a table in the shell; BACKING OUT MISTAKEN CO-COMMIT OF HBASE-5840 (Revision 1333123) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java Add ability to get a table in the shell --- Key: HBASE-5548 URL: https://issues.apache.org/jira/browse/HBASE-5548 Project: HBase Issue Type: Improvement Components: shell Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: ruby_HBASE-5528-v0.patch, ruby_HBASE-5548-addendum.patch, ruby_HBASE-5548-v1.patch, ruby_HBASE-5548-v2.patch, ruby_HBASE-5548-v3.patch, ruby_HBASE-5548-v5.patch Currently, all the commands that operate on a table in the shell first have to take the table as name as input. There are two main considerations: * It is annoying to have to write the table name every time, when you should just be able to get a reference to a table * the current implementation is very wasteful - it creates a new HTable for each call (but reuses the connection since it uses the same configuration) We should be able to get a handle to a single HTable and then operate on that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267225#comment-13267225 ] Hudson commented on HBASE-5913: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5913 Speed up the full scan of META (Chunhui) (Revision 1333283) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java Speed up the full scan of META -- Key: HBASE-5913 URL: https://issues.apache.org/jira/browse/HBASE-5913 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.1 Attachments: 5913-v2.txt, HBASE-5913.patch The master does a full scan of META in several situations, for example: 1. master startup; 2. CatalogJanitor, which does a full scan every 5 minutes; 3. ServerShutdownHandler, which calls getServerUserRegions for a dead server. For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3. However, we found that MetaReader#getServerUserRegions takes 14 minutes for 10w (100,000) regions in our production environment. There are two causes. First, we don't use caching, so a full scan of .META. fetches one row per next(). Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40 ms (this is related to the row length in .META.; anyone who sees the same behavior could try setting it to true). For this issue, I think we could set the caching when doing the full scan of META.
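Editorial sketch for HBASE-5913: the report attributes much of the cost to fetching one row per next(). The snippet below is only a back-of-the-envelope model of that effect, not HBase code. `ScanCostModel`, `roundTrips`, and `roundTripMillis` are hypothetical names, and the caching value of 100 is an arbitrary assumption for illustration. In the HBase client the corresponding knob is `Scan#setCaching(int)`.

```java
/** Hypothetical cost model for a scan; not part of HBase. */
final class ScanCostModel {
    private ScanCostModel() {}

    /** Number of next() round trips needed to fetch `rows` rows at `caching` rows per trip. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    /** Time spent purely on round trips; ignores server work and payload transfer. */
    static long roundTripMillis(long rows, int caching, long perTripMillis) {
        return roundTrips(rows, caching) * perTripMillis;
    }

    public static void main(String[] args) {
        long rows = 100_000L; // the "10w" regions mentioned in the report
        System.out.println("caching=1   -> " + roundTrips(rows, 1) + " round trips");
        System.out.println("caching=100 -> " + roundTrips(rows, 100) + " round trips");
    }
}
```

The model only counts round trips, so it says nothing about the memory a larger cache costs on either side of the connection.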
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267224#comment-13267224 ] Hudson commented on HBASE-5869: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5919 Add fixes for Ted's review comments from HBASE-5869 (Revision 104) HBASE-5869 Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb (Revision 1333099) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Bytes.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/DeserializationException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HBaseException.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/RegionTransition.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/SplitLogTask.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/trunk/src/main/protobuf/ZooKeeper.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestSerialization.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee
[jira] [Commented] (HBASE-5625) Avoid byte buffer allocations when reading a value from a Result object
[ https://issues.apache.org/jira/browse/HBASE-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267226#comment-13267226 ] Hudson commented on HBASE-5625: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-5625 Avoid byte buffer allocations when reading a value from a Result object (Tudor Scurtu) (Revision 1333159) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestResult.java Avoid byte buffer allocations when reading a value from a Result object --- Key: HBASE-5625 URL: https://issues.apache.org/jira/browse/HBASE-5625 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.92.1 Reporter: Tudor Scurtu Assignee: Tudor Scurtu Labels: patch Fix For: 0.96.0 Attachments: 5625.txt, 5625v2.txt, 5625v3.txt, 5625v4.txt, 5625v5.txt, 5625v6.txt, 5625v7.txt, 5625v8.txt When calling Result.getValue(), an extra dummy KeyValue and its associated underlying byte array are allocated, as well as a persistent buffer that will contain the returned value. These can be avoided by reusing a static array for the dummy object and by passing a ByteBuffer object as a value destination buffer to the read method. The current functionality is maintained, and we have added a separate method call stack that employs the described changes. I will provide more details with the patch. Running tests with a profiler, the reduction in read time appears to be up to 40%.
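Editorial sketch for HBASE-5625: the description contrasts allocating a fresh array per read with writing into a caller-supplied ByteBuffer. The snippet below illustrates that contrast using only `java.nio.ByteBuffer`. `ValueCopySketch`, `copyValue`, and `loadValue` are hypothetical names chosen for this example; they are not claimed to match the methods in the attached patch.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of allocate-per-read versus a reusable destination buffer. */
final class ValueCopySketch {
    private ValueCopySketch() {}

    /** Allocating style: every call creates a new array to hold the value. */
    static byte[] copyValue(byte[] backing, int offset, int length) {
        byte[] out = new byte[length];
        System.arraycopy(backing, offset, out, 0, length);
        return out;
    }

    /**
     * Reusing style: append the value to dst, which the caller owns and can reuse.
     * Returns false and leaves dst untouched when the value does not fit.
     */
    static boolean loadValue(byte[] backing, int offset, int length, ByteBuffer dst) {
        if (dst.remaining() < length) {
            return false;
        }
        dst.put(backing, offset, length);
        return true;
    }
}
```

A caller would allocate the buffer once, call `dst.clear()` before each read, and `dst.flip()` afterwards to consume the bytes, so the per-read allocation disappears from the hot path.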
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267227#comment-13267227 ] Hudson commented on HBASE-2214: --- Integrated in HBase-TRUNK-security #190 (See [https://builds.apache.org/job/HBase-TRUNK-security/190/]) HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly (Ferdy Galema) (Revision 1333122) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java * /hbase/trunk/src/main/protobuf/Client.proto * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly - Key: HBASE-2214 URL: https://issues.apache.org/jira/browse/HBASE-2214 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Ferdy Galema Fix For: 0.96.0, 0.94.1 Attachments: HBASE-2214-0.94-v2.txt, HBASE-2214-0.94-v3.txt, HBASE-2214-0.94.txt, HBASE-2214-v4.txt, HBASE-2214-v5.txt, 
HBASE-2214-v6.txt, HBASE-2214-v7.txt, HBASE-2214_with_broken_TestShell.txt The notion that you set size rather than row count to specify how many rows a scanner should return in each cycle was raised over in HBASE-1996. It's a good one, making HBase regular even though the data under it may vary. HBASE-1996 was committed, but the patch was constrained by the fact that it needed to not change the RPC interface. This issue is about doing HBASE-1996 for 0.21 in a clean, unconstrained way.
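Editorial sketch for HBASE-2214: the idea is to bound each scanner cycle by bytes rather than by row count. The snippet below is a minimal model of that policy over an in-memory iterator, under the assumption that a batch is closed as soon as its accumulated size reaches the limit (so the last row may push it slightly over). `SizeBoundedBatcher` and `nextBatch` are hypothetical names, not HBase classes; the HBase client exposes a comparable setting as `Scan#setMaxResultSize(long)`, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical size-bounded batching; models the policy only, not the HBase scanner. */
final class SizeBoundedBatcher {
    private SizeBoundedBatcher() {}

    /**
     * Pulls rows until the accumulated byte size reaches maxBytes.
     * Always returns at least one row when any remain, so an oversized row cannot stall the scan.
     */
    static List<byte[]> nextBatch(Iterator<byte[]> rows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        while (rows.hasNext() && (batch.isEmpty() || used < maxBytes)) {
            byte[] row = rows.next();
            batch.add(row);
            used += row.length;
        }
        return batch;
    }
}
```

With a row-count limit the bytes per cycle swing with row width; with a size limit the row count swings instead, which is the more predictable quantity for memory and network use.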
[jira] [Commented] (HBASE-5922) HalfStoreFileReader seekBefore causes StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267228#comment-13267228 ] stack commented on HBASE-5922: -- @Anoop Its late, but what you say makes sense. Looking over in our half file test, TestHalfStoreFileReader, it seems pretty poor coverage. What do you think? It does not seem to test the boundary condition Nate ran into or that you reason above? HalfStoreFileReader seekBefore causes StackOverflowError Key: HBASE-5922 URL: https://issues.apache.org/jira/browse/HBASE-5922 Project: HBase Issue Type: Bug Components: client, io Affects Versions: 0.90.0 Environment: HBase 0.90.4 Reporter: Nate Putnam Assignee: Nate Putnam Fix For: 0.90.0 Attachments: HBASE-5922.patch, HBASE-5922.patch Calling HRegionServer.getClosestRowBefore() can cause a stack overflow if the underlying store file is a reference and the row key is in the bottom. java.io.IOException: java.io.IOException: java.lang.StackOverflowError at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:990) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:978) at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1651) at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.StackOverflowError at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:147) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at 
org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:149)
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267232#comment-13267232 ] stack commented on HBASE-5916: -- Well, that's useful, right? It's useful in the case where a regionserver crashes and a new one comes up fast, before the original regionserver's znode has expired in zk. We shouldn't remove it. On startup, you should not get this exception unless you have a condition like the one described above, where a regionserver on the same host and port was registered previously in the master and then a new regionserver comes in w/ the same host and port but a different startcode? RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutdownHandler, that is, before it sets: {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master tries to expire the old server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This can happen when only one RS is restarted or when all the RSs are restarted at the same time (before assignRootAndMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
[jira] [Updated] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5444: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Gregory. Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch
[jira] [Commented] (HBASE-5918) Master will block forever when startup if root server died between assign root and assign meta
[ https://issues.apache.org/jira/browse/HBASE-5918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267243#comment-13267243 ] stack commented on HBASE-5918: -- @Chunhui I suppose so. It's just odd to have flags set in such different locations... It makes tracking stuff difficult. At a minimum I'd think we'd make a single method that set the flag and then did the call to expireDeadNotExpiredServers so they are grouped... What about the call to expireDeadNotExpiredServers that is done twice? On the first call, we'd possibly process the server that was carrying root. What happens when we call it again later in finishInitialization? Could we end up processing the same server twice? Thanks. Master will block forever when startup if root server died between assign root and assign meta -- Key: HBASE-5918 URL: https://issues.apache.org/jira/browse/HBASE-5918 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5918.patch, HBASE-5918.patch When the master is initializing, if the root server dies between assigning root and assigning meta, the master will block at HMaster#assignRootAndMeta: {code}assignmentManager.assignMeta(); this.catalogTracker.waitForMeta();{code} because ServerShutdownHandler is disabled. So we should enable ServerShutdownHandler after calling assignmentManager.assignMeta();
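Editorial sketch for HBASE-5918: the comment above asks for one method that both sets the flag and drains the deferred servers, and asks whether draining twice could process a server twice. The snippet below is a hypothetical model of that grouping, not the HMaster or ServerManager code. `DeferredExpirySketch` and its members are invented names; only the idea of a "dead but not yet expired" set comes from the method name expireDeadNotExpiredServers quoted in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of deferring server expiry until the shutdown handler is enabled. */
final class DeferredExpirySketch {
    private boolean shutdownHandlerEnabled = false;
    private final Set<String> deadNotExpired = new LinkedHashSet<>();
    private final List<String> processed = new ArrayList<>();

    /** Expire now if the handler is enabled; otherwise remember the server for later. */
    void expireServer(String serverName) {
        if (!shutdownHandlerEnabled) {
            deadNotExpired.add(serverName); // a set, so repeated reports collapse to one entry
            return;
        }
        processed.add(serverName);
    }

    /** Single entry point: the flag and the drain are kept together, as suggested in the comment. */
    void enableShutdownHandlerAndExpireDeferred() {
        shutdownHandlerEnabled = true;
        List<String> pending = new ArrayList<>(deadNotExpired);
        deadNotExpired.clear(); // cleared before processing, so a second call finds nothing
        for (String serverName : pending) {
            expireServer(serverName);
        }
    }

    List<String> processed() {
        return processed;
    }
}
```

In this model a second call to the drain method is harmless because the pending set is emptied first; whether the real master has the same property is exactly the question raised in the comment.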
[jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
[ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267253#comment-13267253 ] ramkrishna.s.vasudevan commented on HBASE-5916: --- We should not remove PleaseHoldException(message) directly. {code} if (services.isServerShutdownHandlerEnabled()) { // master has completed the initialization throw new PleaseHoldException(message); } {code} This solved the actual problem. But the problem due to FileNotFoundException should be addressed in a different way. RS restart just before master intialization we make the cluster non operative - Key: HBASE-5916 URL: https://issues.apache.org/jira/browse/HBASE-5916 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.94.1 Consider a case where my master is getting restarted. An RS that was alive when the master restart began gets restarted before the master initializes the ServerShutdownHandler, that is, before it sets: {code} serverShutdownHandlerEnabled = true; {code} In this case, when the RS tries to register with the master, the master tries to expire the old server, but the server cannot be expired because the ServerShutdownHandler is not yet enabled. This can happen when only one RS is restarted or when all the RSs are restarted at the same time (before assignRootAndMeta). {code} LOG.info(message); if (existingServer.getStartcode() < serverName.getStartcode()) { LOG.info("Triggering server recovery; existingServer " + existingServer + " looks stale, new server: " + serverName); expireServer(existingServer); } {code} If another RS is brought up, the cluster returns to normal. This may be a rare corner case.
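Editorial sketch for HBASE-5916: the two snippets in this message combine into a small decision table for a region server that registers on a host and port the master already knows. The snippet below models that table in isolation. It assumes the start code comparison is "existing is older than new", and it treats the guarded PleaseHoldException as the proposed behavior. `RegistrationCheckSketch`, `Outcome`, and `onSameHostAndPort` are hypothetical names, not ServerManager code.

```java
/** Hypothetical decision table for a registration that collides on host and port. */
final class RegistrationCheckSketch {
    enum Outcome { NONE, EXPIRE_EXISTING, HOLD, EXPIRE_EXISTING_AND_HOLD }

    private RegistrationCheckSketch() {}

    /**
     * existingStartcode / newStartcode: start codes of the registered and the incoming server.
     * shutdownHandlerEnabled: whether the master has finished enabling its shutdown handler.
     */
    static Outcome onSameHostAndPort(long existingStartcode, long newStartcode,
                                     boolean shutdownHandlerEnabled) {
        // Assumption: an older start code means the registered server looks stale.
        boolean existingLooksStale = existingStartcode < newStartcode;
        if (existingLooksStale && shutdownHandlerEnabled) {
            return Outcome.EXPIRE_EXISTING_AND_HOLD;
        }
        if (existingLooksStale) {
            return Outcome.EXPIRE_EXISTING; // no hold while the master is still initializing
        }
        return shutdownHandlerEnabled ? Outcome.HOLD : Outcome.NONE;
    }
}
```

The row that matters for this issue is the second one: while the handler is disabled, the incoming server is not told to hold, so it is not stuck retrying against a master that cannot yet expire the stale entry.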
[jira] [Commented] (HBASE-5913) Speed up the full scan of META
[ https://issues.apache.org/jira/browse/HBASE-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267263#comment-13267263 ] Hudson commented on HBASE-5913: --- Integrated in HBase-0.94 #174 (See [https://builds.apache.org/job/HBase-0.94/174/]) HBASE-5913 Speed up the full scan of META (Revision 115) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java Speed up the full scan of META -- Key: HBASE-5913 URL: https://issues.apache.org/jira/browse/HBASE-5913 Project: HBase Issue Type: Improvement Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.1 Attachments: 5913-v2.txt, HBASE-5913.patch The master does a full scan of META in several situations, for example: 1. master startup; 2. CatalogJanitor, which does a full scan every 5 minutes; 3. ServerShutdownHandler, which calls getServerUserRegions for a dead server. For online applications, we should do our best to reduce the processing time of ServerShutdownHandler in situation 3. However, we found that MetaReader#getServerUserRegions takes 14 minutes for 10w (100,000) regions in our production environment. There are two causes. First, we don't use caching, so a full scan of .META. fetches one row per next(). Second, hbase.ipc.client.tcpnodelay is false by default, and in our environment each next() takes 40 ms (this is related to the row length in .META.; anyone who sees the same behavior could try setting it to true). For this issue, I think we could set the caching when doing the full scan of META.
[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267284#comment-13267284 ] Hudson commented on HBASE-5444: --- Integrated in HBase-TRUNK #2842 (See [https://builds.apache.org/job/HBase-TRUNK/2842/]) HBASE-5444 Add PB-based calls to HMasterRegionInterface (Revision 119) Result = FAILURE stack : Files : * /hbase/trunk/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/trunk/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerLoad.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/RegionServerStatusProtocol.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MXBean.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RegionServerStatusProtos.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/protobuf/RegionServerStatus.proto * 
/hbase/trunk/src/main/protobuf/hbase.proto * /hbase/trunk/src/main/resources/hbase-webapps/master/table.jsp * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMXBean.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java Add PB-based calls to HMasterRegionInterface Key: HBASE-5444 URL: https://issues.apache.org/jira/browse/HBASE-5444 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Reporter: Todd Lipcon Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-5444-v10-trunk.patch, HBASE-5444-v6-trunk.patch, HBASE-5444-v9-trunk.patch
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5844: --- Attachment: 5844.v3.patch Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5844: --- Status: Open (was: Patch Available) Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5844: --- Attachment: 5844.v4.patch Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5844: --- Status: Patch Available (was: Reopened) Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and recovery starts immediately.
[jira] [Updated] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5844: --- Status: Patch Available (was: Open) Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s. By deleting the znode in the start script, we remove this delay and recovery starts immediately.
[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
[ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267332#comment-13267332 ] nkeywal commented on HBASE-5844: v4 should be ok. I will do another jira for the master. Delete the region servers znode after a regions server crash Key: HBASE-5844 URL: https://issues.apache.org/jira/browse/HBASE-5844 Project: HBase Issue Type: Improvement Components: regionserver, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Fix For: 0.96.0 Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch today, if the regions server crashes, its znode is not deleted in ZooKeeper. So the recovery process will stop only after a timeout, usually 30s. By deleting the znode in start script, we remove this delay and the recovery starts immediately. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
nkeywal created HBASE-5924: -- Summary: In the client code, don't wait for all the requests to be executed before resubmitting a request in error. Key: HBASE-5924 URL: https://issues.apache.org/jira/browse/HBASE-5924 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The client (in the function HConnectionManager#processBatchCallback) works in two steps: - make the requests - collect the failures and successes and prepare for retry. This means that when there is an immediate error (region moved, split, dead server, ...) we still wait for all the initial requests to complete before resubmitting the failed request. In a scenario where every request takes 5 seconds, the total execution time is: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s. We could improve this by analyzing the results as soon as they arrive. For the scenario above, that brings the total down to 6 seconds, because the retry (1s wait + 5s execution) overlaps with the initial requests that are still in flight. So we could see a performance improvement of nearly 50% in many cases, and much more than 50% if the request execution times differ.
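The arithmetic in HBASE-5924 can be checked with a small timing model. This is only a sketch of the reasoning, not the HBase client code: `RetryTiming`, `batchThenRetry` and `retryImmediately` are hypothetical names, and the model assumes the failed request fails at a known time and is retried exactly once.

```java
import java.util.Arrays;

/** Back-of-the-envelope timing model for the two retry strategies (illustrative only). */
final class RetryTiming {

    /**
     * Two-step strategy: wait for every initial request, then pause, then run the retry.
     * The time at which the failure happened does not matter, because the whole batch is awaited.
     */
    static double batchThenRetry(double[] initialDurations, double pause, double retryDuration) {
        double batchEnd = Arrays.stream(initialDurations).max().orElse(0.0);
        return batchEnd + pause + retryDuration;
    }

    /**
     * Immediate strategy: the retry is scheduled as soon as the failure is seen,
     * so it overlaps with the initial requests that are still running.
     */
    static double retryImmediately(double[] initialDurations, double failureTime,
                                   double pause, double retryDuration) {
        double batchEnd = Arrays.stream(initialDurations).max().orElse(0.0);
        double retryEnd = failureTime + pause + retryDuration;
        return Math.max(batchEnd, retryEnd);
    }

    public static void main(String[] args) {
        double[] initial = {5.0, 5.0, 5.0}; // the successful requests take 5s each
        // Failure is immediate (t = 0), pause is 1s, the retried request takes 5s.
        System.out.println(batchThenRetry(initial, 1.0, 5.0));        // 11.0
        System.out.println(retryImmediately(initial, 0.0, 1.0, 5.0)); // 6.0
    }
}
```

With these numbers the model reproduces the 11s and 6s figures from the description. When the successful requests are slower than the retry path, the immediate strategy costs nothing extra, which is where the "much more than 50%" case comes from.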
[jira] [Updated] (HBASE-5902) Some scripts are not executable
[ https://issues.apache.org/jira/browse/HBASE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5902: --- Status: Patch Available (was: Open) Some scripts are not executable --- Key: HBASE-5902 URL: https://issues.apache.org/jira/browse/HBASE-5902 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 5902.v1.patch -rw-rw-r-- graceful_stop.sh -rw-rw-r-- hbase-config.sh -rw-rw-r-- local-master-backup.sh -rw-rw-r-- local-regionservers.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267392#comment-13267392 ] ramkrishna.s.vasudevan edited comment on HBASE-5875 at 5/3/12 12:44 PM: bq.What is the above referring to? Which part of the code? In assignRootAndMeta() {code} boolean rit = this.assignmentManager. processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO); {code} bq.Can the master not detect this corner case just by looking at whats in zk? Here zk you mean the RS node or the ROOT region node? was (Author: ram_krish): bq.What is the above referring to? Which part of the code? In assignRootAndMeta() {code} boolean rit = this.assignmentManager. processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO); {code} Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 Attachments: HBASE-5875.patch If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT. Master will trigger the assignment and next will try to verify the Root Region Location. Root region location verification is done seeing if the RS has the region in its online list. If the master triggered assignment has not yet been completed in RS then the verify root region location will fail. Because it failed {code} splitLogAndExpireIfOnline(currentRootServer); {code} we do split log and also remove the server from online server list. Ideally here there is nothing to do in splitlog as no region server was restarted. So master, though the server is online, master just invalidates the region server. In a special case, if i have only one RS then my cluster will become non operative. 
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267392#comment-13267392 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- bq.What is the above referring to? Which part of the code? In assignRootAndMeta() {code} boolean rit = this.assignmentManager. processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO); {code} Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 Attachments: HBASE-5875.patch If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT. Master will trigger the assignment and next will try to verify the Root Region Location. Root region location verification is done seeing if the RS has the region in its online list. If the master triggered assignment has not yet been completed in RS then the verify root region location will fail. Because it failed {code} splitLogAndExpireIfOnline(currentRootServer); {code} we do split log and also remove the server from online server list. Ideally here there is nothing to do in splitlog as no region server was restarted. So master, though the server is online, master just invalidates the region server. In a special case, if i have only one RS then my cluster will become non operative. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5926) Delete the master znode after a znode crash
nkeywal created HBASE-5926: -- Summary: Delete the master znode after a znode crash Key: HBASE-5926 URL: https://issues.apache.org/jira/browse/HBASE-5926 Project: HBase Issue Type: Improvement Components: master, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script delete the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be delete soon enough), but it can happen. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5926) Delete the master znode after a znode crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5926: --- Description: This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script deletes the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be deleted soon enough), but it can happen. was: This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script delete the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be delete soon enough), but it can happen. Delete the master znode after a znode crash --- Key: HBASE-5926 URL: https://issues.apache.org/jira/browse/HBASE-5926 Project: HBase Issue Type: Improvement Components: master, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor This is the continuation of the work done in HBASE-5844. 
But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master backup master there is a single znode for both. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and become the main master 7) Previous master script continues 8) Previous master script deletes the master node in ZK 9) = issue: we deleted the node just created by the new master This should not happen often (usually the znode will be deleted soon enough), but it can happen. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash
[ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5926: --- Summary: Delete the master znode after a master crash (was: Delete the master znode after a znode crash) Delete the master znode after a master crash Key: HBASE-5926 URL: https://issues.apache.org/jira/browse/HBASE-5926 Project: HBase Issue Type: Improvement Components: master, scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor This is the continuation of the work done in HBASE-5844. But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while the master and the backup master share a single znode. So if we apply the same strategy as for a regionserver, we may have this scenario: 1) Master starts 2) Backup master starts 3) Master dies 4) ZK detects it 5) Backup master receives the update from ZK 6) Backup master creates the new master node and becomes the main master 7) Previous master script continues 8) Previous master script deletes the master node in ZK 9) = issue: we deleted the node just created by the new master. This should not happen often (usually the znode will be deleted soon enough), but it can happen.
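The nine-step scenario in HBASE-5926 can be replayed against an in-memory stand-in for ZooKeeper. This is a sketch under stated assumptions, not the fix proposed in the issue: the class `MasterZnodeCleanup`, the path `/hbase/master`, the server names, and the idea of deleting only when the stored owner still matches are all illustrative. `ConcurrentMap.remove(key, value)` plays the role of an atomic compare-and-delete; a real ZooKeeper client would need an equivalent guard.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory replay of the master znode race (illustrative; no real ZooKeeper involved). */
final class MasterZnodeCleanup {
    static final String MASTER_PATH = "/hbase/master"; // hypothetical znode path

    // Maps a znode path to the name of the server that created it.
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    /** Creates the master znode if absent; returns true if this server became master. */
    boolean register(String serverName) {
        return znodes.putIfAbsent(MASTER_PATH, serverName) == null;
    }

    /** Simulates ZK noticing the death of a server and dropping its znode. */
    void expire(String serverName) {
        znodes.remove(MASTER_PATH, serverName);
    }

    /** What a naive start script would do: delete whatever is there. */
    void deleteUnconditionally() {
        znodes.remove(MASTER_PATH);
    }

    /** Guarded variant: delete only if the znode still belongs to the given server. */
    boolean deleteIfOwnedBy(String serverName) {
        return znodes.remove(MASTER_PATH, serverName);
    }

    String currentMaster() {
        return znodes.get(MASTER_PATH);
    }

    /** Replays steps 1-9 and returns whoever is master at the end (null if nobody). */
    static String replay(boolean guarded) {
        MasterZnodeCleanup zk = new MasterZnodeCleanup();
        zk.register("master-a");            // 1) master starts
        zk.register("master-b");            // 2) backup starts; znode exists, so it waits
        zk.expire("master-a");              // 3-4) master dies, ZK detects it
        zk.register("master-b");            // 5-6) backup creates the znode, becomes master
        if (guarded) {                      // 7-8) previous master's script resumes
            zk.deleteIfOwnedBy("master-a");
        } else {
            zk.deleteUnconditionally();
        }
        return zk.currentMaster();          // 9) who is left?
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // null: the new master's znode was deleted
        System.out.println(replay(true));  // master-b: the guarded delete was a no-op
    }
}
```

The unguarded run ends with no master znode at all, which is the issue described in step 9; the guarded run leaves the new master's znode in place.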
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267521#comment-13267521 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- I have reproduced the scenario addressing the title of the JIRA with a testcase. I have tried follow a approach that Bijieshan had suggested in https://issues.apache.org/jira/browse/HBASE-5875?focusedCommentId=13264874page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13264874 to solve the problem. Tomorrow i can upload the testcase. Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 Attachments: HBASE-5875.patch If on master restart it finds the ROOT/META to be in RIT state, master tries to assign the ROOT region through ProcessRIT. Master will trigger the assignment and next will try to verify the Root Region Location. Root region location verification is done seeing if the RS has the region in its online list. If the master triggered assignment has not yet been completed in RS then the verify root region location will fail. Because it failed {code} splitLogAndExpireIfOnline(currentRootServer); {code} we do split log and also remove the server from online server list. Ideally here there is nothing to do in splitlog as no region server was restarted. So master, though the server is online, master just invalidates the region server. In a special case, if i have only one RS then my cluster will become non operative. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267568#comment-13267568 ] stack commented on HBASE-5923: -- This patch is great. Thanks for going back and doing the cleanup. This class should not be in filter package? +import org.apache.hadoop.hbase.filter.WritableByteArrayComparable; Probably hard to move it now? Its part of a public API? Could deprecate and replace w/ a more generic, non-filter specific class? Moving it should not be part of this patch. Its not so bad anyways having this filter package pollution since its in client facing code and clients need access to filter stuff... Would think pollution: +import org.apache.hadoop.hbase.protobuf.generated.ClientProtos.Condition.CompareType; Should be pulling in a non-pb class into an Interface like this. Can we encapsulate these Client conditions in a non-pb class? Cleanup checkAndXXX logic - Key: HBASE-5923 URL: https://issues.apache.org/jira/browse/HBASE-5923 Project: HBase Issue Type: Improvement Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.1 Attachments: 5923-trunk.txt 1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via HTable[Interface]. 2. there is unnecessary duplicate code in the check{Put|Delete} code in HRegionServer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5923) Cleanup checkAndXXX logic
[ https://issues.apache.org/jira/browse/HBASE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267593#comment-13267593 ] stack commented on HBASE-5923: -- bq. That also means that is 0.94 there would be a dependency on CompareFilter in HTableInterface. Thats better than a generated pb dependency IMO. If you'd like, I can make it so you can do same or similar in trunk: i.e. not have to import generated pb but rather the filter.CompareFilter or some such similar class? Just say. Cleanup checkAndXXX logic - Key: HBASE-5923 URL: https://issues.apache.org/jira/browse/HBASE-5923 Project: HBase Issue Type: Improvement Components: client, regionserver Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.1 Attachments: 5923-0.94.txt, 5923-trunk.txt 1. the checkAnd{Put|Delete} method that takes a CompareOP is not exposed via HTable[Interface]. 2. there is unnecessary duplicate code in the check{Put|Delete} code in HRegionServer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
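To make the discussion in HBASE-5923 concrete, here is a minimal in-memory sketch of a check-and-put that takes a comparison operator expressed as a plain enum rather than a generated protobuf type. Everything here is hypothetical and is not the HBase API: the class `CheckAndPutSketch`, its nested `CompareOp`, the single-string cell key, and the convention that the test is "stored value OP expected value" with a null expected value meaning "the cell must be absent" are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy check-and-put over an in-memory map (illustrative; not the HBase implementation). */
final class CheckAndPutSketch {

    /** Hypothetical client-facing operator enum, i.e. a non-generated class. */
    enum CompareOp {
        LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER;

        /** Interprets the sign of a comparison result for this operator. */
        boolean matches(int cmp) {
            switch (this) {
                case LESS:             return cmp < 0;
                case LESS_OR_EQUAL:    return cmp <= 0;
                case EQUAL:            return cmp == 0;
                case NOT_EQUAL:        return cmp != 0;
                case GREATER_OR_EQUAL: return cmp >= 0;
                case GREATER:          return cmp > 0;
                default: throw new AssertionError("unknown operator " + this);
            }
        }
    }

    private final Map<String, byte[]> cells = new HashMap<>();

    /** Lexicographic comparison of byte arrays, treating bytes as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    /**
     * Writes newValue only if "stored OP expected" holds (assumed convention).
     * A null expected value means the cell must not exist yet.
     */
    synchronized boolean checkAndPut(String cell, CompareOp op, byte[] expected, byte[] newValue) {
        byte[] stored = cells.get(cell);
        boolean ok;
        if (stored == null || expected == null) {
            ok = (stored == null && expected == null);
        } else {
            ok = op.matches(compareBytes(stored, expected));
        }
        if (ok) cells.put(cell, newValue.clone());
        return ok;
    }

    synchronized String get(String cell) {
        byte[] v = cells.get(cell);
        return v == null ? null : new String(v, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        byte[] three = "3".getBytes(StandardCharsets.UTF_8);
        byte[] five = "5".getBytes(StandardCharsets.UTF_8);
        byte[] seven = "7".getBytes(StandardCharsets.UTF_8);
        String cell = "row1/cf:q";
        System.out.println(table.checkAndPut(cell, CompareOp.EQUAL, null, five));     // true: cell was absent
        System.out.println(table.checkAndPut(cell, CompareOp.GREATER, three, seven)); // true: stored 5 > 3
        System.out.println(table.checkAndPut(cell, CompareOp.LESS, three, five));     // false: stored 7 is not < 3
        System.out.println(table.get(cell));                                          // 7
    }
}
```

Because the interface only sees the plain enum, a wire-format type could be mapped to it in one place, which is the kind of encapsulation the review comments ask about.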
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267624#comment-13267624 ] Hudson commented on HBASE-5883: --- Integrated in HBase-TRUNK #2843 (See [https://builds.apache.org/job/HBase-TRUNK/2843/]) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333530) Result = SUCCESS tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The network of the active master node was down for some time (this node hosts Master, DN, ZK and RS). The backup node got the notification and started to become active. Immediately afterwards, the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267633#comment-13267633 ] Hudson commented on HBASE-5883: --- Integrated in HBase-0.94 #175 (See [https://builds.apache.org/job/HBase-0.94/175/]) HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333533) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch The network of the active master node was down for some time (this node hosts Master, DN, ZK and RS). The backup node got the notification and started to become active. Immediately afterwards, the backup node aborted with the exception below. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
[jira] [Updated] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-5494: --- Attachment: D2997.3.patch avf requested code review of [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations.. Reviewers: Kannan, mbautin, Liyin, JIRA Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in /hbase/tableLock/). The locks implemented are not fair or re-entrant. RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). TEST PLAN Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in /hbase/tableLock/). The locks implemented are not fair or re-entrant. 
RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. 
For example, on split or merge, we could take a read-only lock, meaning the table can't be disabled while these are running. We could then take a write-only lock if we want to ensure the table doesn't change while a disable or enable is in progress. Shouldn't be too hard to add.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
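The locking scheme in the review request above (one persistent ZNode per table, created only if it does not already exist, neither fair nor re-entrant) can be illustrated with a minimal in-memory sketch. This is not the code from D2997: `TableLockSketch`, `tryLock`, and `release` are hypothetical names, and a `ConcurrentHashMap` stands in for ZooKeeper's atomic create-if-absent on a node under /hbase/tableLock/.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * In-memory stand-in for a ZooKeeper-backed table lock (hypothetical, not from the patch).
 * A map entry plays the role of the persistent lock ZNode for one table.
 */
final class TableLockSketch {
    // Key: table name; value: plain-text owner metadata, like the lock node's data.
    private final ConcurrentMap<String, String> lockNodes = new ConcurrentHashMap<>();

    /** Returns true if the lock "node" was created, false if it already exists. */
    boolean tryLock(String table, String owner) {
        // putIfAbsent is atomic, mirroring "create the ZNode if one does not exist".
        return lockNodes.putIfAbsent(table, owner) == null;
    }

    /** Releases a held lock; releasing a lock that is not held is treated as a bug. */
    void release(String table, String owner) {
        // remove(key, value) only succeeds when this owner actually holds the lock.
        if (!lockNodes.remove(table, owner)) {
            throw new IllegalStateException(owner + " does not hold the lock on " + table);
        }
    }

    public static void main(String[] args) {
        TableLockSketch locks = new TableLockSketch();
        System.out.println(locks.tryLock("t1", "create")); // first attempt wins
        System.out.println(locks.tryLock("t1", "drop"));   // concurrent attempt loses
        locks.release("t1", "create");
        System.out.println(locks.tryLock("t1", "drop"));   // free again after release
    }
}
```

Because a persistent node outlives its creator, an entry left behind by a crashed holder stays in place until someone removes it, which is the manual-cleanup case the description mentions.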
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267689#comment-13267689 ]
Phabricator commented on HBASE-5494:
tedyu has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations..
I only reviewed part of the patch. Would this feature be refined in 0.89-fb branch before being ported to Apache HBase trunk?
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/HConstants.java:98 Schema changes would always involve master. 'master.' can be omitted.
  src/main/java/org/apache/hadoop/hbase/HConstants.java:108 Is this value big enough in cluster testing?
  src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java:2 No year is needed.
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 This lock is used to prevent two concurrent table creation attempts. tryLockTable() is more desirable here.
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1310 Can we add tryLockTable()? It would be useful for the non-winning thread to exit quickly.
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:2 No year, please.
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Should Bytes.toStringBinary() be used here?
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:53 Add 'be ' before 'released'
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 What if lock release fails?
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 Can you tell me which zookeeper branch provides this lock? In http://svn.apache.org/repos/asf/zookeeper/trunk, I don't seem to find this class.
REVISION DETAIL
  https://reviews.facebook.net/D2997
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267714#comment-13267714 ]
Hudson commented on HBASE-5883:
---
Integrated in HBase-0.92 #396 (See [https://builds.apache.org/job/HBase-0.92/396/])
HBASE-5883 Backup master is going down due to connection refused exception (Jieshan) (Revision 1333537)
Result = FAILURE
tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
Backup master is going down due to connection refused exception
---
Key: HBASE-5883
URL: https://issues.apache.org/jira/browse/HBASE-5883
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Gopinathan A
Assignee: Jieshan Bean
Fix For: 0.96.0, 0.94.1
Attachments: HBASE-5883-90.patch, HBASE-5883-92.patch, HBASE-5883-94.patch, HBASE-5883-trunk.patch
The active master node's network was down for some time (this node hosts Master, DN, ZK, RS). The backup node got the notification and started to become active. Immediately, the backup node aborted with the exception below.
{noformat}
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy13.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
    ... 20 more
2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{noformat}
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267728#comment-13267728 ]
Phabricator commented on HBASE-5494:
avf has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations..
Thanks for the inline comments, @tedyu -- I've replied to a few quick ones inline.
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 DistributedLock is implemented as part of the patch (see DistributedLock.java).
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Metadata for table-level locks is stored as plain text -- this is to allow operations to view lock information from the zookeeper CLI, so toStringBinary() would not be needed here.
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 In this case, an IOException is thrown up to the caller: this is to indicate a non-recoverable ZooKeeper error (DistributedLock uses the RecoverableZooKeeper class under the covers). .release() may also throw an IllegalStateException -- but this is essentially used as an assertion in this case (releasing a lock that isn't held).
REVISION DETAIL
  https://reviews.facebook.net/D2997
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267743#comment-13267743 ]
stack commented on HBASE-5883:
--
Can't we at least check the message to ensure it's what we expect? (See the second catch below where we look for connection reset.) Can we be sure what comes up here is the ConnectException we set down in HBaseRPC?
{code}
+  if (ioe instanceof ConnectException) {
+    // Catch. Connect refused.
{code}
This redoing of an exception seems problematic. Is it really necessary?
{code}
+  } else if (ioex.getMessage().toLowerCase()
+      .contains("connection refused")) {
+    ce = new ConnectException(ioex.getMessage());
+    ioe = ce;
{code}
I'd feel better about this fix if we could figure out where the exception came from. (It's not from the rpc stringifying of exceptions to pass them from server to client?)
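One way to approach the concern about matching on message text is to look for a real `ConnectException` anywhere in the cause chain first, and fall back to the string match only for exceptions whose type was lost and only the text survived. The sketch below illustrates that idea under those assumptions; it is not the HBASE-5883 patch, and `ConnectionRefusedCheck` and `isConnectionRefused` are hypothetical names.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.Locale;

final class ConnectionRefusedCheck {
    private ConnectionRefusedCheck() {}

    /** True if t, or any of its causes, looks like a refused connection. */
    static boolean isConnectionRefused(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof ConnectException) {
                return true; // typed match: no string parsing needed
            }
            String msg = t.getMessage();
            // Fallback for exceptions that were flattened to a message string.
            if (msg != null && msg.toLowerCase(Locale.ROOT).contains("connection refused")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the logged failure: an IOException wrapping a ConnectException.
        IOException wrapped = new IOException(new ConnectException("Connection refused"));
        System.out.println(isConnectionRefused(wrapped));
        System.out.println(isConnectionRefused(new IOException("disk full")));
    }
}
```

Checking the type before the text keeps the string comparison as a last resort, so an unrelated exception is only misclassified if its message happens to contain the phrase.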
[jira] [Commented] (HBASE-5928) Hbck shouldn't npe when there are no tables.
[ https://issues.apache.org/jira/browse/HBASE-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267744#comment-13267744 ]
stack commented on HBASE-5928:
--
+1 on patch. Jon Hsieh?
Hbck shouldn't npe when there are no tables.
--
Key: HBASE-5928
URL: https://issues.apache.org/jira/browse/HBASE-5928
Project: HBase
Issue Type: Bug
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Minor
Attachments: HBASE-5928-0.patch
hbase fsck errors out when there are no tables.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.hbase.util.HBaseFsck.reportTablesInFlux(HBaseFsck.java:560)
    at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:346)
    at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
    at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267751#comment-13267751 ]
nkeywal commented on HBASE-5877:
v12, should be final.
1) ServerName is used everywhere in the interface, thanks to protobuf.
2) hadoop.ipc serialization of exceptions is based on #getMessage, so we have to parse it internally. It's not visible to the exception user.
3) The code to manage the error in the client package is quite complex. We have the exception at the very beginning, and then it's checked again, but we don't have the real exception anymore. I used a new historyList to make it work. There is another JIRA for other improvements, in which I could get rid of this (HBASE-5924).
4) Generated with protobuf 2.4.1.
5) The destination in the closeRegion interface is a kind of interface hijacking. Other options would be:
 - sharing the region state in zookeeper
 - letting the regionserver call the master to get the new server. On paper this would be more efficient than a client - master call.
In both cases we could consider that the client should not connect to the master except for cluster administration (create table, split region, ...). That would increase global reliability. That's for another discussion as well, I think.
6) RegionServerServices has been modified to set a destination when removing a region from the online regions.
7) In another JIRA I will manage the case when the destination is not specified when calling the move function.
When a query fails because the region has moved, let the regionserver return the new address to the client
--
Key: HBASE-5877
URL: https://issues.apache.org/jira/browse/HBASE-5877
Project: HBase
Issue Type: Improvement
Components: client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch
This is mainly useful when we do a rolling restart.
This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So:
 - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed.
 - during a rolling restart, the server moves the regions then stops. So we may have failures when the server is stopped, and this patch won't help.
The implementation in the first patch does:
 - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region
 - the regionserver keeps a list of what was moved. Each entry is kept 100 seconds.
 - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address.
 - the client analyses the exceptions and updates its cache accordingly...
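The regionserver-side bookkeeping described above (remember where each region went, keep the entry for 100 seconds, answer a query on a moved region with the new address) can be sketched as a small expiring map. This is a simplified illustration rather than the code in the attached patches; `MovedRegionsSketch`, `recordMove`, and `lookup` are hypothetical names, and the clock is passed in explicitly so the expiry is easy to follow.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class MovedRegionsSketch {
    /** Entries are kept 100 seconds, the figure given in the issue description. */
    static final long TTL_MS = 100_000L;

    private static final class Move {
        final String destination; // "host:port" of the new server (illustrative format)
        final long movedAtMs;

        Move(String destination, long movedAtMs) {
            this.destination = destination;
            this.movedAtMs = movedAtMs;
        }
    }

    private final Map<String, Move> moved = new ConcurrentHashMap<>();

    /** Called when a region is closed with a known destination. */
    void recordMove(String regionName, String destination, long nowMs) {
        moved.put(regionName, new Move(destination, nowMs));
    }

    /** Returns the new address if the region moved recently; stale entries are dropped lazily. */
    Optional<String> lookup(String regionName, long nowMs) {
        Move m = moved.get(regionName);
        if (m == null) {
            return Optional.empty();
        }
        if (nowMs - m.movedAtMs > TTL_MS) {
            moved.remove(regionName, m); // expired: forget it
            return Optional.empty();
        }
        return Optional.of(m.destination);
    }

    public static void main(String[] args) {
        MovedRegionsSketch cache = new MovedRegionsSketch();
        cache.recordMove("t1,,1", "rs2.example:60020", 0L);
        System.out.println(cache.lookup("t1,,1", 50_000L));  // still within the 100 s window
        System.out.println(cache.lookup("t1,,1", 150_000L)); // past the window, entry dropped
    }
}
```

In this sketch a present result is what the server would turn into the "region moved" exception carrying the new address. Entries are only removed when looked up, so a long-lived server would probably also want a periodic sweep.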
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5877: --- Attachment: 5877.v12.patch
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5877: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5877: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5929) HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
[ https://issues.apache.org/jira/browse/HBASE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267753#comment-13267753 ]
stack commented on HBASE-5929:
--
This seems uninterpretable as a table name or region name: 'org.apache.hadoop.hbase.TableNotFoundException: ROOT,,0'... I'd have expected it to be -ROOT-,,0 if hbase was to have any chance? Is this coming in via jruby, mighty Aravind? Does 'ad_daily,49842:2009-07-10,1269763588508.1997607018' exist on the cluster? (I know I should look myself.)
HBaseAdmin.majorCompact and hbase shell randomly throw exceptions when asked to majorcompact regions.
--
Key: HBASE-5929
URL: https://issues.apache.org/jira/browse/HBASE-5929
Project: HBase
Issue Type: Bug
Components: client, shell
Affects Versions: 0.92.1
Environment: Linux Ubuntu Lucid 64bit
Reporter: Aravind Gottipati
Priority: Minor
I have been noticing that calls to HBaseAdmin.majorCompact throw exceptions randomly for some regions. I could not find a pattern to these exceptions. The code I have simply does this: admin.majorCompact(region.getRegionNameAsString()). admin is an instance of HBaseAdmin and region is an instance of HRegionInfo. The exception I get is
org.apache.hadoop.hbase.TableNotFoundException: -ROOT-,,0
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]
In this case it's the root region, but I get similar exceptions for other tables, like this.
2012-05-03 19:03:42,994 WARN [main] HBaseCompact: Could not compact: org.apache.hadoop.hbase.TableNotFoundException: ad_daily,49842:2009-07-10,1269763588508.1997607018
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableNameString(HBaseAdmin.java:1473) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.compact(HBaseAdmin.java:1235) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1209) ~[hbase-0.92.1.jar:0.92.1]
    at org.apache.hadoop.hbase.client.HBaseAdmin.majorCompact(HBaseAdmin.java:1196) ~[hbase-0.92.1.jar:0.92.1]
    at com.stumbleupon.hbaseadmin.HBaseCompact.compactAllServers(Unknown Source) [hbase_compact.jar:na]
    at com.stumbleupon.hbaseadmin.HBaseCompact.main(Unknown Source) [hbase_compact.jar:na]
I see this on hbase shell as well. However, I don't see these exceptions if I use admin.majorCompact(region.getRegionName()), so it looks like something gets lost when I use getRegionNameAsString(). Let me know if I can provide more information.
[jira] [Updated] (HBASE-5931) HBase security profile doesn't compile
[ https://issues.apache.org/jira/browse/HBASE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5931: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks for the patch Jimmy HBase security profile doesn't compile --- Key: HBASE-5931 URL: https://issues.apache.org/jira/browse/HBASE-5931 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5931.patch The compilation is broken.
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267771#comment-13267771 ]
Phabricator commented on HBASE-5494:
tedyu has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations..
INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 Is there a chance that acquiredTableLocks is out of sync with the lock status up in zk?
  src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:87 Add a space after 'wait'
  src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:104 Do we really need this exception?
  src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:111 Please include lockZNodeVersion here.
  src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:180 fullyQualifiedZNode should be included.
  src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:292 Please include fullyQualifiedZNode
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1310 Please make the second part of the sentence correct
  src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java:144 Can this method be made non-public?
REVISION DETAIL
  https://reviews.facebook.net/D2997
[jira] [Commented] (HBASE-5877) When a query fails because the region has moved, let the regionserver return the new address to the client
[ https://issues.apache.org/jira/browse/HBASE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267787#comment-13267787 ] stack commented on HBASE-5877: -- You don't want to have RegionMovedException carry a ServerName#toString instead of host and port? Or it doesn't make sense when our cached region exceptions are keyed by hostname+port only? Is this a bug fix? {code} @@ -1910,6 +1989,7 @@ public class HConnectionManager { } } catch (ExecutionException e) { LOG.debug("Failed all from " + loc, e); +updateCachedLocations(updateHistory, loc, e); {code} Put the history of moved regions out into its own class? Don't presize this I'd say: + private static final long TIMEOUT_REGION_MOVED = (2L * 60L * 1000L); Stuff is lazily cleared from movedRegions? Should we have a cleaner come visit occasionally? Patch looks fine to me. Nice fat test. bq. 5) The destination is the closeRegion interface is a kind of interface hijacking. Other options would be: Why do you say the above? When we protobuf it, it'll just be an option so it shouldn't be too bad? The HCM stuff is ugly but that's not your fault. When a query fails because the region has moved, let the regionserver return the new address to the client -- Key: HBASE-5877 URL: https://issues.apache.org/jira/browse/HBASE-5877 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5877.v1.patch, 5877.v12.patch, 5877.v6.patch This is mainly useful when we do a rolling restart. This will decrease the load on the master and the network load. Note that a region is not immediately opened after a close. So: - it seems preferable to wait before retrying on the other server. An optimisation would be to have a heuristic depending on when the region was closed. - during a rolling restart, the server moves the regions then stops.
So we may have failures when the server is stopped, and this patch won't help. The implementation in the first patch does: - on the region move, there is an added parameter on the regionserver#close to say where we are sending the region - the regionserver keeps a list of what was moved. Each entry is kept for 100 seconds. - the regionserver sends a specific exception when it receives a query on a moved region. This exception contains the new address. - the client analyses the exceptions and updates its cache accordingly...
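The regionserver-side bookkeeping described for HBASE-5877 (remember where a region went, keep each entry for a bounded time) could look roughly like the sketch below. It is not taken from the patch: `MovedRegions`, `Entry`, `recordMove`, `lookup` and `sweep` are hypothetical names, the retention time is a constructor argument, and the current time is passed in explicitly so the behaviour is easy to check. It shows both lazy clearing on lookup and an occasional sweep, the two cleanup options raised in the review comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remembers where recently moved regions went, with expiry. */
class MovedRegions {
    /** Destination of a moved region plus the time the move was recorded. */
    static final class Entry {
        final String hostname;
        final int port;
        final long movedAtMillis;

        Entry(String hostname, int port, long movedAtMillis) {
            this.hostname = hostname;
            this.port = port;
            this.movedAtMillis = movedAtMillis;
        }
    }

    private final Map<String, Entry> moved = new ConcurrentHashMap<>();
    private final long ttlMillis;

    MovedRegions(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called when a region is closed because it is being sent to another server. */
    void recordMove(String regionName, String hostname, int port, long nowMillis) {
        moved.put(regionName, new Entry(hostname, port, nowMillis));
    }

    /** Returns the destination if the entry is still fresh; otherwise drops it and returns null. */
    Entry lookup(String regionName, long nowMillis) {
        Entry e = moved.get(regionName);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.movedAtMillis > ttlMillis) {
            moved.remove(regionName, e); // lazy cleanup; removes only this exact entry
            return null;
        }
        return e;
    }

    /** Occasional sweep so entries that are never queried again do not pile up. */
    int sweep(long nowMillis) {
        int before = moved.size();
        moved.values().removeIf(e -> nowMillis - e.movedAtMillis > ttlMillis);
        return before - moved.size();
    }

    int size() {
        return moved.size();
    }
}
```

A server using something like this would call `lookup` when a query arrives for a region it no longer hosts and, on a hit, put the returned host and port into the exception sent back to the client.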
[jira] [Commented] (HBASE-5931) HBase security profile doesn't compile
[ https://issues.apache.org/jira/browse/HBASE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267791#comment-13267791 ] Hudson commented on HBASE-5931: --- Integrated in HBase-TRUNK #2844 (See [https://builds.apache.org/job/HBase-TRUNK/2844/]) HBASE-5931 HBase security profile doesn't compile (Revision 1333600) Result = SUCCESS stack : Files : * /hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/security/HBasePolicyProvider.java * /hbase/trunk/src/main/resources/hbase-webapps/master/table.jsp HBase security profile doesn't compile --- Key: HBASE-5931 URL: https://issues.apache.org/jira/browse/HBASE-5931 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 Attachments: hbase-5931.patch The compilation is broken.
[jira] [Commented] (HBASE-5494) Introduce a zk hosted table-wide read/write lock so only one table operation at a time
[ https://issues.apache.org/jira/browse/HBASE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13267802#comment-13267802 ] Phabricator commented on HBASE-5494: Kannan has commented on the revision [jira] [HBASE-5494] [89-fb] Table-level locks for schema changing operations.. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 It is not clear why we need the eventsToWaitFor structure at all. For events that a test is not interested in, it seems odd that we are putting those in the eventsToWaitFor structure. eventToDelayTimeMS seems sufficient. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 Spoke with Alex offline. We need to handle the case where the create fails (because the node already exists) but the znode is deleted before we can set a watch on it. Currently, we'll return false even in that case, and the caller will wait for the CountDownLatch to reach 0, but it'll never reach zero since we missed the delete event. REVISION DETAIL https://reviews.facebook.net/D2997 Introduce a zk hosted table-wide read/write lock so only one table operation at a time -- Key: HBASE-5494 URL: https://issues.apache.org/jira/browse/HBASE-5494 Project: HBase Issue Type: Improvement Reporter: stack Attachments: D2997.3.patch I saw this facility over in the accumulo code base. Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call. We could try and narrow the number of combinations by taking out a zk table lock when doing table operations. For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running.
We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening. Shouldn't be too hard to add.
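The race raised in the review comment on ZooKeeperWrapper.java (the create fails because the node exists, then the node is deleted before a watch is set, so the delete event is missed) can be sketched without a ZooKeeper dependency. Everything below is a hypothetical stand-in, not the patch's code and not the ZooKeeper client API: `ZNodeStore` is an invented interface whose `existsAndWatch` registers the callback only if the node is present, and `InMemoryStore` exists purely to make the sketch runnable. The point is the control flow in `acquire`: wait on the latch only when the watch was actually registered, and retry the create otherwise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Hypothetical minimal view of a znode store; not the real ZooKeeper API. */
interface ZNodeStore {
    boolean createEphemeral(String path);                     // false if the node already exists
    boolean existsAndWatch(String path, Runnable onDeleted);  // false (no watch set) if the node is absent
    void delete(String path);
}

/** In-memory stand-in so the sketch can run without a ZooKeeper ensemble. */
class InMemoryStore implements ZNodeStore {
    private final Map<String, List<Runnable>> nodes = new HashMap<>();

    public synchronized boolean createEphemeral(String path) {
        if (nodes.containsKey(path)) {
            return false;
        }
        nodes.put(path, new ArrayList<>());
        return true;
    }

    public synchronized boolean existsAndWatch(String path, Runnable onDeleted) {
        List<Runnable> watchers = nodes.get(path);
        if (watchers == null) {
            return false;
        }
        watchers.add(onDeleted);
        return true;
    }

    public void delete(String path) {
        List<Runnable> watchers;
        synchronized (this) {
            watchers = nodes.remove(path);
        }
        if (watchers != null) {
            watchers.forEach(Runnable::run); // fire delete notifications outside the lock
        }
    }
}

class LockAcquirer {
    /** Blocks until the lock node is created by this caller; returns the number of create attempts. */
    static int acquire(ZNodeStore store, String path) {
        int attempts = 0;
        while (true) {
            attempts++;
            if (store.createEphemeral(path)) {
                return attempts;
            }
            CountDownLatch deleted = new CountDownLatch(1);
            if (!store.existsAndWatch(path, deleted::countDown)) {
                continue; // node vanished between create and watch: retry instead of waiting forever
            }
            try {
                deleted.await(); // safe to wait: the watch is registered on a node that exists
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for " + path, e);
            }
        }
    }
}
```

If the `continue` branch were replaced by an unconditional wait on the latch, the interleaving described in the comment would leave the caller blocked with no event ever arriving.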