[jira] [Issue Comment Edited] (HBASE-4374) Up default regions size from 256M to 1G
[ https://issues.apache.org/jira/browse/HBASE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103397#comment-13103397 ] Andrew Purtell edited comment on HBASE-4374 at 9/13/11 6:25 AM: If we can get online schema edits into 0.92, at least for metadata like HTD/HCD attributes, then users can easily change the region size split threshold as tables grow. We can add a Ruby script helper for this purpose in bin, or add shell support. This does not take away from talking up presplitting, but it provides a good alternative if presplitting is not an option for whatever reason (e.g. the keyspace distribution is not well known). was (Author: apurtell): If we can get online schema edits into 0.92, at least for metadata like HTD/HCD attributes, then users can easily change the split points as tables grow. We can add a Ruby script helper for this purpose in bin, or add shell support. This does not take away from talking up presplitting, but it provides a good alternative if presplitting is not an option for whatever reason (e.g. the keyspace distribution is not well known). Up default regions size from 256M to 1G --- Key: HBASE-4374 URL: https://issues.apache.org/jira/browse/HBASE-4374 Project: HBase Issue Type: Task Reporter: stack Priority: Blocker Fix For: 0.92.0 HBASE-4365 has some discussion of why the default for a table should tend toward fewer, bigger regions. It doesn't look like that issue will be done for 0.92. For 0.92, let's up the default region size from 256M to 1G and talk up pre-splitting on table creation in the manual. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4374) Up default regions size from 256M to 1G
[ https://issues.apache.org/jira/browse/HBASE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103397#comment-13103397 ] Andrew Purtell commented on HBASE-4374: --- If we can get online schema edits into 0.92, at least for metadata like HTD/HCD attributes, then users can easily change the split points as tables grow. We can add a Ruby script helper for this purpose in bin, or add shell support. This does not take away from talking up presplitting, but it provides a good alternative if presplitting is not an option for whatever reason (e.g. the keyspace distribution is not well known). Up default regions size from 256M to 1G --- Key: HBASE-4374 URL: https://issues.apache.org/jira/browse/HBASE-4374 Project: HBase Issue Type: Task Reporter: stack Priority: Blocker Fix For: 0.92.0 HBASE-4365 has some discussion of why the default for a table should tend toward fewer, bigger regions. It doesn't look like that issue will be done for 0.92. For 0.92, let's up the default region size from 256M to 1G and talk up pre-splitting on table creation in the manual.
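To put the proposed default in numbers, here is a back-of-the-envelope sketch of how the split threshold bounds the region count. The 1 TiB table size and the names `RegionCountEstimate` and `minRegions` are illustrative assumptions, not part of HBase, and real region counts also depend on presplitting and on how evenly data is spread across the keyspace.

```java
class RegionCountEstimate {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /**
     * Rough lower bound on the number of regions: assumes every region
     * is filled exactly up to the split threshold (ceiling division).
     */
    static long minRegions(long tableBytes, long maxRegionBytes) {
        return (tableBytes + maxRegionBytes - 1) / maxRegionBytes;
    }

    public static void main(String[] args) {
        long table = 1024L * GIB; // hypothetical 1 TiB table
        System.out.println("256M threshold: " + minRegions(table, 256L * MIB) + " regions");
        System.out.println("1G threshold:   " + minRegions(table, GIB) + " regions");
    }
}
```

Under these assumptions the larger threshold cuts the minimum region count by a factor of four, which is the "fewer, bigger regions" effect the issue is after.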
[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent
[ https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4238: - Attachment: 4238-v2.txt CatalogJanitor can clear a daughter that split before processing its parent --- Key: HBASE-4238 URL: https://issues.apache.org/jira/browse/HBASE-4238 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 4238-v2.txt, 4238.txt I didn't dig a lot into this issue, but by splitting a table twice in a row I was able to trigger a situation where a daughter of the first split was deleted by the CatalogJanitor before it processed its parent. Will post log in a comment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent
[ https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4238: - Status: Patch Available (was: Open) Submitting patch. Review J-D? CatalogJanitor can clear a daughter that split before processing its parent --- Key: HBASE-4238 URL: https://issues.apache.org/jira/browse/HBASE-4238 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 4238-v2.txt, 4238.txt I didn't dig a lot into this issue, but by splitting a table twice in a row I was able to trigger a situation where a daughter of the first split was deleted by the CatalogJanitor before it processed its parent. Will post log in a comment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent
[ https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103404#comment-13103404 ] jirapos...@reviews.apache.org commented on HBASE-4238: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1819/ --- Review request for hbase. Summary --- Previously, we would not clean up a parent if its daughter region didn't exist in the fs. This stipulation was added by HBASE-3872. This patch removes that barrier to parent cleanup (see HBASE-3872 for why it's OK to do this).
CatalogJanitor
+ Break out the Comparator used by CatalogJanitor. It was an anonymous class. Instead we make it a static inner class so we can add a test that it's actually sorting properly.
+ Added method hasNoReferences that returns true if there is no daughter dir OR no refs in the daughter dir.
Added some TODOs around SplitTransaction -- vaguely related to this patch. Added a new test that checks cleanParent to ensure it works properly. Refactored bits of previous tests so they use common code. This addresses bug hbase-4238. https://issues.apache.org/jira/browse/hbase-4238
Diffs -
src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java b53e9a0
src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 742aea4
src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java abafe5e
src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 78e7d62
Diff: https://reviews.apache.org/r/1819/diff
Testing --- Thanks, Michael
CatalogJanitor can clear a daughter that split before processing its parent --- Key: HBASE-4238 URL: https://issues.apache.org/jira/browse/HBASE-4238 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 4238-v2.txt, 4238.txt I didn't dig a lot into this issue, but by splitting a table twice in a row I was able to trigger a situation where a daughter of the first split was deleted by the CatalogJanitor before it processed its parent. Will post log in a comment.
[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut
[ https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103411#comment-13103411 ] Hudson commented on HBASE-4347: --- Integrated in HBase-TRUNK #2203 (See [https://builds.apache.org/job/HBase-TRUNK/2203/]) HBASE-4347 addendum that moves CLUSTER_ID_ATTR to Mutation tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Mutation.java Remove duplicated code from Put, Delete, Get, Scan, MultiPut Key: HBASE-4347 URL: https://issues.apache.org/jira/browse/HBASE-4347 Project: HBase Issue Type: Improvement Components: util Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.92.0 Attachments: 4347-addendum.txt, 4347-v2.txt, 4347-v3.txt, 4347.txt This came from discussion with Stack w.r.t. HBASE-2195. There is currently a lot of duplicated code especially between Put and Delete, and also between all Operations. For example all of Put/Delete/Get/Scan have attributes with exactly the same code in all classes. Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc. One way to do this is to introduce OperationWithAttributes which extends Operation, and have Put/Delete/Get/Scan extend that rather than Operation. In addition Put and Delete could extends from Mutation (which itself would extend OperationWithAttributes). If a static inheritance hierarchy is not desired here, we can use delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() doesnot use force flag
[ https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103425#comment-13103425 ] ramkrishna.s.vasudevan commented on HBASE-4373: --- @Stack The only place we call assign(regionInfo, false), i.e. with force=false, is AssignmentManager.handleHBCK(RegionTransitionData data) for the M_ZK_REGION_OFFLINE case. Everywhere else force=true. Also, for the HBaseAdmin.assign() API, letting the user pass force=false may not give the expected result, since the user may not know what state the znode is currently in. So I felt like removing the parameter. Please share your suggestions. HBaseAdmin.assign() doesnot use force flag -- Key: HBASE-4373 URL: https://issues.apache.org/jira/browse/HBASE-4373 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor The HBaseAdmin.assign()
{code}
public void assign(final byte [] regionName, final boolean force)
throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
  getMaster().assign(regionName, force);
}
{code}
In the HMaster we call
{code}
Pair<HRegionInfo, ServerName> pair =
  MetaReader.getRegion(this.catalogTracker, regionName);
if (pair == null) throw new UnknownRegionException(Bytes.toString(regionName));
if (cpHost != null) {
  if (cpHost.preAssign(pair.getFirst(), force)) {
    return;
  }
}
assignRegion(pair.getFirst());
if (cpHost != null) {
  cpHost.postAssign(pair.getFirst(), force);
}
{code}
The force flag is not used for the assignment itself; it only reaches the coprocessor hooks. Maybe we need to update the javadoc, or not provide the force flag as a parameter if we are not going to use it.
[jira] [Created] (HBASE-4383) SlabCache reports negative heap sizes
SlabCache reports negative heap sizes - Key: HBASE-4383 URL: https://issues.apache.org/jira/browse/HBASE-4383 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Li Pi Fix For: 0.92.0
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Request Stats
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Successfully Cached Stats
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
[ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103459#comment-13103459 ] Ted Yu commented on HBASE-4367: --- bq. but I can't tell you what time 12349582034 is It is Sat May 23 1970 15:26:22 GMT-0700 (PST) I use http://www.ruddwire.com/handy-code/date-to-millisecond-calculators/ quite often. Deadlock in MemStore flusher due to JDK internally synchronizing on current thread -- Key: HBASE-4367 URL: https://issues.apache.org/jira/browse/HBASE-4367 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.92.0 Attachments: 4367.txt, hbase-4367.txt We observed a deadlock in production between the following threads: - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member) - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
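The epoch-millisecond conversion in Ted Yu's comment above can also be checked locally rather than with a web calculator. The following is a minimal sketch using `java.time`, which requires Java 8 or later (newer than the JDKs HBase targeted at the time); the class and method names `EpochMillisCheck` and `format` are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class EpochMillisCheck {
    /** Formats epoch milliseconds at a fixed UTC offset, e.g. -7 for GMT-0700. */
    static String format(long epochMillis, int offsetHours) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .format(Instant.ofEpochMilli(epochMillis)
                        .atOffset(ZoneOffset.ofHours(offsetHours)));
    }

    public static void main(String[] args) {
        long millis = 12349582034L; // the value quoted in the comment
        System.out.println(format(millis, 0));  // UTC
        System.out.println(format(millis, -7)); // GMT-0700, as in the comment
    }
}
```

At offset -7 this yields 1970-05-23 15:26:22, which agrees with the date given in the comment.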
[jira] [Created] (HBASE-4384) Hard to tell what causes failure in ZKAssign#createNodeClosing
Hard to tell what causes failure in ZKAssign#createNodeClosing -- Key: HBASE-4384 URL: https://issues.apache.org/jira/browse/HBASE-4384 Project: HBase Issue Type: Task Components: zookeeper Affects Versions: 0.90.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.94.0 The current code goes like:
{code}
public static int createNodeClosing(ZooKeeperWatcher zkw, HRegionInfo region,
    String serverName)
throws KeeperException, KeeperException.NodeExistsException {
  LOG.debug(zkw.prefix("Creating unassigned node for " +
    region.getEncodedName() + " in a CLOSING state"));

  RegionTransitionData data = new RegionTransitionData(
      EventType.RS_ZK_REGION_CLOSING, region.getRegionName(), serverName);

  synchronized (zkw.getNodes()) {
    String node = getNodeName(zkw, region.getEncodedName());
    zkw.getNodes().add(node);
    return ZKUtil.createAndWatch(zkw, node, data.getBytes());
  }
}
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well.
[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4384: --- Description: The current code goes like:
{code}
  /**
   * Get the node's current version
   * @return The expectedVersion. If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well. was: The current code goes like:
{code}
public static int createNodeClosing(ZooKeeperWatcher zkw, HRegionInfo region,
    String serverName)
throws KeeperException, KeeperException.NodeExistsException {
  LOG.debug(zkw.prefix("Creating unassigned node for " +
    region.getEncodedName() + " in a CLOSING state"));

  RegionTransitionData data = new RegionTransitionData(
      EventType.RS_ZK_REGION_CLOSING, region.getRegionName(), serverName);

  synchronized (zkw.getNodes()) {
    String node = getNodeName(zkw, region.getEncodedName());
    zkw.getNodes().add(node);
    return ZKUtil.createAndWatch(zkw, node, data.getBytes());
  }
}
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well. Summary: Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion (was: Hard to tell what causes failure in ZKAssign#createNodeClosing) (Updated topic comment/desc.)
Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion Key: HBASE-4384 URL: https://issues.apache.org/jira/browse/HBASE-4384 Project: HBase Issue Type: Task Components: zookeeper Affects Versions: 0.90.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.94.0 The current code goes like:
{code}
  /**
   * Get the node's current version
   * @return The expectedVersion. If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well.
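The suggestion in the description, logging the exception itself so the two failure paths can be told apart, could look roughly like the sketch below. It is not the attached patch: it uses `java.util.logging` as a stand-in for HBase's logging facade so that it stays self-contained, and `VersionLookupLogging` and `VersionSource` are hypothetical names replacing the ZooKeeper lookup.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class VersionLookupLogging {
    static final int FAILED = -1;
    static final Logger LOG = Logger.getLogger("VersionLookupLogging");

    /** Hypothetical stand-in for the ZooKeeper version lookup. */
    interface VersionSource {
        int getVersion() throws Exception;
    }

    static int getCurrentVersion(VersionSource source, String regionName) {
        try {
            int version = source.getVersion();
            if (version == FAILED) {
                // Lookup returned normally but reported failure: no exception to attach.
                LOG.warning("Error getting node's version in CLOSING state,"
                        + " aborting close of " + regionName);
            }
            return version;
        } catch (Exception e) {
            // Distinct message, and the exception is passed along so its
            // message and stack trace end up in the log.
            LOG.log(Level.WARNING, "Exception getting node's version in CLOSING state,"
                    + " aborting close of " + regionName, e);
            return FAILED;
        }
    }
}
```

With the exception attached and a distinct message per branch, a reader of the log can tell a missing node apart from a ZooKeeper error.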
[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4384: --- Attachment: HBASE-4384.r1.diff Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion Key: HBASE-4384 URL: https://issues.apache.org/jira/browse/HBASE-4384 Project: HBase Issue Type: Task Components: zookeeper Affects Versions: 0.90.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.94.0 Attachments: HBASE-4384.r1.diff The current code goes like:
{code}
  /**
   * Get the node's current version
   * @return The expectedVersion. If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well.
[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-4384: --- Status: Patch Available (was: Reopened) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion Key: HBASE-4384 URL: https://issues.apache.org/jira/browse/HBASE-4384 Project: HBase Issue Type: Task Components: zookeeper Affects Versions: 0.90.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 0.94.0 Attachments: HBASE-4384.r1.diff The current code goes like:
{code}
  /**
   * Get the node's current version
   * @return The expectedVersion. If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
{code}
Both WARN cases would look identical this way. In case of an exception, I think the exception itself ought to be logged as well.
[jira] [Assigned] (HBASE-4060) Making region assignment more robust
[ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4060: - Assignee: ramkrishna.s.vasudevan Making region assignment more robust Key: HBASE-4060 URL: https://issues.apache.org/jira/browse/HBASE-4060 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0 From Eran Kutner: My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine, you can't know for sure whether it did or did not receive the request. However, there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run hbck it knows that some regions are multiply assigned; the master could do the same and try to resolve the conflict. Another approach would be to handle late responses: even if the response from the remote machine arrives after it was assumed to be dead, the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying, but there is really nothing worse for a data storage system than data loss. In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. For more background information, see the 'Errors after major compaction' discussion on u...@hbase.apache.org
[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch
[ https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103615#comment-13103615 ] ramkrishna.s.vasudevan commented on HBASE-4352: --- @Stack As part of this, the HBASE-4083 fix also needs to be applied to 0.90.x. The HBASE-4083 fix has been checked into trunk. If you remember, you had said that once rolling restart is tested we can take it to 0.90.x. Apply version of hbase-4015 to branch - Key: HBASE-4352 URL: https://issues.apache.org/jira/browse/HBASE-4352 Project: HBase Issue Type: Bug Reporter: stack Assignee: ramkrishna.s.vasudevan Fix For: 0.90.5 Consider adding a version of hbase-4015 to 0.90. It changes HRegionInterface, so we would need to move the change to the end of the interface and then test that it doesn't break rolling restart.
[jira] [Created] (HBASE-4385) Use CacheBuilder in place of MapMaker
Use CacheBuilder in place of MapMaker - Key: HBASE-4385 URL: https://issues.apache.org/jira/browse/HBASE-4385 Project: HBase Issue Type: Task Reporter: Ted Yu Guava release 10 introduced CacheBuilder. We should use it in place of MapMaker wherever MapMaker is used for caching.
[jira] [Updated] (HBASE-4321) Add more comprehensive region split calculator
[ https://issues.apache.org/jira/browse/HBASE-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4321: -- Component/s: hbck Add more comprehensive region split calculator -- Key: HBASE-4321 URL: https://issues.apache.org/jira/browse/HBASE-4321 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.90.4 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.92.0, 0.90.5 Attachments: 0001-HBASE-4321-Add-more-comprehensive-region-split-calcu.patch, 0001-HBASE-4321-Add-more-comprehensive-region-split-calcu.patch, hbase-4321.diff, hbase-4321.txt Hbck currently scans through meta one entry at a time, only keeping a reference to the previous meta entry. This is insufficient for capturing all the possible problems in meta and needs something more to properly identify holes, overlaps, duplicate start keys, and otherwise invalid meta entries. Ideally, this calculator could also be used online interrogating an existing meta (HBASE-4058), and also used to generate a completely new meta offline just from regioninfo and in hdfs (HBASE-3505). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4322) [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary
[ https://issues.apache.org/jira/browse/HBASE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4322: -- Component/s: hbck [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary --- Key: HBASE-4322 URL: https://issues.apache.org/jira/browse/HBASE-4322 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.90.4, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch This is a mostly semantics preserving upgrade to hbck that uses the RegionSplitCalculator from HBASE-4321 that provides more in depth information about region split problems in meta when running hbck. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4058) Extend TestHBaseFsck with a complete .META. recovery scenario
[ https://issues.apache.org/jira/browse/HBASE-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4058: -- Component/s: hbck Extend TestHBaseFsck with a complete .META. recovery scenario - Key: HBASE-4058 URL: https://issues.apache.org/jira/browse/HBASE-4058 Project: HBase Issue Type: Improvement Components: hbck Reporter: Andrew Purtell Assignee: stack Fix For: 0.94.0 We should have a unit test that launches a minicluster and constructs a few tables, then deletes META files on disk, then bounces the master, then recovers the result with HBCK. Perhaps it is possible to extend TestHBaseFsck to do this. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3505) hbck should be able to fix case where region is missing from META but on FS
[ https://issues.apache.org/jira/browse/HBASE-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3505: -- Component/s: hbck hbck should be able to fix case where region is missing from META but on FS --- Key: HBASE-3505 URL: https://issues.apache.org/jira/browse/HBASE-3505 Project: HBase Issue Type: Improvement Components: hbck Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hbase-3505.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3887) Add region deletion tool
[ https://issues.apache.org/jira/browse/HBASE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Einspanjer updated HBASE-3887: - Attachment: online_delete.rb This script deletes regions on a live table. The only thing it doesn't currently do is split regions that are partially within the start / end key range. I would love it if someone could take a crack at adding that enhancement to this script or writing a separate script to do it. Tested with a few test clusters in various conditions and then used this script to delete thousands of old regions from our large production table. Add region deletion tool Key: HBASE-3887 URL: https://issues.apache.org/jira/browse/HBASE-3887 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Ophir Cohen Priority: Minor Attachments: online_delete.rb A region deletion tool can be very useful for removing large amounts of data. For example, it can be used to remove all data older than a specific date (assuming your data is sorted by date). The tool should work roughly as follows. Input: a region key or (even better!) a start and end key.
1. Split regions to isolate the keys.
2. Disable the relevant regions.
3. Delete files from the file system.
4. Update the .META. table.
[jira] [Updated] (HBASE-4322) [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary
[ https://issues.apache.org/jira/browse/HBASE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4322: -- Attachment: hbase-4322-0.90.patch Attached a 0.90 compatible patch. [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary --- Key: HBASE-4322 URL: https://issues.apache.org/jira/browse/HBASE-4322 Project: HBase Issue Type: Improvement Components: hbck Affects Versions: 0.90.4, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, hbase-4322-0.90.patch This is a mostly semantics preserving upgrade to hbck that uses the RegionSplitCalculator from HBASE-4321 that provides more in depth information about region split problems in meta when running hbck. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4379) [hbck] Does not complain about tables with no end region [Z,]
[ https://issues.apache.org/jira/browse/HBASE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4379: -- Component/s: hbck [hbck] Does not complain about tables with no end region [Z,] - Key: HBASE-4379 URL: https://issues.apache.org/jira/browse/HBASE-4379 Project: HBase Issue Type: Bug Components: hbck Reporter: Jonathan Hsieh hbck does not detect or have an error condition when the last region of a table is missing (end key != ''). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
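The missing check described in HBASE-4379 can be illustrated in miniature. This is a rough sketch only, not hbck code: the class name, the method name, and the {startKey, endKey} string-pair representation of a region are all assumptions made for illustration, and it simplifies the problem to "does any region have an empty end key".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check, not hbck code: a region is a {startKey, endKey} pair. */
final class EndRegionCheckSketch {

    /**
     * Returns true when some region has an empty end key, meaning the
     * table's key space is covered up to its upper bound.
     */
    static boolean hasEndRegion(List<String[]> regions) {
        for (String[] region : regions) {
            if (region[1].isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> complete = new ArrayList<>();
        complete.add(new String[] {"", "M"});
        complete.add(new String[] {"M", ""});

        List<String[]> truncated = new ArrayList<>();
        truncated.add(new String[] {"", "M"});
        truncated.add(new String[] {"M", "Z"}); // nothing covers keys from Z upward

        System.out.println("complete table has end region:  " + hasEndRegion(complete));
        System.out.println("truncated table has end region: " + hasEndRegion(truncated));
    }
}
```

A real check would presumably work on the byte[] keys read from .META. and report through hbck's error machinery; the sketch only shows the condition being tested.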
[jira] [Updated] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4375: -- Status: Patch Available (was: Open) [hbck] Add region coverage visualization to hbck Key: HBASE-4375 URL: https://issues.apache.org/jira/browse/HBASE-4375 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.90.5 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch After HBASE-4322 and HBASE-4321, we now have an accurate region splits / coverage map for properly identifying holes, overlaps, backwards regions and other kinds of problems in the .META. table. hbck should display this information so that someone can fix the problems. A simple version for a table with regions [,A], [A,B], [A,C], [C,] would dump out something like this (showing an overlap in [A,B]) : ['table,,..', 'table,A,..'] A: ['table,A,..', 'B'] ['table,A,..', 'C'] B: ['table,A,..', 'C'] C: ['table,C', ''] null: My first thought is that '-details' should dump the full region map, including all good and bad regions. Without -details, any errors should dump info with some context -- dump one region before problems, problem regions, and then one post problem region. Alternately we could add a new option or options to dump the region split map. What is the preferred way to toggle display of this information in hbck? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
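The coverage map described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the RegionSplitCalculator from HBASE-4321: the class and method names are hypothetical, keys are plain strings rather than byte arrays, and an empty end key stands for "end of table". For every split point it lists the regions covering the keys from that point up to the next one, so a list with more than one entry marks an overlap and an empty list marks a hole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical coverage-map sketch; not the HBase RegionSplitCalculator. */
final class CoverageSketch {

    /** A region's key range; an empty end key means "up to the end of the table". */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        /** True when the key falls inside [start, end). */
        boolean covers(String key) {
            return start.compareTo(key) <= 0
                && (end.isEmpty() || key.compareTo(end) < 0);
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** Maps each split point to the regions that cover keys starting at that point. */
    static SortedMap<String, List<Range>> coverage(List<Range> regions) {
        TreeSet<String> points = new TreeSet<>();
        for (Range r : regions) {
            points.add(r.start);
            if (!r.end.isEmpty()) {
                points.add(r.end);
            }
        }
        SortedMap<String, List<Range>> map = new TreeMap<>();
        for (String point : points) {
            List<Range> covering = new ArrayList<>();
            for (Range r : regions) {
                if (r.covers(point)) {
                    covering.add(r);
                }
            }
            map.put(point, covering);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
            new Range("", "A"), new Range("A", "B"),
            new Range("A", "C"), new Range("C", ""));
        for (Map.Entry<String, List<Range>> e : coverage(regions).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

For the four regions in the description, split point A is covered by both [A,B] and [A,C], which is the overlap the proposed dump is meant to make visible.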
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103701#comment-13103701 ] Jonathan Hsieh commented on HBASE-4375: --- Patch applies on 0.90 and trunk after HBASE-4322. [hbck] Add region coverage visualization to hbck Key: HBASE-4375 URL: https://issues.apache.org/jira/browse/HBASE-4375 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.90.5 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch After HBASE-4322 and HBASE-4321, we now have an accurate region splits / coverage map for properly identifying holes, overlaps, backwards regions and other kinds of problems in the .META. table. hbck should display this information so that someone can fix the problems. A simple version for a table with regions [,A], [A,B], [A,C], [C,] would dump out something like this (showing an overlap in [A,B]) : ['table,,..', 'table,A,..'] A: ['table,A,..', 'B'] ['table,A,..', 'C'] B: ['table,A,..', 'C'] C: ['table,C', ''] null: My first thought is that '-details' should dump the full region map, including all good and bad regions. Without -details, any errors should dump info with some context -- dump one region before problems, problem regions, and then one post problem region. Alternately we could add a new option or options to dump the region split map. What is the preferred way to toggle display of this information in hbck? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103702#comment-13103702 ] Jonathan Hsieh commented on HBASE-4375: --- Implemented simplest behavior -- if details mode is on, then it dumps all region split ranges. [hbck] Add region coverage visualization to hbck Key: HBASE-4375 URL: https://issues.apache.org/jira/browse/HBASE-4375 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.90.5 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch After HBASE-4322 and HBASE-4321, we now have an accurate region splits / coverage map for properly identifying holes, overlaps, backwards regions and other kinds of problems in the .META. table. hbck should display this information so that someone can fix the problems. A simple version for a table with regions [,A], [A,B], [A,C], [C,] would dump out something like this (showing an overlap in [A,B]) : ['table,,..', 'table,A,..'] A: ['table,A,..', 'B'] ['table,A,..', 'C'] B: ['table,A,..', 'C'] C: ['table,C', ''] null: My first thought is that '-details' should dump the full region map, including all good and bad regions. Without -details, any errors should dump info with some context -- dump one region before problems, problem regions, and then one post problem region. Alternately we could add a new option or options to dump the region split map. What is the preferred way to toggle display of this information in hbck? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4386) NPE in TaskMonitor
NPE in TaskMonitor -- Key: HBASE-4386 URL: https://issues.apache.org/jira/browse/HBASE-4386 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Saw the following hitting /rs-status: INTERNAL_SERVER_ERROR Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.monitoring.TaskMonitor.purgeExpiredTasks(TaskMonitor.java:97) at org.apache.hadoop.hbase.monitoring.TaskMonitor.getTasks(TaskMonitor.java:127) at org.apache.hbase.tmpl.common.TaskMonitorTmplImpl.renderNoFlush(TaskMonitorTmplImpl.java:50) at org.apache.hbase.tmpl.common.TaskMonitorTmpl.renderNoFlush(TaskMonitorTmpl.java:170) at org.apache.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:70) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:176) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:167) at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:48) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4386) NPE in TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103711#comment-13103711 ] Todd Lipcon commented on HBASE-4386: I think the issue is that items are added to the {{tasks}} list without synchronization. So the ArrayList can get into an inconsistent state where iterating it returns null. NPE in TaskMonitor -- Key: HBASE-4386 URL: https://issues.apache.org/jira/browse/HBASE-4386 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Saw the following hitting /rs-status: INTERNAL_SERVER_ERROR Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.monitoring.TaskMonitor.purgeExpiredTasks(TaskMonitor.java:97) at org.apache.hadoop.hbase.monitoring.TaskMonitor.getTasks(TaskMonitor.java:127) at org.apache.hbase.tmpl.common.TaskMonitorTmplImpl.renderNoFlush(TaskMonitorTmplImpl.java:50) at org.apache.hbase.tmpl.common.TaskMonitorTmpl.renderNoFlush(TaskMonitorTmpl.java:170) at org.apache.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:70) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:176) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:167) at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:48) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
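One way to avoid the hazard suspected in the comment above is to touch the list only while holding a single lock. The following is a minimal sketch under that assumption; the class, field, and method names are hypothetical stand-ins, it does not use the real TaskMonitor or MonitoredTask API, and it is not necessarily the fix that was applied for this issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a monitored task; not the HBase API. */
final class SketchTask {
    final String description;
    final long expiresAtMillis;

    SketchTask(String description, long expiresAtMillis) {
        this.description = description;
        this.expiresAtMillis = expiresAtMillis;
    }
}

/** Hypothetical monitor whose list is only read or written under the object's lock. */
final class TaskListSketch {
    private final List<SketchTask> tasks = new ArrayList<>();

    /** Adds a task while holding the lock, so readers never see a half-updated list. */
    synchronized void add(SketchTask task) {
        tasks.add(task);
    }

    /** Purges expired tasks, then returns a copy that callers can iterate freely. */
    synchronized List<SketchTask> snapshot(long nowMillis) {
        purgeExpired(nowMillis);
        return new ArrayList<>(tasks);
    }

    /** Only called from snapshot(), which already holds the lock. */
    private void purgeExpired(long nowMillis) {
        for (Iterator<SketchTask> it = tasks.iterator(); it.hasNext();) {
            if (it.next().expiresAtMillis <= nowMillis) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        TaskListSketch monitor = new TaskListSketch();
        monitor.add(new SketchTask("already expired", 10L));
        monitor.add(new SketchTask("still running", 100L));
        for (SketchTask task : monitor.snapshot(50L)) {
            System.out.println(task.description);
        }
    }
}
```

Returning a copy from snapshot() keeps rendering code, such as a status page, from iterating the shared list while writers are still adding to it.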
[jira] [Assigned] (HBASE-4386) NPE in TaskMonitor
[ https://issues.apache.org/jira/browse/HBASE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HBASE-4386: -- Assignee: Todd Lipcon NPE in TaskMonitor -- Key: HBASE-4386 URL: https://issues.apache.org/jira/browse/HBASE-4386 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.92.0 Saw the following hitting /rs-status: INTERNAL_SERVER_ERROR Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.monitoring.TaskMonitor.purgeExpiredTasks(TaskMonitor.java:97) at org.apache.hadoop.hbase.monitoring.TaskMonitor.getTasks(TaskMonitor.java:127) at org.apache.hbase.tmpl.common.TaskMonitorTmplImpl.renderNoFlush(TaskMonitorTmplImpl.java:50) at org.apache.hbase.tmpl.common.TaskMonitorTmpl.renderNoFlush(TaskMonitorTmpl.java:170) at org.apache.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:70) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:176) at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:167) at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:48) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4387) Error while syncing: DFSOutputStream is closed
Error while syncing: DFSOutputStream is closed -- Key: HBASE-4387 URL: https://issues.apache.org/jira/browse/HBASE-4387 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In a billion-row load on ~25 servers, I see "error while syncing" reasonably often, with the error "DFSOutputStream is closed", around a roll. We have some race where a roll at the same time as heavy inserts causes a problem. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4387) Error while syncing: DFSOutputStream is closed
[ https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4387: --- Attachment: errors-with-context.txt Here are the logs with 100 lines of context around all the ERROR lines. Error while syncing: DFSOutputStream is closed -- Key: HBASE-4387 URL: https://issues.apache.org/jira/browse/HBASE-4387 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Attachments: errors-with-context.txt In a billion-row load on ~25 servers, I see "error while syncing" reasonably often, with the error "DFSOutputStream is closed", around a roll. We have some race where a roll at the same time as heavy inserts causes a problem. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Status: Open (was: Patch Available) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
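The interaction reported in HBASE-4351 can be reproduced in miniature. This sketch is only an illustration of the ordering problem: the class is hypothetical, regions and servers are plain strings, and none of it is the real HMaster or AssignmentManager code. It shows that once the force path has removed the region from the assignment map, the later containsKey check finds nothing to unassign.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the reported ordering problem; not HBase code. */
final class ForceUnassignSketch {
    /** Region name to hosting server, standing in for the master's region map. */
    private final Map<String, String> regions = new HashMap<>();

    void assign(String region, String server) {
        regions.put(region, server);
    }

    /** Mirrors the reported behaviour: clearing transition state also drops the region. */
    void clearRegionFromTransition(String region) {
        regions.remove(region);
    }

    /** Returns true when a close request would be issued for the region. */
    boolean unassign(String region) {
        if (!regions.containsKey(region)) {
            return false; // "not currently assigned anywhere"
        }
        return true;
    }

    /** Mirrors the admin entry point: the force path clears state before unassigning. */
    boolean adminUnassign(String region, boolean force) {
        if (force) {
            clearRegionFromTransition(region);
        }
        return unassign(region);
    }

    public static void main(String[] args) {
        ForceUnassignSketch plain = new ForceUnassignSketch();
        plain.assign("region-1", "rs-1");
        System.out.println("unforced unassign issued: " + plain.adminUnassign("region-1", false));

        ForceUnassignSketch forced = new ForceUnassignSketch();
        forced.assign("region-1", "rs-1");
        System.out.println("forced unassign issued:   " + forced.adminUnassign("region-1", true));
    }
}
```

In this model the forced call is the one that silently does nothing, which matches the symptom in the report.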
[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4153: -- Status: Patch Available (was: Open) Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4153_1.patch Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Attachment: HBASE-4351_1.patch If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4153: -- Attachment: HBASE-4153_1.patch Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4153_1.patch Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Status: Patch Available (was: Open) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Attachment: HBASE-4351_2.patch Resubmitting patch with minor changes. If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Status: Open (was: Patch Available) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-3742) Master receives unexpected region close but doesn't do anything
[ https://issues.apache.org/jira/browse/HBASE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-3742. --- Resolution: Won't Fix Resolving as won't fix, lots of rework done for the master in trunk. If there's still an issue, it'll probably come up differently. Master receives unexpected region close but doesn't do anything --- Key: HBASE-3742 URL: https://issues.apache.org/jira/browse/HBASE-3742 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans We got this in the context of HBASE-3741, a region was closed by a region server but the master wasn't expecting it and didn't do anything about it. We had to force assign it back. {quote} 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prodjobs/unassigned/1470298961 and set watcher; region=stumbles_by_userid2,'穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /prodjobs/unassigned/1470298961 (region=stumbles_by_userid2,'穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING) 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=sv4borg42,60020,1300920459477, region=1470298961 2011-04-05 15:15:55,812 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region 1470298961 from server sv4borg42,60020,1300920459477 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2011-04-05 15:15:55,843 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:6-0x42ec2cece810b68 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/1470298961 2011-04-05 15:15:55,843 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
master:6-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prodjobs/unassigned/1470298961 and set watcher; region=stumbles_by_userid2,'穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSED 2011-04-05 15:15:55,843 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4borg42,60020,1300920459477, region=1470298961 2011-04-05 15:15:55,843 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 1470298961 from server sv4borg42,60020,1300920459477 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4351: -- Status: Patch Available (was: Open) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103756#comment-13103756 ] Ted Yu commented on HBASE-4153: --- Also, the relatively long exception messages for closeRegion() and openRegion() can be extracted so that majority of the message is shared. The javadoc warning developer I mentioned above can be placed on the extracted exception message. Handle RegionAlreadyInTransitionException in AssignmentManager -- Key: HBASE-4153 URL: https://issues.apache.org/jira/browse/HBASE-4153 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 Attachments: HBASE-4153_1.patch Comment from Stack over in HBASE-3741: {quote} Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver? {quote} Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed
[ https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103774#comment-13103774 ] Jean-Daniel Cryans commented on HBASE-4387: --- HLog.syncer() syncs outside of the updateLock and has the following comment: bq. // Done in parallel for all writer threads, thanks to HDFS-895 So we don't need to synchronize for syncing, but we do need something when closing the file. Error while syncing: DFSOutputStream is closed -- Key: HBASE-4387 URL: https://issues.apache.org/jira/browse/HBASE-4387 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Attachments: errors-with-context.txt In a billion-row load on ~25 servers, I see "error while syncing" reasonably often, with the error "DFSOutputStream is closed", around a roll. We have some race where a roll at the same time as heavy inserts causes a problem. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
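The requirement in the comment above, parallel syncs but exclusion while the file is closed, matches the shape of a read/write lock. The sketch below is one possible arrangement under that assumption, not the HLog implementation and not necessarily how the issue was resolved: the classes are hypothetical, and the nested Writer merely stands in for the stream behind the log. Syncs share the read lock so they can run concurrently, while a roll takes the write lock to swap in a new writer before the old one is closed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical log that lets syncs run in parallel but never against a closed writer. */
final class RollingLogSketch {

    /** Hypothetical writer standing in for the output stream behind the log. */
    static final class Writer {
        private volatile boolean closed;

        void sync() {
            if (closed) {
                throw new IllegalStateException("stream is closed");
            }
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Writer writer = new Writer();

    /** Many threads may sync at once; each holds the read lock for the duration. */
    void sync() {
        lock.readLock().lock();
        try {
            writer.sync();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Swaps the writer under the write lock, which waits for in-flight syncs to finish. */
    void roll() {
        Writer old;
        lock.writeLock().lock();
        try {
            old = writer;
            writer = new Writer();
        } finally {
            lock.writeLock().unlock();
        }
        old.close(); // no sync can still be using the old writer at this point
    }

    /** Returns the writer currently in use. */
    Writer current() {
        lock.readLock().lock();
        try {
            return writer;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RollingLogSketch log = new RollingLogSketch();
        log.sync();
        Writer beforeRoll = log.current();
        log.roll();
        log.sync(); // goes to the new writer
        System.out.println("old writer closed: " + beforeRoll.isClosed());
        System.out.println("new writer closed: " + log.current().isClosed());
    }
}
```

The trade-off is that every sync now pays for a read-lock acquisition, which may or may not be acceptable on a hot write path.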
[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103792#comment-13103792 ] stack commented on HBASE-4351: -- +1 on patch. J-D? If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign. Key: HBASE-4351 URL: https://issues.apache.org/jira/browse/HBASE-4351 Project: HBase Issue Type: Bug Environment: Linux Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch The problem is as follows: get the exact region name from the UI and call HBaseAdmin.unassign(regionname, true). Here true is the force option. As part of the unassign API {code} public void unassign(final byte [] regionName, final boolean force) throws IOException { Pair<HRegionInfo, HServerAddress> pair = MetaReader.getRegion(this.catalogTracker, regionName); if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName)); HRegionInfo hri = pair.getFirst(); if (force) this.assignmentManager.clearRegionFromTransition(hri); this.assignmentManager.unassign(hri, force); } {code} As part of clearRegionFromTransition() {code} synchronized (this.regions) { this.regions.remove(hri); for (Set<HRegionInfo> regions : this.servers.values()) { regions.remove(hri); } } {code} the region is also removed. Hence when the master tries to identify the region {code} if (!regions.containsKey(region)) { debugLog(region, "Attempted to unassign region " + region.getRegionNameAsString() + " but it is not " + "currently assigned anywhere"); return; } {code} it is not able to identify the region. The problem exists in trunk and in 0.90.x. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4351:
-------------------------

Fix Version/s: 0.90.5
               0.92.0

If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

Key: HBASE-4351
URL: https://issues.apache.org/jira/browse/HBASE-4351
Project: HBase
Issue Type: Bug
Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch

The following is the problem
Get the exact region name from UI and call HBaseAdmin.unassign(regionname, true).
Here true is forceful option.
As part of unassign api
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, HServerAddress> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
    synchronized (this.regions) {
      this.regions.remove(hri);
      for (Set<HRegionInfo> regions : this.servers.values()) {
        regions.remove(hri);
      }
    }
{code}
the region is also removed.
Hence when the master tries to identify the region
{code}
      if (!regions.containsKey(region)) {
        debugLog(region, "Attempted to unassign region " +
          region.getRegionNameAsString() + " but it is not " +
          "currently assigned anywhere");
        return;
      }
{code}
It is not able to identify the region.
It exists in trunk and 0.90.x also.
[jira] [Commented] (HBASE-4320) Off Heap Cache never creates Slabs
[ https://issues.apache.org/jira/browse/HBASE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103794#comment-13103794 ] Jonathan Gray commented on HBASE-4320: -- Looks like this was committed with HBASE-4027 in the message and not HBASE-4320. Guess there's no way to retroactively fix that but in case anyone comes here looking for the revision info it's linked over in the other jira. Off Heap Cache never creates Slabs -- Key: HBASE-4320 URL: https://issues.apache.org/jira/browse/HBASE-4320 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Fix For: 0.92.0 Attachments: confnotloading.txt On testing, the configuration file is never loaded by the off heap cache. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103802#comment-13103802 ]

Jean-Daniel Cryans commented on HBASE-4351:
-------------------------------------------

Why not just:
{code}
    if (force) {
      this.assignmentManager.clearRegionFromTransition(hri);
      assignRegion(hri);
    } else {
      this.assignmentManager.unassign(hri, force);
    }
    cpPostUnassign(hri, force);
{code}
No return, no double cpPostUnassign call.

If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

Key: HBASE-4351
URL: https://issues.apache.org/jira/browse/HBASE-4351
Project: HBase
Issue Type: Bug
Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch

The following is the problem
Get the exact region name from UI and call HBaseAdmin.unassign(regionname, true).
Here true is forceful option.
As part of unassign api
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, HServerAddress> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
    synchronized (this.regions) {
      this.regions.remove(hri);
      for (Set<HRegionInfo> regions : this.servers.values()) {
        regions.remove(hri);
      }
    }
{code}
the region is also removed.
Hence when the master tries to identify the region
{code}
      if (!regions.containsKey(region)) {
        debugLog(region, "Attempted to unassign region " +
          region.getRegionNameAsString() + " but it is not " +
          "currently assigned anywhere");
        return;
      }
{code}
It is not able to identify the region.
It exists in trunk and 0.90.x also.
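The ordering problem reported in HBASE-4351 can be reproduced without HBase. What follows is only a minimal sketch, assuming a plain map stands in for the AssignmentManager's `regions` bookkeeping. The class `ForcedUnassignSketch` and its methods are hypothetical names that mirror the calls quoted in the issue; they are not the master's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the master-side bookkeeping; not HBase code.
final class ForcedUnassignSketch {
    // region name -> hosting server, playing the role of AssignmentManager.regions
    private final Map<String, String> regions = new HashMap<>();

    void online(String region, String server) {
        regions.put(region, server);
    }

    // Mirrors clearRegionFromTransition(): the region is forgotten entirely.
    void clearRegionFromTransition(String region) {
        regions.remove(region);
    }

    // Mirrors the containsKey() guard in unassign(): false means
    // "not currently assigned anywhere", so nothing is unassigned.
    boolean unassign(String region) {
        if (!regions.containsKey(region)) {
            return false;
        }
        regions.remove(region);
        return true;
    }

    // Call order described in the issue: clear first, then unassign.
    boolean buggyForcedUnassign(String region) {
        clearRegionFromTransition(region);
        return unassign(region);
    }

    public static void main(String[] args) {
        ForcedUnassignSketch am = new ForcedUnassignSketch();
        am.online("r1", "rs1");
        System.out.println("forced unassign found region: " + am.buggyForcedUnassign("r1"));
        am.online("r2", "rs1");
        System.out.println("plain unassign found region: " + am.unassign("r2"));
    }
}
```

In this sketch the forced path always misses the region because the lookup runs after the removal, which is why the suggestion in the comment above takes a separate branch for the forced case instead of calling both methods in sequence.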
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103808#comment-13103808 ]

stack commented on HBASE-4375:
------------------------------

This is grand though it explicitly does System.out. Elsewhere when hbck prints, does it not take a PrintWriter or something? Do you want to do same here? Good stuff Jon.

[hbck] Add region coverage visualization to hbck

Key: HBASE-4375
URL: https://issues.apache.org/jira/browse/HBASE-4375
Project: HBase
Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Attachments: 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch

After HBASE-4322 and HBASE-4321, we now have an accurate region splits / coverage map for properly identifying holes, overlaps, backwards regions and other kinds of problems in the .META. table. hbck should display this information so that someone can fix this.

A simple version for a table with regions [,A], [A,B], [A,C], [C,] would dump out something like this (showing an overlap in [A,B]) :
['table,,..', 'table,A,..']
A: ['table,A,..', 'B'] ['table,A,..', 'C']
B: ['table,A,..', 'C']
C: ['table,C', '']
null:

My first thought is that '-details' should dump the full region map including all good and bad regions. Without -details, any errors should dump info with some context -- dump one region before problems, problem regions, and then one post problem region. Alternately we could add a new option or options to dump the region split map.

What is the preferred way to toggle display of this information in hbck?
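To make the idea of a coverage map concrete, here is a small sketch, assuming regions are reduced to string start and end keys with "" meaning an open end. `RegionCoverageSketch` and `coverage` are hypothetical names and this is not the attached patch; it only groups regions by the split points they cover, using the example table from the issue, and it leaves out the trailing end-of-table ("null") entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; the class and method names are not hbck's.
final class RegionCoverageSketch {
    static final class Region {
        final String start; // "" means start of table
        final String end;   // "" means end of table

        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }

        // True when key falls in the half-open range [start, end).
        boolean covers(String key) {
            return start.compareTo(key) <= 0
                && (end.isEmpty() || key.compareTo(end) < 0);
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    // Maps every split point to the regions that cover it, in key order.
    static Map<String, List<Region>> coverage(List<Region> regions) {
        TreeSet<String> points = new TreeSet<>();
        for (Region r : regions) {
            points.add(r.start);
            if (!r.end.isEmpty()) {
                points.add(r.end);
            }
        }
        Map<String, List<Region>> result = new LinkedHashMap<>();
        for (String point : points) {
            List<Region> covering = new ArrayList<>();
            for (Region r : regions) {
                if (r.covers(point)) {
                    covering.add(r);
                }
            }
            result.put(point, covering);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("", "A"), new Region("A", "B"),
            new Region("A", "C"), new Region("C", ""));
        // One region per key is healthy, none is a hole, several is an overlap.
        coverage(regions).forEach((key, covering) ->
            System.out.println(key + ": " + covering));
    }
}
```

With the example regions, key A is covered by two regions (the overlap) and every other key by exactly one. A key covered by no region would indicate a hole.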
[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() doesnot use force flag
[ https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103804#comment-13103804 ]

stack commented on HBASE-4373:
------------------------------

@Ram OK. Going by your rationale above, we should deprecate the override that has the force flag.

HBaseAdmin.assign() doesnot use force flag

Key: HBASE-4373
URL: https://issues.apache.org/jira/browse/HBASE-4373
Project: HBase
Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

The HBaseAdmin.assign()
{code}
  public void assign(final byte [] regionName, final boolean force)
  throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
    getMaster().assign(regionName, force);
  }
{code}
In the HMaster we call
{code}
    Pair<HRegionInfo, ServerName> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toString(regionName));
    if (cpHost != null) {
      if (cpHost.preAssign(pair.getFirst(), force)) {
        return;
      }
    }
    assignRegion(pair.getFirst());
    if (cpHost != null) {
      cpHost.postAssign(pair.getFirst(), force);
    }
{code}
The force flag is not getting used. Maybe we need to update the javadoc, or not provide the force flag as a parameter if we are not going to use it.
[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME
[ https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103818#comment-13103818 ]

Ming Ma commented on HBASE-4380:
--------------------------------

Thanks, Ted. That should work for a more controlled environment like a predefined hbase map job, where we know the max number of concurrent scans at a given time for a given RS. In the case where any number of clients can call at any given time, we will need a better solution.

large scan caching size causes RS to throw OOME

Key: HBASE-4380
URL: https://issues.apache.org/jira/browse/HBASE-4380
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Ming Ma
Assignee: Ming Ma

If the hbase application specifies a large caching size via Scan.setCaching(...), RS will try to accumulate enough rows before returning to the client. This could blow up RS memory. In the TableInputFormat scenario, we have a couple of mappers with a large caching size, thus RS memory usage goes up quickly. RS perhaps should take memory usage into account, for example, return fewer results per HRegionInterface.next(long scannerId, int numberOfRows) call in the case of low memory.
[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME
[ https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103820#comment-13103820 ] stack commented on HBASE-4380: -- Agree Ming. large scan caching size causes RS to throw OOME --- Key: HBASE-4380 URL: https://issues.apache.org/jira/browse/HBASE-4380 Project: HBase Issue Type: Bug Components: regionserver Reporter: Ming Ma Assignee: Ming Ma If the hbase application specifies a large caching size via Scan.setCaching(...), RS will try to accumulate enough rows before returning to the client. This could blow up RS memory. In TableInputFormat scenario, we have couple mappers with large caching size, thus RS memory usage goes up quickly. RS perhaps should take memory usage into account, for example, return less results per HRegionInterface.next(long scannerId, int numberOfRows) call in the case of low memory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
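The suggestion in the issue description, returning fewer results per next() call when memory is tight, could look roughly like the sketch below. This is an assumption-laden illustration rather than region server code: `BoundedScanBatchSketch`, `nextBatch` and the `maxBytes` budget are hypothetical, and rows are modelled as plain byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a size-aware batch; not HRegionInterface.next().
final class BoundedScanBatchSketch {
    // Collects up to numberOfRows rows, but stops early once the
    // accumulated payload reaches maxBytes. At least one row is returned
    // when any row is available, so the scan always makes progress.
    static List<byte[]> nextBatch(Iterator<byte[]> rows, int numberOfRows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long bytes = 0;
        while (rows.hasNext() && batch.size() < numberOfRows) {
            byte[] row = rows.next();
            batch.add(row);
            bytes += row.length;
            if (bytes >= maxBytes) {
                break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(
            new byte[100], new byte[100], new byte[100], new byte[100]);
        // The client asked for 1000 rows, but the byte budget cuts the batch short.
        System.out.println(nextBatch(rows.iterator(), 1000, 250).size());
    }
}
```

A fixed byte budget per call is only one possible policy. As the comments in the issue point out, with an arbitrary number of concurrent clients the budget would have to be shared or derived from current heap usage.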
[jira] [Commented] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103849#comment-13103849 ] Jean-Daniel Cryans commented on HBASE-2196: --- bq. Might be nice also if ReplicationSource would handle their own hlogs, rather than ReplicationSourceManager managing all of them. Yeah somewhere along the development the design changed but not all the parts moved, feel free to try it out in the scope of a follow-up jira. bq. @J-D, are you aware of anything specific that would not work with your patch (or the combined patch I posted earlier)? Have you tested it? I think it was basically done but I wanted to do more testing on real clusters before committing but it's really time-consuming. It's meant to be very simple to add multi-slave, it's just the testing part that I didn't want to be bothered with when I first wrote replication since we didn't need it back then. Support more than one slave cluster --- Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch Currently replication supports only 1 slave cluster, need to ability to add more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103858#comment-13103858 ]

ramkrishna.s.vasudevan commented on HBASE-4153:
-----------------------------------------------

Throwing an exception when we get RegionAlreadyInTransitionException is fine, but there are 2 problems:
- If we try HBaseAdmin.move() or HBaseAdmin.unassign(), the ClosedRegionHandler will call assign(). If RegionAlreadyInTransitionException is thrown in this flow, we cannot bring the exception up to the user because EventHandler.run() catches it. So only for HBaseAdmin.assign() can we get the exception propagated up to the user.
- If we make assign() throw the exception, then we need to handle it in many places.

So I have just returned once we get RegionAlreadyInTransitionException.

Another interesting thing observed: currently in RegionAlreadyInTransitionException.java
{code}
  public RegionAlreadyInTransitionException(String action, String region) {
  }
{code}
we were passing 2 args. Now in the master, when I had to decode and unwrap this exception, I was not able to do so because
{code}
  private IOException instantiateException(Class<? extends IOException> cls)
      throws Exception {
    Constructor<? extends IOException> cn = cls.getConstructor(String.class);
{code}
RemoteException.java expects a constructor taking a single String arg.
Hence I have made one modification: the exact exception message is passed from the OpenRegionHandler and CloseRegionHandler, and the constructor is just
{code}
  public RegionAlreadyInTransitionException(String s) {
    super(s);
  }
{code}

Handle RegionAlreadyInTransitionException in AssignmentManager

Key: HBASE-4153
URL: https://issues.apache.org/jira/browse/HBASE-4153
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0
Attachments: HBASE-4153_1.patch

Comment from Stack over in HBASE-3741:
{quote}
Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver?
{quote}
Indeed looking at the code it's going to be handled the same way other exceptions are. Need to add special cases for assign and unassign.
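The constructor lookup described in the comment above can be checked in isolation. The following is a minimal sketch in plain Java, assuming two made-up exception classes (`TwoArgException`, `OneArgException`) and a helper `canInstantiate` that mimics the `cls.getConstructor(String.class)` call quoted from RemoteException.java; none of these names come from HBase or Hadoop.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

// Hypothetical sketch; only the getConstructor(String.class) lookup
// is taken from the snippet quoted in the comment.
final class SingleArgConstructorSketch {
    // Only a two-argument constructor, like the old exception class.
    static class TwoArgException extends IOException {
        public TwoArgException(String action, String region) {
            super(action + " " + region);
        }
    }

    // A single String constructor, like the modified exception class.
    static class OneArgException extends IOException {
        public OneArgException(String message) {
            super(message);
        }
    }

    // Returns true when the class can be rebuilt from a message alone.
    static boolean canInstantiate(Class<? extends IOException> cls) {
        try {
            Constructor<? extends IOException> cn = cls.getConstructor(String.class);
            cn.newInstance("remote message");
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no (String) constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("two-arg: " + canInstantiate(TwoArgException.class));
        System.out.println("one-arg: " + canInstantiate(OneArgException.class));
    }
}
```

The two-argument class fails the lookup with NoSuchMethodException, which matches the observation that the exception could not be unwrapped until a single String constructor was provided.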
[jira] [Commented] (HBASE-4306) Race between CatalogJanitor and LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103864#comment-13103864 ] stack commented on HBASE-4306: -- When a region splits, handleSplitReport is called on master. It calls AM.regionOffline so the split parent region should be cleared from AM.regions. It should not be in set to balance. HBASE-4238 being fixed should at least change this from being a blocker to something less? Race between CatalogJanitor and LoadBalancer Key: HBASE-4306 URL: https://issues.apache.org/jira/browse/HBASE-4306 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.92.0, 0.90.5 It is possible for the LoadBalancer to try to assign an offline/split region while it is waiting to be CatalogJanitor'ed. It goes like this: {quote} 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331 ... (cleaning never happens or whatever) ... 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining) 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent {quote} Here it took 4 days of balancing to finally get to try to balance the parent (that was never deleted because of HBASE-4238), but it can also happen if the balancer decides to balance the parent just before it's cleaned. 
The end effect is that the balancer will be disabled _forever_ until that's fixed. The culprit here is that the master keeps the region online until AssignmentManager.regionOffline is called by the CJ, which means it's still treated like any other region although it's offline. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4381) Refactor split decisions into a split policy class
[ https://issues.apache.org/jira/browse/HBASE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103842#comment-13103842 ] stack commented on HBASE-4381: -- This looks great. Commit. Can do other policies later. Refactor split decisions into a split policy class -- Key: HBASE-4381 URL: https://issues.apache.org/jira/browse/HBASE-4381 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4381.txt This is a semantics-preserving refactor that moves the code that decides when and where to split into a new split policy class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103867#comment-13103867 ] Lars Hofhansl commented on HBASE-2196: -- Thanks Stack and J-D. I started on having ReplicationSource manage their own logs. So far it does not actually make the code nicer and easier to read, the version I have so far also fails TestReplication. So that's for another jira (as you say). One thing I did was to remove HServerAddress from ReplicationSource in favor of using ServerName. HServerAddress resolves hostnames right away, which is good in this case, but as HConnectionManager caches the connection anyway, that should not be a problem. I'll add more tests and also do real world testing, and then send an update. Support more than one slave cluster --- Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch Currently replication supports only 1 slave cluster, need to ability to add more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103869#comment-13103869 ]

Lars Hofhansl commented on HBASE-3130:
--------------------------------------

This seems like an important bug fix, can we put this into 0.92 even after we branched it?

[replication] ReplicationSource can't recover from session expired on remote clusters

Key: HBASE-3130
URL: https://issues.apache.org/jira/browse/HBASE-3130
Project: HBase
Issue Type: Bug
Components: replication
Reporter: Jean-Daniel Cryans

Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart is required to continue replication (or a rolling restart).
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4351:
------------------------------------------

Attachment: HBASE-4351_3.patch

J-D's comment updated in patch

If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

Key: HBASE-4351
URL: https://issues.apache.org/jira/browse/HBASE-4351
Project: HBase
Issue Type: Bug
Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch, HBASE-4351_3.patch

The following is the problem
Get the exact region name from UI and call HBaseAdmin.unassign(regionname, true).
Here true is forceful option.
As part of unassign api
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, HServerAddress> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
    synchronized (this.regions) {
      this.regions.remove(hri);
      for (Set<HRegionInfo> regions : this.servers.values()) {
        regions.remove(hri);
      }
    }
{code}
the region is also removed.
Hence when the master tries to identify the region
{code}
      if (!regions.containsKey(region)) {
        debugLog(region, "Attempted to unassign region " +
          region.getRegionNameAsString() + " but it is not " +
          "currently assigned anywhere");
        return;
      }
{code}
It is not able to identify the region.
It exists in trunk and 0.90.x also.
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4351:
------------------------------------------

Status: Open (was: Patch Available)

If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

Key: HBASE-4351
URL: https://issues.apache.org/jira/browse/HBASE-4351
Project: HBase
Issue Type: Bug
Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch, HBASE-4351_3.patch

The following is the problem
Get the exact region name from UI and call HBaseAdmin.unassign(regionname, true).
Here true is forceful option.
As part of unassign api
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, HServerAddress> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
    synchronized (this.regions) {
      this.regions.remove(hri);
      for (Set<HRegionInfo> regions : this.servers.values()) {
        regions.remove(hri);
      }
    }
{code}
the region is also removed.
Hence when the master tries to identify the region
{code}
      if (!regions.containsKey(region)) {
        debugLog(region, "Attempted to unassign region " +
          region.getRegionNameAsString() + " but it is not " +
          "currently assigned anywhere");
        return;
      }
{code}
It is not able to identify the region.
It exists in trunk and 0.90.x also.
[jira] [Commented] (HBASE-4306) Race between CatalogJanitor and LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103873#comment-13103873 ] stack commented on HBASE-4306: -- Chatting with J-D. Something else must be going on here if the parent region is in the set of regions to balance, the split message must have been missed. Changing this from blocker to major. Removing as necessary fix on 0.92. and 0.90.5 till we learn more. Race between CatalogJanitor and LoadBalancer Key: HBASE-4306 URL: https://issues.apache.org/jira/browse/HBASE-4306 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Blocker It is possible for the LoadBalancer to try to assign an offline/split region while it is waiting to be CatalogJanitor'ed. It goes like this: {quote} 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331 ... (cleaning never happens or whatever) ... 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining) 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent {quote} Here it took 4 days of balancing to finally get to try to balance the parent (that was never deleted because of HBASE-4238), but it can also happen if the balancer decides to balance the parent just before it's cleaned. The end effect is that the balancer will be disabled _forever_ until that's fixed. 
The culprit here is that the master keeps the region online until AssignmentManager.regionOffline is called by the CJ, which means it's still treated like any other region although it's offline. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4306) Race between CatalogJanitor and LoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4306: - Priority: Minor (was: Blocker) Fix Version/s: (was: 0.90.5) (was: 0.92.0) Race between CatalogJanitor and LoadBalancer Key: HBASE-4306 URL: https://issues.apache.org/jira/browse/HBASE-4306 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Minor It is possible for the LoadBalancer to try to assign an offline/split region while it is waiting to be CatalogJanitor'ed. It goes like this: {quote} 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: parent: Daughters; d1, d2 from sv4r22s16,60020,1314211225331 ... (cleaning never happens or whatever) ... 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=parent, src=sv4r22s16,60020,1314211225331, dest=sv4r19s17,60020,1314218170402 2011-08-29 13:45:14,561 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region parent (offlining) 2011-08-29 13:45:14,588 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for parent but we are not serving it for parent {quote} Here it took 4 days of balancing to finally get to try to balance the parent (that was never deleted because of HBASE-4238), but it can also happen if the balancer decides to balance the parent just before it's cleaned. The end effect is that the balancer will be disabled _forever_ until that's fixed. The culprit here is that the master keeps the region online until AssignmentManager.regionOffline is called by the CJ, which means it's still treated like any other region although it's offline. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4351:
------------------------------------------

Status: Patch Available (was: Open)

If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

Key: HBASE-4351
URL: https://issues.apache.org/jira/browse/HBASE-4351
Project: HBase
Issue Type: Bug
Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.5
Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch, HBASE-4351_3.patch

The following is the problem
Get the exact region name from UI and call HBaseAdmin.unassign(regionname, true).
Here true is forceful option.
As part of unassign api
{code}
  public void unassign(final byte [] regionName, final boolean force)
  throws IOException {
    Pair<HRegionInfo, HServerAddress> pair =
      MetaReader.getRegion(this.catalogTracker, regionName);
    if (pair == null) throw new UnknownRegionException(Bytes.toStringBinary(regionName));
    HRegionInfo hri = pair.getFirst();
    if (force) this.assignmentManager.clearRegionFromTransition(hri);
    this.assignmentManager.unassign(hri, force);
  }
{code}
As part of clearRegionFromTransition()
{code}
    synchronized (this.regions) {
      this.regions.remove(hri);
      for (Set<HRegionInfo> regions : this.servers.values()) {
        regions.remove(hri);
      }
    }
{code}
the region is also removed.
Hence when the master tries to identify the region
{code}
      if (!regions.containsKey(region)) {
        debugLog(region, "Attempted to unassign region " +
          region.getRegionNameAsString() + " but it is not " +
          "currently assigned anywhere");
        return;
      }
{code}
It is not able to identify the region.
It exists in trunk and 0.90.x also.
[jira] [Commented] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103832#comment-13103832 ]

stack commented on HBASE-4384:
------------------------------

@Harsh So the patch is for 0.90 branch?

Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

Key: HBASE-4384
URL: https://issues.apache.org/jira/browse/HBASE-4384
Project: HBase
Issue Type: Task
Components: zookeeper
Affects Versions: 0.90.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-4384.r1.diff

The current code goes like:
{code}
  /**
   * Get the node's current version
   * @return The expectedVersion. If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
{code}
Both WARN cases would be identical this way. In case of an exception, I think an exception ought to be logged as well.
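The improvement requested in HBASE-4384, telling the two failure modes apart and recording the exception, might be sketched as follows. This is not the attached diff: `CloseVersionLoggingSketch`, `VersionSource` and the in-memory `warnings` list are hypothetical stand-ins for the handler, the ZooKeeper lookup and the logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; a list of strings stands in for LOG.warn output.
final class CloseVersionLoggingSketch {
    static final int FAILED = -1;

    // Stand-in for the ZooKeeper version lookup, which may throw.
    interface VersionSource {
        int getVersion() throws Exception;
    }

    final List<String> warnings = new ArrayList<>();

    int getCurrentVersion(VersionSource zk, String regionName) {
        try {
            int version = zk.getVersion();
            if (version == FAILED) {
                // Lookup succeeded but reported no usable version.
                warnings.add("Error getting node's version in CLOSING state,"
                    + " aborting close of " + regionName);
            }
            return version;
        } catch (Exception e) {
            // Different message, and the cause is kept in the log line.
            warnings.add("Exception while getting node's version in CLOSING state,"
                + " aborting close of " + regionName + ": " + e);
            return FAILED;
        }
    }

    public static void main(String[] args) {
        CloseVersionLoggingSketch handler = new CloseVersionLoggingSketch();
        handler.getCurrentVersion(() -> FAILED, "region-1");
        handler.getCurrentVersion(() -> {
            throw new IllegalStateException("zk down");
        }, "region-2");
        handler.warnings.forEach(System.out::println);
    }
}
```

With a real logger, the exception would normally be passed as the second argument to the warn call so that the stack trace is preserved.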
[jira] [Updated] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-2196: - Fix Version/s: 0.92.0 Pulling in. If done by Friday, will commit. Support more than one slave cluster --- Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch Currently replication supports only 1 slave cluster; we need the ability to add more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103837#comment-13103837 ]

Jonathan Hsieh commented on HBASE-4375:

In most other places in hbck, it prints out via System.out.println. I just tried to stay consistent with it. Some examples:
{code}
public synchronized void reportError(ERROR_CODE errorCode, String message) {
  errorList.add(errorCode);
  if (!summary) {
    System.out.println("ERROR: " + message);
  }
  errorCount++;
  showProgress = 0;
}
{code}
{code}
public synchronized int summarize() {
  System.out.println(Integer.toString(errorCount) + " inconsistencies detected.");
  if (errorCount == 0) {
    System.out.println("Status: OK");
    return 0;
  } else {
    System.out.println("Status: INCONSISTENT");
    return -1;
  }
}
{code}
{code}
/**
 * Prints summary of all tables found on the system.
 */
private void printTableSummary() {
  System.out.println("Summary:");
  for (TInfo tInfo : tablesInfo.values()) {
    if (errors.tableHasErrors(tInfo)) {
      System.out.println("Table " + tInfo.getName() + " is inconsistent.");
    } else {
      System.out.println("  " + tInfo.getName() + " is okay.");
    }
    System.out.println("    Number of regions: " + tInfo.getNumRegions());
    System.out.print("    Deployed on: ");
    for (HServerAddress server : tInfo.deployedOn) {
      System.out.print(" " + server.toString());
    }
    System.out.println();
  }
}
{code}

[hbck] Add region coverage visualization to hbck

Key: HBASE-4375
URL: https://issues.apache.org/jira/browse/HBASE-4375
Project: HBase
Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Attachments: 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch

After HBASE-4322 and HBASE-4321, we now have an accurate region splits / coverage map for properly identifying holes, overlaps, backwards regions and other kinds of problems in the .META. table. hbck should display this information so that someone can fix this.

A simple version for a table with regions [,A], [A,B], [A,C], [C,] would dump out something like this (showing an overlap in [A,B]):

['table,,..', 'table,A,..']
A: ['table,A,..', 'B'] ['table,A,..', 'C']
B: ['table,A,..', 'C']
C: ['table,C', '']
null:

My first thought is that '-details' should dump the full region map, including all good and bad regions. Without -details, any errors should dump info with some context -- dump one region before the problems, the problem regions, and then one region after the problems.

Alternately we could add a new option or options to dump the region split map. What is the preferred way to toggle display of this information in hbck?
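The coverage dump described above can be approximated as follows. This is a minimal sketch and not the attached patch: the Region class, its string keys, and the coverage method are hypothetical, and the only rule assumed is that a region covers a split point when start <= point < end, with an empty end key meaning the end of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class RegionCoverageSketch {
    /** Hypothetical region record: start key inclusive, end key exclusive ("" = end of table). */
    static final class Region {
        final String start;
        final String end;

        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean covers(String key) {
            return start.compareTo(key) <= 0 && (end.isEmpty() || key.compareTo(end) < 0);
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** Maps every split point to the regions covering it: 0 = hole, 1 = ok, more = overlap. */
    static Map<String, List<Region>> coverage(List<Region> regions) {
        TreeSet<String> splits = new TreeSet<>();
        for (Region r : regions) {
            splits.add(r.start);
            if (!r.end.isEmpty()) {
                splits.add(r.end);
            }
        }
        Map<String, List<Region>> map = new TreeMap<>();
        for (String split : splits) {
            List<Region> covering = new ArrayList<>();
            for (Region r : regions) {
                if (r.covers(split)) {
                    covering.add(r);
                }
            }
            map.put(split, covering);
        }
        return map;
    }
}
```

For the regions [,A], [A,B], [A,C], [C,] this yields two regions at split point A and one at every other split point, which is the overlap the issue description points at.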
[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME
[ https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103828#comment-13103828 ] Ted Yu commented on HBASE-4380: --- Can we utilize the following ? http://download.oracle.com/javase/1.5.0/docs/guide/management/mxbeans.html#low_memory http://download.oracle.com/javase/1.5.0/docs/api/java/lang/management/MemoryPoolMXBean.html large scan caching size causes RS to throw OOME --- Key: HBASE-4380 URL: https://issues.apache.org/jira/browse/HBASE-4380 Project: HBase Issue Type: Bug Components: regionserver Reporter: Ming Ma Assignee: Ming Ma If the hbase application specifies a large caching size via Scan.setCaching(...), RS will try to accumulate enough rows before returning to the client. This could blow up RS memory. In TableInputFormat scenario, we have couple mappers with large caching size, thus RS memory usage goes up quickly. RS perhaps should take memory usage into account, for example, return less results per HRegionInterface.next(long scannerId, int numberOfRows) call in the case of low memory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
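The low-memory detection linked in the comment above could be wired up roughly as follows. This is a sketch under assumptions, not a proposed patch: the class name, the 10x reduction in rowsToReturn, and the threshold fraction are all hypothetical choices; only the java.lang.management calls are the standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class LowMemoryBatchSketch {
    /** Arms a usage threshold at the given fraction of max on each heap pool that supports one. */
    static void armThresholds(double fraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool is invalid or has no defined maximum
            }
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                pool.setUsageThreshold((long) (usage.getMax() * fraction));
            }
        }
    }

    /** True if any heap pool with an armed threshold currently exceeds it. */
    static boolean heapIsLow() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.getUsageThreshold() > 0
                    && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical policy: return a tenth of the requested rows (at least one) when memory is low. */
    static int rowsToReturn(int requested, boolean lowMemory) {
        return lowMemory ? Math.max(1, requested / 10) : requested;
    }
}
```

A scanner call could then use rowsToReturn(numberOfRows, heapIsLow()) to cap the batch, so the client simply sees fewer rows per call and asks again.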
[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103892#comment-13103892 ]

Jean-Daniel Cryans commented on HBASE-4351:

If it passes the tests :)
[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.
[ https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103891#comment-13103891 ]

Jean-Daniel Cryans commented on HBASE-4351:

+1
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103898#comment-13103898 ]

stack commented on HBASE-4375:

ok. looks like something we need to clean up; do PrintWriter or System.out. Can do in another issue. Let me commit Jon.
[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck
[ https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103910#comment-13103910 ]

stack commented on HBASE-4375:

I tried applying and it fails. Need to wait on other patches to go in first. Flag me Jon when this can go in (when we have necessary prereqs applied). Good stuff.
[jira] [Commented] (HBASE-4383) SlabCache reports negative heap sizes
[ https://issues.apache.org/jira/browse/HBASE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103914#comment-13103914 ]

Todd Lipcon commented on HBASE-4383:

It's also now reporting negative occupied:

2011-09-13 12:06:18,183 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: -11917 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=7mins, 53sec

SlabCache reports negative heap sizes

Key: HBASE-4383
URL: https://issues.apache.org/jira/browse/HBASE-4383
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Li Pi
Fix For: 0.92.0

2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Request Stats
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Successfully Cached Stats
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent
[ https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4238: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk and branch (didn't add test to branch because wouldn't apply -- uses TRUNK stuff like ServerName) CatalogJanitor can clear a daughter that split before processing its parent --- Key: HBASE-4238 URL: https://issues.apache.org/jira/browse/HBASE-4238 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 4238-v2.txt, 4238.txt I didn't dig a lot into this issue, but by splitting a table twice in a row I was able to trigger a situation where a daughter of the first split was deleted by the CatalogJanitor before it processed its parent. Will post log in a comment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4381) Refactor split decisions into a split policy class
[ https://issues.apache.org/jira/browse/HBASE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4381: - Status: Patch Available (was: Open) Refactor split decisions into a split policy class -- Key: HBASE-4381 URL: https://issues.apache.org/jira/browse/HBASE-4381 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4381.txt This is a semantics-preserving refactor that moves the code that decides when and where to split into a new split policy class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4388) Second start after migration from 90 to trunk crashes
Second start after migration from 90 to trunk crashes

Key: HBASE-4388
URL: https://issues.apache.org/jira/browse/HBASE-4388
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
Fix For: 0.92.0

I started a trunk cluster to upgrade from 90, inserted a ton of data, then did a clean shutdown. When I started again, I got the following exception:

11/09/13 12:29:09 INFO master.HMaster: Meta has HRI with HTDs. Updating meta now.
11/09/13 12:29:09 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NegativeArraySizeException: -102
  at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
  at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:606)
  at org.apache.hadoop.hbase.migration.HRegionInfo090x.readFields(HRegionInfo090x.java:641)
  at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:133)
  at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
  at org.apache.hadoop.hbase.util.Writables.getHRegionInfoForMigration(Writables.java:228)
  at org.apache.hadoop.hbase.catalog.MetaEditor.getHRegionInfoForMigration(MetaEditor.java:350)
  at org.apache.hadoop.hbase.catalog.MetaEditor$1.visit(MetaEditor.java:273)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:633)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:235)
  at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaWithNewRegionInfo(MetaEditor.java:284)
  at org.apache.hadoop.hbase.catalog.MetaEditor.migrateRootAndMeta(MetaEditor.java:298)
  at org.apache.hadoop.hbase.master.HMaster.updateMetaWithNewHRI(HMaster.java:529)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:472)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103925#comment-13103925 ]

Subbu M Iyer commented on HBASE-4213:

Based on further discussions, here are some callouts:
1. Provide separate public APIs for alter-instant operations so as not to break existing public APIs without deprecation or prior notice.
2. Provide a config-level setting to enable the instant schema update feature (defaults to false). This will also let us release this feature in a more controlled and transparent manner.
3. We don't want to intimidate developers with scary boolean flags that do things in a way they may not completely understand or care about.
4. Providing a developer-level API for instant-alter is good in the sense that developers can fully capitalize on a scalable/fault-tolerant variant. At the same time, it might confuse some people: why are we even providing a variant that is not scalable/fault tolerant in the first place?
5. We don't want to expose implementation details, such as the fact that this flag uses ZK to track schema changes, and so on.

So, long story short: in addition to review comments, V7 will include the following:
1. Add a new config parameter hbase.instant.schema.change.enabled (exact name of flag is open), defaulting to false, so that all existing APIs go over the current path untouched.
2. Separate public APIs for all alter operations that support the new pattern, in addition to the existing public APIs. The new APIs will take a boolean parameter that overrides the config setting on a per-request basis.
3. Internally, both sets of APIs will go through the same pipeline to promote reuse as well as maintainability.

Please let me know your thoughts/comments.

Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

Key: HBASE-4213
URL: https://issues.apache.org/jira/browse/HBASE-4213
Project: HBase
Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
Fix For: 0.92.0
Attachments: 4213-Instant_Schema_change_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch

This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730

Support instant schema updates such as Modify Table, Add Column, Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
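The interaction between the config default and the per-request override proposed for V7 might look like the sketch below. Everything here is an assumption made for illustration: the class and method names are hypothetical, a plain Map stands in for the HBase configuration, and the parameter name is the one floated in the comment, which is explicitly still open.

```java
import java.util.Map;

class InstantSchemaToggleSketch {
    // Name taken from the comment above; the exact flag name is still open.
    static final String KEY = "hbase.instant.schema.change.enabled";

    /**
     * Decides which path an alter request takes.
     * perRequestOverride is null for the existing APIs, which then follow the config
     * (default false), and non-null for the new APIs, where it wins over the config.
     */
    static boolean useInstantPath(Map<String, String> conf, Boolean perRequestOverride) {
        if (perRequestOverride != null) {
            return perRequestOverride;
        }
        return Boolean.parseBoolean(conf.getOrDefault(KEY, "false"));
    }
}
```

Keeping the decision in one method matches point 3 of the proposal: both API families share one pipeline and differ only in how the flag is resolved.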
[jira] [Commented] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103926#comment-13103926 ]

Harsh J commented on HBASE-4384:

stack,

No, patch is for all branches 0.90 to trunk. Please disregard my first comment; it was made when I was under a great deal of workspace switching and I thought I was looking at one snippet of trunk source, while I was looking at something else instead.

This patch is targeted for trunk, but can also be backported atop other branches (0.92 if branched already, and 0.90).
[jira] [Updated] (HBASE-4388) Second start after migration from 90 to trunk crashes
[ https://issues.apache.org/jira/browse/HBASE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4388:
Attachment: meta.tgz

Attached the META table directory
[jira] [Created] (HBASE-4389) Address lots of issues with migration from 90 to trunk
Address lots of issues with migration from 90 to trunk -- Key: HBASE-4389 URL: https://issues.apache.org/jira/browse/HBASE-4389 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Looking over the migration code that removes HTD from HRI, there are lots of issues. This JIRA is to redo this code in a way that will be less bug prone, and also future proof. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4389) Address lots of issues with migration from 90 to trunk
[ https://issues.apache.org/jira/browse/HBASE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103938#comment-13103938 ]

Todd Lipcon commented on HBASE-4389:

After a quick pass through the migration code, here are the various issues I see:
- HRegionInfo didn't have its VERSION incremented. Hence exception catching is used to try to determine which version is being read.
- A single "migrated" boolean flag is used in ROOT to indicate that META has been updated to the new format. This leaves us no room for future migrations. "migrated" should not be a boolean; it should instead be "migratedToVersion" or something.
- Migration should be idempotent, i.e. even if the migratedToVersion flag didn't get updated, migration should be able to re-run without crashing.
- Duplicated code between updateRootWithNewRegionInfo and updateMetaWithNewRegionInfo.
- Each region that is processed results in a call to createTableDescriptor, which results in calls to the NN. This will take a long time on a big cluster, and is unnecessary.
- No sanity checking that all of the HTDs for a table are equal.
- Migration code should ideally be moved to a separate class, instead of mixed with the non-migration code paths.

Address lots of issues with migration from 90 to trunk

Key: HBASE-4389
URL: https://issues.apache.org/jira/browse/HBASE-4389
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
Fix For: 0.92.0

Looking over the migration code that removes HTD from HRI, there are lots of issues. This JIRA is to redo this code in a way that will be less bug prone, and also future proof.
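Two of the points above, a version marker instead of a boolean and idempotent re-runs, can be sketched together. This is an illustration under assumptions rather than a design for the actual fix: the class, the in-memory map standing in for catalog rows, and the field names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedMigrationSketch {
    static final int TARGET_VERSION = 1;

    // Hypothetical catalog: region name -> on-disk format version of its row.
    final Map<String, Integer> rowVersions = new HashMap<>();

    // Version marker instead of a boolean flag, leaving room for future migrations.
    int migratedToVersion = 0;

    /** Upgrades rows below TARGET_VERSION and returns how many were touched. Safe to re-run. */
    int migrate() {
        if (migratedToVersion >= TARGET_VERSION) {
            return 0; // nothing to do
        }
        int upgraded = 0;
        for (Map.Entry<String, Integer> row : rowVersions.entrySet()) {
            // Each row carries its own version, so already-upgraded rows are skipped
            // even when the marker was never written (for example after a crash).
            if (row.getValue() < TARGET_VERSION) {
                row.setValue(TARGET_VERSION);
                upgraded++;
            }
        }
        migratedToVersion = TARGET_VERSION;
        return upgraded;
    }
}
```

Because each row records its own version, a second run after a lost marker finds nothing to upgrade and does not try to re-read rows in the old format.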
[jira] [Created] (HBASE-4390) [replication] ReplicationSource's UncaughtExceptionHandler shouldn't join
[replication] ReplicationSource's UncaughtExceptionHandler shouldn't join

Key: HBASE-4390
URL: https://issues.apache.org/jira/browse/HBASE-4390
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Fix For: 0.90.5

From Jeff Whiting on the ML:
{quote}
"regionserver60020.replicationSource,dev2" daemon prio=10 tid=0x2aaaf0312800 nid=0x69f8 in Object.wait() [0x4533e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  - waiting on <0x2aaab12464c0> (a org.apache.hadoop.hbase.replication.regionserver.ReplicationSource)
  at java.lang.Thread.join(Thread.java:1151)
  - locked <0x2aaab12464c0> (a org.apache.hadoop.hbase.replication.regionserver.ReplicationSource)
  at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:91)
  at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.terminate(ReplicationSource.java:649)
  at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$1.uncaughtException(ReplicationSource.java:628)
  at java.lang.Thread.dispatchUncaughtException(Thread.java:1831)
{quote}
That's pretty dumb, the thread is trying to join itself. UncaughtExceptionHandler shouldn't try to terminate() but just clear resources and then return.
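The suggested behaviour, a handler that only releases resources and a terminate() that never joins its own thread, can be sketched as follows. This is not the ReplicationSource code: the class, its fields, and the simulated failure are hypothetical and exist only to show the shape of the fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class NoSelfJoinSketch {
    final AtomicBoolean resourcesReleased = new AtomicBoolean(false);
    final CountDownLatch handlerDone = new CountDownLatch(1);
    final Thread worker;

    NoSelfJoinSketch() {
        worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "replicationSource-sketch");
        worker.setDaemon(true);
        // The handler runs on the dying worker thread itself, so it must not join it.
        worker.setUncaughtExceptionHandler((thread, error) -> {
            releaseResources();
            handlerDone.countDown();
        });
    }

    void releaseResources() {
        resourcesReleased.set(true);
    }

    /** Meant for other threads; the guard makes an accidental self-join a no-op. */
    void terminate() {
        releaseResources();
        if (Thread.currentThread() != worker) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts the worker and waits for the handler to finish; false on timeout or interrupt. */
    boolean runAndWait(long timeoutMs) {
        worker.start();
        try {
            return handlerDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The identity check in terminate() is a second line of defence; the main point is that the handler never calls terminate() in the first place.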
[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
[ https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4384:
Resolution: Fixed
Fix Version/s: 0.90.5 (was: 0.94.0)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed branch and trunk. Thanks for the patch Harsh.
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-3446: --- Component/s: master ProcessServerShutdown fails if META moves, orphaning lots of regions Key: HBASE-3446 URL: https://issues.apache.org/jira/browse/HBASE-3446 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.0 Reporter: Todd Lipcon Assignee: stack Priority: Blocker Fix For: 0.92.0 Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2730) Expose RS work queue contents on web UI
[ https://issues.apache.org/jira/browse/HBASE-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-2730: --- Component/s: monitoring Expose RS work queue contents on web UI --- Key: HBASE-2730 URL: https://issues.apache.org/jira/browse/HBASE-2730 Project: HBase Issue Type: New Feature Components: monitoring, regionserver Reporter: Todd Lipcon Priority: Critical Fix For: 0.94.0 Would be nice to be able to see the contents of the various work queues - eg to know what regions are pending compaction/split/flush/etc. This is handy for debugging why a region might be blocked, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103996#comment-13103996 ]
Lars Hofhansl commented on HBASE-2196:
One of my tests makes sure that no rows that existed before a peer was added are replicated to the new peer. It fails. But that's actually potentially the case even now, isn't it? Unless we roll the log when a peer is added, everything in the latest log (which might be older) is replicated to the new peer. Correct?
Support more than one slave cluster
Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch
Currently replication supports only 1 slave cluster; we need the ability to add more.
[jira] [Created] (HBASE-4391) Add ability to start RS as root and call mlockall
Add ability to start RS as root and call mlockall - Key: HBASE-4391 URL: https://issues.apache.org/jira/browse/HBASE-4391 Project: HBase Issue Type: New Feature Components: regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Fix For: 0.94.0 A common issue we've seen in practice is that users oversubscribe their region servers with too many MR tasks, etc. As soon as the machine starts swapping, the RS grinds to a halt, loses ZK session, aborts, etc. This can be combatted by starting the RS as root, calling mlockall(), and then setuid down to the hbase user. We should not require this, but we should provide it as an option. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4392) Add metrics for read/write throughput (#cells, #bytes)
Add metrics for read/write throughput (#cells, #bytes) -- Key: HBASE-4392 URL: https://issues.apache.org/jira/browse/HBASE-4392 Project: HBase Issue Type: Improvement Components: metrics, regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Fix For: 0.94.0 Most of our metrics are currently based on RPC count. This is an inaccurate metric since some RPCs can be much more heavy weight than others. We should maintain our current metrics but also add counters for bytes and cells inserted / scanned. That gives a better idea of total load. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
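As a rough illustration of the counters proposed in HBASE-4392, the sketch below keeps the RPC count and adds cell and byte totals beside it. ThroughputCounters and its method names are hypothetical and are not part of HBase's metrics framework.

```java
import java.util.concurrent.atomic.LongAdder;

final class ThroughputCounters {
    private final LongAdder rpcCount = new LongAdder();
    private final LongAdder cellsWritten = new LongAdder();
    private final LongAdder bytesWritten = new LongAdder();
    private final LongAdder cellsScanned = new LongAdder();
    private final LongAdder bytesScanned = new LongAdder();

    /** Record one write RPC that inserted the given number of cells and bytes. */
    void recordWrite(long cells, long bytes) {
        rpcCount.increment();
        cellsWritten.add(cells);
        bytesWritten.add(bytes);
    }

    /** Record one read RPC that scanned the given number of cells and bytes. */
    void recordScan(long cells, long bytes) {
        rpcCount.increment();
        cellsScanned.add(cells);
        bytesScanned.add(bytes);
    }

    long rpcs() { return rpcCount.sum(); }
    long cellsWritten() { return cellsWritten.sum(); }
    long bytesWritten() { return bytesWritten.sum(); }
    long cellsScanned() { return cellsScanned.sum(); }
    long bytesScanned() { return bytesScanned.sum(); }
}
```

A one-cell put and a thousand-cell batch both count as one RPC here, but the cell and byte totals show how different their load is.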
[jira] [Commented] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103998#comment-13103998 ]
Jean-Daniel Cryans commented on HBASE-2196:
Yes, that's why I do this in TestReplication:
{code}
for (JVMClusterUtil.RegionServerThread r :
    utility1.getHBaseCluster().getRegionServerThreads()) {
  r.getRegionServer().getWAL().rollWriter();
}
{code}
Support more than one slave cluster
Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch
Currently replication supports only 1 slave cluster; we need the ability to add more.
[jira] [Commented] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104003#comment-13104003 ]
Lars Hofhansl commented on HBASE-2196:
Cool. I already added a log roll to my MultiSlaveReplication as well, to verify that old logs are not replicated. Thanks for the clarification.
Support more than one slave cluster
Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Fix For: 0.92.0 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch
Currently replication supports only 1 slave cluster; we need the ability to add more.
[jira] [Created] (HBASE-4393) Implement a canary monitoring program
Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
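The loop described in HBASE-4393 (list the regions, run a lightweight operation against each, record latency and availability) might look roughly like this. It is a sketch only: CanarySketch, RegionProbe and Result are hypothetical names, and the probe is left abstract so that no HBase client API is assumed.

```java
import java.util.ArrayList;
import java.util.List;

final class CanarySketch {
    /** Hypothetical lightweight operation against one region, e.g. a short scan. */
    interface RegionProbe {
        void probe(String regionName) throws Exception;
    }

    /** Outcome of probing one region. */
    static final class Result {
        final String region;
        final long latencyNanos;
        final boolean available;

        Result(String region, long latencyNanos, boolean available) {
            this.region = region;
            this.latencyNanos = latencyNanos;
            this.available = available;
        }
    }

    /** Probe every region once, timing each call and noting failures. */
    static List<Result> sweep(List<String> regions, RegionProbe probe) {
        List<Result> results = new ArrayList<>();
        for (String region : regions) {
            long start = System.nanoTime();
            boolean ok = true;
            try {
                probe.probe(region);
            } catch (Exception e) {
                ok = false; // treat any failure as an availability problem
            }
            results.add(new Result(region, System.nanoTime() - start, ok));
        }
        return results;
    }
}
```

A real canary would feed the latencies into metrics and raise an alert for any region whose result is marked unavailable.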
[jira] [Created] (HBASE-4394) Add support for seeking hints to FilterList
Add support for seeking hints to FilterList
Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0
Currently FilterLists do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through.
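One plausible way for a list of filters to pass hints through is sketched below; it is not necessarily what the patch attached to this issue does. The assumption is that when every filter must pass, the scan can jump to the largest hint, and when any one filter may pass, it can only jump to the smallest hint and must give no hint at all if some filter has none. HintCombiner is a hypothetical name, and plain String keys stand in for HBase's KeyValue.

```java
import java.util.List;

final class HintCombiner {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /**
     * Combine per-filter seek hints into one hint for the whole list.
     * A null element means that filter gave no hint; a null result means no hint.
     */
    static String combine(Operator op, List<String> hints) {
        String best = null;
        for (String hint : hints) {
            if (hint == null) {
                if (op == Operator.MUST_PASS_ONE) {
                    // A filter without a hint might accept the very next key,
                    // so skipping ahead would be unsafe.
                    return null;
                }
                continue;
            }
            if (best == null) {
                best = hint;
                continue;
            }
            int cmp = hint.compareTo(best);
            if (op == Operator.MUST_PASS_ALL && cmp > 0) {
                best = hint; // all must pass: safe to jump to the farthest hint
            } else if (op == Operator.MUST_PASS_ONE && cmp < 0) {
                best = hint; // any may pass: only safe to jump to the nearest hint
            }
        }
        return best;
    }
}
```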
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Gray updated HBASE-4394:
Attachment: HBASE-4394-v1.patch
Adds support for seek hints to FilterList and adds a unit test to TestFilterList that ensures it does the right thing across the different variations of inputs to a FilterList.
Add support for seeking hints to FilterList
Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-v1.patch
Currently FilterLists do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through.
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Gray updated HBASE-4394:
Status: Patch Available (was: Open)
Add support for seeking hints to FilterList
Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-v1.patch
Currently FilterLists do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subbu M Iyer updated HBASE-4213: Attachment: 4213-V7-Support_instant_schema_changes_through_ZK.patch Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: 4213-Instant_Schema_change_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104010#comment-13104010 ] Subbu M Iyer commented on HBASE-4213: - Attached V7 with all the review comments + above mentioned additions. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: 4213-Instant_Schema_change_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList
[ https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Gray updated HBASE-4394:
Attachment: HBASE-4394-trunk-v2.patch
Rebased for trunk.
Add support for seeking hints to FilterList
Key: HBASE-4394 URL: https://issues.apache.org/jira/browse/HBASE-4394 Project: HBase Issue Type: Improvement Components: filters Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Minor Fix For: 0.92.0 Attachments: HBASE-4394-trunk-v2.patch, HBASE-4394-v1.patch
Currently FilterLists do not support getNextKeyHint() even if the underlying filters are giving hints. We should add support for FilterList to pass these through.
[jira] [Updated] (HBASE-4388) Second start after migration from 90 to trunk crashes
[ https://issues.apache.org/jira/browse/HBASE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HBASE-4388:
Component/s: migration
Second start after migration from 90 to trunk crashes
Key: HBASE-4388 URL: https://issues.apache.org/jira/browse/HBASE-4388 Project: HBase Issue Type: Bug Components: master, migration Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: meta.tgz
I started a trunk cluster to upgrade from 90, inserted a ton of data, then did a clean shutdown. When I started again, I got the following exception:
11/09/13 12:29:09 INFO master.HMaster: Meta has HRI with HTDs. Updating meta now.
11/09/13 12:29:09 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NegativeArraySizeException: -102
  at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
  at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:606)
  at org.apache.hadoop.hbase.migration.HRegionInfo090x.readFields(HRegionInfo090x.java:641)
  at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:133)
  at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
  at org.apache.hadoop.hbase.util.Writables.getHRegionInfoForMigration(Writables.java:228)
  at org.apache.hadoop.hbase.catalog.MetaEditor.getHRegionInfoForMigration(MetaEditor.java:350)
  at org.apache.hadoop.hbase.catalog.MetaEditor$1.visit(MetaEditor.java:273)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:633)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
  at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:235)
  at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaWithNewRegionInfo(MetaEditor.java:284)
  at org.apache.hadoop.hbase.catalog.MetaEditor.migrateRootAndMeta(MetaEditor.java:298)
  at org.apache.hadoop.hbase.master.HMaster.updateMetaWithNewHRI(HMaster.java:529)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:472)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
[jira] [Updated] (HBASE-4389) Address lots of issues with migration from 90 to trunk
[ https://issues.apache.org/jira/browse/HBASE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4389: --- Component/s: migration Address lots of issues with migration from 90 to trunk -- Key: HBASE-4389 URL: https://issues.apache.org/jira/browse/HBASE-4389 Project: HBase Issue Type: Bug Components: master, migration Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 Looking over the migration code that removes HTD from HRI, there are lots of issues. This JIRA is to redo this code in a way that will be less bug prone, and also future proof. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent
[ https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104020#comment-13104020 ] Hudson commented on HBASE-4238: --- Integrated in HBase-TRUNK #2205 (See [https://builds.apache.org/job/HBase-TRUNK/2205/]) HBASE-4238 CatalogJanitor can clear a daughter that split before processing its parent stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java CatalogJanitor can clear a daughter that split before processing its parent --- Key: HBASE-4238 URL: https://issues.apache.org/jira/browse/HBASE-4238 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 4238-v2.txt, 4238.txt I didn't dig a lot into this issue, but by splitting a table twice in a row I was able to trigger a situation where a daughter of the first split was deleted by the CatalogJanitor before it processed its parent. Will post log in a comment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4395) EnableTableHandler races with itself
EnableTableHandler races with itself
Key: HBASE-4395 URL: https://issues.apache.org/jira/browse/HBASE-4395 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.5
Very often when we try to enable a big table we get something like:
{quote}
2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
java.lang.IllegalStateException
  at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
  at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
  at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
{quote}
The issue is that EnableTableHandler calls BulkEnabler multiple times. By the time it calls it a second time, using a stale list of still-not-enabled regions, it can try to set a region offline in ZK just after that region's state has changed.
Case in point:
{quote}
2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
{quote}
Here the first line is the first assign done in the first thread, and the second line is the second thread that got to process the same region around the same time, a 3ms difference. After that the master dies, and it's pretty sad when it restarts because it fails over an enabling table and that is ungodly slow. I'm pretty sure there's a window where double assignments are possible.
Talking with Stack, it doesn't really make sense to call multiple enables since the list of regions is static (the table is disabled!). We should just call it once and wait. Also there's a lot of cleanup to do in EnableTableHandler since it refers to disabling the table (copy pasta I guess).
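The "call it once and wait" suggestion can be pictured with the sketch below: every region in the fixed list is submitted for assignment exactly once, and the caller then blocks until the work drains or a timeout expires. BulkAssignOnce and its parameters are hypothetical; this is not the EnableTableHandler code, only an illustration of why a single pass over a static list avoids a second, stale pass racing with the first.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BulkAssignOnce {
    /**
     * Submit each region for assignment exactly once, then wait for completion.
     * Returns true if all submitted work finished within the timeout.
     */
    static boolean assignAll(List<String> regions, Consumer<String> assigner,
                             int threads, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String region : regions) {
            pool.execute(() -> assigner.accept(region));
        }
        pool.shutdown(); // no new tasks; already submitted ones keep running
        try {
            boolean done = pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            if (!done) {
                pool.shutdownNow(); // give up instead of resubmitting a stale list
            }
            return done;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Because nothing is resubmitted, no region can be handed to the assigner twice, whatever state it reaches while the caller waits.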
[jira] [Assigned] (HBASE-4391) Add ability to start RS as root and call mlockall
[ https://issues.apache.org/jira/browse/HBASE-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HBASE-4391: -- Assignee: Todd Lipcon Add ability to start RS as root and call mlockall - Key: HBASE-4391 URL: https://issues.apache.org/jira/browse/HBASE-4391 Project: HBase Issue Type: New Feature Components: regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.94.0 A common issue we've seen in practice is that users oversubscribe their region servers with too many MR tasks, etc. As soon as the machine starts swapping, the RS grinds to a halt, loses ZK session, aborts, etc. This can be combatted by starting the RS as root, calling mlockall(), and then setuid down to the hbase user. We should not require this, but we should provide it as an option. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104053#comment-13104053 ]
jirapos...@reviews.apache.org commented on HBASE-4213:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1786/#review1883
---
/src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
https://reviews.apache.org/r/1786/#comment4335
How many times is this expected to spin? Should there be a sleep here?
- Andrew
On 2011-09-12 18:36:02, Ted Yu wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1786/
bq. ---
bq.
bq. (Updated 2011-09-12 18:36:02)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. From Subbu:
bq. here is the latest patch that support alter_instant, an instant schema change command that supports (Add, Modify, Delete column and Modify table) actions through ZK.
bq.
bq. 1. This pattern capitalizes on the fact that HRI's are now in HDFS and need not be sent over again from Master to RS cloud on every schema change event.
bq.
bq. 2. Offers real time instant schema change as we bypass the explicit bulk reassign (unassign + assign) of regions from master to RS.
bq.
bq. 3. Offers fault tolerant schema change support as schema changes now go through ZK. Secondary master taking over a failed schema change will be addressed through a separate JIRA.
bq.
bq. This addresses bug HBASE-4213.
bq. https://issues.apache.org/jira/browse/HBASE-4213
bq.
bq. Diffs
bq. -
bq.
bq. /src/main/java/org/apache/hadoop/hbase/avro/AvroServer.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/rest/SchemaResource.java 1169522
bq. /src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java PRE-CREATION
bq. /src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1169522
bq. /src/main/ruby/hbase/admin.rb 1169522
bq. /src/main/ruby/shell.rb 1169522
bq. /src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1169522
bq. /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1169522
bq. /src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1169522
bq. /src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1169522
bq. /src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java 1169522
bq.
bq. Diff: https://reviews.apache.org/r/1786/diff
bq.
bq. Testing
bq. ---
bq.
bq. Unit tests pass.
bq.
bq. Thanks,
bq.
bq. Ted
Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: