[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115249#comment-13115249 ]

Ted Yu commented on HBASE-4492:
---

Found the following in the output for the above timeout case:
{code}
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/root-region-server
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,58748,131710132-EventThread] zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,047 DEBUG [Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
2011-09-27 05:28:59,048 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,049 INFO [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): No previous transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 (online=3, exclude=null) available servers
2011-09-27 05:28:59,049 INFO [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): master:51567-0x132a95afa180008 Set watcher on existing znode /hbase/unassigned/70236052
2011-09-27 05:28:59,049 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1181): Master server abort: loaded coprocessors are: []
2011-09-27 05:28:59,050 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748
java.lang.IllegalStateException
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186)
{code}

TestRollingRestart fails intermittently
---
Key: HBASE-4492
URL: https://issues.apache.org/jira/browse/HBASE-4492
Project: HBase
Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
Attachments: 4492.txt

I got the following when running the test suite on TRUNK:
{code}
testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart) Time elapsed: 300.28 sec ERROR!
java.lang.Exception: test timed out after 30 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
    at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
{code}
I ran TestRollingRestart#testBasicRollingRestart manually afterwards, which wiped out the test output file for the failed test.

A similar failure can be found on Jenkins: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/
[jira] [Created] (HBASE-4495) CatalogTracker has an identity crisis; needs to be cut-back in scope
CatalogTracker has an identity crisis; needs to be cut-back in scope

Key: HBASE-4495
URL: https://issues.apache.org/jira/browse/HBASE-4495
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: stack

CT needs a good reworking. I'd suggest its scope be cut way down to deal only in zk transactions, rather than zk and reading the meta location in hbase (over an HConnection) and being a purveyor of HRegionInterfaces on meta and root servers and being an Abortable and a verifier of catalog locations. Once this is done, I would suggest it then better belongs under the zk package and that the Meta* classes then move to the client package.

Here are some messy notes I added to the head of the CT class in hbase-3446, where I spent some time trying to make out what it was CT did.
{code}
// TODO: This class needs a rethink. The original intent was that it would be
// the one-stop-shop for root and meta locations and that it would get this
// info from reading and watching zk state. The class was to be used by
// servers when they needed to know of root and meta movement but also by
// client-side (inside in HTable) so rather than figure root and meta
// locations on fault, the client would instead get notifications out of zk.
//
// But this original intent is frustrated by the fact that this class has to
// read an hbase table, the -ROOT- table, to figure out the .META. region
// location which means we depend on an HConnection. HConnection will do
// retrying but also, it has its own mechanism for finding root and meta
// locations (and for 'verifying'; it tries the location and if it fails, does
// new lookup, etc.). So, at least for now, HConnection (or HTable) can't
// have a CT since CT needs a HConnection (Even then, do want HT to have a CT?
// For HT keep up a session with ZK? Rather, shouldn't we do like asynchbase
// where we'd open a connection to zk, read what we need then let the
// connection go?). The 'fix' is make it so both root and meta addresses
// are wholly up in zk -- not in zk (root) -- and in an hbase table (meta).
//
// But even then, this class does 'verification' of the location and it does
// this by making a call over an HConnection (which will do its own root
// and meta lookups). Isn't this verification 'useless' since when we
// return, whatever is dependent on the result of this call then needs to
// use HConnection; what we have verified may change in meantime (HConnection
// uses the CT primitives, the root and meta trackers finding root locations).
//
// When meta is moved to zk, this class may make more sense. In the
// meantime, it does not cohere. It should just watch meta and root and
// NOT do verification -- let that be out in HConnection since it's going to
// be done there ultimately anyways.
//
// This class has spread throughout the codebase. It needs to be reined in.
// This class should be used server-side only, even if we move meta location
// up into zk. Currently it's used over in the client package. It's used in
// MetaReader and MetaEditor classes usually just to get the Configuration
// it's using (It does this indirectly by asking its HConnection for its
// Configuration and even then this is just used to get an HConnection out on
// the other end). St.Ack 10/23/2011.
//
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3446:
---
Status: Patch Available (was: Open)

ProcessServerShutdown fails if META moves, orphaning lots of regions

Key: HBASE-3446
URL: https://issues.apache.org/jira/browse/HBASE-3446
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
Fix For: 0.92.0
Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 3446v15.txt

I ran a rolling restart on a 5 node cluster with lots of regions, and afterwards had LOTS of regions left orphaned. The issue appears to be that ProcessServerShutdown failed because the server hosting META was restarted around the same time as another server was being processed.
[jira] [Created] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
HFile V2 does not honor setCacheBlocks when scanning.

Key: HBASE-4496
URL: https://issues.apache.org/jira/browse/HBASE-4496
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Fix For: 0.92.0, 0.94.0

While testing the LRU cache during scanning, I noticed considerable churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

    HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
    HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
    HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
    HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
    HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
    StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
    StoreFileScanner.reseek(KeyValue) line: 110
    KeyValueHeap.reseek(KeyValue) line: 255
    StoreScanner.reseek(KeyValue) line: 409
    StoreScanner.next(List<KeyValue>, int) line: 304
    KeyValueHeap.next(List<KeyValue>, int) line: 114
    KeyValueHeap.next(List<KeyValue>) line: 143
    HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
    HRegion$RegionScannerImpl.nextInternal(int) line: 2722
    HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
    HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
    HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...), at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.
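The option weighed in the description, carrying the caller's cacheBlocks choice down the read path rather than hard-coding true, can be pictured with the toy sketch below. This is not HBase code: `ToyBlockReader` and its methods are hypothetical stand-ins that only borrow names from the trace, and a plain map stands in for the LRU block cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical names, not the HFile V2 reader. */
final class ToyBlockReader {
    // Stand-in for the block cache, keyed by block offset.
    private final Map<Long, byte[]> cache = new HashMap<>();

    /** Reads a block, adding it to the cache only if the caller asks for that. */
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = cache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = loadFromDisk(offset);
        if (cacheBlock) {
            cache.put(offset, block);
        }
        return block;
    }

    /** Index lookup that forwards the flag instead of passing true unconditionally. */
    byte[] seekToDataBlock(long offset, boolean cacheBlock) {
        return readBlock(offset, cacheBlock);
    }

    int cachedBlockCount() {
        return cache.size();
    }

    // Pretend disk read: returns an 8-byte placeholder block.
    private byte[] loadFromDisk(long offset) {
        return new byte[8];
    }

    public static void main(String[] args) {
        ToyBlockReader reader = new ToyBlockReader();
        reader.seekToDataBlock(0L, false);   // scan with caching disabled
        System.out.println("cached after scan: " + reader.cachedBlockCount());
        reader.seekToDataBlock(64L, true);   // read with caching enabled
        System.out.println("cached after get: " + reader.cachedBlockCount());
    }
}
```

The sketch shows only the plumbing; it takes no position on whether the flag belongs in the index reader's signature, which is the design concern raised above.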
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115353#comment-13115353 ]

Hudson commented on HBASE-4433:
---

Integrated in HBase-TRUNK #2261 (See [https://builds.apache.org/job/HBase-TRUNK/2261/])
HBASE-4433 avoid extra next (potentially a seek) if done with column/row (kannan via jgray)

jgray :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java

avoid extra next (potentially a seek) if done with column/row

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.94.0

[Noticed this in 89, but quite likely true of trunk as well.]

When we are done with the requested column(s), the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs and each KV may be occupying a block of its own, so the next() can often load a new, unrelated block unnecessarily.

--
For the simple case of reading, say, the top-most column in a row in a single file, where each column (KV) was a block of its own, it seems that we are reading 3 blocks instead of 1 block!

I am working on a simple patch, and with that the number of seeks is down to 2.

[There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One is at the StoreScanner/ScanQueryMatcher level, which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level), which happens whenever an HFile scanner serves out data, and perhaps that's the additional seek that we need to avoid. But I want to tackle this optimization first as the two issues seem unrelated.]

--
The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included, and then, if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and the done hint into a single match code: INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
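As a rough illustration of the combined match codes described above, here is a minimal sketch. It is not the ExplicitColumnTracker source: `ToyColumnTracker` is hypothetical, columns are plain strings, and it assumes one version per column, so a matching KV always finishes its column.

```java
import java.util.Arrays;
import java.util.List;

/** Toy sketch with hypothetical names; not the ExplicitColumnTracker source. */
final class ToyColumnTracker {
    enum MatchCode { SKIP, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW }

    private final List<String> wanted;  // requested columns, in sorted order
    private int index = 0;              // position of the column currently sought

    ToyColumnTracker(List<String> wantedSortedColumns) {
        this.wanted = wantedSortedColumns;
    }

    /** Returns the include decision and the "done" hint in a single code. */
    MatchCode check(String column) {
        while (index < wanted.size()) {
            int cmp = column.compareTo(wanted.get(index));
            if (cmp < 0) {
                return MatchCode.SKIP;  // not yet at the wanted column
            }
            if (cmp == 0) {
                index++;
                // The tracker already knows whether more columns remain,
                // so the caller needs no extra next() to find out.
                return index == wanted.size()
                    ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                    : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
            }
            index++;  // wanted column is absent from this row; try the next one
        }
        return MatchCode.SEEK_NEXT_ROW;
    }

    public static void main(String[] args) {
        ToyColumnTracker tracker = new ToyColumnTracker(Arrays.asList("a", "c"));
        for (String column : Arrays.asList("a", "b", "c")) {
            System.out.println(column + " -> " + tracker.check(column));
        }
    }
}
```

With a single requested column, the first matching KV already yields INCLUDE_AND_SEEK_NEXT_ROW, which corresponds to the "top-most column in a row" case from the description.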
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115391#comment-13115391 ]

jirapos...@reviews.apache.org commented on HBASE-3446:
---

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2083

Thanks for the cleanup.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4725
    Back to future :-)

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4727
    Nice.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4726
    Extra so.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4730
    Do we need to update now and pass it to the helper methods? The helper methods can easily figure out what now should be.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4729
    I think would be enough.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4731
    I am confused by the condition here.

- Ted
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115394#comment-13115394 ]

jirapos...@reviews.apache.org commented on HBASE-3446:
---

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2084

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4732
    Log should be changed accordingly.

- Ted

On 2011-09-27 06:38:09, Michael Stack wrote:
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq. Summary
bq. ---
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving meta migration code out to its
bq. own class rather than have it in all inside in MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.   Clean up. Shutdown access on some of these unused methods. Don't
bq.   let out HRegionInterface instances in particular since we are going
bq.   away from raw HRI use to instead use a connection with retries:
bq.   i.e. HTable.
bq.
bq.   Comments on state of this class. Javadoc edits.
bq.   getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.   in constructor. Override MetaNodeTracker and on node delete
bq.   reset meta location (We used to do this over in MetaNodeTracker
bq.   but to do that we had to have a CatalogTracker over in zk package
bq.   which is silly -- bad package encapsulation).
bq.
bq.   (waitForRootServer) Renamed getRootServerConnection and change it
bq.   from public to package private.
bq.   (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.   (getMetaServerConnection) Change from public to package private.
bq.   Use MetaReader to read the meta location in root rather than a
bq.   raw HRegionInterface so we get retrying.
bq.   (remaining, timedout) Added utility methods.
bq.   (waitForMetaServer) Changed from public to private.
bq.   (resetMetaLocation) Made it synchronized on metaAvailable.
bq.   Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.   Refactor to use HTable instead of raw HRegionInterface so we get
bq.   retrying. For each operation we get an HTable, use it, then close it.
bq.   (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.   (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.   class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.   New class that holds all Meta* methods updating meta table used
bq.   doing the one-time migration done to meta on startup. This class
bq.   is marked deprecated because its going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.   Retrofit methods in here to use fullScan methods with Visitor.
bq.   (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.   getCatalogRegionNameForRegion) Removed.
bq.   (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq.   (fullScanOfResults) Renamed as fullScan override.
bq.   (fullScanOfRoot) Added as deprecated. We should be doing
bq.   this against zk.
bq.   (metaRowToRegionPair, getServerNameFromResult)
[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115435#comment-13115435 ]

Doug Meil commented on HBASE-4448:
---

More than one cluster in the same test? That would have to be a replication test. I'm not sure that's going to work. I think the best case is to re-use one cluster and then fire up another cluster from scratch.

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

Key: HBASE-4448
URL: https://issues.apache.org/jira/browse/HBASE-4448
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch

Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build.

This factory assumes that the JVM is being re-used across test classes in the build; otherwise this pattern won't work.

I don't think this is appropriate for every use, but I think it can be applicable in a great many cases, especially where developers just want a simple MiniCluster with 1 slave.
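The reuse pattern described above might look roughly like the sketch below. It is an assumption-laden illustration, not the attached HBaseTestingUtilityFactory: `SharedFixturePool` is a hypothetical generic pool, and any expensive object (here a StringBuilder stands in for a mini cluster) can play the fixture role.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch of the reuse pattern; not the attached factory class. */
final class SharedFixturePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> expensiveSetup;
    private int created = 0;

    SharedFixturePool(Supplier<T> expensiveSetup) {
        this.expensiveSetup = expensiveSetup;
    }

    /** Hands out an idle fixture if one exists; otherwise pays the setup cost. */
    synchronized T borrow() {
        T fixture = idle.poll();
        if (fixture == null) {
            fixture = expensiveSetup.get();
            created++;
        }
        return fixture;
    }

    /** Returns a fixture so the next test class in the same JVM can reuse it. */
    synchronized void giveBack(T fixture) {
        idle.push(fixture);
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        SharedFixturePool<StringBuilder> pool = new SharedFixturePool<>(StringBuilder::new);
        StringBuilder first = pool.borrow();   // first test class: setup happens
        pool.giveBack(first);
        StringBuilder second = pool.borrow();  // second test class: reuse
        System.out.println("reused: " + (first == second));
        System.out.println("setups: " + pool.createdCount());
    }
}
```

As the description notes, this only helps when the build re-uses one JVM across test classes; with a fresh JVM per class the pool starts empty every time. A test needing two clusters at once would simply borrow twice, paying one extra setup.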
[jira] [Updated] (HBASE-4485) Eliminate window of missing Data
[ https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-4485:
---
Attachment: repro_bug-4485.diff

Here is a way to repro the bug. It uses an unnecessary sleep to get things to go bad. Not intended to be included in the final diff/submission.

Eliminate window of missing Data

Key: HBASE-4485
URL: https://issues.apache.org/jira/browse/HBASE-4485
Project: HBase
Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Fix For: 0.94.0
Attachments: 4485-v1.diff, repro_bug-4485.diff

After incorporating v11 of the 2856 fix, we discovered that we are still having some ACID violations. This time, however, the problem is not about including newer updates, but about missing older updates that should be included.

Here is what seems to be happening. There is a race condition in StoreScanner.getScanners():

    private List<KeyValueScanner> getScanners(Scan scan,
        final NavigableSet<byte[]> columns) throws IOException {
      // First the store file scanners
      List<StoreFileScanner> sfScanners = StoreFileScanner
          .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks, isGet, false);
      List<KeyValueScanner> scanners =
          new ArrayList<KeyValueScanner>(sfScanners.size()+1);
      // include only those scan files which pass all filters
      for (StoreFileScanner sfs : sfScanners) {
        if (sfs.shouldSeek(scan, columns)) {
          scanners.add(sfs);
        }
      }
      // Then the memstore scanners
      if (this.store.memstore.shouldSeek(scan)) {
        scanners.addAll(this.store.memstore.getScanners());
      }
      return scanners;
    }

If, for example, there is a call to Store.updateStorefiles() that happens between the store.getStorefiles() and this.store.memstore.getScanners() calls, then it is possible that a new HFile was created that is not seen by the StoreScanner, and the data is not present in the Memstore.snapshot either.
[jira] [Updated] (HBASE-4485) Eliminate window of missing Data
[ https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-4485:
---
Attachment: 4485-v2.diff

Fix the issue using locks to ensure that Store.updateStoreFiles does not get called between StoreScanner getting the List of store files and it getting to the MemStoreScanner.

Eliminate window of missing Data

Key: HBASE-4485
URL: https://issues.apache.org/jira/browse/HBASE-4485
Project: HBase
Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Fix For: 0.94.0
Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, repro_bug-4485.diff
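The locking idea in the v2 update above (do not let the store-file list change between reading the store files and reading the memstore) can be sketched as follows. This is a toy model and not the attached patch: `ToyStore`, its string-based "files", and the choice of a ReentrantReadWriteLock are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model with hypothetical names; not the patch attached to the issue. */
final class ToyStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>();
    private final List<String> memstore = new ArrayList<>();

    void put(String kv) {
        lock.writeLock().lock();
        try {
            memstore.add(kv);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Flush: memstore contents become a new store file in one atomic step. */
    void flush() {
        lock.writeLock().lock();
        try {
            if (!memstore.isEmpty()) {
                storeFiles.add(String.join(",", memstore));
                memstore.clear();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Captures both views under one read lock, so no flush can slip in between. */
    List<String> getScanners() {
        lock.readLock().lock();
        try {
            List<String> scanners = new ArrayList<>(storeFiles);  // store files first
            scanners.addAll(memstore);                            // then the memstore
            return scanners;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("a");
        store.flush();
        store.put("b");
        System.out.println(store.getScanners());
    }
}
```

Without the shared lock, a flush landing between the two copies in `getScanners()` would move data out of the memstore into a file the reader has already skipped past, which is the missing-data window described in the issue.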
[jira] [Updated] (HBASE-4485) Eliminate window of missing Data
[ https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-4485:
---
Attachment: 4485-v3.diff

Move some functions to Store.java so that we do not have to access store.lock from StoreScanner.java.

Passes TestAcidGuarantees on the internal branch (0.89). Running the test suite on the open source trunk (in progress).

Eliminate window of missing Data

Key: HBASE-4485
URL: https://issues.apache.org/jira/browse/HBASE-4485
Project: HBase
Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Fix For: 0.94.0
Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, repro_bug-4485.diff
[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115543#comment-13115543 ]

ramkrishna.s.vasudevan commented on HBASE-4492:
---

Hi Ted,
I found the problem, but I don't have the patch with me now. Please correct me if I am wrong.
As per my analysis, the test case always passes in one scenario: when ROOT and META are on the same RS.
If we take the test code
{code}
// Bring the RS hosting ROOT down and the RS hosting META down at once
RegionServerThread rootServer = getServerHostingRoot(cluster);
RegionServerThread metaServer = getServerHostingMeta(cluster);
if (rootServer == metaServer) {
  log("ROOT and META on the same server so killing another random server");
  int i=0;
  while (rootServer == metaServer) {
    metaServer = cluster.getRegionServerThreads().get(i);
    i++;
  }
}
log("Stopping server hosting ROOT");
rootServer.getRegionServer().stop("Stopping ROOT server");
log("Stopping server hosting META #1");
metaServer.getRegionServer().stop("Stopping META server");
{code}
we try to be cautious when ROOT and META are on the same RS. If we find ROOT and META on the same RS, we just point metaServer at another RS (we don't check whether it is the one hosting META) but still call rootServer.stop(). This stops the rootServer, internally closing both the root and meta regions on it. Hence the line
{code}
cluster.hbaseCluster.waitOnRegionServer(metaServer);
{code}
also returns. Now, while the ServerShutdownHandler processes this, it can cleanly open a new ROOT and META.

In the failure cases, ROOT and META are assigned to different servers. There is a time gap between the ROOT RS going down and the META RS going down. This is where something goes wrong: the ServerShutdownHandler invoked for the ROOT RS tries to assign ROOT at the same time as someone else tries to assign the ROOT node (who this someone is, I am not clear :)).
Please do correct me if I am wrong.
Will try to provide a patch so that this issue doesn't occur.

TestRollingRestart fails intermittently
---
Key: HBASE-4492
URL: https://issues.apache.org/jira/browse/HBASE-4492
Project: HBase
Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
Attachments: 4492.txt

I got the following when running the test suite on TRUNK:
{code}
testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart) Time elapsed: 300.28 sec ERROR!
java.lang.Exception: test timed out after 30 milliseconds
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
  at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
{code}
I ran TestRollingRestart#testBasicRollingRestart manually afterwards, which wiped out the test output file for the failed test.
A similar failure can be found on Jenkins: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/
[jira] [Updated] (HBASE-4485) Eliminate window of missing Data
[ https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-4485:
---
Attachment: 4485-v4.diff

Move notifyChangeReaderObservers() outside the lock in all places. Testing in progress.

Eliminate window of missing Data
Key: HBASE-4485
URL: https://issues.apache.org/jira/browse/HBASE-4485
Project: HBase
Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Fix For: 0.94.0
Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 4485-v4.diff, repro_bug-4485.diff

After incorporating v11 of the 2856 fix, we discovered that we are still having some ACID violations. This time, however, the problem is not about including newer updates, but about missing older updates that should be included. Here is what seems to be happening. There is a race condition in StoreScanner.getScanners():

    private List<KeyValueScanner> getScanners(Scan scan,
        final NavigableSet<byte[]> columns) throws IOException {
      // First the store file scanners
      List<StoreFileScanner> sfScanners = StoreFileScanner
          .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks, isGet, false);
      List<KeyValueScanner> scanners =
          new ArrayList<KeyValueScanner>(sfScanners.size()+1);
      // include only those scan files which pass all filters
      for (StoreFileScanner sfs : sfScanners) {
        if (sfs.shouldSeek(scan, columns)) {
          scanners.add(sfs);
        }
      }
      // Then the memstore scanners
      if (this.store.memstore.shouldSeek(scan)) {
        scanners.addAll(this.store.memstore.getScanners());
      }
      return scanners;
    }

If, for example, a call to Store.updateStorefiles() happens between store.getStorefiles() and this.store.memstore.getScanners(), then it is possible that a new HFile was created that is not seen by the StoreScanner, and the data is not present in the Memstore.snapshot either.
[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk
[ https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-4344:
---
Attachment: 4344-v12.txt

combine 4344-v11.txt with 4485-v4.txt
Running the testsuite (in progress)

Persist memstoreTS to disk
--
Key: HBASE-4344
URL: https://issues.apache.org/jira/browse/HBASE-4344
Project: HBase
Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Fix For: 0.89.20100924
Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, patch-2

Atomicity can be achieved in two ways -- (i) by using a multiversion concurrency system (MVCC), or (ii) by ensuring that new writes do not complete, until the old reads complete.

Currently, Memstore uses something along the lines of MVCC (called RWCC for read-write-consistency-control). But, this mechanism is not incorporated for the key-values written to the disk, as they do not include the memstore TS.

Let us make the two approaches be similar, by persisting the memstoreTS along with the key-value when it is written to the disk.
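To make the MVCC idea above concrete, here is a minimal sketch of read-point filtering, assuming each key-value carries the write number (memstoreTS) it was written under. `MvccSketch`, `Kv` and `visible` are hypothetical names for illustration only and do not reflect the actual HBase classes or the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of MVCC visibility, not HBase code. */
final class MvccSketch {
    /** A key-value tagged with the write number (memstoreTS) under which it was written. */
    static final class Kv {
        final String key;
        final String value;
        final long memstoreTS;

        Kv(String key, String value, long memstoreTS) {
            this.key = key;
            this.value = value;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Keeps only the entries whose write number is at or below the reader's read point. */
    static List<Kv> visible(List<Kv> kvs, long readPoint) {
        List<Kv> out = new ArrayList<>();
        for (Kv kv : kvs) {
            if (kv.memstoreTS <= readPoint) {
                out.add(kv);
            }
        }
        return out;
    }
}
```

If the memstoreTS is persisted with each key-value, the same filter can in principle be applied to entries read back from disk, so a scanner that started at read point 3 would skip an entry written at 5 whether that entry is still in memory or already flushed.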
[jira] [Updated] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Meil updated HBASE-4448:
-
Attachment: java_HBASE_4448_v2.patch

Fixed point #3.

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
-
Key: HBASE-4448
URL: https://issues.apache.org/jira/browse/HBASE-4448
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, java_HBASE_4448_v2.patch

Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster, and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build.

This factory assumes that the JVM is being re-used across test classes in the build, otherwise this pattern won't work.

I don't think this is appropriate for every use, but I think it can be applicable in a great many cases - especially where developers just want a simple MiniCluster with 1 slave.
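The re-use pattern described above can be sketched roughly as follows: start the expensive fixture lazily on first request, hand the same instance to every later caller in the JVM, and tear it down once at JVM exit. `ReusableFixture` is a hypothetical, generic stand-in written for this note. It is not the attached HBaseTestingUtilityFactory, whose design may differ.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical holder for an expensive test fixture that is shared within one JVM. */
final class ReusableFixture<T> {
    private final Supplier<T> starter;   // e.g. code that starts a mini cluster
    private final Consumer<T> stopper;   // e.g. code that shuts it down
    private T instance;
    private int starts;

    ReusableFixture(Supplier<T> starter, Consumer<T> stopper) {
        this.starter = starter;
        this.stopper = stopper;
    }

    /** Starts the fixture on first use; later callers in the same JVM get the same instance. */
    synchronized T get() {
        if (instance == null) {
            instance = starter.get();
            starts++;
            final T started = instance;
            // Tear down once, when the JVM exits, rather than after every test class.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> stopper.accept(started)));
        }
        return instance;
    }

    /** Number of times the starter has actually run. */
    synchronized int startCount() {
        return starts;
    }
}
```

As the issue notes, this only pays off when the build re-uses one JVM across test classes. With a forked JVM per class, every class would still pay the full start-up cost. Tests sharing an instance also need to clean up after themselves, since state left by one class is visible to the next.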
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115592#comment-13115592 ]

jirapos...@reviews.apache.org commented on HBASE-3446:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2097
---

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4761
I was trying to minimize how many times we do System.currentTimeMillis

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4762
Agreed

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4764
Will fix w/ a comment. If timeout is 0, then we do not timeout. I should call it out explicitly.

- Michael

On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. ---
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving meta migration code out to its
bq. own class rather than have it in all inside in MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up. Shutdown access on some of these unused methods. Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor. Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying. For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup. This class
bq.    is marked deprecated because its going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with
[jira] [Assigned] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl reassigned HBASE-4496:

Assignee: Lars Hofhansl

HFile V2 does not honor setCacheBlocks when scanning.
-
Key: HBASE-4496
URL: https://issues.apache.org/jira/browse/HBASE-4496
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0

While testing the LRU cache during scanning, I noticed quite some churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
StoreFileScanner.reseek(KeyValue) line: 110
KeyValueHeap.reseek(KeyValue) line: 255
StoreScanner.reseek(KeyValue) line: 409
StoreScanner.next(List<KeyValue>, int) line: 304
KeyValueHeap.next(List<KeyValue>, int) line: 114
KeyValueHeap.next(List<KeyValue>) line: 143
HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
HRegion$RegionScannerImpl.nextInternal(int) line: 2722
HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...), at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly, as readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.
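For readers unfamiliar with the behaviour at stake, here is a minimal sketch of a block reader whose LRU cache honours a per-call cacheBlock flag. It is a toy built on java.util.LinkedHashMap. `BlockReaderSketch` and its methods are hypothetical and do not correspond to the HFileReaderV2 API or to whatever fix was eventually chosen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical block reader with a small LRU cache, not HBase code. */
final class BlockReaderSketch {
    private final Map<Long, byte[]> lru;
    private int diskReads;

    BlockReaderSketch(final int capacity) {
        // An access-ordered LinkedHashMap evicts the least recently used entry when full.
        this.lru = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Serves a block from the cache if present; caches a freshly loaded block only when asked to. */
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = lru.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = loadFromDisk(offset);
        if (cacheBlock) {
            lru.put(offset, block);
        }
        return block;
    }

    /** Stand-in for the real disk read. */
    private byte[] loadFromDisk(long offset) {
        diskReads++;
        return new byte[] {(byte) offset};
    }

    int cachedBlocks() {
        return lru.size();
    }

    int diskReads() {
        return diskReads;
    }
}
```

The bug reported above amounts to an intermediate layer calling the equivalent of `readBlock(offset, true)` regardless of what the scan requested, so a large scan with caching disabled would still evict hot blocks.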
[jira] [Commented] (HBASE-3025) Coprocessor based simple access control
[ https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115708#comment-13115708 ] jirapos...@reviews.apache.org commented on HBASE-3025: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2041/#review2077 --- Looks good. The majority of my comments have to do with inconsistent logging practice. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java https://reviews.apache.org/r/2041/#comment4718 Could be stated better. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java https://reviews.apache.org/r/2041/#comment4719 No. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java https://reviews.apache.org/r/2041/#comment4720 Comment needs updating. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4721 Can we make this 1? security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4775 Debug logging should go to LOG not AUDITLOG security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4782 Should be INFO or TRACE level? TRACE makes more sense to me. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4776 Debug logging should go to LOG not AUDITLOG security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4779 Should be INFO or TRACE level? TRACE makes more sense to me. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4795 Should something go to AUDITLOG here? 
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4797 Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places to make decisions -- an exception is thrown directly or not. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4783 Another one of these was sent to AUDITLOG above. Do the same here? Should be INFO or TRACE level? TRACE makes more sense to me. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4786 Ultimately users should be allowed to enable or disable their own tables, but only after such operations don't carry as much systemic risk as they do currently. In that case, CREATE permission and an ownership check could follow the test for ADMIN permission. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4787 As above security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4791 Should be logged with ERROR? security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4799 Would it be clearer then to call permissionGranted() something like hasColumnsPermission() ? Just a random thought. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4803 Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java https://reviews.apache.org/r/2041/#comment4804 Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE. 
security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java https://reviews.apache.org/r/2041/#comment4807 What if instead we check for version 0 and throw an IllegalArgumentException if so? Technically, it is an invalid request if it contains an unrecognizable action code. Skipping this check if version 0 would be a way to handle new perms while not accepting incorrect input otherwise. security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java https://reviews.apache.org/r/2041/#comment4813 Maybe we can call this .auth.? We don't really have an RBAC implementation yet. Likewise for the package name for all of this stuff? Just a random thought. security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java https://reviews.apache.org/r/2041/#comment4815 Isn't this an error?
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115728#comment-13115728 ]

jirapos...@reviews.apache.org commented on HBASE-3446:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2099
---

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4835
The condition, now stopTime, is reversed for isTimedOut().

- Ted

On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. ---
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (Costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on serverside too.
bq.
bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening new issue
bq. to cutback CT use over the code base.
bq.
bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving meta migration code out to its
bq. own class rather than have it in all inside in MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up. Shutdown access on some of these unused methods. Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor. Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying. For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup. This class
bq.    is marked deprecated because its going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.    getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair,
[jira] [Issue Comment Edited] (HBASE-4492) TestRollingRestart fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115249#comment-13115249 ] Ted Yu edited comment on HBASE-4492 at 9/27/11 5:26 PM: Found the following in output for the above timeout case: {code} 2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/root-region-server 2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,58748,131710132-EventThread] zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a /hbase/root-region-server does not exist. Watcher is set. 2011-09-27 05:28:59,047 DEBUG [Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher is set. 2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned 2011-09-27 05:28:59,048 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d /hbase/root-region-server does not exist. Watcher is set. 
2011-09-27 05:28:59,049 INFO [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): No previous transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 (online=3, exclude=null) available servers 2011-09-27 05:28:59,049 INFO [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to us.ciq.com,57500,1317101330748 2011-09-27 05:28:59,049 DEBUG [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748 2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): master:51567-0x132a95afa180008 Set watcher on existing znode /hbase/unassigned/70236052 2011-09-27 05:28:59,049 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1181): Master server abort: loaded coprocessors are: [] 2011-09-27 05:28:59,050 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748 java.lang.IllegalStateException at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139) at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816) at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186) {code} was (Author: yuzhih...@gmail.com): Found the following in output for the above timeout case: {code} 2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/root-region-server2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,58748,131710132-EventThread] zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a /hbase/root-region-server does not exist. Watcher is set.2011-09-27 05:28:59,047 DEBUG [Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher is set.2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned2011-09-27 05:28:59,048 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d /hbase/root-region-server does not exist.
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115743#comment-13115743 ]

Andrew Purtell commented on HBASE-4433:
---

According to my tests, this is safe to do on the 0.92 and 0.90 branches as well. This change should be applied there.

avoid extra next (potentially a seek) if done with column/row
---
Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.94.0

[Noticed this in 89, but quite likely true of trunk as well.]

When we are done with the requested column(s), the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs and each KV may occupy a block of its own, so the next() can often load a new, unrelated block unnecessarily.

For the simple case of reading, say, the top-most column in a row in a single file, where each column (KV) was, say, a block of its own, it seems that we are reading 3 blocks instead of 1 block! I am working on a simple patch, and with that the number of seeks is down to 2.

[There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One is at the StoreScanner/ScanQueryMatcher level, which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level), which happens whenever an HFile scanner serves out data, and perhaps that's the additional seek that we need to avoid. But I want to tackle this optimization first, as the two issues seem unrelated.]

The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included, and then, if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and the done hint into a single match code: INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
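To make the effect of the combined match codes in HBASE-4433 concrete, here is a minimal sketch. It is not HBase code: `MatchCodeSketch`, `checkOld`, `checkCombined` and `kvsRead` are hypothetical names, the enum only borrows the match-code names from the issue, and each loop iteration stands in for one next() call (a potential block load). It assumes a single requested column and one version per column.

```java
import java.util.Arrays;
import java.util.List;

final class MatchCodeSketch {
    // Names borrowed from the issue text; the enum itself is a simplified stand-in.
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_ROW }

    /** Old style: answers INCLUDE, and reports "done" only on the following call. */
    static MatchCode checkOld(String column, String wanted, boolean alreadyIncluded) {
        if (alreadyIncluded) {
            return MatchCode.SEEK_NEXT_ROW;
        }
        return column.equals(wanted) ? MatchCode.INCLUDE : MatchCode.SKIP;
    }

    /** Combined style: include the KV and signal "done" with a single match code. */
    static MatchCode checkCombined(String column, String wanted) {
        return column.equals(wanted) ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW : MatchCode.SKIP;
    }

    /** Counts how many KVs of the row are read before the scan of this row stops. */
    static int kvsRead(List<String> row, String wanted, boolean combined) {
        int read = 0;
        boolean included = false;
        for (String column : row) {
            read++; // one next() call, i.e. one potential block load
            MatchCode code = combined
                    ? checkCombined(column, wanted)
                    : checkOld(column, wanted, included);
            if (code == MatchCode.INCLUDE) {
                included = true;
            }
            if (code == MatchCode.SEEK_NEXT_ROW || code == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
                break;
            }
        }
        return read;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c");
        System.out.println("old style reads " + kvsRead(row, "a", false) + " KVs");
        System.out.println("combined style reads " + kvsRead(row, "a", true) + " KVs");
    }
}
```

In this toy model the old style has to touch the KV after the requested one before it learns it is done, while the combined code stops on the requested KV itself.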
[jira] [Created] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
---
Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

This JIRA is created as per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment".

Consider two RS: RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException.
[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115775#comment-13115775 ]

Lars Hofhansl commented on HBASE-4496:
---

In a simple test this shaves about 20% off scanning time (with caching switched off), in addition to avoiding LRU and GC churn.

HFile V2 does not honor setCacheBlocks when scanning.
---
Key: HBASE-4496
URL: https://issues.apache.org/jira/browse/HBASE-4496
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0

While testing the LRU cache during scanning I noticed quite some churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
StoreFileScanner.reseek(KeyValue) line: 110
KeyValueHeap.reseek(KeyValue) line: 255
StoreScanner.reseek(KeyValue) line: 409
StoreScanner.next(List<KeyValue>, int) line: 304
KeyValueHeap.next(List<KeyValue>, int) line: 114
KeyValueHeap.next(List<KeyValue>) line: 143
HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
HRegion$RegionScannerImpl.nextInternal(int) line: 2722
HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...), at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.
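The lost-flag problem described in HBASE-4496 can be shown with a small, self-contained model. This is only a sketch under stated assumptions: `BlockReader`, `IndexReader` and both `seekToDataBlock...` variants are hypothetical stand-ins that mirror the shape of the call chain in the trace, not the real HFileReaderV2 or HFileBlockIndex signatures.

```java
import java.util.HashMap;
import java.util.Map;

final class CacheBlocksSketch {
    /** Hypothetical block reader with a trivial block cache. */
    static final class BlockReader {
        final Map<Long, byte[]> blockCache = new HashMap<>();

        byte[] readBlock(long offset, boolean cacheBlock) {
            byte[] block = blockCache.get(offset);
            if (block == null) {
                block = new byte[] {(byte) offset}; // pretend this is a disk read
                if (cacheBlock) {
                    blockCache.put(offset, block);
                }
            }
            return block;
        }
    }

    /** Hypothetical index layer that sits between the scanner and the block reader. */
    static final class IndexReader {
        private final BlockReader reader;

        IndexReader(BlockReader reader) {
            this.reader = reader;
        }

        /** Models the reported bug: the caller's flag never arrives here, so true is hard-coded. */
        byte[] seekToDataBlockDroppingFlag(long offset) {
            return reader.readBlock(offset, true);
        }

        /** Models the proposed direction: the flag is threaded through the index layer. */
        byte[] seekToDataBlock(long offset, boolean cacheBlocks) {
            return reader.readBlock(offset, cacheBlocks);
        }
    }

    /** Scans three blocks with caching switched off and returns how many ended up cached. */
    static int cachedAfterScan(boolean passFlag) {
        BlockReader reader = new BlockReader();
        IndexReader index = new IndexReader(reader);
        boolean cacheBlocks = false; // what the scan asked for
        for (long offset = 0; offset < 3; offset++) {
            if (passFlag) {
                index.seekToDataBlock(offset, cacheBlocks);
            } else {
                index.seekToDataBlockDroppingFlag(offset);
            }
        }
        return reader.blockCache.size();
    }

    public static void main(String[] args) {
        System.out.println("flag dropped: " + cachedAfterScan(false) + " blocks cached");
        System.out.println("flag passed:  " + cachedAfterScan(true) + " blocks cached");
    }
}
```

The cost the reporter calls ugly is visible even here: every layer between the scanner and the reader needs an extra parameter it does not otherwise care about.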
[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115787#comment-13115787 ]

stack commented on HBASE-4496:
---

Good find Lars.
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115788#comment-13115788 ]

stack commented on HBASE-4497:
---

Do we need to add an extra tickle of OPENING znode after open of region and before we go to do meta update?
[jira] [Reopened] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-4433:
---

Reopening until we apply to 0.92 and 0.90 too (as per Andrew). I'll do it (stop me Jon if you are already at this).
[jira] [Resolved] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-4433.
---
Resolution: Fixed
Fix Version/s: (was: 0.94.0) 0.92.0

Applied to 0.92 branch too. Didn't add to 0.90 because it doesn't apply cleanly (there are test files missing).
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115795#comment-13115795 ]

Jean-Daniel Cryans commented on HBASE-4497:
---

Stack, it seems that it's already the case:

{code}
if (tickleOpening(post_region_open)) {
  if (updateMeta(region)) failed = false;
}
{code}

In any case, there's still a hole as those two operations aren't done in an atomic fashion.
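The hole in HBASE-4497 (a successful tickle followed by a non-atomic META update) can be replayed deterministically. The sketch below is an illustration only: `ZNode`, `tickleOpening`, `transitionToOpened` and `replay` are simplified, hypothetical stand-ins that model the region's znode as an owner plus a state string, not the real region server handler or ZooKeeper calls.

```java
final class OpenRaceSketch {
    /** Simplified stand-in for the region's unassigned znode. */
    static final class ZNode {
        String owner;
        String state;

        ZNode(String owner, String state) {
            this.owner = owner;
            this.state = state;
        }
    }

    /** Succeeds only while the caller still owns the znode in OPENING state. */
    static boolean tickleOpening(ZNode node, String server) {
        return node.owner.equals(server) && node.state.equals("OPENING");
    }

    /** OPENING -> OPENED, allowed only for the current owner. */
    static boolean transitionToOpened(ZNode node, String server) {
        if (!tickleOpening(node, server)) {
            return false;
        }
        node.state = "OPENED";
        return true;
    }

    /** Replays the interleaving from the issue description and reports the final state. */
    static String replay() {
        ZNode node = new ZNode("RS1", "OPENING");
        String metaLocation = null;

        // RS1: the tickle succeeds, RS1 still owns the znode.
        boolean rs1Tickled = tickleOpening(node, "RS1");

        // Timeout: the master reassigns to RS2, which updates META and opens the region.
        node.owner = "RS2";
        node.state = "OPENING";
        metaLocation = "RS2";
        transitionToOpened(node, "RS2");

        // RS1 resumes: its earlier check is stale, but it updates META anyway.
        if (rs1Tickled) {
            metaLocation = "RS1";
        }
        boolean rs1Opened = transitionToOpened(node, "RS1"); // fails, RS2 owns the znode

        return "meta=" + metaLocation + " znodeOwner=" + node.owner
                + " znodeState=" + node.state + " rs1Opened=" + rs1Opened;
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this model META ends up pointing at RS1 although RS2 holds the region, which is the kind of mismatch the issue says HBCK reports.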
[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115798#comment-13115798 ]

Ming Ma commented on HBASE-4492:
---

Here is a sequence of events showing how this could happen. It applies to any region.

T1. After AM sends the openRegion RPC to RS1, RS1 sets the ZK state to OPENING.
T2. RS1 is stopped.
T3. The ZK callback for the OPENING state is delayed. AM skips processing it given that RS1 is offline. Thus the region's state in AM is still PENDING_OPEN.
T4. During RS1's ServerShutdownHandler, it tries to assign the region by setting it offline first. This isn't allowed given that the state isn't CLOSING or CLOSED.

TestRollingRestart fails intermittently
---
Key: HBASE-4492
URL: https://issues.apache.org/jira/browse/HBASE-4492
Project: HBase
Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
Attachments: 4492.txt

I got the following when running the test suite on TRUNK:

{code}
testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart) Time elapsed: 300.28 sec ERROR!
java.lang.Exception: test timed out after 30 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
{code}

I ran TestRollingRestart#testBasicRollingRestart manually afterwards, which wiped out the test output file for the failed test. A similar failure can be found on Jenkins: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/
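The T1 to T4 sequence in this comment can be replayed with a tiny state model. This is a sketch, not the AssignmentManager: `OfflineGuardSketch`, `setOffline` and `replay` are hypothetical names, and the guard condition (only CLOSING or CLOSED may go OFFLINE) is taken from the comment above rather than from the HBase source.

```java
final class OfflineGuardSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSING, CLOSED }

    /** Guard as described in the comment: only CLOSING or CLOSED regions may be set OFFLINE. */
    static State setOffline(State current) {
        if (current != State.CLOSING && current != State.CLOSED) {
            throw new IllegalStateException("Unexpected state trying to OFFLINE; state=" + current);
        }
        return State.OFFLINE;
    }

    /** Replays T1..T4 and returns either "offlined" or the failure message. */
    static String replay() {
        // T1: the master has sent openRegion to RS1 and tracks the region as PENDING_OPEN.
        State inMaster = State.PENDING_OPEN;

        // T2: RS1 is stopped.
        boolean rs1Online = false;

        // T3: the delayed OPENING event arrives but is skipped because RS1 is offline.
        if (rs1Online) {
            inMaster = State.OPENING;
        }

        // T4: the shutdown handler tries to offline the region before reassigning it.
        try {
            inMaster = setOffline(inMaster);
            return "offlined";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

The resulting message has the same shape as the FATAL line in the log quoted earlier in this thread, where the region is still PENDING_OPEN when the master tries to offline it.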
[jira] [Updated] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4496:
---
Attachment: 4496.txt

This fixes the problem for me. TestHFileBlock, TestCacheOnWrite, TestHFileBlockIndex, and TestHFileWriterV2 still pass.

It's not pretty, though. There are lots of places where cacheBlock now needs to be passed around. I erred on the side of caution: where it wasn't clear whether the block should be cached or not, I kept it being cached (as before).

Somebody with more knowledge about HFileReaderV2 should have a look. If there's something nicer to do, please let me know. It is not clear to me how to write a test for this.
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115804#comment-13115804 ]

jirapos...@reviews.apache.org commented on HBASE-3446:
---

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2106

Only part way done, will finish in the afternoon. I like the idea though, good stuff stack.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4855
Supposed to read When meta is moved to zk?

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4857
this comment talks a lot about what is wrong but it's not clear to me what changes are actually made right now. i see you say server-side only, but what do you propose instead? (i imagine i will find out reading the rest of the diff)

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4858
update this javadoc a bit... it's missing conf and you might also add additional context to stuff like abortable (which now appears optional and falls back to the connection itself?)

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4859
when would one override this?

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4860
is the behavior of this method unchanged? i guess now it returns before it's verified? any specific reason for the name change? (its behavior is definitely different from the old getRootServerConnection()). is it to match the Meta method?

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4861
i thought no verification in CT?

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4862
this is public but one with specified timeout is private now?

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4863
eeek, good catch

- Jonathan
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115805#comment-13115805 ]

Lars Hofhansl commented on HBASE-4488:
---

@Jon, so you gonna commit this? :)

Store could miss rows during flush
---
Key: HBASE-4488
URL: https://issues.apache.org/jira/browse/HBASE-4488
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0
Attachments: 4488.txt

While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
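The mistake described in HBASE-4488 is easy to reproduce in isolation. The sketch below assumes a scanner whose next(kvs) fills kvs with the current batch and returns whether more batches remain; `BatchScanner`, `flushBuggy` and `flushFixed` are hypothetical names used only for illustration, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ScannerLoopSketch {
    /** Hypothetical scanner: next() fills 'out' with one batch and returns whether MORE batches remain. */
    static final class BatchScanner {
        private final Iterator<List<String>> batches;

        BatchScanner(List<List<String>> batches) {
            this.batches = batches.iterator();
        }

        boolean next(List<String> out) {
            if (batches.hasNext()) {
                out.addAll(batches.next());
            }
            return batches.hasNext();
        }
    }

    /** Buggy loop: the final call returns false, so the last batch is never written. */
    static List<String> flushBuggy(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            written.addAll(kvs);
            kvs.clear();
        }
        return written;
    }

    /** Fixed loop: always handle what next() produced, then decide whether to continue. */
    static List<String> flushFixed(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            written.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return written;
    }

    static List<List<String>> sampleRows() {
        return Arrays.asList(Arrays.asList("r1"), Arrays.asList("r2"), Arrays.asList("r3"));
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + flushBuggy(new BatchScanner(sampleRows())));
        System.out.println("fixed: " + flushFixed(new BatchScanner(sampleRows())));
    }
}
```

The do/while form is one common way to handle a "fills the list and returns has-more" contract; whether the attached 4488.txt uses exactly this shape is not stated in the thread.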
[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115815#comment-13115815 ]

Jesse Yates commented on HBASE-4448:
---

Comments on the latest patch:

In the usage,

{code}
- private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+ private static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
{code}

Should probably not even assign anything, since it is done in setup.

Going back to Ted's comment,

{code}
Integer i = mcUsageCount.get(tu);
+if (i == null) {
+  i = ONE;
+} else {
+  int j = i.intValue() + 1;
+  i = new Integer(j);
+}
+mcUsageCount.put(tu, i);
+ }
{code}

This seems overly complex - just use autoboxing here. Maybe we should use specific names for what ONE and ZERO mean (e.g. ONE - USE_INCREMENT, ZERO - UNUSED).

In the UtilityCleaner,

{code}
+ try {
+while (true) {
{code}

could lead to some issues. We just need to make sure that the thread is a daemon thread when we start.

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
---
Key: HBASE-4448
URL: https://issues.apache.org/jira/browse/HBASE-4448
Project: HBase
Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, java_HBASE_4448_v2.patch

Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster, and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build.

This factory assumes that the JVM is being re-used across test classes in the build, otherwise this pattern won't work.

I don't think this is appropriate for every use, but I think it can be applicable in a great many cases - especially where developers just want a simple MiniCluster with 1 slave.
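The two review suggestions on HBASE-4448 (use autoboxing for the usage count, and run the cleaner on a daemon thread) might look roughly like the sketch below. It is not the patch under review: `UsageCountSketch`, `acquire`, `release` and `startCleaner` are hypothetical names, and a plain `Object` stands in for the HBaseTestingUtility instance.

```java
import java.util.HashMap;
import java.util.Map;

final class UsageCountSketch {
    private final Map<Object, Integer> usageCount = new HashMap<>();

    /** Increments the usage count for a utility; autoboxing replaces the ONE / new Integer(j) dance. */
    synchronized int acquire(Object utility) {
        Integer current = usageCount.get(utility);
        int updated = (current == null) ? 1 : current + 1;
        usageCount.put(utility, updated);
        return updated;
    }

    /** Decrements the usage count and forgets the utility once nobody uses it. */
    synchronized int release(Object utility) {
        Integer current = usageCount.get(utility);
        if (current == null) {
            throw new IllegalStateException("utility was never acquired");
        }
        int updated = current - 1;
        if (updated == 0) {
            usageCount.remove(utility);
        } else {
            usageCount.put(utility, updated);
        }
        return updated;
    }

    /** Starts the cleaner as a daemon thread so a long-running loop cannot keep the JVM alive. */
    static Thread startCleaner(Runnable body) {
        Thread cleaner = new Thread(body, "utility-cleaner");
        cleaner.setDaemon(true); // must be set before start()
        cleaner.start();
        return cleaner;
    }

    public static void main(String[] args) {
        UsageCountSketch counts = new UsageCountSketch();
        Object utility = new Object();
        System.out.println("after first acquire:  " + counts.acquire(utility));
        System.out.println("after second acquire: " + counts.acquire(utility));
        System.out.println("after release:        " + counts.release(utility));
    }
}
```

Marking the thread as a daemon before calling start() matters because a `while (true)` cleaner on a non-daemon thread would prevent the test JVM from exiting.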
[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch
[ https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115816#comment-13115816 ]

Todd Lipcon commented on HBASE-4352:
---

Can anyone volunteer to do some serious cluster testing with this patch? E.g., load up 1000 regions per server on 5+ nodes, and do rolling restarts or rolling crashes?

Apply version of hbase-4015 to branch
---
Key: HBASE-4352
URL: https://issues.apache.org/jira/browse/HBASE-4352
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.5
Attachments: HBASE-4352_0.90.patch, HBASE-4352_0.90_1.patch

Consider adding a version of hbase-4015 to 0.90. It changes HRegionInterface, so we would need to move the change to the end of the interface and then test that it doesn't break rolling restart.
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115818#comment-13115818 ]

jirapos...@reviews.apache.org commented on HBASE-3446:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2107
---

Thanks for the reviews, Ted and Jon. Will put up a new patch when you fellas finish...

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4864
@Jon True. I opened another issue with a suggested fix. I should at least reference it in here.

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4865
Nah. That's a TODO.

- Michael

On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2065/
bq. ---
bq.
bq. (Updated 2011-09-27 06:38:09)
bq.
bq. Review request for hbase and Jonathan Gray.
bq.
bq. Summary
bq. ---
bq.
bq. Make the Meta* operations against meta retry. We do it by using HTable instances.
bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq. In 0.89, we had a special RetryableMetaOperation class that was a
bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq. with its retry loop. Now we just use HTable instead (costs some on setup but
bq. otherwise, we avoid duplicating code). Upped the retries on the server side too.
bq.
bq. Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
bq. heavily on CT methods getting proxy connections to meta and root servers.
bq. CT needs to be cut back. This patch closes down access on (unused) public
bq. methods and removes being able to get an HRegionInterface on meta and root
bq. -- this stuff is used internally to CT only now; use MetaEditor or
bq. MetaReader if you want to update or read catalog tables. Opening a new issue
bq. to cut back CT use over the code base.
bq.
bq. A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
bq. trying to clean them up, I ended up moving the meta migration code out to its
bq. own class rather than have it all inside MetaEditor.
bq.
bq. Here is some detail to help reviews.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.   Clean up. Shut down access on some of these unused methods. Don't
bq.   let out HRegionInterface instances in particular since we are going
bq.   away from raw HRI use to instead use a connection with retries:
bq.   i.e. HTable.
bq.
bq.   Comments on state of this class. Javadoc edits.
bq.   getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.   in constructor. Override MetaNodeTracker and on node delete
bq.   reset meta location (We used to do this over in MetaNodeTracker
bq.   but to do that we had to have a CatalogTracker over in zk package
bq.   which is silly -- bad package encapsulation).
bq.
bq.   (waitForRootServer) Renamed getRootServerConnection and change it
bq.   from public to package private.
bq.   (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.   (getMetaServerConnection) Change from public to package private.
bq.   Use MetaReader to read the meta location in root rather than a
bq.   raw HRegionInterface so we get retrying.
bq.   (remaining, timedout) Added utility methods.
bq.   (waitForMetaServer) Changed from public to private.
bq.   (resetMetaLocation) Made it synchronized on metaAvailable.
bq.   Not all accesses were synchronized.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.   Refactor to use HTable instead of raw HRegionInterface so we get
bq.   retrying. For each operation we get an HTable, use it, then close it.
bq.   (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.   (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.   class since these classes are for a one-time migration only.
bq.
bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.   New class that holds all Meta* methods updating meta table used
bq.   doing the one-time migration done to meta on startup. This class
bq.   is marked deprecated because it's going to be dropped in 0.94.
bq.
bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.   Retrofit methods in here to use fullScan methods with Visitor.
bq.   (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.
[jira] [Commented] (HBASE-4245) Cluster hangs if RS serving root fails during startup sequence
[ https://issues.apache.org/jira/browse/HBASE-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115819#comment-13115819 ]

Todd Lipcon commented on HBASE-4245:

Yep, I agree that Ming's patch for 4455 fixes this.

Cluster hangs if RS serving root fails during startup sequence

Key: HBASE-4245
URL: https://issues.apache.org/jira/browse/HBASE-4245
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Ming Ma

On a large-ish cluster, the following sequence of events was seen to happen:
- master started, ROOT and META were both unassigned
- ROOT is assigned to rs01
- META is assigned to rs02
- Upon open of META, it writes its location into ROOT on rs01
- rs01 crashes while appending to its HLog due to some other bug
- rs02 fails the region open sequence
- master notices that rs01 has crashed, and enqueues a ServerShutdownHandler
- ServerShutdownHandler blocks on CatalogTracker.waitForMeta() since ROOT and META are not assigned yet
- master times out assignment of META, but never succeeds because ROOT location is still marked as rs01
This causes the cluster to never start up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4192) Optimize HLog for Throughput Using Delayed RPCs
[ https://issues.apache.org/jira/browse/HBASE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115824#comment-13115824 ]

Todd Lipcon commented on HBASE-4192:

That sounds easier to review, thanks for the suggestion, dhruba.

Optimize HLog for Throughput Using Delayed RPCs

Key: HBASE-4192
URL: https://issues.apache.org/jira/browse/HBASE-4192
Project: HBase
Issue Type: New Feature
Components: wal
Affects Versions: 0.92.0
Reporter: Vlad Dogaru
Priority: Minor

Introduce a new HLog configuration parameter (batchEntries) for more aggressive batching of appends. If this is enabled, HLog appends are not written to the HLog writer immediately, but batched and written either periodically or when a sync is requested. Because sync times become larger, they use delayed RPCs to free up RPC handler threads.
[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch
[ https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115822#comment-13115822 ]

stack commented on HBASE-4352:

I signed up to check we don't break rolling restart. Let me get that done first. Will report back if I can do what you ask above (I'm trying to test 205 as a background task, so I could do two things at once).

Apply version of hbase-4015 to branch

Key: HBASE-4352
URL: https://issues.apache.org/jira/browse/HBASE-4352
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.5
Attachments: HBASE-4352_0.90.patch, HBASE-4352_0.90_1.patch

Consider adding a version of hbase-4015 to 0.90. It changes HRegionInterface, so we would need to move the change to the end of the interface and then test that it doesn't break rolling restart.
[jira] [Updated] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4488:

Status: Patch Available (was: Open)

Marking patch available.

Store could miss rows during flush

Key: HBASE-4488
URL: https://issues.apache.org/jira/browse/HBASE-4488
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0
Attachments: 4488.txt

While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
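
To illustrate the kind of mistake described in HBASE-4488, here is a minimal sketch. It assumes a scanner whose next(kvs) fills the list and returns false once no further rows remain, so the batch filled by the final call still has to be processed. FakeScanner and both flush methods are hypothetical stand-ins written for this example, not the HBase classes or the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlushLoopSketch {
    /** Hypothetical scanner: next() fills 'out' and returns false on the final batch. */
    static class FakeScanner {
        private final Iterator<String> rows;

        FakeScanner(List<String> rows) {
            this.rows = rows.iterator();
        }

        /** Adds the next row (if any) to out; returns true only if more rows remain. */
        boolean next(List<String> out) {
            if (rows.hasNext()) {
                out.add(rows.next());
            }
            return rows.hasNext();
        }
    }

    /** Problematic shape: the batch filled by the call that returns false is dropped. */
    static List<String> flushWithWhile(FakeScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            flushed.addAll(kvs);
            kvs.clear();
        }
        return flushed;
    }

    /** Safer shape: process whatever next() produced, then decide whether to continue. */
    static List<String> flushWithDoWhile(FakeScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            flushed.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return flushed;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        System.out.println("while:    " + flushWithWhile(new FakeScanner(rows)));
        System.out.println("do-while: " + flushWithDoWhile(new FakeScanner(rows)));
    }
}
```

Under this assumed contract the while form silently loses the last row, which matches the "might miss the last edits" symptom in the description.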
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115828#comment-13115828 ]

stack commented on HBASE-4497:

Thanks J-D. That's what I was too lazy to looksee for myself. Looks like we are doing enough tickling. Weird that the timeout monitor can cut in, the region can be assigned elsewhere AND successfully update meta before this comes back.

Here is from Ram's email up on the list earlier, with log snippets:

{code}
RS1
===
2011-09-23 22:34:34,000 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is doneREGION = {NAME = 't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName = 't5', STARTKEY = '', ENDKEY = '', ENCODED = 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:34:34,009 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region .META.,,1 with serverName=linux76,60020,1316796517682
2011-09-23 22:34:34,009 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9., daughter=false
2011-09-23 22:34:34,009 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff0037 Attempting to transition node 2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-09-23 22:34:34,038 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Completed the OPEN of region t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. but when transitioning from OPENING to OPENED got a version mismatch, someone else clashed so now unassigning -- closing region
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.: disabling compactions flushes
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed f5
2011-09-23 22:34:34,038 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.

RS2
===
2011-09-23 22:33:56,546 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node 2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-09-23 22:33:56,845 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9., daughter=false
2011-09-23 22:33:56,845 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is doneREGION = {NAME = 't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName = 't5', STARTKEY = '', ENDKEY = '', ENCODED = 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:33:56,856 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region .META.,,1 with serverName=linux146,60020,1316796499216
2011-09-23 22:33:56,856 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9., daughter=false
2011-09-23 22:33:58,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff0039 Attempting to transition node 2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node 2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
{code}

If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

As per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment", this JIRA is created.
Consider two RSs, RS1 and RS2.
A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout monitor assigns the region to RS2.
RS2 successfully updates META and opens the region.
Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED.
RS1 transiting the node to OPENING to OPENED
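
The race in HBASE-4497 can be reproduced in miniature. The sketch below is an in-memory model only: VersionedNode imitates a version-checked state node, the map stands in for the catalog table, and every name is hypothetical rather than taken from HBase or ZooKeeper. It follows the order visible in the RS1 log (catalog write first, guarded transition second) to show why the catalog can end up naming a server that lost the race.

```java
import java.util.HashMap;
import java.util.Map;

public class OpenRaceSketch {
    /** Hypothetical stand-in for a state node guarded by a version number. */
    static class VersionedNode {
        private String state;
        private int version;

        VersionedNode(String state) {
            this.state = state;
        }

        synchronized int version() {
            return version;
        }

        synchronized String state() {
            return state;
        }

        /** Models a reassignment rewriting the node: same state, new version. */
        synchronized void touch() {
            version++;
        }

        /** Succeeds only if both the state and the version are what the caller saw. */
        synchronized boolean transition(String from, String to, int expectedVersion) {
            if (version != expectedVersion || !state.equals(from)) {
                return false;
            }
            state = to;
            version++;
            return true;
        }
    }

    /** Same order as the RS1 log: update the catalog, then attempt the guarded transition. */
    static boolean openRegion(String server, int versionSeen,
                              VersionedNode node, Map<String, String> meta) {
        meta.put("region", server);
        return node.transition("OPENING", "OPENED", versionSeen);
    }

    /** Returns {final node state, server recorded in the catalog, whether RS1's transition succeeded}. */
    static String[] simulate() {
        VersionedNode node = new VersionedNode("OPENING");
        Map<String, String> meta = new HashMap<>();

        int rs1Version = node.version(); // RS1 starts opening, then stalls
        node.touch();                    // timeout: region handed to RS2
        int rs2Version = node.version();

        openRegion("RS2", rs2Version, node, meta);               // RS2 wins
        boolean rs1Ok = openRegion("RS1", rs1Version, node, meta); // RS1 is late

        return new String[] {node.state(), meta.get("region"), String.valueOf(rs1Ok)};
    }

    public static void main(String[] args) {
        String[] result = simulate();
        System.out.println("node=" + result[0] + " meta=" + result[1] + " rs1Succeeded=" + result[2]);
    }
}
```

In this model the version check correctly rejects RS1, yet the catalog entry written beforehand still points at RS1, which is the inconsistency the issue title describes. How the real fix orders or guards the catalog update is not something this sketch claims to show.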
[jira] [Commented] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row
[ https://issues.apache.org/jira/browse/HBASE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115833#comment-13115833 ]

stack commented on HBASE-4494:

Are you back Kay Kay?

AvroServer:: get fails with NPE on a non-existent row

Key: HBASE-4494
URL: https://issues.apache.org/jira/browse/HBASE-4494
Project: HBase
Issue Type: Bug
Components: avro
Affects Versions: 0.90.4
Reporter: Kay Kay
Assignee: Kay Kay
Fix For: 0.90.5
Attachments: HBASE-4494.patch

Try to submit a get request to the avro gateway. If the row specified for a given table does not exist, the server request fails with an NPE.
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115867#comment-13115867 ]

Jonathan Gray commented on HBASE-4433:

Is this not strictly an improvement/feature? It seems like it doesn't belong in stable branches :)

avoid extra next (potentially a seek) if done with column/row

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.92.0

[Noticed this in 89, but quite likely true of trunk as well.]

When we are done with the requested column(s), the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs and each KV may be occupying a block of its own, so the next() can often load a new, unrelated block unnecessarily.

For the simple case of reading, say, the top-most column in a row in a single file, where each column (KV) was a block of its own, it seems that we are reading 3 blocks instead of 1 block! I am working on a simple patch, and with that the number of seeks is down to 2.

[There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One is at the StoreScanner/ScanQueryMatcher level, which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level), which happens whenever an HFile scanner serves out data, and perhaps that's the additional seek that we need to avoid. But I want to tackle this optimization first as the two issues seem unrelated.]

The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included and then, if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and the done hint into a single match code: INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
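
The effect of folding the "done" hint into the include code can be shown with a toy model. The sketch below is not the HBase ScanQueryMatcher or ExplicitColumnTracker: it handles a single requested column per row, the SKIP code and the kvsExamined helper are assumptions made for the example, and only the names INCLUDE, SEEK_NEXT_ROW and INCLUDE_AND_SEEK_NEXT_ROW come from the description above.

```java
import java.util.Arrays;
import java.util.List;

public class MatchCodeSketch {
    /** Toy subset of match codes; SKIP is an assumption added for this example. */
    enum MatchCode { SKIP, INCLUDE, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_ROW }

    /**
     * Counts how many KVs of one row are examined to return a single wanted column.
     * With combinedCodes=false the tracker needs one more KV to report that it is done.
     */
    static int kvsExamined(List<String> rowColumns, String wanted, boolean combinedCodes) {
        boolean done = false;
        int examined = 0;
        for (String column : rowColumns) {
            examined++;
            MatchCode code;
            if (done) {
                // Old behaviour: the "done" hint only arrives on the following call.
                code = MatchCode.SEEK_NEXT_ROW;
            } else if (column.equals(wanted)) {
                if (combinedCodes) {
                    code = MatchCode.INCLUDE_AND_SEEK_NEXT_ROW;
                } else {
                    code = MatchCode.INCLUDE;
                    done = true;
                }
            } else {
                code = MatchCode.SKIP;
            }
            if (code == MatchCode.SEEK_NEXT_ROW || code == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
                break; // the scanner can move on to the next row
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c");
        System.out.println("separate codes: " + kvsExamined(row, "a", false));
        System.out.println("combined codes: " + kvsExamined(row, "a", true));
    }
}
```

In the toy model each examined KV stands for one next() call, so when every KV sits in its own block the extra call in the separate-code path corresponds to an extra block load.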
[jira] [Commented] (HBASE-3025) Coprocessor based simple access control
[ https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115868#comment-13115868 ]

jirapos...@reviews.apache.org commented on HBASE-3025:

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 98
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line98
bq.
bq. Can we make this 1?

sure

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 192
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192
bq.
bq. Debug logging should go to LOG not AUDITLOG

The idea was that all authorization decisions should be separated into audit log. Here we're allowing access, so AUDITLOG seemed to make sense. I agree that this still needs to be cleaned up a lot. Maybe all audit logging should be done up in requirePermission() with authorization result? At the very least we need a consistent format and consistent logging levels for messages (trace, right?).

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 200
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line200
bq.
bq. Should be INFO or TRACE level? TRACE makes more sense to me.

Sure, can use trace for all audit log decisions.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 208
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line208
bq.
bq. Debug logging should go to LOG not AUDITLOG

This is an authorization decision since we're returning true below. We can make this trace level, and improve the format, but I think AUDITLOG (if enabled) should contain a single message per request on why the request was allowed or denied.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 274
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line274
bq.
bq. Should be INFO or TRACE level? TRACE makes more sense to me.

will change to trace.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 354
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line354
bq.
bq. Should something go to AUDITLOG here?

Failure should already have been recorded in AUDITLOG via logDenied(). Agree that moving AUDITLOG messages up here with consistent format would be clearer, but will require some restructuring of return value from permissionGranted() so that some context specific reason can be pulled back up for logging.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 366
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line366
bq.
bq. Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places to make decisions -- an exception is thrown directly or not.

Yes, agree, we should either log to AUDITLOG at decision points here or consistently move the AUDITLOG logging up a level out of permissionGranted() and hasFamilyQualifierPermission().

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 375
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line375
bq.
bq. Another one of these was sent to AUDITLOG above. Do the same here? Should be INFO or TRACE level? TRACE makes more sense to me.

Agree, should go to AUDITLOG at trace.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 590
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line590
bq.
bq. Should be logged with ERROR?

sure

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 856
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line856
bq.
bq. Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.

Yes, agree.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java, line 174
bq. https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174
bq.
bq. What if instead we check for version 0 and throw an IllegalArgumentException if so?
[jira] [Commented] (HBASE-3025) Coprocessor based simple access control
[ https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115869#comment-13115869 ]

jirapos...@reviews.apache.org commented on HBASE-3025:

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq. Looks good. The majority of my comments have to do with inconsistent logging practice.

Thanks for the review. I'll post an update with some cleanups and some reworking of the AUDITLOG handling.

- Gary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---

On 2011-09-23 19:14:20, Gary Helmling wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2041/
bq. ---
bq.
bq. (Updated 2011-09-23 19:14:20)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. This patch implements access control list based authorization of HBase operations. The patch depends on the currently posted patch for HBASE-2742 (secure RPC engine).
bq.
bq. Key parts of the implementation are:
bq.
bq. * AccessControlLists - encapsulates storage of permission grants in a metadata table (_acl_). This differs from previous implementation where the .META. table was used to store permissions.
bq.
bq. * AccessController -
bq.   - implements MasterObserver and RegionObserver, performing authorization checks in each of the preXXX() hooks. If authorization fails, an AccessDeniedException is thrown.
bq.   - implements AccessControllerProtocol as a coprocessor endpoint to provide RPC methods for granting, revoking and listing permissions.
bq.
bq. * ZKPermissionWatcher (and TableAuthManager) - synchronizes ACL entries and updates throughout the cluster nodes using ZK. ACL entries are stored in per-table znodes as /hbase/acl/tablename.
bq.
bq. * Additional ruby shell scripts providing the grant, revoke and user_permission commands
bq.
bq. * Support for a new OWNER attribute in HTableDescriptor. I could separate out this change into a new JIRA for discussion, but I don't see it as currently useful outside of security. Alternately, I could handle the OWNER attribute completely in AccessController without changing HTD, but that would make interaction via hbase shell a bit uglier.
bq.
bq. This addresses bug HBASE-3025.
bq. https://issues.apache.org/jira/browse/HBASE-3025
bq.
bq. Diffs
bq. -
bq.
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControllerProtocol.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/TablePermission.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/UserPermission.java PRE-CREATION
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java PRE-CREATION
bq.   security/src/test/java/org/apache/hadoop/hbase/security/rbac/SecureTestUtil.java PRE-CREATION
bq.   security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessControlFilter.java PRE-CREATION
bq.   security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessController.java PRE-CREATION
bq.   security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestTablePermissions.java PRE-CREATION
bq.   security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestZKPermissionsWatcher.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 46a1a3d
bq.   src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 699a5f5
bq.   src/main/resources/hbase-default.xml 2c8f44b
bq.   src/main/ruby/hbase.rb 4d27191
bq.   src/main/ruby/hbase/admin.rb b244ffe
bq.   src/main/ruby/hbase/hbase.rb beb2450
bq.   src/main/ruby/hbase/security.rb PRE-CREATION
bq.   src/main/ruby/shell.rb 9a47600
bq.   src/main/ruby/shell/commands.rb a352c2e
bq.   src/main/ruby/shell/commands/grant.rb PRE-CREATION
bq.   src/main/ruby/shell/commands/revoke.rb PRE-CREATION
bq.   src/main/ruby/shell/commands/table_permission.rb PRE-CREATION
bq.
[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115870#comment-13115870 ]

Dave Revell commented on HBASE-4489:

@Ted: Bytes.split() will check the number of splits and throw IllegalArgumentException if <= 0. It seems like a shame to add more code to duplicate a check that's already being done.

Better key splitting in RegionSplitter

Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch

The RegionSplitter utility allows users to create a pre-split table from the command line or do a rolling split on an existing table. It supports pluggable split algorithms that implement the SplitAlgorithm interface. The only/default SplitAlgorithm is one that assumes keys fall in the range from ASCII string to ASCII string 7FFF. This is not a sane default, and seems useless to most users. Users are likely to be surprised by the fact that all the region splits occur in the byte range of ASCII characters.

A better default split algorithm would be one that evenly divides the space of all bytes, which is what this patch does. Making a table with five regions would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and \xFF\xFF.
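
The split points quoted in the HBASE-4489 description are consistent with dividing the largest fixed-width key (all bytes 0xFF) into equal parts. The sketch below computes such boundaries with BigInteger; it is an illustration under that assumption, with a hypothetical class name and a fixed key width chosen for the example, and it is not the SplitAlgorithm from the attached patches.

```java
import java.math.BigInteger;

public class ByteSplitSketch {
    /**
     * Returns boundary i of 'regions' equal parts of the key space [0x00.., 0xFF..]
     * for keys of the given width in bytes. Boundary 'regions' is the all-0xFF key.
     */
    static byte[] boundary(int i, int regions, int width) {
        if (regions <= 0 || i < 0 || i > regions || width <= 0) {
            throw new IllegalArgumentException("invalid split request");
        }
        // Largest key of this width, e.g. 0xFFFF for width 2.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * width).subtract(BigInteger.ONE);
        BigInteger point = max.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(regions));

        // toByteArray() is minimal-length two's complement and may carry a leading
        // sign byte, so right-align the low-order bytes into a fixed-width array.
        byte[] raw = point.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Formats a key in the \xNN notation used in the issue description. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            System.out.println(toHex(boundary(i, 5, 2)));
        }
    }
}
```

With five parts and two-byte keys the boundaries come out as \x33\x33, \x66\x66, \x99\x99, \xCC\xCC and \xFF\xFF; the first four separate the five regions and the last is the upper end of the key space.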
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115876#comment-13115876 ]

Jonathan Gray commented on HBASE-4488:

Aren't you a committer now? Or didn't get the rights yet? :)

Yes, I will commit later today.

Store could miss rows during flush

Key: HBASE-4488
URL: https://issues.apache.org/jira/browse/HBASE-4488
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0
Attachments: 4488.txt

While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115877#comment-13115877 ]

Lars Hofhansl commented on HBASE-4488:

Not yet... :)

Store could miss rows during flush

Key: HBASE-4488
URL: https://issues.apache.org/jira/browse/HBASE-4488
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0
Attachments: 4488.txt

While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
[jira] [Created] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

Key: HBASE-4498
URL: https://issues.apache.org/jira/browse/HBASE-4498
Project: HBase
Issue Type: Bug
Components: build, scripts
Affects Versions: 0.92.0
Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
Fix For: 0.92.0

HBase RPM packaging was done prior to completion of ZooKeeper RPM packaging. update-hbase-env.sh expects the ZooKeeper environment script to exist in /etc/default/zookeeper-env.sh. After several revisions of ZOOKEEPER-999, the ZooKeeper community decided to remove /etc/default/zookeeper-env.sh. Hence, update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh. Instead, update-hbase-env.sh should assume ZooKeeper exists in /usr for RPM/deb packages.
[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
[ https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated HBASE-4498:

Status: Patch Available (was: Open)

HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

Key: HBASE-4498
URL: https://issues.apache.org/jira/browse/HBASE-4498
Project: HBase
Issue Type: Bug
Components: build, scripts
Affects Versions: 0.92.0
Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
Fix For: 0.92.0
Attachments: HBASE-4498.patch

HBase RPM packaging was done prior to completion of ZooKeeper RPM packaging. update-hbase-env.sh expects the ZooKeeper environment script to exist in /etc/default/zookeeper-env.sh. After several revisions of ZOOKEEPER-999, the ZooKeeper community decided to remove /etc/default/zookeeper-env.sh. Hence, update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh. Instead, update-hbase-env.sh should assume ZooKeeper exists in /usr for RPM/deb packages.
[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
[ https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated HBASE-4498:

Attachment: HBASE-4498.patch

Set ZOOKEEPER_HOME to /usr by default.

HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

Key: HBASE-4498
URL: https://issues.apache.org/jira/browse/HBASE-4498
Project: HBase
Issue Type: Bug
Components: build, scripts
Affects Versions: 0.92.0
Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
Fix For: 0.92.0
Attachments: HBASE-4498.patch

HBase RPM packaging was done prior to completion of ZooKeeper RPM packaging. update-hbase-env.sh expects the ZooKeeper environment script to exist in /etc/default/zookeeper-env.sh. After several revisions of ZOOKEEPER-999, the ZooKeeper community decided to remove /etc/default/zookeeper-env.sh. Hence, update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh. Instead, update-hbase-env.sh should assume ZooKeeper exists in /usr for RPM/deb packages.
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115886#comment-13115886 ] Jean-Daniel Cryans commented on HBASE-3130: --- Another thing about the latest patch, this line needs to be removed in RP: bq. * @param zkw zookeeper connection to the peer [replication] ReplicationSource can't recover from session expired on remote clusters - Key: HBASE-3130 URL: https://issues.apache.org/jira/browse/HBASE-3130 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Chris Trezzo Fix For: 0.92.0 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart is required to continue replication (or a rolling restart).
[jira] [Commented] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
[ https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115895#comment-13115895 ] jirapos...@reviews.apache.org commented on HBASE-4498: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2075/ --- Review request for hbase. Summary --- Updated ZOOKEEPER_HOME location to default to /usr (default location of ZooKeeper rpm/deb package) This addresses bug HBASE-4498. https://issues.apache.org/jira/browse/HBASE-4498 Diffs - /src/packages/update-hbase-env.sh 1176602 Diff: https://reviews.apache.org/r/2075/diff Testing --- Thanks, Eric
[jira] [Commented] (HBASE-3025) Coprocessor based simple access control
[ https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115897#comment-13115897 ] jirapos...@reviews.apache.org commented on HBASE-3025: -- bq. On 2011-09-27 16:58:47, Andrew Purtell wrote: bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 192 bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192 bq. bq. Debug logging should go to LOG not AUDITLOG bq. bq. Gary Helmling wrote: bq. The idea was that all authorization decisions should be separated into audit log. Here we're allowing access, so AUDITLOG seemed to make sense. I agree that this still needs to be cleaned up a lot. Maybe all audit logging should be done up in requirePermission() with authorization result? At the very least we need a consistent format and consistent logging levels for messages (trace, right?). bq. Maybe all audit logging should be done up in requirePermission() with authorization result? Sounds good. bq. At the very least we need a consistent format and consistent logging levels for messages (trace, right?). I'd argue for TRACE bq. On 2011-09-27 16:58:47, Andrew Purtell wrote: bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java, line 47 bq. https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line47 bq. bq. Maybe we can call this .auth.? We don't really have an RBAC implementation yet. Likewise for the package name for all of this stuff? Just a random thought. bq. bq. Gary Helmling wrote: bq. Yeah rbac here and in package name is a misnomer. How about using access instead? auth seems ambiguous to me as it could mean authentication or authorization. JDK uses auth in javax.security.auth and claims it's for both, but seems like that and sub-packages are more authentication related to me. Hadoop uses authorize for a similar package to this. access sounds good to me. bq. 
On 2011-09-27 16:58:47, Andrew Purtell wrote: bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java, line 84 bq. https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line84 bq. bq. Isn't this an error? bq. bq. Gary Helmling wrote: bq. Yes, and in this context a pretty bad one, as it probably means region server initiated RPCs won't work or will be denied. We should probably let the IOE escape here... Agree. bq. On 2011-09-27 16:58:47, Andrew Purtell wrote: bq. security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java, line 174 bq. https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174 bq. bq. What if instead we check for version 0 and throw an IllegalArgumentException if so? Technically, it is an invalid request if it contains an unrecognizable action code. Skipping this check if version 0 would be a way to handle new perms while not accepting incorrect input otherwise. bq. bq. Gary Helmling wrote: bq. Yeah, seems safer to throw an exception here than to ignore invalid input. What about throwing an IOException (to tie in to existing error handling)? bq. bq. We could potentially trap the VersionMismatchException from VersionedWritable to allow skip and continue when reading newer versions of Permission with potentially added Action codes. Would need to think about what kind of errors that would expose us to. bq. What about throwing an IOException (to tie in to existing error handling)? Throwing an IOE sounds good. bq. We could potentially trap the VersionMismatchException from VersionedWritable to allow skip and continue when reading newer versions of Permission with potentially added Action codes. I think that is reasonable, with something logged at WARN level. The idea here is to ride over a rolling restart. Would not see long term operation with mismatching versions. bq. On 2011-09-27 16:58:47, Andrew Purtell wrote: bq. 
security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java, line 59 bq. https://reviews.apache.org/r/2041/diff/1/?file=45410#file45410line59 bq. bq. I wonder if there is some way we can check if a secure variant of ZooKeeper is running, and refuse to initialize if not. bq. bq. Gary Helmling wrote: bq. My thinking has been to handle all secure ZooKeeper changes separately. So I'd prefer to handle any check here as part of that. bq. bq. I do think it's reasonable to run AccessController with only SIMPLE auth and no secure ZooKeeper. It's not secure but could still be useful (we currently use this setup for tests). bq. bq. We could complain loudly to give an indication that you have a security hole though. bq. I do think it's reasonable to run AccessController with only SIMPLE auth and
[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115917#comment-13115917 ] Ted Yu commented on HBASE-4455: --- Integrated to 0.92 and TRUNK. Thanks for the nice work, Ming. Thanks for the review Todd, Jonathan, Ramkrishna and Stack. Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager -- Key: HBASE-4455 URL: https://issues.apache.org/jira/browse/HBASE-4455 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0 Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. regions aren't in regions in transition from AssignmentManager's point of view, but they aren't assigned to any region servers. Here are the issues. 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is invoked to check if the dead server contains the -ROOT- region. That is due to the long delay in ZK notification and the async nature of the system. Here is an example: even though the new root region server sea-lab-1,60020,1316380133656 is set at T2, at T3, during the shutdown processing for sea-lab-1,60020,1316380133656, the root location still points to the old server sea-lab-3,60020,1316380037898. 
T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6 -0x1327e43175e Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898 T2: 2011-09-18 14:08:57,173 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region location in ZooKeeper as sea-lab-1,60020,1316380133656 T3: 2011-09-18 14:10:26,393 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler to be executed, root=false, meta=true, current Root Location: sea-lab-3,60020,1316380037898 T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6 -0x1327e43175e Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or .META. availability could be blocked. If, meanwhile, the new server to which -ROOT- or .META. is being assigned restarts, another instance of MetaServerShutdownHandler is queued. Eventually, all MetaServerShutdownHandler worker threads are filled up. It looks like HBASE-4245.
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115938#comment-13115938 ] Hudson commented on HBASE-4433: --- Integrated in HBase-0.92 #23 (See [https://builds.apache.org/job/HBase-0.92/23/]) HBASE-4433: avoid extra next (potentially a seek) if done with column/row HBASE-4433: avoid extra next (potentially a seek) if done with column/row stack : Files : * /hbase/branches/0.92/CHANGES.txt stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java avoid extra next (potentially a seek) if done with column/row - Key: HBASE-4433 URL: https://issues.apache.org/jira/browse/HBASE-4433 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan Fix For: 0.92.0 [Noticed this in 89, but quite likely true of trunk as well.] When we are done with the requested column(s) the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs where each KV may be occupying a block of its own. So the next() can often load a new unrelated block unnecessarily. 
-- For the simple case of reading say the top-most column in a row in a single file, where each column (KV) was say a block of its own-- it seems that we are reading 3 blocks, instead of 1 block! I am working on a simple patch and with that the number of seeks is down to 2. [There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One at the StoreScanner/ScanQueryMatcher level which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level) that's happening whenever an HFile scanner serves out data-- and perhaps that's the additional seek that we need to avoid. But I want to tackle this optimization first as the two issues seem unrelated.] -- The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included and then, if done, only in the next call does it return the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and done hint into a single match code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
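A minimal sketch of the combined match codes described in HBASE-4433, under stated assumptions. The match-code names come from the issue text; the class ColumnTrackerSketch, its constructor and checkColumn are hypothetical and are not the ExplicitColumnTracker API. Columns are plain strings in sorted order and maxVersions is assumed to be at least 1.

```java
import java.util.List;

/** Hypothetical, simplified column tracker; NOT the HBase ExplicitColumnTracker API. */
final class ColumnTrackerSketch {
    /** Match codes named in the issue description. */
    enum MatchCode {
        INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
    }

    private final List<String> wanted; // requested columns, assumed sorted
    private final int maxVersions;     // versions wanted per column (assumed >= 1)
    private int index = 0;             // position of the column currently being served
    private int seen = 0;              // versions already returned for that column

    ColumnTrackerSketch(List<String> wantedSortedColumns, int maxVersions) {
        this.wanted = wantedSortedColumns;
        this.maxVersions = maxVersions;
    }

    /** Decides what the scanner should do with a KV from the given column. */
    MatchCode checkColumn(String column) {
        while (index < wanted.size()) {
            int cmp = column.compareTo(wanted.get(index));
            if (cmp < 0) {
                // KV sorts before the next wanted column: not needed, seek ahead.
                return MatchCode.SEEK_NEXT_COL;
            }
            if (cmp == 0) {
                seen++;
                if (seen < maxVersions) {
                    return MatchCode.INCLUDE; // more versions of this column still wanted
                }
                // Done with this column: fold the "done" hint into the include code
                // so the caller does not need one more next() to find out.
                seen = 0;
                index++;
                return index == wanted.size()
                        ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                        : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
            }
            // The wanted column is absent from this row; try the next wanted column.
            seen = 0;
            index++;
        }
        return MatchCode.SEEK_NEXT_ROW; // every requested column has been handled
    }
}
```

The point of the combined codes is that the caller learns "include this KV and you are finished with the column (or row)" in one call, rather than discovering it on the following next().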
[jira] [Created] (HBASE-4499) [replication] Source shouldn't update ZK if it didn't progress
[replication] Source shouldn't update ZK if it didn't progress -- Key: HBASE-4499 URL: https://issues.apache.org/jira/browse/HBASE-4499 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Minor Fix For: 0.92.0 A relatively minor optimization to be done in ReplicationSource: currently it calls ReplicationSourceManager.logPositionAndCleanOldLogs whether it made progress or not, generating more load on ZK than necessary. The last position should be kept around so that we can compare.
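A minimal sketch of the "keep the last position around and compare" idea, assuming a hypothetical PositionSink interface that stands in for whatever actually writes the position to ZooKeeper. None of these names are HBase APIs; the real call mentioned in the issue is ReplicationSourceManager.logPositionAndCleanOldLogs, whose signature is not reproduced here.

```java
/** Hypothetical helper: skips the write when the position did not move. */
final class PositionRecorderSketch {
    /** Stand-in (assumption) for the code path that updates ZooKeeper. */
    interface PositionSink {
        void write(String logName, long position);
    }

    private final PositionSink sink;
    private String lastLog = null;   // log file of the last recorded position
    private long lastPosition = -1L; // offset of the last recorded position

    PositionRecorderSketch(PositionSink sink) {
        this.sink = sink;
    }

    /** Returns true if the position was written, false if it was unchanged and skipped. */
    boolean record(String logName, long position) {
        if (logName.equals(lastLog) && position == lastPosition) {
            return false; // no progress since the last write, so no ZK traffic
        }
        sink.write(logName, position);
        lastLog = logName;
        lastPosition = position;
        return true;
    }
}
```

A source that polls an idle log would call record repeatedly with the same arguments, and only the first call would reach the sink.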
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115966#comment-13115966 ] Scott Kuehn commented on HBASE-4480: I'm happy to pick this up, if nobody is working on it. Testing script to simplfy local testing --- Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: runtest.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Created] (HBASE-4500) [replication] Add a delayed option
[replication] Add a delayed option -- Key: HBASE-4500 URL: https://issues.apache.org/jira/browse/HBASE-4500 Project: HBase Issue Type: New Feature Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.94.0 A typical DR solution with databases is to have one slave that receives data a few hours late to guard against destructive operations. Off the top of my head, one way to implement this would be to tail the log, see how current the edit is, sleep if needed, replicate, repeat. The harder part will be adding a configuration to the stream.
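A minimal sketch of the "see how current the edit is, sleep if needed" step from HBASE-4500. The class and method names are hypothetical, and the configured delay is assumed to be a plain millisecond value; how it would be configured per stream is exactly the open question the issue raises.

```java
/** Hypothetical helper for delayed replication; not an HBase API. */
final class DelayedReplicationSketch {
    private DelayedReplicationSketch() {
    }

    /**
     * Computes how long to wait before shipping an edit so that it reaches
     * the slave no earlier than delayMs after it was written.
     *
     * @param editTimestampMs write time of the edit read from the log
     * @param nowMs           current time
     * @param delayMs         configured replication delay (assumption: milliseconds)
     * @return milliseconds to sleep; 0 if the edit is already old enough
     */
    static long sleepNeededMs(long editTimestampMs, long nowMs, long delayMs) {
        long ageMs = nowMs - editTimestampMs;
        return Math.max(0L, delayMs - ageMs);
    }
}
```

A tailing loop would read the next edit, call sleepNeededMs with the current time, sleep for the returned duration, replicate the edit, and repeat.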
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115972#comment-13115972 ] stack commented on HBASE-4480: -- @Ted You want to check this in? Where? Into src/test/bin?
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115975#comment-13115975 ] Jesse Yates commented on HBASE-4480: @stack - I was thinking this script would be incorporated into a bigger script that we would run for testing. @Scott - I'm fine if you want to do the bigger script.
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115978#comment-13115978 ] stack commented on HBASE-4480: -- @Jesse Sounds good
[jira] [Created] (HBASE-4501) [replication] Shutting down a stream leaves recovered sources running
[replication] Shutting down a stream leaves recovered sources running - Key: HBASE-4501 URL: https://issues.apache.org/jira/browse/HBASE-4501 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.92.0 When removing a peer it will call ReplicationSourceManager.removePeer which calls closeRecoveredQueue which does this: {code} LOG.info("Done with the recovered queue " + src.getPeerClusterZnode()); this.oldsources.remove(src); this.zkHelper.deleteSource(src.getPeerClusterZnode(), false); {code} This works in the case where the recovered source is done and is calling this method, but when removing a peer nothing ever calls terminate on the source, thus leaving it running.
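A minimal sketch of one way the gap described in HBASE-4501 could be closed: terminate each recovered source for the removed peer before forgetting it. This is an assumption about the shape of a fix, not the committed patch. The Source class here is a hypothetical stand-in, and its terminate(String) method only mirrors the "terminate" call the issue says is missing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the recovered-source bookkeeping; not the HBase classes. */
final class RecoveredSourcesSketch {
    /** Stand-in for a replication source attached to one peer. */
    static final class Source {
        final String peerId;
        boolean running = true;

        Source(String peerId) {
            this.peerId = peerId;
        }

        /** The reason is ignored in this sketch; a real source would log it and stop its thread. */
        void terminate(String reason) {
            running = false;
        }
    }

    private final List<Source> oldSources = new ArrayList<>();

    void addRecovered(Source src) {
        oldSources.add(src);
    }

    /** Terminates and drops every recovered source for the peer; returns how many were stopped. */
    int removePeer(String peerId) {
        int terminated = 0;
        for (Iterator<Source> it = oldSources.iterator(); it.hasNext();) {
            Source src = it.next();
            if (src.peerId.equals(peerId)) {
                src.terminate("Replication stream to peer " + peerId + " was removed");
                it.remove(); // only forget the source after it has been stopped
                terminated++;
            }
        }
        return terminated;
    }

    int size() {
        return oldSources.size();
    }
}
```

The ordering is the design choice being illustrated: stopping the source first means no thread keeps running against a queue that has already been removed.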
[jira] [Assigned] (HBASE-2196) Support more than one slave cluster
[ https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans reassigned HBASE-2196: - Assignee: Lars Hofhansl Support more than one slave cluster --- Key: HBASE-2196 URL: https://issues.apache.org/jira/browse/HBASE-2196 Project: HBase Issue Type: Sub-task Components: replication Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 2196-add.txt, 2196-v2.txt, 2196-v5.txt, 2196-v6.txt, 2196.txt, HBASE-2196-0.90-v2.patch, HBASE-2196-0.90.patch, HBASE-2196-wip.patch Currently replication supports only 1 slave cluster; we need the ability to add more.
[jira] [Created] (HBASE-4502) Error at HMaster start
Error at HMaster start -- Key: HBASE-4502 URL: https://issues.apache.org/jira/browse/HBASE-4502 Project: HBase Issue Type: Bug Environment: Ubuntu 11 Reporter: Madhab Nayak I have set up Hadoop 0.20.2 in pseudo-distributed mode, and it is running fine. I am trying to integrate hbase-0.90.4/3 with Hadoop, but on startup I get the following error in the Master log. 2011-09-27 15:01:16,295 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /127.0.0.1:37643 2011-09-27 15:01:16,296 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x132ace7f4fa0002 with negotiated timeout 4 for client /127.0.0.1:37643 2011-09-27 15:01:16,296 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x132ace7f4fa0002, negotiated timeout = 4 2011-09-27 15:01:16,468 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) at $Proxy6.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530) at 
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183) at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364) at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720) 2011-09-27 15:01:16,470 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2011-09-27 15:01:16,470 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads 2011-09-27 15:01:16,470 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 51567 hbase-site.xml file structure is as follows <configuration> <property> <name>hbase.rootdir</name> <value>hdfs://localhost:8020/hbase</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> </configuration> Please help me to fix this error
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116008#comment-13116008 ] jirapos...@reviews.apache.org commented on HBASE-3446: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2111 --- There is a lot of excellence in here. I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used. I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff. Great work! src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java https://reviews.apache.org/r/2065/#comment4873 maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java https://reviews.apache.org/r/2065/#comment4874 verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server) src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java https://reviews.apache.org/r/2065/#comment4875 i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :) src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java https://reviews.apache.org/r/2065/#comment4876 same here! nice (old stuff looks ripe with race conditions) src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java https://reviews.apache.org/r/2065/#comment4877 missing copyright and year? 
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java https://reviews.apache.org/r/2065/#comment4878 nice moving cruft to separate classes src/main/java/org/apache/hadoop/hbase/client/Result.java https://reviews.apache.org/r/2065/#comment4881 seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result src/main/java/org/apache/hadoop/hbase/client/Result.java https://reviews.apache.org/r/2065/#comment4882 yeah, shouldn't this be in MetaReader or some such class? src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java https://reviews.apache.org/r/2065/#comment4895 missed some whitespace src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java https://reviews.apache.org/r/2065/#comment4896 nice src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java https://reviews.apache.org/r/2065/#comment4897 this seems like an important public method. i like the rename and your additional comments, but maybe we should add more. default behavior is to use a cached location, if one is not found, it is looked up in a catalog. setting reload to true bypasses the cache and forces the lookup to a catalog. and then, under what cases do we get an exception? does this verify that the server is actually hosting the region? or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct? src/main/java/org/apache/hadoop/hbase/master/HMaster.java https://reviews.apache.org/r/2065/#comment4898 why do you remove the javadoc on this method? 
src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java https://reviews.apache.org/r/2065/#comment4899 not even necessary to put this method in here at all now (we're just using it for getting the node name at this point but it's probably still nice to have the name in stacks and such) src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java https://reviews.apache.org/r/2065/#comment4900 yay! 3 src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java https://reviews.apache.org/r/2065/#comment4901 huh? :) src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java https://reviews.apache.org/r/2065/#comment4902 awesome src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java https://reviews.apache.org/r/2065/#comment4903 30,000 ft desc? i guess test name is self descriptive? :) - Jonathan On 2011-09-27 06:38:09, Michael Stack wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq.
[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116012#comment-13116012 ] Doug Meil commented on HBASE-4448: -- Thanks Stack. In general, do you support the HBaseTestingUtilityFactory approach, where generic MiniClusters can be re-used (where-ever possible)? I think that HBaseClusterTestCase could be refactored to use HBaseTestingUtilityFactory to re-use the MiniCluster since they are all the same config. It's a plus but not a huge win, but still should be done. Every little bit helps. HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests - Key: HBASE-4448 URL: https://issues.apache.org/jira/browse/HBASE-4448 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, java_HBASE_4448_v2.patch Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster, and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build. This factory assumes that the JVM is being re-used across test classes in the build, otherwise this pattern won't work. I don't think this is appropriate for every use, but I think it can be applicable in a great many cases - especially where developers just want a simple MiniCluster with 1 slave. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
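The reuse idea in the description above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the attached HBaseTestingUtilityFactory.java: `FakeMiniCluster` and `ClusterFactorySketch` are hypothetical names, the pool is keyed by slave count, and cleanup on return is modelled as simply dropping all tables. Like the real proposal, it only helps when the JVM is shared across test classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an expensive-to-start mini cluster; not an HBase class. */
final class FakeMiniCluster {
    static int startups = 0;                 // how many times the startup cost was paid
    final int slaves;
    final Set<String> tables = new HashSet<>();

    FakeMiniCluster(int slaves) {
        this.slaves = slaves;
        startups++;                          // a real cluster would spend seconds here
    }

    void dropAllTables() {
        tables.clear();
    }
}

/** Hypothetical factory: keeps idle clusters per slave count for the life of the JVM. */
final class ClusterFactorySketch {
    private static final Map<Integer, Deque<FakeMiniCluster>> IDLE = new HashMap<>();

    /** Returns an idle cluster with the requested slave count, or starts a new one. */
    static synchronized FakeMiniCluster acquire(int slaves) {
        Deque<FakeMiniCluster> pool = IDLE.get(slaves);
        if (pool != null && !pool.isEmpty()) {
            return pool.pop();               // reuse, no startup cost
        }
        return new FakeMiniCluster(slaves);
    }

    /** Cleans the cluster and makes it available to the next test class. */
    static synchronized void release(FakeMiniCluster cluster) {
        cluster.dropAllTables();             // assumed inter-test cleanup
        IDLE.computeIfAbsent(cluster.slaves, k -> new ArrayDeque<>()).push(cluster);
    }

    public static void main(String[] args) {
        FakeMiniCluster first = acquire(1);
        first.tables.add("t1");
        release(first);
        FakeMiniCluster second = acquire(1);
        System.out.println("same instance: " + (first == second)
            + ", startups: " + FakeMiniCluster.startups);
    }
}
```

A test class would call `acquire` in its setup and `release` in its teardown; tests that need a non-default configuration would presumably still start their own cluster.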
[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116014#comment-13116014 ] Jonathan Gray commented on HBASE-4496: -- I can roll this into my changes for CacheConfig. Don't have the JIRA offhand (but the point of it is to make it so we don't have to change a bunch of constructors all the time). I'm hoping to have a diff out tonight or tomorrow morning. HFile V2 does not honor setCacheBlocks when scanning. - Key: HBASE-4496 URL: https://issues.apache.org/jira/browse/HBASE-4496 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0, 0.94.0 Attachments: 4496.txt While testing the LRU cache during scanning I noticed quite a bit of churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.
Here's a trace (from Eclipse) showing the problem:
  HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
  HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
  HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
  HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
  HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
  StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
  StoreFileScanner.reseek(KeyValue) line: 110
  KeyValueHeap.reseek(KeyValue) line: 255
  StoreScanner.reseek(KeyValue) line: 409
  StoreScanner.next(List<KeyValue>, int) line: 304
  KeyValueHeap.next(List<KeyValue>, int) line: 114
  KeyValueHeap.next(List<KeyValue>) line: 143
  HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
  HRegion$RegionScannerImpl.nextInternal(int) line: 2722
  HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
  HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
  HRegionServer.next(long, int) line: 2092
Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true. The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching. Avoiding caching during scans is somewhat important for us.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
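To make the reported behavior concrete, here is a heavily simplified model of the call path. It is a sketch under assumptions, not the HFileReaderV2 code: `BlockReaderSketch` and its methods are hypothetical, and the "fixed" variant shows only the option the report itself calls ugly, namely threading the flag through the seek path.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a block reader; not the HFileReaderV2 API. */
final class BlockReaderSketch {
    final Map<Long, byte[]> blockCache = new HashMap<>();

    /** Reads a block and puts it in the cache only when cacheBlock is true. */
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = new byte[] {(byte) offset};   // stand-in for a disk read
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    /** Mirrors the report: the scanner's preference never reaches readBlock. */
    byte[] seekToDataBlockBuggy(long offset, boolean cacheBlocks) {
        return readBlock(offset, true);              // flag dropped, always caches
    }

    /** One possible fix (assumption): pass the flag along the seek path. */
    byte[] seekToDataBlockFixed(long offset, boolean cacheBlocks) {
        return readBlock(offset, cacheBlocks);
    }

    public static void main(String[] args) {
        BlockReaderSketch buggy = new BlockReaderSketch();
        buggy.seekToDataBlockBuggy(0L, false);
        BlockReaderSketch fixed = new BlockReaderSketch();
        fixed.seekToDataBlockFixed(0L, false);
        System.out.println("buggy cache size: " + buggy.blockCache.size());
        System.out.println("fixed cache size: " + fixed.blockCache.size());
    }
}
```

With caching disabled, the buggy path still populates the cache on every reseek, which matches the churn described above; the actual patch may well solve this differently.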
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116016#comment-13116016 ] Jean-Daniel Cryans commented on HBASE-3130: --- Also I'd add that I was able to test the patch (on 0.90) and it really works, proof: First it loses the connection: {quote} 2011-09-27 16:44:54,984 WARN org.apache.hadoop.hbase.replication.ReplicationPeer: connection to cluster: 10.10.30.7:2181:/hbase1-0x132ad0f29d70017 connection to cluster: 10.10.30.7:2181:/hbase1-0x132ad0f29d70017 received expired from ZooKeeper, aborting org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) 2011-09-27 16:44:54,984 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down {quote} Then later when it tries to replicate it tries to talk to ZK again and it works after a reload: {quote} 2011-09-27 16:49:03,738 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10 2011-09-27 16:49:13,738 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Lost the ZooKeeper connection for peer 1 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase1/rs at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243) at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389) at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355) 
at org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:268) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:239) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341) 2011-09-27 16:49:13,772 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.10.30.7:2181 sessionTimeout=2 watcher=connection to cluster: 10.10.30.7:2181:/hbase1 2011-09-27 16:49:13,773 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /10.10.30.7:2181 2011-09-27 16:49:14,111 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hbasedev.sfo.stumble.net/10.10.30.7:2181, initiating session 2011-09-27 16:49:14,140 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hbasedev.sfo.stumble.net/10.10.30.7:2181, sessionid = 0x132ad0f29d70024, negotiated timeout = 2 {quote} [replication] ReplicationSource can't recover from session expired on remote clusters - Key: HBASE-3130 URL: https://issues.apache.org/jira/browse/HBASE-3130 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Chris Trezzo Fix For: 0.92.0 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart is required to continue replication (or a rolling restart). -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116019#comment-13116019 ] jirapos...@reviews.apache.org commented on HBASE-3446: -- bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. There is a lot of excellence in here. I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used. I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff. bq. bq. Great work! Sorry about that. Let me get you better answer to your question. I think its not very clear because I myself was unclear on scope of CT when I started in. What this patch has here is an attempt at shutting down CT scope with subsequent work put off for HBASE-4495. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 513 bq. https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513 bq. bq. maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard Will do bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 575 bq. https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575 bq. bq. verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server) Good point. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194 bq. 
https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194 bq. bq. i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :) In the above, we'd get the HRegionInterface and do the invocation on the actual Interface. The alternative steps back and asks an HTable instance to do the work. If an issue with former we'd just let the exception out. In the alternative, we'll do HTable retries before we let the exception out (and the retries are boosted in server-context). bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, line 2 bq. https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2 bq. bq. missing copyright and year? Turns out that copyright is not actually needed https://issues.apache.org/jira/browse/HBASE-3870 bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568 bq. https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568 bq. bq. seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result OK. It was kinda nice being able to do result.getServerNameFromCatalogResult. I suppose it does pollute. I can move it back to MetaReader since that seems like next best place. You are right shouldn't be generally public stuff. Will fix. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70 bq. https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70 bq. bq. this seems like an important public method. i like the rename and your additional comments, but maybe we should add more. default behavior is to use a cached location, if one is not found, it is looked up in a catalog. setting reload to true bypasses the cache and forces the lookup to a catalog. and then, under what cases do we get an exception? 
does this verify that the server is actually hosting the region? or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct? Will look into this. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 1041-1047 bq. https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041 bq. bq. why do you remove the javadoc on this method? Will look into this. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, line 132 bq. https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132 bq. bq. huh? :) Let me fix. bq. On 2011-09-27 23:36:00,
[jira] [Created] (HBASE-4503) Purge deprecated HBaseClusterTestCase
Purge deprecated HBaseClusterTestCase - Key: HBASE-4503 URL: https://issues.apache.org/jira/browse/HBASE-4503 Project: HBase Issue Type: Improvement Reporter: stack It could gain us a few minutes on overall test run in the cases where we don't spin up a cluster for each test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4503) Purge deprecated HBaseClusterTestCase
[ https://issues.apache.org/jira/browse/HBASE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4503: - Attachment: 4503.txt Here are a few tests converted. Few more to do. Purge deprecated HBaseClusterTestCase - Key: HBASE-4503 URL: https://issues.apache.org/jira/browse/HBASE-4503 Project: HBase Issue Type: Improvement Reporter: stack Attachments: 4503.txt It could gain us a few minutes on overall test run in the cases where we don't spin up a cluster for each test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116021#comment-13116021 ] Jonathan Hsieh commented on HBASE-4489: --- A few thoughts: I agree with jgray -- I think one fix should correct the MD5 string split so that it splits from 0x00..0xff. I think there could be another separate patch that adds the UniformSplit. I'd be wary of changing the default, especially if this is meant to go into a 0.90.x branch. It looks like as a user you can add and use the UniformSplit by changing the conf option. Ideally patches with new functionality or changing semantics would also introduce corresponding tests. There were no tests on the previous code, and no tests on the newly introduced code. Adding tests, especially around edge cases, could accommodate Ted's concerns, and it doesn't really hurt to be extra defensive when coding on non-performance-sensitive code. Better key splitting in RegionSplitter -- Key: HBASE-4489 URL: https://issues.apache.org/jira/browse/HBASE-4489 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Dave Revell Assignee: Dave Revell Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch The RegionSplitter utility allows users to create a pre-split table from the command line or do a rolling split on an existing table. It supports pluggable split algorithms that implement the SplitAlgorithm interface. The only/default SplitAlgorithm is one that assumes keys fall in the range from ASCII string to ASCII string 7FFF. This is not a sane default, and seems useless to most users. Users are likely to be surprised by the fact that all the region splits occur in the byte range of ASCII characters. A better default split algorithm would be one that evenly divides the space of all bytes, which is what this patch does. Making a table with five regions would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and \xFF\xFF.
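The even division described above can be illustrated with a small sketch. This is not the patch's UniformSplit code: `UniformSplitSketch`, `splitPoints`, and the fixed key width are assumptions made for the example, and it returns only the interior boundaries, treating \xFF\xFF as the upper bound of the key space rather than a split point.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that spaces split keys evenly over all byte values. */
final class UniformSplitSketch {

    /** Returns numRegions - 1 split keys, each keyWidth bytes long. */
    static List<byte[]> splitPoints(int numRegions, int keyWidth) {
        if (numRegions < 1 || keyWidth < 1) {
            throw new IllegalArgumentException("numRegions and keyWidth must be positive");
        }
        // Largest key of this width, i.e. 0xFF repeated keyWidth times.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(numRegions));
            splits.add(toFixedWidth(point, keyWidth));
        }
        return splits;
    }

    /** Converts a non-negative value to exactly width big-endian bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();    // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Renders a key in the \xNN notation used in the issue description. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] key : splitPoints(5, 2)) {
            System.out.println(toHex(key));
        }
    }
}
```

For five regions and two-byte keys the interior boundaries fall at \x33\x33, \x66\x66, \x99\x99 and \xCC\xCC, in line with the example in the description.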
[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116023#comment-13116023 ] stack commented on HBASE-4448: -- I agree with your "Every little bit helps." (I am working on hbase-4503 after going through your Excel spreadsheet). I'm still going through this issue. On the factory, I'm not sure yet. I'm unclear on how it'll be passed from test to test and also how we prevent one test polluting another test's run. I also need to see for myself how this approach is different from the approach where we spin up the cluster at start of a suite and then do a bunch of tests as we do in TestFromClientSide and TestAdmin. Will be back with more comments. A two hour test run is killing us. It means we only get max of 12 runs a day (and never get that many anyways)... and we have four test branches run up on hudson so the long running test suite is a progress killer (as well as a productivity killer). Reminds me of the microsoft build. HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests - Key: HBASE-4448 URL: https://issues.apache.org/jira/browse/HBASE-4448 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, java_HBASE_4448_v2.patch Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster, and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build. This factory assumes that the JVM is being re-used across test classes in the build, otherwise this pattern won't work.
I don't think this is appropriate for every use, but I think it can be applicable in a great many cases - especially where developers just want a simple MiniCluster with 1 slave. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions
[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116024#comment-13116024 ] jirapos...@reviews.apache.org commented on HBASE-3446: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2124 --- src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java https://reviews.apache.org/r/2065/#comment4913 This doesn't seem right. - Ted On 2011-09-27 06:38:09, Michael Stack wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2065/ bq. --- bq. bq. (Updated 2011-09-27 06:38:09) bq. bq. bq. Review request for hbase and Jonathan Gray. bq. bq. bq. Summary bq. --- bq. bq. Make the Meta* operations against meta retry. We do it by using HTable instances. bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc). bq. In 0.89, we had special RetryableMetaOperation class that was a bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries bq. with its retry loop. Now we just use HTable instead (Costs some on setup but bq. otherwise, we avoid duplicating code). Upped the retries on serverside too. bq. bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying bq. heavily on CT methods getting proxy connections to meta and root servers. bq. CT needs to be cut back. This patch closes down access on (unused) public bq. methods and removes being able to get an HRegionInterface on meta and root bq. -- this stuff is used internally to CT only now; use MetaEditor or bq. MetaReader if you want to update or read catalog tables. Opening new issue bq. to cutback CT use over the code base. bq. bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor bq. trying to clean them up, I ended up moving meta migration code out to its bq. own class rather than have it in all inside in MetaEditor. bq. 
bq. Here is some detail to help reviews. bq. bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java bq.Clean up. Shutdown access on some of these unused methods. Don't bq.let out HRegionInterface instances in particular since we are going bq.away from raw HRI use to instead use a connection with retries: bq.i.e. HTable. bq. bq.Comments on state of this class. Javadoc edits. bq.getZooKeeperWatcher on HConnection is deprecated so don't use it bq.in constructor. Override MetaNodeTracker and on node delete bq.reset meta location (We used to do this over in MetaNodeTracker bq.but to do that we had to have a CatalogTracker over in zk package bq.which is silly -- bad package encapsulation). bq. bq.(waitForRootServer) Renamed getRootServerConnection and change it bq.from public to package private. bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed. bq.(getMetaServerConnection) Change from public to package private. bq.Use MetaReader to read the meta location in root rather than a bq.raw HRegionInterface so we get retrying. bq.(remaining, timedout) Added utility methods. bq.(waitForMetaServer) Changed from public to private. bq.(resetMetaLocation) Made it synchronized on metaAvailable. bq.Not all accesses were synchronized. bq. bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java bq.Refactor to use HTable instead of raw HRegionInterface so we get bq.retrying. For each operation we get an HTable, use it, then close it. bq.(putToMetaTable, putsToMetaTable, etc) Utility methods. bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own bq.class since these classes are for a one-time migration only. bq. bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java bq.New class that holds all Meta* methods updating meta table used bq.doing the one-time migration done to meta on startup. This class bq.is marked deprecated because its going to be dropped in 0.94. bq. bq. 
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java bq.Retrofit methods in here to use fullScan methods with Visitor. bq.(getCatalogRegionInterface, getCatalogRegionNameForTable, bq. getCatalogRegionNameForRegion) Removed. bq.(fullScan) Cleaned up the fullScans. Fixed up wrong javadoc. bq.(fullScanOfResults) Renamed as fullScan override. bq.(fullScanOfRoot) Added as deprecated. We should be doing bq.this against zk. bq.(metaRowToRegionPair, getServerNameFromResult) Moved to Result
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116032#comment-13116032 ] Jonathan Gray commented on HBASE-4497: -- I was just discussing this scenario with Dhruba a few days back. There's definitely a race condition here and I don't see a trivial fix. We use HLog IO-fencing to ensure that edits don't slip into an HLog after a server is considered dead by the Master. But the Master has no way to prevent this META update from slipping in. We need to make some modification to how the master can safely timeout an OPENING. One possibility is for the master to require either an acknowledgment from the RS before moving the region elsewhere or for the RS to die. It seems unlikely that we will actually see the RS to Master acknowledgment since OPENING taking too long is usually a sign of brokenness or the RS being backed up, I think. But in any case I'd imagine some kind of OPEN_CANCEL_REQUESTED state that the Master transitions the node to and only when the RS transitions to OPEN_CANCELED or OFFLINE or something, then it's safe to reassign elsewhere. I think this design still has a hole in it though because there are scenarios where the RS doesn't actually die but for some reason doesn't OPEN or ack the cancel. Another option would be to do the RS performed META edits using a CheckAndPut rather than straight Put. Or we could move META editing back to the Master where it's easy to do things atomically :) The CheckAndPut idea is kind of neat but we'd probably have to send more data on the OPEN_RPC. For example, the existing server start code or server name + start code or something guaranteed unique (guaranteed that a conflicting RS opening stuff wouldn't be able to use the same thing). Then the atomicity is on the META region. 
If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE --- Key: HBASE-4497 URL: https://issues.apache.org/jira/browse/HBASE-4497 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Priority: Critical This JIRA was created following the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment". Consider two RSs, RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
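The CheckAndPut idea from the comment above can be sketched with a plain compare-and-set. This is an illustration under assumptions, not HBase code: `MetaCasSketch` is a hypothetical in-memory stand-in for the META row, and the token being checked is assumed to be the previously recorded "server,startcode" string that the Master would ship with the open request. It does not call HBase's own checkAndPut.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for region locations stored in META. */
final class MetaCasSketch {
    // region name -> "server,startcode" currently recorded as the host
    private final ConcurrentMap<String, String> meta = new ConcurrentHashMap<>();

    /** Records the location known before any (re)assignment starts. */
    void seed(String region, String location) {
        meta.put(region, location);
    }

    /**
     * Writes newLocation only if META still holds the value the opener was
     * told to expect; a stale opener therefore cannot overwrite a newer entry.
     */
    boolean checkAndPut(String region, String expected, String newLocation) {
        return meta.replace(region, expected, newLocation);
    }

    String location(String region) {
        return meta.get(region);
    }

    public static void main(String[] args) {
        MetaCasSketch meta = new MetaCasSketch();
        meta.seed("region-a", "rs0,100");
        // RS1 is slow, the Master times out and hands the same open to RS2.
        boolean rs2Won = meta.checkAndPut("region-a", "rs0,100", "rs2,300");
        // RS1 finally wakes up and tries its own update with the old expectation.
        boolean rs1Won = meta.checkAndPut("region-a", "rs0,100", "rs1,200");
        System.out.println("rs2 updated: " + rs2Won + ", rs1 updated: " + rs1Won
            + ", location: " + meta.location("region-a"));
    }
}
```

The atomicity here comes from `ConcurrentMap.replace`; in the proposal it would come from the region hosting META, and the late RS1 update in the scenario above would be rejected instead of leaving a stale entry.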
[jira] [Updated] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it
[ https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated HBASE-4209: Attachment: HBASE-4209.final.patch.txt Latest version of patch attached. Unfortunately, it has grown since the last time I submitted it, in order to incorporate feedback from this JIRA. All tests pass on my local workstation. The HBase hbase-daemon.sh SIGKILLs master when stopping it -- Key: HBASE-4209 URL: https://issues.apache.org/jira/browse/HBASE-4209 Project: HBase Issue Type: Bug Components: master Reporter: Roman Shaposhnik Assignee: Roman Shaposhnik Attachments: HBASE-4209.final.patch.txt, HBASE-4209.patch.txt There's a bit of code in hbase-daemon.sh that causes the HBase master to be SIGKILLed when stopping it rather than trying SIGTERM (like it does for other daemons). When HBase is executed in standalone mode (and the only daemon you need to run is master), that causes newly created tables to go missing as unflushed data is thrown out. If there was not a good reason to kill master with SIGKILL, perhaps we can take that special case out and rely on SIGTERM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116059#comment-13116059 ] Hudson commented on HBASE-4455: --- Integrated in HBase-0.92 #24 (See [https://builds.apache.org/job/HBase-0.92/24/]) HBASE-4455 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager (Ming Ma) tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java Rolling restart RSs scenario, -ROOT-, .META. 
regions are lost in AssignmentManager -- Key: HBASE-4455 URL: https://issues.apache.org/jira/browse/HBASE-4455 Project: HBase Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 0.92.0 Keep Master up all the time, do rolling restart of RSs like this - stop RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. regions aren't in regions in transition from AssignmentManager point of view, but they aren't assigned to any regions. Here are the issues. 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is invoked to check if it contains -ROOT- region. That is due to long delay from ZK notification and async nature of the system. Here is an example: even though new root region server sea-lab-1,60020,1316380133656 is set at T2, at T3, during the shutdown process for sea-lab-1,60020,1316380133656, the root location still points to old server sea-lab-3,60020,1316380037898.
T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6 -0x1327e43175e Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
T2: 2011-09-18 14:08:57,173 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region location in ZooKeeper as sea-lab-1,60020,1316380133656
T3: 2011-09-18 14:10:26,393 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler to be executed, root=false, meta=true, current Root Location: sea-lab-3,60020,1316380037898
T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6 -0x1327e43175e Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or .META. availability could be blocked.
If, meanwhile, the new server to which -ROOT- or .META. is being assigned restarts, another instance of MetaServerShutdownHandler is queued. Eventually, all MetaServerShutdownHandler worker threads are filled up. It looks like HBASE-4245. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests
[ https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116061#comment-13116061 ] Doug Meil commented on HBASE-4448: -- re: passed from test to test This is described in the issue description: this approach depends on JVM re-use, otherwise, it won't work. The factory has a cache of MiniClusters based on the number of slaves, and it can be added down the road for DFS cluster or ZkCluster as needed. re: pollution I hate pollution! :-) There are two cases: intra-test and inter-test. Inter-test is handled via all the tables getting disabled and whacked when the minicluster is returned to the factory. Intra-test (i.e., multiple test methods in the same class) is the same as is now. There is no automatic cleanup between test-methods in the same class even without this utility, so it's no worse. HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests - Key: HBASE-4448 URL: https://issues.apache.org/jira/browse/HBASE-4448 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: HBaseTestingUtilityFactory.java, hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, java_HBASE_4448_v2.patch Setting up and tearing down HBaseTestingUtility instances in unit tests is very expensive. On my MacBook it takes about 10 seconds to set up a MiniCluster, and 7 seconds to tear it down. When multiplied by the number of test classes that use this facility, that's a lot of time in the build. This factory assumes that the JVM is being re-used across test classes in the build, otherwise this pattern won't work. I don't think this is appropriate for every use, but I think it can be applicable in a great many cases - especially where developers just want a simple MiniCluster with 1 slave. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116072#comment-13116072 ] Lars Hofhansl commented on HBASE-3130: -- You sound like you are surprised :) [replication] ReplicationSource can't recover from session expired on remote clusters - Key: HBASE-3130 URL: https://issues.apache.org/jira/browse/HBASE-3130 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Chris Trezzo Fix For: 0.92.0 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart is required to continue replication (or a rolling restart).
[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters
[ https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116073#comment-13116073 ] Jean-Daniel Cryans commented on HBASE-3130: --- More like joyful amusement, it's been broken for so long... we're pushing this in prod really soon. [replication] ReplicationSource can't recover from session expired on remote clusters - Key: HBASE-3130 URL: https://issues.apache.org/jira/browse/HBASE-3130 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: Chris Trezzo Fix For: 0.92.0 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt Currently ReplicationSource cannot recover when its zookeeper connection to its remote cluster expires. HLogs are still being tracked, but a cluster restart is required to continue replication (or a rolling restart).
[jira] [Commented] (HBASE-4062) Multi-column scanner unit test
[ https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116075#comment-13116075 ] jirapos...@reviews.apache.org commented on HBASE-4062: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1012/ --- (Updated 2011-09-28 02:16:10.418401) Review request for hbase and Michael Stack. Changes --- Linking the HBASE-4062 JIRA to this review. Summary --- Adding a unit test for the multi-column scanner. We are using this to test an optimization we are making to multi-column scans using row-column Bloom filters. The scanner creates multiple StoreFiles for a single column family, each containing a randomized set of columns with different timestamps, and then tests scanning through the whole region with all possible sets of columns specified in the query. Point deletes (deletes of a specific timestamp) are also tested. This addresses bug HBASE-4062. https://issues.apache.org/jira/browse/HBASE-4062 Diffs - src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java PRE-CREATION Diff: https://reviews.apache.org/r/1012/diff Testing --- Run the unit test. Break the store scanner and make sure the test breaks. Thanks, Mikhail Multi-column scanner unit test -- Key: HBASE-4062 URL: https://issues.apache.org/jira/browse/HBASE-4062 Project: HBase Issue Type: Improvement Components: regionserver, test Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Labels: test Attachments: test-multi-column-scanner.patch Adding a unit test for the multi-column scanner. We are using this to test an optimization we are making to multi-column scans using row-column Bloom filters.
The scanner creates multiple StoreFiles for a single column family, each containing a randomized set of columns with different timestamps, and then tests scanning through the whole region with all possible sets of columns specified in the query. Point deletes (deletes of a specific timestamp) are also tested.
[jira] [Commented] (HBASE-4062) Multi-column scanner unit test
[ https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116077#comment-13116077 ] Mikhail Bautin commented on HBASE-4062: --- @Michael: apparently the patch is already committed -- can I close this JIRA? Multi-column scanner unit test -- Key: HBASE-4062 URL: https://issues.apache.org/jira/browse/HBASE-4062 Project: HBase Issue Type: Improvement Components: regionserver, test Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Labels: test Attachments: test-multi-column-scanner.patch Adding a unit test for the multi-column scanner. We are using this to test an optimization we are making to multi-column scans using row-column Bloom filters. The scanner creates multiple StoreFiles for a single column family, each containing a randomized set of columns with different timestamps, and then tests scanning through the whole region with all possible sets of columns specified in the query. Point deletes (deletes of a specific timestamp) are also tested.
[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.
[ https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116101#comment-13116101 ] Lars Hofhansl commented on HBASE-4496: -- That sounds like a plan Jon. Let me know if you'd like me to have a look at the combined patch. HFile V2 does not honor setCacheBlocks when scanning. - Key: HBASE-4496 URL: https://issues.apache.org/jira/browse/HBASE-4496 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0, 0.94.0 Attachments: 4496.txt While testing the LRU cache during the scanning I noticed quite some churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting. Here's a trace (from Eclipse) showing the problem:
HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
StoreFileScanner.reseek(KeyValue) line: 110
KeyValueHeap.reseek(KeyValue) line: 255
StoreScanner.reseek(KeyValue) line: 409
StoreScanner.next(List<KeyValue>, int) line: 304
KeyValueHeap.next(List<KeyValue>, int) line: 114
KeyValueHeap.next(List<KeyValue>) line: 143
HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
HRegion$RegionScannerImpl.nextInternal(int) line: 2722
HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
HRegionServer.next(long, int) line: 2092
Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...)
at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true. The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching. Avoiding caching during scans is somewhat important for us.
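The HBASE-4496 description above says the caller's cacheBlocks preference is dropped on the way down to the block read. A minimal sketch of that failure mode, and of the "pass the flag through" option the description mentions, might look as follows. This is an in-memory toy, not HFile code: the class CacheBlocksSketch, its method names and the one-byte "blocks" are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a block reader with a cache; hypothetical, not HBase code. */
final class CacheBlocksSketch {
    // Stand-in for the LRU block cache, keyed by block offset.
    private final Map<Long, byte[]> blockCache = new HashMap<>();

    /** Mirrors the reported behavior: the caller's flag is ignored. */
    byte[] readBlockIgnoringFlag(long offset, boolean cacheBlocks) {
        return readBlock(offset, true); // always caches, whatever the caller asked
    }

    /** One possible shape of a fix: thread the flag through to the read. */
    byte[] readBlockHonoringFlag(long offset, boolean cacheBlocks) {
        return readBlock(offset, cacheBlocks);
    }

    private byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = new byte[] {(byte) offset}; // stand-in for a disk read
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    int cachedBlockCount() {
        return blockCache.size();
    }

    public static void main(String[] args) {
        CacheBlocksSketch reader = new CacheBlocksSketch();
        reader.readBlockHonoringFlag(0L, false);
        System.out.println("after honoring read: " + reader.cachedBlockCount());
        reader.readBlockIgnoringFlag(1L, false);
        System.out.println("after ignoring read: " + reader.cachedBlockCount());
    }
}
```

Whether the flag should really travel through the index reader, as the description worries, is a design question this sketch does not settle.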
[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log
[ https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116106#comment-13116106 ] dhruba borthakur commented on HBASE-4477: - @Andrew: I would like to do this via a co-processor API. Can I change the signature of RegionObserver.prePut() to take in two additional arguments: a WALEdit and a Put object? If so, shall I mark the existing prePut coprocessor API as Deprecated? Does anybody else have any opinions on where to put the code that implements this functionality? One place is org.apache.hadoop.hbase.coprocessor.library.walMetadata. Ability for an application to store metadata into the transaction log - Key: HBASE-4477 URL: https://issues.apache.org/jira/browse/HBASE-4477 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: hlogMetadata1.txt MySQL allows an application to store an arbitrary blob along with each transaction in its transaction logs. This JIRA is a request for a similar feature in HBase. The use case is as follows: An application in data center A stores a blob of data along with each transaction. Replication software picks up these blobs from the transaction logs in A and hands them to another instance of the same application running in a remote data center B. The application in B is responsible for applying them to the remote HBase cluster (and also for handling conflict resolution, if any).
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116134#comment-13116134 ] ramkrishna.s.vasudevan commented on HBASE-4497: --- I got this problem for 3 regions before HBASE-4452 went in. Now HBASE-4452 will definitely reduce the probability of this happening. If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE --- Key: HBASE-4497 URL: https://issues.apache.org/jira/browse/HBASE-4497 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Priority: Critical As per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment", this JIRA is created. Consider two RSs, RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException.
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116139#comment-13116139 ] dhruba borthakur commented on HBASE-4497: - Can somebody please elaborate on any disadvantages of making the master the only entity that can update META about where the region is being served from? If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE --- Key: HBASE-4497 URL: https://issues.apache.org/jira/browse/HBASE-4497 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Priority: Critical As per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment", this JIRA is created. Consider two RSs, RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException.
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116151#comment-13116151 ] Ming Ma commented on HBASE-4497: checkAndPut might work. We will use checkAndPut on both ZK as well as HBase. There are other bugs due to the lack of strong synchronization on the ZK nodes among AssignmentManager and RSs. Here is another scenario for a race between the AM timeoutMonitor and the first RS's openRegion operation. RS1 successfully transitions to the OPENED state around the same time as timeoutMonitor kicks in; timeoutMonitor gets data from ZK right before RS1 sets it to OPENED, thus timeoutMonitor has RS_ZK_REGION_OPENING and tries to reassign the region. In that case, we will end up with the same region on two RSs. Will the following work? 1. ZKAssign.transitionNode has some sort of checkAndPut semantics when it tries to enforce that the original state is the correct one. However, it isn't atomic. It first calls getData on ZK and then compares. Instead, we can use ZK's checkAndPut API to enforce the atomicity. 2. Introduce a ZK-based global AtomicInteger for region operations; e.g., each openRegion operation will use a new incremental region_operation_ID. Each openRegion operation will validate its own ID against the ZK state via checkAndPut. Thus one of the two openRegion operations on the RSs won't work. 3. With regard to the HBase .META. update, we can put region_operation_ID into the table and enforce that a new update's region operation ID is greater than the previous version for a given region. In that way the older RS won't be able to update the table properly. We will need to introduce a new API for HBase, similar to checkAndPut, more like checkGreaterandPut.
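To make proposal 1 above concrete, here is a minimal sketch of a version-checked write. The real ZooKeeper client's setData call accepts an expected version and rejects the write when the znode has changed since it was read; the class below is only an in-memory stand-in for that behavior. Every name in it (VersionedZNodeSketch, the plain-string states) is a hypothetical chosen for illustration, not HBase or ZooKeeper code.

```java
/** In-memory stand-in for a versioned znode; hypothetical, not the ZooKeeper API. */
final class VersionedZNodeSketch {
    private String state;
    private int version = 0;

    VersionedZNodeSketch(String initialState) {
        this.state = initialState;
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized String getState() {
        return state;
    }

    /**
     * Writes newState only if the node still has the version the caller read.
     * The check and the write happen under one lock, so they are atomic here.
     */
    synchronized boolean setState(String newState, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // someone else changed the node since it was read
        }
        state = newState;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedZNodeSketch node = new VersionedZNodeSketch("OPENING");

        // Both actors read the node while it is still OPENING.
        int versionSeenByRs = node.getVersion();
        int versionSeenByTimeout = node.getVersion();

        // The region server wins the race; the timeout's stale write is rejected.
        boolean rsWrote = node.setState("OPENED", versionSeenByRs);
        boolean timeoutWrote = node.setState("OFFLINE", versionSeenByTimeout);

        System.out.println(rsWrote + " " + timeoutWrote + " " + node.getState());
        // prints "true false OPENED"
    }
}
```

With a check of this kind only one of the two racing writers can succeed, which is the property the timeoutMonitor scenario above is missing.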
If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE --- Key: HBASE-4497 URL: https://issues.apache.org/jira/browse/HBASE-4497 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Priority: Critical As per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment", this JIRA is created. Consider two RSs, RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException.
[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes
[ https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116155#comment-13116155 ] dhruba borthakur commented on HBASE-4459: - @Todd: which varint are you referring to here? HbaseObjectWritable code is a byte, we will eventually run out of codes --- Key: HBASE-4459 URL: https://issues.apache.org/jira/browse/HBASE-4459 Project: HBase Issue Type: Bug Components: io Reporter: Jonathan Gray Priority: Critical Fix For: 0.94.0 There are about 90 classes/codes in HbaseObjectWritable currently and Byte.MAX_VALUE is 127. In addition, anyone wanting to add custom classes but not break compatibility might want to leave a gap before using codes and that's difficult in such limited space. Eventually we should get rid of this pattern that makes compatibility difficult (better client/server protocol handshake) but we should probably at least bump this to a short for 0.94.
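The arithmetic behind the HBASE-4459 description can be checked with a few lines of Java. The sketch below takes the "about 90" figure from the issue text as an assumed constant and compares the headroom and wire size of a byte code against a short code. The class ClassCodeWidthSketch and its helpers are hypothetical names for illustration; this is not how HbaseObjectWritable itself is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Compares byte-wide and short-wide class codes; hypothetical, not HBase code. */
final class ClassCodeWidthSketch {
    // "About 90" codes are in use according to the issue text (assumed exact here).
    static final int CODES_IN_USE = 90;

    /** Positive codes still free when the largest usable code is maxCode. */
    static int headroom(int maxCode) {
        return maxCode - CODES_IN_USE;
    }

    /** Bytes on the wire for one code, written as a byte or as a short. */
    static int encodedSize(int code, boolean asShort) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            if (asShort) {
                out.writeShort(code);
            } else {
                out.writeByte(code);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("byte headroom:  " + headroom(Byte.MAX_VALUE));  // 37
        System.out.println("short headroom: " + headroom(Short.MAX_VALUE)); // 32677
        System.out.println("byte size:  " + encodedSize(CODES_IN_USE, false)); // 1
        System.out.println("short size: " + encodedSize(CODES_IN_USE, true));  // 2
    }
}
```

Under these assumptions a short costs one extra byte per code and leaves tens of thousands of codes free, against a few dozen for a byte.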
[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE
[ https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116160#comment-13116160 ] ramkrishna.s.vasudevan commented on HBASE-4497: --- I am not very familiar with ZK. Your 3rd point looks good to me, Ming. I think HBASE-4015 may handle the race you have described above.
{code}
if (hijack && null != curDataInZNode) {
  EventType eventType = curDataInZNode.getEventType();
  if (eventType.equals(EventType.RS_ZK_REGION_CLOSING)
      || eventType.equals(EventType.RS_ZK_REGION_CLOSED)
      || eventType.equals(EventType.RS_ZK_REGION_OPENED)) {
    return -1;
  }
}
{code}
Also, if the timeout succeeds, the transition from OPENING to OPENED will fail on the RS. There may still be an entry in META with the old RS. If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE --- Key: HBASE-4497 URL: https://issues.apache.org/jira/browse/HBASE-4497 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Priority: Critical As per the discussion in the mail chain "HBCK reporting of possible mismatch in RS assignment", this JIRA is created. Consider two RSs, RS1 and RS2. A region tries to open on RS1, but it takes a while. RS1 has still not updated META and transitioned the node from OPENING to OPENED, so the timeout assigns the region to RS2. RS2 successfully updates META and opens the region. Now RS1 tries to act on the region by first updating META and then transitioning the node from OPENING to OPENED. RS1's transition of the node from OPENING to OPENED will fail, but the META entry will have RS1 as the latest. Now HBCK reports this as an inconsistency, and if we try to scan the region we get NotServingRegionException.