[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642629#comment-13642629 ] stack commented on HBASE-8389: --

Reading over this nice, fat, info-dense issue, I am trying to figure out what we need to add to trunk right now. Sounds like checking the recoverFileLease return value gained us little in the end (though Varun, you think we want to keep going till it's true, though v5 here skips out on it). The valuable finding hereabouts is the need for a pause before going ahead with the file open, it seems. Trunk does not have this pause. Do I need to add a version of v5 to trunk? (Holding our breath until an API not yet generally available, isFileClosed (HBASE-8394), shows up is not an option for now; nor is an expectation that all will just upgrade to an HDFS that has this API either.)

The HBASE-7878 backport is now elided, since with the patch applied here we have added back the old behavior, except for the pause of an arbitrary-enough 4 seconds. The applied patch here does not loop on recoverLease after the 4 seconds expire. It breaks. In trunk we loop. We should break too (...and let it fail if 0 length, and then let the next split task do a new recoverLease call?). On the 4 seconds: it seems it should rather be the DFS timeout that HDFS is using, dfs.socket.timeout -- plus a second or so -- rather than 4 seconds, if I follow Varun's reasoning above properly; and just remove the new config 'hbase.lease.recovery.retry.interval' (we have enough configs already)?

Sounds like we are depending on WAL sizes being HDFS block sizes. This will not always be the case; we could go into a second block easily if a big edit comes in on the tail of the first block, and then there may be data loss (TBD) because we have a file size (so we think the file recovered?). Sounds also like we are relying on the file size being zero as a marker that the file is not yet closed (I suppose that is OK because an empty WAL will be 0 length, IIRC. We should doc our dependency though).

Varun, I like your low timeouts. Would you suggest we adjust HBase default timeouts down and recommend folks change their HDFS defaults if they want better MTTR? If you had a blog post on your nice work done in here, I could at least point the refguide at it for those interested in improved MTTR (smile).

HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch

We ran hbase 0.94.3 patched with HBASE-8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) It receives a lease recovery request and initiates recovery, choosing a primary datanode, every second. 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle. So we do: initiate_recovery -- commit_block -- commit_preempted_by_another_recovery. This loop is paused after 300 seconds, which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes, which is terrible. Our ZK session timeout is 30 seconds and the HDFS stale node detection timeout is 20 seconds. Note that before the patch, we did not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. Before the patch, we call recoverLease, assume that the block was recovered, and try to get the file; it has zero length since it's under recovery, so we fail the task and retry until we get a non-zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need a more generous timeout. The timeout needs to be configurable since typically the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is as fast as the HDFS timeouts. 2) We have another issue I report in HDFS
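The break-instead-of-loop behavior debated above can be sketched in isolation. This is an illustrative, self-contained sketch, not the actual patch: `recoverLease` below is a stand-in for the HDFS DistributedFileSystem.recoverLease call, and the pause and attempt-bound parameters are hypothetical names mirroring the 'hbase.lease.recovery.retry.interval' idea.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: call recoverLease, pause to give the NameNode time to
// finish the recovery, retry a bounded number of times, then break out and
// let the next split task issue a fresh recoverLease call.
public class LeaseRecoveryLoop {

    /**
     * @param recoverLease stand-in for DistributedFileSystem.recoverLease(path);
     *                     returns true once the NameNode reports the file closed
     * @param pauseMs      pause between attempts (the contested "4 seconds")
     * @param maxAttempts  bound on attempts instead of looping forever
     * @return true if the lease was recovered within maxAttempts
     */
    public static boolean recoverWithPause(BooleanSupplier recoverLease,
                                           long pauseMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (recoverLease.getAsBoolean()) {
                return true;                     // file is closed; safe to open
            }
            if (attempt < maxAttempts) {
                try {
                    TimeUnit.MILLISECONDS.sleep(pauseMs);  // pause before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;                            // break; next task retries
    }

    public static void main(String[] args) {
        // Simulate a recovery that succeeds on the third call.
        final int[] calls = {0};
        boolean ok = recoverWithPause(() -> ++calls[0] >= 3, 1L, 5);
        System.out.println("recovered=" + ok + " calls=" + calls[0]);  // recovered=true calls=3
    }
}
```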
[jira] [Updated] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path
[ https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Qiang updated HBASE-8445: -- Attachment: patch_20130426_01.txt

regionserver can't load an updated coprocessor jar with the same jar path - Key: HBASE-8445 URL: https://issues.apache.org/jira/browse/HBASE-8445 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Wang Qiang Attachments: patch_20130426_01.txt

When I update a coprocessor jar and then disable and enable the table with the coprocessor, the new features in the updated coprocessor jar do not take effect. Following into the class 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there is a coprocessor class loader cache whose key is the coprocessor jar path (although the key is a weak reference). So when I disable/enable the table, it gets a cached coprocessor class loader from the cache by the jar path and does not try to reload the coprocessor jar from HDFS. Here I give a patch in which I add extra info, the 'FileChecksum', to the coprocessor class loader cache; if the checksum has changed, the jar is reloaded from the HDFS path -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
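The patch's idea (cache the class loader per jar path, but rebuild the loader when the jar's checksum changes) can be sketched roughly like this. All names here are illustrative stand-ins, not the actual CoprocessorHost code, and an empty URLClassLoader stands in for a loader built over the real jar.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: key the class loader cache by jar path, remember the
// jar's checksum alongside it, and build a fresh loader when the checksum
// no longer matches (i.e. the jar at the same path was replaced).
public class ChecksumedLoaderCache {

    static final class Entry {
        final String checksum;
        final ClassLoader loader;
        Entry(String checksum, ClassLoader loader) {
            this.checksum = checksum;
            this.loader = loader;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns the cached loader unless the jar's checksum changed, in which case a fresh one is built. */
    public ClassLoader getClassLoader(String jarPath, String checksum) {
        Entry e = cache.get(jarPath);
        if (e == null || !e.checksum.equals(checksum)) {
            // First use, or the jar was replaced: reload it in a new loader.
            e = new Entry(checksum, new URLClassLoader(new URL[0]));
            cache.put(jarPath, e);
        }
        return e.loader;
    }

    public static void main(String[] args) {
        ChecksumedLoaderCache cache = new ChecksumedLoaderCache();
        ClassLoader a = cache.getClassLoader("/hbase/cp/demo.jar", "cksum-1");
        ClassLoader b = cache.getClassLoader("/hbase/cp/demo.jar", "cksum-1");
        ClassLoader c = cache.getClassLoader("/hbase/cp/demo.jar", "cksum-2");
        System.out.println("sameWhileUnchanged=" + (a == b)
                + " rebuiltOnChange=" + (a != c));  // true / true
    }
}
```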
[jira] [Created] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path
Wang Qiang created HBASE-8445: - Summary: regionserver can't load an updated coprocessor jar with the same jar path Key: HBASE-8445 URL: https://issues.apache.org/jira/browse/HBASE-8445 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Wang Qiang Attachments: patch_20130426_01.txt
[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
[ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642651#comment-13642651 ] rajeshbabu commented on HBASE-8422: --- [~stack], During master initialization, after initiating ROOT/META region assignment, we wait until the region is assigned successfully. In case of stop/shutdown we skip waiting for the ROOT/META assignment and return from initialization (with the patch). In trunk this case is already handled; that is why the 0.94 patch looks a bit different. {code} // Make sure meta assigned before proceeding. if (!assignMeta(status)) return; {code} {code} boolean assignMeta(MonitoredTask status) throws InterruptedException, IOException, KeeperException { ... enableSSHandWaitForMeta(); // Make sure a .META. location is set. if (!isMetaLocation()) return false; ... } {code} Otherwise there are multiple places where finishInitialization can hang on master shutdown if no region server is online. It will not impact normal cases.

Master won't go down. Stuck waiting on .META. to come on line. --- Key: HBASE-8422 URL: https://issues.apache.org/jira/browse/HBASE-8422 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: stack Assignee: rajeshbabu Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch

The master came up with no regionservers. I then tried to shut it down. You can see below that it started to go down: {code} 2013-04-24 14:28:49,770 INFO [IPC Server handler 7 on 6] org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested 2013-04-24 14:28:49,815 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 0, slept for 2818 ms, expecting minimum of 1, maximum of 2147483647, master is stopped. 
2013-04-24 14:28:49,815 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while splitting logs 2013-04-24 14:28:50,104 INFO [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor exiting 2013-04-24 14:28:50,850 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region location in ZooKeeper 2013-04-24 14:28:50,884 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/meta-region-server already deleted, retry=false 2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of .META.,,1.1028785192 2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup 2013-04-24 14:29:46,188 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting 2013-04-24 14:29:46,193 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner exiting {code} ... but now it is stuck. 
We keep looping here: {code} master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161) - locked 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299) at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905) at
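The shape of the fix (a wait that also honors the stop flag, instead of blocking on the .META. location until the timeout) could be sketched like this. Names and parameters are illustrative, not HBase's actual ZooKeeperNodeTracker code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a blockUntilAvailable-style wait that watches a
// 'stopped' flag, so a cluster shutdown can interrupt the wait for .META.
// rather than hanging the master thread.
public class StoppableWait {

    /**
     * Waits until {@code available} reports true, polling in small steps,
     * but bails out early when {@code stopped} is set.
     * @return true if the resource became available, false if stopped first
     */
    public static boolean blockUntilAvailable(BooleanSupplier available,
                                              AtomicBoolean stopped,
                                              long stepMs, int maxSteps) {
        for (int i = 0; i < maxSteps; i++) {
            if (stopped.get()) {
                return false;            // shutdown requested: stop waiting
            }
            if (available.getAsBoolean()) {
                return true;             // .META. location showed up
            }
            try {
                Thread.sleep(stepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                    // overall timeout
    }

    public static void main(String[] args) {
        AtomicBoolean stopped = new AtomicBoolean(true);   // shutdown in progress
        boolean got = blockUntilAvailable(() -> false, stopped, 1, 100);
        System.out.println("gotMeta=" + got);  // false: returned instead of hanging
    }
}
```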
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642669#comment-13642669 ] Nicolas Liochon commented on HBASE-8389: Varun, I +1 Stack. The timeout settings you mentioned are quite impressive! Thanks a lot for all this work. Here is my understanding; please correct me where I'm wrong.

I don't think that single vs. multiple blocks is an issue, even if it's better to have a single block (increased parallelism). HBase has a data loss risk: we need to wait for the end of recoverFileLease before reading, either by polling the NN and calling recoverFileLease multiple times, or by calling isFileClosed (HDFS-4525) (and polling as well) where it's available. I'm not sure that we can poll recoverFileLease every second. When I try, I get the same logs as Eric: java.io.IOException: The recovery id 2494 does not match current recovery id 2495 for block, and the state of the namenode seems strange.

In critical scenarios, the recoverFileLease won't happen at all. The probability is greatly decreased by HDFS-4721, but it's not zero. In critical scenarios, the recoverFileLease will start but will be stuck on bad datanodes. The probability is greatly decreased by HDFS-4721 and HDFS-4754, but it's not zero. Here, we need to limit the number of retries in HDFS to one, whatever the global setting, to be on the safe side (no HDFS jira for this).

I see a possible common implementation (trunk / hbase 0.94): - if HDFS-4754 is available, call markAsStale to be sure this datanode won't be used. - call recoverFileLease a first time. - if HDFS-4525 is available, call isFileClosed every second to detect that the recovery is done. - every 60s, call recoverFileLease again (either isFileClosed is missing, or we went into one of the bad scenarios above).

This would mean no data loss and an MTTR of: - less than a minute if we have stale mode + HDFS-4721 + HDFS-4754 + HDFS-4525 + no retry in HDFS recoverLease, or Varun's settings. - around 12 minutes if we have none of the above. But that's what we have already without the stale mode, imho. - in the middle if we have a subset of the above patches and config.

As HDFS-4721 seems validated by the HDFS dev team, I think that my only question is: can we poll recoverFileLease very frequently if we don't have isFileClosed? As a side note, tests more or less similar to yours with HBase trunk and HDFS branch-2 trunk (without your settings but with a hack to skip the dead nodes) bring similar results.

HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch
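The proposed common implementation (trigger recoverFileLease once, poll isFileClosed every second, re-trigger recoverFileLease every 60 seconds) could be sketched as follows. This is a deterministic toy model, not real HDFS client code: time is modeled as poll counts, and functional interfaces stand in for the NameNode calls.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the polling loop proposed in the thread: one initial
// recoverFileLease, an isFileClosed probe per poll, and a periodic re-trigger
// of recoverFileLease in case the first recovery got stuck.
public class RecoveryPoller {

    /**
     * @param recoverLease  stand-in for the recoverFileLease call to the NN
     * @param isFileClosed  stand-in for the HDFS-4525 isFileClosed probe
     * @param reinvokeEvery re-call recoverLease after this many polls (e.g. 60)
     * @param maxPolls      overall bound (e.g. the lease recovery timeout in seconds)
     * @return number of polls until the file closed, or -1 on timeout
     */
    public static int waitForClose(Runnable recoverLease, BooleanSupplier isFileClosed,
                                   int reinvokeEvery, int maxPolls) {
        for (int poll = 0; poll < maxPolls; poll++) {
            if (poll % reinvokeEvery == 0) {
                recoverLease.run();          // first call, then periodic re-trigger
            }
            if (isFileClosed.getAsBoolean()) {
                return poll;                 // safe to open the WAL now
            }
            // real code: sleep ~1s here before the next isFileClosed poll
        }
        return -1;
    }

    public static void main(String[] args) {
        final int[] recoveries = {0};
        final int[] polls = {0};
        // File closes on the 5th probe; recoverLease should fire only once.
        int closedAt = waitForClose(() -> recoveries[0]++,
                                    () -> ++polls[0] >= 5, 60, 300);
        System.out.println("closedAt=" + closedAt
                + " recoveries=" + recoveries[0]);  // closedAt=4 recoveries=1
    }
}
```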
[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
[ https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642677#comment-13642677 ] Hudson commented on HBASE-8392: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile (Revision 1475997) Result = FAILURE eclark : Files : * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile Key: HBASE-8392 URL: https://issues.apache.org/jira/browse/HBASE-8392 Project: HBase Issue Type: Sub-task Components: hadoop2, metrics, test Affects Versions: 0.98.0, 0.95.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8392-0.patch This specific small unit test flakes out occasionally and blocks the medium and large tests from running. 
Here's an error trace: {code} Error Message expected:2.0 but was:0.125 Stacktrace junit.framework.AssertionFailedError: expected:2.0 but was:0.125 at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:120) at junit.framework.Assert.assertEquals(Assert.java:129) at junit.framework.TestCase.assertEquals(TestCase.java:288) at org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} [~eclark] took a quick look and will chime in on this. 
[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable
[ https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642676#comment-13642676 ] Hudson commented on HBASE-8024: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8024 Make Store flush algorithm pluggable (Revision 1475871) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Make Store flush algorithm pluggable Key: HBASE-8024 URL: https://issues.apache.org/jira/browse/HBASE-8024 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.94.5, 0.95.0, 0.95.2 Reporter: Maryann Xue Assignee: Sergey Shelukhin Fix For: 0.95.1 Attachments: HBASE-8024-trunk.patch, HBASE-8024.v2.patch, 
HBASE-8024-v3.patch, HBASE-8024-v4.patch The idea is to make StoreFlusher an interface instead of an implementation class, and have the original StoreFlusher as the default store flush impl.
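The refactor's shape might look roughly like this. The types below are simplified stand-ins for HBase's StoreFlusher/DefaultStoreFlusher (not the actual classes), with the implementation chosen by a configured class name, as a pluggable store engine typically does:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: StoreFlusher becomes an interface, the old behavior
// moves into DefaultStoreFlusher, and the store instantiates whichever
// implementation the configuration names.
public class PluggableFlushDemo {

    public interface StoreFlusher {
        List<String> flushSnapshot(List<String> snapshot);
    }

    /** The original flush logic, now just the default implementation. */
    public static class DefaultStoreFlusher implements StoreFlusher {
        @Override
        public List<String> flushSnapshot(List<String> snapshot) {
            return new ArrayList<>(snapshot);   // "write" every cell to one new file
        }
    }

    /** Mirrors loading the flusher class named in configuration. */
    public static StoreFlusher createFlusher(String className) {
        try {
            return (StoreFlusher) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new RuntimeException("cannot instantiate flusher " + className, e);
        }
    }

    public static void main(String[] args) {
        // The "config value" here is just the default implementation's name.
        StoreFlusher flusher = createFlusher(DefaultStoreFlusher.class.getName());
        System.out.println(flusher.flushSnapshot(List.of("k1", "k2")));  // [k1, k2]
    }
}
```

Swapping the flush algorithm then only requires pointing the configured class name at another StoreFlusher implementation, with no change to the store itself.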
[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong
[ https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642678#comment-13642678 ] Hudson commented on HBASE-8393: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 1476024) Result = FAILURE tedyu : Files : * /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java

Testcase TestHeapSize#testMutations is wrong Key: HBASE-8393 URL: https://issues.apache.org/jira/browse/HBASE-8393 Project: HBase Issue Type: Bug Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.1 Attachments: hbase-8393.patch

I happened to check this test case, and there are several errors in it that currently let it pass. You can reproduce a test failure by adding a new field to Mutation; the test case will then fail on either a 64-bit or a 32-bit system. Below are the errors I found in the test case:

1) The test case uses {code}row=new byte[]{0}{code}, which is an array with length=1, while ClassSize.estimateBase can only calculate the base class size (without counting the field array length).

2) ClassSize.REFERENCE is added twice in the following code, because ClassSize.estimateBase already adds all reference fields: {code}expected += ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}

3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns the sum of the lengths of a different set of fields. Therefore there are round-up differences for class Increment, which introduces a new reference field TimeRange tr, when the test case runs on a 32-bit vs. a 64-bit system. {code}
...
long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE;
// Round up to a multiple of 8
long size = align(prealign_size);
...
{code}
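The round-up difference in point 3) comes from where the align-to-8 happens: aligning each group of field sizes separately can give a different total than aligning one combined sum of the same terms. A tiny illustration, using hypothetical field sizes (not HBase's actual ClassSize coefficients):

```java
// Illustration of 8-byte alignment arithmetic: rounding grouped sums
// separately versus rounding one combined sum of the same terms.
public class AlignDemo {

    /** Round up to a multiple of 8, as ClassSize.align does. */
    public static long align(long num) {
        return (num + 7) & ~7L;
    }

    public static void main(String[] args) {
        // Two groupings of the same (hypothetical) field sizes:
        long fields = 12;   // existing references + primitives
        long newRef = 4;    // e.g. an added reference field on a 32-bit JVM
        long together = align(fields + newRef);           // align(16) = 16
        long separately = align(fields) + align(newRef);  // 16 + 8 = 24
        System.out.println(together + " vs " + separately);  // 16 vs 24
    }
}
```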
[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
[ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642679#comment-13642679 ] Hudson commented on HBASE-8422: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line (Revision 1475987) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java Master won't go down. Stuck waiting on .META. to come on line. --- Key: HBASE-8422 URL: https://issues.apache.org/jira/browse/HBASE-8422 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: stack Assignee: rajeshbabu Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch
We keep looping here: {code} master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161) - locked 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299) at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522) at java.lang.Thread.run(Thread.java:722) {code} Odd. It is supposed to be checking the 'stopped' flag; maybe it has the wrong stop flag.
[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin
[ https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642680#comment-13642680 ] Hudson commented on HBASE-8345: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8345 Add all available resources in RootResource and VersionResource to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476027) Result = FAILURE jxiang : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin --- Key: HBASE-8345 URL: https://issues.apache.org/jira/browse/HBASE-8345 Project: HBase Issue Type: Improvement Components: Client, REST Affects Versions: 0.94.6.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Labels: rest_api Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, HBASE-8345-v6-trunk.patch In our built-in REST clients, we should add more of the available REST resources. This will allow more thorough testing of the REST API, particularly with IntegrationTest. These clients are located in the o.a.h.h.rest.client package. In this case, I want to add the resources not already included in / and /version to o.a.h.h.rest.client.RemoteAdmin. This includes /status/cluster, /version/rest and /version/cluster, among others. The RemoteAdmin class is a logical place for these methods because it is not related to a specific table (those methods should go into RemoteHTable).
[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642681#comment-13642681 ] Hudson commented on HBASE-8415: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8415 DisabledRegionSplitPolicy (Revision 1475944) Result = FAILURE enis : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java DisabledRegionSplitPolicy - Key: HBASE-8415 URL: https://issues.apache.org/jira/browse/HBASE-8415 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8415_v1.patch A simple RegionSplitPolicy for tests and for some special cases where we want to disable splits. It is easier and more explicit than using a ConstantSizeRegionSplitPolicy with a large region size.
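What such a policy amounts to can be sketched with simplified stand-in types (not HBase's actual class hierarchy): a policy whose shouldSplit always answers false, versus the constant-size policy it replaces for this use case.

```java
// Hypothetical sketch of the "disabled" split policy idea: never split,
// regardless of region size, instead of faking it with a huge size limit.
public class SplitPolicyDemo {

    public abstract static class RegionSplitPolicy {
        public abstract boolean shouldSplit(long regionSizeBytes);
    }

    /** Splits once the region outgrows a fixed size (the old workaround's base). */
    public static class ConstantSizeRegionSplitPolicy extends RegionSplitPolicy {
        private final long maxSize;
        public ConstantSizeRegionSplitPolicy(long maxSize) { this.maxSize = maxSize; }
        @Override public boolean shouldSplit(long regionSizeBytes) {
            return regionSizeBytes > maxSize;
        }
    }

    /** Never split, regardless of region size. */
    public static class DisabledRegionSplitPolicy extends RegionSplitPolicy {
        @Override public boolean shouldSplit(long regionSizeBytes) { return false; }
    }

    public static void main(String[] args) {
        RegionSplitPolicy disabled = new DisabledRegionSplitPolicy();
        System.out.println(disabled.shouldSplit(Long.MAX_VALUE));  // false
    }
}
```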
[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.
[ https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642682#comment-13642682 ] Hudson commented on HBASE-8299: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 1475965) Result = FAILURE eclark : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java * 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java ExploringCompactionPolicy can get stuck in rare cases. -- Key: HBASE-8299 URL: https://issues.apache.org/jira/browse/HBASE-8299 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, HBASE-8299-2.patch, HBASE-8299-3.patch If the files are very oddly sized then it's possible that ExploringCompactionPolicy can get stuck. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
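The "exploring" idea can be sketched as follows. This is a simplified, hypothetical model, not the real HBase implementation (which also honors off-peak ratios and size limits): scan every contiguous window of store files, keep windows whose files satisfy the compaction ratio rule, and prefer the widest valid window. When no window qualifies, the selection is empty, which is what the policy must return rather than re-proposing the same stuck selection forever.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of exploring-style compaction selection (illustrative only).
public class ExploringSelection {
    public static List<Long> select(long[] sizes, int minFiles, int maxFiles, double ratio) {
        List<Long> best = new ArrayList<>();
        long bestSize = Long.MAX_VALUE;
        for (int start = 0; start < sizes.length; start++) {
            for (int end = start + minFiles - 1;
                 end < sizes.length && end < start + maxFiles; end++) {
                long total = 0;
                for (int i = start; i <= end; i++) total += sizes[i];
                boolean valid = true;
                for (int i = start; i <= end; i++) {
                    // ratio rule: no file may dwarf the rest of its window
                    if (sizes[i] > ratio * (total - sizes[i])) { valid = false; break; }
                }
                int count = end - start + 1;
                // prefer more files; break ties by smaller total I/O
                if (valid && (count > best.size()
                              || (count == best.size() && total < bestSize))) {
                    best = new ArrayList<>();
                    for (int i = start; i <= end; i++) best.add(sizes[i]);
                    bestSize = total;
                }
            }
        }
        return best; // empty list => nothing compactable, so selection terminates
    }

    public static void main(String[] args) {
        // one 100-unit outlier among small files: the outlier is skipped
        System.out.println(select(new long[]{100, 3, 2, 2, 1}, 3, 5, 1.2));
    }
}
```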
[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter
[ https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642683#comment-13642683 ] Hudson commented on HBASE-8428: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475995) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java * /hbase/branches/0.95/src/main/docbkx/developer.xml Tighten up IntegrationTestsDriver filter Key: HBASE-8428 URL: https://issues.apache.org/jira/browse/HBASE-8428 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8428.txt Currently, the filter that looks for IntegrationTests is broad. It reports loads of errors as we try to parse classes we don't care about. Let me tighten it up so it doesn't scare folks away. It is particularly bad when run against a distributed cluster where the test context is not all present; there are lots of ERROR reports about classes not found.
[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.
[ https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642684#comment-13642684 ] Hudson commented on HBASE-5930: --- Integrated in hbase-0.95 #163 (See [https://builds.apache.org/job/hbase-0.95/163/]) HBASE-5930. Removed a configuration that was causing unnecessary flushes in tests. (Revision 1475991) HBASE-5930. Limits the amount of time an edit can live in the memstore. (Revision 1475874) Result = FAILURE ddas : Files : * /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml ddas : Files : * /hbase/branches/0.95/hbase-common/src/main/resources/hbase-default.xml * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Limits the amount of time an edit can live in the memstore. 
--- Key: HBASE-5930 URL: https://issues.apache.org/jira/browse/HBASE-5930 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Assignee: Devaraj Das Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, hbase-5930-test-execution.log A colleague of mine ran into an interesting issue. He inserted some data with the WAL disabled, which happened to fit in the aggregate Memstores memory. Two weeks later he had a problem with the HDFS cluster, which caused the region servers to abort. He found that his data was lost. Looking at the log we found that the Memstores were not flushed at all during these two weeks. Should we have an option to flush memstores periodically? There are obvious downsides to this, like many small storefiles, etc.
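A minimal sketch of the mechanism this issue adds (hypothetical names; the real change spans FlushRequester, MemStore, and MemStoreFlusher): track the timestamp of the oldest unflushed edit and request a flush once its age exceeds a configured bound, so WAL-disabled edits cannot sit in memory indefinitely.

```java
// Hypothetical, simplified periodic-flush check (illustrative names only).
public class PeriodicFlushCheck {
    private long oldestUnflushedEditMs = -1; // -1 => memstore is empty

    /** Record the first edit after a flush; later edits are necessarily newer. */
    public void onEdit(long nowMs) {
        if (oldestUnflushedEditMs < 0) oldestUnflushedEditMs = nowMs;
    }

    /** A flush empties the memstore, so there is no pending edit to age. */
    public void onFlush() {
        oldestUnflushedEditMs = -1;
    }

    /** True when the oldest edit has lived in the memstore longer than maxAgeMs. */
    public boolean shouldFlush(long nowMs, long maxAgeMs) {
        return oldestUnflushedEditMs >= 0 && nowMs - oldestUnflushedEditMs > maxAgeMs;
    }

    public static void main(String[] args) {
        PeriodicFlushCheck c = new PeriodicFlushCheck();
        c.onEdit(0);
        System.out.println(c.shouldFlush(3_600_000, 3_000_000)); // true: edit is 1h old
        c.onFlush();
        System.out.println(c.shouldFlush(3_600_000, 3_000_000)); // false: nothing pending
    }
}
```

A periodic chore calling shouldFlush per region, rather than a check on every write, keeps the hot path untouched; the downside noted above (many small storefiles) follows from flushing on age instead of size.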
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642685#comment-13642685 ] Nicolas Liochon commented on HBASE-6435: During the tests on the impact of waiting for the end of hdfs recoverLease, it appeared:
- there is a bug, and some files are not detected.
- we have a dependency on the machine name (issue if a machine has multiple names).
HDFS-4754 supersedes this, so, to keep things simple and limit the number of possible configurations, my plan is:
- make sure that HDFS-4754 makes it to a reasonable number of hdfs branches.
- revert this.
Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.95.0 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written on hdfs. Through ZooKeeper, HBase gets informed, usually within 30s, that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. Since HDFS only marks a node as dead after ~10 minutes, it still appears as available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds as the read will usually fail with a socket timeout.
If the file is still opened for writing, it adds an extra 20s + a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are:
- shorter dead datanode detection by the NN. Requires a NN code change.
- better dead datanode management in the DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local one.
- reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround.
The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy. This is for two reasons:
- Some HDFS functions managing block orders are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean.
- Adding a proxy allows us to put all the code in HBase, simplifying dependency management.
Nevertheless, it would be better to have this in HDFS. But an HDFS-side solution would only target the latest version, and that could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better solution long term.
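The retained solution (client-side reordering) can be sketched like this, with plain strings standing in for HDFS DatanodeInfo (illustrative only, not the actual proxy code): push the replica colocated with the dead regionserver to the end of the location list, so the client tries live datanodes first instead of burning a socket timeout.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of deprioritizing the dead RS's colocated datanode.
public class BlockReorder {
    public static List<String> deprioritize(List<String> replicaHosts, String deadHost) {
        List<String> reordered = new ArrayList<>(replicaHosts);
        // stable sort: live hosts keep their relative order, the dead host sinks last
        reordered.sort(Comparator.comparingInt((String h) -> h.equals(deadHost) ? 1 : 0));
        return reordered;
    }

    public static void main(String[] args) {
        System.out.println(deprioritize(List.of("dn1", "deadRS", "dn3"), "deadRS"));
        // the dead host is now the last resort, not the first read attempt
    }
}
```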
[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642741#comment-13642741 ] Hudson commented on HBASE-8415: --- Integrated in HBase-0.94 #968 (See [https://builds.apache.org/job/HBase-0.94/968/]) HBASE-8415 DisabledRegionSplitPolicy (Revision 1475946) Result = SUCCESS enis : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java DisabledRegionSplitPolicy - Key: HBASE-8415 URL: https://issues.apache.org/jira/browse/HBASE-8415 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8415_v1.patch Simple RegionSplitPolicy for tests, and some special cases where we want to disable splits. Makes it easier and more explicit than using a ConstantSizeRegionSplitPolicy with a large region size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin
[ https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642740#comment-13642740 ] Hudson commented on HBASE-8345: --- Integrated in HBase-0.94 #968 (See [https://builds.apache.org/job/HBase-0.94/968/]) HBASE-8345 Add all available resources in RootResource and VersionResource to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476028) Result = SUCCESS jxiang : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin --- Key: HBASE-8345 URL: https://issues.apache.org/jira/browse/HBASE-8345 Project: HBase Issue Type: Improvement Components: Client, REST Affects Versions: 0.94.6.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Labels: rest_api Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, HBASE-8345-v6-trunk.patch In our built-in REST clients, we should add in more of the available REST resources. This will allow more thorough testing of the REST API, particularly with IntegrationTest. These clients are located in the o.a.h.h.rest.client package. In this case, I want to add the resources not already included in / and /version to o.a.h.h.rest.client.RemoteAdmin. This includes, /status/cluster, /version/rest and /version/cluster, among others. The RemoteAdmin class is a logical place for these methods because it is not related to a specific table (those methods should go into RemoteHTable). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8432) a table with unbalanced regions will balance indefinitely with the 'org.apache.hadoop.hbase.master.DefaultLoadBalancer'
[ https://issues.apache.org/jira/browse/HBASE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642785#comment-13642785 ] Jean-Marc Spaggiari commented on HBASE-8432: Thanks for the follow-up [~aaronwq]. Have you also tried the other scenarios? Like regions# RS#/2 and regions# RS#? Are they all still working fine? a table with unbalanced regions will balance indefinitely with the 'org.apache.hadoop.hbase.master.DefaultLoadBalancer' --- Key: HBASE-8432 URL: https://issues.apache.org/jira/browse/HBASE-8432 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 0.94.5 Environment: Linux 2.6.32-el5.x86_64 Reporter: Wang Qiang Priority: Critical Attachments: patch_20130425_01.txt It happened that a table had unbalanced regions, as follows, in my cluster (the cluster has 20 regionservers, the table has 12 regions):
http://hadoopdev19.cm6:60030/ 1
http://hadoopdev8.cm6:60030/ 2
http://hadoopdev17.cm6:60030/ 1
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev5.cm6:60030/ 1
http://hadoopdev9.cm6:60030/ 1
http://hadoopdev22.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev21.cm6:60030/ 1
http://hadoopdev16.cm6:60030/ 1
http://hadoopdev10.cm6:60030/ 1
With the 'org.apache.hadoop.hbase.master.DefaultLoadBalancer', after 5 rounds of load balancing, the table is still unbalanced:
http://hadoopdev3.cm6:60030/ 1
http://hadoopdev20.cm6:60030/ 1
http://hadoopdev4.cm6:60030/ 2
http://hadoopdev18.cm6:60030/ 1
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev14.cm6:60030/ 1
http://hadoopdev15.cm6:60030/ 1
http://hadoopdev6.cm6:60030/ 1
http://hadoopdev13.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev10.cm6:60030/ 1
http://hadoopdev19.cm6:60030/ 1
http://hadoopdev17.cm6:60030/ 1
http://hadoopdev8.cm6:60030/ 1
http://hadoopdev5.cm6:60030/ 1
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev22.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev21.cm6:60030/ 1
http://hadoopdev7.cm6:60030/ 2
http://hadoopdev10.cm6:60030/ 1
http://hadoopdev16.cm6:60030/ 1
http://hadoopdev3.cm6:60030/ 1
http://hadoopdev20.cm6:60030/ 1
http://hadoopdev4.cm6:60030/ 1
http://hadoopdev18.cm6:60030/ 2
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev14.cm6:60030/ 1
http://hadoopdev15.cm6:60030/ 1
http://hadoopdev6.cm6:60030/ 1
http://hadoopdev13.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev10.cm6:60030/ 1
http://hadoopdev19.cm6:60030/ 1
http://hadoopdev8.cm6:60030/ 1
http://hadoopdev17.cm6:60030/ 1
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev5.cm6:60030/ 1
http://hadoopdev22.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev7.cm6:60030/ 1
http://hadoopdev21.cm6:60030/ 2
http://hadoopdev16.cm6:60030/ 1
http://hadoopdev10.cm6:60030/ 1
http://hadoopdev3.cm6:60030/ 1
http://hadoopdev20.cm6:60030/ 1
http://hadoopdev18.cm6:60030/ 1
http://hadoopdev4.cm6:60030/ 1
http://hadoopdev12.cm6:60030/ 1
http://hadoopdev15.cm6:60030/ 1
http://hadoopdev14.cm6:60030/ 2
http://hadoopdev6.cm6:60030/ 1
http://hadoopdev13.cm6:60030/ 1
http://hadoopdev11.cm6:60030/ 1
http://hadoopdev10.cm6:60030/ 1
From the above logs, we can also see that some regions that needn't move were moved. Following 'org.apache.hadoop.hbase.master.DefaultLoadBalancer.balanceCluster()', I found that 'maxToTake' is calculated incorrectly.
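For context on why the balancer should have terminated, here is a hedged sketch of the expected steady state (illustrative names, not DefaultLoadBalancer's actual code): with R regions over S servers, every server should end up holding between floor(R/S) and ceil(R/S) regions, and a correct plan makes no move once all servers are inside those bounds.

```java
// Illustrative balance-target math; a miscalculated bound (like the reported
// 'maxToTake') lets the balancer move regions between already-balanced servers
// forever instead of converging.
public class BalanceBounds {
    public static int minPerServer(int regions, int servers) {
        return regions / servers; // floor
    }

    public static int maxPerServer(int regions, int servers) {
        return (regions + servers - 1) / servers; // ceiling division
    }

    /** True when every server's load is inside [floor(R/S), ceil(R/S)]. */
    public static boolean balanced(int[] load, int regions) {
        int min = minPerServer(regions, load.length);
        int max = maxPerServer(regions, load.length);
        for (int l : load) {
            if (l < min || l > max) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 12 regions on 20 servers: every server should host 0 or 1 region
        System.out.println(minPerServer(12, 20) + ".." + maxPerServer(12, 20)); // 0..1
        // a snapshot with a server holding 2 regions is NOT balanced yet
        int[] load = {2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(balanced(load, 12)); // false
    }
}
```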
[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable
[ https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642790#comment-13642790 ] Hudson commented on HBASE-8024: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8024 Make Store flush algorithm pluggable (Revision 1475871) Result = FAILURE sershe : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Make Store flush algorithm pluggable Key: HBASE-8024 URL: https://issues.apache.org/jira/browse/HBASE-8024 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.94.5, 0.95.0, 0.95.2 Reporter: Maryann Xue Assignee: Sergey Shelukhin Fix For: 0.95.1 Attachments: HBASE-8024-trunk.patch, 
HBASE-8024.v2.patch, HBASE-8024-v3.patch, HBASE-8024-v4.patch The idea is to make StoreFlusher an interface instead of an implementation class, and have the original StoreFlusher as the default store flush impl.
[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
[ https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642791#comment-13642791 ] Hudson commented on HBASE-8392: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile (Revision 1475997) Result = FAILURE eclark : Files : * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile Key: HBASE-8392 URL: https://issues.apache.org/jira/browse/HBASE-8392 Project: HBase Issue Type: Sub-task Components: hadoop2, metrics, test Affects Versions: 0.98.0, 0.95.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8392-0.patch This specific small unit test flakes out occasionally and blocks the medium and large tests from running.
Here's an error trace: {code}
Error Message
expected:2.0 but was:0.125
Stacktrace
junit.framework.AssertionFailedError: expected:2.0 but was:0.125
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.failNotEquals(Assert.java:329)
at junit.framework.Assert.assertEquals(Assert.java:120)
at junit.framework.Assert.assertEquals(Assert.java:129)
at junit.framework.TestCase.assertEquals(TestCase.java:288)
at org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:176)
at junit.framework.TestCase.runBare(TestCase.java:141)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:129)
at junit.framework.TestSuite.runTest(TestSuite.java:255)
at junit.framework.TestSuite.run(TestSuite.java:250)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code} [~eclark] took a quick look and will chime in on this.
[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong
[ https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642792#comment-13642792 ] Hudson commented on HBASE-8393: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 1476024) Result = FAILURE tedyu : Files : * /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java Testcase TestHeapSize#testMutations is wrong Key: HBASE-8393 URL: https://issues.apache.org/jira/browse/HBASE-8393 Project: HBase Issue Type: Bug Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.1 Attachments: hbase-8393.patch I happened to check this test case and found several existing errors that let it pass. You can reproduce the test case failure by adding a new field into Mutation; the test case will then fail on either a 64-bit or a 32-bit system. Below are the errors I found in the test case:
1) The test case is using {code}row=new byte[]{0}{code} which is an array with length=1, while ClassSize.estimateBase can only calculate base class size (without counting field array length).
2) ClassSize.REFERENCE is added twice in the following code, because ClassSize.estimateBase already adds all reference fields: {code}expected += ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + primitive fields + Array, while Mutation.MUTATION_OVERHEAD aligns the sum of the lengths of a different set of fields. Therefore, there will be round-up differences for class Increment, because it introduces a new reference field TimeRange tr, when the test case runs on a 32-bit versus a 64-bit system. {code} ... 
long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE;
// Round up to a multiple of 8
long size = align(prealign_size);
...
{code}
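The round-up difference described in 3) comes from 8-byte alignment. A small self-contained illustration (not HBase's ClassSize itself): whether one extra 4-byte reference changes the aligned total depends on where the pre-aligned sum falls relative to the next multiple of 8.

```java
// Minimal demonstration of 8-byte alignment in heap-size estimates.
public class AlignDemo {
    public static long align(long size) {
        return (size + 7) & ~7L; // round up to a multiple of 8
    }

    public static void main(String[] args) {
        // hypothetical 64-bit JVM with compressed oops: a reference is 4 bytes
        long base = 12 + 3 * 4;                       // header + 3 references = 24
        System.out.println(align(base));              // 24 -- already aligned
        System.out.println(align(base + 4));          // 32 -- one more reference bumps it
        System.out.println(align(base + 4) - align(base)); // 8: the round-up difference
    }
}
```

Aligning each summand separately versus aligning the final sum can therefore disagree by up to 7 bytes, which is exactly the kind of mismatch the test tripped over.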
[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
[ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642793#comment-13642793 ] Hudson commented on HBASE-8422: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line (Revision 1475987) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java Master won't go down. Stuck waiting on .META. to come on line. --- Key: HBASE-8422 URL: https://issues.apache.org/jira/browse/HBASE-8422 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: stack Assignee: rajeshbabu Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch Master came up w/ no regionservers. I then tried to shut it down. You can see below that it started to go down: {code}
2013-04-24 14:28:49,770 INFO [IPC Server handler 7 on 6] org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
2013-04-24 14:28:49,815 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 0, slept for 2818 ms, expecting minimum of 1, maximum of 2147483647, master is stopped.
2013-04-24 14:28:49,815 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while splitting logs
2013-04-24 14:28:50,104 INFO [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor exiting
2013-04-24 14:28:50,850 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region location in ZooKeeper
2013-04-24 14:28:50,884 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/meta-region-server already deleted, retry=false
2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of .META.,,1.1028785192
2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup
2013-04-24 14:29:46,188 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting
2013-04-24 14:29:46,193 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner exiting
{code} ... but now it is stuck. We keep looping here: {code}
master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
- locked 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
at java.lang.Thread.run(Thread.java:722)
{code} Odd. It is supposed to be checking the 'stopped' flag; maybe it is checking the wrong stop flag.
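A hedged sketch of the fix direction stack describes (illustrative, not ZooKeeperNodeTracker's actual code): a blockUntilAvailable-style wait must re-check the server's 'stopped' flag every time it wakes, and stop() must notify the waiter, otherwise a shutdown request can never interrupt the wait for .META.

```java
// Illustrative stop-aware wait loop (hypothetical names, plain Object monitor).
public class StoppableWait {
    private final Object lock = new Object();
    private volatile boolean stopped = false;
    private volatile Object resource = null;

    /** Returns the resource, or null if stopped or timed out. */
    public Object blockUntilAvailable(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            // re-checking 'stopped' on every wakeup is the whole point
            while (resource == null && !stopped) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) break;
                try {
                    lock.wait(Math.min(remaining, 100));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return resource;
        }
    }

    public void setAvailable(Object r) {
        synchronized (lock) {
            resource = r;
            lock.notifyAll();
        }
    }

    public void stop() {
        stopped = true;
        synchronized (lock) {
            lock.notifyAll(); // wake any waiter so it can observe 'stopped'
        }
    }

    public static void main(String[] args) {
        StoppableWait w = new StoppableWait();
        w.stop(); // a shutdown request arrives before the resource ever shows up
        System.out.println(w.blockUntilAvailable(60_000)); // null, returns immediately
    }
}
```

The bug report suggests the real loop was consulting a different stop flag than the one the shutdown request set, which has the same effect as not checking one at all.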
[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin
[ https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642794#comment-13642794 ] Hudson commented on HBASE-8345: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8345 Add all available resources in RootResource and VersionResource to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476027) Result = FAILURE jxiang : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin --- Key: HBASE-8345 URL: https://issues.apache.org/jira/browse/HBASE-8345 Project: HBase Issue Type: Improvement Components: Client, REST Affects Versions: 0.94.6.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Labels: rest_api Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, HBASE-8345-v6-trunk.patch In our built-in REST clients, we should add in more of the available REST resources. This will allow more thorough testing of the REST API, particularly with IntegrationTest. These clients are located in the o.a.h.h.rest.client package. In this case, I want to add the resources not already included in / and /version to o.a.h.h.rest.client.RemoteAdmin. This includes, /status/cluster, /version/rest and /version/cluster, among others. The RemoteAdmin class is a logical place for these methods because it is not related to a specific table (those methods should go into RemoteHTable). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642795#comment-13642795 ] Hudson commented on HBASE-8415: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8415 DisabledRegionSplitPolicy (Revision 1475944) Result = FAILURE enis : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java DisabledRegionSplitPolicy - Key: HBASE-8415 URL: https://issues.apache.org/jira/browse/HBASE-8415 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8415_v1.patch Simple RegionSplitPolicy for tests, and some special cases where we want to disable splits. Makes it easier and more explicit than using a ConstantSizeRegionSplitPolicy with a large region size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
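The policy's decision logic is as simple as it sounds: never split. A minimal standalone sketch of the idea; the real DisabledRegionSplitPolicy plugs into HBase's RegionSplitPolicy hierarchy, so the names below are only illustrative:

```java
public class DisabledSplitSketch {
    // A split policy reduced to its decision method. The real class
    // extends HBase's RegionSplitPolicy; this interface stands in for it.
    interface SplitPolicy {
        boolean shouldSplit(long regionSizeBytes);
    }

    // "Disabled" means the answer is false regardless of region size,
    // which is more explicit than configuring a huge constant size.
    static final SplitPolicy DISABLED = regionSizeBytes -> false;

    public static void main(String[] args) {
        // Even an enormous region is never split under this policy.
        System.out.println(DISABLED.shouldSplit(Long.MAX_VALUE)); // false
    }
}
```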
[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.
[ https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642796#comment-13642796 ] Hudson commented on HBASE-8299: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 1475965) Result = FAILURE eclark : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java * 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java ExploringCompactionPolicy can get stuck in rare cases. -- Key: HBASE-8299 URL: https://issues.apache.org/jira/browse/HBASE-8299 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, HBASE-8299-2.patch, HBASE-8299-3.patch If the files are very oddly sized then it's possible that ExploringCompactionPolicy can get stuck. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter
[ https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642797#comment-13642797 ] Hudson commented on HBASE-8428: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475995) Result = FAILURE stack : Files : * /hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java * /hbase/branches/0.95/src/main/docbkx/developer.xml Tighten up IntegrationTestsDriver filter Key: HBASE-8428 URL: https://issues.apache.org/jira/browse/HBASE-8428 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8428.txt Currently, the filter that looks for IntegrationTests is broad. It reports loads of errors as we try to parse classes we don't care about. Let me tighten it up so it doesn't scare folks away. It is particularly bad when run against a distributed cluster where the test context is not all present; there are lots of ERROR reports about classes not found. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.
[ https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642798#comment-13642798 ] Hudson commented on HBASE-5930: --- Integrated in hbase-0.95-on-hadoop2 #81 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/]) HBASE-5930. Removed a configuration that was causing unnecessary flushes in tests. (Revision 1475991) HBASE-5930. Limits the amount of time an edit can live in the memstore. (Revision 1475874) Result = FAILURE ddas : Files : * /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml ddas : Files : * /hbase/branches/0.95/hbase-common/src/main/resources/hbase-default.xml * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Limits the amount of time an edit can live in the memstore. 
--- Key: HBASE-5930 URL: https://issues.apache.org/jira/browse/HBASE-5930 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Assignee: Devaraj Das Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, hbase-5930-test-execution.log A colleague of mine ran into an interesting issue. He inserted some data with the WAL disabled, which happened to fit in the aggregate Memstores' memory. Two weeks later he had a problem with the HDFS cluster, which caused the region servers to abort. He found that his data was lost. Looking at the log we found that the Memstores were not flushed at all during these two weeks. Should we have an option to flush memstores periodically? There are obvious downsides to this, like many small storefiles, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
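The periodic-flush option discussed in this issue amounts to comparing the age of the memstore's oldest edit against a configured maximum. A hedged sketch of that check, with names that are illustrative rather than the actual HBASE-5930 API:

```java
public class PeriodicFlushSketch {
    // Decide whether a memstore should be flushed because its oldest
    // edit has lived longer than a configured maximum age. These names
    // are hypothetical; the real patch wires this into MemStoreFlusher.
    static boolean shouldFlush(long oldestEditTimestampMs, long nowMs, long maxEditAgeMs) {
        return nowMs - oldestEditTimestampMs > maxEditAgeMs;
    }

    public static void main(String[] args) {
        long oneHour = 3600_000L;
        // An edit written two hours ago, against a one-hour limit: flush.
        System.out.println(shouldFlush(0L, 2 * oneHour, oneHour)); // true
        // A fresh edit: no flush needed yet.
        System.out.println(shouldFlush(0L, oneHour / 2, oneHour)); // false
    }
}
```

The trade-off noted in the issue (many small storefiles) is why the limit must be generous, not a tight interval.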
[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong
[ https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642826#comment-13642826 ] Hudson commented on HBASE-8393: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 1476022) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java Testcase TestHeapSize#testMutations is wrong Key: HBASE-8393 URL: https://issues.apache.org/jira/browse/HBASE-8393 Project: HBase Issue Type: Bug Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.1 Attachments: hbase-8393.patch I happened to check this test case and there are several existing errors that make it pass. You can reproduce the test case failure by adding a new field into Mutation; the test case will then fail on either a 64-bit or a 32-bit system. Below are the errors I found in the test case: 1) The test case is using {code}row=new byte[]{0}{code} which is an array with length=1, while ClassSize.estimateBase can only calculate the base class size (without counting the field array's length). 2) The following code adds ClassSize.REFERENCE twice, because ClassSize.estimateBase already adds all reference fields. {code}expected += ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code} 3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns the sum of the lengths of a different set of fields. Therefore, there will be round-up differences for class Increment, because it introduces a new reference field TimeRange tr, when the test case runs on 32-bit and 64-bit systems. {code} ... 
long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE; // Round up to a multiple of 8 long size = align(prealign_size); ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
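The round-up behavior quoted above explains why the two size computations can drift apart: sums that differ before alignment may land on the same multiple of 8 or straddle a boundary. A standalone sketch of the alignment step (a hypothetical copy of what the quoted align() call does):

```java
public class AlignSketch {
    // Round a byte count up to the nearest multiple of 8, the usual
    // JVM object alignment; mirrors the align() call in the quoted code.
    static long align(long num) {
        return (num + 7) & ~7L;
    }

    public static void main(String[] args) {
        // 20 and 24 land on the same aligned size; 25 crosses a boundary,
        // which is exactly the kind of round-up difference the test hits.
        System.out.println(align(20)); // 24
        System.out.println(align(24)); // 24
        System.out.println(align(25)); // 32
    }
}
```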
[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable
[ https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642824#comment-13642824 ] Hudson commented on HBASE-8024: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8024 Make Store flush algorithm pluggable (Revision 1475870) Result = FAILURE sershe : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java Make Store flush algorithm pluggable Key: HBASE-8024 URL: https://issues.apache.org/jira/browse/HBASE-8024 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.94.5, 0.95.0, 0.95.2 Reporter: Maryann Xue Assignee: Sergey Shelukhin Fix For: 0.95.1 Attachments: HBASE-8024-trunk.patch, HBASE-8024.v2.patch, HBASE-8024-v3.patch, HBASE-8024-v4.patch The idea is to make StoreFlusher an 
interface instead of an implementation class, and have the original StoreFlusher as the default store flush impl. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
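The refactor described in this issue, turning StoreFlusher into an interface with the old behavior as the default implementation, is a standard pluggable-strategy pattern. A minimal standalone sketch, with all names illustrative rather than HBase's actual classes:

```java
public class PluggableFlusherSketch {
    // StoreFlusher as an interface; the previous concrete behavior
    // moves into a default implementation. Names are illustrative.
    interface StoreFlusher {
        String flush();
    }

    static class DefaultStoreFlusher implements StoreFlusher {
        public String flush() {
            return "default-flush";
        }
    }

    // Instantiate the configured flusher class reflectively, falling
    // back to the default when no class name is configured.
    static StoreFlusher create(String configuredClassName) {
        if (configuredClassName == null) {
            return new DefaultStoreFlusher();
        }
        try {
            return (StoreFlusher) Class.forName(configuredClassName)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate flusher " + configuredClassName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(null).flush()); // default-flush
    }
}
```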
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642823#comment-13642823 ] Eric Newton commented on HBASE-8389: bq. Can you elaborate - how many recovery attempts for success and also how long b/w retries ? After the tablet server loses its lock in zookeeper, the master waits 10s and calls recoverLease which returns false. After 5s, recoverLease is retried and succeeds. These are the default values for the timeouts. HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) Receives lease recovery request and initiates recovery choosing a primary datanode every second 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle So we do, initiate_recovery -- commit_block -- commit_preempted_by_another_recovery This loop is paused after 300 seconds which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node detection timeout is 20 seconds. 
Note that before the patch, we do not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. Before the patch, we call recoverLease, assume that the block was recovered, try to get the file, it has zero length since it's under recovery, we fail the task and retry until we get a non-zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need to have a more generous timeout. The timeout needs to be configurable since, typically, the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is only as fast as the HDFS timeouts. 2) We have another issue I reported in HDFS-4721. The Namenode chooses the stale datanode to perform the recovery (since it's still alive). Hence the first recovery request is bound to fail. So if we want a tight MTTR, we either need something like HDFS-4721 or we need something like this:
recoverLease(...)
sleep(1000)
recoverLease(...)
sleep(configuredTimeout)
recoverLease(...)
sleep(configuredTimeout)
where configuredTimeout should be large enough to let the recovery happen, but the first timeout is short so that we get past the moot recovery in step #1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
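The retry schedule proposed in this issue (one short pause to get past the doomed first recovery attempt, then longer pauses sized to the HDFS timeouts) can be sketched as follows. Every name here is illustrative, not HBase's actual API, and the sleeps are left as comments so the sketch stays testable:

```java
import java.util.function.BooleanSupplier;

public class LeaseRecoverySketch {
    // Pause before the next attempt: a short first pause (the first
    // recovery is bound to fail because the stale datanode is chosen),
    // then a configured pause of roughly dfs.socket.timeout plus a second.
    static long pauseMs(int attempt, long firstPauseMs, long configuredPauseMs) {
        return attempt == 1 ? firstPauseMs : configuredPauseMs;
    }

    // Retry a recoverLease-like call until it reports success, up to
    // maxAttempts. Sleeping is elided so the sketch runs instantly.
    static int attemptsUntilRecovered(BooleanSupplier recoverLease, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (recoverLease.getAsBoolean()) {
                return attempt;
            }
            // In a real implementation:
            // Thread.sleep(pauseMs(attempt, 1000L, configuredTimeoutMs));
        }
        return -1; // give up; a later split task can issue a fresh recoverLease
    }
}
```

This matches the shape of the proposed fix without guessing at its actual configuration keys.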
[jira] [Commented] (HBASE-8271) Book updates for changes to GC options in shell scripts
[ https://issues.apache.org/jira/browse/HBASE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642834#comment-13642834 ] Hudson commented on HBASE-8271: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8271 Book updates for changes to GC options in shell scripts (Revision 1476037) Result = FAILURE stack : Files : * /hbase/trunk/src/main/docbkx/troubleshooting.xml Book updates for changes to GC options in shell scripts --- Key: HBASE-8271 URL: https://issues.apache.org/jira/browse/HBASE-8271 Project: HBase Issue Type: Improvement Components: documentation Reporter: Jesse Yates Priority: Minor Fix For: 0.98.0 Attachments: HBASE-8271.patch http://hbase.apache.org/book/trouble.log.html is a bit out of date as the 'right' way to do GC logging is via the GC_OPTS, rather than going through the general HBASE_OPTS. Follow up to HBASE-7817 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
[ https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642825#comment-13642825 ] Hudson commented on HBASE-8392: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile (Revision 1475998) Result = FAILURE eclark : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile Key: HBASE-8392 URL: https://issues.apache.org/jira/browse/HBASE-8392 Project: HBase Issue Type: Sub-task Components: hadoop2, metrics, test Affects Versions: 0.98.0, 0.95.0 Reporter: Jonathan Hsieh Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8392-0.patch This specific small unit tests flakes out occasionally and blocks the medium and large tests from running. 
Here's an error trace: {code} Error Message expected:<2.0> but was:<0.125> Stacktrace junit.framework.AssertionFailedError: expected:<2.0> but was:<0.125> at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.failNotEquals(Assert.java:329) at junit.framework.Assert.assertEquals(Assert.java:120) at junit.framework.Assert.assertEquals(Assert.java:129) at junit.framework.TestCase.assertEquals(TestCase.java:288) at org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:176) at junit.framework.TestCase.runBare(TestCase.java:141) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:255) at junit.framework.TestSuite.run(TestSuite.java:250) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} [~eclark] took a quick look and will chime in on this. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy
[ https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642830#comment-13642830 ] Hudson commented on HBASE-8415: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8415 DisabledRegionSplitPolicy (Revision 1475943) Result = FAILURE enis : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java DisabledRegionSplitPolicy - Key: HBASE-8415 URL: https://issues.apache.org/jira/browse/HBASE-8415 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: hbase-8415_v1.patch Simple RegionSplitPolicy for tests, and some special cases where we want to disable splits. Makes it easier and more explicit than using a ConstantSizeRegionSplitPolicy with a large region size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
[ https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642827#comment-13642827 ] Hudson commented on HBASE-8444: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8444 Acknowledge that 0.95+ requires 1.0.3 hadoop at least (Revision 1476036) Result = FAILURE stack : Files : * /hbase/trunk/src/main/docbkx/configuration.xml Acknowledge that 0.95+ requires 1.0.3 hadoop at least. -- Key: HBASE-8444 URL: https://issues.apache.org/jira/browse/HBASE-8444 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.98.0 Attachments: 8444.txt As per this mail thread, http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2 ... 0.95.x requires hadoop 1.0.3 at least. Note it in the refguide hadoop section. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.
[ https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642828#comment-13642828 ] Hudson commented on HBASE-8422: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line (Revision 1475986) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java Master won't go down. Stuck waiting on .META. to come on line. --- Key: HBASE-8422 URL: https://issues.apache.org/jira/browse/HBASE-8422 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: stack Assignee: rajeshbabu Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, HBASE-8422_94.patch, HBASE-8422.patch Master came up w/ no regionservers. I then tried to shut it down. You can see in below that it started to go down {code} 2013-04-24 14:28:49,770 INFO [IPC Server handler 7 on 6] org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested 2013-04-24 14:28:49,815 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 0, slept for 2818 ms, expecting minimum of 1, maximum of 2147483647, master is stopped. 
2013-04-24 14:28:49,815 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while splitting logs 2013-04-24 14:28:50,104 INFO [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor exiting 2013-04-24 14:28:50,850 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region location in ZooKeeper 2013-04-24 14:28:50,884 WARN [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/meta-region-server already deleted, retry=false 2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; skipping assign of .META.,,1.1028785192 2013-04-24 14:28:50,884 INFO [master-stack-1.ent.cloudera.com,6,1366838923135] org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup 2013-04-24 14:29:46,188 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting 2013-04-24 14:29:46,193 INFO [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner exiting {code} ... but now it is stuck. 
We keep looping here: {code} master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161) - locked 0xc727d738 (a org.apache.hadoop.hbase.zookeeper.MetaRegionTracker) at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299) at org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905) at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522) at java.lang.Thread.run(Thread.java:722) {code} Odd. It is supposed to be checking the 'stopped' flag; maybe it has the wrong stop flag. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
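The hypothesis in this issue is that the wait loop never re-checks the stop flag. A guarded wait that does honor it might look like this sketch (illustrative names, not the actual ZooKeeperNodeTracker code):

```java
public class StopAwareWaitSketch {
    // A blockUntilAvailable-style loop that re-checks a stop flag each
    // time it wakes, so a master shutdown breaks the wait instead of
    // looping forever. Names are illustrative, not HBase's actual code.
    private volatile Object data;
    private volatile boolean stopped;

    synchronized Object blockUntilAvailable(long checkIntervalMs) {
        while (data == null && !stopped) {
            try {
                wait(checkIntervalMs); // wake periodically to re-check 'stopped'
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return data; // null if we stopped before the data showed up
    }

    void stop() {
        synchronized (this) {
            stopped = true;
            notifyAll(); // wake any waiter immediately
        }
    }

    public static void main(String[] args) {
        StopAwareWaitSketch tracker = new StopAwareWaitSketch();
        tracker.stop();
        // Returns promptly because the stop flag short-circuits the wait.
        System.out.println(tracker.blockUntilAvailable(10L)); // null
    }
}
```

If the loop instead consulted a flag belonging to a different component (the "wrong stop flag" suspicion above), shutdown would never break the wait.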
[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.
[ https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642831#comment-13642831 ] Hudson commented on HBASE-8299: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 1475966) Result = FAILURE eclark : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java * 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java ExploringCompactionPolicy can get stuck in rare cases. -- Key: HBASE-8299 URL: https://issues.apache.org/jira/browse/HBASE-8299 Project: HBase Issue Type: Bug Affects Versions: 0.95.1 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, HBASE-8299-2.patch, HBASE-8299-3.patch If the files are very oddly sized then it's possible that ExploringCompactionPolicy can get stuck. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.
[ https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642833#comment-13642833 ] Hudson commented on HBASE-5930: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-5930. Removed a configuration that was causing unnecessary flushes in tests. (Revision 1475990) HBASE-5930 Limits the amount of time an edit can live in the memstore. (Revision 1475970) HBASE-5930. Limits the amount of time an edit can live in the memstore. (Revision 1475872) Result = FAILURE ddas : Files : * /hbase/trunk/hbase-server/src/test/resources/hbase-site.xml eclark : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java ddas : Files : * /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java Limits the amount of time an edit can live in the memstore. 
--- Key: HBASE-5930 URL: https://issues.apache.org/jira/browse/HBASE-5930 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Assignee: Devaraj Das Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, hbase-5930-test-execution.log A colleague of mine ran into an interesting issue. He inserted some data with the WAL disabled, which happened to fit in the aggregate memstore memory. Two weeks later he had a problem with the HDFS cluster, which caused the region servers to abort. He found that his data was lost. Looking at the logs we found that the Memstores were not flushed at all during these two weeks. Should we have an option to flush memstores periodically? There are obvious downsides to this, like many small storefiles, etc.
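The time-bound flush described above can be sketched as a simple predicate: flush a memstore once its oldest edit has outlived a configured interval, regardless of size. The class, method, and constant names below are illustrative stand-ins, not the actual HBase API.

```java
// Hypothetical sketch of a time-bound memstore flush check; names are
// illustrative, not the real HRegion/MemStoreFlusher code.
public class PeriodicFlushCheck {
    static final long FLUSH_INTERVAL_MS = 3600_000L; // e.g. one hour

    // Returns true when a memstore whose oldest edit arrived at
    // 'oldestEditTs' should be flushed at time 'now', even if not full.
    // A negative timestamp stands for an empty memstore.
    static boolean shouldFlush(long oldestEditTs, long now) {
        return oldestEditTs >= 0 && (now - oldestEditTs) > FLUSH_INTERVAL_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // An edit written two hours ago triggers a flush...
        System.out.println(shouldFlush(now - 2 * 3600_000L, now)); // true
        // ...a fresh edit does not, and an empty memstore never does.
        System.out.println(shouldFlush(now - 1000L, now));         // false
        System.out.println(shouldFlush(-1L, now));                 // false
    }
}
```

A periodic chore on the region server would evaluate this predicate per store, which is what produces the "many small storefiles" downside Lars mentions when the interval is short.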
[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter
[ https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642832#comment-13642832 ] Hudson commented on HBASE-8428: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/]) HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475996) Result = FAILURE stack : Files : * /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java * /hbase/trunk/src/main/docbkx/developer.xml Tighten up IntegrationTestsDriver filter Key: HBASE-8428 URL: https://issues.apache.org/jira/browse/HBASE-8428 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.95.1 Attachments: 8428.txt Currently, the filter that looks for IntegrationTests is broad and reports loads of errors as we try to parse classes we don't care about. Let me tighten it up so it doesn't scare folks away. It is particularly bad when run against a distributed cluster where the test context is not all present; there are lots of ERROR reports about classes not found.
[jira] [Updated] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.
[ https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Dougan updated HBASE-8367: Attachment: LoadIncrementalHFiles-HBASE-8367.patch Patch file against trunk. LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts. --- Key: HBASE-8367 URL: https://issues.apache.org/jira/browse/HBASE-8367 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.92.1, 0.92.2 Environment: Red Hat 6.2 Java 1.6.0_26 Hadoop 2.0.0-mr1-cdh4.1.1 HBase 0.92.1-cdh4.1.1 Reporter: Brian Dougan Priority: Minor Fix For: 0.94.8 Attachments: LoadIncrementalHFiles-HBASE-8367.patch The LoadIncrementalHFiles (completebulkload) command will exit with a success code (and without an Exception) when one or more of the HFiles fail to be imported, mainly when timeouts occur. Instead, it simply logs error messages, which makes it difficult to automate the import of HFiles programmatically. The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and has essentially the following structure.
{code:title=LoadIncrementalHFiles.java}
try {
  ...
} finally {
  pool.shutdown();
  if (queue != null && !queue.isEmpty()) {
    StringBuilder err = new StringBuilder();
    err.append("-\n");
    err.append("Bulk load aborted with some files not yet loaded:\n");
    err.append("-\n");
    for (LoadQueueItem q : queue) {
      err.append("  ").append(q.hfilePath).append('\n');
    }
    LOG.error(err);
  }
}
{code}
As you can see, instead of returning an error code, a success indicator, or simply throwing an Exception, an error message is sent to the log. This results in something like the following in the logs.
{quote} Bulk load aborted with some files not yet loaded: - hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top {quote} Without some sort of indication, it's not currently possible to chain this command to another or to programmatically consume this class and be certain of a successful import. This class should be enhanced to return non-success in whatever way makes sense to the community. I don't really have a strong preference, but one of the following should work fine (at least for my needs). * boolean return value on doBulkLoad (non-zero on run method) * Response object on doBulkLoad detailing the files that failed (non-zero on run method) * throw Exception in the finally block when files failed, after the error is written to the log (should automatically cause non-zero on run method) It would also be nice to get this into the 0.94.x stream so it gets included in the next Cloudera release. Thanks!
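The third option listed above (throw from the finally block after logging) could look roughly like the sketch below. `LoadQueueItem` here is a stand-in for the real inner class, and the exception type is illustrative, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of "log, then throw" from doBulkLoad's finally block,
// so callers (and the Tool 'run' method) see non-success.
public class BulkLoadErrorSketch {
    // Stand-in for LoadIncrementalHFiles.LoadQueueItem.
    static class LoadQueueItem {
        final String hfilePath;
        LoadQueueItem(String p) { hfilePath = p; }
    }

    static void finishBulkLoad(List<LoadQueueItem> queue) {
        if (queue != null && !queue.isEmpty()) {
            StringBuilder err =
                new StringBuilder("Bulk load aborted with some files not yet loaded:\n");
            for (LoadQueueItem q : queue) {
                err.append("  ").append(q.hfilePath).append('\n');
            }
            // Real code would LOG.error(err) first; then signal failure.
            throw new RuntimeException(err.toString());
        }
    }

    public static void main(String[] args) {
        finishBulkLoad(new ArrayList<>()); // empty queue: no exception
        List<LoadQueueItem> q = new ArrayList<>();
        q.add(new LoadQueueItem("hdfs://example/_tmp/TABLE,1.top"));
        try {
            finishBulkLoad(q);
            System.out.println("no exception");
        } catch (RuntimeException e) {
            System.out.println("failed: " + e.getMessage().contains("TABLE,1.top"));
        }
    }
}
```

Throwing after the log line keeps the existing diagnostics intact while making the failure impossible for a scripted caller to miss.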
[jira] [Updated] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.
[ https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Dougan updated HBASE-8367: Status: Patch Available (was: Open) Patched proposed changes against trunk.
[jira] [Created] (HBASE-8446) Allow parallel snapshot of different tables
Matteo Bertozzi created HBASE-8446: -- Summary: Allow parallel snapshot of different tables Key: HBASE-8446 URL: https://issues.apache.org/jira/browse/HBASE-8446 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.95.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.95.2 Attachments: HBASE-8446-v0.patch Currently only one snapshot at a time is allowed. As with restore, we should allow taking snapshots of different tables in parallel.
[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables
[ https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-8446: --- Attachment: HBASE-8446-v0.patch
[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.
[ https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642916#comment-13642916 ] Hadoop QA commented on HBASE-8367: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580706/LoadIncrementalHFiles-HBASE-8367.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5466//console This message is automatically generated.
[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables
[ https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-8446: --- Attachment: (was: HBASE-8446-v0.patch)
[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables
[ https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-8446: --- Attachment: HBASE-8446-v0.patch
[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.
[ https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642959#comment-13642959 ] Nick Dimiduk commented on HBASE-8367: - Hi [~bkdougan]. Please regenerate the patch from the root of the checkout. I would expect to see {{hbase-server}} as the first component in the path. Also, have a look at TestLoadIncrementalHFiles and see if any logic in there should be updated accordingly. For instance, I think this patch will break the method {{testNonexistentColumnFamilyLoad}}. Thanks!
[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.
[ https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642993#comment-13642993 ] Brian Dougan commented on HBASE-8367: - Yep, I messed up that patch...used eclipse to generate it and did it from the project rather than the root. I'll get that updated... As for the tests, none of the existing ones are affected by this. I tried to write a new test, but can't find a way to hit this code with the current setup of this class/tests. The code in that finally block only gets hit when all the setup for the HFiles work (it's able to determine region/check for splits/verify families). It only hits that code in the finally block when something like a timeout occurs or connection errors occur while doing the bulk load on the region after everything else has been successful. Without the ability to intercept the call to the region or to mock the region that gets called, I don't really think it can be duplicated currently...thoughts?
[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong
[ https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642997#comment-13642997 ] Hudson commented on HBASE-8393: --- Integrated in HBase-TRUNK #4081 (See [https://builds.apache.org/job/HBase-TRUNK/4081/]) HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 1476022) Result = SUCCESS tedyu : Files : * /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java Testcase TestHeapSize#testMutations is wrong Key: HBASE-8393 URL: https://issues.apache.org/jira/browse/HBASE-8393 Project: HBase Issue Type: Bug Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.1 Attachments: hbase-8393.patch I happened to check this test case and there are several existing errors that let it pass. You can reproduce the failure by adding a new field to Mutation; the test case will then fail on either a 64-bit or a 32-bit system. Below are the errors I found in the test case: 1) The test case uses {code}row=new byte[]{0}{code}, an array of length 1, while ClassSize.estimateBase can only calculate the base class size (without counting the array's contents). 2) ClassSize.REFERENCE is added twice in the following code, because ClassSize.estimateBase already adds all reference fields: {code}expected += ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code} 3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns the sum of the lengths of a different set of fields. Therefore there will be round-up differences for class Increment, which introduces a new reference field TimeRange tr, when the test case runs on a 32-bit versus a 64-bit system.
{code}
...
long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE;
// Round up to a multiple of 8
long size = align(prealign_size);
...
{code}
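The round-up discrepancy described in point 3 comes from aligning sizes to a multiple of 8: summing aligned parts is not the same as aligning the summed whole. A minimal sketch (the `align` helper mirrors the standard round-up idiom, not HBase's exact ClassSize code):

```java
// Demonstrates why per-part alignment and whole-sum alignment can differ,
// which is the source of the 32-bit vs 64-bit HeapSize mismatch above.
public class AlignSketch {
    // Round up to the next multiple of 8.
    static long align(long num) {
        return (num + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        System.out.println(align(13)); // 16
        System.out.println(align(16)); // 16
        // align(a) + align(b) is not always align(a + b):
        System.out.println(align(4) + align(4)); // 16
        System.out.println(align(4 + 4));        // 8
    }
}
```

So two estimates that partition the same fields differently can disagree by up to 8 bytes per partition boundary, exactly the kind of drift a new reference field like `TimeRange tr` exposes.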
[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
[ https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642998#comment-13642998 ] Hudson commented on HBASE-8444: --- Integrated in HBase-TRUNK #4081 (See [https://builds.apache.org/job/HBase-TRUNK/4081/]) HBASE-8444 Acknowledge that 0.95+ requires 1.0.3 hadoop at least (Revision 1476036) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/docbkx/configuration.xml Acknowledge that 0.95+ requires 1.0.3 hadoop at least. -- Key: HBASE-8444 URL: https://issues.apache.org/jira/browse/HBASE-8444 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.98.0 Attachments: 8444.txt As per this mail thread, http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2 ... 0.95.x requires hadoop 1.0.3 at least. Note it in the refguide hadoop section.
[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin
[ https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642999#comment-13642999 ] Hudson commented on HBASE-8345: --- Integrated in HBase-TRUNK #4081 (See [https://builds.apache.org/job/HBase-TRUNK/4081/]) HBASE-8345 Add all available resources in RootResource and VersionResource to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476025) Result = SUCCESS jxiang : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin --- Key: HBASE-8345 URL: https://issues.apache.org/jira/browse/HBASE-8345 Project: HBase Issue Type: Improvement Components: Client, REST Affects Versions: 0.94.6.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Labels: rest_api Fix For: 0.98.0, 0.94.8, 0.95.1 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, HBASE-8345-v6-trunk.patch In our built-in REST clients, we should add in more of the available REST resources. This will allow more thorough testing of the REST API, particularly with IntegrationTest. These clients are located in the o.a.h.h.rest.client package. In this case, I want to add the resources not already included in / and /version to o.a.h.h.rest.client.RemoteAdmin. This includes /status/cluster, /version/rest, and /version/cluster, among others. The RemoteAdmin class is a logical place for these methods because it is not related to a specific table (those methods should go into RemoteHTable).
[jira] [Commented] (HBASE-8271) Book updates for changes to GC options in shell scripts
[ https://issues.apache.org/jira/browse/HBASE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643000#comment-13643000 ] Hudson commented on HBASE-8271: --- Integrated in HBase-TRUNK #4081 (See [https://builds.apache.org/job/HBase-TRUNK/4081/]) HBASE-8271 Book updates for changes to GC options in shell scripts (Revision 1476037) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/docbkx/troubleshooting.xml Book updates for changes to GC options in shell scripts --- Key: HBASE-8271 URL: https://issues.apache.org/jira/browse/HBASE-8271 Project: HBase Issue Type: Improvement Components: documentation Reporter: Jesse Yates Priority: Minor Fix For: 0.98.0 Attachments: HBASE-8271.patch http://hbase.apache.org/book/trouble.log.html is a bit out of date as the 'right' way to do GC logging is via the GC_OPTS, rather than going through the general HBASE_OPTS. Follow up to HBASE-7817
[jira] [Resolved] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path
[ https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-8445. --- Resolution: Invalid We don't support module reload use cases. The consensus is that if we were to, we should adopt a full OSGi runtime rather than repeat all of the mistakes involved in creating such a runtime; however, unless there is a compelling reason to do so, the consensus is also that this is not wanted. regionserver can't load an updated coprocessor jar with the same jar path - Key: HBASE-8445 URL: https://issues.apache.org/jira/browse/HBASE-8445 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Wang Qiang Attachments: patch_20130426_01.txt When I update a coprocessor jar and then disable and enable the table with the coprocessor, the new features in the updated coprocessor jar do not take effect. Following into the class 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there is a coprocessor class loader cache whose key is the coprocessor jar path (although the key is a weak reference). So when I disable/enable the table, it gets a cached coprocessor class loader for the jar path and does not try to reload the coprocessor jar from HDFS. Here I give a patch that adds an extra piece of info, a 'FileCheckSum', to the coprocessor class loader cache; if the checksum has changed, it tries to reload the jar from the HDFS path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
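[Editor's note: the idea in the attached patch — keying the coprocessor class loader cache on the jar path plus its checksum, so a replaced jar at the same path misses the cache — can be sketched as below. This is an illustration only; the class and method names are invented, not HBase's actual CoprocessorHost code, and a plain Object stands in for the class loader.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: cache keyed by (jar path, checksum). Replacing the jar
// at the same path changes the checksum, forcing a fresh loader to be created.
public class ChecksumKeyedLoaderCache {
    // value type stands in for a coprocessor class loader
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    static String key(String jarPath, long checksum) {
        return jarPath + "#" + checksum;
    }

    static Object getOrCreate(String jarPath, long checksum) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent
        // table enables for the same jar share one loader
        return CACHE.computeIfAbsent(key(jarPath, checksum), k -> new Object());
    }

    public static void main(String[] args) {
        Object first = getOrCreate("/hbase/cp/filter.jar", 0xAAL);
        // same path, same checksum: cached loader reused
        System.out.println(first == getOrCreate("/hbase/cp/filter.jar", 0xAAL)); // true
        // same path, new checksum (jar replaced): fresh loader
        System.out.println(first == getOrCreate("/hbase/cp/filter.jar", 0xBBL)); // false
    }
}
```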
[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools
[ https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643016#comment-13643016 ] Andrew Purtell commented on HBASE-8438: --- +1 Extend bin/hbase to print a minimal classpath for used by other tools --- Key: HBASE-8438 URL: https://issues.apache.org/jira/browse/HBASE-8438 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.94.6.1, 0.95.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch For tools like pig and hive, blindly appending the full output of `bin/hbase classpath` to their own CLASSPATH is excessive. They already build CLASSPATH entries for hadoop. All they need from us is the delta entries: the dependencies we require w/o hadoop and all of its transitive deps. This is also a kindness for Windows, where there's a shorter limit on the length of commandline arguments. See also HIVE-2055 for additional discussion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
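[Editor's note: the "delta entries" idea — emit only the classpath entries the host tool does not already have from hadoop — amounts to an order-preserving set difference. A minimal sketch follows; it is not the actual bin/hbase implementation, and the paths in the example are made up.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: keep only the entries of the full `hbase classpath` output that the
// caller's existing (hadoop-derived) CLASSPATH does not already provide.
public class MinimalClasspath {
    static String delta(String fullHbaseCp, String existingCp) {
        Set<String> have = new LinkedHashSet<>(Arrays.asList(existingCp.split(":")));
        List<String> out = new ArrayList<>();
        for (String entry : fullHbaseCp.split(":")) {
            if (!entry.isEmpty() && !have.contains(entry)) {
                out.add(entry); // preserve the original ordering of the delta
            }
        }
        return String.join(":", out);
    }

    public static void main(String[] args) {
        String full = "/hbase/lib/hbase-server.jar:/hadoop/lib/hadoop-core.jar:/hbase/lib/zookeeper.jar";
        String have = "/hadoop/lib/hadoop-core.jar";
        System.out.println(delta(full, have));
        // -> /hbase/lib/hbase-server.jar:/hbase/lib/zookeeper.jar
    }
}
```

A shorter delta also helps with the Windows command-line length limit mentioned in the issue.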
[jira] [Commented] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path
[ https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643019#comment-13643019 ] Jimmy Xiang commented on HBASE-8445: That's right: we don't do reloading. One work-around is to do a full cluster rolling-restart in such a case. regionserver can't load an updated coprocessor jar with the same jar path - Key: HBASE-8445 URL: https://issues.apache.org/jira/browse/HBASE-8445 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Wang Qiang Attachments: patch_20130426_01.txt When I update a coprocessor jar and then disable and enable the table with the coprocessor, the new features in the updated coprocessor jar do not take effect. Following into the class 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there is a coprocessor class loader cache whose key is the coprocessor jar path (although the key is a weak reference). So when I disable/enable the table, it gets a cached coprocessor class loader for the jar path and does not try to reload the coprocessor jar from HDFS. Here I give a patch that adds an extra piece of info, a 'FileCheckSum', to the coprocessor class loader cache; if the checksum has changed, it tries to reload the jar from the HDFS path. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools
[ https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643036#comment-13643036 ] Nick Dimiduk commented on HBASE-8438: - How do we fix this release audit warning? {noformat} !? /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/data/a6a6562b777440fd9c34885428f5cb61.21e75333ada3d5bafb34bb918f29576c Lines that start with ? in the release audit report indicate files that do not have an Apache license header. {noformat} Extend bin/hbase to print a minimal classpath for used by other tools --- Key: HBASE-8438 URL: https://issues.apache.org/jira/browse/HBASE-8438 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.94.6.1, 0.95.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch For tools like pig and hive, blindly appending the full output of `bin/hbase classpath` to their own CLASSPATH is excessive. They already build CLASSPATH entries for hadoop. All they need from us is the delta entries: the dependencies we require w/o hadoop and all of its transitive deps. This is also a kindness for Windows, where there's a shorter limit on the length of commandline arguments. See also HIVE-2055 for additional discussion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8447) Add docs for hbck around metaonly
Elliott Clark created HBASE-8447: Summary: Add docs for hbck around metaonly Key: HBASE-8447 URL: https://issues.apache.org/jira/browse/HBASE-8447 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Priority: Minor We should document -metaonly in the book. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7413) Convert WAL to pb
[ https://issues.apache.org/jira/browse/HBASE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643053#comment-13643053 ] Sergey Shelukhin commented on HBASE-7413: - Same pattern later. I will commit sometime between today and monday if there are no objections. Convert WAL to pb - Key: HBASE-7413 URL: https://issues.apache.org/jira/browse/HBASE-7413 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: Sergey Shelukhin Priority: Critical Fix For: 0.95.1 Attachments: HBASE-7413-v0.patch, HBASE-7413-v1.patch, HBASE-7413-v2.patch, HBASE-7413-v3.patch, HBASE-7413-v4.patch From HBASE-7201 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools
[ https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643059#comment-13643059 ] Sergey Shelukhin commented on HBASE-8438: - +1 Extend bin/hbase to print a minimal classpath for used by other tools --- Key: HBASE-8438 URL: https://issues.apache.org/jira/browse/HBASE-8438 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 0.94.6.1, 0.95.0 Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch For tools like pig and hive, blindly appending the full output of `bin/hbase classpath` to their own CLASSPATH is excessive. They already build CLASSPATH entries for hadoop. All they need from us is the delta entries: the dependencies we require w/o hadoop and all of its transitive deps. This is also a kindness for Windows, where there's a shorter limit on the length of commandline arguments. See also HIVE-2055 for additional discussion. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-6295: --- Attachment: 6295.v4.patch Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch today batch algo is:
{noformat}
for Operation o : List<Op> {
  add o to todolist
  if todolist size >= maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could: - create immediately the final object instead of an intermediate array - split per location immediately - instead of sending when the list as a whole is full, send it when there is enough data for a single location It would be:
{noformat}
for Operation o : List<Op> {
  get location
  add o to location.todolist
  if (location.todolist size >= maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
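[Editor's note: the proposed per-location buffering can be sketched as follows. Names are illustrative stand-ins, not the real HBase client types: strings represent operations and region-server locations, and `MAX_PER_LOCATION` stands in for `maxLocationSize`.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed client batching: buffer per region-server location
// and send as soon as any single location has enough operations, instead of
// filling one global list and splitting it at the end.
public class PerLocationBatcher {
    static final int MAX_PER_LOCATION = 2; // stand-in for maxLocationSize

    final Map<String, List<String>> buffers = new HashMap<>();
    final List<String> sent = new ArrayList<>(); // records simulated per-server RPCs

    void add(String location, String op) {
        List<String> buf = buffers.computeIfAbsent(location, k -> new ArrayList<>());
        buf.add(op);
        if (buf.size() >= MAX_PER_LOCATION) { // enough data for one server: send now
            sent.add(location + "=" + String.join(",", buf));
            buf.clear(); // don't wait, continue the loop
        }
    }

    void sendRemaining() { // flush the partially filled buffers at the end
        buffers.forEach((loc, buf) -> {
            if (!buf.isEmpty()) sent.add(loc + "=" + String.join(",", buf));
        });
        buffers.clear();
    }

    public static void main(String[] args) {
        PerLocationBatcher b = new PerLocationBatcher();
        b.add("rs1", "put1");
        b.add("rs2", "put2");
        b.add("rs1", "put3"); // rs1 reaches the threshold and is sent immediately
        b.sendRemaining();    // rs2's half-full buffer goes out at the end
        System.out.println(b.sent); // [rs1=put1,put3, rs2=put2]
    }
}
```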
[jira] [Commented] (HBASE-8446) Allow parallel snapshot of different tables
[ https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643085#comment-13643085 ] Sergey Shelukhin commented on HBASE-8446: - There's no new test for this, otherwise looks good Allow parallel snapshot of different tables --- Key: HBASE-8446 URL: https://issues.apache.org/jira/browse/HBASE-8446 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.95.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.95.2 Attachments: HBASE-8446-v0.patch Currently only one snapshot at a time is allowed. As with restore, we should allow taking snapshots of different tables in parallel. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
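[Editor's note: relaxing "one snapshot at a time" to "one snapshot per table at a time" amounts to replacing a global lock with per-table gating. A minimal sketch, assuming one permit per table; the class and method names are invented, not the HBASE-8446 patch itself.]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// One permit per table: snapshots of distinct tables proceed in parallel,
// while a second snapshot of the same table is rejected until the first ends.
public class PerTableSnapshotGate {
    static final ConcurrentHashMap<String, Semaphore> GATES = new ConcurrentHashMap<>();

    static boolean tryStartSnapshot(String table) {
        // Semaphore (not ReentrantLock) so a repeat attempt from the same
        // thread is still refused while the table's snapshot is running
        return GATES.computeIfAbsent(table, t -> new Semaphore(1)).tryAcquire();
    }

    static void finishSnapshot(String table) {
        GATES.get(table).release();
    }

    public static void main(String[] args) {
        System.out.println(tryStartSnapshot("orders"));  // true: first snapshot starts
        System.out.println(tryStartSnapshot("users"));   // true: different table, parallel
        System.out.println(tryStartSnapshot("orders"));  // false: same table still busy
        finishSnapshot("orders");
        System.out.println(tryStartSnapshot("orders"));  // true: allowed again
    }
}
```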
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643087#comment-13643087 ] Nicolas Liochon commented on HBASE-6295: v4. I added a control per server: the client cannot have more than X requests on the same server. If this number is reached, we continue for the other servers, but the ones on the overloaded servers are kept in the buffer. This will limit the rpc.timeout effect. It's still a hack in terms of implementation, but hopefully it's acceptable in terms of feature. I've got some tests running locally, I will do one on a real cluster if they are ok. Possible performance improvement in client batch operations: presplit and send in background Key: HBASE-6295 URL: https://issues.apache.org/jira/browse/HBASE-6295 Project: HBase Issue Type: Improvement Components: Client, Performance Affects Versions: 0.95.2 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Labels: noob Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch today batch algo is:
{noformat}
for Operation o : List<Op> {
  add o to todolist
  if todolist size >= maxsize or o last in list
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
}
{noformat}
We could: - create immediately the final object instead of an intermediate array - split per location immediately - instead of sending when the list as a whole is full, send it when there is enough data for a single location It would be:
{noformat}
for Operation o : List<Op> {
  get location
  add o to location.todolist
  if (location.todolist size >= maxLocationSize)
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
}
send remaining
wait
{noformat}
It's not trivial to write if you add error management: retried list must be shared with the operations added in the todolist. But it's doable. It's interesting mainly for 'big' writes -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643136#comment-13643136 ] Varun Sharma commented on HBASE-8389: - [~saint@gmail.com] I can do a small write up that folks can refer to. [~nkeywal] One point regarding the low setting though. It's good for fast MTTR requirements such as online clusters but it does not work well if you pound a small cluster with mapreduce jobs. The write timeouts start kicking in on datanodes - we saw this on a small cluster. So it has to be taken with a pinch of salt. I think 4 seconds might be too tight. Because we have the following sequence - 1) recoverLease called 2) The primary node heartbeats (this can be 3 seconds in the worst case) 3) There are multiple timeouts during recovery at primary datanode: a) dfs.socket.timeout kicks in when we suspend the processes using kill -STOP - there is only 1 retry b) ipc.client.connect.timeout is the troublemaker - on old hadoop versions it is hardcoded at 20 seconds. On some versions, the # of retries is hardcoded at 45. This can be triggered by firewalling a host using iptables to drop all incoming/outgoing TCP packets. Another issue here is that b/w the timeouts there is a 1 second hardcoded sleep :) - I just fixed it in HADOOP 9503. If we make sure that all the dfs.socket.timeout and ipc client settings are the same in hbase-site.xml and hdfs-site.xml. Then, we can The retry rate should be no faster than 3a and 3b - or lease recoveries will accumulate for 900 seconds in trunk. To get around this problem, we would want to make sure that hbase-site.xml has the same settings as hdfs-site.xml. And we calculate the recovery interval from those settings. Otherwise, we can leave a release note saying that this number should be max(dfs.socket.timeout, ipc.client.connect.max.retries.on.timeouts * ipc.client.connect.timeout, ipc.client.connect.max.retries). 
The advantage of having HDFS 4721 is that at some point the data node will be recognized as stale - maybe a little later than hdfs recovery. Once that happens, recoveries typically occur within 2 seconds. HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) Receives lease recovery request and initiates recovery choosing a primary datanode every second 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle So we do, initiate_recovery -- commit_block -- commit_preempted_by_another_recovery This loop is paused after 300 seconds which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node detection timeout is 20 seconds. Note that before the patch, we do not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. 
Before the patch, we call recoverLease, assume that the block was recovered, try to get the file, it has zero length since its under recovery, we fail the task and retry until we get a non zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need to have a more generous timeout. The timeout needs to be configurable since typically, the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is as fast as the HDFS timeouts. 2) We have another issue I report in HDFS 4721. The Namenode chooses the stale datanode to perform the recovery (since its still alive). Hence the first recovery request is bound to fail. So if we want a tight MTTR, we either need something like HDFS 4721 or we need something like this recoverLease(...) sleep(1000) recoverLease(...) sleep(configuredTimeout) recoverLease(...) sleep(configuredTimeout) Where configuredTimeout should be large enough to let the recovery happen but the first timeout is short so that we get past the moot recovery in step #1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
[ https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643218#comment-13643218 ] Enis Soztutar commented on HBASE-8444: -- Thanks Stack. Acknowledge that 0.95+ requires 1.0.3 hadoop at least. -- Key: HBASE-8444 URL: https://issues.apache.org/jira/browse/HBASE-8444 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.98.0 Attachments: 8444.txt As per this mail thread, http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2 ... 0.95.x requires hadoop 1.0.3 at least. Note it in the refguide hadoop section. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong
[ https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8393: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Testcase TestHeapSize#testMutations is wrong Key: HBASE-8393 URL: https://issues.apache.org/jira/browse/HBASE-8393 Project: HBase Issue Type: Bug Components: test Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.98.0, 0.95.1 Attachments: hbase-8393.patch I happened to check this test case and there are several existing errors that happen to make it pass. You can reproduce the test case failure by adding a new field into Mutation; the test case will then fail on either a 64-bit or a 32-bit system. Below are the errors I found in the test case: 1) The test case is using {code}row=new byte[]{0}{code} which is an array with length=1, while ClassSize.estimateBase can only calculate base class size (without counting field array length) 2) ClassSize.REFERENCE is added twice in the following code, because ClassSize.estimateBase already adds all reference fields. {code}expected += ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code} 3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns the sum of the lengths of a different set of fields. Therefore, there will be round-up differences for class Increment, because it introduces a new reference field TimeRange tr, when the test case runs on a 32-bit or 64-bit system. {code} ... long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE; // Round up to a multiple of 8 long size = align(prealign_size); ... {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
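[Editor's note: the rounding behavior at the heart of point 3 — summing field sizes and then aligning the total up to a multiple of 8 — is easy to demonstrate in isolation. This mirrors the align() step quoted above, not HBase's full ClassSize estimator; the inputs are arbitrary.]

```java
// Reproduces the 8-byte alignment step quoted in the {code} block above:
// the estimate sums header + arrays + references, then rounds the total
// up to a multiple of 8. Two sums that differ before alignment can agree
// after it, which is exactly the discrepancy the test case tripped over.
public class AlignDemo {
    static long align(long num) {
        return (num + 7) & ~7L; // round up to the next multiple of 8
    }

    public static void main(String[] args) {
        System.out.println(align(16)); // 16: already aligned
        System.out.println(align(17)); // 24: rounds up
        System.out.println(align(20) == align(23)); // true: both align to 24
    }
}
```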
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643220#comment-13643220 ] Ted Yu commented on HBASE-8389: --- @Varun: bq. Then, we can Can you complete the above sentence ? HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) Receives lease recovery request and initiates recovery choosing a primary datanode every second 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle So we do, initiate_recovery -- commit_block -- commit_preempted_by_another_recovery This loop is paused after 300 seconds which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node detection timeout is 20 seconds. Note that before the patch, we do not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. 
Before the patch, we call recoverLease, assume that the block was recovered, try to get the file, it has zero length since its under recovery, we fail the task and retry until we get a non zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need to have a more generous timeout. The timeout needs to be configurable since typically, the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is as fast as the HDFS timeouts. 2) We have another issue I report in HDFS 4721. The Namenode chooses the stale datanode to perform the recovery (since its still alive). Hence the first recovery request is bound to fail. So if we want a tight MTTR, we either need something like HDFS 4721 or we need something like this recoverLease(...) sleep(1000) recoverLease(...) sleep(configuredTimeout) recoverLease(...) sleep(configuredTimeout) Where configuredTimeout should be large enough to let the recovery happen but the first timeout is short so that we get past the moot recovery in step #1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
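[Editor's note: the retry schedule proposed in fix #2 — one short first pause to get past the doomed first recovery against the stale datanode, then pauses sized from the underlying HDFS timeouts — can be sketched as a pause sequence. This is an illustration, not the DFSClient API; the timeout values are examples.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed recoverLease retry schedule: recoverLease, sleep a
// short first interval (the first recovery usually targets the stale node and
// fails), then recoverLease again with a configured pause before each retry,
// where the pause is derived from the HDFS timeouts, e.g.
// max(dfs.socket.timeout, ipc retries * ipc.client.connect.timeout).
public class LeaseRecoveryRetry {
    static List<Long> schedule(long firstPauseMs, long configuredTimeoutMs, int attempts) {
        List<Long> pauses = new ArrayList<>();
        for (int i = 0; i < attempts - 1; i++) {
            pauses.add(i == 0 ? firstPauseMs : configuredTimeoutMs);
        }
        return pauses; // one pause between each pair of recoverLease calls
    }

    public static void main(String[] args) {
        // 4 recoverLease attempts: 1s after the first, then the configured interval
        System.out.println(schedule(1000, 61000, 4)); // [1000, 61000, 61000]
    }
}
```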
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643226#comment-13643226 ] Varun Sharma commented on HBASE-8389: - Sorry about that... If we make sure that all the dfs.socket.timeout and ipc client settings are the same in hbase-site.xml and hdfs-site.xml. Then, we can do a custom calculation of recover lease retry interval inside hbase. But basically hbase needs to know in some way how the timeouts are setup underneath. Thanks Varun HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) Receives lease recovery request and initiates recovery choosing a primary datanode every second 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle So we do, initiate_recovery -- commit_block -- commit_preempted_by_another_recovery This loop is paused after 300 seconds which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node detection timeout is 20 seconds. 
Note that before the patch, we do not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. Before the patch, we call recoverLease, assume that the block was recovered, try to get the file, it has zero length since its under recovery, we fail the task and retry until we get a non zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need to have a more generous timeout. The timeout needs to be configurable since typically, the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is as fast as the HDFS timeouts. 2) We have another issue I report in HDFS 4721. The Namenode chooses the stale datanode to perform the recovery (since its still alive). Hence the first recovery request is bound to fail. So if we want a tight MTTR, we either need something like HDFS 4721 or we need something like this recoverLease(...) sleep(1000) recoverLease(...) sleep(configuredTimeout) recoverLease(...) sleep(configuredTimeout) Where configuredTimeout should be large enough to let the recovery happen but the first timeout is short so that we get past the moot recovery in step #1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643255#comment-13643255 ] Ted Yu commented on HBASE-8389: --- bq. If we make sure that all the dfs.socket.timeout and ipc client settings are the same in hbase-site.xml and hdfs-site.xml. Should we add a check for the above at cluster startup ? If discrepancy is found, we can log a warning message. HBASE-8354 forces Namenode into loop with lease recovery requests - Key: HBASE-8389 URL: https://issues.apache.org/jira/browse/HBASE-8389 Project: HBase Issue Type: Bug Reporter: Varun Sharma Assignee: Varun Sharma Priority: Critical Fix For: 0.94.8 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, sample.patch We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease recoveries because of the short retry interval of 1 second between lease recoveries. The namenode gets into the following loop: 1) Receives lease recovery request and initiates recovery choosing a primary datanode every second 2) A lease recovery is successful and the namenode tries to commit the block under recovery as finalized - this takes 10 seconds in our environment since we run with tight HDFS socket timeouts. 3) At step 2), there is a more recent recovery enqueued because of the aggressive retries. This causes the committed block to get preempted and we enter a vicious cycle So we do, initiate_recovery -- commit_block -- commit_preempted_by_another_recovery This loop is paused after 300 seconds which is the hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node detection timeout is 20 seconds. 
Note that before the patch, we do not call recoverLease so aggressively - also it seems that the HDFS namenode is pretty dumb in that it keeps initiating new recoveries for every call. Before the patch, we call recoverLease, assume that the block was recovered, try to get the file, it has zero length since its under recovery, we fail the task and retry until we get a non zero length. So things just work. Fixes: 1) Expecting recovery to occur within 1 second is too aggressive. We need to have a more generous timeout. The timeout needs to be configurable since typically, the recovery takes as much time as the DFS timeouts. The primary datanode doing the recovery tries to reconcile the blocks and hits the timeouts when it tries to contact the dead node. So the recovery is as fast as the HDFS timeouts. 2) We have another issue I report in HDFS 4721. The Namenode chooses the stale datanode to perform the recovery (since its still alive). Hence the first recovery request is bound to fail. So if we want a tight MTTR, we either need something like HDFS 4721 or we need something like this recoverLease(...) sleep(1000) recoverLease(...) sleep(configuredTimeout) recoverLease(...) sleep(configuredTimeout) Where configuredTimeout should be large enough to let the recovery happen but the first timeout is short so that we get past the moot recovery in step #1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
Sergey Shelukhin created HBASE-8448: --- Summary: RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction Key: HBASE-8448 URL: https://issues.apache.org/jira/browse/HBASE-8448 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin The code added to make sure compaction selection doesn't get stuck does not take filesCompacting into account. This is the cause of recent TestHFileArchiving failures...
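The fix the description points at can be sketched as filtering filesCompacting out of the candidate list before any selection shortcut runs. Illustrative only -- the class and method names are invented for the sketch and are not the attached HBASE-8448-v0.patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: files another compaction already owns must never be re-selected.
public class CandidateSelection {
    static List<String> eligible(List<String> storeFiles, List<String> filesCompacting) {
        List<String> candidates = new ArrayList<>(storeFiles);
        candidates.removeAll(filesCompacting); // exclude already-compacting files up front
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(eligible(Arrays.asList("f1", "f2", "f3"), Arrays.asList("f2")));
    }
}
```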
[jira] [Assigned] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HBASE-8448: --- Assignee: Sergey Shelukhin
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Attachment: HBASE-8448-v0.patch
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643278#comment-13643278 ] Sergey Shelukhin commented on HBASE-8448: - tiny patch
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Component/s: Compaction
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6721: --- Attachment: HBASE-6721_8.patch RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Vandana Ayyalasomayajula Fix For: 0.95.1 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it provides a client application with a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per-group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document.
[jira] [Created] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8354
stack created HBASE-8449: Summary: Refactor recoverLease retries and pauses informed by findings over in hbase-8354 Key: HBASE-8449 URL: https://issues.apache.org/jira/browse/HBASE-8449 Project: HBase Issue Type: Bug Components: Filesystem Integration Affects Versions: 0.95.0, 0.94.7 Reporter: stack Priority: Critical Fix For: 0.95.1 HBASE-8354 is an interesting issue that roams near and far. This issue is about making use of the findings handily summarized at the end of hbase-8354, which have it that trunk needs a refactor of how it does its recoverLease handling (and that the patch committed against HBASE-8354 is not what we want going forward). This issue is about making a patch that adds a lag between recoverLease invocations where the lag is related to dfs timeouts -- the hdfs-side dfs timeout -- and optionally makes use of the isFileClosed API if it is available (a facility that is not yet committed to a branch near you, and unlikely to be within your locality for a good while to come).
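The "use isFileClosed only if it is available" part is typically done with a reflective probe, so the same jar runs against HDFS versions that lack the API. A hedged sketch -- the helper is generic and is demonstrated against java.lang.String so it is self-contained; the real target would be DistributedFileSystem's isFileClosed(Path):

```java
import java.lang.reflect.Method;

// Sketch: look the method up once; a null result means "older HDFS,
// fall back to recoverLease plus a dfs-timeout-sized pause".
public class IsFileClosedProbe {
    /** Returns the named public method if the class has it, else null. */
    static Method findMethod(Class<?> cls, String name, Class<?>... argTypes) {
        try {
            return cls.getMethod(name, argTypes);
        } catch (NoSuchMethodException e) {
            return null; // API absent on this version
        }
    }

    public static void main(String[] args) {
        // Demo against String: isEmpty() exists, isFileClosed() does not.
        System.out.println(findMethod(String.class, "isEmpty") != null);
        System.out.println(findMethod(String.class, "isFileClosed") != null);
    }
}
```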
[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests
[ https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643287#comment-13643287 ] stack commented on HBASE-8389: -- [~varun] Thanks. I made HBASE-8449 for the trunk patch (and to fix what is applied here -- the 4s in particular).
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6721: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643294#comment-13643294 ] Elliott Clark commented on HBASE-8448: -- This could let compactions that are less than the min number of files through.
[jira] [Updated] (HBASE-2231) Compaction events should be written to HLog
[ https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-2231: - Resolution: Fixed Fix Version/s: 0.98.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and 0.95. Thanks Stack. Compaction events should be written to HLog --- Key: HBASE-2231 URL: https://issues.apache.org/jira/browse/HBASE-2231 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Todd Lipcon Assignee: stack Priority: Blocker Labels: moved_from_0_20_5 Fix For: 0.98.0, 0.95.1 Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, hbase-2231_v6.patch, hbase-2231_v7.patch, hbase-2231_v7.patch The sequence for a compaction should look like this:
# Compact region to new files
# Write a Compacted Region entry to the HLog
# Delete old files
This deals with a case where the RS has paused between steps 1 and 2 and the regions have since been reassigned.
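The three-step sequence above is an ordering contract: the HLog entry goes in after the new files are durable and before the old ones are removed, so a server replaying the log can tell a completed compaction from an interrupted one. Illustrative only -- string "steps" stand in for real filesystem and WAL operations; this is not the committed hbase-2231 code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the required ordering from the issue description.
public class CompactionSequence {
    static List<String> compact(List<String> oldFiles, String newFile) {
        List<String> steps = new ArrayList<>();
        steps.add("write " + newFile);                               // 1. compact region to new file(s)
        steps.add("hlog COMPACTION " + oldFiles + " -> " + newFile); // 2. log the event to the HLog
        for (String f : oldFiles) steps.add("delete " + f);          // 3. only now drop the inputs
        return steps;
    }

    public static void main(String[] args) {
        compact(Arrays.asList("a", "b"), "c").forEach(System.out::println);
    }
}
```

If the RS dies between steps 1 and 2, replay sees no marker and the old files are still authoritative; between 2 and 3, the marker tells replay the deletion can be completed safely.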
[jira] [Updated] (HBASE-2231) Compaction events should be written to HLog
[ https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-2231: - Attachment: hbase-2231_v7-0.95.patch Attaching 0.95 version of the patch; had to resolve some minor conflicts.
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643299#comment-13643299 ] Sergey Shelukhin commented on HBASE-8448: - We are doing the sublist after we get eligible files, right?
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643305#comment-13643305 ] Sergey Shelukhin commented on HBASE-8448: - Ah, I see, that's a separate problem.
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643309#comment-13643309 ] Sergey Shelukhin commented on HBASE-8448: - I am going to move it to apply... after all, that way all the max-min-bulk-etc. checks will be honored.
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643312#comment-13643312 ] Elliott Clark commented on HBASE-8448: -- Sounds good to me.
[jira] [Created] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.
stack created HBASE-8450: Summary: Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc. Key: HBASE-8450 URL: https://issues.apache.org/jira/browse/HBASE-8450 Project: HBase Issue Type: Task Reporter: stack Priority: Critical Fix For: 0.95.1 This is a critical task we need to do before we release: review our defaults. On cursory review, there are configs in hbase-default.xml that no longer have matching code; there are some that should be changed because we know better now, and others that should be amended because hardware and deploys are bigger than they used to be. We could also move stuff around so that the must-edit stuff is near the top (the zk quorum config is mid-way down the page) and beef up the descriptions -- especially since these descriptions shine through in the refguide. Lastly, I notice that our tgz does not include an hbase-default.xml other than the one bundled up in the jar. Maybe we should make it more accessible.
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6721: --- Attachment: HBASE-6721-DesigDoc.pdf
[jira] [Updated] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.
[ https://issues.apache.org/jira/browse/HBASE-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8450: - Attachment: 8450.txt Here is a start:
- Ups handlers to 100 from 10.
- Removes the no-longer-referenced hbase.regionserver.nbreservationblocks and hbase.hash.type.
- Ups memstore.lowerlimit from 0.35 to 0.38 so it is close to the higher limit (0.40).
- Makes major compactions run once a week instead of every day.
- Removes dfs.support.append -- it only brings on a complaint if present (there is a UI component that also needs updating if this goes away).
What else can we do to improve basic defaults?
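As a rough illustration of what such overrides look like in hbase-site.xml: the property names below are the standard hbase-default.xml ones from that era, the values follow the comment above (604800000 ms is one week), but treat this as a sketch of the idea rather than the attached 8450.txt.

```xml
<!-- Sketch only; values mirror the comment above, not the 8450.txt patch. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value> <!-- up from 10 -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.38</value> <!-- closer to the 0.40 upper limit -->
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>604800000</value> <!-- weekly, in ms, instead of daily -->
</property>
```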
[jira] [Updated] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.
[ https://issues.apache.org/jira/browse/HBASE-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-8450: - Component/s: Usability
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1364#comment-1364 ] Francis Liu commented on HBASE-6721: [~saint@gmail.com] I've updated the doc, addressing your questions. Let me know if it's missing anything else.
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6721: --- Attachment: HBASE-6721_9.patch
[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643347#comment-13643347 ] Hadoop QA commented on HBASE-8448: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580759/HBASE-8448-v0.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.thrift.TestThriftServer
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5468//console
This message is automatically generated.
RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction Key: HBASE-8448 URL: https://issues.apache.org/jira/browse/HBASE-8448 Project: HBase Issue Type: Bug Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-8448-v0.patch The code added to make sure it doesn't get stuck doesn't take filesCompacting into account. This is the cause of recent TestHFileArchiving failures...
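The fix the description implies — making candidate selection account for filesCompacting — amounts to filtering the in-flight files out of the store-file list before the policy ever sees them. A minimal sketch (names hypothetical, not the actual RatioBasedCompactionPolicy code):

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: exclude files already being compacted from the
// candidate list before the selection policy runs, so the policy can
// never select an in-flight file a second time.
public class CandidateFilter {
    public static List<String> selectable(List<String> storeFiles,
                                          Set<String> filesCompacting) {
        return storeFiles.stream()
                .filter(f -> !filesCompacting.contains(f))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> store = Arrays.asList("hfile-1", "hfile-2", "hfile-3");
        Set<String> compacting = new HashSet<>(Collections.singletonList("hfile-2"));
        System.out.println(selectable(store, compacting)); // [hfile-1, hfile-3]
    }
}
```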
[jira] [Resolved] (HBASE-8390) Trunk/0.95 cannot simply compile against Hadoop 1.0
[ https://issues.apache.org/jira/browse/HBASE-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-8390. --- Resolution: Fixed Fix Version/s: 0.98.0 Looks like it worked, resolving. Thanks Stack for your help. Trunk/0.95 cannot simply compile against Hadoop 1.0 --- Key: HBASE-8390 URL: https://issues.apache.org/jira/browse/HBASE-8390 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.98.0, 0.95.0 Attachments: HBASE-8390.patch Currently we can't simply compile against Hadoop 1.0 in 0.95 and newer; we are missing a dependency in common for Apache's commons-io. The easy fix would be to just add that dependency for all the profiles there, but doing it correctly requires adding a new profile.
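The "easy fix" versus "doing it correctly" distinction above comes down to where the dependency is declared: correctly done, commons-io is scoped to a Hadoop-1-specific Maven profile rather than added unconditionally to every build. A rough sketch of what such a profile entry looks like (the profile id and version number here are illustrative, not taken from the committed patch):

```xml
<!-- Illustrative only: declare commons-io inside the Hadoop 1 profile
     so the extra dependency does not leak into the other profiles. -->
<profile>
  <id>hadoop-1.0</id>
  <dependencies>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.1</version>
    </dependency>
  </dependencies>
</profile>
```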
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643354#comment-13643354 ] Hadoop QA commented on HBASE-6721: --
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580760/HBASE-6721_8.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 30 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).
{color:red}-1 lineLengths{color}. The patch introduces lines longer than 100.
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.backup.TestHFileArchiving
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5469//console
This message is automatically generated.
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Attachment: HBASE-8448-v1.patch sorry got distracted
[jira] [Commented] (HBASE-8426) Opening a region failed on Metrics source RegionServer,sub=Regions already exists!
[ https://issues.apache.org/jira/browse/HBASE-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643389#comment-13643389 ] Jean-Daniel Cryans commented on HBASE-8426: --- +1 from me. Opening a region failed on Metrics source RegionServer,sub=Regions already exists! Key: HBASE-8426 URL: https://issues.apache.org/jira/browse/HBASE-8426 Project: HBase Issue Type: Bug Affects Versions: 0.95.0 Reporter: Jean-Daniel Cryans Assignee: Elliott Clark Priority: Critical Fix For: 0.98.0, 0.95.1 Attachments: HBASE-8426-0.patch, HBASE-8426-1.patch, metrics_already_exist.txt I restarted a cluster on 0.95 (1ecd4c7e0b22bba75c76f2fc2ce369541502b6df) and some regions failed to open on their first assignment with an exception like:
{noformat}
Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source RegionServer,sub=Regions already exists!
 at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
 at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
 at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
 at org.apache.hadoop.hbase.metrics.BaseSourceImpl.init(BaseSourceImpl.java:75)
 at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.init(MetricsRegionAggregateSourceImpl.java:49)
 at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.init(MetricsRegionAggregateSourceImpl.java:41)
 at org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.getAggregate(MetricsRegionServerSourceFactoryImpl.java:33)
 at org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.createRegion(MetricsRegionServerSourceFactoryImpl.java:50)
 at org.apache.hadoop.hbase.regionserver.MetricsRegion.init(MetricsRegion.java:35)
 at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:488)
 at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:400)
{noformat}
I'm attaching a bigger log.
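The stack trace points at the classic double-registration problem: regions opening concurrently each try to create the shared RegionServer,sub=Regions aggregate source, and the second attempt throws. One common fix shape is to hand out a single shared instance atomically instead of registering a new source per region; a minimal sketch in plain Java (the registry and source types here are hypothetical stand-ins, not the actual HBase metrics classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a registry that atomically creates each named
// metrics source once and returns the existing instance on later calls,
// instead of throwing "Metrics source ... already exists!".
public class MetricsSourceRegistry {
    private final Map<String, Object> sources = new ConcurrentHashMap<>();

    public Object getOrRegister(String name) {
        // computeIfAbsent runs the factory at most once per name,
        // even under concurrent region opens.
        return sources.computeIfAbsent(name, n -> new Object());
    }

    public static void main(String[] args) {
        MetricsSourceRegistry reg = new MetricsSourceRegistry();
        Object a = reg.getOrRegister("RegionServer,sub=Regions");
        Object b = reg.getOrRegister("RegionServer,sub=Regions");
        System.out.println(a == b); // true: same shared source, no duplicate registration
    }
}
```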
[jira] [Updated] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8389
[ https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Sharma updated HBASE-8449: Summary: Refactor recoverLease retries and pauses informed by findings over in hbase-8389 (was: Refactor recoverLease retries and pauses informed by findings over in hbase-8354) Refactor recoverLease retries and pauses informed by findings over in hbase-8389 Key: HBASE-8449 URL: https://issues.apache.org/jira/browse/HBASE-8449 Project: HBase Issue Type: Bug Components: Filesystem Integration Affects Versions: 0.94.7, 0.95.0 Reporter: stack Priority: Critical Fix For: 0.95.1 HBASE-8354 is an interesting issue that roams near and far. This issue is about making use of the findings handily summarized at the end of hbase-8354, which have it that trunk needs a refactor around how it does its recoverLease handling (and that the patch committed against HBASE-8354 is not what we want going forward). This issue is about making a patch that adds a lag between recoverLease invocations where the lag is related to dfs timeouts -- the hdfs-side dfs timeout -- and optionally makes use of the isFileClosed API if it is available (a facility that is not yet committed to a branch near you and unlikely to be within your locality for a good while to come).
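The refactor being asked for — retry recoverLease with a pause tied to the HDFS-side dfs timeout rather than a fixed 4 seconds, short-circuit via isFileClosed where that API exists, and give up at a deadline so the next split task can retry — has roughly this shape. This is a sketch against hypothetical suppliers standing in for the DistributedFileSystem calls, not the actual HDFS API:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the retry shape discussed above: invoke
// recoverLease, and while the lease is not yet recovered, pause for an
// interval derived from the dfs timeout before trying again, up to a
// deadline. The suppliers stand in for DistributedFileSystem calls.
public class LeaseRecoveryLoop {
    public static boolean recover(BooleanSupplier recoverLease,
                                  BooleanSupplier isFileClosed, // null on older HDFS
                                  long pauseMs, long deadlineMs)
            throws InterruptedException {
        long start = System.currentTimeMillis();
        while (true) {
            if (recoverLease.getAsBoolean()) {
                return true; // lease recovered; safe to open the file
            }
            if (isFileClosed != null && isFileClosed.getAsBoolean()) {
                return true; // HDFS reports the file closed (hbase-8394 API)
            }
            if (System.currentTimeMillis() - start > deadlineMs) {
                return false; // break, don't loop forever; next split task retries
            }
            Thread.sleep(pauseMs); // pause derived from dfs.socket.timeout, not a fixed 4s
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] attempts = {0};
        // Simulated NameNode that reports the lease recovered on the 3rd call.
        boolean ok = recover(() -> ++attempts[0] >= 3, null, 10, 5_000);
        System.out.println(ok + " after " + attempts[0] + " attempts");
    }
}
```

The pause between attempts is what keeps the NameNode from being hammered with back-to-back recovery requests, which is the failure mode HBASE-8389 describes.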
[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction
[ https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-8448: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-7667) Support stripe compaction
[ https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7667: Attachment: Using stripe compactions.pdf First draft of the user-level doc. After trying to describe the size-based scheme, I think it should be improved. I will do that. Meanwhile there's a design doc and a user doc, so I'd like to get some reviews ;) I will rebase and update all patches between now and Monday. [~stack] [~mbertozzi] what do you guys think? Support stripe compaction - Key: HBASE-7667 URL: https://issues.apache.org/jira/browse/HBASE-7667 Project: HBase Issue Type: New Feature Components: Compaction Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Using stripe compactions.pdf So I was thinking about having many regions as the way to make compactions more manageable, and writing the level db doc about how level db range overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and Ted, and thinking about how to avoid the Level DB I/O multiplication factor. And I suggest the following idea, let's call it stripe compactions. It's a mix between level db ideas and having many small regions. It allows us to have a subset of benefits of many regions (wrt reads and compactions) without many of the drawbacks (managing and current memstore/etc. limitation). It also doesn't break seqNum-based file sorting for any one key. It works like this. The region key space is separated into a configurable number of fixed-boundary stripes (determined the first time we stripe the data, see below). All the data from memstores is written to normal files with all keys present (not striped), similar to L0 in LevelDb, or current files. Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by stripe. It may be optimized by adding more small files from different stripes, but the main logical outcome is that there are no more L0 files and all data is striped. Second is exactly similar to current compaction, but compacting one single stripe. In future, nothing prevents us from applying compaction rules and compacting part of the stripe (e.g. similar to current policy with ratios and stuff, tiers, whatever), but for the first cut I'd argue let it major compact the entire stripe. Or just have the ratio and no more complexity. Finally, the third addresses the concern of the fixed boundaries causing stripes to be very unbalanced. It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the results out with different boundaries. There's a tradeoff here - if we always take 2 adjacent stripes, compactions will be smaller but rebalancing will take a ridiculous amount of I/O. If we take many stripes we are essentially getting into the epic-major-compaction problem again. Some heuristics will have to be in place. In general, if, before stripes are determined, we initially let L0 grow before determining the stripes, we will get better boundaries. Also, unless unbalancing is really large we don't need to rebalance really. Obviously this scheme (as well as level) is not applicable for all scenarios, e.g. if timestamp is your key it completely falls apart. The end result:
- many small compactions that can be spread out in time.
- reads still read from a small number of files (one stripe + L0).
- region splits become marvelously simple (if we could move files between regions, no references would be needed).
Main advantage over Level (for HBase) is that default store can still open the files and get correct results - there are no range overlap shenanigans. It also needs no metadata, although we may record some for convenience. It also would appear to not cause as much I/O.
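The fixed-boundary stripe scheme described above can be illustrated by the key-to-stripe lookup it implies: n sorted boundary keys split the region's key space into n+1 stripes, a striped file holds only keys from one stripe, and an L0 compaction routes each cell to its stripe. A toy sketch with String keys (the real implementation would work on byte[] row keys; names here are illustrative):

```java
import java.util.Arrays;

// Toy sketch of fixed-boundary stripes: boundaries are sorted keys that
// split the region's key space into boundaries.length + 1 stripes.
// L0 files are unstriped; an L0 compaction routes each cell to the
// stripe index returned here.
public class StripeLookup {
    public static int stripeIndexFor(String key, String[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        // A negative result encodes the insertion point; a key equal to
        // a boundary goes to the stripe that starts at that boundary.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] boundaries = {"g", "p"}; // 3 stripes: [..g), [g..p), [p..]
        System.out.println(stripeIndexFor("a", boundaries)); // 0
        System.out.println(stripeIndexFor("k", boundaries)); // 1
        System.out.println(stripeIndexFor("z", boundaries)); // 2
    }
}
```

Because any one key always lands in exactly one stripe, seqNum-based file sorting within a stripe stays intact, which is the property the description calls out as the advantage over LevelDB-style range overlap.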
[jira] [Updated] (HBASE-7667) Support stripe compaction
[ https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7667: Attachment: Using stripe compactions.pdf