[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642629#comment-13642629
 ] 

stack commented on HBASE-8389:
--

Reading over this nice, fat, info-dense issue, I am trying to figure what we 
need to add to trunk right now.

Sounds like checking the recoverFileLease return gained us little in the end 
(though Varun, you think we want to keep going till it's true, though v5 here 
skips out on it).  The valuable finding hereabouts is the need for a pause 
before going ahead with the file open, it seems.  Trunk does not have this 
pause.  I need to add a version of v5 to trunk?  (Holding our breath until an 
api not yet generally available, isFileClosed hbase-8394, shows up is not an 
option for now; nor is an expectation that all will just upgrade to an hdfs 
that has this api either.)

The hbase-7878 backport is now elided since we have added back the old 
behavior w/ the patch applied here, excepting the pause of an arbitrary-enough 
4 seconds

The applied patch here does not loop on recoverLease after the 4 seconds 
expire.  It breaks.  In trunk we loop.  We should break too (...and let it 
fail if 0 length and then let the next split task do a new recoverLease call?)

On the 4 seconds, it seems that it rather should be the dfs timeout 
dfs.socket.timeout that hdfs is using -- plus a second or so -- rather than 
4 seconds, if I follow Varun's reasoning above properly, and just remove the 
new config 'hbase.lease.recovery.retry.interval' (we have enough configs 
already)?
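
A rough sketch of that direction (not the applied patch; the helper shape and 
the one-second margin here are illustrative only):

{code}
// Sketch: derive the pause between recoverLease attempts from the HDFS
// socket timeout plus a small margin, instead of a fixed
// hbase.lease.recovery.retry.interval.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class LeaseRecoverySketch {
  public static void recoverLeaseWithPause(DistributedFileSystem dfs, Path p,
      Configuration conf) throws IOException, InterruptedException {
    // dfs.socket.timeout defaults to 60s; add ~1s of margin on top.
    long pauseMs = conf.getLong("dfs.socket.timeout", 60000) + 1000;
    if (!dfs.recoverLease(p)) {
      // Give the NameNode time to finish the recovery it just started,
      // rather than hammering it with a new request every second.
      Thread.sleep(pauseMs);
    }
    // The caller then opens the file; a zero-length file means recovery has
    // not completed and the split task should fail and be retried.
  }
}
{code}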

Sounds like we are depending on WAL sizes being < HDFS block sizes.  This will 
not always be the case; we could go into a second block easily if a big edit 
comes in on the tail of the first block; and then there may be dataloss (TBD) 
because we have a file size (so we think the file recovered?)

Sounds also like we are relying on file size being zero as a marker that the 
file is not yet closed (I suppose that is ok because an empty WAL will be > 0 
length IIRC.  We should doc. our dependency though)

Varun, I like your low timeouts.  Would you suggest we adjust hbase default 
timeouts down and recommend folks change their hdfs defaults if they want 
better MTTR?  If you had a blog post on your nice work done in here, I could at 
least point the refguide at it for those interested in improved MTTR (smile).

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until 
 we get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I report in HDFS 

[jira] [Updated] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path

2013-04-26 Thread Wang Qiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Qiang updated HBASE-8445:
--

Attachment: patch_20130426_01.txt

 regionserver can't load an updated coprocessor jar with the same jar path
 -

 Key: HBASE-8445
 URL: https://issues.apache.org/jira/browse/HBASE-8445
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Wang Qiang
 Attachments: patch_20130426_01.txt


 when I update a coprocessor jar and then disable and enable the table with 
 the coprocessor, the new features in the updated coprocessor jar don't take 
 effect. Following into the class 
 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there's a 
 coprocessor class loader cache, of which the key is the coprocessor jar 
 path (although the key is a weak reference), so when I disable/enable the 
 table, it gets a cached coprocessor class loader from the cache with the jar 
 path and doesn't try to reload the coprocessor jar from the hdfs. Here I give 
 a patch in which I add an extra piece of info, a 'FileCheckSum', to the 
 coprocessor class loader cache; if the checksum has changed, we try to reload 
 the jar from the hdfs path
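
For illustration, a rough sketch of the idea in the patch (the cache and 
helper names here are simplified, not the actual CoprocessorHost code):

{code}
// Sketch: cache class loaders by jar path, but invalidate the entry when the
// jar's checksum on HDFS changes so a re-uploaded jar gets reloaded.
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksummedClassLoaderCache {
  private static class Entry {
    final ClassLoader loader;
    final FileChecksum checksum;
    Entry(ClassLoader loader, FileChecksum checksum) {
      this.loader = loader;
      this.checksum = checksum;
    }
  }

  private final ConcurrentHashMap<String, Entry> cache =
      new ConcurrentHashMap<String, Entry>();

  public ClassLoader getClassLoader(FileSystem fs, Path jarPath)
      throws IOException {
    FileChecksum current = fs.getFileChecksum(jarPath);
    Entry e = cache.get(jarPath.toString());
    if (e != null && e.checksum != null && e.checksum.equals(current)) {
      return e.loader;  // jar unchanged: reuse the cached loader
    }
    // Jar is new or its checksum changed: load it again from HDFS.
    ClassLoader fresh = loadJarFromHdfs(fs, jarPath);  // hypothetical helper
    cache.put(jarPath.toString(), new Entry(fresh, current));
    return fresh;
  }

  private ClassLoader loadJarFromHdfs(FileSystem fs, Path jarPath) {
    // Placeholder: copy the jar locally and wrap it in a URLClassLoader.
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}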

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path

2013-04-26 Thread Wang Qiang (JIRA)
Wang Qiang created HBASE-8445:
-

 Summary: regionserver can't load an updated coprocessor jar with 
the same jar path
 Key: HBASE-8445
 URL: https://issues.apache.org/jira/browse/HBASE-8445
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Wang Qiang
 Attachments: patch_20130426_01.txt

when I update a coprocessor jar and then disable and enable the table with the 
coprocessor, the new features in the updated coprocessor jar don't take 
effect. Following into the class 
'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there's a 
coprocessor class loader cache, of which the key is the coprocessor jar 
path (although the key is a weak reference), so when I disable/enable the table, 
it gets a cached coprocessor class loader from the cache with the jar path and 
doesn't try to reload the coprocessor jar from the hdfs. Here I give a patch 
in which I add an extra piece of info, a 'FileCheckSum', to the coprocessor 
class loader cache; if the checksum has changed, we try to reload the jar from 
the hdfs path

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.

2013-04-26 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642651#comment-13642651
 ] 

rajeshbabu commented on HBASE-8422:
---

[~stack],
During master initialization, after initiating ROOT/META region assignment we 
wait until the region is assigned successfully. In case of stop/shutdown we 
skip waiting for ROOT/META assignment and return from initialization (with the 
patch). In trunk this case is already handled; that's why the 0.94 patch looks 
a bit different.
{code}
// Make sure meta assigned before proceeding.
if (!assignMeta(status)) return;
{code}
{code}
  boolean assignMeta(MonitoredTask status)
  throws InterruptedException, IOException, KeeperException {
  ...
  enableSSHandWaitForMeta();
  // Make sure a .META. location is set.
  if (!isMetaLocation()) return false;
  ...
  }
{code}
Otherwise there are multiple places where finishInitialization can hang on 
master shutdown if no region server is online.
It will not impact normal cases.
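
A rough sketch of the shape of the 0.94 change described above (helper names 
are illustrative, not the actual patch):

{code}
// Bail out of the META wait when a shutdown has been requested, instead of
// blocking forever in blockUntilAvailable.
boolean waitForMetaUnlessStopped() throws InterruptedException {
  while (!isMetaLocationAvailable()) {  // hypothetical helper
    if (isStopped()) {
      // Cluster is going down: skip the wait so finishInitialization can
      // return and the master can shut down cleanly.
      return false;
    }
    Thread.sleep(100);
  }
  return true;
}
{code}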


 Master won't go down.  Stuck waiting on .META. to come on line.
 ---

 Key: HBASE-8422
 URL: https://issues.apache.org/jira/browse/HBASE-8422
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: stack
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, 
 HBASE-8422_94.patch, HBASE-8422.patch


 Master came up w/ no regionservers.  I then tried to shut it down.  You can 
 see below that it started to go down
 {code}
 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 6] 
 org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
 2013-04-24 14:28:49,815 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region 
 servers count to settle; checked in 0, slept for 2818 ms, expecting minimum 
 of 1, maximum of 2147483647, master is stopped.
 2013-04-24 14:28:49,815 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while 
 splitting logs
 2013-04-24 14:28:50,104 INFO  
 [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] 
 org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: 
 stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor 
 exiting
 2013-04-24 14:28:50,850 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region 
 location in ZooKeeper
 2013-04-24 14:28:50,884 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/meta-region-server already deleted, retry=false
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; 
 skipping assign of .META.,,1.1028785192
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't 
 finished failover cleanup
 2013-04-24 14:29:46,188 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] 
 org.apache.hadoop.hbase.master.cleaner.LogCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting
 2013-04-24 14:29:46,193 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] 
 org.apache.hadoop.hbase.master.cleaner.HFileCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner 
 exiting
 {code}
 ... but now it is stuck.
 We keep looping here:
 {code}
 master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 
 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
 - locked 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
 at 
 org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
 at 

[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642669#comment-13642669
 ] 

Nicolas Liochon commented on HBASE-8389:


Varun, I +1 Stack: the timeout settings you mentioned are quite impressive!
Thanks a lot for all this work.

Here is my understanding, please correct me where I'm wrong.

I don't think that single / multiple blocks is an issue, even if it's better 
to have a single block (increased parallelism).

HBase has a dataloss risk: we need to wait for the end of recoverFileLease 
before reading.
 => either by polling the NN and calling recoverFileLease multiple times,
 => or by calling isFileClosed (HDFS-4525) (and polling as well) where it's 
available.

I'm not sure that we can poll recoverFileLease every second. When I try, I get 
the same logs as Eric: java.io.IOException: The recovery id 2494 does not 
match current recovery id 2495 for block, and the state of the namenode seems 
strange. 

In critical scenarios, the recoverFileLease won't happen at all. The 
probability is greatly decreased by HDFS-4721, but it's not zero.

In critical scenarios, the recoverFileLease will start, but will be stuck on 
bad datanodes. The probability is greatly decreased by HDFS-4721 and HDFS-4754, 
but it's not zero. Here, we need to limit the number of retries in HDFS to 
one, whatever the global setting, to be on the safe side (no hdfs jira for 
this).

I see a possible common implementation (trunk / hbase 0.94); see the sketch at 
the end of this comment:
 - if HDFS-4754 is available, call markAsStale to be sure this datanode won't 
be used.
 - call recoverFileLease a first time
 - if HDFS-4525 is available, call isFileClosed every second to detect that the 
recovery is done
 - every 60s, call recoverFileLease again (either because isFileClosed is 
missing, or because we went into one of the bad scenarios above). 

This would mean: no dataloss and an MTTR of:
 - less than a minute if we have stale mode + HDFS-4721 + HDFS-4754 + HDFS-4525 
+ no retry in HDFS recoverLease or Varun's settings.
 - around 12 minutes if we have none of the above. But that's what we have 
already without the stale mode imho.
 - in the middle if we have a subset of the above patches and config.

As HDFS-4721 seems validated by the HDFS dev team, I think that my only 
question is: can we poll recoverFileLease very frequently if we don't have 
isFileClosed?

As a side note, tests more or less similar to yours with HBase trunk and HDFS 
branch-2 trunk (without your settings but with a hack to skip the dead nodes) 
bring similar results.
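
Here is the sketch of the polling scheme above (availability checks and 
intervals are illustrative; isFileClosed is the HDFS-4525 call, which may not 
be present on a given hdfs):

{code}
// Poll for lease recovery: recoverLease once, then cheap isFileClosed checks
// every second where available, re-issuing recoverLease every 60s.
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLeasePolling {
  static final long POLL_MS = 1000;      // isFileClosed poll interval
  static final long REISSUE_MS = 60000;  // recoverFileLease re-issue interval

  public static void waitForLeaseRecovery(DistributedFileSystem dfs, Path p,
      boolean isFileClosedAvailable) throws IOException, InterruptedException {
    long lastRecoverCall = System.currentTimeMillis();
    if (dfs.recoverLease(p)) return;  // done if the file was already closed
    while (true) {
      Thread.sleep(POLL_MS);
      if (isFileClosedAvailable && dfs.isFileClosed(p)) {
        return;  // HDFS-4525: recovery completed
      }
      long now = System.currentTimeMillis();
      if (now - lastRecoverCall >= REISSUE_MS) {
        // Either isFileClosed is missing, or the first recovery got stuck
        // (bad datanode, lost request): issue recoverLease again.
        if (dfs.recoverLease(p)) return;
        lastRecoverCall = now;
      }
    }
  }
}
{code}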


 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until 
 we get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the 

[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642677#comment-13642677
 ] 

Hudson commented on HBASE-8392:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 
profile (Revision 1475997)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java


 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
 

 Key: HBASE-8392
 URL: https://issues.apache.org/jira/browse/HBASE-8392
 Project: HBase
  Issue Type: Sub-task
  Components: hadoop2, metrics, test
Affects Versions: 0.98.0, 0.95.0
Reporter: Jonathan Hsieh
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8392-0.patch


 This specific small unit test flakes out occasionally and blocks the medium 
 and large tests from running.
 Here's an error trace:
 {code}
 Error Message
 expected:<2.0> but was:<0.125>
 Stacktrace
 junit.framework.AssertionFailedError: expected:<2.0> but was:<0.125>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:120)
   at junit.framework.Assert.assertEquals(Assert.java:129)
   at junit.framework.TestCase.assertEquals(TestCase.java:288)
   at 
 org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at org.junit.runners.Suite.runChild(Suite.java:127)
   at org.junit.runners.Suite.runChild(Suite.java:26)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 [~eclark] took a quick look and will chime in on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642676#comment-13642676
 ] 

Hudson commented on HBASE-8024:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8024 Make Store flush algorithm pluggable (Revision 1475871)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Make Store flush algorithm pluggable
 

 Key: HBASE-8024
 URL: https://issues.apache.org/jira/browse/HBASE-8024
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.94.5, 0.95.0, 0.95.2
Reporter: Maryann Xue
Assignee: Sergey Shelukhin
 Fix For: 0.95.1

 Attachments: HBASE-8024-trunk.patch, HBASE-8024.v2.patch, 
 HBASE-8024-v3.patch, HBASE-8024-v4.patch


 The idea is to make StoreFlusher an interface instead of an implementation 
 class, and have the original StoreFlusher as the default store flush impl.
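
A hedged sketch of that direction (method names here are illustrative, not 
the committed API):

{code}
// StoreFlusher becomes an interface; the current flush behavior moves into a
// default implementation that stores can swap out.
import java.io.IOException;
import java.util.Collections;
import java.util.List;

interface StoreFlusher {
  // Write the memstore snapshot out to new store file(s) and return their
  // paths for the store to commit.
  List<String> flushSnapshot(long cacheFlushId) throws IOException;
}

class DefaultStoreFlusher implements StoreFlusher {
  @Override
  public List<String> flushSnapshot(long cacheFlushId) throws IOException {
    // The pre-existing flush logic would live here: write the snapshot to a
    // tmp file and hand the path back.
    return Collections.emptyList();
  }
}
{code}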

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642678#comment-13642678
 ] 

Hudson commented on HBASE-8393:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 
1476024)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java


 Testcase TestHeapSize#testMutations is wrong
 

 Key: HBASE-8393
 URL: https://issues.apache.org/jira/browse/HBASE-8393
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.1

 Attachments: hbase-8393.patch


 I happened to check this test case and there are several existing errors 
 that combine to make it pass. You can reproduce the test case failure by 
 adding a new field into Mutation; the test case will then fail on either a 64 
 bit system or a 32 bit one.
 Below are errors I found in the test case:
 1) The test case is using {code}row=new byte[]{0}{code}, which is an array 
 with length=1, while ClassSize.estimateBase can only calculate the base class 
 size (without counting field array length).
 2) ClassSize.REFERENCE is added twice in the following code, because 
 ClassSize.estimateBase already adds all reference fields: {code}expected += 
 ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
 3) ClassSize.estimateBase rounds up the sum of the lengths of reference 
 fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns 
 the sum of the lengths of a different set of fields. Therefore, there will be 
 round-up differences for class Increment, because it introduces a new 
 reference field TimeRange tr, when the test case runs on a 32 bit vs. a 64 
 bit system.   
 {code}
 ...
 long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * 
 REFERENCE;
 // Round up to a multiple of 8
 long size = align(prealign_size);
 ...
 {code}
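
For illustration, the rounding mismatch described in 3): aligning one combined 
sum versus aligning parts separately can differ by up to 8 bytes (the numbers 
below are made up, not the real Increment sizes):

{code}
public class AlignExample {
  static long align(long size) {
    return (size + 7) & ~7L;  // round up to a multiple of 8, like ClassSize
  }

  public static void main(String[] args) {
    long base = 52;      // say, estimateBase's pre-align sum
    long reference = 4;  // an extra reference field on a 32 bit JVM
    System.out.println(align(base + reference));        // 56: sum aligned once
    System.out.println(align(base) + align(reference)); // 64: parts aligned
  }
}
{code}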

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642679#comment-13642679
 ] 

Hudson commented on HBASE-8422:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line 
(Revision 1475987)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java


 Master won't go down.  Stuck waiting on .META. to come on line.
 ---

 Key: HBASE-8422
 URL: https://issues.apache.org/jira/browse/HBASE-8422
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: stack
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, 
 HBASE-8422_94.patch, HBASE-8422.patch


 Master came up w/ no regionservers.  I then tried to shut it down.  You can 
 see below that it started to go down
 {code}
 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 6] 
 org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
 2013-04-24 14:28:49,815 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region 
 servers count to settle; checked in 0, slept for 2818 ms, expecting minimum 
 of 1, maximum of 2147483647, master is stopped.
 2013-04-24 14:28:49,815 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while 
 splitting logs
 2013-04-24 14:28:50,104 INFO  
 [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] 
 org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: 
 stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor 
 exiting
 2013-04-24 14:28:50,850 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region 
 location in ZooKeeper
 2013-04-24 14:28:50,884 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/meta-region-server already deleted, retry=false
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; 
 skipping assign of .META.,,1.1028785192
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't 
 finished failover cleanup
 2013-04-24 14:29:46,188 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] 
 org.apache.hadoop.hbase.master.cleaner.LogCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting
 2013-04-24 14:29:46,193 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] 
 org.apache.hadoop.hbase.master.cleaner.HFileCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner 
 exiting
 {code}
 ... but now it is stuck.
 We keep looping here:
 {code}
 master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 
 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
 - locked 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
 at 
 org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
 at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Odd.  It is supposed to be checking the 'stopped' flag; maybe it has the 
 wrong stop flag.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642680#comment-13642680
 ] 

Hudson commented on HBASE-8345:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8345 Add all available resources in RootResource and VersionResource 
to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476027)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java


 Add all available resources in o.a.h.h.rest.RootResource and VersionResource 
 to o.a.h.h.rest.client.RemoteAdmin
 ---

 Key: HBASE-8345
 URL: https://issues.apache.org/jira/browse/HBASE-8345
 Project: HBase
  Issue Type: Improvement
  Components: Client, REST
Affects Versions: 0.94.6.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
  Labels: rest_api
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, 
 HBASE-8345-v6-trunk.patch


 In our built-in REST clients, we should add in more of the available REST 
 resources. This will allow more thorough testing of the REST API, 
 particularly with IntegrationTest.
 These clients are located in the o.a.h.h.rest.client package.
 In this case, I want to add the resources not already included in / and 
 /version to o.a.h.h.rest.client.RemoteAdmin. This includes, /status/cluster, 
 /version/rest and /version/cluster, among others.
 The RemoteAdmin class is a logical place for these methods because it is not 
 related to a specific table (those methods should go into RemoteHTable).
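
For illustration, the endpoints in question can be exercised with plain HTTP 
(this is not the RemoteAdmin API itself; host and port here are assumptions):

{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestEndpointsSketch {
  // path would be e.g. "/status/cluster", "/version/rest" or
  // "/version/cluster" -- the resources this issue exposes via RemoteAdmin.
  public static int status(String restHost, String path) throws IOException {
    HttpURLConnection c = (HttpURLConnection)
        new URL("http://" + restHost + ":8080" + path).openConnection();
    c.setRequestProperty("Accept", "application/json");
    return c.getResponseCode();
  }
}
{code}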

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642681#comment-13642681
 ] 

Hudson commented on HBASE-8415:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8415 DisabledRegionSplitPolicy (Revision 1475944)

 Result = FAILURE
enis : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java


 DisabledRegionSplitPolicy
 -

 Key: HBASE-8415
 URL: https://issues.apache.org/jira/browse/HBASE-8415
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8415_v1.patch


 Simple RegionSplitPolicy for tests, and some special cases where we want to 
 disable splits. Makes it easier and more explicit than using a 
 ConstantSizeRegionSplitPolicy with a large region size. 
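
The likely shape of such a policy (a sketch based on the description, not 
necessarily the committed class body):

{code}
import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

// A RegionSplitPolicy that simply never asks for a split.
public class DisabledRegionSplitPolicy extends ConstantSizeRegionSplitPolicy {
  @Override
  protected boolean shouldSplit() {
    return false;  // never split, regardless of region size
  }
}
{code}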

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642682#comment-13642682
 ] 

Hudson commented on HBASE-8299:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 
1475965)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java


 ExploringCompactionPolicy can get stuck in rare cases.
 --

 Key: HBASE-8299
 URL: https://issues.apache.org/jira/browse/HBASE-8299
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, 
 HBASE-8299-2.patch, HBASE-8299-3.patch


 If the files are very oddly sized then it's possible that 
 ExploringCompactionPolicy can get stuck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642683#comment-13642683
 ] 

Hudson commented on HBASE-8428:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475995)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java
* /hbase/branches/0.95/src/main/docbkx/developer.xml


 Tighten up IntegrationTestsDriver filter
 

 Key: HBASE-8428
 URL: https://issues.apache.org/jira/browse/HBASE-8428
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8428.txt


 Currently, the filter that looks for IntegrationTests is broad.  It reports 
 loads of errors as we try to parse classes we don't care about.  Let me 
 tighten it up so it doesn't scare folks away.
 It is particularly bad when run against a distributed cluster where the test 
 context is not all present; here there are lots of ERROR reports about 
 classes not found.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642684#comment-13642684
 ] 

Hudson commented on HBASE-5930:
---

Integrated in hbase-0.95 #163 (See 
[https://builds.apache.org/job/hbase-0.95/163/])
HBASE-5930. Removed a configuration that was causing unnecessary flushes in 
tests. (Revision 1475991)
HBASE-5930. Limits the amount of time an edit can live in the memstore. 
(Revision 1475874)

 Result = FAILURE
ddas : 
Files : 
* /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml

ddas : 
Files : 
* /hbase/branches/0.95/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Limits the amount of time an edit can live in the memstore.
 ---

 Key: HBASE-5930
 URL: https://issues.apache.org/jira/browse/HBASE-5930
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Devaraj Das
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 
 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 
 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, 
 hbase-5930-test-execution.log


 A colleague of mine ran into an interesting issue.
 He inserted some data with the WAL disabled, which happened to fit in the 
 aggregate Memstores memory.
 Two weeks later he had a problem with the HDFS cluster, which caused the 
 region servers to abort. He found that his data was lost. Looking at the log 
 we found that the Memstores were not flushed at all during these two weeks.
 Should we have an option to flush memstores periodically? There are obvious 
 downsides to this, like many small storefiles, etc.
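
A minimal sketch of the idea, assuming a flush-interval config along the 
lines of hbase.regionserver.optionalcacheflushinterval (the name and value 
here are assumptions, as are the field names):

{code}
public class PeriodicFlushCheck {
  // e.g. hbase.regionserver.optionalcacheflushinterval; one hour as example.
  static final long FLUSH_INTERVAL_MS = 3600 * 1000;

  // Flush a memstore whose oldest edit is older than the configured maximum
  // age, even if the memstore has not reached its size threshold.
  static boolean shouldFlush(long oldestEditTimestampMs, long nowMs) {
    return nowMs - oldestEditTimestampMs > FLUSH_INTERVAL_MS;
  }
}
{code}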

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2013-04-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642685#comment-13642685
 ] 

Nicolas Liochon commented on HBASE-6435:


During the tests on the impact of waiting for the end of hdfs recoverLease, it 
appeared that:
 - there is a bug, and some files are not detected.
 - we have a dependency on the machine name (an issue if a machine has multiple 
names).

HDFS-4754 supersedes this, so, to keep things simple and limit the number of 
possible configurations, my plan is:
 - make sure that HDFS-4754 makes it to a reasonable number of hdfs branches.
 - revert this.

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.95.0

 Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 
 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 
 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 
 6435.v9.patch, 6535.v11.patch


 HBase writes a Write-Ahead-Log to recover from hardware failure. This log is 
 written on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standard deployments, HBase processes (regionservers) are deployed on the 
 same boxes as the datanodes.
 It means that when the box stops, we've actually lost one of the replicas of 
 the edits, as we lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanode detection by the NN. Requires an NN code change.
 - better dead datanode management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This for two reasons:
 - Some HDFS functions managing block order are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require implementing the fix partially, changing the DFS interface to make 
 this function non-static, or making the hook static. None of these solutions 
 is very clean. 
 - Adding a proxy allows us to put all the code in HBase, simplifying 
 dependency management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows targeting the last version only, and this could allow minimal 
 interface changes such as non-static methods.
 Moreover, writing the blocks to a non-local DN would be an even better 
 solution long term.
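
A sketch of the reorder step only (the real patch hooks this in via a proxy 
around the NameNode protocol; the helper shape here is illustrative):

{code}
// Push replicas hosted on the dead regionserver's datanode to the end of a
// block's location list so the client tries live datanodes first.
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class BlockReorderSketch {
  static void deprioritize(LocatedBlock block, final String deadHost) {
    DatanodeInfo[] locs = block.getLocations();
    // Stable sort: locations on the dead host sink to the end, everything
    // else keeps its NameNode-assigned order.
    Arrays.sort(locs, new Comparator<DatanodeInfo>() {
      @Override
      public int compare(DatanodeInfo a, DatanodeInfo b) {
        return rank(a) - rank(b);
      }
      private int rank(DatanodeInfo dn) {
        return dn.getHostName().equals(deadHost) ? 1 : 0;
      }
    });
  }
}
{code}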

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642741#comment-13642741
 ] 

Hudson commented on HBASE-8415:
---

Integrated in HBase-0.94 #968 (See 
[https://builds.apache.org/job/HBase-0.94/968/])
HBASE-8415 DisabledRegionSplitPolicy (Revision 1475946)

 Result = SUCCESS
enis : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java


 DisabledRegionSplitPolicy
 -

 Key: HBASE-8415
 URL: https://issues.apache.org/jira/browse/HBASE-8415
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8415_v1.patch


 Simple RegionSplitPolicy for tests, and some special cases where we want to 
 disable splits. Makes it easier and more explicit than using a 
 ConstantSizeRegionSplitPolicy with a large region size. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642740#comment-13642740
 ] 

Hudson commented on HBASE-8345:
---

Integrated in HBase-0.94 #968 (See 
[https://builds.apache.org/job/HBase-0.94/968/])
HBASE-8345 Add all available resources in RootResource and VersionResource 
to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476028)

 Result = SUCCESS
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java


 Add all available resources in o.a.h.h.rest.RootResource and VersionResource 
 to o.a.h.h.rest.client.RemoteAdmin
 ---

 Key: HBASE-8345
 URL: https://issues.apache.org/jira/browse/HBASE-8345
 Project: HBase
  Issue Type: Improvement
  Components: Client, REST
Affects Versions: 0.94.6.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
  Labels: rest_api
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, 
 HBASE-8345-v6-trunk.patch


 In our built-in REST clients, we should add in more of the available REST 
 resources. This will allow more thorough testing of the REST API, 
 particularly with IntegrationTest.
 These clients are located in the o.a.h.h.rest.client package.
 In this case, I want to add the resources not already included in / and 
 /version to o.a.h.h.rest.client.RemoteAdmin. This includes, /status/cluster, 
 /version/rest and /version/cluster, among others.
 The RemoteAdmin class is a logical place for these methods because it is not 
 related to a specific table (those methods should go into RemoteHTable).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8432) a table with unbalanced regions will balance indefinitely with the 'org.apache.hadoop.hbase.master.DefaultLoadBalancer'

2013-04-26 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642785#comment-13642785
 ] 

Jean-Marc Spaggiari commented on HBASE-8432:


Thanks for the followup [~aaronwq]. Have you also tried the other scenarios, 
like regions# < RS#/2 and regions# > RS#? Are they all still working fine?

 a table with unbalanced regions will balance indefinitely with the 
 'org.apache.hadoop.hbase.master.DefaultLoadBalancer'
 ---

 Key: HBASE-8432
 URL: https://issues.apache.org/jira/browse/HBASE-8432
 Project: HBase
  Issue Type: Bug
  Components: Balancer
Affects Versions: 0.94.5
 Environment: Linux 2.6.32-el5.x86_64
Reporter: Wang Qiang
Priority: Critical
 Attachments: patch_20130425_01.txt


 it happened that a table had unbalanced regions, as follows, in my 
 cluster (the cluster has 20 regionservers, the table has 12 regions):
 http://hadoopdev19.cm6:60030/ 1
 http://hadoopdev8.cm6:60030/  2
 http://hadoopdev17.cm6:60030/ 1
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev5.cm6:60030/  1
 http://hadoopdev9.cm6:60030/  1
 http://hadoopdev22.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev21.cm6:60030/ 1
 http://hadoopdev16.cm6:60030/ 1
 http://hadoopdev10.cm6:60030/ 1
 with the 'org.apache.hadoop.hbase.master.DefaultLoadBalancer', after 5 
 load-balance rounds the table is still unbalanced:
 http://hadoopdev3.cm6:60030/  1
 http://hadoopdev20.cm6:60030/ 1
 http://hadoopdev4.cm6:60030/  2
 http://hadoopdev18.cm6:60030/ 1
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev14.cm6:60030/ 1
 http://hadoopdev15.cm6:60030/ 1
 http://hadoopdev6.cm6:60030/  1
 http://hadoopdev13.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev10.cm6:60030/ 1
 http://hadoopdev19.cm6:60030/ 1
 http://hadoopdev17.cm6:60030/ 1
 http://hadoopdev8.cm6:60030/  1
 http://hadoopdev5.cm6:60030/  1
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev22.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev21.cm6:60030/ 1
 http://hadoopdev7.cm6:60030/  2
 http://hadoopdev10.cm6:60030/ 1
 http://hadoopdev16.cm6:60030/ 1
 http://hadoopdev3.cm6:60030/  1
 http://hadoopdev20.cm6:60030/ 1
 http://hadoopdev4.cm6:60030/  1
 http://hadoopdev18.cm6:60030/ 2
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev14.cm6:60030/ 1
 http://hadoopdev15.cm6:60030/ 1
 http://hadoopdev6.cm6:60030/  1
 http://hadoopdev13.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev10.cm6:60030/ 1
 http://hadoopdev19.cm6:60030/ 1
 http://hadoopdev8.cm6:60030/  1
 http://hadoopdev17.cm6:60030/ 1
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev5.cm6:60030/  1
 http://hadoopdev22.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev7.cm6:60030/  1
 http://hadoopdev21.cm6:60030/ 2
 http://hadoopdev16.cm6:60030/ 1
 http://hadoopdev10.cm6:60030/ 1
 http://hadoopdev3.cm6:60030/  1
 http://hadoopdev20.cm6:60030/ 1
 http://hadoopdev18.cm6:60030/ 1
 http://hadoopdev4.cm6:60030/  1
 http://hadoopdev12.cm6:60030/ 1
 http://hadoopdev15.cm6:60030/ 1
 http://hadoopdev14.cm6:60030/ 2
 http://hadoopdev6.cm6:60030/  1
 http://hadoopdev13.cm6:60030/ 1
 http://hadoopdev11.cm6:60030/ 1
 http://hadoopdev10.cm6:60030/ 1
 from the above logs, we can also find that some regions needn't have moved, 
 but they moved. Following into 
 'org.apache.hadoop.hbase.master.DefaultLoadBalancer.balanceCluster()', I 
 found that 'maxToTake' is calculated incorrectly. 
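
For illustration, the invariant a correct balancer should enforce here 
(simplified; this is not the buggy maxToTake computation itself):

{code}
public class BalanceTargets {
  public static void main(String[] args) {
    int regions = 12, servers = 20;
    int min = regions / servers;                  // floor(12/20) = 0
    int max = (regions + servers - 1) / servers;  // ceil(12/20)  = 1
    // Every server already holding between min and max regions is balanced;
    // for the table above (each server has 0 or 1 region) a correct balancer
    // should move nothing, yet the reported runs keep shuffling regions.
    System.out.println("target per server: [" + min + ", " + max + "]");
  }
}
{code}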

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642790#comment-13642790
 ] 

Hudson commented on HBASE-8024:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8024 Make Store flush algorithm pluggable (Revision 1475871)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Make Store flush algorithm pluggable
 

 Key: HBASE-8024
 URL: https://issues.apache.org/jira/browse/HBASE-8024
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.94.5, 0.95.0, 0.95.2
Reporter: Maryann Xue
Assignee: Sergey Shelukhin
 Fix For: 0.95.1

 Attachments: HBASE-8024-trunk.patch, HBASE-8024.v2.patch, 
 HBASE-8024-v3.patch, HBASE-8024-v4.patch


 The idea is to make StoreFlusher an interface instead of an implementation 
 class, and have the original StoreFlusher as the default store flush impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642791#comment-13642791
 ] 

Hudson commented on HBASE-8392:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 
profile (Revision 1475997)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java


 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
 

 Key: HBASE-8392
 URL: https://issues.apache.org/jira/browse/HBASE-8392
 Project: HBase
  Issue Type: Sub-task
  Components: hadoop2, metrics, test
Affects Versions: 0.98.0, 0.95.0
Reporter: Jonathan Hsieh
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8392-0.patch


 This specific small unit test flakes out occasionally and blocks the medium 
 and large tests from running.
 Here's an error trace:
 {code}
 Error Message
 expected:<2.0> but was:<0.125>
 Stacktrace
 junit.framework.AssertionFailedError: expected:<2.0> but was:<0.125>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:120)
   at junit.framework.Assert.assertEquals(Assert.java:129)
   at junit.framework.TestCase.assertEquals(TestCase.java:288)
   at 
 org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at org.junit.runners.Suite.runChild(Suite.java:127)
   at org.junit.runners.Suite.runChild(Suite.java:26)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 [~eclark] took a quick look and will chime in on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642792#comment-13642792
 ] 

Hudson commented on HBASE-8393:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 
1476024)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java


 Testcase TestHeapSize#testMutations is wrong
 

 Key: HBASE-8393
 URL: https://issues.apache.org/jira/browse/HBASE-8393
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.1

 Attachments: hbase-8393.patch


 I happened to check this test case and found several compensating errors that 
 let it pass. You can reproduce a failure by adding a new field to Mutation; 
 the test case will then fail on either a 64-bit or a 32-bit system.
 Below are the errors I found in the test case:
 1) The test case uses {code}row=new byte[]{0}{code}, an array of length 1, 
 while ClassSize.estimateBase can only calculate the base class size (without 
 counting field array lengths).
 2) ClassSize.REFERENCE is added twice in the following code, even though 
 ClassSize.estimateBase already counts all reference fields: {code}expected += 
 ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
 3) ClassSize.estimateBase rounds up the sum of the lengths of reference 
 fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns 
 the sum of the lengths of a different set of fields. There are therefore 
 round-up differences for class Increment, which introduces a new reference 
 field (TimeRange tr), when the test case runs on a 32-bit versus a 64-bit 
 system.
 {code}
 ...
 long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * 
 REFERENCE;
 // Round up to a multiple of 8
 long size = align(prealign_size);
 ...
 {code}
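 To see why aligning per part differs from aligning the total, here is a 
 self-contained sketch (the align() body mirrors the round-up-to-8 logic 
 quoted above; the field sizes are made up for illustration):
 {code}
 // Demonstrates the rounding discrepancy behind error 3): summing aligned
 // parts can exceed aligning the summed parts by up to 7 bytes per part.
 public class AlignSketch {
   static long align(long size) {
     return (size + 7) & ~7L;  // round up to a multiple of 8
   }

   public static void main(String[] args) {
     long header = 12, reference = 4, primitive = 9;  // illustrative sizes
     long alignedParts = align(header) + align(reference) + align(primitive);
     long alignedTotal = align(header + reference + primitive);
     // Prints "40 vs 32": the two accounting schemes disagree, just as
     // ClassSize.estimateBase and Mutation.MUTATION_OVERHEAD can.
     System.out.println(alignedParts + " vs " + alignedTotal);
   }
 }
 {code}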

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642793#comment-13642793
 ] 

Hudson commented on HBASE-8422:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line 
(Revision 1475987)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java


 Master won't go down.  Stuck waiting on .META. to come on line.
 ---

 Key: HBASE-8422
 URL: https://issues.apache.org/jira/browse/HBASE-8422
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: stack
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, 
 HBASE-8422_94.patch, HBASE-8422.patch


 Master came up w/ no regionservers.  I then tried to shut it down.  You can 
 see below that it started to go down
 {code}
 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 6] 
 org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
 2013-04-24 14:28:49,815 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region 
 servers count to settle; checked in 0, slept for 2818 ms, expecting minimum 
 of 1, maximum of 2147483647, master is stopped.
 2013-04-24 14:28:49,815 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while 
 splitting logs
 2013-04-24 14:28:50,104 INFO  
 [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] 
 org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: 
 stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor 
 exiting
 2013-04-24 14:28:50,850 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region 
 location in ZooKeeper
 2013-04-24 14:28:50,884 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/meta-region-server already deleted, retry=false
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; 
 skipping assign of .META.,,1.1028785192
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't 
 finished failover cleanup
 2013-04-24 14:29:46,188 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] 
 org.apache.hadoop.hbase.master.cleaner.LogCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting
 2013-04-24 14:29:46,193 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] 
 org.apache.hadoop.hbase.master.cleaner.HFileCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner 
 exiting
 {code}
 ... but now it is stuck.
 We keep looping here:
 {code}
 master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 
 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
 - locked 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
 at 
 org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
 at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Odd.  It is supposed to be checking the 'stopped' flag; maybe it is checking 
 the wrong stop flag.
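 For reference, a stop-aware wait loop avoids this kind of hang by waking 
 periodically and re-checking the flag; a minimal sketch (hypothetical names, 
 not the HBase tracker code):
 {code}
 // Hypothetical stop-aware wait: short timed waits so a stop request is
 // noticed even if the awaited data (e.g. the META location) never arrives.
 class StopAwareWait {
   private final Object lock = new Object();
   private volatile Object data;      // set when the tracked node appears
   private volatile boolean stopped;  // the server's stop flag

   Object blockUntilAvailable(long timeoutMs) throws InterruptedException {
     long deadline = System.currentTimeMillis() + timeoutMs;
     synchronized (lock) {
       while (data == null && !stopped) {
         long remaining = deadline - System.currentTimeMillis();
         if (remaining <= 0) break;
         lock.wait(Math.min(remaining, 100));  // re-check the stop flag often
       }
       return data;  // null if stopped or timed out
     }
   }
 }
 {code}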

--
This message is automatically generated by JIRA.
If you think it was sent 

[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642794#comment-13642794
 ] 

Hudson commented on HBASE-8345:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8345 Add all available resources in RootResource and VersionResource 
to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476027)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java


 Add all available resources in o.a.h.h.rest.RootResource and VersionResource 
 to o.a.h.h.rest.client.RemoteAdmin
 ---

 Key: HBASE-8345
 URL: https://issues.apache.org/jira/browse/HBASE-8345
 Project: HBase
  Issue Type: Improvement
  Components: Client, REST
Affects Versions: 0.94.6.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
  Labels: rest_api
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, 
 HBASE-8345-v6-trunk.patch


 In our built-in REST clients, we should add in more of the available REST 
 resources. This will allow more thorough testing of the REST API, 
 particularly with IntegrationTest.
 These clients are located in the o.a.h.h.rest.client package.
 In this case, I want to add the resources not already included in / and 
 /version to o.a.h.h.rest.client.RemoteAdmin. This includes, /status/cluster, 
 /version/rest and /version/cluster, among others.
 The RemoteAdmin class is a logical place for these methods because it is not 
 related to a specific table (those methods should go into RemoteHTable).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642795#comment-13642795
 ] 

Hudson commented on HBASE-8415:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8415 DisabledRegionSplitPolicy (Revision 1475944)

 Result = FAILURE
enis : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java


 DisabledRegionSplitPolicy
 -

 Key: HBASE-8415
 URL: https://issues.apache.org/jira/browse/HBASE-8415
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8415_v1.patch


 A simple RegionSplitPolicy for tests and for special cases where we want to 
 disable splits. It is easier and more explicit than using a 
 ConstantSizeRegionSplitPolicy with a huge region size.
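 For illustration, enabling it per table might look like the sketch below, 
 assuming the standard HTableDescriptor SPLIT_POLICY hook for naming a split 
 policy class (table and family names are made up):
 {code}
 // Sketch: name the split policy class on the table descriptor so splits
 // are disabled for this table.
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;

 public class DisableSplitsExample {
   public static HTableDescriptor descriptor() {
     HTableDescriptor htd = new HTableDescriptor("mytable");
     htd.addFamily(new HColumnDescriptor("cf"));
     htd.setValue(HTableDescriptor.SPLIT_POLICY,
         "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
     return htd;
   }
 }
 {code}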

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642796#comment-13642796
 ] 

Hudson commented on HBASE-8299:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 
1475965)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java


 ExploringCompactionPolicy can get stuck in rare cases.
 --

 Key: HBASE-8299
 URL: https://issues.apache.org/jira/browse/HBASE-8299
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, 
 HBASE-8299-2.patch, HBASE-8299-3.patch


 If the files are very oddly sized then it's possible that 
 ExploringCompactionPolicy can get stuck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642797#comment-13642797
 ] 

Hudson commented on HBASE-8428:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475995)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java
* /hbase/branches/0.95/src/main/docbkx/developer.xml


 Tighten up IntegrationTestsDriver filter
 

 Key: HBASE-8428
 URL: https://issues.apache.org/jira/browse/HBASE-8428
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8428.txt


 Currently, the filter that looks for IntegrationTests is broad.  It reports 
 loads of errors as we try to parse classes we don't care about.  Let me 
 tighten it up so it doesn't scare folks away.
 It is particularly bad when run against a distributed cluster where the test 
 context is not all present; there are lots of ERROR reports about classes not 
 found.
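 A sketch of the kind of tightening meant here (a hypothetical name-based 
 prefilter, not the committed patch): check the class name before trying to 
 load or parse the class, so unrelated classes are skipped silently:
 {code}
 // Hypothetical prefilter: only classes whose simple name starts with
 // "IntegrationTest" are even considered, avoiding noisy load failures.
 import java.util.regex.Pattern;

 public class IntegrationTestFilterSketch {
   private static final Pattern NAME = Pattern.compile("^IntegrationTest.*");

   static boolean mightBeIntegrationTest(String className) {
     String simple = className.substring(className.lastIndexOf('.') + 1);
     return NAME.matcher(simple).matches();
   }

   public static void main(String[] args) {
     // true:
     System.out.println(mightBeIntegrationTest(
         "org.apache.hadoop.hbase.IntegrationTestBigLinkedList"));
     // false: ordinary test classes are filtered out up front
     System.out.println(mightBeIntegrationTest("org.apache.hadoop.hbase.TestFoo"));
   }
 }
 {code}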

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642798#comment-13642798
 ] 

Hudson commented on HBASE-5930:
---

Integrated in hbase-0.95-on-hadoop2 #81 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/81/])
HBASE-5930. Removed a configuration that was causing unnecessary flushes in 
tests. (Revision 1475991)
HBASE-5930. Limits the amount of time an edit can live in the memstore. 
(Revision 1475874)

 Result = FAILURE
ddas : 
Files : 
* /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml

ddas : 
Files : 
* /hbase/branches/0.95/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Limits the amount of time an edit can live in the memstore.
 ---

 Key: HBASE-5930
 URL: https://issues.apache.org/jira/browse/HBASE-5930
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Devaraj Das
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 
 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 
 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, 
 hbase-5930-test-execution.log


 A colleague of mine ran into an interesting issue.
 He inserted some data with the WAL disabled, which happened to fit in the 
 aggregate memstore memory.
 Two weeks later he had a problem with the HDFS cluster, which caused the 
 region servers to abort. He found that his data was lost. Looking at the logs 
 we found that the memstores were not flushed at all during those two weeks.
 Should we have an option to flush memstores periodically? There are obvious 
 downsides to this, like many small storefiles, etc.
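 For operators, a minimal sketch of capping memstore edit lifetime, assuming 
 the hbase.regionserver.optionalcacheflushinterval property that 
 hbase-default.xml documents for this feature (verify the key against your 
 release; 0 disables periodic flushing):
 {code}
 // Sketch: flush any memstore whose oldest edit is over an hour old.
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;

 public class PeriodicFlushConfig {
   public static Configuration configure() {
     Configuration conf = HBaseConfiguration.create();
     // Property name assumed from hbase-default.xml; value in milliseconds.
     conf.setLong("hbase.regionserver.optionalcacheflushinterval", 3600000L);
     return conf;
   }
 }
 {code}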

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642826#comment-13642826
 ] 

Hudson commented on HBASE-8393:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 
1476022)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java


 Testcase TestHeapSize#testMutations is wrong
 

 Key: HBASE-8393
 URL: https://issues.apache.org/jira/browse/HBASE-8393
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.1

 Attachments: hbase-8393.patch


 I happened to check this test case and found several compensating errors that 
 let it pass. You can reproduce a failure by adding a new field to Mutation; 
 the test case will then fail on either a 64-bit or a 32-bit system.
 Below are the errors I found in the test case:
 1) The test case uses {code}row=new byte[]{0}{code}, an array of length 1, 
 while ClassSize.estimateBase can only calculate the base class size (without 
 counting field array lengths).
 2) ClassSize.REFERENCE is added twice in the following code, even though 
 ClassSize.estimateBase already counts all reference fields: {code}expected += 
 ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
 3) ClassSize.estimateBase rounds up the sum of the lengths of reference 
 fields + primitive fields + arrays, while Mutation.MUTATION_OVERHEAD aligns 
 the sum of the lengths of a different set of fields. There are therefore 
 round-up differences for class Increment, which introduces a new reference 
 field (TimeRange tr), when the test case runs on a 32-bit versus a 64-bit 
 system.
 {code}
 ...
 long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * 
 REFERENCE;
 // Round up to a multiple of 8
 long size = align(prealign_size);
 ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8024) Make Store flush algorithm pluggable

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642824#comment-13642824
 ] 

Hudson commented on HBASE-8024:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8024 Make Store flush algorithm pluggable (Revision 1475870)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreFlusher.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreEngine.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlushContext.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java


 Make Store flush algorithm pluggable
 

 Key: HBASE-8024
 URL: https://issues.apache.org/jira/browse/HBASE-8024
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.94.5, 0.95.0, 0.95.2
Reporter: Maryann Xue
Assignee: Sergey Shelukhin
 Fix For: 0.95.1

 Attachments: HBASE-8024-trunk.patch, HBASE-8024.v2.patch, 
 HBASE-8024-v3.patch, HBASE-8024-v4.patch


 The idea is to make StoreFlusher an interface instead of an implementation 
 class, and have the original StoreFlusher as the default store flush impl.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Eric Newton (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642823#comment-13642823
 ] 

Eric Newton commented on HBASE-8389:


bq. Can you elaborate - how many recovery attempts for success and also how 
long b/w retries ?

After the tablet server loses its lock in zookeeper, the master waits 10s and 
calls recoverLease which returns false.  After 5s, recoverLease is retried and 
succeeds.  These are the default values for the timeouts.



 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes > 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, and try to get the file; 
 it has zero length since it's under recovery, so we fail the task and retry 
 until we get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I report in HDFS 4721. The Namenode chooses the 
 stale datanode to perform the recovery (since its still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either need something like HDFS 4721 or we need something like this
   recoverLease(...)
   sleep(1000)
   recoverLease(...)
   sleep(configuredTimeout)
   recoverLease(...)
   sleep(configuredTimeout)
 Where configuredTimeout should be large enough to let the recovery happen but 
 the first timeout is short so that we get past the moot recovery in step #1.
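 As a runnable sketch of that schedule (recoverLease is the real 
 DistributedFileSystem call, but the method shape and timeout handling here 
 are illustrative):
 {code}
 // Sketch: one quick retry to get past the doomed first recovery (stale
 // datanode chosen as primary), then generous waits sized to HDFS timeouts.
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hdfs.DistributedFileSystem;

 public class LeaseRecoverySketch {
   static void recover(DistributedFileSystem dfs, Path p, long configuredTimeoutMs)
       throws Exception {
     if (dfs.recoverLease(p)) return;     // rarely succeeds on the first call
     Thread.sleep(1000);                  // short pause past the moot recovery
     while (!dfs.recoverLease(p)) {
       Thread.sleep(configuredTimeoutMs); // >= dfs.socket.timeout, per above
     }
   }
 }
 {code}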
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8271) Book updates for changes to GC options in shell scripts

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642834#comment-13642834
 ] 

Hudson commented on HBASE-8271:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8271 Book updates for changes to GC options in shell scripts 
(Revision 1476037)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/docbkx/troubleshooting.xml


 Book updates for changes to GC options in shell scripts
 ---

 Key: HBASE-8271
 URL: https://issues.apache.org/jira/browse/HBASE-8271
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Jesse Yates
Priority: Minor
 Fix For: 0.98.0

 Attachments: HBASE-8271.patch


 http://hbase.apache.org/book/trouble.log.html is a bit out of date as the 
 'right' way to do GC logging is via the GC_OPTS, rather than going through 
 the general HBASE_OPTS.
 Follow up to HBASE-7817

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8392) TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642825#comment-13642825
 ] 

Hudson commented on HBASE-8392:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8392 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 
profile (Revision 1475998)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java


 TestMetricMBeanBase#testGetAttribute is flakey under hadoop2 profile
 

 Key: HBASE-8392
 URL: https://issues.apache.org/jira/browse/HBASE-8392
 Project: HBase
  Issue Type: Sub-task
  Components: hadoop2, metrics, test
Affects Versions: 0.98.0, 0.95.0
Reporter: Jonathan Hsieh
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8392-0.patch


 This specific small unit test flakes out occasionally and blocks the medium 
 and large tests from running.
 Here's an error trace:
 {code}
 Error Message
 expected:<2.0> but was:<0.125>
 Stacktrace
 junit.framework.AssertionFailedError: expected:<2.0> but was:<0.125>
   at junit.framework.Assert.fail(Assert.java:57)
   at junit.framework.Assert.failNotEquals(Assert.java:329)
   at junit.framework.Assert.assertEquals(Assert.java:120)
   at junit.framework.Assert.assertEquals(Assert.java:129)
   at junit.framework.TestCase.assertEquals(TestCase.java:288)
   at 
 org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase.testGetAttribute(TestMetricsMBeanBase.java:93)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at org.junit.runners.Suite.runChild(Suite.java:127)
   at org.junit.runners.Suite.runChild(Suite.java:26)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 [~eclark] took a quick look and will chime in on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8415) DisabledRegionSplitPolicy

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642830#comment-13642830
 ] 

Hudson commented on HBASE-8415:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8415 DisabledRegionSplitPolicy (Revision 1475943)

 Result = FAILURE
enis : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DisabledRegionSplitPolicy.java


 DisabledRegionSplitPolicy
 -

 Key: HBASE-8415
 URL: https://issues.apache.org/jira/browse/HBASE-8415
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: hbase-8415_v1.patch


 A simple RegionSplitPolicy for tests and for special cases where we want to 
 disable splits. It is easier and more explicit than using a 
 ConstantSizeRegionSplitPolicy with a huge region size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642827#comment-13642827
 ] 

Hudson commented on HBASE-8444:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8444 Acknowledge that 0.95+ requires 1.0.3 hadoop at least (Revision 
1476036)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/docbkx/configuration.xml


 Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
 --

 Key: HBASE-8444
 URL: https://issues.apache.org/jira/browse/HBASE-8444
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 8444.txt


 As per this mail thread, 
 http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2
 ... 0.95.x requires hadoop 1.0.3 at least.  Note it in the refguide hadoop 
 section.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8422) Master won't go down. Stuck waiting on .META. to come on line.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642828#comment-13642828
 ] 

Hudson commented on HBASE-8422:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8422 Master won't go down. Stuck waiting on .META. to come on line 
(Revision 1475986)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java


 Master won't go down.  Stuck waiting on .META. to come on line.
 ---

 Key: HBASE-8422
 URL: https://issues.apache.org/jira/browse/HBASE-8422
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: stack
Assignee: rajeshbabu
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8422_2.patch, HBASE-8422_3.patch, 
 HBASE-8422_94.patch, HBASE-8422.patch


 Master came up w/ no regionservers.  I then tried to shut it down.  You can 
 see below that it started to go down
 {code}
 2013-04-24 14:28:49,770 INFO  [IPC Server handler 7 on 6] 
 org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested
 2013-04-24 14:28:49,815 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region 
 servers count to settle; checked in 0, slept for 2818 ms, expecting minimum 
 of 1, maximum of 2147483647, master is stopped.
 2013-04-24 14:28:49,815 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.MasterFileSystem: Master stopped while 
 splitting logs
 2013-04-24 14:28:50,104 INFO  
 [stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor] 
 org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: 
 stack-1.ent.cloudera.com,6,1366838923135.splitLogManagerTimeoutMonitor 
 exiting
 2013-04-24 14:28:50,850 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker: Unsetting META region 
 location in ZooKeeper
 2013-04-24 14:28:50,884 WARN  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/meta-region-server already deleted, retry=false
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.AssignmentManager: Cluster shutdown is set; 
 skipping assign of .META.,,1.1028785192
 2013-04-24 14:28:50,884 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135] 
 org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't 
 finished failover cleanup
 2013-04-24 14:29:46,188 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner] 
 org.apache.hadoop.hbase.master.cleaner.LogCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.oldLogCleaner exiting
 2013-04-24 14:29:46,193 INFO  
 [master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner] 
 org.apache.hadoop.hbase.master.cleaner.HFileCleaner: 
 master-stack-1.ent.cloudera.com,6,1366838923135.archivedHFileCleaner 
 exiting
 {code}
 ... but now it is stuck.
 We keep looping here:
 {code}
 master-stack-1.ent.cloudera.com,6,1366838923135 prio=10 
 tid=0x7f154853f000 nid=0x18b in Object.wait() [0x7f1545fde000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:161)
 - locked 0xc727d738 (a 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker)
 at 
 org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.waitMetaRegionLocation(MetaRegionTracker.java:105)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:250)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:299)
 at 
 org.apache.hadoop.hbase.master.HMaster.enableSSHandWaitForMeta(HMaster.java:905)
 at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:879)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:764)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Odd.  It is supposed to be checking the 'stopped' flag; maybe it is checking 
 the wrong stop flag.

--
This message is automatically generated by JIRA.
If you think it was sent 

[jira] [Commented] (HBASE-8299) ExploringCompactionPolicy can get stuck in rare cases.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642831#comment-13642831
 ] 

Hudson commented on HBASE-8299:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8299 ExploringCompactionPolicy can get stuck in rare cases. (Revision 
1475966)

 Result = FAILURE
eclark : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultStoreEngine.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreConfigInformation.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ConstantSizeFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/EverythingPolicy.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/ExplicitFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/GaussianFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/MockStoreFileGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/PerfTestCompactionPolicies.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SemiConstantSizeFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SinusoidalFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/SpikyFileListGenerator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/StoreFileListGenerator.java


 ExploringCompactionPolicy can get stuck in rare cases.
 --

 Key: HBASE-8299
 URL: https://issues.apache.org/jira/browse/HBASE-8299
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8299-0.patch, HBASE-8299-1.patch, 
 HBASE-8299-2.patch, HBASE-8299-3.patch


 If the files are very oddly sized then it's possible that 
 ExploringCompactionPolicy can get stuck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5930) Limits the amount of time an edit can live in the memstore.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642833#comment-13642833
 ] 

Hudson commented on HBASE-5930:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-5930. Removed a configuration that was causing unnecessary flushes in 
tests. (Revision 1475990)
HBASE-5930 Limits the amount of time an edit can live in the memstore. 
(Revision 1475970)
HBASE-5930. Limits the amount of time an edit can live in the memstore. 
(Revision 1475872)

 Result = FAILURE
ddas : 
Files : 
* /hbase/trunk/hbase-server/src/test/resources/hbase-site.xml

eclark : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java

ddas : 
Files : 
* /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/FlushRequester.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java


 Limits the amount of time an edit can live in the memstore.
 ---

 Key: HBASE-5930
 URL: https://issues.apache.org/jira/browse/HBASE-5930
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Devaraj Das
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: 5930-0.94.txt, 5930-1.patch, 5930-2.1.patch, 
 5930-2.2.patch, 5930-2.3.patch, 5930-2.4.patch, 5930-track-oldest-sample.txt, 
 5930-wip.patch, HBASE-5930-ADD-0.patch, hbase-5930-addendum2.patch, 
 hbase-5930-test-execution.log


 A colleague of mine ran into an interesting issue.
 He inserted some data with the WAL disabled, which happened to fit in the 
 aggregate memstore memory.
 Two weeks later he had a problem with the HDFS cluster, which caused the 
 region servers to abort. He found that his data was lost. Looking at the logs 
 we found that the memstores were not flushed at all during those two weeks.
 Should we have an option to flush memstores periodically? There are obvious 
 downsides to this, like many small storefiles, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8428) Tighten up IntegrationTestsDriver filter

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642832#comment-13642832
 ] 

Hudson commented on HBASE-8428:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #511 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/511/])
HBASE-8428 Tighten up IntegrationTestsDriver filter (Revision 1475996)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestsDriver.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java
* /hbase/trunk/src/main/docbkx/developer.xml


 Tighten up IntegrationTestsDriver filter
 

 Key: HBASE-8428
 URL: https://issues.apache.org/jira/browse/HBASE-8428
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8428.txt


 Currently, the filter that looks for IntegrationTests is broad.  It reports 
 loads of errors as we try to parse classes we don't care about.  Let me 
 tighten it up so it doesn't scare folks away.
 It is particularly bad when run against a distributed cluster where the test 
 context is not all present; there are lots of ERROR reports about classes not 
 found.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.

2013-04-26 Thread Brian Dougan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Dougan updated HBASE-8367:


Attachment: LoadIncrementalHFiles-HBASE-8367.patch

Patch file against trunk.

 LoadIncrementalHFiles does not return an error code or throw Exception when 
 failures occur due to timeouts.
 ---

 Key: HBASE-8367
 URL: https://issues.apache.org/jira/browse/HBASE-8367
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.92.1, 0.92.2
 Environment: Red Hat 6.2 
 Java 1.6.0_26
 Hadoop 2.0.0-mr1-cdh4.1.1
 HBase 0.92.1-cdh4.1.1
Reporter: Brian Dougan
Priority: Minor
 Fix For: 0.94.8

 Attachments: LoadIncrementalHFiles-HBASE-8367.patch


 The LoadIncrementalHFiles (completebulkload) command will exit with a success 
 code (or without an Exception) even when one or more of the HFiles fail to be 
 imported, which can happen in a few ways (mainly when timeouts occur).  
 Instead, it simply logs error messages, which makes it difficult to automate 
 the import of HFiles programmatically.
 The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and 
 has essentially the following structure.
 {code:title=LoadIncrementalHFiles.java}
 try {
   ...
 } finally {
   pool.shutdown();
   if (queue != null && !queue.isEmpty()) {
     StringBuilder err = new StringBuilder();
     err.append("-------------------------------------------------\n");
     err.append("Bulk load aborted with some files not yet loaded:\n");
     err.append("-------------------------------------------------\n");
     for (LoadQueueItem q : queue) {
       err.append("  ").append(q.hfilePath).append('\n');
     }
     LOG.error(err);
   }
 }
 {code}
 As you can see, instead of returning an error code, a success indicator, or 
 simply throwing an Exception, an error message is sent to the log.  This 
 results in something like the following in the logs.
 {quote}
 Bulk load aborted with some files not yet loaded:
 -
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top
 {quote}
 Without some sort of indication, it's not currently possible to chain this 
 command to another or to programmatically consume this class and be certain 
 of a successful import.
 This class should be enhanced to return non-success in whatever way makes 
 sense to the community.  I don't really have a strong preference, but one of 
 the following should work fine (at least for my needs).
 * boolean return value on doBulkLoad (non-zero on run method)
 * Response object on doBulkLoad detailing the files that failed (non-zero on 
 run method)
 * throw an Exception in the finally block when files have failed, after the 
 error is written to the log (should automatically cause non-zero on run 
 method)
 It would also be nice to get this into the 0.94.x stream so it gets included in 
 the next Cloudera release.  Thanks!
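 As one illustration of the last option (a sketch under the structure quoted 
 above, not the attached patch), the finally block could throw after logging; 
 note that throwing from finally would mask any in-flight exception, so a 
 real patch would need to guard for that:
 {code}
 // Sketch: surface the failure to callers instead of only logging it, which
 // also yields a non-zero exit from the run method.
 import java.io.IOException;
 import java.util.Queue;

 public class BulkLoadSketch {
   void doBulkLoad(Queue<String> queue) throws IOException {
     try {
       // ... load items, removing each from the queue on success ...
     } finally {
       if (queue != null && !queue.isEmpty()) {
         StringBuilder err =
             new StringBuilder("Bulk load aborted with some files not yet loaded:\n");
         for (String hfilePath : queue) {
           err.append("  ").append(hfilePath).append('\n');
         }
         throw new IOException(err.toString());  // propagate the failure
       }
     }
   }
 }
 {code}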

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.

2013-04-26 Thread Brian Dougan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Dougan updated HBASE-8367:


Status: Patch Available  (was: Open)

Patch with proposed changes against trunk.

 LoadIncrementalHFiles does not return an error code or throw Exception when 
 failures occur due to timeouts.
 ---

 Key: HBASE-8367
 URL: https://issues.apache.org/jira/browse/HBASE-8367
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.92.2, 0.92.1
 Environment: Red Hat 6.2 
 Java 1.6.0_26
 Hadoop 2.0.0-mr1-cdh4.1.1
 HBase 0.92.1-cdh4.1.1
Reporter: Brian Dougan
Priority: Minor
 Fix For: 0.94.8

 Attachments: LoadIncrementalHFiles-HBASE-8367.patch


 The LoadIncrementalHFiles (completebulkload) command will exit with a success 
 code (or without an Exception) even when one or more of the HFiles fail to be 
 imported, which can happen in a few ways (mainly when timeouts occur).  
 Instead, it simply logs error messages, which makes it difficult to automate 
 the import of HFiles programmatically.
 The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and 
 has essentially the following structure.
 {code:title=LoadIncrementalHFiles.java}
 try {
   ...
 } finally {
   pool.shutdown();
   if (queue != null && !queue.isEmpty()) {
     StringBuilder err = new StringBuilder();
     err.append("-------------------------------------------------\n");
     err.append("Bulk load aborted with some files not yet loaded:\n");
     err.append("-------------------------------------------------\n");
     for (LoadQueueItem q : queue) {
       err.append("  ").append(q.hfilePath).append('\n');
     }
     LOG.error(err);
   }
 }
 {code}
 As you can see, instead of returning an error code, a success indicator, or 
 simply throwing an Exception, an error message is sent to the log.  This 
 results in something like the following in the logs.
 {quote}
 Bulk load aborted with some files not yet loaded:
 -
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom
   
 hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top
 {quote}
 Without some sort of indication, it's not currently possible to chain this 
 command to another or to programmatically consume this class and be certain 
 of a successful import.
 This class should be enhanced to return non-success in whatever way makes 
 sense to the community.  I don't really have a strong preference, but one of 
 the following should work fine (at least for my needs).
 * boolean return value on doBulkLoad (non-zero on run method)
 * Response object on doBulkLoad detailing the files that failed (non-zero on 
 run method)
 * throw an Exception in the finally block when files have failed, after the 
 error is written to the log (should automatically cause non-zero on run 
 method)
 It would also be nice to get this into the 0.94.x stream so it gets included in 
 the next Cloudera release.  Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8446) Allow parallel snapshot of different tables

2013-04-26 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-8446:
--

 Summary: Allow parallel snapshot of different tables
 Key: HBASE-8446
 URL: https://issues.apache.org/jira/browse/HBASE-8446
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.95.2
 Attachments: HBASE-8446-v0.patch

Currently only one snapshot at a time is allowed.
As with restore, we should allow taking snapshots of different tables in 
parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables

2013-04-26 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-8446:
---

Attachment: HBASE-8446-v0.patch

 Allow parallel snapshot of different tables
 ---

 Key: HBASE-8446
 URL: https://issues.apache.org/jira/browse/HBASE-8446
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.95.2

 Attachments: HBASE-8446-v0.patch


 Currently only one snapshot at a time is allowed.
 As with restore, we should allow taking snapshots of different tables in 
 parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.

2013-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642916#comment-13642916
 ] 

Hadoop QA commented on HBASE-8367:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12580706/LoadIncrementalHFiles-HBASE-8367.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5466//console

This message is automatically generated.

 LoadIncrementalHFiles does not return an error code or throw Exception when 
 failures occur due to timeouts.
 ---

 Key: HBASE-8367
 URL: https://issues.apache.org/jira/browse/HBASE-8367
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.92.1, 0.92.2
 Environment: Red Hat 6.2 
 Java 1.6.0_26
 Hadoop 2.0.0-mr1-cdh4.1.1
 HBase 0.92.1-cdh4.1.1
Reporter: Brian Dougan
Priority: Minor
 Fix For: 0.94.8

 Attachments: LoadIncrementalHFiles-HBASE-8367.patch


 The LoadIncrementalHFiles (completebulkload) command will exit with a success 
 code (or lack of Exception) when one or more of the HFiles fail to be 
 imported in any of several ways (mainly when timeouts occur). Instead, it simply 
 writes error messages to the log, which makes it difficult to automate the 
 import of HFiles programmatically.
 The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and 
 has essentially the following structure.
 {code:title=LoadIncrementalHFiles.java}
 try {
   ...
 } finally {
   pool.shutdown();
   if (queue != null && !queue.isEmpty()) {
     StringBuilder err = new StringBuilder();
     err.append("-\n");
     err.append("Bulk load aborted with some files not yet loaded:\n");
     err.append("-\n");
     for (LoadQueueItem q : queue) {
       err.append("  ").append(q.hfilePath).append('\n');
     }
     LOG.error(err);
   }
 }
 {code}
 As you can see, instead of returning an error code, a success indicator, or 
 simply throwing an Exception, an error message is sent to the log.  This 
 results in something like the following in the logs.
 {quote}
 Bulk load aborted with some files not yet loaded:
 -
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top
 {quote}
 Without some sort of indication, it's not currently possible to chain this 
 command to another or to programmatically consume this class and be certain 
 of a successful import.
 This class should be enhanced to return non-success in whatever way makes 
 sense to the community.  I don't really have a strong preference, but one of 
 the following should work fine (at least for my needs).
 * boolean return value on doBulkLoad (non-zero on run method)
 * Response object on doBulkLoad detailing the files that failed (non-zero on 
 run method)
 * throw Exception in the finally block when files failed after the error is 
 written to the log (should automatically cause non-zero on run method)
 It would also be nice to get this to the 0.94.x stream so it gets included in 
 the next Cloudera release.  Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables

2013-04-26 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-8446:
---

Attachment: (was: HBASE-8446-v0.patch)

 Allow parallel snapshot of different tables
 ---

 Key: HBASE-8446
 URL: https://issues.apache.org/jira/browse/HBASE-8446
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.95.2

 Attachments: HBASE-8446-v0.patch


 Currently only one snapshot at a time is allowed.
 As with restore, we should allow taking snapshots of different tables in 
 parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8446) Allow parallel snapshot of different tables

2013-04-26 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-8446:
---

Attachment: HBASE-8446-v0.patch

 Allow parallel snapshot of different tables
 ---

 Key: HBASE-8446
 URL: https://issues.apache.org/jira/browse/HBASE-8446
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.95.2

 Attachments: HBASE-8446-v0.patch


 Currently only one snapshot at a time is allowed.
 As with restore, we should allow taking snapshots of different tables in 
 parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.

2013-04-26 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642959#comment-13642959
 ] 

Nick Dimiduk commented on HBASE-8367:
-

Hi [~bkdougan]. Please regenerate the patch from the root of the checkout. I 
would expect to see {{hbase-server}} as the first component in the path. Also, 
have a look at TestLoadIncrementalHFiles and see if any logic in there should 
be updated accordingly. For instance, I think this patch will break the method 
{{testNonexistentColumnFamilyLoad}}.

Thanks!

 LoadIncrementalHFiles does not return an error code or throw Exception when 
 failures occur due to timeouts.
 ---

 Key: HBASE-8367
 URL: https://issues.apache.org/jira/browse/HBASE-8367
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.92.1, 0.92.2
 Environment: Red Hat 6.2 
 Java 1.6.0_26
 Hadoop 2.0.0-mr1-cdh4.1.1
 HBase 0.92.1-cdh4.1.1
Reporter: Brian Dougan
Priority: Minor
 Fix For: 0.94.8

 Attachments: LoadIncrementalHFiles-HBASE-8367.patch


 The LoadIncrementalHFiles (completebulkload) command will exit with a success 
 code (or lack of Exception) when one or more of the HFiles fail to be 
 imported in any of several ways (mainly when timeouts occur). Instead, it simply 
 writes error messages to the log, which makes it difficult to automate the 
 import of HFiles programmatically.
 The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and 
 has essentially the following structure.
 {code:title=LoadIncrementalHFiles.java}
 try {
   ...
 } finally {
   pool.shutdown();
   if (queue != null && !queue.isEmpty()) {
     StringBuilder err = new StringBuilder();
     err.append("-\n");
     err.append("Bulk load aborted with some files not yet loaded:\n");
     err.append("-\n");
     for (LoadQueueItem q : queue) {
       err.append("  ").append(q.hfilePath).append('\n');
     }
     LOG.error(err);
   }
 }
 {code}
 As you can see, instead of returning an error code, a success indicator, or 
 simply throwing an Exception, an error message is sent to the log.  This 
 results in something like the following in the logs.
 {quote}
 Bulk load aborted with some files not yet loaded:
 -
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top
 {quote}
 Without some sort of indication, it's not currently possible to chain this 
 command to another or to programmatically consume this class and be certain 
 of a successful import.
 This class should be enhanced to return non-success in whatever way makes 
 sense to the community.  I don't really have a strong preference, but one of 
 the following should work fine (at least for my needs).
 * boolean return value on doBulkLoad (non-zero on run method)
 * Response object on doBulkLoad detailing the files that failed (non-zero on 
 run method)
 * throw Exception in the finally block when files failed after the error is 
 written to the log (should automatically cause non-zero on run method)
 It would also be nice to get this to the 0.94.x stream so it gets included in 
 the next Cloudera release.  Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8367) LoadIncrementalHFiles does not return an error code or throw Exception when failures occur due to timeouts.

2013-04-26 Thread Brian Dougan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642993#comment-13642993
 ] 

Brian Dougan commented on HBASE-8367:
-

Yep, I messed up that patch...used eclipse to generate it and did it from the 
project rather than the root.  I'll get that updated...

As for the tests, none of the existing ones are affected by this.  I tried to 
write a new test, but can't find a way to hit this code with the current setup 
of this class/tests.  The code in that finally block only gets hit when all the 
setup for the HFiles works (it's able to determine region/check for 
splits/verify families).  It only hits that code in the finally block when 
something like a timeout occurs or connection errors occur while doing the bulk 
load on the region after everything else has been successful.  Without the 
ability to intercept the call to the region or to mock the region that gets 
called, I don't really think it can be duplicated currently...thoughts?

 LoadIncrementalHFiles does not return an error code or throw Exception when 
 failures occur due to timeouts.
 ---

 Key: HBASE-8367
 URL: https://issues.apache.org/jira/browse/HBASE-8367
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.92.1, 0.92.2
 Environment: Red Hat 6.2 
 Java 1.6.0_26
 Hadoop 2.0.0-mr1-cdh4.1.1
 HBase 0.92.1-cdh4.1.1
Reporter: Brian Dougan
Priority: Minor
 Fix For: 0.94.8

 Attachments: LoadIncrementalHFiles-HBASE-8367.patch


 The LoadIncrementalHFiles (completebulkload) command will exit with a success 
 code (or lack of Exception) when one or more of the HFiles fail to be 
 imported in any of several ways (mainly when timeouts occur). Instead, it simply 
 writes error messages to the log, which makes it difficult to automate the 
 import of HFiles programmatically.
 The heart of the LoadIncrementalHFiles class (doBulkLoad) returns void and 
 has essentially the following structure.
 {code:title=LoadIncrementalHFiles.java}
 try {
   ...
 } finally {
   pool.shutdown();
   if (queue != null && !queue.isEmpty()) {
     StringBuilder err = new StringBuilder();
     err.append("-\n");
     err.append("Bulk load aborted with some files not yet loaded:\n");
     err.append("-\n");
     for (LoadQueueItem q : queue) {
       err.append("  ").append(q.hfilePath).append('\n');
     }
     LOG.error(err);
   }
 }
 {code}
 As you can see, instead of returning an error code, a success indicator, or 
 simply throwing an Exception, an error message is sent to the log.  This 
 results in something like the following in the logs.
 {quote}
 Bulk load aborted with some files not yet loaded:
 -
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,2.top
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.bottom
   hdfs://prmdprod/user/userxxx/hfile/TABLE-1365721885510/record/_tmp/TABLE,1.top
 {quote}
 Without some sort of indication, it's not currently possible to chain this 
 command to another or to programmatically consume this class and be certain 
 of a successful import.
 This class should be enhanced to return non-success in whatever way makes 
 sense to the community.  I don't really have a strong preference, but one of 
 the following should work fine (at least for my needs).
 * boolean return value on doBulkLoad (non-zero on run method)
 * Response object on doBulkLoad detailing the files that failed (non-zero on 
 run method)
 * throw Exception in the finally block when files failed after the error is 
 written to the log (should automatically cause non-zero on run method)
 It would also be nice to get this to the 0.94.x stream so it gets included in 
 the next Cloudera release.  Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642997#comment-13642997
 ] 

Hudson commented on HBASE-8393:
---

Integrated in HBase-TRUNK #4081 (See 
[https://builds.apache.org/job/HBase-TRUNK/4081/])
HBASE-8393 Testcase TestHeapSize#testMutations is wrong (Jeffrey) (Revision 
1476022)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java


 Testcase TestHeapSize#testMutations is wrong
 

 Key: HBASE-8393
 URL: https://issues.apache.org/jira/browse/HBASE-8393
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.1

 Attachments: hbase-8393.patch


 I happened to check this test case and there are several existing errors that 
 let it pass. You can reproduce the test case failure by adding a new field 
 into Mutation; the test case will then fail on either a 64-bit or a 32-bit 
 system.
 Below are errors I found in the test case:
 1) The test case is using {code}row=new byte[]{0}{code} which is an array 
 with length=1, while ClassSize.estimateBase can only calculate the base class 
 size (without counting field array length)
 2) ClassSize.REFERENCE is added twice in the following code, because 
 ClassSize.estimateBase already adds all reference fields: {code}expected += 
 ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
 3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + 
 primitive fields + Array, while Mutation.MUTATION_OVERHEAD aligns the sum of 
 the lengths of a different set of fields. Therefore, there will be round-up 
 differences for class Increment, because it introduces a new reference field 
 TimeRange tr, when the test case runs on 32-bit and 64-bit systems.
 {code}
 ...
 long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE;
 // Round up to a multiple of 8
 long size = align(prealign_size);
 ...
 {code}
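
For reference, a minimal sketch of the 8-byte alignment this test depends on; the 
helper mirrors what ClassSize.align does (round up to a multiple of 8), and the 
sizes below are illustrative only:

{code:title=AlignSketch.java}
public class AlignSketch {
  // JVM object sizes are 8-byte aligned; round up to the nearest multiple of 8.
  static long align(long num) {
    return (num + 7) / 8 * 8;
  }

  public static void main(String[] args) {
    // A pre-align sum of 61 aligns to 64, but adding one 4-byte reference
    // field pushes it across the boundary to 72. Aligning different partial
    // sums (as estimateBase and MUTATION_OVERHEAD do) can therefore disagree.
    System.out.println(align(61));     // 64
    System.out.println(align(61 + 4)); // 72
  }
}
{code}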

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642998#comment-13642998
 ] 

Hudson commented on HBASE-8444:
---

Integrated in HBase-TRUNK #4081 (See 
[https://builds.apache.org/job/HBase-TRUNK/4081/])
HBASE-8444 Acknowledge that 0.95+ requires 1.0.3 hadoop at least (Revision 
1476036)

 Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/docbkx/configuration.xml


 Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
 --

 Key: HBASE-8444
 URL: https://issues.apache.org/jira/browse/HBASE-8444
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 8444.txt


 As per this mail thread, 
 http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2
 ... 0.95.x requires hadoop 1.0.3 at least.  Note it in the refguide hadoop 
 section.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8345) Add all available resources in o.a.h.h.rest.RootResource and VersionResource to o.a.h.h.rest.client.RemoteAdmin

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13642999#comment-13642999
 ] 

Hudson commented on HBASE-8345:
---

Integrated in HBase-TRUNK #4081 (See 
[https://builds.apache.org/job/HBase-TRUNK/4081/])
HBASE-8345 Add all available resources in RootResource and VersionResource 
to rest RemoteAdmin (Aleksandr Shulman) (Revision 1476025)

 Result = SUCCESS
jxiang : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteAdmin.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteAdmin.java


 Add all available resources in o.a.h.h.rest.RootResource and VersionResource 
 to o.a.h.h.rest.client.RemoteAdmin
 ---

 Key: HBASE-8345
 URL: https://issues.apache.org/jira/browse/HBASE-8345
 Project: HBase
  Issue Type: Improvement
  Components: Client, REST
Affects Versions: 0.94.6.1
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
  Labels: rest_api
 Fix For: 0.98.0, 0.94.8, 0.95.1

 Attachments: HBASE-8345-v1.patch, HBASE-8345-v6-94.patch, 
 HBASE-8345-v6-trunk.patch


 In our built-in REST clients, we should add in more of the available REST 
 resources. This will allow more thorough testing of the REST API, 
 particularly with IntegrationTest.
 These clients are located in the o.a.h.h.rest.client package.
 In this case, I want to add the resources not already included in / and 
 /version to o.a.h.h.rest.client.RemoteAdmin. This includes /status/cluster, 
 /version/rest and /version/cluster, among others.
 The RemoteAdmin class is a logical place for these methods because it is not 
 related to a specific table (those methods should go into RemoteHTable).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8271) Book updates for changes to GC options in shell scripts

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643000#comment-13643000
 ] 

Hudson commented on HBASE-8271:
---

Integrated in HBase-TRUNK #4081 (See 
[https://builds.apache.org/job/HBase-TRUNK/4081/])
HBASE-8271 Book updates for changes to GC options in shell scripts 
(Revision 1476037)

 Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/docbkx/troubleshooting.xml


 Book updates for changes to GC options in shell scripts
 ---

 Key: HBASE-8271
 URL: https://issues.apache.org/jira/browse/HBASE-8271
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Jesse Yates
Priority: Minor
 Fix For: 0.98.0

 Attachments: HBASE-8271.patch


 http://hbase.apache.org/book/trouble.log.html is a bit out of date as the 
 'right' way to do GC logging is via the GC_OPTS, rather than going through 
 the general HBASE_OPTS.
 Follow-up to HBASE-7817

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path

2013-04-26 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-8445.
---

Resolution: Invalid

We don't support module reload use cases. For that, the consensus is we should 
consider a full OSGi runtime, so we do not repeat all of the mistakes involved 
in creating such a runtime; however, unless there is a compelling reason to do 
so, the consensus is also that it is not wanted.

 regionserver can't load an updated coprocessor jar with the same jar path
 -

 Key: HBASE-8445
 URL: https://issues.apache.org/jira/browse/HBASE-8445
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Wang Qiang
 Attachments: patch_20130426_01.txt


 when I update a coprocessor jar and then disable and enable the table with the 
 coprocessor, the new features in the updated coprocessor jar don't take 
 effect. Following into the class 
 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there's a 
 coprocessor class loader cache, of which the key is the coprocessor jar 
 path (although the key is a weak reference), so when I disable/enable the 
 table, it got a cached coprocessor class loader from the cache with the jar 
 path and didn't try to reload the coprocessor jar from the hdfs. Here I 
 give a patch which adds an extra piece of info, a 'FileCheckSum', to the 
 coprocessor class loader cache; if the checksum has changed, it tries to reload 
 the jar from the hdfs path
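
A minimal sketch of the idea in the attached patch, with hypothetical helpers 
(computeChecksum, loadJar) standing in for whatever the patch actually uses:

{code:title=ChecksumCacheSketch.java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChecksumCacheSketch {
  static final class Entry {
    final String checksum;
    final ClassLoader loader;
    Entry(String checksum, ClassLoader loader) {
      this.checksum = checksum;
      this.loader = loader;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();

  // Return the cached loader only while the jar's checksum is unchanged;
  // otherwise build a fresh loader so an updated jar is actually reloaded.
  ClassLoader getClassLoader(String jarPath) {
    String checksum = computeChecksum(jarPath);
    Entry e = cache.get(jarPath);
    if (e == null || !e.checksum.equals(checksum)) {
      e = new Entry(checksum, loadJar(jarPath));
      cache.put(jarPath, e);
    }
    return e.loader;
  }

  // Hypothetical helpers: hash the jar's bytes / open a class loader on it.
  private String computeChecksum(String jarPath) { return ""; }
  private ClassLoader loadJar(String jarPath) { return getClass().getClassLoader(); }
}
{code}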

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools

2013-04-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643016#comment-13643016
 ] 

Andrew Purtell commented on HBASE-8438:
---

+1

 Extend bin/hbase to print a minimal classpath for used by other tools
 ---

 Key: HBASE-8438
 URL: https://issues.apache.org/jira/browse/HBASE-8438
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.94.6.1, 0.95.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch


 For tools like pig and hive, blindly appending the full output of `bin/hbase 
 classpath` to their own CLASSPATH is excessive. They already build CLASSPATH 
 entries for hadoop. All they need from us is the delta entries, the 
 dependencies we require w/o hadoop and all of its transitive deps. This is 
 also a kindness for Windows, where there's a shorter limit on the length of 
 commandline arguments.
 See also HIVE-2055 for additional discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8445) regionserver can't load an updated coprocessor jar with the same jar path

2013-04-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643019#comment-13643019
 ] 

Jimmy Xiang commented on HBASE-8445:


That's right: we don't do reloading. One work-around is to do a full cluster 
rolling-restart in such a case.

 regionserver can't load an updated coprocessor jar with the same jar path
 -

 Key: HBASE-8445
 URL: https://issues.apache.org/jira/browse/HBASE-8445
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Wang Qiang
 Attachments: patch_20130426_01.txt


 when I update a coprocessor jar and then disable and enable the table with the 
 coprocessor, the new features in the updated coprocessor jar don't take 
 effect. Following into the class 
 'org.apache.hadoop.hbase.coprocessor.CoprocessorHost', I found that there's a 
 coprocessor class loader cache, of which the key is the coprocessor jar 
 path (although the key is a weak reference), so when I disable/enable the 
 table, it got a cached coprocessor class loader from the cache with the jar 
 path and didn't try to reload the coprocessor jar from the hdfs. Here I 
 give a patch which adds an extra piece of info, a 'FileCheckSum', to the 
 coprocessor class loader cache; if the checksum has changed, it tries to reload 
 the jar from the hdfs path

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools

2013-04-26 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643036#comment-13643036
 ] 

Nick Dimiduk commented on HBASE-8438:
-

How do we fix this release audit warning?

{noformat}
!? 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/test/data/a6a6562b777440fd9c34885428f5cb61.21e75333ada3d5bafb34bb918f29576c
Lines that start with ? in the release audit report indicate files that do 
not have an Apache license header.
{noformat}

 Extend bin/hbase to print a minimal classpath for used by other tools
 ---

 Key: HBASE-8438
 URL: https://issues.apache.org/jira/browse/HBASE-8438
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.94.6.1, 0.95.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch


 For tools like pig and hive, blindly appending the full output of `bin/hbase 
 classpath` to their own CLASSPATH is excessive. They already build CLASSPATH 
 entries for hadoop. All they need from us is the delta entries, the 
 dependencies we require w/o hadoop and all of its transitive deps. This is 
 also a kindness for Windows, where there's a shorter limit on the length of 
 commandline arguments.
 See also HIVE-2055 for additional discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8447) Add docs for hbck around metaonly

2013-04-26 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-8447:


 Summary: Add docs for hbck around metaonly
 Key: HBASE-8447
 URL: https://issues.apache.org/jira/browse/HBASE-8447
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Priority: Minor


We should document -metaonly in the book.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7413) Convert WAL to pb

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643053#comment-13643053
 ] 

Sergey Shelukhin commented on HBASE-7413:
-

Same pattern later. I will commit sometime between today and Monday if there 
are no objections.

 Convert WAL to pb
 -

 Key: HBASE-7413
 URL: https://issues.apache.org/jira/browse/HBASE-7413
 Project: HBase
  Issue Type: Sub-task
  Components: wal
Reporter: stack
Assignee: Sergey Shelukhin
Priority: Critical
 Fix For: 0.95.1

 Attachments: HBASE-7413-v0.patch, HBASE-7413-v1.patch, 
 HBASE-7413-v2.patch, HBASE-7413-v3.patch, HBASE-7413-v4.patch


 From HBASE-7201

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8438) Extend bin/hbase to print a minimal classpath for used by other tools

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643059#comment-13643059
 ] 

Sergey Shelukhin commented on HBASE-8438:
-

+1

 Extend bin/hbase to print a minimal classpath for used by other tools
 ---

 Key: HBASE-8438
 URL: https://issues.apache.org/jira/browse/HBASE-8438
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.94.6.1, 0.95.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch, 
 0001-HBASE-8438-Extend-bin-hbase-to-print-a-minimal-class.patch


 For tools like pig and hive, blindly appending the full output of `bin/hbase 
 classpath` to their own CLASSPATH is excessive. They already build CLASSPATH 
 entries for hadoop. All they need from us is the delta entries, the 
 dependencies we require w/o hadoop and all of its transitive deps. This is 
 also a kindness for Windows, where there's a shorter limit on the length of 
 commandline arguments.
 See also HIVE-2055 for additional discussion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-26 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v4.patch

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
 6295.v4.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
     // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes
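
A minimal Java sketch of the per-location buffering described above, with 
hypothetical types and helpers (Op, Location, locate, sendAsync) standing in for 
the real client classes:

{code:title=PerLocationBatchSketch.java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerLocationBatchSketch {
  static class Op {}
  static class Location {}

  static final int MAX_LOCATION_SIZE = 100; // illustrative per-server buffer limit

  // Buffer ops per location and flush a location's buffer as soon as it is
  // full, without waiting for the whole batch to fill up.
  static void submit(List<Op> ops) {
    Map<Location, List<Op>> todo = new HashMap<>();
    for (Op o : ops) {
      Location loc = locate(o); // hypothetical region lookup
      List<Op> buf = todo.computeIfAbsent(loc, k -> new ArrayList<>());
      buf.add(o);
      if (buf.size() >= MAX_LOCATION_SIZE) {
        sendAsync(loc, buf);              // hypothetical async rpc; don't wait
        todo.put(loc, new ArrayList<>()); // start a fresh buffer
      }
    }
    for (Map.Entry<Location, List<Op>> e : todo.entrySet()) {
      if (!e.getValue().isEmpty()) {
        sendAsync(e.getKey(), e.getValue()); // send remainders
      }
    }
    // ... then wait for all in-flight sends and collect errors ...
  }

  static final Location[] LOCATIONS = { new Location(), new Location() };
  static Location locate(Op o) { return LOCATIONS[(o.hashCode() & 0x7fffffff) % LOCATIONS.length]; }
  static void sendAsync(Location loc, List<Op> ops) {}
}
{code}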

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8446) Allow parallel snapshot of different tables

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643085#comment-13643085
 ] 

Sergey Shelukhin commented on HBASE-8446:
-

There's no new test for this; otherwise looks good

 Allow parallel snapshot of different tables
 ---

 Key: HBASE-8446
 URL: https://issues.apache.org/jira/browse/HBASE-8446
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.95.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.95.2

 Attachments: HBASE-8446-v0.patch


 Currently only one snapshot at a time is allowed.
 As with restore, we should allow taking snapshots of different tables in 
 parallel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643087#comment-13643087
 ] 

Nicolas Liochon commented on HBASE-6295:


v4. I added a control per server: the client cannot have more than X requests on 
the same server. If this number is reached, we continue for the other servers, 
but the operations for the overloaded servers are kept in the buffer. This will 
limit the rpc.timeout effect.

It's still a hack in terms of implementation, but hopefully it's acceptable in 
terms of the feature. I've got some tests running locally; I will do one on a 
real cluster if they are ok.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
 6295.v4.patch


 today the batch algo is:
 {noformat}
 for Operation o : List<Op> {
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create immediately the final object instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o : List<Op> {
   get location
   add o to location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server
     clear location.todolist
     // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write if you add error management: the retried list must be 
 shared with the operations added in the todolist. But it's doable.
 It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643136#comment-13643136
 ] 

Varun Sharma commented on HBASE-8389:
-

[~saint@gmail.com]
I can do a small write-up that folks can refer to.

[~nkeywal]
One point regarding the low setting, though: it's good for fast-MTTR requirements 
such as online clusters, but it does not work well if you pound a small cluster 
with mapreduce jobs. The write timeouts start kicking in on datanodes - we saw 
this on a small cluster. So it has to be taken with a pinch of salt.

I think 4 seconds might be too tight, because we have the following sequence:
1) recoverLease called
2) The primary node heartbeats (this can be 3 seconds in the worst case)
3) There are multiple timeouts during recovery at primary datanode:
a) dfs.socket.timeout kicks in when we suspend the processes using kill 
-STOP - there is only 1 retry
b) ipc.client.connect.timeout is the troublemaker - on old hadoop versions 
it is hardcoded at 20 seconds. On some versions, the # of retries is hardcoded 
at 45. This can be triggered by firewalling a host using iptables to drop all 
incoming/outgoing TCP packets. Another issue here is that b/w the timeouts 
there is a 1 second hardcoded sleep :) - I just fixed it in HADOOP-9503. If we 
make sure that all the dfs.socket.timeout and ipc client settings are the same 
in hbase-site.xml and hdfs-site.xml. Then, we can

The retry rate should be no faster than 3a and 3b - or lease recoveries will 
accumulate for 900 seconds in trunk. To get around this problem, we would want 
to make sure that hbase-site.xml has the same settings as hdfs-site.xml. And we 
calculate the recovery interval from those settings. Otherwise, we can leave a 
release note saying that this number should be max(dfs.socket.timeout, 
ipc.client.connect.max.retries.on.timeouts * ipc.client.connect.timeout, 
ipc.client.connect.max.retries).
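
A minimal sketch of that calculation, assuming the HDFS/IPC values are mirrored 
into the Configuration hbase reads (the defaults shown are illustrative, and the 
plain retry count is left out since it is not a duration):

{code:title=RecoveryIntervalSketch.java}
import org.apache.hadoop.conf.Configuration;

public class RecoveryIntervalSketch {
  // Derive a recoverLease retry interval no shorter than the slowest
  // underlying timeout path, per the max(...) rule above.
  static long recoveryIntervalMs(Configuration conf) {
    long socketTimeoutMs = conf.getLong("dfs.socket.timeout", 60000L);
    long connectTimeoutMs = conf.getLong("ipc.client.connect.timeout", 20000L);
    long retriesOnTimeout =
        conf.getLong("ipc.client.connect.max.retries.on.timeouts", 45L);
    long padMs = 1000L; // small cushion over the slowest path
    return Math.max(socketTimeoutMs, retriesOnTimeout * connectTimeoutMs) + padMs;
  }
}
{code}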

The advantage of having HDFS-4721 is that at some point the data node will be 
recognized as stale - maybe a little later than hdfs recovery. Once that 
happens, recoveries typically occur within 2 seconds.

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until we 
 get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I reported in HDFS-4721. The Namenode chooses the 
 stale datanode to perform the recovery (since it's still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either 

[jira] [Commented] (HBASE-8444) Acknowledge that 0.95+ requires 1.0.3 hadoop at least.

2013-04-26 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643218#comment-13643218
 ] 

Enis Soztutar commented on HBASE-8444:
--

Thanks Stack. 

 Acknowledge that 0.95+ requires 1.0.3 hadoop at least.
 --

 Key: HBASE-8444
 URL: https://issues.apache.org/jira/browse/HBASE-8444
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 8444.txt


 As per this mail thread, 
 http://search-hadoop.com/m/stbKO1YNWZe/Compile+does+not+work+against+Hadoop-1.0.0+-+1.0.2subj=Re+Compile+does+not+work+against+Hadoop+1+0+0+1+0+2
 ... 0.95.x requires hadoop 1.0.3 at least.  Note it in the refguide hadoop 
 section.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8393) Testcase TestHeapSize#testMutations is wrong

2013-04-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8393:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Testcase TestHeapSize#testMutations is wrong
 

 Key: HBASE-8393
 URL: https://issues.apache.org/jira/browse/HBASE-8393
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.1

 Attachments: hbase-8393.patch


 I happened to check this test case and there are several existing errors that 
 let it pass. You can reproduce the test case failure by adding a new field 
 into Mutation; the test case will then fail on either a 64-bit or a 32-bit 
 system.
 Below are errors I found in the test case:
 1) The test case is using {code}row=new byte[]{0}{code} which is an array 
 with length=1, while ClassSize.estimateBase can only calculate the base class 
 size (without counting field array length)
 2) ClassSize.REFERENCE is added twice in the following code, because 
 ClassSize.estimateBase already adds all reference fields: {code}expected += 
 ClassSize.align(ClassSize.TREEMAP + ClassSize.REFERENCE);{code}
 3) ClassSize.estimateBase rounds up the sum of the lengths of reference fields + 
 primitive fields + Array, while Mutation.MUTATION_OVERHEAD aligns the sum of 
 the lengths of a different set of fields. Therefore, there will be round-up 
 differences for class Increment, because it introduces a new reference field 
 TimeRange tr, when the test case runs on 32-bit and 64-bit systems.
 {code}
 ...
 long prealign_size = coeff[0] + align(coeff[1] * ARRAY) + coeff[2] * REFERENCE;
 // Round up to a multiple of 8
 long size = align(prealign_size);
 ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643220#comment-13643220
 ] 

Ted Yu commented on HBASE-8389:
---

@Varun:
bq. Then, we can
Can you complete the above sentence?

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until we 
 get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I reported in HDFS-4721. The Namenode chooses the 
 stale datanode to perform the recovery (since it's still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either need something like HDFS 4721 or we need something like this
   recoverLease(...)
   sleep(1000)
   recoverLease(...)
   sleep(configuredTimeout)
   recoverLease(...)
   sleep(configuredTimeout)
 Where configuredTimeout should be large enough to let the recovery happen but 
 the first timeout is short so that we get past the moot recovery in step #1.
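
A minimal sketch of that sequence against the public 
DistributedFileSystem.recoverLease API; the attempt cap and sleeps are 
illustrative (real code would bound the loop by hbase.lease.recovery.timeout):

{code:title=RecoverLeaseSketch.java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLeaseSketch {
  // First retry comes quickly to get past the doomed first recovery
  // (the stale primary datanode); later retries wait out the DFS timeouts.
  static boolean recover(DistributedFileSystem dfs, Path p, long configuredTimeoutMs)
      throws Exception {
    if (dfs.recoverLease(p)) {
      return true;
    }
    Thread.sleep(1000);
    for (int attempt = 0; attempt < 10; attempt++) { // illustrative cap
      if (dfs.recoverLease(p)) {
        return true;
      }
      Thread.sleep(configuredTimeoutMs);
    }
    return false;
  }
}
{code}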
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Varun Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643226#comment-13643226
 ] 

Varun Sharma commented on HBASE-8389:
-

Sorry about that...

If we make sure that all the dfs.socket.timeout and ipc client settings are the 
same in hbase-site.xml and hdfs-site.xml, then we can do a custom calculation 
of the recoverLease retry interval inside hbase. But basically hbase needs to 
know in some way how the timeouts are set up underneath.

Thanks
Varun

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until we 
 get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I reported in HDFS-4721. The Namenode chooses the 
 stale datanode to perform the recovery (since it's still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either need something like HDFS 4721 or we need something like this
   recoverLease(...)
   sleep(1000)
   recoverLease(...)
   sleep(configuredTimeout)
   recoverLease(...)
   sleep(configuredTimeout)
 Where configuredTimeout should be large enough to let the recovery happen but 
 the first timeout is short so that we get past the moot recovery in step #1.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643255#comment-13643255
 ] 

Ted Yu commented on HBASE-8389:
---

bq. If we make sure that all the dfs.socket.timeout and ipc client settings are 
the same in hbase-site.xml and hdfs-site.xml.
Should we add a check for the above at cluster startup? If a discrepancy is 
found, we can log a warning message.
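
A minimal sketch of such a startup check, comparing a few illustrative keys 
across the two Configurations:

{code:title=ConfigConsistencyCheckSketch.java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

public class ConfigConsistencyCheckSketch {
  private static final Log LOG = LogFactory.getLog(ConfigConsistencyCheckSketch.class);

  // Warn when hbase-site.xml and hdfs-site.xml disagree on the timeouts
  // that the lease recovery retry interval would be derived from.
  static void warnOnMismatch(Configuration hbaseConf, Configuration hdfsConf) {
    String[] keys = {
        "dfs.socket.timeout",
        "ipc.client.connect.timeout",
        "ipc.client.connect.max.retries.on.timeouts"
    };
    for (String key : keys) {
      String a = hbaseConf.get(key);
      String b = hdfsConf.get(key);
      if (a != null && b != null && !a.equals(b)) {
        LOG.warn(key + " differs: hbase=" + a + ", hdfs=" + b
            + "; lease recovery interval may be miscalculated");
      }
    }
  }
}
{code}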

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until we 
 get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I reported in HDFS-4721. The Namenode chooses the 
 stale datanode to perform the recovery (since it's still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either need something like HDFS 4721 or we need something like this
   recoverLease(...)
   sleep(1000)
   recoverLease(...)
   sleep(configuredTimeout)
   recoverLease(...)
   sleep(configuredTimeout)
 Where configuredTimeout should be large enough to let the recovery happen but 
 the first timeout is short so that we get past the moot recovery in step #1.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HBASE-8448:
---

 Summary: RatioBasedCompactionPolicy (and derived ones) can select 
already-compacting files for compaction
 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin


The code added to make sure it doesn't get stuck doesn't take into account 
filesCompacting.
This is the cause of recent TestHFileArchiving failures...
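
Roughly, the fix needs candidate selection to subtract filesCompacting before 
any "don't get stuck" fallback picks files. A toy sketch with hypothetical 
names, not the actual RatioBasedCompactionPolicy code:

{code}
import java.util.ArrayList;
import java.util.List;

public class CandidateSelection {
  /** Store files minus the ones a running compaction already owns. */
  static <F> List<F> eligibleFiles(List<F> storeFiles, List<F> filesCompacting) {
    List<F> candidates = new ArrayList<F>(storeFiles);
    candidates.removeAll(filesCompacting);  // never re-select in-flight files
    return candidates;
  }
}
{code}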

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HBASE-8448:
---

Assignee: Sergey Shelukhin

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Attachment: HBASE-8448-v0.patch

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Status: Patch Available  (was: Open)

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643278#comment-13643278
 ] 

Sergey Shelukhin commented on HBASE-8448:
-

tiny patch

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Component/s: Compaction

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-6721:
---

Attachment: HBASE-6721_8.patch

 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.
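 
 A toy sketch of the grouping idea only (the interfaces are hypothetical
 stand-ins; the real proposal is in the attached design doc): each table
 resolves to a group, and assignment and balancing are scoped to that group's
 servers.
 
 {code}
 import java.util.List;
 
 interface GroupInfoManager {
   String groupOfTable(String tableName);
   List<String> serversOfGroup(String group);
 }
 
 class GroupAwareAssignment {
   private final GroupInfoManager groups;
 
   GroupAwareAssignment(GroupInfoManager groups) { this.groups = groups; }
 
   /** Candidate servers for a table's regions: only its group's members. */
   List<String> candidateServers(String tableName) {
     return groups.serversOfGroup(groups.groupOfTable(tableName));
   }
 }
 {code}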

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8354

2013-04-26 Thread stack (JIRA)
stack created HBASE-8449:


 Summary: Refactor recoverLease retries and pauses informed by 
findings over in hbase-8354
 Key: HBASE-8449
 URL: https://issues.apache.org/jira/browse/HBASE-8449
 Project: HBase
  Issue Type: Bug
  Components: Filesystem Integration
Affects Versions: 0.95.0, 0.94.7
Reporter: stack
Priority: Critical
 Fix For: 0.95.1


HBASE-8354 is an interesting issue that roams near and far.  This issue is 
about making use of the findings handily summarized on the end of hbase-8354 
which have it that trunk needs refactor around how it does its recoverLease 
handling (and that the patch committed against HBASE-8354 is not what we want 
going forward).

This issue is about making a patch that adds a lag between recoverLease 
invocations where the lag is related to dfs timeouts -- the hdfs-side dfs 
timeout -- and optionally makes use of the isFileClosed API if it is available 
(a facility that is not yet committed to a branch near you and unlikely to be 
within your locality with a good while to come).
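
To make the optional isFileClosed use concrete, a sketch that probes for the 
method reflectively so the same code runs against an hdfs that lacks it (the 
probe class is hypothetical; isFileClosed(Path) on DistributedFileSystem is 
the API being discussed):

{code}
import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class IsFileClosedProbe {
  private static final Method IS_FILE_CLOSED = find();

  private static Method find() {
    try {
      return DistributedFileSystem.class.getMethod("isFileClosed", Path.class);
    } catch (NoSuchMethodException e) {
      return null;  // older hdfs: rely on the recoverLease pause/retry alone
    }
  }

  /** False means "not known closed"; caller keeps waiting between retries. */
  public static boolean isClosed(DistributedFileSystem dfs, Path path)
      throws IOException {
    if (IS_FILE_CLOSED == null) {
      return false;
    }
    try {
      return ((Boolean) IS_FILE_CLOSED.invoke(dfs, path)).booleanValue();
    } catch (Exception e) {
      return false;  // treat reflection failures as "unknown"
    }
  }
}
{code}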

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

2013-04-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643287#comment-13643287
 ] 

stack commented on HBASE-8389:
--

[~varun] Thanks.  I made HBASE-8449 for trunk patch (and to fix what is applied 
here -- the 4s in particular).

 HBASE-8354 forces Namenode into loop with lease recovery requests
 -

 Key: HBASE-8389
 URL: https://issues.apache.org/jira/browse/HBASE-8389
 Project: HBase
  Issue Type: Bug
Reporter: Varun Sharma
Assignee: Varun Sharma
Priority: Critical
 Fix For: 0.94.8

 Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
 sample.patch


 We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
 recoveries because of the short retry interval of 1 second between lease 
 recoveries.
 The namenode gets into the following loop:
 1) Receives lease recovery request and initiates recovery choosing a primary 
 datanode every second
 2) A lease recovery is successful and the namenode tries to commit the block 
 under recovery as finalized - this takes < 10 seconds in our environment 
 since we run with tight HDFS socket timeouts.
 3) At step 2), there is a more recent recovery enqueued because of the 
 aggressive retries. This causes the committed block to get preempted and we 
 enter a vicious cycle
 So we do: initiate_recovery --> commit_block --> 
 commit_preempted_by_another_recovery
 This loop is paused after 300 seconds which is the 
 hbase.lease.recovery.timeout. Hence the MTTR we are observing is 5 minutes 
 which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
 detection timeout is 20 seconds.
 Note that before the patch, we do not call recoverLease so aggressively - 
 also it seems that the HDFS namenode is pretty dumb in that it keeps 
 initiating new recoveries for every call. Before the patch, we call 
 recoverLease, assume that the block was recovered, try to get the file, it 
 has zero length since it's under recovery, we fail the task and retry until we 
 get a non-zero length. So things just work.
 Fixes:
 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
 have a more generous timeout. The timeout needs to be configurable since 
 typically, the recovery takes as much time as the DFS timeouts. The primary 
 datanode doing the recovery tries to reconcile the blocks and hits the 
 timeouts when it tries to contact the dead node. So the recovery is as fast 
 as the HDFS timeouts.
 2) We have another issue I reported in HDFS 4721. The Namenode chooses the 
 stale datanode to perform the recovery (since it's still alive). Hence the 
 first recovery request is bound to fail. So if we want a tight MTTR, we 
 either need something like HDFS 4721 or we need something like this
   recoverLease(...)
   sleep(1000)
   recoverLease(...)
   sleep(configuredTimeout)
   recoverLease(...)
   sleep(configuredTimeout)
 Where configuredTimeout should be large enough to let the recovery happen but 
 the first timeout is short so that we get past the moot recovery in step #1.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-6721:
--

Status: Patch Available  (was: Open)

 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643294#comment-13643294
 ] 

Elliott Clark commented on HBASE-8448:
--

This could let compactions with fewer than the min number of files through.

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-2231) Compaction events should be written to HLog

2013-04-26 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-2231:
-

   Resolution: Fixed
Fix Version/s: 0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and 0.95. Thanks Stack.

 Compaction events should be written to HLog
 ---

 Key: HBASE-2231
 URL: https://issues.apache.org/jira/browse/HBASE-2231
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
  Labels: moved_from_0_20_5
 Fix For: 0.98.0, 0.95.1

 Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 
 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, 
 hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, 
 hbase-2231_v6.patch, hbase-2231_v7.patch, hbase-2231_v7.patch


 The sequence for a compaction should look like this:
 # Compact region to new files
 # Write a Compacted Region entry to the HLog
 # Delete old files
 This deals with a case where the RS has paused between step 1 and 2 and the 
 regions have since been reassigned.
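 
 As a toy illustration of why the ordering matters (every type below is a
 stand-in, not the committed patch): the marker must be durable in the log
 before any input file is deleted, so a reassigned region can tell replayed
 edits apart from files the compaction already folded in.
 
 {code}
 import java.io.IOException;
 import java.util.List;
 
 interface Wal {
   void appendCompactionMarker(List<String> inputs, String output) throws IOException;
   void sync() throws IOException;
 }
 
 interface Store {
   List<String> selectCompactionInputs();
   String compactToNewFile(List<String> inputs) throws IOException;
   void deleteFiles(List<String> files) throws IOException;
 }
 
 class CompactionSequence {
   static void run(Store store, Wal wal) throws IOException {
     List<String> inputs = store.selectCompactionInputs();
     String output = store.compactToNewFile(inputs);  // 1. compact to new file
     wal.appendCompactionMarker(inputs, output);      // 2. log the event
     wal.sync();                                      //    ...and make it durable
     store.deleteFiles(inputs);                       // 3. only now drop inputs
   }
 }
 {code}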

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-2231) Compaction events should be written to HLog

2013-04-26 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-2231:
-

Attachment: hbase-2231_v7-0.95.patch

Attaching 0.95 version of the patch, had to resolve some minor conflicts. 

 Compaction events should be written to HLog
 ---

 Key: HBASE-2231
 URL: https://issues.apache.org/jira/browse/HBASE-2231
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
  Labels: moved_from_0_20_5
 Fix For: 0.98.0, 0.95.1

 Attachments: 2231-testcase-0.94.txt, 2231-testcase_v2.txt, 
 2231-testcase_v3.txt, 2231v2.txt, 2231v3.txt, 2231v4.txt, 
 hbase-2231-testcase.txt, hbase-2231.txt, hbase-2231_v5.patch, 
 hbase-2231_v6.patch, hbase-2231_v7-0.95.patch, hbase-2231_v7.patch, 
 hbase-2231_v7.patch


 The sequence for a compaction should look like this:
 # Compact region to new files
 # Write a Compacted Region entry to the HLog
 # Delete old files
 This deals with a case where the RS has paused between step 1 and 2 and the 
 regions have since been reassigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643299#comment-13643299
 ] 

Sergey Shelukhin commented on HBASE-8448:
-

We are doing the sublist after we get the eligible files, right?

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643305#comment-13643305
 ] 

Sergey Shelukhin commented on HBASE-8448:
-

Ah, I see, that's a separate problem

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643309#comment-13643309
 ] 

Sergey Shelukhin commented on HBASE-8448:
-

I am going to move it to apply... after all, that way all the max-min-bulk-etc. 
checks will be honored.

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Status: Open  (was: Patch Available)

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643312#comment-13643312
 ] 

Elliott Clark commented on HBASE-8448:
--

Sounds good to me.

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.

2013-04-26 Thread stack (JIRA)
stack created HBASE-8450:


 Summary: Update hbase-default.xml and general recommendations to 
better suit current hw, h2, experience, etc.
 Key: HBASE-8450
 URL: https://issues.apache.org/jira/browse/HBASE-8450
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.95.1


This is a critical task we need to do before we release; review our defaults.

On cursory review, there are configs in hbase-default.xml that no longer have 
matching code; there are some that should be changed because we know better now 
and others that should be amended because hardware and deploys are bigger than 
they used to be.

We could also move stuff around so that the must-edit stuff is near the top (zk 
quorum config. is mid-way down the page) and beef up the descriptions -- 
especially since these descriptions shine through in the refguide.

Lastly, I notice that our tgz does not include an hbase-default.xml other 
than the one bundled up in the jar.  Maybe we should make it more accessible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-6721:
---

Attachment: HBASE-6721-DesigDoc.pdf

 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, 
 HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.

2013-04-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8450:
-

Attachment: 8450.txt

Here is a start:

Ups handlers to 100 from 10.

Removes the no longer referenced:

hbase.regionserver.nbreservationblocks
hbase.hash.type

Ups memstore.lowerlimit so it is close to the upper limit, making it 0.38 instead 
of 0.35 (the upper limit is 0.40).

Make major compactions run once a week instead of every day.

Remove dfs.support.append -- it only brings on a complaint if present (there is 
a UI component that also needs updating if this goes away).

What else can we do to improve basic defaults?
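
In code terms, the value changes above amount to something like the following 
(the keys are the stock property names of this era, worth double-checking; the 
helper itself is only for illustration -- the actual change is to 
hbase-default.xml):

{code}
import org.apache.hadoop.conf.Configuration;

public class ProposedDefaults {
  public static void apply(Configuration conf) {
    conf.setInt("hbase.regionserver.handler.count", 100);                    // was 10
    conf.setFloat("hbase.regionserver.global.memstore.lowerLimit", 0.38f);   // was 0.35
    conf.setLong("hbase.hregion.majorcompaction", 7L * 24 * 60 * 60 * 1000); // weekly, was daily
  }
}
{code}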

 Update hbase-default.xml and general recommendations to better suit current 
 hw, h2, experience, etc.
 

 Key: HBASE-8450
 URL: https://issues.apache.org/jira/browse/HBASE-8450
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.95.1

 Attachments: 8450.txt


 This is a critical task we need to do before we release; review our defaults.
 On cursory review, there are configs in hbase-default.xml that no longer have 
 matching code; there are some that should be changed because we know better 
 now and others that should be amended because hardware and deploys are bigger 
 than they used to be.
 We could also move stuff around so that the must-edit stuff is near the top 
 (zk quorum config. is mid-way down the page) and beef up the descriptions -- 
 especially since these descriptions shine through in the refguide.
 Lastly, I notice that our tgz does not include an hbase-default.xml other 
 than the one bundled up in the jar.  Maybe we should make it more accessible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8450) Update hbase-default.xml and general recommendations to better suit current hw, h2, experience, etc.

2013-04-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8450:
-

Component/s: Usability

 Update hbase-default.xml and general recommendations to better suit current 
 hw, h2, experience, etc.
 

 Key: HBASE-8450
 URL: https://issues.apache.org/jira/browse/HBASE-8450
 Project: HBase
  Issue Type: Task
  Components: Usability
Reporter: stack
Priority: Critical
 Fix For: 0.95.1

 Attachments: 8450.txt


 This is a critical task we need to do before we release; review our defaults.
 On cursory review, there are configs in hbase-default.xml that no longer have 
 matching code; there are some that should be changed because we know better 
 now and others that should be amended because hardware and deploys are bigger 
 than they used to be.
 We could also move stuff around so that the must-edit stuff is near the top 
 (zk quorum config. is mid-way down the page) and beef up the descriptions -- 
 especially since these descriptions shine through in the refguide.
 Lastly, I notice that our tgz does not include an hbase-default.xml other 
 than the one bundled up in the jar.  Maybe we should make it more accessible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1364#comment-1364
 ] 

Francis Liu commented on HBASE-6721:


[~saint@gmail.com] I've updated the doc, addressing your questions. Let 
me know if it's missing anything else.



 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, 
 HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-6721:
---

Attachment: HBASE-6721_9.patch

 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721_9.patch, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_trunk.patch, 
 HBASE-6721_trunk.patch, HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643347#comment-13643347
 ] 

Hadoop QA commented on HBASE-8448:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12580759/HBASE-8448-v0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.thrift.TestThriftServer

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5468//console

This message is automatically generated.

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-8390) Trunk/0.95 cannot simply compile against Hadoop 1.0

2013-04-26 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-8390.
---

   Resolution: Fixed
Fix Version/s: 0.98.0

Looks like it worked, resolving. Thanks Stack for your help.

 Trunk/0.95 cannot simply compile against Hadoop 1.0
 ---

 Key: HBASE-8390
 URL: https://issues.apache.org/jira/browse/HBASE-8390
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.98.0, 0.95.0

 Attachments: HBASE-8390.patch


 Currently we can't simply compile against Hadoop 1.0 in 0.95 and newer, we 
 are missing a dependency in common for Apache's commons-io. Easy fix, we 
 could just add that dependency for all the profiles there. But doing it 
 correctly requires adding a new profile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment

2013-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643354#comment-13643354
 ] 

Hadoop QA commented on HBASE-6721:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12580760/HBASE-6721_8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 30 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 release 
audit warnings (more than the trunk's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5469//console

This message is automatically generated.

 RegionServer Group based Assignment
 ---

 Key: HBASE-6721
 URL: https://issues.apache.org/jira/browse/HBASE-6721
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.95.1

 Attachments: 6721-master-webUI.patch, HBASE-6721_8.patch, 
 HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, 
 HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, 
 HBASE-6721_94_7.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, 
 HBASE-6721_9.patch, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, 
 HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_trunk.patch, 
 HBASE-6721_trunk.patch, HBASE-6721_trunk.patch


 In multi-tenant deployments of HBase, it is likely that a RegionServer will 
 be serving out regions from a number of different tables owned by various 
 client applications. Being able to group a subset of running RegionServers 
 and assign specific tables to it provides a client application a level of 
 isolation and resource allocation.
 The proposal essentially is to have an AssignmentManager which is aware of 
 RegionServer groups and assigns tables to region servers based on groupings. 
 Load balancing will occur on a per group basis as well. 
 This is essentially a simplification of the approach taken in HBASE-4120. See 
 attached document.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Attachment: HBASE-8448-v1.patch

sorry got distracted

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch, HBASE-8448-v1.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8426) Opening a region failed on Metrics source RegionServer,sub=Regions already exists!

2013-04-26 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643389#comment-13643389
 ] 

Jean-Daniel Cryans commented on HBASE-8426:
---

+1 from me.

 Opening a region failed on Metrics source RegionServer,sub=Regions already 
 exists!
 

 Key: HBASE-8426
 URL: https://issues.apache.org/jira/browse/HBASE-8426
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.0
Reporter: Jean-Daniel Cryans
Assignee: Elliott Clark
Priority: Critical
 Fix For: 0.98.0, 0.95.1

 Attachments: HBASE-8426-0.patch, HBASE-8426-1.patch, 
 metrics_already_exist.txt


 I restarted a cluster on 0.95 (1ecd4c7e0b22bba75c76f2fc2ce369541502b6df) and 
 some regions failed to open on their first assignment with an exception like:
 {noformat}
 Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
 RegionServer,sub=Regions already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
   at 
 org.apache.hadoop.hbase.metrics.BaseSourceImpl.<init>(BaseSourceImpl.java:75)
   at 
 org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.<init>(MetricsRegionAggregateSourceImpl.java:49)
   at 
 org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.<init>(MetricsRegionAggregateSourceImpl.java:41)
   at 
 org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.getAggregate(MetricsRegionServerSourceFactoryImpl.java:33)
   at 
 org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.createRegion(MetricsRegionServerSourceFactoryImpl.java:50)
   at 
 org.apache.hadoop.hbase.regionserver.MetricsRegion.<init>(MetricsRegion.java:35)
   at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:488)
   at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:400)
 {noformat}
 I'm attaching a bigger log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8449) Refactor recoverLease retries and pauses informed by findings over in hbase-8389

2013-04-26 Thread Varun Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Sharma updated HBASE-8449:


Summary: Refactor recoverLease retries and pauses informed by findings over 
in hbase-8389  (was: Refactor recoverLease retries and pauses informed by 
findings over in hbase-8354)

 Refactor recoverLease retries and pauses informed by findings over in 
 hbase-8389
 

 Key: HBASE-8449
 URL: https://issues.apache.org/jira/browse/HBASE-8449
 Project: HBase
  Issue Type: Bug
  Components: Filesystem Integration
Affects Versions: 0.94.7, 0.95.0
Reporter: stack
Priority: Critical
 Fix For: 0.95.1


 HBASE-8354 is an interesting issue that roams near and far.  This issue is 
 about making use of the findings handily summarized on the end of hbase-8354 
 which have it that trunk needs refactor around how it does its recoverLease 
 handling (and that the patch committed against HBASE-8354 is not what we want 
 going forward).
 This issue is about making a patch that adds a lag between recoverLease 
 invocations where the lag is related to dfs timeouts -- the hdfs-side dfs 
 timeout -- and optionally makes use of the isFileClosed API if it is 
 available (a facility that is not yet committed to a branch near you and 
 unlikely to be within your locality with a good while to come).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8448) RatioBasedCompactionPolicy (and derived ones) can select already-compacting files for compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8448:


Status: Patch Available  (was: Open)

 RatioBasedCompactionPolicy (and derived ones) can select already-compacting 
 files for compaction
 

 Key: HBASE-8448
 URL: https://issues.apache.org/jira/browse/HBASE-8448
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8448-v0.patch, HBASE-8448-v1.patch


 The code added to make sure it doesn't get stuck doesn't take into account 
 filesCompacting.
 This is the cause of recent TestHFileArchiving failures...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Using stripe compactions.pdf

First draft of user-level doc. After trying to describe the size-based scheme, 
I think it should be improved. I will do that. Meanwhile there's a design doc and 
a user doc, so I'd like to get some reviews ;)
I will rebase and update all patches between now and Monday. [~stack] 
[~mbertozzi] what do you guys think?

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Using stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into a configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly like the current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with ratios 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take a ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if we initially let L0 grow before determining the stripes, we 
 will get better boundaries.
 Also, unless the imbalance is really large, we don't really need to rebalance.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.
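 
 To make the key-to-stripe mapping concrete, a toy lookup under the
 fixed-boundary assumption (boundary handling here is hypothetical; the real
 scheme is in the attached doc):
 
 {code}
 import java.util.Arrays;
 import java.util.Comparator;
 
 public class StripeLookup {
   // Sorted, right-exclusive upper boundaries of stripes 0..n-1; the last
   // stripe is unbounded above. L0 files sit outside this mapping.
   private final byte[][] rightBoundaries;
 
   private static final Comparator<byte[]> CMP = new Comparator<byte[]>() {
     public int compare(byte[] a, byte[] b) {
       int n = Math.min(a.length, b.length);
       for (int i = 0; i < n; i++) {
         int d = (a[i] & 0xff) - (b[i] & 0xff);
         if (d != 0) return d;
       }
       return a.length - b.length;
     }
   };
 
   public StripeLookup(byte[][] rightBoundaries) {
     this.rightBoundaries = rightBoundaries;
   }
 
   /** Index of the single stripe that owns this row key. */
   public int stripeFor(byte[] rowKey) {
     int i = Arrays.binarySearch(rightBoundaries, rowKey, CMP);
     return i >= 0 ? i + 1 : -(i + 1);  // a key equal to a boundary goes right
   }
 }
 {code}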

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Using stripe compactions.pdf

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
 perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Using stripe 
 compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into a configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly like the current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with ratios 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take a ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if we initially let L0 grow before determining the stripes, we 
 will get better boundaries.
 Also, unless the imbalance is really large, we don't really need to rebalance.
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 Main advantage over Level (for HBase) is that default store can still open 
 the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

