[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425548#comment-13425548 ]

Shrijeet Paliwal commented on HBASE-6468:
-----------------------------------------

{quote}
FirstKeyValueMatchingQualifiersFilter - When a CF contains qualifiers a, b, c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a'? Will they be included in the Result? Is this expected?
{quote}

It depends on whether the first KV matching any of the columns associated with the filter has already been seen.

{quote}
Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a thought from my side. By default FirstKeyOnlyFilter allows only the 1st KV (from all the qualifiers) of a CF into the Result and filters out the other KVs. Specifying a set of qualifiers to FirstKeyOnlyFilter would restrict the selection of the 1st KV to any of these qualifiers only; it would filter out all KVs from other qualifiers.
{quote}

FirstKeyValueMatchingQualifiersFilter has a peculiar behavior, hence creating a new filter (extending FirstKeyOnlyFilter) made sense.


RowCounter may return incorrect result if column name is specified in command line
----------------------------------------------------------------------------------

                Key: HBASE-6468
                URL: https://issues.apache.org/jira/browse/HBASE-6468
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.90.5
           Reporter: Shrijeet Paliwal
        Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch

RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is given as an argument, the scan returns the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing because FirstKeyOnlyFilter removes everything else.
https://issues.apache.org/jira/browse/HBASE-6042 is related.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
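The semantics discussed above can be sketched with a small, self-contained simulation (the class and method names here are illustrative, not HBase API): FirstKeyValueMatchingQualifiersFilter includes every KeyValue of a row until it sees the first KV whose qualifier is in the configured set, then skips to the next row. For a row with qualifiers a, b, c and the set {b, c}, the Result therefore carries both 'a' and 'b'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of the filter semantics discussed in the comment above.
// Not the HBase classes: a row is modeled as an ordered list of qualifiers.
public class FilterSemanticsDemo {

    /** Returns the qualifiers whose KVs would reach the Result for one row. */
    static List<String> scanRow(List<String> rowQualifiers, Set<String> filterQualifiers) {
        List<String> included = new ArrayList<>();
        for (String q : rowQualifiers) {
            included.add(q);                   // ReturnCode.INCLUDE for this KV
            if (filterQualifiers.contains(q)) {
                break;                         // first match seen -> NEXT_ROW
            }
        }
        return included;
    }

    public static void main(String[] args) {
        // Row with qualifiers a, b, c; filter configured with {b, c}.
        List<String> result = scanRow(Arrays.asList("a", "b", "c"),
                                      new HashSet<>(Arrays.asList("b", "c")));
        // KVs for 'a' pass through, plus the first matching KV ('b').
        System.out.println(result);   // [a, b]
    }
}
```

This is the "peculiar behavior" in the comment: the row is still counted (at least one KV survives), which is all RowCounter needs, but KVs preceding the first match are not filtered out.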
[jira] [Created] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
chunhui shen created HBASE-6479:
--------------------------------

            Summary: HFileReaderV1 caching the same parent META block could cause server abort when splitting
                Key: HBASE-6479
                URL: https://issues.apache.org/jira/browse/HBASE-6479
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.94.0
           Reporter: chunhui shen
           Assignee: chunhui shen
        Attachments: test.patch

If the hfile's version is 1, the two daughters of a split load their Bloom filters (loadBloomfilter) concurrently while opening. Because their META block is the same one (the parent's META block), the following exception is thrown from HFileReaderV1#getMetaBlock:
{code}
java.io.IOException: Failed null-daughterOpener=af73f8c9a9b409531ac211a9a7f92eba
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:367)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:453)
	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplit(TestSplitTransaction.java:225)
	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplitWithHFileV1(TestSplitTransaction.java:203)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:540)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:463)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3784)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:506)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:486)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:424)
	at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:271)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2918)
	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:516)
	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
{code}
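The failure mode can be sketched in a few self-contained lines (illustrative names, not the HBase BlockCache API): a cache that refuses re-insertion of an existing key, as the "Cached an already cached block" check does, collides when both daughter regions try to cache the parent's META block under the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the double-caching collision described above. Each daughter
// region open calls cacheBlock() for the shared parent META block key.
public class DoubleCacheDemo {
    static final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();

    static void cacheBlock(String blockKey, byte[] block) {
        // Mirrors the cache's refusal to re-insert an existing key.
        if (cache.putIfAbsent(blockKey, block) != null) {
            throw new RuntimeException("Cached an already cached block");
        }
    }

    public static void main(String[] args) {
        String parentMetaKey = "parentHFile.meta.0";  // same key for both daughters
        cacheBlock(parentMetaKey, new byte[0]);       // first daughter: cached fine
        try {
            cacheBlock(parentMetaKey, new byte[0]);   // second daughter: collides
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());       // Cached an already cached block
        }
    }
}
```

With HFile v2 each daughter reads its own blocks, so only the v1 reader's shared parent META block triggers this during a split.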
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Attachment: test.patch
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Description: (duplicates the issue description and stack trace above)
[jira] [Commented] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425560#comment-13425560 ]

chunhui shen commented on HBASE-6479:
-------------------------------------

An easy way to fix this case is to disable caching of the META block during loadBloomfilter(), or not to throw the "Cached an already cached block" exception.
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Attachment: HBASE-6479.patch
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425592#comment-13425592 ]

nkeywal commented on HBASE-6476:
--------------------------------

bq. How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?

It could easily be done in the build environment, as there is a script there that we can change; we could add a simple grep. The proper way would be to run something like PMD; adding rules is not difficult, but it would require some configuration to distinguish the existing debt from new errors. Alternatively, we would activate only the rules that are already totally clean.

bq. Would be a problem too, if we globally mess with the EnvironmentEdge.

There are some tests that play with the EnvironmentEdgeManager; they had to be made medium tests because it was not possible to run them in a shared JVM like the small tests.


Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
-------------------------------------------------------------------------------------

                Key: HBASE-6476
                URL: https://issues.apache.org/jira/browse/HBASE-6476
            Project: HBase
         Issue Type: Bug
           Reporter: Lars Hofhansl
           Assignee: Lars Hofhansl
           Priority: Minor
            Fix For: 0.94.2

There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
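The "simple grep" idea above could look something like the following standalone sketch (class name, scanned path, and exclusion rule are all assumptions for illustration, not the project's actual build tooling): walk the source tree and count files that still call System.currentTimeMillis, skipping the EnvironmentEdge implementations themselves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch of a build-time check for reintroduced System.currentTimeMillis()
// calls. A real setup might instead be a grep in the build script or a PMD
// rule, as discussed in the comment above.
public class TimeUsageCheck {

    /** Counts .java files under srcRoot that call System.currentTimeMillis. */
    static long countOffenders(Path srcRoot) throws IOException {
        try (Stream<Path> files = Files.walk(srcRoot)) {
            return files.filter(p -> p.toString().endsWith(".java"))
                        .filter(p -> {
                            try {
                                String text = new String(Files.readAllBytes(p));
                                // The EnvironmentEdge implementations legitimately
                                // call System.currentTimeMillis(); skip them.
                                return text.contains("System.currentTimeMillis")
                                        && !p.getFileName().toString().contains("EnvironmentEdge");
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        if (!Files.isDirectory(root)) {
            System.out.println("no sources under " + root);
            return;
        }
        long offenders = countOffenders(root);
        if (offenders > 0) {
            System.err.println(offenders + " file(s) still call System.currentTimeMillis()");
            System.exit(1);
        }
    }
}
```

Failing the build on a nonzero count is what prevents the debt from growing; distinguishing existing offenders from new ones would need a baseline list, which is the configuration cost mentioned in the comment.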
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425604#comment-13425604 ]

Anoop Sam John commented on HBASE-6468:
---------------------------------------

So when a row r1 contains KVs with qualifiers a, b, c (for a given CF) and the qualifiers in FirstKeyValueMatchingQualifiersFilter are b, c, we will include all KVs for qualifier a and one KV (the 1st KV) for qualifier b/c in the Result for row r1. Is this expected? I was thinking that this new Filter would also select only one KV per row, just with the selection made from a subset of the qualifiers rather than all of them. [KVs for qualifier a will come before b and c.]
[jira] [Created] (HBASE-6480) If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
binlijin created HBASE-6480:
----------------------------

            Summary: If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
                Key: HBASE-6480
                URL: https://issues.apache.org/jira/browse/HBASE-6480
            Project: HBase
         Issue Type: Bug
           Reporter: binlijin

Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through?

Current:
{code}
if ((callSize + callQueueSize.get()) > maxQueueSize) {
  Call callTooBig = xxx
  return;
}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
{code}

Should we change it to:
{code}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  if ((callSize + callQueueSize.get()) > maxQueueSize) {
    Call callTooBig = xxx
    return;
  }
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
{code}

--
This message is automatically generated by JIRA.
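The effect of the proposed reordering can be shown with a self-contained sketch (names and the boolean admission API are illustrative, not the HBaseServer internals): priority calls bypass the size check entirely, so only normal calls can be rejected as "call too big".

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the admission logic proposed in the issue above: the size
// check applies only to the normal call queue, never to priority calls.
public class CallAdmissionDemo {
    static final long maxQueueSize = 100;
    static final AtomicLong callQueueSize = new AtomicLong();
    static final LinkedBlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
    static final LinkedBlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

    /** Returns false when a normal call is rejected as "call too big". */
    static boolean admit(String call, long callSize, boolean highPriority)
            throws InterruptedException {
        if (highPriority) {
            priorityCallQueue.put(call);      // never size-rejected
            return true;
        }
        if (callSize + callQueueSize.get() > maxQueueSize) {
            return false;                     // reject: normal queue is full
        }
        callQueueSize.addAndGet(callSize);
        callQueue.put(call);                  // queue the call; maybe blocked here
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        admit("fill", 100, false);                        // fills the normal queue
        System.out.println(admit("put", 10, false));      // false: rejected
        System.out.println(admit("metaScan", 10, true));  // true: priority passes
    }
}
```

The motivating case is a high-priority call (e.g. a META operation) arriving while regular traffic has filled the queue: under the original ordering it would be rejected along with everything else.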
[jira] [Updated] (HBASE-6480) If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

binlijin updated HBASE-6480:
----------------------------

    Attachment: HBASE-6480-94.patch
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6480: Attachment: HBASE-6480-trunk.patch If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current:
if ((callSize + callQueueSize.get()) > maxQueueSize) {
  Call callTooBig = xxx
  return;
}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
Should we change it to:
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  if ((callSize + callQueueSize.get()) > maxQueueSize) {
    Call callTooBig = xxx
    return;
  }
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
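The reordering discussed in HBASE-6480 can be sketched in plain Java. This is an illustrative, self-contained model with made-up names and unbounded queues, not the actual HBaseServer code: the point is only that the size check moves onto the normal-priority path, so priority calls are never rejected as callTooBig.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed ordering: reject-on-size applies only
// to normal-priority calls; priority calls bypass the check entirely.
public class PriorityQueueSketch {
  static final long maxQueueSize = 100;
  static final AtomicLong callQueueSize = new AtomicLong(0);
  static final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
  static final BlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

  /** Returns true if the call was queued, false if rejected as too big. */
  static boolean enqueue(String call, long callSize, boolean isHighPriority) {
    if (isHighPriority) {
      priorityCallQueue.offer(call);     // priority calls are never size-rejected
      return true;
    }
    if (callSize + callQueueSize.get() > maxQueueSize) {
      return false;                      // would send a callTooBig response
    }
    callQueueSize.addAndGet(callSize);
    callQueue.offer(call);               // the real code uses put(), which may block
    return true;
  }

  public static void main(String[] args) {
    System.out.println(enqueue("normal", 150, false));   // rejected: over maxQueueSize
    System.out.println(enqueue("priority", 150, true));  // accepted: bypasses the check
  }
}
```

With the original ordering, both calls above would be rejected once the queue is over the limit.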
[jira] [Updated] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengsheng Huang updated HBASE-6372: Attachment: HBASE-6372.2.patch Modified the patch according to @Jonathan's suggestion. Also removed the formatting issue @stack indicated. Add scanner batching to Export job -- Key: HBASE-6372 URL: https://issues.apache.org/jira/browse/HBASE-6372 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.96.0, 0.94.2 Reporter: Lars George Assignee: Shengsheng Huang Priority: Minor Labels: newbie Attachments: HBASE-6372.2.patch, HBASE-6372.patch When a single row is too large for the RS heap, an OOME can take out the entire RS. Setting scanner batching in custom scans helps avoid this scenario, but for the supplied Export job it is not set. Similar to HBASE-3421, we can set the batching to a low number - or, if needed, make it a command line option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425643#comment-13425643 ] Hadoop QA commented on HBASE-6372: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538518/HBASE-6372.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestAtomicOperation Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//console This message is automatically generated. Add scanner batching to Export job -- Key: HBASE-6372 URL: https://issues.apache.org/jira/browse/HBASE-6372 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.96.0, 0.94.2 Reporter: Lars George Assignee: Shengsheng Huang Priority: Minor Labels: newbie Attachments: HBASE-6372.2.patch, HBASE-6372.patch When a single row is too large for the RS heap then an OOME can take out the entire RS. Setting scanner batching in custom scans helps avoiding this scenario, but for the supplied Export job this is not set. Similar to HBASE-3421 we can set the batching to a low number - or if needed make it a command line option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6460) hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans
[ https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425746#comment-13425746 ] Jie Huang commented on HBASE-6460: -- OK. I see your point. We can fix this issue to make the help info and the code implementation consistent here. Regarding that feature, I wonder if we can run hbck without -fixHdfsOrphans to ignore those orphan regions. Any comment? hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans - Key: HBASE-6460 URL: https://issues.apache.org/jira/browse/HBASE-6460 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Minor Attachments: hbase-6460.patch According to the hbck's help info, shortcut - -repairHoles will enable -fixHdfsOrphans as below. {noformat} -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans {noformat} However, in the implementation, the function fsck.setFixHdfsOrphans(false); is called in -repairHoles. This is not consistent with the usage information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425780#comment-13425780 ] Zhihong Ted Yu commented on HBASE-6468: --- @Anoop: The behavior you described is expected for the new Filter, considering it is used for row counting. RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing, as FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
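The behavior Anoop asked about (what happens to qualifier 'a' when the filter is configured with {b, c}) can be illustrated with a small stand-alone model of the per-row decision rule described in this thread. This is not the real FirstKeyValueMatchingQualifiersFilter code, just a sketch of the logic: KVs are included until the first KV matching one of the configured qualifiers has been seen, after which the rest of the row is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the filter's row-counting behavior; class and
// method names are invented for illustration.
public class MatchingQualifiersSketch {
  static List<String> applyFilter(List<String> rowQualifiers, Set<String> wanted) {
    List<String> included = new ArrayList<>();
    boolean foundMatching = false;
    for (String q : rowQualifiers) {
      if (foundMatching) {
        break;                    // analogous to returning NEXT_ROW
      }
      if (wanted.contains(q)) {
        foundMatching = true;     // first KV matching the set has been seen
      }
      included.add(q);            // analogous to returning INCLUDE
    }
    return included;
  }

  public static void main(String[] args) {
    // CF has qualifiers a, b, c; filter is configured with {b, c}.
    List<String> out = applyFilter(Arrays.asList("a", "b", "c"),
        new HashSet<>(Arrays.asList("b", "c")));
    System.out.println(out);   // "a" is included because no match had been seen yet
  }
}
```

This is the "peculiar behavior" the thread mentions: whether 'a' appears in the Result depends on whether a matching KV was seen before it, which is acceptable for row counting.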
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6411: Attachment: HBASE-6411-4.patch Uploaded newer diff on review board. Fixed nits. Note: after trunk update had to remove {noformat} public RegionsInTransitionInfo[] getRegionsInTransition() {noformat} method from master.metrics.MBean. I believe that this metric is now exposed via metrics, not via this bean. Please correct me if I'm wrong. There are two Qs for Elliott inside, pasting here for convenience: 1. {noformat} /hbase-server/src/main/java/org/apache/hadoop/hbase/master/metrics/MXBean.java (Diff revision 1) 18 package org.apache.hadoop.hbase.master.metrics; {noformat} Ted: Should this class be in org.apache.hadoop.hbase.master namespace ? Alex Baranau: Hm, I guess we now have two pairs of classes: MXBean and MXBeanImpl in org.apache.hadoop.hbase.master and in org.apache.hadoop.hbase.master.metrics. Not sure what was intended by Elliott here. I assume that he forgot to remove one of them (in org.apache.hadoop.hbase.master? why to move it in metrics package then?) Elliott, could you provide some insight here please? 2. {noformat} /hbase-server/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java (Diff revision 1) 63 Deleted: final MetricsHistogram splitTime = new MetricsHistogram(splitTime, registry); {noformat} We don't maintain such metrics now ? Alex Baranau I believe Elliott is working on new such metrics (different issue) and this is why he removed it. Elliott? Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425867#comment-13425867 ] Lars Hofhansl commented on HBASE-6476: -- bq. There are some tests that play with the EnvironmentEdgeManager, they had to be made medium as it was not possible to have them on a shared jvm as the small tests. So simply replacing all of System.currentTimeMillis() with EnvironmentEdgeManager.currentTimeMillis() should not be a problem, but if a test would actually mess with it, it would need to run on its own JVM. Do you see any other problems with just doing wholesale scripted replace? Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425877#comment-13425877 ] nkeywal commented on HBASE-6476: I think it should be ok! And it will be cleaner as well. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425878#comment-13425878 ] nkeywal commented on HBASE-6435: Tested on a real cluster by adding validation code on a region server; went ok. I don't have a real idea on how to activate it just for some hadoop versions, so I will do a last clean-up on the logs and propose a final version. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead only after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s + a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead datanode detection by the NN. Requires a NN code change. - better dead datanode management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one.
- reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS, but that would target only the latest version, and could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
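The retained solution, reordering replica locations so that the DN co-located with the dead RS is tried last, can be sketched as follows. The host names and the helper method are made up for illustration; the actual patch works on HDFS block-location structures via a client-side proxy, not on plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not the actual patch): stable-partition the replica
// locations returned by the NN so suspect replicas go to the back.
public class ReorderBlocksSketch {
  static List<String> deprioritize(List<String> locations, String deadHost) {
    List<String> live = new ArrayList<>();
    List<String> suspect = new ArrayList<>();
    for (String loc : locations) {
      (loc.equals(deadHost) ? suspect : live).add(loc);
    }
    live.addAll(suspect);   // the dead DN's replica becomes the last resort
    return live;
  }

  public static void main(String[] args) {
    List<String> reordered = deprioritize(
        Arrays.asList("dn1", "dn2-dead", "dn3"), "dn2-dead");
    System.out.println(reordered);   // dn2-dead is now tried last
  }
}
```

Because the live replicas come first, the recovering reader only pays the socket-timeout cost on the dead DN if every other replica has already failed.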
[jira] [Commented] (HBASE-6460) hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans
[ https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425915#comment-13425915 ] Jonathan Hsieh commented on HBASE-6460: --- Jie, if the -fixHdfsOrphans option is not set, it will not attempt to fix the problem, but still will report it. The -repairHoles flag is purely a convenience option. It probably better to exclude also because we can easily set -repairHoles -fixHdfsOrphans but currently cannot take away a set option. When you say ignore, do you mean treat it as a warning as opposed to a error? hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans - Key: HBASE-6460 URL: https://issues.apache.org/jira/browse/HBASE-6460 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Minor Attachments: hbase-6460.patch According to the hbck's help info, shortcut - -repairHoles will enable -fixHdfsOrphans as below. {noformat} -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans {noformat} However, in the implementation, the function fsck.setFixHdfsOrphans(false); is called in -repairHoles. This is not consistent with the usage information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-5987: -- Fix Version/s: 0.96.0 HFileBlockIndex improvement --- Key: HBASE-5987 URL: https://issues.apache.org/jira/browse/HBASE-5987 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, D3237.4.patch, D3237.5.patch, D3237.6.patch, D3237.7.patch, D3237.8.patch, screen_shot_of_sequential_scan_profiling.png Recently we found a performance problem: it is quite slow when multiple requests are reading the same block of data or index. From the profiling, one of the causes is the IdLock contention, which has been addressed in HBASE-5898. Another issue is that the HFileScanner will keep asking the HFileBlockIndex about the data block location for each target key value during the scan process (reSeekTo), even though the target key value is already in the current data block. This issue makes certain index blocks very HOT, especially when it is a sequential scan. To solve this issue, we propose the following solutions: First, we propose to look ahead one more block index entry so that the HFileScanner would know the start key value of the next data block. So if the target key value for the scan (reSeekTo) is smaller than the start kv of the next data block, the target key value is very likely in the current data block (if it is not in the current data block, then the start kv of the next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of the HFileBlockIndex and avoid some unnecessary IdLock contention or index block cache lookups.
Second, we propose to push this idea a little further: the HFileBlockIndex shall index on the last key value of each data block instead of indexing on the start key value. The motivation is to solve the HBASE-4443 issue (avoid seeking to the previous block when the key you are interested in is the first one of a block) as well as +the defects mentioned above+. For example, if the target key value is smaller than the start key value of data block N, there is no way to be sure whether the target key value is in data block N or N-1, so the seek has to start from data block N-1. However, if the block index is based on the last key value of each data block and the target key value is between the last key value of data block N-1 and that of data block N, then the target key value is in data block N for sure. As long as HBase only supports forward scans, the last key value makes more sense to index on than the start key value. Thanks Kannan and Mikhail for the insightful discussions and suggestions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
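The second proposal can be illustrated with a toy index. The keys and the findBlock helper below are invented for the example (real HFile keys are byte arrays); the point is that when the index stores each block's *last* key, a lower-bound search pinpoints the containing block directly, with no fallback to block N-1:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not HFileBlockIndex code: index entries are the last
// key of each data block, in block order.
public class LastKeyIndexSketch {
  /** Returns the index of the first block whose last key is >= target. */
  static int findBlock(List<String> lastKeys, String target) {
    int lo = 0, hi = lastKeys.size() - 1;
    while (lo < hi) {                        // lower-bound binary search
      int mid = (lo + hi) >>> 1;
      if (lastKeys.get(mid).compareTo(target) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    List<String> lastKeys = Arrays.asList("d", "k", "r");  // three data blocks
    System.out.println(findBlock(lastKeys, "e"));  // "e" falls in block 1: ("d", "k"]
    System.out.println(findBlock(lastKeys, "d"));  // block 0, no seek to a previous block
  }
}
```

With start-key indexing, a target equal to a block's first key would force a probe of the previous block (the HBASE-4443 case); with last-key indexing the answer is unambiguous for forward scans.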
[jira] [Commented] (HBASE-6454) Write PB definitions for filters
[ https://issues.apache.org/jira/browse/HBASE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425938#comment-13425938 ] Gregory Chanan commented on HBASE-6454: --- It was originally nested in Condition in Client.proto. I needed CompareType for filters and all the shared types are in hbase.proto, so I moved it there. I didn't need the entire condition type, so I just took what I needed. I'm happy to do whatever you think is best here. Write PB definitions for filters Key: HBASE-6454 URL: https://issues.apache.org/jira/browse/HBASE-6454 Project: HBase Issue Type: Task Components: ipc, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6454.patch See HBASE-5447. Conversion to protobuf requires writing protobuf definitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425943#comment-13425943 ] Elliott Clark commented on HBASE-6411: -- {quote} Hm, I guess we now have two pairs of classes: MXBean and MXBeanImpl in org.apache.hadoop.hbase.master and in org.apache.hadoop.hbase.master.metrics. Not sure what was intended by Elliott here. I assume that he forgot to remove one of them (in org.apache.hadoop.hbase.master? why to move it in metrics package then?) Elliott, could you provide some insight here please? {quote} Yea, I must have just missed deleting them. The move was just because those classes are only about metrics and not used anywhere else. So might as well clean up as we go. They were interface private so moving shouldn't be an issue. {quote} I believe Elliott is working on new such metrics (different issue) and this is why he removed it. Elliott? {quote} Correct. One of the sub-tasks of 4050 is creating a metrics2 histogram(Actually there will be two but that's out of scope) and using histograms where ever it's useful. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425949#comment-13425949 ] Lars Hofhansl commented on HBASE-6476: -- What could happen, though, is that a test that formerly used System.currentTimeMillis and runs in a shared JVM with a test that messes with the EnvironmentEdge would now potentially have problems if we switched it to EnvironmentEdge. Although, I do not think there are many of these, and a test run will show. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425953#comment-13425953 ] Andrew Purtell commented on HBASE-6476: --- +1 We've been replacing as needed but why not a one time global replacement. Adding a conformance check is nice. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
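For reference, the EnvironmentEdge pattern this thread relies on boils down to an injectable clock: production code asks a manager for the time instead of calling System.currentTimeMillis() directly, so a test can swap in a fixed edge. The sketch below is a simplified stand-in, not HBase's actual EnvironmentEdgeManager:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the injectable-clock pattern; names are illustrative.
public class ClockSketch {
  interface Clock { long currentTimeMillis(); }

  // Defaults to the real system clock.
  static final AtomicReference<Clock> delegate =
      new AtomicReference<>(System::currentTimeMillis);

  static long currentTimeMillis() {
    return delegate.get().currentTimeMillis();
  }

  /** Test helper: reads the time under a fixed clock, then restores the default. */
  static long withFixedClock(long t) {
    Clock prev = delegate.getAndSet(() -> t);
    try {
      return currentTimeMillis();
    } finally {
      delegate.set(prev);
    }
  }

  public static void main(String[] args) {
    System.out.println(currentTimeMillis() > 0);  // wall-clock time by default
    System.out.println(withFixedClock(42L));      // a test sees the injected time
  }
}
```

This also shows why tests that inject an edge cannot share a JVM with tests that assume wall-clock time: the delegate is process-global state.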
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425954#comment-13425954 ] Zhihong Ted Yu commented on HBASE-6411: --- Since MXBean.java is in master.metrics, should TestMXBean.java be in the same package ? Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6454) Write PB definitions for filters
[ https://issues.apache.org/jira/browse/HBASE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425955#comment-13425955 ] Zhihong Ted Yu commented on HBASE-6454: --- That's fine Gregory. Looking forward to HBASE-6477. I will integrate by tomorrow morning if there is no objection. Write PB definitions for filters Key: HBASE-6454 URL: https://issues.apache.org/jira/browse/HBASE-6454 Project: HBase Issue Type: Task Components: ipc, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6454.patch See HBASE-5447. Conversion to protobuf requires writing protobuf definitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425956#comment-13425956 ] Andrew Purtell commented on HBASE-6478: --- Or does the contract implied by waitTableAvailable suggest improving its test rather than adding a new waitTableEnabled? TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6478-trunk.patch When hudson runs for HBASE-6459, it encounters a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table; when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions in the RS. for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { This loop will then be skipped, and found1 will remain false. That's why the testcase failed. So maybe we can have some stricter check when a table is created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Attachment: 6435.v7.patch Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on HDFS. Through ZooKeeper, HBase gets informed, usually within 30s, that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. Since HDFS only marks a node as dead after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s plus a risk of losing edits if we connect over ipc to the dead DN. Possible solutions are: - shorter dead-datanode detection by the NN. Requires a NN code change. - better dead-datanode management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. 
Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. Such a solution would only target the latest version, though, and could then rely on minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution.
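The retained idea — reorder the replica locations the NN returned so that any replica on the dead regionserver's host is tried last — can be sketched as below. Plain String hostnames stand in for HDFS's DatanodeInfo objects; this illustrates the principle only and is not the patch's actual hook.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hedged sketch of the client-side block reordering described above.
// String hostnames are hypothetical stand-ins for DatanodeInfo.
public class ReorderWalBlocksSketch {
    static List<String> reorder(List<String> replicaHosts, String deadRsHost) {
        List<String> ordered = new ArrayList<>(replicaHosts);
        // stable sort: healthy hosts keep the NN-assigned order, the dead host sinks last
        ordered.sort(Comparator.comparingInt(h -> h.equals(deadRsHost) ? 1 : 0));
        return ordered;
    }
}
```

Because the sort is stable, the DFS client still tries the healthy datanodes in the priority order the NN assigned; only the likely-dead replica is demoted.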
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425959#comment-13425959 ] nkeywal commented on HBASE-6435: Ok for review...
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425980#comment-13425980 ] Zhihong Ted Yu commented on HBASE-6435: --- Just started to look at the patch. It doesn't compile against hadoop 2.0: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure: [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[214,12] namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed from outside package [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[221,52] namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed from outside package [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[289,81] cannot find symbol [ERROR] symbol : method getHost() [ERROR] location: class org.apache.hadoop.hdfs.protocol.DatanodeInfo {code} Can we give the following a more meaningful name ? {code} +if (!conf.getBoolean("hbase.hdfs.jira6435", true)) { // activated by default {code} Comment from Todd would be appreciated.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425988#comment-13425988 ] nkeywal commented on HBASE-6435: I will have a look at the hadoop2 stuff. As for: bq. Can we give the following a more meaningful name ? Do you have an idea?
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476.txt Here's a gigantic patch for trunk. I manually had to fix the imports in many of the classes (smarter people would have used Eclipse to script that, but anyway). I'm not expecting anybody to review this. If the HadoopQA run succeeds that should be good enough. I also ran some validation scripts to make sure all files referring to EnvironmentEdgeManager have the matching imports. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 Attachments: 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
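The pattern the patch applies everywhere can be illustrated with a minimal stand-in: production code asks an injectable clock for the time instead of calling System.currentTimeMillis() directly, so tests can control time deterministically. The names mirror HBase's EnvironmentEdge / EnvironmentEdgeManager, but this is a simplified sketch, not the real classes.

```java
// Simplified stand-in for HBase's EnvironmentEdge pattern (not the real classes).
public class EdgeSketch {
    interface EnvironmentEdge { long currentTimeMillis(); }

    static class EnvironmentEdgeManager {
        // defaults to the real clock; tests inject a deterministic edge
        private static EnvironmentEdge edge = System::currentTimeMillis;
        static void injectEdge(EnvironmentEdge e) { edge = e; }
        static long currentTimeMillis() { return edge.currentTimeMillis(); }
    }
}
```

A test can then do injectEdge(() -> someFixedTime) and exercise timeout or TTL logic without sleeping, which is the testability argument the issue makes.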
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Fix Version/s: (was: 0.94.2) 0.96.0
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425995#comment-13425995 ] Zhihong Ted Yu commented on HBASE-6435: --- How about 'hbase.filesystem.reorder.blocks' ? BTW replacing 'Hack' with some form of 'Intercept' would be better IMHO.
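The renaming suggested above might look like the sketch below: a descriptive, documented constant instead of the opaque "hbase.hdfs.jira6435" key. java.util.Properties stands in for Hadoop's Configuration, and the key name follows the suggestion in this thread; none of this is what the patch currently uses.

```java
import java.util.Properties;

// Hypothetical sketch of Ted Yu's naming suggestion (Properties stands in
// for Hadoop's Configuration).
public class ReorderConfigSketch {
    static final String REORDER_BLOCKS_KEY = "hbase.filesystem.reorder.blocks";

    static boolean shouldReorderBlocks(Properties conf) {
        // reordering stays activated by default, as in the patch
        return Boolean.parseBoolean(conf.getProperty(REORDER_BLOCKS_KEY, "true"));
    }
}
```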
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6411: Attachment: HBASE-6411-4_2.patch Cleaned up redundant MXBean and MXBeanImpl. Looks like all comments and Qs are resolved. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426005#comment-13426005 ] Alex Baranau commented on HBASE-6411: - bq. Since MXBean.java is in master.metrics, should TestMXBean.java be in the same package ? I'd say it may be the same situation as with TestMasterMetrics. It's just easier to place it here, as the test relies heavily on access to master internal state.
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426014#comment-13426014 ] Hadoop QA commented on HBASE-6476: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538589/6476.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 240 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestCoprocessorInterface org.apache.hadoop.hbase.master.TestClockSkewDetection org.apache.hadoop.hbase.TestKeyValue org.apache.hadoop.hbase.regionserver.wal.TestWALActionsListener org.apache.hadoop.hbase.regionserver.TestQueryMatcher org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase org.apache.hadoop.hbase.filter.TestDependentColumnFilter org.apache.hadoop.hbase.regionserver.TestResettingCounters org.apache.hadoop.hbase.coprocessor.TestRegionObserverStacking org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster org.apache.hadoop.hbase.regionserver.wal.TestHLogMethods org.apache.hadoop.hbase.regionserver.TestBlocksScanned org.apache.hadoop.hbase.util.TestHFileArchiveUtil org.apache.hadoop.hbase.regionserver.TestMinVersions org.apache.hadoop.hbase.regionserver.TestCompactSelection org.apache.hadoop.hbase.regionserver.TestSplitTransaction org.apache.hadoop.hbase.ipc.TestPBOnWritableRpc org.apache.hadoop.hbase.TestSerialization 
org.apache.hadoop.hbase.regionserver.TestScanner org.apache.hadoop.hbase.util.TestHBaseFsckComparator org.apache.hadoop.hbase.util.TestByteBloomFilter org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner org.apache.hadoop.hbase.regionserver.TestKeepDeletes org.apache.hadoop.hbase.util.TestThreads org.apache.hadoop.hbase.regionserver.TestRSStatusServlet org.apache.hadoop.hbase.master.TestCatalogJanitor org.apache.hadoop.hbase.regionserver.TestRegionSplitPolicy org.apache.hadoop.hbase.regionserver.TestScanWithBloomError org.apache.hadoop.hbase.client.TestIntraRowPagination org.apache.hadoop.hbase.regionserver.TestHRegionInfo org.apache.hadoop.hbase.regionserver.TestWideScanner org.apache.hadoop.hbase.migration.TestMigrationFrom090To092 org.apache.hadoop.hbase.monitoring.TestTaskMonitor org.apache.hadoop.hbase.regionserver.TestColumnSeeking org.apache.hadoop.hbase.TestCompare org.apache.hadoop.hbase.filter.TestFilter org.apache.hadoop.hbase.regionserver.TestStoreFile org.apache.hadoop.hbase.filter.TestColumnPrefixFilter org.apache.hadoop.hbase.monitoring.TestMemoryBoundedLogMessageBuffer org.apache.hadoop.hbase.filter.TestMultipleColumnPrefixFilter Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//console This message is automatically generated.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426015#comment-13426015 ] nkeywal commented on HBASE-6435: Ok. I wanted to make clear it was a temporary workaround. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch HBase writes a Write-Ahead-Log to revover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase gets informed usually in 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standards deployments, HBase process (regionserver) are deployed on the same box as the datanodes. It means that when the box stops, we've actually lost one of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead after ~10 minutes, it appears as available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds as the read will usually fail with a socket timeout. If the file is still opened for writing, it adds an extra 20s + a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead datanodes detection by the NN. Requires a NN code change. - better dead datanodes management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. 
Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows all the code to live in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But that solution could only target the latest version, which would in turn allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
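The reordering idea retained above can be sketched independently of the HDFS internals. The following is a minimal, hypothetical illustration (not the actual patch, which hooks a proxy in front of the NN responses): given a list of replica locations for a block, move any replica hosted on the dead regionserver's box to the end, so that the client tries healthy datanodes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the block-reordering idea: replicas on the same
// host as the dead regionserver are moved to the end of the location list,
// so the DFS client tries healthy datanodes first and only falls back to
// the (probably dead) local datanode after the others.
public class ReorderBlocks {
    static List<String> reorder(List<String> locations, String deadHost) {
        List<String> healthy = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : locations) {
            if (host.equals(deadHost)) {
                suspect.add(host);   // same box as the dead RS: try last
            } else {
                healthy.add(host);
            }
        }
        healthy.addAll(suspect);     // dead host demoted, order otherwise kept
        return healthy;
    }

    public static void main(String[] args) {
        List<String> out = reorder(Arrays.asList("dn1", "deadbox", "dn3"), "deadbox");
        if (!out.equals(Arrays.asList("dn1", "dn3", "deadbox"))) throw new AssertionError(out);
        System.out.println(out);
    }
}
```

Since the NN already orders replicas by locality, demoting only the suspect host preserves the rest of the priority queue, which is why the patch reorders rather than filters.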
[jira] [Commented] (HBASE-6471) Performance regression caused by HBASE-4054
[ https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426017#comment-13426017 ] Lars Hofhansl commented on HBASE-6471: -- The way we have HTablePool.PooledHTable extending HTable is actually quite terrible. For example, append() was not added to it, so append does not go through the delegate... which happens to be fine, because the delegate is not needed in the first place when PooledHTable just extends HTable. At the very least we should remove the delegate and just override close() and toString(), or fix HBASE-5728. Performance regression caused by HBASE-4054 --- Key: HBASE-6471 URL: https://issues.apache.org/jira/browse/HBASE-6471 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Lars George Priority: Critical Fix For: 0.94.2 The patch in HBASE-4054 switches PooledHTable to extend HTable instead of implementing HTableInterface. Since HTable does not have an empty constructor, the patch added a call to the super() constructor, which triggers the ZooKeeper connection and META scan, causing a considerable delay. With multiple threads using the pool in parallel, the first thread holds up all the subsequent ones, in effect negating the whole reason for having an HTable pool. We should complete HBASE-5728, or alternatively add a protected, empty constructor to HTable. I am +1 on the former.
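The hazard Lars describes can be reproduced in miniature. The classes below are illustrative stand-ins, not the actual HBase types: a wrapper that extends the concrete class while also holding a delegate silently bypasses the delegate for any method it forgets to override.

```java
// Hypothetical minimal reproduction of the hazard described above: a pooled
// wrapper that extends the concrete class but forwards through a delegate
// will silently skip the delegate for any method it forgets to override
// (as happened with append() on PooledHTable). Names are illustrative only.
public class DelegateGap {
    static class Table {
        String put() { return "table-put"; }
        String append() { return "table-append"; }
    }

    static class PooledTable extends Table {
        private final Table delegate;
        PooledTable(Table delegate) { this.delegate = delegate; }
        @Override String put() { return "pooled:" + delegate.put(); }
        // append() is NOT overridden, so calls bypass the delegate entirely.
    }

    public static void main(String[] args) {
        Table t = new PooledTable(new Table());
        if (!t.put().equals("pooled:table-put")) throw new AssertionError();
        // The missing override means the delegate is never consulted:
        if (!t.append().equals("table-append")) throw new AssertionError();
        System.out.println("append() bypassed the delegate");
    }
}
```

This is why implementing an interface (HTableInterface) is safer than extending the concrete class: the compiler then forces every method to be forwarded explicitly.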
[jira] [Updated] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shrijeet Paliwal updated HBASE-6468: Attachment: 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch Patch off trunk. RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing, as the FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related.
[jira] [Commented] (HBASE-5728) Methods Missing in HTableInterface
[ https://issues.apache.org/jira/browse/HBASE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426024#comment-13426024 ] Lars Hofhansl commented on HBASE-5728: -- These: {code}
public Map<HRegionInfo, HServerAddress> getRegionsInfo() throws IOException;
public HRegionLocation getRegionLocation(String row) throws IOException;
public HRegionLocation getRegionLocation(byte[] row) throws IOException;
public void prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap);
public void clearRegionCache();
public long getWriteBufferSize();
public void setWriteBufferSize(long writeBufferSize) throws IOException;
public ArrayList<Put> getWriteBuffer();
{code} would leak implementation details into the interface. I think HBASE-4054 specifically mentions that {code}public Map<HRegionInfo, HServerAddress> getRegionsInfo() throws IOException;{code} is needed. Hmm... Methods Missing in HTableInterface -- Key: HBASE-5728 URL: https://issues.apache.org/jira/browse/HBASE-5728 Project: HBase Issue Type: Improvement Components: client Reporter: Bing Li Dear all, I found that some methods existing in HTable are not in HTableInterface: setAutoFlush, setWriteBufferSize, ... In most cases I manipulate HBase through HTableInterface from HTablePool. If I need to use the above methods, how can I do that? I am considering writing my own table pool if there is no proper way. Is that fine? Thanks so much! Best regards, Bing
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426066#comment-13426066 ] Hadoop QA commented on HBASE-6411: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538593/HBASE-6411-4_2.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 45 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause the mvn compile goal to fail.
-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2463//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2463//console This message is automatically generated. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Assigned] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau reassigned HBASE-6411: --- Assignee: Alex Baranau (was: Elliott Clark) Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Alex Baranau Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426070#comment-13426070 ] Alex Baranau commented on HBASE-6411: - Not sure if I should do something about these: bq. -1 javac. The patch appears to cause the mvn compile goal to fail. bq. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. Please let me know. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Alex Baranau Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Created] (HBASE-6481) SkipFilter javadoc is incorrect
Shrijeet Paliwal created HBASE-6481: --- Summary: SkipFilter javadoc is incorrect Key: HBASE-6481 URL: https://issues.apache.org/jira/browse/HBASE-6481 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Shrijeet Paliwal Priority: Minor The javadoc for SkipFilter (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SkipFilter.html) states: A wrapper filter that filters an entire row if any of the KeyValue checks do not pass. But the example the same javadoc gives to support this statement is wrong. The *scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(0)))));* will only emit rows in which all column values are zero. In other words, it is going to skip all rows for which ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(0))) does not pass, which happens to be all rows containing a non-zero valued cell. In the same example, a ValueFilter created with CompareOp.NOT_EQUAL will filter out the rows which have a column value of zero.
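The corrected semantics can be checked with a plain-Java sketch. The filter logic is re-implemented here for illustration (no HBase classes): SkipFilter emits a row only when every cell passes the wrapped check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Plain-Java sketch of SkipFilter semantics (re-implemented for
// illustration; not the HBase classes). SkipFilter(ValueFilter(EQUAL, 0))
// emits a row only when *every* cell passes the wrapped check, so only
// all-zero rows survive.
public class SkipFilterSketch {
    // Emulates SkipFilter: drop the whole row if any cell fails the predicate.
    static boolean rowPasses(List<Integer> row, IntPredicate cellCheck) {
        for (int v : row) {
            if (!cellCheck.test(v)) return false; // one failing cell skips the row
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> allZero = Arrays.asList(0, 0, 0);
        List<Integer> mixed = Arrays.asList(0, 5, 0);
        // ValueFilter(EQUAL, 0) analogue: only the all-zero row is emitted.
        if (!rowPasses(allZero, v -> v == 0)) throw new AssertionError();
        if (rowPasses(mixed, v -> v == 0)) throw new AssertionError();
        // ValueFilter(NOT_EQUAL, 0) analogue: any zero cell skips the row.
        if (rowPasses(mixed, v -> v != 0)) throw new AssertionError();
        System.out.println("skip-filter semantics verified");
    }
}
```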
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426090#comment-13426090 ] Elliott Clark commented on HBASE-6427: -- An InterfaceAudience annotation slipped in here. It breaks older hadoop versions (HBASE-6141). Pluggable compaction and scan policies via coprocessors --- Key: HBASE-6427 URL: https://issues.apache.org/jira/browse/HBASE-6427 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: 6427-0.94.txt, 6427-notReady.txt, 6427-v1.txt, 6427-v10.txt, 6427-v2.txt, 6427-v3.txt, 6427-v4.txt, 6427-v5.txt, 6427-v7.txt When implementing higher-level stores on top of HBase it is necessary to allow dynamic control over how long KVs must be kept around. Semi-static config options for ColumnFamilies (# of versions or TTL) are not sufficient. This can be done with a few additional coprocessor hooks, or by making Store.ScanInfo pluggable. Was: The simplest way to achieve this is to have a pluggable class to determine the smallestReadpoint for Region. That way outside code can control what KVs to retain.
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426093#comment-13426093 ] Lars Hofhansl commented on HBASE-6476: -- That doesn't look too good. Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so that we do not reintroduce System.currentTimeMillis() in the future?
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426096#comment-13426096 ] Lars Hofhansl commented on HBASE-6476: -- Argghh... I am an idiot. My script replaced System.currentTimeMillis() with EnvironmentEdgeManager.currentTimeMillis() in DefaultEnvironmentEdge. Obviously that leads to an endless loop.
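The endless loop Lars describes is easy to reproduce in miniature. The classes below mirror, but are not, the actual HBase types: if the default edge delegated back to the manager, the two methods would recurse until StackOverflowError, so the default edge must terminate the chain at the JDK clock.

```java
// Miniature of the bug described above (names mirror, but are not, the
// HBase classes). If DefaultEnvironmentEdge.currentTimeMillis() were
// blindly rewritten to call back into the manager, the two methods would
// recurse forever. The default edge must call System.currentTimeMillis()
// directly.
public class EdgeLoop {
    interface EnvironmentEdge {
        long currentTimeMillis();
    }

    static class DefaultEnvironmentEdge implements EnvironmentEdge {
        @Override public long currentTimeMillis() {
            // Correct: terminate the delegation chain at the JDK clock.
            return System.currentTimeMillis();
            // Buggy version (the script's rewrite) would instead be:
            // return Manager.currentTimeMillis();  // -> endless recursion
        }
    }

    static class Manager {
        static EnvironmentEdge edge = new DefaultEnvironmentEdge();
        static long currentTimeMillis() { return edge.currentTimeMillis(); }
    }

    public static void main(String[] args) {
        if (Manager.currentTimeMillis() <= 0) throw new AssertionError();
        System.out.println("clock ok");
    }
}
```

This is also why a blanket search-and-replace needs an exclusion list: the one class implementing the default edge is the legitimate caller of System.currentTimeMillis().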
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Attachment: 6435.v8.patch
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476-v2.txt Let's try again.
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426102#comment-13426102 ] nkeywal commented on HBASE-6435: v8 works ok with hadoop 1 and hadoop 2, and addresses Ted's other comments. I tried the v3 profile, but got errors in the pom.xml.
[jira] [Updated] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6427: - Attachment: 6427-0.94-addendum.txt Oops... Yes. Here's an addendum.
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426107#comment-13426107 ] Lars Hofhansl commented on HBASE-6427: -- Committed addendum, thanks for watching, Elliott.
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426113#comment-13426113 ] Zhihong Ted Yu commented on HBASE-6468: --- {code} + * Constructor which takes a list of columns. As soon as first KeyValue {code} Now that the parameter has changed to Set<byte []>, the above javadoc should be modified. {code} + * matching any of these columns if found, filter moves to next row. {code} 'if found' -> 'is found'. {code} + * @param qualifiers the list of columns to me matched. {code} Change to 'the set of columns to be matched'. Looks like HBASE-6454 may go in ahead of this JIRA. So Filter.proto should have the following: {code} +message FirstKeyValueMatchingQualifiersFilter { +} {code} RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing as the FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related.
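The "peculiar behavior" discussed for this filter (KVs before the first matching qualifier still pass through) can be modeled with a small sketch; the class and method names are illustrative stand-ins, not the actual HBase Filter API:

```java
import java.util.Set;

// Models the per-row decision logic described for
// FirstKeyValueMatchingQualifiersFilter: include KVs until the first KV
// whose qualifier is in the set is seen, then skip the rest of the row.
public class FirstKeyMatchingSketch {
    private final Set<String> qualifiers;
    private boolean foundMatch = false;

    public FirstKeyMatchingSketch(Set<String> qualifiers) {
        this.qualifiers = qualifiers;
    }

    /** true = include this KV, false = skip to the next row. */
    public boolean include(String qualifier) {
        if (foundMatch) {
            return false;               // NEXT_ROW once a match was seen
        }
        if (qualifiers.contains(qualifier)) {
            foundMatch = true;          // this KV is the "first key"
        }
        return true;
    }

    public void reset() {               // called at each row boundary
        foundMatch = false;
    }
}
```

For a row with qualifiers a, b, c and a filter built with {b, c}, the KV for 'a' is included because no match has been seen yet, which is exactly the case raised in the review above.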
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426114#comment-13426114 ] Elliott Clark commented on HBASE-6427: -- Thanks so much.
[jira] [Updated] (HBASE-6467) ROOT stuck in assigning forever
[ https://issues.apache.org/jira/browse/HBASE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6467: --- Priority: Minor (was: Major) ROOT stuck in assigning forever --- Key: HBASE-6467 URL: https://issues.apache.org/jira/browse/HBASE-6467 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: master.log.gz, regionserver.log.gz, root-region-assignment.png After restarting a cluster, all region servers checked into the master, but the master was stuck in assigning forever. The master log shows it kept trying to connect to one region server for the ROOT table, while that region server's log kept printing NotServingRegionException. After restarting the master, things are OK now.
[jira] [Resolved] (HBASE-6467) ROOT stuck in assigning forever
[ https://issues.apache.org/jira/browse/HBASE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6467. Resolution: Won't Fix This issue should have been fixed in 0.94 and trunk. There is no plan to fix it in 0.92 since the workaround is good enough.
[jira] [Commented] (HBASE-6471) Performance regression caused by HBASE-4054
[ https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426144#comment-13426144 ] Lars Hofhansl commented on HBASE-6471: -- Please see my comment on HBASE-5728. I am not sure we want to leak a lot of the HTable internals (anything related to regions, etc.) up into the interface. So maybe just remove the delegation code from PooledHTable and add a constructor to HTable that avoids the ZK/Meta scan? Performance regression caused by HBASE-4054 --- Key: HBASE-6471 URL: https://issues.apache.org/jira/browse/HBASE-6471 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Lars George Priority: Critical Fix For: 0.94.2 The patch in HBASE-4054 switches PooledHTable to extend HTable as opposed to implementing HTableInterface. Since HTable does not have an empty constructor, the patch added a call to the super() constructor, which triggers the ZooKeeper and META scan, causing a considerable delay. With multiple threads using the pool in parallel, the first thread holds up all the subsequent ones; in effect it negates the whole reason we have an HTable pool. We should complete HBASE-5728, or alternatively add a protected, empty constructor to HTable. I am +1 for the former.
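The constructor cost being discussed can be sidestepped by deferring the expensive lookup to first use. A minimal sketch, with a Supplier standing in for HTable's ZK/META scan (all names here are invented, not HBase API):

```java
import java.util.function.Supplier;

// Sketch of lazy initialization: pooled handles are cheap to construct,
// and the expensive region lookup runs only when a handle is first used.
public class LazyTable {
    static int initCount = 0;                     // exposed for the sketch only

    private final Supplier<String> regionLookup;  // stands in for the ZK/META scan
    private String regionLocation;                // cached after first use

    public LazyTable(Supplier<String> regionLookup) {
        this.regionLookup = regionLookup;         // nothing expensive happens here
    }

    public String get(String row) {
        if (regionLocation == null) {             // first call pays the cost
            regionLocation = regionLookup.get();
            initCount++;
        }
        return regionLocation + ":" + row;
    }
}
```

With this shape, a pool can hand out many handles without any thread blocking on construction; only the handle that is actually used performs the lookup.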
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426146#comment-13426146 ] Shrijeet Paliwal commented on HBASE-6468: - Will wait for Filter.proto to get committed. Is the patch format fine (I used git format-patch), or do you want git diff --no-prefix ?
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426147#comment-13426147 ] Zhihong Ted Yu commented on HBASE-6468: --- Git format is fine, acceptable by Hadoop QA.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426151#comment-13426151 ] Zhihong Ted Yu commented on HBASE-6435: --- {code} + private static ClientProtocol createReordoringProxy(final ClientProtocol cp, {code} Usually spelling would be a nit. But this spelling mistake is in a method name :-) {code} + public static ServerName getServerNameFromHLogDirectoryName(Configuration conf, String path) throws IOException { {code} The above line is too long. {code} + LOG.debug("Moved the location " + toLast.getHostName() + " to the last place." + " locations size was " + dnis.length); {code} I think the above log may appear many times. {code} +LOG.fatal( REORDER); {code} The above can be made a debug log. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on HDFS. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, the HBase processes (regionservers) are deployed on the same boxes as the datanodes. This means that when a box stops, we have actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead only after ~10 minutes, it still appears as available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout.
If the file is still opened for writing, it adds an extra 20s plus a risk of losing edits if we connect over IPC to the dead DN. Possible solutions are: - shorter dead-datanode detection by the NN. Requires an NN code change. - better dead-datanode management in the DFSClient. Requires a DFS code change. - NN customisation to write the WAL files to another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or some kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require implementing the fix partially, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But this solution allows targeting the latest version only, and this could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better solution long term.
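The proxy approach described above can be sketched with java.lang.reflect.Proxy: wrap the client interface and post-process the block-location result so the dead DN sorts last. The interface and method names below are stand-ins, not the actual ClientProtocol API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the NN client interface being wrapped.
interface BlockLocator {
    List<String> getBlockLocations(String path);
}

public class ReorderingProxyDemo {
    // Returns a proxy that moves any location matching deadHost
    // to the end of the list, without modifying the delegate.
    static BlockLocator reordering(BlockLocator delegate, String deadHost) {
        InvocationHandler h = (proxy, method, args) -> {
            Object result = method.invoke(delegate, args);
            if (method.getName().equals("getBlockLocations")) {
                @SuppressWarnings("unchecked")
                List<String> locs = new ArrayList<>((List<String>) result);
                if (locs.remove(deadHost)) {
                    locs.add(deadHost);   // dead DN goes last in priority
                }
                return locs;
            }
            return result;
        };
        return (BlockLocator) Proxy.newProxyInstance(
            BlockLocator.class.getClassLoader(),
            new Class<?>[] { BlockLocator.class }, h);
    }
}
```

This mirrors the design choice in the description: the reordering lives entirely on the client side, behind the existing interface, so no HDFS source change is needed.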
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426158#comment-13426158 ] Hadoop QA commented on HBASE-6435: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538610/6435.v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.fs.TestBlockReorder Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//console This message is automatically generated. 
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426167#comment-13426167 ] Hudson commented on HBASE-6427: --- Integrated in HBase-0.94 #379 (See [https://builds.apache.org/job/HBase-0.94/379/]) HBASE-6427 addendum (Revision 1367770) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanType.java
[jira] [Created] (HBASE-6482) In AssignmentManager failover mode, use ServerShutdownHandler to handle dead regions
Jimmy Xiang created HBASE-6482: -- Summary: In AssignmentManager failover mode, use ServerShutdownHandler to handle dead regions Key: HBASE-6482 URL: https://issues.apache.org/jira/browse/HBASE-6482 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang In AssignmentManager failover mode, a special failoverProcessedRegions map is used to manage regions in transition. It complicates the code. Should we use ServerShutdownHandler to process those regions? That way we can share some code and make the logic of AssignmentManager a little bit simpler.
[jira] [Created] (HBASE-6483) Fully enable ServerShutdownHandler after master joins the cluster
Jimmy Xiang created HBASE-6483: -- Summary: Fully enable ServerShutdownHandler after master joins the cluster Key: HBASE-6483 URL: https://issues.apache.org/jira/browse/HBASE-6483 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Once ROOT and META are assigned, ServerShutdownHandler is enabled so that we can handle meta/root region server failures before joinCluster completes. However, we can hold ServerShutdownHandler back a little longer for the user region assignments, i.e. not assign user regions before joinCluster returns. If so, we can avoid some region assignment races: the same regions being assigned in both joinCluster and ServerShutdownHandler.
[jira] [Created] (HBASE-6484) Make AssignmentManager#enablingTables and disablingTables local variables
Jimmy Xiang created HBASE-6484: -- Summary: Make AssignmentManager#enablingTables and disablingTables local variables Key: HBASE-6484 URL: https://issues.apache.org/jira/browse/HBASE-6484 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang The enablingTables and disablingTables sets are used only during startup. They should be local variables. We can load them from ZKTable at the beginning instead of handling them per table.
[jira] [Created] (HBASE-6485) Share bulk assign code in AssignmentManager
Jimmy Xiang created HBASE-6485: -- Summary: Share bulk assign code in AssignmentManager Key: HBASE-6485 URL: https://issues.apache.org/jira/browse/HBASE-6485 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang AssignmentManager has several bulk assign functions: for startup bulk assign, for ServerShutdownHandler bulk assign, etc. They can be shared.
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Open (was: Patch Available) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476-v2.txt, 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
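The EnvironmentEdge pattern this issue relies on is essentially an injectable clock. A minimal sketch, loosely mirroring EnvironmentEdgeManager but not the actual class:

```java
// Sketch of the EnvironmentEdge idea: all time lookups go through a
// swappable provider, so tests can freeze or advance the clock while
// production code defaults to the wall clock.
public class ClockEdge {
    public interface Edge {
        long currentTimeMillis();
    }

    private static Edge edge = System::currentTimeMillis; // production default

    public static long currentTimeMillis() {
        return edge.currentTimeMillis();
    }

    public static void injectEdge(Edge e) { // tests swap in a fixed clock
        edge = e;
    }
}
```

Code that calls ClockEdge.currentTimeMillis() instead of System.currentTimeMillis() becomes deterministic under test, which is the motivation stated in the description.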
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Patch Available (was: Open)
[jira] [Created] (HBASE-6486) Enhance load test to print throughput measurements
Karthik Ranganathan created HBASE-6486: -- Summary: Enhance load test to print throughput measurements Key: HBASE-6486 URL: https://issues.apache.org/jira/browse/HBASE-6486 Project: HBase Issue Type: Bug Reporter: Karthik Ranganathan Assignee: Aurick Qiao The idea is to know how many MB/sec of throughput we are able to get by writing into HBase using a simple tool.
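The MB/sec figure such a tool would print is a simple bytes-over-elapsed-time calculation. A sketch, with all names invented for illustration:

```java
// Accumulates bytes written and reports throughput in MB/sec.
public class ThroughputMeter {
    private final long startNanos;
    private long bytes = 0;

    public ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    public void record(long bytesWritten) {
        bytes += bytesWritten;
    }

    /** Throughput in MB/sec over the window ending at nowNanos. */
    public double mbPerSec(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / seconds;
    }
}
```

A load-test worker would call record() after each put and periodically print mbPerSec(System.nanoTime()).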
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426225#comment-13426225 ] Zhihong Ted Yu commented on HBASE-6435: --- For the test failure: {code} org.junit.ComparisonFailure: expected:[localhost] but was:[host2] at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hbase.fs.TestBlockReorder.testFromDFS(TestBlockReorder.java:320) at org.apache.hadoop.hbase.fs.TestBlockReorder.testHBaseCluster(TestBlockReorder.java:271) {code} testFromDFS() should have utilized the done flag for the while loop below: {code} +for (int y = 0; y < l.getLocatedBlocks().size() && !done; y++) { + done = (l.get(y).getLocations().length == 3); +} + } while (l.get(0).getLocations().length != 3); {code} When l.getLocatedBlocks().size() is greater than 1, the above loop may exit prematurely.
[jira] [Created] (HBASE-6487) assign region doesn't check if the region is already assigned
Jimmy Xiang created HBASE-6487: -- Summary: assign region doesn't check if the region is already assigned Key: HBASE-6487 URL: https://issues.apache.org/jira/browse/HBASE-6487 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Tried to assign, from the hbase shell, a region that was already assigned somewhere; the region is assigned to a different place but the previous assignment is not closed, so it causes double assignment. In such a case, it's better to issue a warning instead.
[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426264#comment-13426264 ] Enis Soztutar commented on HBASE-6052: -- Stack, mind if I attack this? Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0
[jira] [Created] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
ryan rawson created HBASE-6488: -- Summary: HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. 
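The failure in the stack trace above is easy to reproduce: a `%` anywhere in a string handed to String.format (as ThreadFactoryBuilder.setNameFormat does with the thread-name prefix) is parsed as a conversion. Escaping `%` as `%%` is one possible fix; this is only a demonstration of the failure mode, not the attached patch:

```java
public class ZoneIndexFormatBug {
  // An IPv6 address with a zone-index contains '%', e.g. "fe80::1%0".
  // Escaping '%' as '%%' makes such a hostname safe to embed in a
  // format string.
  public static String escapePercents(String s) {
    return s.replace("%", "%%");
  }

  public static void main(String[] args) {
    String host = "fe80::1%0";
    try {
      String.format("master-" + host); // '%0' is not a valid conversion
    } catch (java.util.UnknownFormatConversionException e) {
      System.out.println("format failed: " + e.getMessage()); // Conversion = '0'
    }
    // With escaping, formatting succeeds:
    System.out.println(String.format("master-" + escapePercents(host)));
  }
}
```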
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-6488: --- Status: Patch Available (was: Open) HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated 
by JIRA.
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-6488: --- Attachment: HBASE-6488.txt HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426270#comment-13426270 ] Hadoop QA commented on HBASE-6488: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538644/HBASE-6488.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2466//console This message is automatically generated. HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426280#comment-13426280 ] Zhihong Ted Yu commented on HBASE-6488: --- The path to ExecutorService.java should be hbase-server/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java. Hadoop QA only runs test suite in trunk. HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at 
org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476-v2.txt Not sure why hadoop QA wouldn't run. Trying again. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476-v2.txt, 6476-v2.txt, 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, it should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
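The value of routing time lookups through EnvironmentEdgeManager is that tests can swap the clock. A minimal sketch of the pattern (a stand-in, not HBase's actual EnvironmentEdge classes):

```java
public class ClockSketch {
  // Minimal stand-in for the EnvironmentEdge idea: all time lookups go
  // through a swappable source, so tests can control the clock.
  interface TimeSource { long currentTimeMillis(); }

  private static TimeSource source = System::currentTimeMillis;

  public static void inject(TimeSource s) { source = s; }
  public static long currentTimeMillis() { return source.currentTimeMillis(); }

  public static void main(String[] args) {
    inject(() -> 42L);                       // test-controlled clock
    System.out.println(currentTimeMillis()); // prints 42
  }
}
```

Code that calls ClockSketch.currentTimeMillis() instead of System.currentTimeMillis() becomes deterministic under test, which is exactly what the issue asks for across the code base.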
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6480: Fix Version/s: 0.94.2 0.96.0 If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } Should we change it to: if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); }
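The reordering being proposed can be shown as a runnable sketch. Names are simplified stand-ins, not the actual HBaseServer fields: the size cap is checked only for normal calls, so priority calls can no longer be rejected by it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CallAdmissionSketch {
  static final long MAX_QUEUE_SIZE = 100;
  static long callQueueSize = 0;
  static final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
  static final BlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

  // Returns false when the call is rejected for exceeding the cap.
  static boolean offer(String call, long callSize, boolean highPriority) {
    if (highPriority) {
      priorityCallQueue.add(call);    // bypasses the size cap entirely
    } else {
      if (callSize + callQueueSize > MAX_QUEUE_SIZE) {
        return false;                 // the "call too big" rejection
      }
      callQueue.add(call);
    }
    callQueueSize += callSize;
    return true;
  }
}
```

Note the trade-off Ted raises below this ordering: with the cap moved inside the else branch, nothing bounds the priority queue any more.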
[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6478: Fix Version/s: 0.94.2 TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6478-trunk.patch When hudson ran for HBASE-6459, it encountered a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table: when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions on the RS. In for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { the loop will be skipped, and found1 will stay false. That's why the testcase failed. So maybe we can have a stricter check when the table is created.
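The stricter check suggested here amounts to polling until the regions are actually online rather than only checking meta. A generic sketch of that shape (an illustrative helper, not the HBaseTestingUtility API):

```java
public class WaitForSketch {
  interface Condition { boolean ready(); }

  // Poll until the condition holds or the timeout elapses; the condition
  // would be "all regions of the table are in the RS's online set".
  static boolean waitFor(Condition c, long timeoutMs, long intervalMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.ready()) return true;
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return c.ready();
      }
    }
    return c.ready();
  }
}
```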
[jira] [Commented] (HBASE-6485) Share bulk assign code in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426304#comment-13426304 ] Jimmy Xiang commented on HBASE-6485: First patch was posted on RB: https://reviews.apache.org/r/6269/ Share bulk assign code in AssignmentManager --- Key: HBASE-6485 URL: https://issues.apache.org/jira/browse/HBASE-6485 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang AssignmentManager has several bulk assign functions: for startup bulk assign, for ServerShutdownHandler bulk assign, etc. They can be shared.
[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6478: Status: Patch Available (was: Open) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6478-trunk.patch When hudson ran for HBASE-6459, it encountered a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table: when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions on the RS. In for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { the loop will be skipped, and found1 will stay false. That's why the testcase failed. So maybe we can have a stricter check when the table is created.
[jira] [Updated] (HBASE-6473) deletedtable is not deleted completely, some region may be still online
[ https://issues.apache.org/jira/browse/HBASE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6473: Status: Patch Available (was: Open) deletedtable is not deleted completely, some region may be still online --- Key: HBASE-6473 URL: https://issues.apache.org/jira/browse/HBASE-6473 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6473-trunk.patch Consider such a scenario: we have a table called T1, which has one region: A. 1. move A from rs1 to rs2, and A is now closed 2. disable T1 3. delete T1 When we disable T1, the disable handler will just set the zk state to disabled, and A will still be assigned. When A is opened, A in transition will be cleaned out. At that time, DeleteTable finds it is safe to delete all regions and the table in meta and fs; it will also delete the zk node of T1. {code} while (System.currentTimeMillis() < done) { AssignmentManager.RegionState rs = am.isRegionInTransition(region); if (rs == null) break; Threads.sleep(waitingTimeForEvents); LOG.debug("Waiting on region to clear regions in transition; " + rs); } if (am.isRegionInTransition(region) != null) { throw new IOException("Waited hbase.master.wait.on.region (" + waitTime + " ms) for region to leave region " + region.getRegionNameAsString() + " in transitions"); } {code} However, A is still being unassigned; when it finishes closing A, it finds that the disabled state in zk has been deleted, and then A will be assigned again.
[jira] [Updated] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6459: Status: Patch Available (was: Open) improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
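The parallel-creation idea can be sketched with a thread pool. This is an illustrative sketch under simplified assumptions: createRegion is a hypothetical stand-in for the expensive HRegion.createHRegion call, and collecting futures in submission order keeps results ordered so batched META inserts could proceed as before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCreateSketch {
  // Placeholder for the per-region filesystem work.
  static String createRegion(int idx) {
    return "region-" + idx;
  }

  // Submit all region-creation tasks to a fixed-size pool instead of
  // running them one by one in the handler thread.
  public static List<String> createAll(int n, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final int idx = i;
        futures.add(pool.submit(() -> createRegion(idx)));
      }
      List<String> regionInfos = new ArrayList<>();
      for (Future<String> f : futures) {
        regionInfos.add(f.get()); // preserves submission order
      }
      return regionInfos;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```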
[jira] [Updated] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6459: Status: Open (was: Patch Available) improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Commented] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426307#comment-13426307 ] zhou wenjian commented on HBASE-6459: - I think the failed testcase is involved with HBASE-6459. improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426306#comment-13426306 ] Lars Hofhansl commented on HBASE-6052: -- Shouldn't we just get rid of ROOT? Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0
[jira] [Commented] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426308#comment-13426308 ] zhou wenjian commented on HBASE-6459: - I think the failed testcase is involved with HBASE-6478. improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6480: -- Description: Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: {code} if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} Should we change it to: {code} if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} was: (the same description and code, previously without {code} markup) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: {code} if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} Should we change it to: {code} if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code}
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
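The reordering proposed above can be sketched as a standalone simulation. This is a hypothetical re-creation, not HBase's actual HBaseServer code: the names (`dispatch`, `MAX_QUEUE_SIZE`, `HIGH_PRIORITY_LEVEL`) only mirror the snippet, and the queues are plain collections rather than the real blocking queues. The point it demonstrates is that the size check now guards only the normal queue, so a priority call is admitted even when the normal queue is full.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified, hypothetical sketch of the proposed admission order:
// priority calls skip the size check; normal calls remain bounded.
class CallDispatchSketch {
  static final int MAX_QUEUE_SIZE = 100;      // stand-in for maxQueueSize
  static final int HIGH_PRIORITY_LEVEL = 10;  // stand-in for highPriorityLevel

  final Queue<String> callQueue = new ArrayDeque<>();
  final Queue<String> priorityCallQueue = new ArrayDeque<>();
  int callQueueSize = 0;

  /** Returns false when a normal call is rejected because the queue is full. */
  boolean dispatch(String call, int callSize, int qosLevel) {
    if (qosLevel > HIGH_PRIORITY_LEVEL) {
      priorityCallQueue.add(call);  // priority path: no size check
      return true;
    }
    if (callSize + callQueueSize > MAX_QUEUE_SIZE) {
      return false;                 // the "callTooBig" rejection path
    }
    callQueue.add(call);
    callQueueSize += callSize;
    return true;
  }
}
```

With the original order, the third call below would have been rejected before its QoS level was ever inspected.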
[jira] [Commented] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426317#comment-13426317 ]

Zhihong Ted Yu commented on HBASE-6480:
---------------------------------------

Did you encounter this scenario on a cluster, or did you find it through code analysis?

How do we bound the size of priorityCallQueue after the proposed change?

Thanks
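One conceivable answer to the bounding question, purely an assumption on my part and not something taken from the attached patches, is to give the priority queue its own independent capacity: priority calls then bypass the shared limit but cannot grow without bound.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a separately capped priority queue. The capacity
// value and the offer()-based rejection are illustrative choices, not
// what HBASE-6480's patches actually do.
class BoundedPrioritySketch {
  final LinkedBlockingQueue<String> priorityCallQueue;

  BoundedPrioritySketch(int priorityCapacity) {
    priorityCallQueue = new LinkedBlockingQueue<>(priorityCapacity);
  }

  /** offer() returns false instead of blocking once the cap is reached. */
  boolean admitPriorityCall(String call) {
    return priorityCallQueue.offer(call);
  }
}
```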
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426319#comment-13426319 ]

Hadoop QA commented on HBASE-6476:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538646/6476-v2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 240 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//console

This message is automatically generated.
Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
-------------------------------------------------------------------------------------
                Key: HBASE-6476
                URL: https://issues.apache.org/jira/browse/HBASE-6476
            Project: HBase
         Issue Type: Bug
           Reporter: Lars Hofhansl
           Assignee: Lars Hofhansl
           Priority: Minor
            Fix For: 0.96.0
        Attachments: 6476-v2.txt, 6476-v2.txt, 6476.txt

There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
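The pattern behind EnvironmentEdgeManager can be sketched in a few lines: route every time lookup through a swappable provider so tests can pin the clock. The class and method names below imitate HBase's API but this is a standalone re-creation, not the actual HBase implementation.

```java
// Minimal sketch of the injectable-clock pattern behind EnvironmentEdgeManager.
class ClockSketch {
  /** A source of "now"; production code uses the wall clock. */
  interface EnvironmentEdge {
    long currentTimeMillis();
  }

  // default edge delegates to the real clock
  static EnvironmentEdge edge = System::currentTimeMillis;

  /** All call sites use this instead of System.currentTimeMillis(). */
  static long currentTimeMillis() {
    return edge.currentTimeMillis();
  }

  /** Tests inject a deterministic edge, e.g. a fixed or manually advanced clock. */
  static void injectEdge(EnvironmentEdge e) {
    edge = e;
  }
}
```

A test can then freeze time with `ClockSketch.injectEdge(() -> someFixedMillis)` and exercise timeout logic deterministically.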
[jira] [Commented] (HBASE-6473) deletedtable is not deleted completely, some region may be still online
[ https://issues.apache.org/jira/browse/HBASE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426325#comment-13426325 ]

Hadoop QA commented on HBASE-6473:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538326/HBASE-6473-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//console

This message is automatically generated.

deletedtable is not deleted completely, some region may be still online
-----------------------------------------------------------------------
                Key: HBASE-6473
                URL: https://issues.apache.org/jira/browse/HBASE-6473
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.94.0
           Reporter: zhou wenjian
            Fix For: 0.96.0, 0.94.2
        Attachments: HBASE-6473-trunk.patch

Consider such a scenario: we have a table called T1, which has one region: A.

1. move A from rs1 to rs2; A is now closed
2. disable T1
3. delete T1

When we disable T1, the disable handler will just set the zk node to disabled, and A will still be assigned. When A is opened, A's entry in regions-in-transition will be cleaned out. At that time, DeleteTable finds it is safe to delete all regions and the table in meta and fs; it will also delete the zk node of T1.
{code}
while (System.currentTimeMillis() < done) {
  AssignmentManager.RegionState rs = am.isRegionInTransition(region);
  if (rs == null) break;
  Threads.sleep(waitingTimeForEvents);
  LOG.debug("Waiting on region to clear regions in transition; " + rs);
}
if (am.isRegionInTransition(region) != null) {
  throw new IOException("Waited hbase.master.wait.on.region (" + waitTime
      + "ms) for region to leave region " + region.getRegionNameAsString() + " in transitions");
}
{code}

However, A is still being unassigned; when the close of A finishes, it finds that the disabled state in zk has been deleted, and then A will be assigned again.
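The kind of guard this report implies can be sketched as follows. This is hypothetical, not the HBASE-6473 patch: before re-assigning a region whose close just finished, re-check that the table still exists and is still disabled, instead of trusting state that was read before DeleteTable ran. The Set-backed "zk" below merely simulates the real ZooKeeper lookups.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: re-validate table state before re-assigning a region.
class ReassignGuardSketch {
  // stand-ins for ZooKeeper state: which tables exist, which are disabled
  final Set<String> existingTables = new HashSet<>();
  final Set<String> disabledTables = new HashSet<>();

  /** Only reassign if the table still exists and is not disabled. */
  boolean shouldReassignAfterClose(String table) {
    return existingTables.contains(table) && !disabledTables.contains(table);
  }
}
```

Under this check, region A of the deleted table T1 would not be brought back online, because the existence check fails after DeleteTable removes T1.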
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426327#comment-13426327 ]

Zhihong Ted Yu commented on HBASE-6466:
---------------------------------------

@Ram, @J-D: Please share your comments.

Enable multi-thread for memstore flush
--------------------------------------
                Key: HBASE-6466
                URL: https://issues.apache.org/jira/browse/HBASE-6466
            Project: HBase
         Issue Type: Improvement
           Reporter: chunhui shen
           Assignee: chunhui shen
        Attachments: HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.patch

If the KV is large, or the HLog is closed under high write pressure, we found the memstore is often above the high water mark and blocks puts. So should we enable multi-threaded memstore flush?

Some performance test data for reference:

1. Test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 writer threads per client.

2. Test results:
* one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per regionserver; many aboveGlobalMemstoreLimit blockings appear
* two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per regionserver, 200 thread handler per client
* two cacheFlush handlers, tps: 16.1k/s per regionserver, Flush: 18.6MB/s per regionserver
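The multi-handler flush being benchmarked can be sketched with a fixed-size thread pool. This is an illustrative stand-in under the assumption that individual region flushes are independent tasks; it is not HBase's actual MemStoreFlusher implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the benchmarked idea: N cache-flush handlers draining flush
// requests concurrently instead of a single handler.
class MultiFlushSketch {
  static void flushAll(Runnable[] regionFlushes, int handlers) {
    ExecutorService pool = Executors.newFixedThreadPool(handlers);
    for (Runnable flush : regionFlushes) {
      pool.submit(flush);  // each handler thread picks up pending flushes
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for all flushes to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The reported numbers (one handler: 10.1MB/s flushed; two handlers: up to 18.6MB/s) suggest the single flush handler was the bottleneck once the memstore neared the high water mark.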
[jira] [Commented] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426331#comment-13426331 ]

Hadoop QA commented on HBASE-6478:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538503/HBASE-6478-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//console

This message is automatically generated.
TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
---------------------------------------------------------------------------------------------------------
                Key: HBASE-6478
                URL: https://issues.apache.org/jira/browse/HBASE-6478
            Project: HBase
         Issue Type: Bug
         Components: test
   Affects Versions: 0.94.0
           Reporter: zhou wenjian
            Fix For: 0.96.0, 0.94.2
        Attachments: HBASE-6478-trunk.patch

When hudson runs for HBASE-6459, it encounters a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/

I checked the log and found that waitTableAvailable only checks the meta table; when the rs opens the region and updates the meta location in meta, the region may not yet have been added to the onlineregions set in the rs.

{code}
for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) {
{code}

This loop will be skipped, and found1 will remain false. That's why the testcase failed. So maybe we can have some stricter check when the table is created.
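The stricter check the reporter suggests can be sketched as a poll: do not declare the table available when META has a location, but wait until the regionserver actually reports the region in its online set. This is an assumption-laden sketch, not the HBASE-6478 patch; `isRegionOnline` stands in for a scan of `getOnlineRegionsLocalContext()`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a stricter waitTableAvailable: poll the
// regionserver's online-region set instead of trusting META alone.
class WaitTableSketch {
  static boolean waitTableReallyAvailable(BooleanSupplier isRegionOnline, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!isRegionOnline.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        return false;  // regions never showed up online within the timeout
      }
      try {
        Thread.sleep(10);  // back off briefly before polling again
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

With such a wait, the loop over `getOnlineRegionsLocalContext()` in the test would only run after the region is guaranteed to be in the online set, so `found1` could not be spuriously false.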