[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425548#comment-13425548 ]

Shrijeet Paliwal commented on HBASE-6468:
-----------------------------------------

{quote}
FirstKeyValueMatchingQualifiersFilter - When a CF contains qualifiers a, b, c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a'? Will they be included in the Result? Is this expected?
{quote}

It depends on whether the first KV matching any of the columns associated with the filter has already been seen.

{quote}
Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a thought from my side. By default FirstKeyOnlyFilter allows only the 1st KV (from all the qualifiers) of a CF into the Result and filters out the other KVs. Specifying a set of qualifiers to FirstKeyOnlyFilter would restrict the selection of the 1st KV to any of these qualifiers only; it would filter out all KVs from other qualifiers.
{quote}

FirstKeyValueMatchingQualifiersFilter has a peculiar behavior, hence creating a new filter (extending FirstKeyOnlyFilter) made sense.


RowCounter may return incorrect result if column name is specified in command line
----------------------------------------------------------------------------------

                Key: HBASE-6468
                URL: https://issues.apache.org/jira/browse/HBASE-6468
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.90.5
           Reporter: Shrijeet Paliwal
        Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch

RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is given as an argument, the scan returns the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing because FirstKeyOnlyFilter removes everything else.
https://issues.apache.org/jira/browse/HBASE-6042 is related.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
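The semantics discussed above can be sketched with a small, self-contained simulation (the class and method names here are illustrative, not HBase API): FirstKeyValueMatchingQualifiersFilter includes every KeyValue of a row until it sees the first KV whose qualifier is in the configured set, then skips to the next row. For a row with qualifiers a, b, c and the set {b, c}, the Result therefore carries both 'a' and 'b'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of the filter semantics discussed in the comment above.
// Not the HBase classes: a row is modeled as an ordered list of qualifiers.
public class FilterSemanticsDemo {

    /** Returns the qualifiers whose KVs would reach the Result for one row. */
    static List<String> scanRow(List<String> rowQualifiers, Set<String> filterQualifiers) {
        List<String> included = new ArrayList<>();
        for (String q : rowQualifiers) {
            included.add(q);                   // ReturnCode.INCLUDE for this KV
            if (filterQualifiers.contains(q)) {
                break;                         // first match seen -> NEXT_ROW
            }
        }
        return included;
    }

    public static void main(String[] args) {
        // Row with qualifiers a, b, c; filter configured with {b, c}.
        List<String> result = scanRow(Arrays.asList("a", "b", "c"),
                                      new HashSet<>(Arrays.asList("b", "c")));
        // KVs for 'a' pass through, plus the first matching KV ('b').
        System.out.println(result);   // [a, b]
    }
}
```

This is the "peculiar behavior" in the comment: the row is still counted (at least one KV survives), which is all RowCounter needs, but KVs preceding the first match are not filtered out.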
[jira] [Created] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
chunhui shen created HBASE-6479:
--------------------------------

            Summary: HFileReaderV1 caching the same parent META block could cause server abort when splitting
                Key: HBASE-6479
                URL: https://issues.apache.org/jira/browse/HBASE-6479
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.94.0
           Reporter: chunhui shen
           Assignee: chunhui shen
        Attachments: test.patch

If the hfile's version is 1, the two daughters of a split load their Bloom filters (loadBloomfilter) concurrently while opening. Because their META block is the same one (the parent's META block), the following exception is thrown from HFileReaderV1#getMetaBlock:
{code}
java.io.IOException: Failed null-daughterOpener=af73f8c9a9b409531ac211a9a7f92eba
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:367)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:453)
	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplit(TestSplitTransaction.java:225)
	at org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testWholesomeSplitWithHFileV1(TestSplitTransaction.java:203)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:540)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:463)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3784)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:506)
	at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:486)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.RuntimeException: Cached an already cached block
	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:424)
	at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:271)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2918)
	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:516)
	at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
{code}
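The failure mode can be sketched in a few self-contained lines (illustrative names, not the HBase BlockCache API): a cache that refuses re-insertion of an existing key, as the "Cached an already cached block" check does, collides when both daughter regions try to cache the parent's META block under the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the double-caching collision described above. Each daughter
// region open calls cacheBlock() for the shared parent META block key.
public class DoubleCacheDemo {
    static final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();

    static void cacheBlock(String blockKey, byte[] block) {
        // Mirrors the cache's refusal to re-insert an existing key.
        if (cache.putIfAbsent(blockKey, block) != null) {
            throw new RuntimeException("Cached an already cached block");
        }
    }

    public static void main(String[] args) {
        String parentMetaKey = "parentHFile.meta.0";  // same key for both daughters
        cacheBlock(parentMetaKey, new byte[0]);       // first daughter: cached fine
        try {
            cacheBlock(parentMetaKey, new byte[0]);   // second daughter: collides
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());       // Cached an already cached block
        }
    }
}
```

With HFile v2 each daughter reads its own blocks, so only the v1 reader's shared parent META block triggers this during a split.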
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Attachment: test.patch
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Description: (duplicates the issue description and stack trace above)
[jira] [Commented] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425560#comment-13425560 ]

chunhui shen commented on HBASE-6479:
-------------------------------------

An easy way to fix this case is to disable caching of the META block during loadBloomfilter(), or not to throw the "Cached an already cached block" exception.
[jira] [Updated] (HBASE-6479) HFileReaderV1 caching the same parent META block could cause server abort when splitting
[ https://issues.apache.org/jira/browse/HBASE-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6479:
--------------------------------

    Attachment: HBASE-6479.patch
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425592#comment-13425592 ]

nkeywal commented on HBASE-6476:
--------------------------------

bq. How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?

It could easily be done in the build environment, as there is a script there that we can change; we could add a simple grep. The proper way would be to run something like PMD; adding rules is not difficult, but it would require some configuration to distinguish the existing debt from new errors. Alternatively, we would activate only the rules that are already totally clean.

bq. Would be a problem too, if we globally mess with the EnvironmentEdge.

There are some tests that play with the EnvironmentEdgeManager; they had to be made medium tests because it was not possible to run them in a shared JVM like the small tests.


Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
-------------------------------------------------------------------------------------

                Key: HBASE-6476
                URL: https://issues.apache.org/jira/browse/HBASE-6476
            Project: HBase
         Issue Type: Bug
           Reporter: Lars Hofhansl
           Assignee: Lars Hofhansl
           Priority: Minor
            Fix For: 0.94.2

There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
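The "simple grep" idea above could look something like the following standalone sketch (class name, scanned path, and exclusion rule are all assumptions for illustration, not the project's actual build tooling): walk the source tree and count files that still call System.currentTimeMillis, skipping the EnvironmentEdge implementations themselves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch of a build-time check for reintroduced System.currentTimeMillis()
// calls. A real setup might instead be a grep in the build script or a PMD
// rule, as discussed in the comment above.
public class TimeUsageCheck {

    /** Counts .java files under srcRoot that call System.currentTimeMillis. */
    static long countOffenders(Path srcRoot) throws IOException {
        try (Stream<Path> files = Files.walk(srcRoot)) {
            return files.filter(p -> p.toString().endsWith(".java"))
                        .filter(p -> {
                            try {
                                String text = new String(Files.readAllBytes(p));
                                // The EnvironmentEdge implementations legitimately
                                // call System.currentTimeMillis(); skip them.
                                return text.contains("System.currentTimeMillis")
                                        && !p.getFileName().toString().contains("EnvironmentEdge");
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        if (!Files.isDirectory(root)) {
            System.out.println("no sources under " + root);
            return;
        }
        long offenders = countOffenders(root);
        if (offenders > 0) {
            System.err.println(offenders + " file(s) still call System.currentTimeMillis()");
            System.exit(1);
        }
    }
}
```

Failing the build on a nonzero count is what prevents the debt from growing; distinguishing existing offenders from new ones would need a baseline list, which is the configuration cost mentioned in the comment.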
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425604#comment-13425604 ]

Anoop Sam John commented on HBASE-6468:
---------------------------------------

So when a row r1 contains KVs with qualifiers a, b, c (for a given CF) and the qualifiers in FirstKeyValueMatchingQualifiersFilter are b, c, we will include all KVs for qualifier a and one KV (the 1st KV) for qualifier b/c in the Result for row r1. Is this expected? I was thinking that this new Filter would also select only one KV per row, just with the selection made from a subset of the qualifiers rather than all of them. [KVs for qualifier a will come before b and c.]
[jira] [Created] (HBASE-6480) If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
binlijin created HBASE-6480:
----------------------------

            Summary: If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
                Key: HBASE-6480
                URL: https://issues.apache.org/jira/browse/HBASE-6480
            Project: HBase
         Issue Type: Bug
           Reporter: binlijin

Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through?

Current:
{code}
if ((callSize + callQueueSize.get()) > maxQueueSize) {
  Call callTooBig = xxx
  return;
}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
{code}

Should we change it to:
{code}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  if ((callSize + callQueueSize.get()) > maxQueueSize) {
    Call callTooBig = xxx
    return;
  }
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
{code}

--
This message is automatically generated by JIRA.
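The effect of the proposed reordering can be shown with a self-contained sketch (names and the boolean admission API are illustrative, not the HBaseServer internals): priority calls bypass the size check entirely, so only normal calls can be rejected as "call too big".

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the admission logic proposed in the issue above: the size
// check applies only to the normal call queue, never to priority calls.
public class CallAdmissionDemo {
    static final long maxQueueSize = 100;
    static final AtomicLong callQueueSize = new AtomicLong();
    static final LinkedBlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
    static final LinkedBlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

    /** Returns false when a normal call is rejected as "call too big". */
    static boolean admit(String call, long callSize, boolean highPriority)
            throws InterruptedException {
        if (highPriority) {
            priorityCallQueue.put(call);      // never size-rejected
            return true;
        }
        if (callSize + callQueueSize.get() > maxQueueSize) {
            return false;                     // reject: normal queue is full
        }
        callQueueSize.addAndGet(callSize);
        callQueue.put(call);                  // queue the call; maybe blocked here
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        admit("fill", 100, false);                        // fills the normal queue
        System.out.println(admit("put", 10, false));      // false: rejected
        System.out.println(admit("metaScan", 10, true));  // true: priority passes
    }
}
```

The motivating case is a high-priority call (e.g. a META operation) arriving while regular traffic has filled the queue: under the original ordering it would be rejected along with everything else.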
[jira] [Updated] (HBASE-6480) If callQueueSize exceeds maxQueueSize, all calls will be rejected; do not reject priority calls
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

binlijin updated HBASE-6480:
----------------------------

    Attachment: HBASE-6480-94.patch
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6480: Attachment: HBASE-6480-trunk.patch If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current:
if ((callSize + callQueueSize.get()) > maxQueueSize) {
  Call callTooBig = xxx
  return;
}
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
Should we change it to:
if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) {
  priorityCallQueue.put(call);
  updateCallQueueLenMetrics(priorityCallQueue);
} else {
  if ((callSize + callQueueSize.get()) > maxQueueSize) {
    Call callTooBig = xxx
    return;
  }
  callQueue.put(call); // queue the call; maybe blocked here
  updateCallQueueLenMetrics(callQueue);
}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
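The reordering discussed in HBASE-6480 can be sketched in plain Java. This is an illustrative, self-contained model with made-up names and unbounded queues, not the actual HBaseServer code: the point is only that the size check moves onto the normal-priority path, so priority calls are never rejected as callTooBig.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed ordering: reject-on-size applies only
// to normal-priority calls; priority calls bypass the check entirely.
public class PriorityQueueSketch {
  static final long maxQueueSize = 100;
  static final AtomicLong callQueueSize = new AtomicLong(0);
  static final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
  static final BlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

  /** Returns true if the call was queued, false if rejected as too big. */
  static boolean enqueue(String call, long callSize, boolean isHighPriority) {
    if (isHighPriority) {
      priorityCallQueue.offer(call);     // priority calls are never size-rejected
      return true;
    }
    if (callSize + callQueueSize.get() > maxQueueSize) {
      return false;                      // would send a callTooBig response
    }
    callQueueSize.addAndGet(callSize);
    callQueue.offer(call);               // the real code uses put(), which may block
    return true;
  }

  public static void main(String[] args) {
    System.out.println(enqueue("normal", 150, false));   // rejected: over maxQueueSize
    System.out.println(enqueue("priority", 150, true));  // accepted: bypasses the check
  }
}
```

With the original ordering, both calls above would be rejected once the queue is over the limit.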
[jira] [Updated] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shengsheng Huang updated HBASE-6372: Attachment: HBASE-6372.2.patch Modified the patch according to @Jonathan's suggestion. Also removed the formatting issue @stack indicated. Add scanner batching to Export job -- Key: HBASE-6372 URL: https://issues.apache.org/jira/browse/HBASE-6372 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.96.0, 0.94.2 Reporter: Lars George Assignee: Shengsheng Huang Priority: Minor Labels: newbie Attachments: HBASE-6372.2.patch, HBASE-6372.patch When a single row is too large for the RS heap, an OOME can take out the entire RS. Setting scanner batching in custom scans helps avoid this scenario, but for the supplied Export job it is not set. Similar to HBASE-3421, we can set the batching to a low number - or, if needed, make it a command line option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6372) Add scanner batching to Export job
[ https://issues.apache.org/jira/browse/HBASE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425643#comment-13425643 ] Hadoop QA commented on HBASE-6372: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538518/HBASE-6372.2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestAtomicOperation Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2459//console This message is automatically generated. Add scanner batching to Export job -- Key: HBASE-6372 URL: https://issues.apache.org/jira/browse/HBASE-6372 Project: HBase Issue Type: Improvement Components: mapreduce Affects Versions: 0.96.0, 0.94.2 Reporter: Lars George Assignee: Shengsheng Huang Priority: Minor Labels: newbie Attachments: HBASE-6372.2.patch, HBASE-6372.patch When a single row is too large for the RS heap then an OOME can take out the entire RS. Setting scanner batching in custom scans helps avoiding this scenario, but for the supplied Export job this is not set. Similar to HBASE-3421 we can set the batching to a low number - or if needed make it a command line option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6460) hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans
[ https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425746#comment-13425746 ] Jie Huang commented on HBASE-6460: -- OK. I see your point. We can fix this issue to make the help info and the code implementation consistent here. Regarding that feature, I wonder if we can run hbck without -fixHdfsOrphans to ignore those orphan regions. Any comment? hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans - Key: HBASE-6460 URL: https://issues.apache.org/jira/browse/HBASE-6460 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Minor Attachments: hbase-6460.patch According to the hbck's help info, shortcut - -repairHoles will enable -fixHdfsOrphans as below. {noformat} -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans {noformat} However, in the implementation, the function fsck.setFixHdfsOrphans(false); is called in -repairHoles. This is not consistent with the usage information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425780#comment-13425780 ] Zhihong Ted Yu commented on HBASE-6468: --- @Anoop: The behavior you described is expected for the new Filter, considering it is used for row counting. RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing, as FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
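The behavior Anoop asked about (what happens to qualifier 'a' when the filter is configured with {b, c}) can be illustrated with a small stand-alone model of the per-row decision rule described in this thread. This is not the real FirstKeyValueMatchingQualifiersFilter code, just a sketch of the logic: KVs are included until the first KV matching one of the configured qualifiers has been seen, after which the rest of the row is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the filter's row-counting behavior; class and
// method names are invented for illustration.
public class MatchingQualifiersSketch {
  static List<String> applyFilter(List<String> rowQualifiers, Set<String> wanted) {
    List<String> included = new ArrayList<>();
    boolean foundMatching = false;
    for (String q : rowQualifiers) {
      if (foundMatching) {
        break;                    // analogous to returning NEXT_ROW
      }
      if (wanted.contains(q)) {
        foundMatching = true;     // first KV matching the set has been seen
      }
      included.add(q);            // analogous to returning INCLUDE
    }
    return included;
  }

  public static void main(String[] args) {
    // CF has qualifiers a, b, c; filter is configured with {b, c}.
    List<String> out = applyFilter(Arrays.asList("a", "b", "c"),
        new HashSet<>(Arrays.asList("b", "c")));
    System.out.println(out);   // "a" is included because no match had been seen yet
  }
}
```

This is the "peculiar behavior" the thread mentions: whether 'a' appears in the Result depends on whether a matching KV was seen before it, which is acceptable for row counting.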
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6411: Attachment: HBASE-6411-4.patch Uploaded newer diff on review board. Fixed nits. Note: after trunk update had to remove {noformat} public RegionsInTransitionInfo[] getRegionsInTransition() {noformat} method from master.metrics.MBean. I believe that this metric is now exposed via metrics, not via this bean. Please correct me if I'm wrong. There are two Qs for Elliott inside, pasting here for convenience: 1. {noformat} /hbase-server/src/main/java/org/apache/hadoop/hbase/master/metrics/MXBean.java (Diff revision 1) 18 package org.apache.hadoop.hbase.master.metrics; {noformat} Ted: Should this class be in org.apache.hadoop.hbase.master namespace ? Alex Baranau: Hm, I guess we now have two pairs of classes: MXBean and MXBeanImpl in org.apache.hadoop.hbase.master and in org.apache.hadoop.hbase.master.metrics. Not sure what was intended by Elliott here. I assume that he forgot to remove one of them (in org.apache.hadoop.hbase.master? why to move it in metrics package then?) Elliott, could you provide some insight here please? 2. {noformat} /hbase-server/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java (Diff revision 1) 63 Deleted: final MetricsHistogram splitTime = new MetricsHistogram(splitTime, registry); {noformat} We don't maintain such metrics now ? Alex Baranau I believe Elliott is working on new such metrics (different issue) and this is why he removed it. Elliott? Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425867#comment-13425867 ] Lars Hofhansl commented on HBASE-6476: -- bq. There are some tests that play with the EnvironmentEdgeManager, they had to be made medium as it was not possible to have them on a shared jvm as the small tests. So simply replacing all of System.currentTimeMillis() with EnvironmentEdgeManager.currentTimeMillis() should not be a problem, but if a test would actually mess with it, it would need to run on its own JVM. Do you see any other problems with just doing wholesale scripted replace? Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425877#comment-13425877 ] nkeywal commented on HBASE-6476: I think it should be ok! And it will be cleaner as well. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425878#comment-13425878 ] nkeywal commented on HBASE-6435: Tested on a real cluster by adding validation code on a region server; went ok. I don't have a real idea on how to activate it just for some hadoop versions, so I will do a last clean-up on the logs and propose a final version. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead only after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s + a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead datanode detection by the NN. Requires a NN code change. - better dead datanode management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one.
- reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS, but that would target only the latest version, and could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
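The retained solution, reordering replica locations so that the DN co-located with the dead RS is tried last, can be sketched as follows. The host names and the helper method are made up for illustration; the actual patch works on HDFS block-location structures via a client-side proxy, not on plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not the actual patch): stable-partition the replica
// locations returned by the NN so suspect replicas go to the back.
public class ReorderBlocksSketch {
  static List<String> deprioritize(List<String> locations, String deadHost) {
    List<String> live = new ArrayList<>();
    List<String> suspect = new ArrayList<>();
    for (String loc : locations) {
      (loc.equals(deadHost) ? suspect : live).add(loc);
    }
    live.addAll(suspect);   // the dead DN's replica becomes the last resort
    return live;
  }

  public static void main(String[] args) {
    List<String> reordered = deprioritize(
        Arrays.asList("dn1", "dn2-dead", "dn3"), "dn2-dead");
    System.out.println(reordered);   // dn2-dead is now tried last
  }
}
```

Because the live replicas come first, the recovering reader only pays the socket-timeout cost on the dead DN if every other replica has already failed.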
[jira] [Commented] (HBASE-6460) hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans
[ https://issues.apache.org/jira/browse/HBASE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425915#comment-13425915 ] Jonathan Hsieh commented on HBASE-6460: --- Jie, if the -fixHdfsOrphans option is not set, it will not attempt to fix the problem, but still will report it. The -repairHoles flag is purely a convenience option. It probably better to exclude also because we can easily set -repairHoles -fixHdfsOrphans but currently cannot take away a set option. When you say ignore, do you mean treat it as a warning as opposed to a error? hbck -repairHoles shortcut doesn't enable -fixHdfsOrphans - Key: HBASE-6460 URL: https://issues.apache.org/jira/browse/HBASE-6460 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.0, 0.96.0 Reporter: Jie Huang Priority: Minor Attachments: hbase-6460.patch According to the hbck's help info, shortcut - -repairHoles will enable -fixHdfsOrphans as below. {noformat} -repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans {noformat} However, in the implementation, the function fsck.setFixHdfsOrphans(false); is called in -repairHoles. This is not consistent with the usage information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5987) HFileBlockIndex improvement
[ https://issues.apache.org/jira/browse/HBASE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-5987: -- Fix Version/s: 0.96.0 HFileBlockIndex improvement --- Key: HBASE-5987 URL: https://issues.apache.org/jira/browse/HBASE-5987 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: D3237.1.patch, D3237.2.patch, D3237.3.patch, D3237.4.patch, D3237.5.patch, D3237.6.patch, D3237.7.patch, D3237.8.patch, screen_shot_of_sequential_scan_profiling.png Recently we found a performance problem: it is quite slow when multiple requests are reading the same block of data or index. From the profiling, one of the causes is the IdLock contention, which has been addressed in HBASE-5898. Another issue is that the HFileScanner will keep asking the HFileBlockIndex about the data block location for each target key value during the scan process (reSeekTo), even though the target key value is already in the current data block. This issue makes certain index blocks very HOT, especially when it is a sequential scan. To solve this issue, we propose the following solutions: First, we propose to look ahead one more block index entry so that the HFileScanner would know the start key value of the next data block. So if the target key value for the scan (reSeekTo) is smaller than the start kv of the next data block, the target key value is very likely in the current data block (if it is not in the current data block, then the start kv of the next data block should be returned. +Indexing on the start key has some defects here+) and it shall NOT query the HFileBlockIndex in this case. On the contrary, if the target key value is bigger, then it shall query the HFileBlockIndex. This improvement shall help to reduce the hotness of the HFileBlockIndex and avoid some unnecessary IdLock contention or index block cache lookups.
Second, we propose to push this idea a little further: the HFileBlockIndex shall index on the last key value of each data block instead of indexing on the start key value. The motivation is to solve the HBASE-4443 issue (avoid seeking to the previous block when the key you are interested in is the first one of a block) as well as +the defects mentioned above+. For example, if the target key value is smaller than the start key value of data block N, there is no way to be sure whether the target key value is in data block N or N-1, so the seek has to start from data block N-1. However, if the block index is based on the last key value of each data block and the target key value is between the last key value of data block N-1 and that of data block N, then the target key value is in data block N for sure. As long as HBase only supports forward scans, the last key value makes more sense to index on than the start key value. Thanks Kannan and Mikhail for the insightful discussions and suggestions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
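The second proposal can be illustrated with a toy index. The keys and the findBlock helper below are invented for the example (real HFile keys are byte arrays); the point is that when the index stores each block's *last* key, a lower-bound search pinpoints the containing block directly, with no fallback to block N-1:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not HFileBlockIndex code: index entries are the last
// key of each data block, in block order.
public class LastKeyIndexSketch {
  /** Returns the index of the first block whose last key is >= target. */
  static int findBlock(List<String> lastKeys, String target) {
    int lo = 0, hi = lastKeys.size() - 1;
    while (lo < hi) {                        // lower-bound binary search
      int mid = (lo + hi) >>> 1;
      if (lastKeys.get(mid).compareTo(target) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    List<String> lastKeys = Arrays.asList("d", "k", "r");  // three data blocks
    System.out.println(findBlock(lastKeys, "e"));  // "e" falls in block 1: ("d", "k"]
    System.out.println(findBlock(lastKeys, "d"));  // block 0, no seek to a previous block
  }
}
```

With start-key indexing, a target equal to a block's first key would force a probe of the previous block (the HBASE-4443 case); with last-key indexing the answer is unambiguous for forward scans.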
[jira] [Commented] (HBASE-6454) Write PB definitions for filters
[ https://issues.apache.org/jira/browse/HBASE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425938#comment-13425938 ] Gregory Chanan commented on HBASE-6454: --- It was originally nested in Condition in Client.proto. I needed CompareType for filters and all the shared types are in hbase.proto, so I moved it there. I didn't need the entire condition type, so I just took what I needed. I'm happy to do whatever you think is best here. Write PB definitions for filters Key: HBASE-6454 URL: https://issues.apache.org/jira/browse/HBASE-6454 Project: HBase Issue Type: Task Components: ipc, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6454.patch See HBASE-5447. Conversion to protobuf requires writing protobuf definitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425943#comment-13425943 ] Elliott Clark commented on HBASE-6411: -- {quote} Hm, I guess we now have two pairs of classes: MXBean and MXBeanImpl in org.apache.hadoop.hbase.master and in org.apache.hadoop.hbase.master.metrics. Not sure what was intended by Elliott here. I assume that he forgot to remove one of them (in org.apache.hadoop.hbase.master? why to move it in metrics package then?) Elliott, could you provide some insight here please? {quote} Yea, I must have just missed deleting them. The move was just because those classes are only about metrics and not used anywhere else. So might as well clean up as we go. They were interface private so moving shouldn't be an issue. {quote} I believe Elliott is working on new such metrics (different issue) and this is why he removed it. Elliott? {quote} Correct. One of the sub-tasks of 4050 is creating a metrics2 histogram(Actually there will be two but that's out of scope) and using histograms where ever it's useful. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425949#comment-13425949 ] Lars Hofhansl commented on HBASE-6476: -- What could happen, though, is that a test that formerly used System.currentTimeMillis and runs in a shared JVM with a test that messes with the EnvironmentEdge would now potentially have problems if we switched it to EnvironmentEdge. Although, I do not think there are many of these, and a test run will show. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425953#comment-13425953 ] Andrew Purtell commented on HBASE-6476: --- +1 We've been replacing as needed but why not a one time global replacement. Adding a conformance check is nice. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should be generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
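For reference, the EnvironmentEdge pattern this thread relies on boils down to an injectable clock: production code asks a manager for the time instead of calling System.currentTimeMillis() directly, so a test can swap in a fixed edge. The sketch below is a simplified stand-in, not HBase's actual EnvironmentEdgeManager:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the injectable-clock pattern; names are illustrative.
public class ClockSketch {
  interface Clock { long currentTimeMillis(); }

  // Defaults to the real system clock.
  static final AtomicReference<Clock> delegate =
      new AtomicReference<>(System::currentTimeMillis);

  static long currentTimeMillis() {
    return delegate.get().currentTimeMillis();
  }

  /** Test helper: reads the time under a fixed clock, then restores the default. */
  static long withFixedClock(long t) {
    Clock prev = delegate.getAndSet(() -> t);
    try {
      return currentTimeMillis();
    } finally {
      delegate.set(prev);
    }
  }

  public static void main(String[] args) {
    System.out.println(currentTimeMillis() > 0);  // wall-clock time by default
    System.out.println(withFixedClock(42L));      // a test sees the injected time
  }
}
```

This also shows why tests that inject an edge cannot share a JVM with tests that assume wall-clock time: the delegate is process-global state.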
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425954#comment-13425954 ] Zhihong Ted Yu commented on HBASE-6411: --- Since MXBean.java is in master.metrics, should TestMXBean.java be in the same package ? Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6454) Write PB definitions for filters
[ https://issues.apache.org/jira/browse/HBASE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425955#comment-13425955 ] Zhihong Ted Yu commented on HBASE-6454: --- That's fine Gregory. Looking forward to HBASE-6477. I will integrate by tomorrow morning if there is no objection. Write PB definitions for filters Key: HBASE-6454 URL: https://issues.apache.org/jira/browse/HBASE-6454 Project: HBase Issue Type: Task Components: ipc, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6454.patch See HBASE-5447. Conversion to protobuf requires writing protobuf definitions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425956#comment-13425956 ] Andrew Purtell commented on HBASE-6478: --- Or does the contract implied by waitTableAvailable suggest improving its test rather than adding a new waitTableEnabled? TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6478-trunk.patch When hudson runs for HBASE-6459, it encounters a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table; when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions in the RS. for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { This loop will then be skipped, and found1 will remain false. That's why the testcase failed. So maybe we can have some stricter check when a table is created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Attachment: 6435.v7.patch Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on HDFS. Through ZooKeeper, HBase gets informed, usually within 30s, that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. It means that when a box stops, we've actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. Since HDFS only marks a node as dead after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s plus a risk of losing edits if we connect over ipc to the dead DN. Possible solutions are: - shorter dead-datanode detection by the NN. Requires a NN code change. - better dead-datanode management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. 
Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. Such a solution would only target the latest version, though, and could then rely on minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution.
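The retained idea — reorder the replica locations the NN returned so that any replica on the dead regionserver's host is tried last — can be sketched as below. Plain String hostnames stand in for HDFS's DatanodeInfo objects; this illustrates the principle only and is not the patch's actual hook.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hedged sketch of the client-side block reordering described above.
// String hostnames are hypothetical stand-ins for DatanodeInfo.
public class ReorderWalBlocksSketch {
    static List<String> reorder(List<String> replicaHosts, String deadRsHost) {
        List<String> ordered = new ArrayList<>(replicaHosts);
        // stable sort: healthy hosts keep the NN-assigned order, the dead host sinks last
        ordered.sort(Comparator.comparingInt(h -> h.equals(deadRsHost) ? 1 : 0));
        return ordered;
    }
}
```

Because the sort is stable, the DFS client still tries the healthy datanodes in the priority order the NN assigned; only the likely-dead replica is demoted.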
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425959#comment-13425959 ] nkeywal commented on HBASE-6435: Ok for review...
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425980#comment-13425980 ] Zhihong Ted Yu commented on HBASE-6435: --- Just started to look at the patch. It doesn't compile against hadoop 2.0: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure: [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[214,12] namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed from outside package [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[221,52] namenode is not public in org.apache.hadoop.hdfs.DFSClient; cannot be accessed from outside package [ERROR] [ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[289,81] cannot find symbol [ERROR] symbol : method getHost() [ERROR] location: class org.apache.hadoop.hdfs.protocol.DatanodeInfo {code} Can we give the following a more meaningful name ? {code} +if (!conf.getBoolean("hbase.hdfs.jira6435", true)) { // activated by default {code} Comment from Todd would be appreciated.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425988#comment-13425988 ] nkeywal commented on HBASE-6435: I will have a look at the hadoop2 stuff. As for: bq. Can we give the following a more meaningful name ? Do you have an idea?
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476.txt Here's a gigantic patch for trunk. I manually had to fix the imports in many of the classes (smarter people would have used Eclipse to script that, but anyway). I'm not expecting anybody to review this. If the HadoopQA run succeeds that should be good enough. I also ran some validation scripts to make sure all files referring to EnvironmentEdgeManager have the matching imports. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.2 Attachments: 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
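The pattern the patch applies everywhere can be illustrated with a minimal stand-in: production code asks an injectable clock for the time instead of calling System.currentTimeMillis() directly, so tests can control time deterministically. The names mirror HBase's EnvironmentEdge / EnvironmentEdgeManager, but this is a simplified sketch, not the real classes.

```java
// Simplified stand-in for HBase's EnvironmentEdge pattern (not the real classes).
public class EdgeSketch {
    interface EnvironmentEdge { long currentTimeMillis(); }

    static class EnvironmentEdgeManager {
        // defaults to the real clock; tests inject a deterministic edge
        private static EnvironmentEdge edge = System::currentTimeMillis;
        static void injectEdge(EnvironmentEdge e) { edge = e; }
        static long currentTimeMillis() { return edge.currentTimeMillis(); }
    }
}
```

A test can then do injectEdge(() -> someFixedTime) and exercise timeout or TTL logic without sleeping, which is the testability argument the issue makes.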
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Fix Version/s: (was: 0.94.2) 0.96.0
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425995#comment-13425995 ] Zhihong Ted Yu commented on HBASE-6435: --- How about 'hbase.filesystem.reorder.blocks' ? BTW replacing 'Hack' with some form of 'Intercept' would be better IMHO.
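The renaming suggested above might look like the sketch below: a descriptive, documented constant instead of the opaque "hbase.hdfs.jira6435" key. java.util.Properties stands in for Hadoop's Configuration, and the key name follows the suggestion in this thread; none of this is what the patch currently uses.

```java
import java.util.Properties;

// Hypothetical sketch of Ted Yu's naming suggestion (Properties stands in
// for Hadoop's Configuration).
public class ReorderConfigSketch {
    static final String REORDER_BLOCKS_KEY = "hbase.filesystem.reorder.blocks";

    static boolean shouldReorderBlocks(Properties conf) {
        // reordering stays activated by default, as in the patch
        return Boolean.parseBoolean(conf.getProperty(REORDER_BLOCKS_KEY, "true"));
    }
}
```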
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau updated HBASE-6411: Attachment: HBASE-6411-4_2.patch Cleaned up redundant MXBean and MXBeanImpl. Looks like all comments and Qs are resolved. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426005#comment-13426005 ] Alex Baranau commented on HBASE-6411: - bq. Since MXBean.java is in master.metrics, should TestMXBean.java be in the same package ? I'd say it may be the same situation as with TestMasterMetrics. It's just easier to place it here, as the test relies heavily on access to master internal state.
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426014#comment-13426014 ] Hadoop QA commented on HBASE-6476: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538589/6476.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 240 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestCoprocessorInterface org.apache.hadoop.hbase.master.TestClockSkewDetection org.apache.hadoop.hbase.TestKeyValue org.apache.hadoop.hbase.regionserver.wal.TestWALActionsListener org.apache.hadoop.hbase.regionserver.TestQueryMatcher org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase org.apache.hadoop.hbase.filter.TestDependentColumnFilter org.apache.hadoop.hbase.regionserver.TestResettingCounters org.apache.hadoop.hbase.coprocessor.TestRegionObserverStacking org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster org.apache.hadoop.hbase.regionserver.wal.TestHLogMethods org.apache.hadoop.hbase.regionserver.TestBlocksScanned org.apache.hadoop.hbase.util.TestHFileArchiveUtil org.apache.hadoop.hbase.regionserver.TestMinVersions org.apache.hadoop.hbase.regionserver.TestCompactSelection org.apache.hadoop.hbase.regionserver.TestSplitTransaction org.apache.hadoop.hbase.ipc.TestPBOnWritableRpc org.apache.hadoop.hbase.TestSerialization 
org.apache.hadoop.hbase.regionserver.TestScanner org.apache.hadoop.hbase.util.TestHBaseFsckComparator org.apache.hadoop.hbase.util.TestByteBloomFilter org.apache.hadoop.hbase.master.cleaner.TestHFileCleaner org.apache.hadoop.hbase.regionserver.TestKeepDeletes org.apache.hadoop.hbase.util.TestThreads org.apache.hadoop.hbase.regionserver.TestRSStatusServlet org.apache.hadoop.hbase.master.TestCatalogJanitor org.apache.hadoop.hbase.regionserver.TestRegionSplitPolicy org.apache.hadoop.hbase.regionserver.TestScanWithBloomError org.apache.hadoop.hbase.client.TestIntraRowPagination org.apache.hadoop.hbase.regionserver.TestHRegionInfo org.apache.hadoop.hbase.regionserver.TestWideScanner org.apache.hadoop.hbase.migration.TestMigrationFrom090To092 org.apache.hadoop.hbase.monitoring.TestTaskMonitor org.apache.hadoop.hbase.regionserver.TestColumnSeeking org.apache.hadoop.hbase.TestCompare org.apache.hadoop.hbase.filter.TestFilter org.apache.hadoop.hbase.regionserver.TestStoreFile org.apache.hadoop.hbase.filter.TestColumnPrefixFilter org.apache.hadoop.hbase.monitoring.TestMemoryBoundedLogMessageBuffer org.apache.hadoop.hbase.filter.TestMultipleColumnPrefixFilter Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2462//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2462//console This message is automatically generated.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426015#comment-13426015 ] nkeywal commented on HBASE-6435: Ok. I wanted to make clear it was a temporary workaround. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch HBase writes a Write-Ahead-Log to revover from hardware failure. This log is written with 'append' on hdfs. Through ZooKeeper, HBase gets informed usually in 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standards deployments, HBase process (regionserver) are deployed on the same box as the datanodes. It means that when the box stops, we've actually lost one of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead after ~10 minutes, it appears as available when we try to read the blocks to recover. As such, we are delaying the recovery process by 60 seconds as the read will usually fail with a socket timeout. If the file is still opened for writing, it adds an extra 20s + a risk of losing edits if we connect with ipc to the dead DN. Possible solutions are: - shorter dead datanodes detection by the NN. Requires a NN code change. - better dead datanodes management in DFSClient. Requires a DFS code change. - NN customisation to write the WAL files on another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or a kind of workaround. The solution retained is the last one. 
Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require partially implementing the fix, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows all the code to live in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But that solution could only target the latest version, which would in turn allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
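The reordering idea retained above can be sketched independently of the HDFS internals. The following is a minimal, hypothetical illustration (not the actual patch, which hooks a proxy in front of the NN responses): given a list of replica locations for a block, move any replica hosted on the dead regionserver's box to the end, so that the client tries healthy datanodes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the block-reordering idea: replicas on the same
// host as the dead regionserver are moved to the end of the location list,
// so the DFS client tries healthy datanodes first and only falls back to
// the (probably dead) local datanode after the others.
public class ReorderBlocks {
    static List<String> reorder(List<String> locations, String deadHost) {
        List<String> healthy = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : locations) {
            if (host.equals(deadHost)) {
                suspect.add(host);   // same box as the dead RS: try last
            } else {
                healthy.add(host);
            }
        }
        healthy.addAll(suspect);     // dead host demoted, order otherwise kept
        return healthy;
    }

    public static void main(String[] args) {
        List<String> out = reorder(Arrays.asList("dn1", "deadbox", "dn3"), "deadbox");
        if (!out.equals(Arrays.asList("dn1", "dn3", "deadbox"))) throw new AssertionError(out);
        System.out.println(out);
    }
}
```

Since the NN already orders replicas by locality, demoting only the suspect host preserves the rest of the priority queue, which is why the patch reorders rather than filters.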
[jira] [Commented] (HBASE-6471) Performance regression caused by HBASE-4054
[ https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426017#comment-13426017 ] Lars Hofhansl commented on HBASE-6471: -- The way we have HTablePool.PooledHTable extending HTable is actually quite terrible. For example, append() was not added to it, so append does not go through the delegate... which happens to be fine, because the delegate is not needed in the first place when PooledHTable just extends HTable. At the very least we should remove the delegate and just override close() and toString(), or fix HBASE-5728. Performance regression caused by HBASE-4054 --- Key: HBASE-6471 URL: https://issues.apache.org/jira/browse/HBASE-6471 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Lars George Priority: Critical Fix For: 0.94.2 The patch in HBASE-4054 switches PooledHTable to extend HTable instead of implementing HTableInterface. Since HTable does not have an empty constructor, the patch added a call to the super() constructor, which triggers the ZooKeeper connection and META scan, causing a considerable delay. With multiple threads using the pool in parallel, the first thread holds up all the subsequent ones, in effect negating the whole reason for having an HTable pool. We should complete HBASE-5728, or alternatively add a protected, empty constructor to HTable. I am +1 on the former.
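The hazard Lars describes can be reproduced in miniature. The classes below are illustrative stand-ins, not the actual HBase types: a wrapper that extends the concrete class while also holding a delegate silently bypasses the delegate for any method it forgets to override.

```java
// Hypothetical minimal reproduction of the hazard described above: a pooled
// wrapper that extends the concrete class but forwards through a delegate
// will silently skip the delegate for any method it forgets to override
// (as happened with append() on PooledHTable). Names are illustrative only.
public class DelegateGap {
    static class Table {
        String put() { return "table-put"; }
        String append() { return "table-append"; }
    }

    static class PooledTable extends Table {
        private final Table delegate;
        PooledTable(Table delegate) { this.delegate = delegate; }
        @Override String put() { return "pooled:" + delegate.put(); }
        // append() is NOT overridden, so calls bypass the delegate entirely.
    }

    public static void main(String[] args) {
        Table t = new PooledTable(new Table());
        if (!t.put().equals("pooled:table-put")) throw new AssertionError();
        // The missing override means the delegate is never consulted:
        if (!t.append().equals("table-append")) throw new AssertionError();
        System.out.println("append() bypassed the delegate");
    }
}
```

This is why implementing an interface (HTableInterface) is safer than extending the concrete class: the compiler then forces every method to be forwarded explicitly.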
[jira] [Updated] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shrijeet Paliwal updated HBASE-6468: Attachment: 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch Patch off trunk. RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing, as the FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related.
[jira] [Commented] (HBASE-5728) Methods Missing in HTableInterface
[ https://issues.apache.org/jira/browse/HBASE-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426024#comment-13426024 ] Lars Hofhansl commented on HBASE-5728: -- These: {code}
public Map<HRegionInfo, HServerAddress> getRegionsInfo() throws IOException;
public HRegionLocation getRegionLocation(String row) throws IOException;
public HRegionLocation getRegionLocation(byte[] row) throws IOException;
public void prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap);
public void clearRegionCache();
public long getWriteBufferSize();
public void setWriteBufferSize(long writeBufferSize) throws IOException;
public ArrayList<Put> getWriteBuffer();
{code} would leak implementation details into the interface. I think HBASE-4054 specifically mentions that {code}public Map<HRegionInfo, HServerAddress> getRegionsInfo() throws IOException;{code} is needed. Hmm... Methods Missing in HTableInterface -- Key: HBASE-5728 URL: https://issues.apache.org/jira/browse/HBASE-5728 Project: HBase Issue Type: Improvement Components: client Reporter: Bing Li Dear all, I found that some methods existing in HTable are not in HTableInterface: setAutoFlush, setWriteBufferSize, ... In most cases I manipulate HBase through HTableInterface from HTablePool. If I need to use the above methods, how can I do that? I am considering writing my own table pool if there is no proper way. Is that fine? Thanks so much! Best regards, Bing
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426066#comment-13426066 ] Hadoop QA commented on HBASE-6411: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538593/HBASE-6411-4_2.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 45 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause the mvn compile goal to fail.
-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2463//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2463//console This message is automatically generated. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Assigned] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Baranau reassigned HBASE-6411: --- Assignee: Alex Baranau (was: Elliott Clark) Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Alex Baranau Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426070#comment-13426070 ] Alex Baranau commented on HBASE-6411: - Not sure if I should do something about these: bq. -1 javac. The patch appears to cause the mvn compile goal to fail. bq. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. Please let me know. Move Master Metrics to metrics 2 Key: HBASE-6411 URL: https://issues.apache.org/jira/browse/HBASE-6411 Project: HBase Issue Type: Sub-task Reporter: Elliott Clark Assignee: Alex Baranau Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, HBASE-6411-2.patch, HBASE-6411-3.patch, HBASE-6411-4.patch, HBASE-6411-4_2.patch, HBASE-6411_concept.patch Move Master Metrics to metrics 2
[jira] [Created] (HBASE-6481) SkipFilter javadoc is incorrect
Shrijeet Paliwal created HBASE-6481: --- Summary: SkipFilter javadoc is incorrect Key: HBASE-6481 URL: https://issues.apache.org/jira/browse/HBASE-6481 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Shrijeet Paliwal Priority: Minor The javadoc for SkipFilter (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SkipFilter.html) states: A wrapper filter that filters an entire row if any of the KeyValue checks do not pass. But the example the same javadoc gives to support this statement is wrong. The *scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(0)))));* will only emit rows in which all column values are zero. In other words, it is going to skip all rows for which ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(0))) does not pass, which happens to be all rows containing a non-zero valued cell. In the same example, a ValueFilter created with CompareOp.NOT_EQUAL will filter out the rows which have a column value of zero.
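The corrected semantics can be checked with a plain-Java sketch. The filter logic is re-implemented here for illustration (no HBase classes): SkipFilter emits a row only when every cell passes the wrapped check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Plain-Java sketch of SkipFilter semantics (re-implemented for
// illustration; not the HBase classes). SkipFilter(ValueFilter(EQUAL, 0))
// emits a row only when *every* cell passes the wrapped check, so only
// all-zero rows survive.
public class SkipFilterSketch {
    // Emulates SkipFilter: drop the whole row if any cell fails the predicate.
    static boolean rowPasses(List<Integer> row, IntPredicate cellCheck) {
        for (int v : row) {
            if (!cellCheck.test(v)) return false; // one failing cell skips the row
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> allZero = Arrays.asList(0, 0, 0);
        List<Integer> mixed = Arrays.asList(0, 5, 0);
        // ValueFilter(EQUAL, 0) analogue: only the all-zero row is emitted.
        if (!rowPasses(allZero, v -> v == 0)) throw new AssertionError();
        if (rowPasses(mixed, v -> v == 0)) throw new AssertionError();
        // ValueFilter(NOT_EQUAL, 0) analogue: any zero cell skips the row.
        if (rowPasses(mixed, v -> v != 0)) throw new AssertionError();
        System.out.println("skip-filter semantics verified");
    }
}
```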
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426090#comment-13426090 ] Elliott Clark commented on HBASE-6427: -- An InterfaceAudience annotation slipped in here. It breaks older hadoop versions (HBASE-6141). Pluggable compaction and scan policies via coprocessors --- Key: HBASE-6427 URL: https://issues.apache.org/jira/browse/HBASE-6427 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: 6427-0.94.txt, 6427-notReady.txt, 6427-v1.txt, 6427-v10.txt, 6427-v2.txt, 6427-v3.txt, 6427-v4.txt, 6427-v5.txt, 6427-v7.txt When implementing higher-level stores on top of HBase it is necessary to allow dynamic control over how long KVs must be kept around. Semi-static config options for ColumnFamilies (# of versions or TTL) are not sufficient. This can be done with a few additional coprocessor hooks, or by making Store.ScanInfo pluggable. Was: The simplest way to achieve this is to have a pluggable class to determine the smallestReadpoint for Region. That way outside code can control what KVs to retain.
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426093#comment-13426093 ] Lars Hofhansl commented on HBASE-6476: -- That doesn't look too good. Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so that we do not reintroduce System.currentTimeMillis() in the future?
[jira] [Commented] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426096#comment-13426096 ] Lars Hofhansl commented on HBASE-6476: -- Argghh... I am an idiot. My script replaced System.currentTimeMillis() with EnvironmentEdgeManager.currentTimeMillis() in DefaultEnvironmentEdge. Obviously that leads to an endless loop.
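The endless loop Lars describes is easy to reproduce in miniature. The classes below mirror, but are not, the actual HBase types: if the default edge delegated back to the manager, the two methods would recurse until StackOverflowError, so the default edge must terminate the chain at the JDK clock.

```java
// Miniature of the bug described above (names mirror, but are not, the
// HBase classes). If DefaultEnvironmentEdge.currentTimeMillis() were
// blindly rewritten to call back into the manager, the two methods would
// recurse forever. The default edge must call System.currentTimeMillis()
// directly.
public class EdgeLoop {
    interface EnvironmentEdge {
        long currentTimeMillis();
    }

    static class DefaultEnvironmentEdge implements EnvironmentEdge {
        @Override public long currentTimeMillis() {
            // Correct: terminate the delegation chain at the JDK clock.
            return System.currentTimeMillis();
            // Buggy version (the script's rewrite) would instead be:
            // return Manager.currentTimeMillis();  // -> endless recursion
        }
    }

    static class Manager {
        static EnvironmentEdge edge = new DefaultEnvironmentEdge();
        static long currentTimeMillis() { return edge.currentTimeMillis(); }
    }

    public static void main(String[] args) {
        if (Manager.currentTimeMillis() <= 0) throw new AssertionError();
        System.out.println("clock ok");
    }
}
```

This is also why a blanket search-and-replace needs an exclusion list: the one class implementing the default edge is the legitimate caller of System.currentTimeMillis().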
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Attachment: 6435.v8.patch
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476-v2.txt Let's try again.
[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6435: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426102#comment-13426102 ] nkeywal commented on HBASE-6435: v8 works ok with hadoop 1 and hadoop 2, and addresses Ted's other comments. I tried the v3 profile, but got errors in the pom.xml.
[jira] [Updated] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6427: - Attachment: 6427-0.94-addendum.txt Oops... Yes. Here's an addendum.
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426107#comment-13426107 ] Lars Hofhansl commented on HBASE-6427: -- Committed addendum, thanks for watching, Elliott.
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426113#comment-13426113 ] Zhihong Ted Yu commented on HBASE-6468: --- {code} + * Constructor which takes a list of columns. As soon as first KeyValue {code} Now that the parameter has changed to Set<byte []>, the above javadoc should be modified. {code} + * matching any of these columns if found, filter moves to next row. {code} 'if found' -> 'is found'. {code} + * @param qualifiers the list of columns to me matched. {code} Change to 'the set of columns to be matched'. Looks like HBASE-6454 may go in ahead of this JIRA. So Filter.proto should have the following: {code} +message FirstKeyValueMatchingQualifiersFilter { +} {code} RowCounter may return incorrect result if column name is specified in command line -- Key: HBASE-6468 URL: https://issues.apache.org/jira/browse/HBASE-6468 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: Shrijeet Paliwal Attachments: 0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0002-HBASE-6468-RowCounter-may-return-incorrect-result.patch, 0004-HBASE-6468-RowCounter-may-return-incorrect-result.patch RowCounter uses FirstKeyOnlyFilter regardless of whether the command line argument specified a column family (or family:qualifier). When no qualifier is specified as an argument, the scan gives the correct result. In the other case, however, the scan instance may have been set with columns other than the very first column in the row, causing the scan to get nothing as the FirstKeyOnlyFilter removes everything else. https://issues.apache.org/jira/browse/HBASE-6042 is related.
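The "peculiar behavior" discussed for this filter (KVs before the first matching qualifier still pass through) can be modeled with a small sketch; the class and method names are illustrative stand-ins, not the actual HBase Filter API:

```java
import java.util.Set;

// Models the per-row decision logic described for
// FirstKeyValueMatchingQualifiersFilter: include KVs until the first KV
// whose qualifier is in the set is seen, then skip the rest of the row.
public class FirstKeyMatchingSketch {
    private final Set<String> qualifiers;
    private boolean foundMatch = false;

    public FirstKeyMatchingSketch(Set<String> qualifiers) {
        this.qualifiers = qualifiers;
    }

    /** true = include this KV, false = skip to the next row. */
    public boolean include(String qualifier) {
        if (foundMatch) {
            return false;               // NEXT_ROW once a match was seen
        }
        if (qualifiers.contains(qualifier)) {
            foundMatch = true;          // this KV is the "first key"
        }
        return true;
    }

    public void reset() {               // called at each row boundary
        foundMatch = false;
    }
}
```

For a row with qualifiers a, b, c and a filter built with {b, c}, the KV for 'a' is included because no match has been seen yet, which is exactly the case raised in the review above.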
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426114#comment-13426114 ] Elliott Clark commented on HBASE-6427: -- Thanks so much.
[jira] [Updated] (HBASE-6467) ROOT stuck in assigning forever
[ https://issues.apache.org/jira/browse/HBASE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6467: --- Priority: Minor (was: Major) ROOT stuck in assigning forever --- Key: HBASE-6467 URL: https://issues.apache.org/jira/browse/HBASE-6467 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: master.log.gz, regionserver.log.gz, root-region-assignment.png After restarting a cluster, all region servers checked into the master, but the master was stuck in assigning forever. The master log shows it kept trying to connect to one region server for the ROOT table, while that region server's log kept printing NotServingRegionException. After restarting the master, things are OK now.
[jira] [Resolved] (HBASE-6467) ROOT stuck in assigning forever
[ https://issues.apache.org/jira/browse/HBASE-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6467. Resolution: Won't Fix This issue should have been fixed in 0.94 and trunk. There is no plan to fix it in 0.92 since the workaround is good enough.
[jira] [Commented] (HBASE-6471) Performance regression caused by HBASE-4054
[ https://issues.apache.org/jira/browse/HBASE-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426144#comment-13426144 ] Lars Hofhansl commented on HBASE-6471: -- Please see my comment on HBASE-5728. I am not sure we want to leak a lot of the HTable internals (anything related to regions, etc.) up into the interface. So maybe just remove the delegation code from PooledHTable and add a constructor to HTable that avoids the ZK/Meta scan? Performance regression caused by HBASE-4054 --- Key: HBASE-6471 URL: https://issues.apache.org/jira/browse/HBASE-6471 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Lars George Priority: Critical Fix For: 0.94.2 The patch in HBASE-4054 switches PooledHTable to extend HTable as opposed to implementing HTableInterface. Since HTable does not have an empty constructor, the patch added a call to the super() constructor, which triggers the ZooKeeper and META scan, causing a considerable delay. With multiple threads using the pool in parallel, the first thread holds up all the subsequent ones; in effect it negates the whole reason we have an HTable pool. We should complete HBASE-5728, or alternatively add a protected, empty constructor to HTable. I am +1 for the former.
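The constructor cost being discussed can be sidestepped by deferring the expensive lookup to first use. A minimal sketch, with a Supplier standing in for HTable's ZK/META scan (all names here are invented, not HBase API):

```java
import java.util.function.Supplier;

// Sketch of lazy initialization: pooled handles are cheap to construct,
// and the expensive region lookup runs only when a handle is first used.
public class LazyTable {
    static int initCount = 0;                     // exposed for the sketch only

    private final Supplier<String> regionLookup;  // stands in for the ZK/META scan
    private String regionLocation;                // cached after first use

    public LazyTable(Supplier<String> regionLookup) {
        this.regionLookup = regionLookup;         // nothing expensive happens here
    }

    public String get(String row) {
        if (regionLocation == null) {             // first call pays the cost
            regionLocation = regionLookup.get();
            initCount++;
        }
        return regionLocation + ":" + row;
    }
}
```

With this shape, a pool can hand out many handles without any thread blocking on construction; only the handle that is actually used performs the lookup.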
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426146#comment-13426146 ] Shrijeet Paliwal commented on HBASE-6468: - Will wait for Filter.proto to get committed. Is the patch format fine (I used git format-patch), or do you want git diff --no-prefix ?
[jira] [Commented] (HBASE-6468) RowCounter may return incorrect result if column name is specified in command line
[ https://issues.apache.org/jira/browse/HBASE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426147#comment-13426147 ] Zhihong Ted Yu commented on HBASE-6468: --- Git format is fine, acceptable by Hadoop QA.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426151#comment-13426151 ] Zhihong Ted Yu commented on HBASE-6435: --- {code} + private static ClientProtocol createReordoringProxy(final ClientProtocol cp, {code} Usually spelling would be a nit. But this spelling mistake is in a method name :-) {code} + public static ServerName getServerNameFromHLogDirectoryName(Configuration conf, String path) throws IOException { {code} The above line is too long. {code} + LOG.debug("Moved the location " + toLast.getHostName() + " to the last place." + " locations size was " + dnis.length); {code} I think the above log may appear many times. {code} +LOG.fatal( REORDER); {code} The above can be made a debug log. Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes Key: HBASE-6435 URL: https://issues.apache.org/jira/browse/HBASE-6435 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 6435.unfinished.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written with 'append' on HDFS. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process. This means reading the Write-Ahead-Log to replay the edits on the other servers. In standard deployments, the HBase processes (regionservers) are deployed on the same boxes as the datanodes. This means that when a box stops, we have actually lost one of the replicas of the edits, as we lost both the regionserver and the datanode. As HDFS marks a node as dead only after ~10 minutes, it still appears as available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout.
If the file is still opened for writing, it adds an extra 20s plus a risk of losing edits if we connect over IPC to the dead DN. Possible solutions are: - shorter dead-datanode detection by the NN. Requires an NN code change. - better dead-datanode management in the DFSClient. Requires a DFS code change. - NN customisation to write the WAL files to another DN instead of the local one. - reordering the blocks returned by the NN on the client side to put the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or some kind of workaround. The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch will not modify HDFS source code but adds a proxy, for two reasons: - Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require implementing the fix partially, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean. - Adding a proxy allows putting all the code in HBase, simplifying dependency management. Nevertheless, it would be better to have this in HDFS. But this solution allows targeting the latest version only, and this could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better solution long term.
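The proxy approach described above can be sketched with java.lang.reflect.Proxy: wrap the client interface and post-process the block-location result so the dead DN sorts last. The interface and method names below are stand-ins, not the actual ClientProtocol API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the NN client interface being wrapped.
interface BlockLocator {
    List<String> getBlockLocations(String path);
}

public class ReorderingProxyDemo {
    // Returns a proxy that moves any location matching deadHost
    // to the end of the list, without modifying the delegate.
    static BlockLocator reordering(BlockLocator delegate, String deadHost) {
        InvocationHandler h = (proxy, method, args) -> {
            Object result = method.invoke(delegate, args);
            if (method.getName().equals("getBlockLocations")) {
                @SuppressWarnings("unchecked")
                List<String> locs = new ArrayList<>((List<String>) result);
                if (locs.remove(deadHost)) {
                    locs.add(deadHost);   // dead DN goes last in priority
                }
                return locs;
            }
            return result;
        };
        return (BlockLocator) Proxy.newProxyInstance(
            BlockLocator.class.getClassLoader(),
            new Class<?>[] { BlockLocator.class }, h);
    }
}
```

This mirrors the design choice in the description: the reordering lives entirely on the client side, behind the existing interface, so no HDFS source change is needed.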
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426158#comment-13426158 ] Hadoop QA commented on HBASE-6435: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538610/6435.v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.fs.TestBlockReorder Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2464//console This message is automatically generated. 
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426167#comment-13426167 ] Hudson commented on HBASE-6427: --- Integrated in HBase-0.94 #379 (See [https://builds.apache.org/job/HBase-0.94/379/]) HBASE-6427 addendum (Revision 1367770) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanType.java
[jira] [Created] (HBASE-6482) In AssignmentManager failover mode, use ServerShutdownHandler to handle dead regions
Jimmy Xiang created HBASE-6482: -- Summary: In AssignmentManager failover mode, use ServerShutdownHandler to handle dead regions Key: HBASE-6482 URL: https://issues.apache.org/jira/browse/HBASE-6482 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang In AssignmentManager failover mode, a special failoverProcessedRegions map is used to manage regions in transition. It complicates the code. Should we use ServerShutdownHandler to process those regions? That way we can share some code and make the logic of AssignmentManager a little bit simpler.
[jira] [Created] (HBASE-6483) Fully enable ServerShutdownHandler after master joins the cluster
Jimmy Xiang created HBASE-6483: -- Summary: Fully enable ServerShutdownHandler after master joins the cluster Key: HBASE-6483 URL: https://issues.apache.org/jira/browse/HBASE-6483 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Once ROOT and META are assigned, ServerShutdownHandler is enabled so that we can handle meta/root region server failures before joinCluster completes. However, we can hold ServerShutdownHandler back a little longer for the user region assignments, i.e. not assign user regions before joinCluster returns. If so, we can avoid some region assignment races: the same regions being assigned in both joinCluster and ServerShutdownHandler.
[jira] [Created] (HBASE-6484) Make AssignmentManager#enablingTables and disablingTables local variables
Jimmy Xiang created HBASE-6484: -- Summary: Make AssignmentManager#enablingTables and disablingTables local variables Key: HBASE-6484 URL: https://issues.apache.org/jira/browse/HBASE-6484 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang The enablingTables and disablingTables sets are used only during startup. They should be local variables. We can load them from ZKTable at the beginning instead of handling them per table.
[jira] [Created] (HBASE-6485) Share bulk assign code in AssignmentManager
Jimmy Xiang created HBASE-6485: -- Summary: Share bulk assign code in AssignmentManager Key: HBASE-6485 URL: https://issues.apache.org/jira/browse/HBASE-6485 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang AssignmentManager has several bulk assign functions: for startup bulk assign, for ServerShutdownHandler bulk assign, etc. They can be shared.
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Open (was: Patch Available) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476-v2.txt, 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
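The EnvironmentEdge pattern this issue relies on is essentially an injectable clock. A minimal sketch, loosely mirroring EnvironmentEdgeManager but not the actual class:

```java
// Sketch of the EnvironmentEdge idea: all time lookups go through a
// swappable provider, so tests can freeze or advance the clock while
// production code defaults to the wall clock.
public class ClockEdge {
    public interface Edge {
        long currentTimeMillis();
    }

    private static Edge edge = System::currentTimeMillis; // production default

    public static long currentTimeMillis() {
        return edge.currentTimeMillis();
    }

    public static void injectEdge(Edge e) { // tests swap in a fixed clock
        edge = e;
    }
}
```

Code that calls ClockEdge.currentTimeMillis() instead of System.currentTimeMillis() becomes deterministic under test, which is the motivation stated in the description.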
[jira] [Updated] (HBASE-6476) Replace all occurrences of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Status: Patch Available (was: Open)
[jira] [Created] (HBASE-6486) Enhance load test to print throughput measurements
Karthik Ranganathan created HBASE-6486: -- Summary: Enhance load test to print throughput measurements Key: HBASE-6486 URL: https://issues.apache.org/jira/browse/HBASE-6486 Project: HBase Issue Type: Bug Reporter: Karthik Ranganathan Assignee: Aurick Qiao The idea is to know how many MB/sec of throughput we are able to get by writing into HBase using a simple tool.
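The MB/sec figure such a tool would print is a simple bytes-over-elapsed-time calculation. A sketch, with all names invented for illustration:

```java
// Accumulates bytes written and reports throughput in MB/sec.
public class ThroughputMeter {
    private final long startNanos;
    private long bytes = 0;

    public ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    public void record(long bytesWritten) {
        bytes += bytesWritten;
    }

    /** Throughput in MB/sec over the window ending at nowNanos. */
    public double mbPerSec(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / seconds;
    }
}
```

A load-test worker would call record() after each put and periodically print mbPerSec(System.nanoTime()).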
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426225#comment-13426225 ] Zhihong Ted Yu commented on HBASE-6435: --- For the test failure: {code} org.junit.ComparisonFailure: expected:[localhost] but was:[host2] at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hbase.fs.TestBlockReorder.testFromDFS(TestBlockReorder.java:320) at org.apache.hadoop.hbase.fs.TestBlockReorder.testHBaseCluster(TestBlockReorder.java:271) {code} testFromDFS() should have utilized the done flag for the while loop below: {code} +for (int y = 0; y < l.getLocatedBlocks().size() && !done; y++) { + done = (l.get(y).getLocations().length == 3); +} + } while (l.get(0).getLocations().length != 3); {code} When l.getLocatedBlocks().size() is greater than 1, the above loop may exit prematurely.
[jira] [Created] (HBASE-6487) assign region doesn't check if the region is already assigned
Jimmy Xiang created HBASE-6487: -- Summary: assign region doesn't check if the region is already assigned Key: HBASE-6487 URL: https://issues.apache.org/jira/browse/HBASE-6487 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Tried to assign, from the hbase shell, a region that was already assigned somewhere; the region is assigned to a different place but the previous assignment is not closed, so it causes double assignment. In such a case, it's better to issue a warning instead.
[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426264#comment-13426264 ] Enis Soztutar commented on HBASE-6052: -- Stack, mind if I attack this? Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0
[jira] [Created] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
ryan rawson created HBASE-6488: -- Summary: HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. 
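The failure in the stack trace above is easy to reproduce: a `%` anywhere in a string handed to String.format (as ThreadFactoryBuilder.setNameFormat does with the thread-name prefix) is parsed as a conversion. Escaping `%` as `%%` is one possible fix; this is only a demonstration of the failure mode, not the attached patch:

```java
public class ZoneIndexFormatBug {
  // An IPv6 address with a zone-index contains '%', e.g. "fe80::1%0".
  // Escaping '%' as '%%' makes such a hostname safe to embed in a
  // format string.
  public static String escapePercents(String s) {
    return s.replace("%", "%%");
  }

  public static void main(String[] args) {
    String host = "fe80::1%0";
    try {
      String.format("master-" + host); // '%0' is not a valid conversion
    } catch (java.util.UnknownFormatConversionException e) {
      System.out.println("format failed: " + e.getMessage()); // Conversion = '0'
    }
    // With escaping, formatting succeeds:
    System.out.println(String.format("master-" + escapePercents(host)));
  }
}
```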
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-6488: --- Status: Patch Available (was: Open) HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated 
by JIRA.
[jira] [Updated] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-6488: --- Attachment: HBASE-6488.txt HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426270#comment-13426270 ] Hadoop QA commented on HBASE-6488: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538644/HBASE-6488.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2466//console This message is automatically generated. HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. 
java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6488) HBase wont run on IPv6 on OSes that use zone-indexes
[ https://issues.apache.org/jira/browse/HBASE-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426280#comment-13426280 ] Zhihong Ted Yu commented on HBASE-6488: --- The path to ExecutorService.java should be hbase-server/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java. Hadoop QA only runs test suite in trunk. HBase wont run on IPv6 on OSes that use zone-indexes Key: HBASE-6488 URL: https://issues.apache.org/jira/browse/HBASE-6488 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: ryan rawson Attachments: HBASE-6488.txt In IPv6, an address may have a zone-index, which is specified with a percent, eg: ...%0. This looks like a format string, and thus in a part of the code which uses the hostname as a prefix to another string which is interpreted with String.format, you end up with an exception: 2012-07-31 18:21:39,848 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.util.UnknownFormatConversionException: Conversion = '0' at java.util.Formatter.checkText(Formatter.java:2503) at java.util.Formatter.parse(Formatter.java:2467) at java.util.Formatter.format(Formatter.java:2414) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.google.common.util.concurrent.ThreadFactoryBuilder.setNameFormat(ThreadFactoryBuilder.java:68) at org.apache.hadoop.hbase.executor.ExecutorService$Executor.init(ExecutorService.java:299) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:185) at org.apache.hadoop.hbase.executor.ExecutorService.startExecutorService(ExecutorService.java:227) at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:821) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:507) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344) at 
org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220) at java.lang.Thread.run(Thread.java:680) 2012-07-31 18:21:39,908 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
[jira] [Updated] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6476: - Attachment: 6476-v2.txt Not sure why hadoop QA wouldn't run. Trying again. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476-v2.txt, 6476-v2.txt, 6476.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, it should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
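The value of routing time lookups through EnvironmentEdgeManager is that tests can swap the clock. A minimal sketch of the pattern (a stand-in, not HBase's actual EnvironmentEdge classes):

```java
public class ClockSketch {
  // Minimal stand-in for the EnvironmentEdge idea: all time lookups go
  // through a swappable source, so tests can control the clock.
  interface TimeSource { long currentTimeMillis(); }

  private static TimeSource source = System::currentTimeMillis;

  public static void inject(TimeSource s) { source = s; }
  public static long currentTimeMillis() { return source.currentTimeMillis(); }

  public static void main(String[] args) {
    inject(() -> 42L);                       // test-controlled clock
    System.out.println(currentTimeMillis()); // prints 42
  }
}
```

Code that calls ClockSketch.currentTimeMillis() instead of System.currentTimeMillis() becomes deterministic under test, which is exactly what the issue asks for across the code base.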
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-6480: Fix Version/s: 0.94.2 0.96.0 If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } Should we change it to: if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); }
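The reordering being proposed can be shown as a runnable sketch. Names are simplified stand-ins, not the actual HBaseServer fields: the size cap is checked only for normal calls, so priority calls can no longer be rejected by it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CallAdmissionSketch {
  static final long MAX_QUEUE_SIZE = 100;
  static long callQueueSize = 0;
  static final BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
  static final BlockingQueue<String> priorityCallQueue = new LinkedBlockingQueue<>();

  // Returns false when the call is rejected for exceeding the cap.
  static boolean offer(String call, long callSize, boolean highPriority) {
    if (highPriority) {
      priorityCallQueue.add(call);    // bypasses the size cap entirely
    } else {
      if (callSize + callQueueSize > MAX_QUEUE_SIZE) {
        return false;                 // the "call too big" rejection
      }
      callQueue.add(call);
    }
    callQueueSize += callSize;
    return true;
  }
}
```

Note the trade-off Ted raises below this ordering: with the cap moved inside the else branch, nothing bounds the priority queue any more.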
[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6478: Fix Version/s: 0.94.2 TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6478-trunk.patch When hudson ran for HBASE-6459, it encountered a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table: when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions on the RS. In for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { the loop will be skipped, and found1 will stay false. That's why the testcase failed. So maybe we can have a stricter check when the table is created.
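The stricter check suggested here amounts to polling until the regions are actually online rather than only checking meta. A generic sketch of that shape (an illustrative helper, not the HBaseTestingUtility API):

```java
public class WaitForSketch {
  interface Condition { boolean ready(); }

  // Poll until the condition holds or the timeout elapses; the condition
  // would be "all regions of the table are in the RS's online set".
  static boolean waitFor(Condition c, long timeoutMs, long intervalMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.ready()) return true;
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return c.ready();
      }
    }
    return c.ready();
  }
}
```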
[jira] [Commented] (HBASE-6485) Share bulk assign code in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426304#comment-13426304 ] Jimmy Xiang commented on HBASE-6485: First patch was posted on RB: https://reviews.apache.org/r/6269/ Share bulk assign code in AssignmentManager --- Key: HBASE-6485 URL: https://issues.apache.org/jira/browse/HBASE-6485 Project: HBase Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang AssignmentManager has several bulk assign functions: for startup bulk assign, for ServerShutdownHandler bulk assign, etc. They can be shared.
[jira] [Updated] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6478: Status: Patch Available (was: Open) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable - Key: HBASE-6478 URL: https://issues.apache.org/jira/browse/HBASE-6478 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6478-trunk.patch When hudson ran for HBASE-6459, it encountered a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/ I checked the log and found that waitTableAvailable only checks the meta table: when the RS opens the region and updates the meta location in meta, the region may not yet have been added to the online regions on the RS. In for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) { the loop will be skipped, and found1 will stay false. That's why the testcase failed. So maybe we can have a stricter check when the table is created.
[jira] [Updated] (HBASE-6473) deletedtable is not deleted completely, some region may be still online
[ https://issues.apache.org/jira/browse/HBASE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6473: Status: Patch Available (was: Open) deletedtable is not deleted completely, some region may be still online --- Key: HBASE-6473 URL: https://issues.apache.org/jira/browse/HBASE-6473 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6473-trunk.patch Consider such a scenario: we have a table called T1, which has one region: A. 1. move A from rs1 to rs2, and A is now closed 2. disable T1 3. delete T1 When we disable T1, the disable handler will just set the zk state to disabled, and A will still be assigned. When A is opened, A in transition will be cleaned out. At that time, DeleteTable finds it is safe to delete all regions and the table in meta and fs; it will also delete the zk node of T1. {code} while (System.currentTimeMillis() < done) { AssignmentManager.RegionState rs = am.isRegionInTransition(region); if (rs == null) break; Threads.sleep(waitingTimeForEvents); LOG.debug("Waiting on region to clear regions in transition; " + rs); } if (am.isRegionInTransition(region) != null) { throw new IOException("Waited hbase.master.wait.on.region (" + waitTime + " ms) for region to leave region " + region.getRegionNameAsString() + " in transitions"); } {code} However, A is still being unassigned; when it finishes closing A, it finds that the disabled state in zk has been deleted, and then A will be assigned again.
[jira] [Updated] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6459: Status: Patch Available (was: Open) improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
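The parallel-creation idea can be sketched with a thread pool. This is an illustrative sketch under simplified assumptions: createRegion is a hypothetical stand-in for the expensive HRegion.createHRegion call, and collecting futures in submission order keeps results ordered so batched META inserts could proceed as before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCreateSketch {
  // Placeholder for the per-region filesystem work.
  static String createRegion(int idx) {
    return "region-" + idx;
  }

  // Submit all region-creation tasks to a fixed-size pool instead of
  // running them one by one in the handler thread.
  public static List<String> createAll(int n, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        final int idx = i;
        futures.add(pool.submit(() -> createRegion(idx)));
      }
      List<String> regionInfos = new ArrayList<>();
      for (Future<String> f : futures) {
        regionInfos.add(f.get()); // preserves submission order
      }
      return regionInfos;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```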
[jira] [Updated] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhou wenjian updated HBASE-6459: Status: Open (was: Patch Available) improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Commented] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426307#comment-13426307 ] zhou wenjian commented on HBASE-6459: - I think the failed testcase is involved with HBASE-6459. improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb
[ https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426306#comment-13426306 ] Lars Hofhansl commented on HBASE-6052: -- Shouldn't we just get rid of ROOT? Convert .META. and -ROOT- content to pb --- Key: HBASE-6052 URL: https://issues.apache.org/jira/browse/HBASE-6052 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0
[jira] [Commented] (HBASE-6459) improve speed of create table
[ https://issues.apache.org/jira/browse/HBASE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426308#comment-13426308 ] zhou wenjian commented on HBASE-6459: - I think the failed testcase is involved with HBASE-6478. improve speed of create table - Key: HBASE-6459 URL: https://issues.apache.org/jira/browse/HBASE-6459 Project: HBase Issue Type: Improvement Affects Versions: 0.94.0 Reporter: zhou wenjian Fix For: 0.96.0 Attachments: HBASE-6459-90.patch, HBASE-6459-92.patch, HBASE-6459-94.patch, HBASE-6459-trunk-v2.patch, HBASE-6459-trunk.patch in CreateTableHandler for (int regionIdx = 0; regionIdx < this.newRegions.length; regionIdx++) { HRegionInfo newRegion = this.newRegions[regionIdx]; // 1. Create HRegion HRegion region = HRegion.createHRegion(newRegion, this.fileSystemManager.getRootDir(), this.conf, this.hTableDescriptor, null, false, true); regionInfos.add(region.getRegionInfo()); if (regionIdx % batchSize == 0) { // 2. Insert into META MetaEditor.addRegionsToMeta(this.catalogTracker, regionInfos); regionInfos.clear(); } // 3. Close the new region to flush to disk. Close log file too. region.close(); } All the regions will be created serially. If we have thousands of regions, that will be a huge cost. We can improve it by creating the regions in parallel.
[jira] [Updated] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6480: -- Description: Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: {code} if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} Should we change it to: {code} if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} was: (the same description and code, previously without {code} markup) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall Key: HBASE-6480 URL: https://issues.apache.org/jira/browse/HBASE-6480 Project: HBase Issue Type: Bug Reporter: binlijin Fix For: 0.96.0, 0.94.2 Attachments: HBASE-6480-94.patch, HBASE-6480-trunk.patch Currently, if callQueueSize exceeds maxQueueSize, all calls are rejected. Should we let priority calls pass through? Current: {code} if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code} Should we change it to: {code} if (priorityCallQueue != null && getQosLevel(param) > highPriorityLevel) { priorityCallQueue.put(call); updateCallQueueLenMetrics(priorityCallQueue); } else { if ((callSize + callQueueSize.get()) > maxQueueSize) { Call callTooBig = xxx return ; } callQueue.put(call); // queue the call; maybe blocked here updateCallQueueLenMetrics(callQueue); } {code}
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
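The reordering proposed above can be sketched as a standalone simulation. This is a hypothetical re-creation, not HBase's actual HBaseServer code: the names (`dispatch`, `MAX_QUEUE_SIZE`, `HIGH_PRIORITY_LEVEL`) only mirror the snippet, and the queues are plain collections rather than the real blocking queues. The point it demonstrates is that the size check now guards only the normal queue, so a priority call is admitted even when the normal queue is full.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified, hypothetical sketch of the proposed admission order:
// priority calls skip the size check; normal calls remain bounded.
class CallDispatchSketch {
  static final int MAX_QUEUE_SIZE = 100;      // stand-in for maxQueueSize
  static final int HIGH_PRIORITY_LEVEL = 10;  // stand-in for highPriorityLevel

  final Queue<String> callQueue = new ArrayDeque<>();
  final Queue<String> priorityCallQueue = new ArrayDeque<>();
  int callQueueSize = 0;

  /** Returns false when a normal call is rejected because the queue is full. */
  boolean dispatch(String call, int callSize, int qosLevel) {
    if (qosLevel > HIGH_PRIORITY_LEVEL) {
      priorityCallQueue.add(call);  // priority path: no size check
      return true;
    }
    if (callSize + callQueueSize > MAX_QUEUE_SIZE) {
      return false;                 // the "callTooBig" rejection path
    }
    callQueue.add(call);
    callQueueSize += callSize;
    return true;
  }
}
```

With the original order, the third call below would have been rejected before its QoS level was ever inspected.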
[jira] [Commented] (HBASE-6480) If callQueueSize exceed maxQueueSize, all call will be rejected, do not reject priorityCall
[ https://issues.apache.org/jira/browse/HBASE-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426317#comment-13426317 ]

Zhihong Ted Yu commented on HBASE-6480:
---------------------------------------

Did you encounter this scenario on a cluster, or did you find it through code analysis?

How do we bound the size of priorityCallQueue after the proposed change?

Thanks
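One conceivable answer to the bounding question, purely an assumption on my part and not something taken from the attached patches, is to give the priority queue its own independent capacity: priority calls then bypass the shared limit but cannot grow without bound.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a separately capped priority queue. The capacity
// value and the offer()-based rejection are illustrative choices, not
// what HBASE-6480's patches actually do.
class BoundedPrioritySketch {
  final LinkedBlockingQueue<String> priorityCallQueue;

  BoundedPrioritySketch(int priorityCapacity) {
    priorityCallQueue = new LinkedBlockingQueue<>(priorityCapacity);
  }

  /** offer() returns false instead of blocking once the cap is reached. */
  boolean admitPriorityCall(String call) {
    return priorityCallQueue.offer(call);
  }
}
```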
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426319#comment-13426319 ]

Hadoop QA commented on HBASE-6476:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538646/6476-v2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 240 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2467//console

This message is automatically generated.
Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
-------------------------------------------------------------------------------------
                Key: HBASE-6476
                URL: https://issues.apache.org/jira/browse/HBASE-6476
            Project: HBase
         Issue Type: Bug
           Reporter: Lars Hofhansl
           Assignee: Lars Hofhansl
           Priority: Minor
            Fix For: 0.96.0
        Attachments: 6476-v2.txt, 6476-v2.txt, 6476.txt

There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future?
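The pattern behind EnvironmentEdgeManager can be sketched in a few lines: route every time lookup through a swappable provider so tests can pin the clock. The class and method names below imitate HBase's API but this is a standalone re-creation, not the actual HBase implementation.

```java
// Minimal sketch of the injectable-clock pattern behind EnvironmentEdgeManager.
class ClockSketch {
  /** A source of "now"; production code uses the wall clock. */
  interface EnvironmentEdge {
    long currentTimeMillis();
  }

  // default edge delegates to the real clock
  static EnvironmentEdge edge = System::currentTimeMillis;

  /** All call sites use this instead of System.currentTimeMillis(). */
  static long currentTimeMillis() {
    return edge.currentTimeMillis();
  }

  /** Tests inject a deterministic edge, e.g. a fixed or manually advanced clock. */
  static void injectEdge(EnvironmentEdge e) {
    edge = e;
  }
}
```

A test can then freeze time with `ClockSketch.injectEdge(() -> someFixedMillis)` and exercise timeout logic deterministically.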
[jira] [Commented] (HBASE-6473) deletedtable is not deleted completely, some region may be still online
[ https://issues.apache.org/jira/browse/HBASE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426325#comment-13426325 ]

Hadoop QA commented on HBASE-6473:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538326/HBASE-6473-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2468//console

This message is automatically generated.

deletedtable is not deleted completely, some region may be still online
-----------------------------------------------------------------------
                Key: HBASE-6473
                URL: https://issues.apache.org/jira/browse/HBASE-6473
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.94.0
           Reporter: zhou wenjian
            Fix For: 0.96.0, 0.94.2
        Attachments: HBASE-6473-trunk.patch

Consider such a scenario: we have a table called T1, which has one region: A.

1. move A from rs1 to rs2; A is now closed
2. disable T1
3. delete T1

When we disable T1, the disable handler will just set the zk node to disabled, and A will still be assigned. When A is opened, A's entry in regions-in-transition will be cleaned out. At that time, DeleteTable finds it is safe to delete all regions and the table in meta and fs; it will also delete the zk node of T1.
{code}
while (System.currentTimeMillis() < done) {
  AssignmentManager.RegionState rs = am.isRegionInTransition(region);
  if (rs == null) break;
  Threads.sleep(waitingTimeForEvents);
  LOG.debug("Waiting on region to clear regions in transition; " + rs);
}
if (am.isRegionInTransition(region) != null) {
  throw new IOException("Waited hbase.master.wait.on.region (" + waitTime
      + "ms) for region to leave region " + region.getRegionNameAsString() + " in transitions");
}
{code}

However, A is still being unassigned; when the close of A finishes, it finds that the disabled state in zk has been deleted, and then A will be assigned again.
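The kind of guard this report implies can be sketched as follows. This is hypothetical, not the HBASE-6473 patch: before re-assigning a region whose close just finished, re-check that the table still exists and is still disabled, instead of trusting state that was read before DeleteTable ran. The Set-backed "zk" below merely simulates the real ZooKeeper lookups.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: re-validate table state before re-assigning a region.
class ReassignGuardSketch {
  // stand-ins for ZooKeeper state: which tables exist, which are disabled
  final Set<String> existingTables = new HashSet<>();
  final Set<String> disabledTables = new HashSet<>();

  /** Only reassign if the table still exists and is not disabled. */
  boolean shouldReassignAfterClose(String table) {
    return existingTables.contains(table) && !disabledTables.contains(table);
  }
}
```

Under this check, region A of the deleted table T1 would not be brought back online, because the existence check fails after DeleteTable removes T1.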
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426327#comment-13426327 ]

Zhihong Ted Yu commented on HBASE-6466:
---------------------------------------

@Ram, @J-D: Please share your comments.

Enable multi-thread for memstore flush
--------------------------------------
                Key: HBASE-6466
                URL: https://issues.apache.org/jira/browse/HBASE-6466
            Project: HBase
         Issue Type: Improvement
           Reporter: chunhui shen
           Assignee: chunhui shen
        Attachments: HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.patch

If the KV is large, or the HLog is closed under high write pressure, we found the memstore is often above the high water mark and blocks puts. So should we enable multi-threaded memstore flush?

Some performance test data for reference:

1. Test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 writer threads per client.

2. Test results:
* one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per regionserver; many aboveGlobalMemstoreLimit blockings appear
* two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per regionserver, 200 thread handler per client
* two cacheFlush handlers, tps: 16.1k/s per regionserver, Flush: 18.6MB/s per regionserver
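The multi-handler flush being benchmarked can be sketched with a fixed-size thread pool. This is an illustrative stand-in under the assumption that individual region flushes are independent tasks; it is not HBase's actual MemStoreFlusher implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the benchmarked idea: N cache-flush handlers draining flush
// requests concurrently instead of a single handler.
class MultiFlushSketch {
  static void flushAll(Runnable[] regionFlushes, int handlers) {
    ExecutorService pool = Executors.newFixedThreadPool(handlers);
    for (Runnable flush : regionFlushes) {
      pool.submit(flush);  // each handler thread picks up pending flushes
    }
    pool.shutdown();
    try {
      pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for all flushes to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The reported numbers (one handler: 10.1MB/s flushed; two handlers: up to 18.6MB/s) suggest the single flush handler was the bottleneck once the memstore neared the high water mark.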
[jira] [Commented] (HBASE-6478) TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
[ https://issues.apache.org/jira/browse/HBASE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426331#comment-13426331 ]

Hadoop QA commented on HBASE-6478:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538503/HBASE-6478-trunk.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2469//console

This message is automatically generated.
TestClassLoading.testClassLoadingFromLibDirInJar in coprocessor may appear fail due to waitTableAvailable
---------------------------------------------------------------------------------------------------------
                Key: HBASE-6478
                URL: https://issues.apache.org/jira/browse/HBASE-6478
            Project: HBase
         Issue Type: Bug
         Components: test
   Affects Versions: 0.94.0
           Reporter: zhou wenjian
            Fix For: 0.96.0, 0.94.2
        Attachments: HBASE-6478-trunk.patch

When hudson runs for HBASE-6459, it encounters a failed testcase in org.apache.hadoop.hbase.coprocessor.TestClassLoading.testClassLoadingFromLibDirInJar. The link is https://builds.apache.org/job/PreCommit-HBASE-Build/2455/testReport/org.apache.hadoop.hbase.coprocessor/TestClassLoading/testClassLoadingFromLibDirInJar/

I checked the log and found that waitTableAvailable only checks the meta table; when the rs opens the region and updates the meta location in meta, the region may not yet have been added to the onlineregions set in the rs.

{code}
for (HRegion region: hbase.getRegionServer(0).getOnlineRegionsLocalContext()) {
{code}

This loop will be skipped, and found1 will remain false. That's why the testcase failed. So maybe we can have some stricter check when the table is created.
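The stricter check the reporter suggests can be sketched as a poll: do not declare the table available when META has a location, but wait until the regionserver actually reports the region in its online set. This is an assumption-laden sketch, not the HBASE-6478 patch; `isRegionOnline` stands in for a scan of `getOnlineRegionsLocalContext()`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a stricter waitTableAvailable: poll the
// regionserver's online-region set instead of trusting META alone.
class WaitTableSketch {
  static boolean waitTableReallyAvailable(BooleanSupplier isRegionOnline, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!isRegionOnline.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        return false;  // regions never showed up online within the timeout
      }
      try {
        Thread.sleep(10);  // back off briefly before polling again
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

With such a wait, the loop over `getOnlineRegionsLocalContext()` in the test would only run after the region is guaranteed to be in the online set, so `found1` could not be spuriously false.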