[jira] [Created] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
---

                 Key: HBASE-4832
                 URL: https://issues.apache.org/jira/browse/HBASE-4832
             Project: HBase
          Issue Type: Bug
          Components: coprocessors, test
    Affects Versions: 0.94.0
            Reporter: nkeywal
            Priority: Minor

The current implementation of HRegionServer#stop is:
{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    synchronized (this) {
      // Wakes run() if it is sleeping
      notifyAll(); // FindBugs NN_NAKED_NOTIFY
    }
  }
{noformat}

The notification is sent on the wrong object and does nothing: run() sleeps through the sleeper, not on the HRegionServer monitor that notifyAll() targets. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    // Wakes run() if it is sleeping
    sleeper.skipSleepCycle();
  }
{noformat}

With this change, the region server stops immediately: 0,5s faster on average, which is quite useful for unit tests.

However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, probably because the code does not expect the region server to stop that fast. The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  ERROR!
java.lang.Exception: test timed out after 3 milliseconds
	at java.lang.Throwable.fillInStackTrace(Native Method)
	at java.lang.Throwable.<init>(Throwable.java:196)
	at java.lang.Exception.<init>(Exception.java:41)
	at java.lang.InterruptedException.<init>(InterruptedException.java:48)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
	at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
	at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4832:
---

    Attachment: 4832_trunk_hregionserver.patch

4832_trunk_hregionserver.patch contains the fix on HRegionServer that makes the coprocessor test fail.
[jira] [Created] (HBASE-4833) HRegionServer stops could be 0,5s faster
HRegionServer stops could be 0,5s faster

                 Key: HBASE-4833
                 URL: https://issues.apache.org/jira/browse/HBASE-4833
             Project: HBase
          Issue Type: Improvement
          Components: regionserver, test
    Affects Versions: 0.94.0
         Environment: all
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor

The current implementation of HRegionServer#stop is:
{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    synchronized (this) {
      // Wakes run() if it is sleeping
      notifyAll(); // FindBugs NN_NAKED_NOTIFY
    }
  }
{noformat}

The notification is sent on the wrong object and does nothing: run() sleeps through the sleeper, not on the HRegionServer monitor that notifyAll() targets. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    // Wakes run() if it is sleeping
    sleeper.skipSleepCycle();
  }
{noformat}

With this change, the region server stops immediately: 0,5s faster on average, which is quite useful for unit tests.

However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, probably because the code does not expect the region server to stop that fast. See HBASE-4832.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
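To make the "wrong object" point concrete, here is a small, self-contained Java sketch. ToySleeper and ToyServer are hypothetical stand-ins loosely modelled on HBase's Sleeper and HRegionServer, not the real classes; the only assumption carried over is that run() waits on a lock owned by the sleeper, as the use of sleeper.skipSleepCycle() in the fix suggests. Calling notifyAll() while synchronized on the server wakes nobody, because no thread waits on the server's monitor, so run() sleeps out the rest of its period; notifying the sleeper's own lock wakes it at once.

```java
import java.util.concurrent.TimeUnit;

/**
 * Toy model of the HRegionServer#stop problem. ToySleeper and ToyServer are
 * hypothetical stand-ins, not the real HBase Sleeper and HRegionServer classes.
 */
public class WrongMonitorDemo {

  /** Sleeps on a private lock, so only code that synchronizes on that lock can wake it. */
  static final class ToySleeper {
    private final Object sleepLock = new Object();
    private final long periodMs;
    private boolean triggerWake = false; // guarded by sleepLock

    ToySleeper(long periodMs) {
      this.periodMs = periodMs;
    }

    /** Waits up to periodMs, returning early if skipSleepCycle() was called. */
    void sleep() throws InterruptedException {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
      synchronized (sleepLock) {
        while (!triggerWake) {
          long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
          if (remainingMs <= 0) {
            break; // period elapsed; wait(0) would mean "wait forever"
          }
          sleepLock.wait(remainingMs);
        }
        triggerWake = false;
      }
    }

    /** Ends the current sleep, or the next one if nobody is sleeping yet. */
    void skipSleepCycle() {
      synchronized (sleepLock) {
        triggerWake = true;
        sleepLock.notifyAll();
      }
    }
  }

  static final class ToyServer implements Runnable {
    private final ToySleeper sleeper;
    private final boolean useFix;
    private volatile boolean stopped = false;

    ToyServer(long periodMs, boolean useFix) {
      this.sleeper = new ToySleeper(periodMs);
      this.useFix = useFix;
    }

    @Override
    public void run() {
      try {
        while (!stopped) {
          sleeper.sleep(); // a real main loop would do periodic work here
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    void stop() {
      stopped = true;
      if (useFix) {
        sleeper.skipSleepCycle(); // notifies the lock run() is actually waiting on
      } else {
        synchronized (this) {
          notifyAll(); // wrong monitor: no thread waits on the server object
        }
      }
    }
  }

  /** Starts a server, stops it while it sleeps, and returns how long run() took to exit. */
  static long stopLatencyMs(boolean useFix) throws InterruptedException {
    ToyServer server = new ToyServer(1000, useFix);
    Thread thread = new Thread(server);
    thread.start();
    Thread.sleep(200); // give run() time to enter sleep()
    long start = System.nanoTime();
    server.stop();
    thread.join();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("notifyAll() on this: " + stopLatencyMs(false) + " ms");
    System.out.println("skipSleepCycle():    " + stopLatencyMs(true) + " ms");
  }
}
```

With these values the first variant should take roughly the remaining 800 ms of the sleep period to stop and the second only a few milliseconds; exact numbers depend on scheduling. The triggerWake flag in this sketch also covers a stop() that lands just before run() enters sleep(), which a bare notifyAll() would miss.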
[jira] [Updated] (HBASE-4833) HRegionServer stops could be 0,5s faster
[ https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4833:
---

    Attachment: 4833_trunk_hregionserver.patch
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4798:
---

    Attachment: 4798_trunk_all.v10.patch

Sleeps and synchronisation improvements for tests
-

                 Key: HBASE-4798
                 URL: https://issues.apache.org/jira/browse/HBASE-4798
             Project: HBase
          Issue Type: Improvement
          Components: master, regionserver, test
    Affects Versions: 0.94.0
         Environment: all
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor
         Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch

Multiple small changes:

@committers: Removing some sleeps exposed a bug in JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. You may want to review this.

JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never met (test on !c !!c). Added a new synchronization point.
AssignmentManager#waitForAssignment: add a timeout on the wait, so the caller does not get stuck if the notification is received before the wait.
HMaster#loop: use a notification instead of a 1s sleep.
HRegionServer#waitForServerOnline: new method, used by JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification.
HRegionServer#getMaster(): 1s sleeps replaced by one 0,1s sleep and one 0,2s sleep.
HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0,5s.
ZooKeeperNodeTracker#start: replace a recursive call with a loop.
ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so the caller does not get stuck if the notification is received before the wait.
HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s.
TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; together with the HBaseTestingUtility change, this saves 60s.
TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead of 1s.
TestRestartCluster#testClusterRestart: submit all the table creations together, then check that the tables were created; this should be faster.
TestHLog: shut down the whole cluster instead of DFS only (more standard).
JVMClusterUtil#startup: lower the sleep from 1s to 0,1s.
HConnectionManager#close: the ZooKeeper name in the debug message logged by HConnectionManager after the connection is closed was always null, because it was set to null in the delete.
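Several items above (AssignmentManager#waitForAssignment, ZooKeeperNodeTracker#blockUntilAvailable, HRegionServer#waitForServerOnline) replace a fixed sleep with a notification and bound the wait with a timeout. The sketch below shows that general Java pattern only; OnlineSignal, markOnline() and waitForOnline() are hypothetical names, not the code in the patch. In this sketch the waiter re-checks a flag under the lock before waiting, so a notification sent before the wait is not lost, and the timeout keeps a missed or absent notification from blocking the caller forever.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a notification plus a timed, guarded wait instead of a fixed sleep. */
public class OnlineSignal {
  private final Object lock = new Object();
  private boolean online = false; // guarded by lock

  /** Called by the server thread once it is ready; wakes every waiter. */
  public void markOnline() {
    synchronized (lock) {
      online = true;
      lock.notifyAll();
    }
  }

  /**
   * Waits until markOnline() has been called or timeoutMs elapses.
   * The flag is checked under the lock before each wait, so a notification
   * sent before this method is entered is not lost.
   */
  public boolean waitForOnline(long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (lock) {
      while (!online) {
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          return false; // timed out: never wait forever
        }
        lock.wait(remainingMs); // also handles spurious wakeups via the loop
      }
      return true;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // The notification arrives before anyone waits: the waiter still returns at once.
    OnlineSignal early = new OnlineSignal();
    early.markOnline();
    System.out.println("already online: " + early.waitForOnline(1000));

    // The notification arrives while a caller is waiting.
    OnlineSignal later = new OnlineSignal();
    new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      later.markOnline();
    }).start();
    System.out.println("came online: " + later.waitForOnline(1000));

    // Nobody ever signals: the timeout ends the wait.
    System.out.println("never online: " + new OnlineSignal().waitForOnline(100));
  }
}
```

Compared with a fixed 1s sleep, the waiter returns as soon as the condition holds, and the timeout is only a safety net rather than the normal path.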
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4798:
---

    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4798:
---

    Status: Patch Available  (was: Open)
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153766#comment-13153766 ]

Hadoop QA commented on HBASE-4798:
--

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504426/4798_trunk_all.v10.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.
    +1 tests included.  The patch appears to include 21 new or modified tests.
    -1 javadoc.  The javadoc tool appears to have generated -166 warning messages.
    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs.  The patch appears to introduce 59 new Findbugs (version 1.3.9) warnings.
    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
    -1 core tests.  The patch failed these unit tests:
                   org.apache.hadoop.hbase.client.TestAdmin
                   org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/314//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/314//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/314//console

This message is automatically generated.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153774#comment-13153774 ]

Jonathan Hsieh commented on HBASE-2856:
---

On trunk, TestAcidGuarantees ran for a solid day and a half (33+ hours) without failing.

larsh@ I'll loop the 0.92 version, let it run through today, and report how it fared around midday Monday.

TestAcidGuarantee broken on trunk
--

                 Key: HBASE-2856
                 URL: https://issues.apache.org/jira/browse/HBASE-2856
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.89.20100621
            Reporter: ryan rawson
            Assignee: Amitanand Aiyer
            Priority: Blocker
             Fix For: 0.94.0
         Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt

TestAcidGuarantee has a test that reads a number of columns from a row; every so often the first of the N columns has a different value from the others, when all of them should be the same.

This is a bug deep inside the scanner: the first peek() of a row is done at time T, then the rest of the read is done at T+1, after a flush. The flush loses the memstoreTS data, so previously 'uncommitted' data becomes committed and is flushed to disk.

One possible solution is to add the memstoreTS (or an equivalent value) to the HFile, which would preserve read consistency across flushes. Another solution is to fix the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
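The failure mode described above can be reproduced with a toy model. Everything in it is an illustrative assumption, not HBase code: a cell carries a memstoreTs write number, a reader only sees cells whose number is at most its read point, and a flush that drops the number resets it to 0 ("visible to everyone"). The read of c1 happens before the flush and the reads of c2 and c3 after it, mirroring the peek() at T and the rest of the read at T+1. List.of requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code) of a row read that straddles a flush. */
public class ReadPointDemo {

  /** One version of a column; memstoreTs is its write number, 0 meaning "visible to all". */
  static final class Cell {
    final String column;
    final String value;
    final long memstoreTs;

    Cell(String column, String value, long memstoreTs) {
      this.column = column;
      this.value = value;
      this.memstoreTs = memstoreTs;
    }
  }

  /** Returns the newest version of column visible at readPoint (later entries are newer), or null. */
  static String read(List<Cell> store, String column, long readPoint) {
    String result = null;
    for (Cell c : store) {
      if (c.column.equals(column) && c.memstoreTs <= readPoint) {
        result = c.value;
      }
    }
    return result;
  }

  /** Simulates a flush; when keepMemstoreTs is false the write numbers are reset to 0. */
  static List<Cell> flush(List<Cell> memstore, boolean keepMemstoreTs) {
    List<Cell> hfile = new ArrayList<>();
    for (Cell c : memstore) {
      hfile.add(keepMemstoreTs ? c : new Cell(c.column, c.value, 0));
    }
    return hfile;
  }

  /** Reads c1, flushes, then reads c2 and c3 with the same read point. */
  static List<String> readRowAcrossFlush(boolean keepMemstoreTs) {
    List<Cell> memstore = List.of(
        new Cell("c1", "A", 1), new Cell("c2", "A", 1), new Cell("c3", "A", 1),  // committed
        new Cell("c1", "B", 2), new Cell("c2", "B", 2), new Cell("c3", "B", 2)); // not yet committed
    long readPoint = 1; // this scanner must not see write number 2
    List<String> row = new ArrayList<>();
    row.add(read(memstore, "c1", readPoint));           // first column, read at time T
    List<Cell> hfile = flush(memstore, keepMemstoreTs); // flush between T and T+1
    row.add(read(hfile, "c2", readPoint));              // rest of the row, read at T+1
    row.add(read(hfile, "c3", readPoint));
    return row;
  }

  public static void main(String[] args) {
    System.out.println("memstoreTS dropped:   " + readRowAcrossFlush(false)); // [A, B, B]
    System.out.println("memstoreTS preserved: " + readRowAcrossFlush(true));  // [A, A, A]
  }
}
```

With memstoreTS dropped, the row comes back as [A, B, B]: the first column differs from the rest, which is the symptom the test reports. Keeping the write number through the flush, in the spirit of the first proposed solution, returns [A, A, A].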
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4798:
---

    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-4798:
---

    Attachment: 4798_trunk_all.v10.patch
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Status: Patch Available (was: Open) I tend to think that the patch is ok and that the errors we're seeing are the usual flaky stuff, but let's try again.
[jira] [Commented] (HBASE-4815) Disable online altering by default, create a config for it
[ https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153828#comment-13153828 ] ramkrishna.s.vasudevan commented on HBASE-4815: --- Thanks, Stack. I should have finished verifying all the test cases; I usually do. As it was Friday, I left the office before the run could complete. I will be more careful next time. Disable online altering by default, create a config for it -- Key: HBASE-4815 URL: https://issues.apache.org/jira/browse/HBASE-4815 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.92.0 Attachments: 4815-v2.txt, 4815.addendum, 4815.patch There's a whole class of bugs that we've been revealing by trying out online altering in conjunction with other operations like splitting. HBASE-4729, HBASE-4794, and HBASE-4814 are examples. It's not so much that the online altering code is buggy, but that it wasn't tested in an environment that permits splitting. I think we should mark online altering as experimental in 0.92 and add a config to enable it (so it would be disabled by default, requiring people to enable it before altering a table schema online).
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153831#comment-13153831 ] Hadoop QA commented on HBASE-4798: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504430/4798_trunk_all.v10.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 21 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 59 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestAdmin
org.apache.hadoop.hbase.coprocessor.TestMasterObserver
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/315//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/315//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/315//console
This message is automatically generated.
[jira] [Created] (HBASE-4834) CopyTable: Cannot have ZK source to destination
CopyTable: Cannot have ZK source to destination --- Key: HBASE-4834 URL: https://issues.apache.org/jira/browse/HBASE-4834 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.1 Reporter: Linden Hillenbrand During a CopyTable run involving --peer.adr, we found the following block of code:
    if (address != null) {
      ZKUtil.applyClusterKeyToConf(this.conf, address);
    }
This ZK configuration is applied in the setConf method, which also gets called on the frontend when MapReduce initializes TableOutputFormat (TOF). As a result, there is currently no way to have two ZK endpoints in a single job: the source settings get reset before the job is submitted.
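The problem is that the peer cluster key is applied by mutating the shared job configuration. A minimal sketch of the alternative, assuming `java.util.Properties` as a stand-in for Hadoop's `Configuration`: the cluster key (`quorum:clientPort:znodeParent`) is applied to a copy, so the source cluster's ZooKeeper settings are left untouched. The class and method names are hypothetical, and the three property names are the ones `ZKUtil.applyClusterKeyToConf` sets to the best of our knowledge; this is not the HBase code or the HBASE-3497 fix.

```java
import java.util.Properties;

/**
 * Hypothetical illustration, not HBase code: apply a peer cluster key to a
 * copy of the configuration instead of mutating the shared one, so the
 * source cluster's ZooKeeper settings survive job submission.
 */
public final class PeerConf {
  // Keys that ZKUtil.applyClusterKeyToConf sets, to the best of our knowledge.
  static final String QUORUM = "hbase.zookeeper.quorum";
  static final String CLIENT_PORT = "hbase.zookeeper.property.clientPort";
  static final String ZNODE_PARENT = "zookeeper.znode.parent";

  private PeerConf() {}

  /** Returns a copy of {@code base} with "quorum:clientPort:znodeParent" applied. */
  static Properties withClusterKey(Properties base, String clusterKey) {
    String[] parts = clusterKey.split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException(
          "Expected quorum:clientPort:znodeParent, got: " + clusterKey);
    }
    Properties copy = new Properties();
    copy.putAll(base); // copy the source settings; base itself is never modified
    copy.setProperty(QUORUM, parts[0]);
    copy.setProperty(CLIENT_PORT, parts[1]);
    copy.setProperty(ZNODE_PARENT, parts[2]);
    return copy;
  }

  public static void main(String[] args) {
    Properties source = new Properties();
    source.setProperty(QUORUM, "src-zk1,src-zk2");
    source.setProperty(CLIENT_PORT, "2181");
    source.setProperty(ZNODE_PARENT, "/hbase");

    // A value a user might pass as --peer.adr (hypothetical host names).
    Properties peer = withClusterKey(source, "dst-zk1,dst-zk2:2181:/hbase-backup");

    System.out.println("source quorum: " + source.getProperty(QUORUM)); // src-zk1,src-zk2
    System.out.println("peer quorum:   " + peer.getProperty(QUORUM));   // dst-zk1,dst-zk2
  }
}
```

Because the copy is built where it is needed rather than in setConf, a frontend call to setConf during job submission has nothing to overwrite.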
[jira] [Updated] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linden Hillenbrand updated HBASE-4834: -- Priority: Critical (was: Major)
[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153837#comment-13153837 ] Linden Hillenbrand commented on HBASE-4834: --- Going to move the entire block in setConf to TableOutputFormat#getRecordWriter (in TableOutputFormat.java) so that it is called there instead. Harsh will test it out locally; I will submit a patch shortly.
[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153839#comment-13153839 ] Harsh J commented on HBASE-4834: The frontend calls setConf of TableOutputFormat when it is initialized for the output-spec checks upon job.submit(), and that reloads the actual ZK keys in job.xml itself (which, in Hadoop, is written _after_ checkOutputSpecs(…) and such).
[jira] [Resolved] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HBASE-4834. Resolution: Duplicate This was fixed by HBASE-3497. Resolving as a dup. Apologies for the noise, and for the confusion, Linden! Regards, Harsh
[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination
[ https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153845#comment-13153845 ] Linden Hillenbrand commented on HBASE-4834: --- No worries; at least it was fixed, and someone beat us to it. Thanks for your help with the investigation, Harsh!
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153850#comment-13153850 ] Lars Hofhansl commented on HBASE-2856: -- Thanks, Jon! TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test in which it attempts to read a number of columns from a row; every so often the first of the N columns differs from the others, when they should all be the same. This is a bug deep inside the scanner: the first peek() of a row is done at time T, and the rest of the read is done at T+1, after a flush. The flush loses the memstoreTS data, so previously 'uncommitted' data becomes committed and is flushed to disk. One possible solution is to add the memstoreTS (or an equivalent value) to the HFile, which would let us preserve read consistency across flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might, alas, return different things at different times).
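The failure mode described above can be reproduced in a toy model. The sketch below is a simplified, hypothetical model of MVCC-style reads, not HBase's scanner or memstore classes: each cell carries the write number (memstoreTS) of the write that produced it, and a reader at a given read point only sees cells at or below it. If a flush drops the write numbers, a half-finished write becomes visible and the row is torn, which is the "first column differs" symptom; keeping the write numbers (the first proposed solution) preserves the consistent view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of MVCC reads; not HBase's classes. */
public final class ReadPointDemo {
  static final class Cell {
    final String column;
    final String value;
    final long memstoreTS; // write number of the write that produced this cell

    Cell(String column, String value, long memstoreTS) {
      this.column = column;
      this.value = value;
      this.memstoreTS = memstoreTS;
    }
  }

  private ReadPointDemo() {}

  /** Latest visible value per column for a reader at {@code readPoint}; cells are in write order. */
  static Map<String, String> read(List<Cell> cells, long readPoint) {
    Map<String, String> row = new LinkedHashMap<>();
    for (Cell c : cells) {
      if (c.memstoreTS <= readPoint) {
        row.put(c.column, c.value); // a later visible write overwrites an earlier one
      }
    }
    return row;
  }

  /** Models a flush that drops memstoreTS: every cell, committed or not, gets write number 0. */
  static List<Cell> flushDroppingTS(List<Cell> cells) {
    List<Cell> flushed = new ArrayList<>();
    for (Cell c : cells) {
      flushed.add(new Cell(c.column, c.value, 0L));
    }
    return flushed;
  }

  public static void main(String[] args) {
    List<Cell> memstore = new ArrayList<>();
    memstore.add(new Cell("a", "v1", 1)); // write 1 (committed) set both columns
    memstore.add(new Cell("b", "v1", 1));
    memstore.add(new Cell("a", "v2", 2)); // write 2 is still in progress: only "a" so far

    long readPoint = 1; // the reader started before write 2 committed
    System.out.println(read(memstore, readPoint));                   // {a=v1, b=v1}
    // Flushing with memstoreTS kept in the file leaves the reader's view unchanged.
    System.out.println(read(new ArrayList<>(memstore), readPoint));  // {a=v1, b=v1}
    // Flushing with memstoreTS dropped exposes the partial write: a torn row.
    System.out.println(read(flushDroppingTS(memstore), readPoint));  // {a=v2, b=v1}
  }
}
```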
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852 ] Lars Hofhansl commented on HBASE-2856: -- @Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much more manual than I had wished or expected, so it is very possible that I missed something. (I tried to upload the 0.92 patch to Review Board for easier verification, but apparently that does not work for branches other than trunk.)
[jira] [Issue Comment Edited] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852 ] Lars Hofhansl edited comment on HBASE-2856 at 11/20/11 7:20 PM: @Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much more manual than I had wished or expected, so it is very possible that I missed something. (I tried to upload the 0.92 patch to Review Board for easier verification, but apparently that does not work for branches other than trunk.) was (Author: lhofhansl): @Nicolas and @Amit, could you review the 0.92 patch? I turned out to be much more manual than I had wished or expected, so it is very possible, that I missed something. (I tried to upload the 0.92 patch to review board for easier verification but apparently that does not work for branches other than trunk.)
[jira] [Resolved] (HBASE-4831) LRU stats thread should be a daemon thread
[ https://issues.apache.org/jira/browse/HBASE-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-4831. --- Resolution: Duplicate Duplicate of HBASE-4745, already resolved. LRU stats thread should be a daemon thread -- Key: HBASE-4831 URL: https://issues.apache.org/jira/browse/HBASE-4831 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani I have seen hung processes where the following was the only non-daemon thread:
LRU Statistics #0 prio=10 tid=0x2ab0bc04f800 nid=0x11ac waiting on condition [0x42f57000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x2aaab9a1c000 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)
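The thread in the dump is a ScheduledThreadPoolExecutor worker parked in DelayQueue.take(). Threads from `Executors.defaultThreadFactory()` are non-daemon, so such a worker keeps the JVM alive unless the executor is shut down. One common way to avoid this is to give the executor a ThreadFactory that marks its threads as daemons. The sketch below shows that general technique with hypothetical class and method names; the actual fix was made in HBASE-4745, and this is not taken from that patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a periodic stats thread that cannot keep the JVM alive. */
public final class DaemonStatsThread {
  private DaemonStatsThread() {}

  /** Returns a ThreadFactory producing daemon threads named "prefix #N". */
  static ThreadFactory daemonFactory(final String prefix) {
    final AtomicInteger counter = new AtomicInteger();
    return new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + " #" + counter.getAndIncrement());
        t.setDaemon(true); // the JVM may exit while this thread is parked
        return t;
      }
    };
  }

  public static void main(String[] args) {
    ScheduledExecutorService stats =
        Executors.newScheduledThreadPool(1, daemonFactory("LRU Statistics"));
    stats.scheduleAtFixedRate(() -> System.out.println("stats"), 5, 5, TimeUnit.MINUTES);
    // main returns without calling shutdown(); because the worker is a daemon,
    // the JVM can still exit instead of hanging as in the dump above.
    System.out.println("main done");
  }
}
```

Calling `shutdown()` on every exit path would also work, but a daemon factory is a safety net for paths (such as an abort) that skip the cleanup.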
[jira] [Updated] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-2418: -- Resolution: Fixed Fix Version/s: 0.94.0 Release Note: This adds support for protecting the state of HBase znodes on a multi-tenant ZooKeeper cluster. This support requires ZK 3.4.0. It is a companion patch to HBASE-2742 (secure RPC) and HBASE-3025 (coprocessor-based access control). SASL authentication of ZooKeeper clients with the quorum is handled in the ZK client, independently of HBase concerns. To enable strong ZK authentication, one must create a suitable JAAS configuration, for example:
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/hbase/conf/hbase.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/$HOSTNAME";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="/etc/hbase/conf/hbase.keytab"
  principal="hbase/$HOSTNAME";
};
and then configure both the client and server processes to use it, for example in hbase-env.sh:
HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config=/etc/hbase/conf/jaas.conf"
HBASE_OPTS="${HBASE_OPTS} -Dzookeeper.kerberos.removeHostFromPrincipal=true"
HBASE_OPTS="${HBASE_OPTS} -Dzookeeper.kerberos.removeRealmFromPrincipal=true"
HBase will then secure all znodes except for a few world-readable, read-only ones that clients need to look up region locations. All internal cluster operations will be protected from unauthenticated ZK clients and from clients not authenticated as the HBase principal. Presumably, the only ZK clients authenticated as the HBase principal will be those embedded in the master and regionservers. We will pull in a Hadoop artifact patched with HADOOP-7070 when building under the security profile (-P security). 0.20.205 does not yet include HADOOP-7070; without it, the JAAS configuration required for secure operation of the ZooKeeper client will be ignored.
Status: Resolved (was: Patch Available) Committed to trunk and 0.92. TestZooKeeperACL passes locally with and without '-P security'. It does not break the build if '-P security' is not specified. The test failures found by HudsonQA are not directly related to this change. add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0 Attachments: HBASE-2418-6.patch, HBASE-2418-6.patch Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that more than one client service shares a single ZooKeeper service instance (cluster). In this case, the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running HBase and Solr and Neo4j, or multiple HBase instances, etc.: having authentication/authorization on the znodes is important both for security and for helping to ensure that services don't interact negatively (touch each other's data). Today HBase does not have support for authentication or authorization. This should be added to the HBase clients that are accessing the ZK cluster. In general, it means calling addAuthInfo once after a session is established: http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[]) with a user-specific credential, often a shared secret or certificate. You may be able to configure this statically in some cases (a config string or a file to read from); however, in my case in particular you may need to access it programmatically, which adds complexity, as the end user may need to load code into HBase to access the credential.
Secondly, you need to specify a non-world ACL when interacting with znodes (primarily on create): http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html Feel free to ping the ZooKeeper team if you have questions. It might also be good to discuss this with some potential end users, in particular regarding how the end user can specify the credential.
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153889#comment-13153889 ] Hudson commented on HBASE-2418: --- Integrated in HBase-0.92 #152 (See [https://builds.apache.org/job/HBase-0.92/152/]) HBASE-2418 Support for ZooKeeper authentication apurtell : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/pom.xml * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153888#comment-13153888 ] Hudson commented on HBASE-2418: --- Integrated in HBase-TRUNK #2466 (See [https://builds.apache.org/job/HBase-TRUNK/2466/]) HBASE-2418 Support for ZooKeeper authentication apurtell : Files : * /hbase/trunk/pom.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java
[jira] [Updated] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-2418: -- Attachment: 2418.addendum Addendum adds Gary's maven repository to pom
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153894#comment-13153894 ] Ted Yu commented on HBASE-2418: --- Applied addendum to 0.92 branch. Build 153 is running tests at this moment.
[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4809: --- Attachment: D483.3.patch mbautin updated the revision [jira] [HBASE-4809] Per-CF set RPC metrics. Reviewers: nspiegelberg, JIRA, Kannan, Karthik. Rebasing on the most recent changes to trunk and fixing a bug in StoreScanner. Unit tests pass; I will start a run on Hudson. Cluster testing is still to be done. REVISION DETAIL https://reviews.facebook.net/D483 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaMetrics.java Per-CF set RPC metrics -- Key: HBASE-4809 URL: https://issues.apache.org/jira/browse/HBASE-4809 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D483.1.patch, D483.2.patch, D483.3.patch Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to trunk. For each mutation signature (the set of column families involved in an RPC request) we increment several metrics, which allows access patterns to be monitored. Guarding against an explosion in the number of metrics is dealt with in HBASE-4638 (which might even be implemented as part of this JIRA).
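To illustrate the "mutation signature" idea, here is a toy registry, assuming metrics are plain counters keyed by the set of column families plus an operation name. The key format `cfset=...,op=...` and the class `CfSetMetrics` are hypothetical and are not the naming used by `SchemaMetrics`. Sorting and de-duplicating the families makes the key independent of the order in which they appear in the request.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy per-CF-set metric registry (hypothetical naming, not SchemaMetrics).
public class CfSetMetrics {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Canonical signature: sorted, de-duplicated family names joined by '_'. */
    public static String signature(Collection<String> families) {
        return String.join("_", new TreeSet<>(families));
    }

    private static String key(Collection<String> families, String op) {
        return "cfset=" + signature(families) + ",op=" + op;
    }

    /** Increments the counter for (family set, operation); returns the new value. */
    public long increment(Collection<String> families, String op) {
        return counters.computeIfAbsent(key(families, op), k -> new AtomicLong())
                .incrementAndGet();
    }

    /** Current value of a counter, or 0 if it was never incremented. */
    public long get(Collection<String> families, String op) {
        AtomicLong c = counters.get(key(families, op));
        return c == null ? 0 : c.get();
    }

    /** Number of distinct metrics created so far. */
    public int size() {
        return counters.size();
    }
}
```

The number of keys grows with the number of distinct family sets that clients actually use, which is the metric-explosion concern deferred to HBASE-4638; `size()` makes that growth observable.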
[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4809: -- Attachment: HBASE-4809_Per_CF_set_RPC_metrics.patch This corresponds to D483.3.patch.
[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4809: -- Release Note: Testing the patch on Hudson. Status: Patch Available (was: Open)
[jira] [Issue Comment Edited] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852 ] Lars Hofhansl edited comment on HBASE-2856 at 11/20/11 11:38 PM: - @Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much more manual than I had wished or expected, so it is very possible that I missed something. (I tried to upload the 0.92 patch to review board for easier verification, but apparently that does not work for branches other than trunk.) was (Author: lhofhansl): @Nicolas and @Amit, could you review the 0.92 patch? I turned out to be much more manual than I had wished or expected, so it is very possible that I missed something. (I tried to upload the 0.92 patch to review board for easier verification, but apparently that does not work for branches other than trunk.) TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first of the N columns differs from the others, when they should all be the same. This is a bug deep inside the scanner: the first peek() of a row is done at time T, then the rest of the read is done at T+1 after a flush, so the memstoreTS data is lost and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to write the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency across flushes.
Another solution involves fixing the scanners so that peek() is not destructive (though, alas, it might then return different things at different times).
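The failure mode can be reduced to a read-point check. In this toy model (not HBase's scanner or flush code; all names are hypothetical), each cell carries a memstoreTS, and a reader whose read point is R should only see cells with memstoreTS <= R. A flush that drops the memstoreTS, modeled here as resetting it to 0, makes a cell that was still uncommitted for an in-flight reader suddenly visible, while a flush that keeps the value, as in the first proposed solution, does not.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the HBASE-2856 read-consistency bug; not HBase code.
public class ReadPointDemo {

    static final class Cell {
        final String column;
        final long memstoreTs; // write number assigned when the cell was written
        Cell(String column, long memstoreTs) {
            this.column = column;
            this.memstoreTs = memstoreTs;
        }
    }

    /** Columns visible to a reader whose read point is readPoint. */
    static List<String> visible(List<Cell> cells, long readPoint) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.memstoreTs <= readPoint) {
                out.add(c.column);
            }
        }
        return out;
    }

    /** Flush that discards memstoreTS (0 = visible to every reader). */
    static List<Cell> lossyFlush(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.column, 0L));
        }
        return out;
    }

    /** Flush that keeps memstoreTS in the file, as the first proposal suggests. */
    static List<Cell> preservingFlush(List<Cell> cells) {
        return new ArrayList<>(cells);
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("c1", 3)); // committed before the reader started
        cells.add(new Cell("c2", 7)); // written after the reader's read point
        long readPoint = 5;
        System.out.println(visible(cells, readPoint));                  // [c1]
        System.out.println(visible(lossyFlush(cells), readPoint));      // [c1, c2]
        System.out.println(visible(preservingFlush(cells), readPoint)); // [c1]
    }
}
```

A scanner that reads part of a row before such a lossy flush and the rest after it would combine both views, which matches the "first column differs" symptom described in the issue.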
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153913#comment-13153913 ] Hudson commented on HBASE-2418: --- Integrated in HBase-0.92 #153 (See [https://builds.apache.org/job/HBase-0.92/153/]) HBASE-2418 Addendum adds Gary's maven repo to pom.xml tedyu : Files : * /hbase/branches/0.92/pom.xml
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153921#comment-13153921 ] Andrew Purtell commented on HBASE-2418: --- Thanks Ted. I thought that went in with HBASE-3025.
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153923#comment-13153923 ] Andrew Purtell commented on HBASE-2418: --- And it looks like this part of the POM in trunk is not in the POM on 0.92:
{code}
<pluginRepositories>
  <pluginRepository>
    <id>ghelmling.testing</id>
    <name>Gary Helmling test repo</name>
    <url>http://people.apache.org/~garyh/mvn/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <releases>
      <enabled>true</enabled>
    </releases>
  </pluginRepository>
</pluginRepositories>
{code}
I don't know enough about Maven or how Gary set up the security profile to know if it is needed or not. Gary?
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153925#comment-13153925 ] Jonathan Hsieh commented on HBASE-2856: --- @larsh I posted it for you here. https://reviews.apache.org/r/2893/ I applied the patch, committed it and generated a git-patch via 'git format-patch HEAD^', which has enough info to find the right branch.
[jira] [Commented] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153930#comment-13153930 ] Hadoop QA commented on HBASE-4809: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504450/HBASE-4809_Per_CF_set_RPC_metrics.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -166 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 61 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.regionserver.wal.TestLogRolling Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/316//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/316//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/316//console This message is automatically generated.
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153931#comment-13153931 ] Hudson commented on HBASE-2418: --- Integrated in HBase-0.92-security #2 (See [https://builds.apache.org/job/HBase-0.92-security/2/]) HBASE-2418 Addendum adds Gary's maven repo to pom.xml HBASE-2418 Support for ZooKeeper authentication tedyu : Files : * /hbase/branches/0.92/pom.xml apurtell : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/pom.xml * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153934#comment-13153934 ] Andrew Purtell commented on HBASE-2418: --- Hudson returned another build failure report. I committed the above to the 0.92 POM. add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that more than one client service would like to share a single ZooKeeper service instance (cluster). In this case the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running HBase, Solr and Neo4j, or multiple HBase instances: having authentication/authorization on the znodes is important both for security and for ensuring that services don't interact negatively (touch each other's data). Today HBase does not support authentication or authorization. This should be added to the HBase clients that access the ZK cluster. In general it means calling addAuthInfo once after a session is established: http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[]) with a user-specific credential; often this is a shared secret or certificate. You may be able to configure this statically in some cases (a config string or a file to read from); however, in my case in particular you may need to access it programmatically, which adds complexity because the end user may need to load code into HBase to access the credential.
Second, you need to specify a non-world ACL when interacting with znodes (primarily on create): http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html Feel free to ping the ZooKeeper team if you have questions. It might also be good to discuss this with some potential end users, in particular regarding how the end user can specify the credential. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
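The two steps in the description (authenticate with addAuthInfo, then create znodes with a non-world ACL) are easiest to see with ZooKeeper's "digest" scheme. As we understand the ZooKeeper programmer's guide, a client sends "user:password" via addAuthInfo("digest", ...), while the ACL entry stores the id user:base64(sha1(user:password)). The sketch below reproduces only that id computation and the equality check with the JDK, so it runs without a ZooKeeper server. ZkDigestSketch, digestId and matches are hypothetical names; real client code would instead use ZooKeeper#addAuthInfo, DigestAuthenticationProvider.generateDigest, and new ACL(ZooDefs.Perms.ALL, new Id("digest", ...)) when calling create.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: models ZooKeeper's "digest" ACL id format using only the JDK.
public class ZkDigestSketch {

    // Builds "user:base64(sha1(user:password))" from a "user:password" credential.
    static String digestId(String idPassword) {
        int colon = idPassword.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected user:password");
        }
        String user = idPassword.substring(0, colon);
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(idPassword.getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Models the ACL check: the credential a session registered via addAuthInfo
    // grants access only if its digest equals the id stored in the znode's ACL.
    static boolean matches(String aclId, String sessionCredential) {
        return aclId.equals(digestId(sessionCredential));
    }

    public static void main(String[] args) {
        String aclId = digestId("hbase:secret"); // id a tenant would place in its znode ACLs
        System.out.println(aclId);
        System.out.println(matches(aclId, "hbase:secret")); // true: same tenant
        System.out.println(matches(aclId, "solr:secret"));  // false: another tenant
    }
}
```

One consequence of this design: only the digest is stored on the znode, so reading a znode's ACL does not reveal the password, while the plain credential is sent once per session through addAuthInfo.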
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153942#comment-13153942 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/ --- Review request for hbase, Todd Lipcon and Jonathan Robie. Summary --- Distributed log splitting coding enhancement to make it easier to understand, no semantics change. These are issues raised during the code review when backporting this feature to CDH. This addresses bug HBASE-4820. https://issues.apache.org/jira/browse/HBASE-4820 Diffs - src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8 Diff: https://reviews.apache.org/r/2895/diff Testing --- Ran unit tests. Non-flaky tests are green. Two client tests did not pass; they are unrelated to this change.
Thanks, Jimmy Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch While reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153966#comment-13153966 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/#review3385 --- src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java https://reviews.apache.org/r/2895/#comment7563 handleDeadWorkers would be a better method name. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7562 Please remove the whitespace. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7564 retry_count is the remaining count. This log message should be clearer. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7565 Can we implement this item now? src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7566 We should say 'remaining retries=' src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7567 Please adjust indentation. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7571 Please adjust indentation for these 4 lines. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java https://reviews.apache.org/r/2895/#comment7570 Should read 'splitlog workers' src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java https://reviews.apache.org/r/2895/#comment7572 Adjust indentation, please. - Ted On 2011-11-21 02:06:29, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2895/ bq. --- bq. bq. (Updated 2011-11-21 02:06:29) bq. bq. bq. Review request for hbase, Todd Lipcon and Jonathan Robie. bq. bq.
bq. Summary bq. --- bq. bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change. bq. It is some issue raised during the code review in back porting this feature to CDH. bq. bq. bq. This addresses bug HBASE-4820. bq. https://issues.apache.org/jira/browse/HBASE-4820 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8 bq. bq. Diff: https://reviews.apache.org/r/2895/diff bq. bq. bq. Testing bq. --- bq. bq. Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change. bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq.
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153968#comment-13153968 ] Hudson commented on HBASE-2418: --- Integrated in HBase-0.92 #154 (See [https://builds.apache.org/job/HBase-0.92/154/]) Amend HBASE-2418 Add pluginRepositories to POM apurtell : Files : * /hbase/branches/0.92/pom.xml
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153969#comment-13153969 ] gaojinchao commented on HBASE-4739: --- HBASE-4739_trail5 makes a few changes. Please review; if it makes sense, I will verify it on a real cluster. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday: when the master died, it had just created the RIT znode for a region but had not yet told the RS to close it. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13152585#comment-13152585 ] Ted Yu edited comment on HBASE-4739 at 11/21/11 4:09 AM: - @J-D For the 0.92 version, use HBASE-4739_Trunk_V2, which sends a CLOSING RPC from the timeout monitor (I am trying to modify this patch). In trunk, use patch 4739_trialV3. HBase is used by thousands of people. If this problem occurred once, it may occur again, so I think we need to solve this issue. What do you say, J-D? I will do some more detailed testing of these patches and post my test cases. was (Author: sunnygao): @J-D In 0.92 version, uses HBASE-4739_Trunk_V2 in timeout monitor for sending a CLOSING rpc.(I try to modify this patch) In trunk, uses patch 4739_trialV3. Hbase thousands of people in the use of, If we once, may appear more. So I think we need slove this isse. What do you say J-D? I will do some more detailed testing about these patches and give my test cases.
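The direction described for 0.92, having the timeout monitor send the close RPC instead of logging "doing nothing", can be sketched as a single monitor pass. This is a hypothetical model, not the HBASE-4739 patch: RegionState, CloseRpc and chore are invented names, and the sketch assumes that re-sending a close to the hosting region server is harmless even if the first one did get through, which the real AssignmentManager would have to guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a regions-in-transition timeout monitor; not HBase's AssignmentManager.
public class ClosingTimeoutSketch {

    enum State { OPENING, CLOSING }

    static final class RegionState {
        final String region;
        final State state;
        final long stampMs; // when the region entered this state

        RegionState(String region, State state, long stampMs) {
            this.region = region;
            this.state = state;
            this.stampMs = stampMs;
        }
    }

    // Hypothetical hook that asks the hosting region server to close a region.
    interface CloseRpc {
        void sendClose(String region);
    }

    // One pass of the monitor: re-sends the close for regions stuck in CLOSING.
    static List<String> chore(Map<String, RegionState> rit, long nowMs, long timeoutMs, CloseRpc rpc) {
        List<String> retried = new ArrayList<>();
        for (RegionState rs : rit.values()) {
            if (rs.state == State.CLOSING && nowMs - rs.stampMs > timeoutMs) {
                // The previous master may have died after creating the CLOSING znode
                // but before sending the RPC, so waiting alone never resolves it.
                rpc.sendClose(rs.region);
                retried.add(rs.region);
            }
        }
        return retried;
    }

    public static void main(String[] args) {
        Map<String, RegionState> rit = new LinkedHashMap<>();
        rit.put("stuck", new RegionState("stuck", State.CLOSING, 0L));
        rit.put("fresh", new RegionState("fresh", State.CLOSING, 50_000L));
        rit.put("opening", new RegionState("opening", State.OPENING, 0L));

        List<String> sent = new ArrayList<>();
        List<String> retried = chore(rit, 60_000L, 30_000L, sent::add);
        System.out.println(retried); // prints [stuck]
    }
}
```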
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153974#comment-13153974 ] Gary Helmling commented on HBASE-2418: -- The {{<pluginRepositories/>}} entry was added for HBASE-4763/HBASE-4781 for the custom maven-surefire build. It's not needed for the security components and, as far as I can tell, should not be in the 0.92 branch (HBASE-4781 is marked for 0.94).
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153975#comment-13153975 ] Hudson commented on HBASE-2418: --- Integrated in HBase-0.92-security #3 (See [https://builds.apache.org/job/HBase-0.92-security/3/]) Amend HBASE-2418 Add pluginRepositories to POM apurtell : Files : * /hbase/branches/0.92/pom.xml
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153976#comment-13153976 ] Gary Helmling commented on HBASE-2418: -- http://monitoring.apache.org/status/ is showing that people.apache.org (minotaur.apache.org) is down. This is probably the cause of the build failures, which show connection timeouts when retrieving artifacts from my repo.
[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+
[ https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153977#comment-13153977 ] stack commented on HBASE-4830: -- Thanks, Todd. Here's the OOME out in IPC: {code} Exception in thread "IPC Reader 8 on port 7003" java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1157) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} Because it happened out here, we don't get the benefit of HBASE-4769 and abort immediately. We need a fix so that we abort immediately instead of hanging around in zombie mode, as this server was doing.
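One possible shape for the fix asked for above, as a minimal sketch: wrap the reader thread's loop so that an OutOfMemoryError calls the server's abort hook instead of ending only that thread and leaving the server in zombie mode. Abortable, abortOnOome and the simulated error are hypothetical stand-ins, not HBaseServer's actual code; installing a Thread.UncaughtExceptionHandler on the reader threads would be an alternative way to get the same effect.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AbortOnOomeSketch {

    // Hypothetical stand-in for the region server's abort hook.
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    // Wraps a worker loop so an OutOfMemoryError aborts the whole server
    // instead of silently ending just this thread (the "zombie" case).
    static Runnable abortOnOome(Runnable body, Abortable server) {
        return () -> {
            try {
                body.run();
            } catch (OutOfMemoryError e) {
                server.abort("OutOfMemoryError in " + Thread.currentThread().getName(), e);
                // The server is going down, so this thread simply exits.
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> abortReason = new AtomicReference<>();
        Abortable server = (why, cause) -> abortReason.set(why);

        // Simulated reader body: fails the way ByteBuffer.allocate did in the trace above.
        Runnable reader = () -> {
            throw new OutOfMemoryError("Java heap space (simulated)");
        };

        Thread t = new Thread(abortOnOome(reader, server), "IPC Reader 8 on port 7003");
        t.start();
        t.join();
        // prints "aborted: OutOfMemoryError in IPC Reader 8 on port 7003"
        System.out.println("aborted: " + abortReason.get());
    }
}
```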
Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+ --- Key: HBASE-4830 URL: https://issues.apache.org/jira/browse/HBASE-4830 Project: HBase Issue Type: Bug Reporter: stack Attachments: hbase-stack-regionserver-sv4r9s38.out Running 0.20.205.1 (I was not at the tip of the branch), I ran into the following hung regionserver: {code} regionserver7003.logRoller daemon prio=10 tid=0x7fd98028f800 nid=0x61af in Object.wait() [0x7fd987bfa000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606) - locked 0xf8656788 (a java.util.LinkedList) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687) - locked 0xf8656458 (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86) at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966) - locked 0xf8655998 (a org.apache.hadoop.io.SequenceFile$Writer) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214) at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791) at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578) - locked 0xc443deb0 (a java.lang.Object) at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94) at java.lang.Thread.run(Thread.java:662) {code} Other threads are like this (here's a sample): {code} regionserver7003.logSyncer daemon prio=10 tid=0x7fd98025e000 nid=0x61ae waiting for monitor entry [0x7fd987cfb000] java.lang.Thread.State:
BLOCKED (on object monitor) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074) - waiting to lock 0xc443deb0 (a java.lang.Object) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057) at java.lang.Thread.run(Thread.java:662) IPC Server handler 0 on 7003 daemon prio=10 tid=0x7fd98049b800 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007) - waiting to lock 0xc443deb0 (a java.lang.Object) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980) at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at
[jira] [Commented] (HBASE-4833) HRegionServer stops could be 0,5s faster
[ https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153981#comment-13153981 ] stack commented on HBASE-4833: -- +1 on the patch, but I can't commit it if it breaks TestRegionServerCoprocessorExceptionWithAbort. Any chance of including a fix for that, N? HRegionServer stops could be 0,5s faster Key: HBASE-4833 URL: https://issues.apache.org/jira/browse/HBASE-4833 Project: HBase Issue Type: Improvement Components: regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4833_trunk_hregionserver.patch The current implementation of HRegionServer#stop is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. See HBASE-4832
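Why notifyAll() on the region server "does nothing" can be reproduced in a small, self-contained model. The sketch assumes, as the description implies, that run() blocks inside a sleeper object's own wait() rather than on the region server's monitor; MiniSleeper, Server, stopBroken, stopFixed and timeToStopMs are hypothetical names, and the real HBase Sleeper class may differ in detail. Because notifyAll() only wakes threads waiting on the same object, the broken variant waits out the rest of the sleep period, while the fixed variant returns almost at once.

```java
// Hypothetical, simplified model of the HBASE-4833 stop() bug; not HRegionServer itself.
public class StopWakeupSketch {

    // Stand-in for a sleeper: the run loop blocks on this object's private monitor.
    static final class MiniSleeper {
        private final Object lock = new Object();
        private final long periodMs;
        private boolean skip = false;

        MiniSleeper(long periodMs) { this.periodMs = periodMs; }

        // Sleeps up to periodMs, returning early once skipSleepCycle() is called.
        void sleep() throws InterruptedException {
            synchronized (lock) {
                long deadline = System.currentTimeMillis() + periodMs;
                while (!skip) { // the loop also absorbs spurious wakeups
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        break;
                    }
                    lock.wait(remaining);
                }
                skip = false;
            }
        }

        void skipSleepCycle() {
            synchronized (lock) {
                skip = true;
                lock.notifyAll();
            }
        }
    }

    static final class Server implements Runnable {
        private volatile boolean stopped = false;
        private final MiniSleeper sleeper;

        Server(long periodMs) { this.sleeper = new MiniSleeper(periodMs); }

        @Override
        public void run() {
            try {
                while (!stopped) {
                    sleeper.sleep(); // periodic work would go here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Mirrors the original stop(): notifies a monitor nobody is waiting on.
        void stopBroken() {
            stopped = true;
            synchronized (this) {
                notifyAll();
            }
        }

        // Mirrors the proposed stop(): wakes the monitor run() is actually blocked on.
        void stopFixed() {
            stopped = true;
            sleeper.skipSleepCycle();
        }
    }

    // Starts a server, lets it fall asleep, stops it, and returns how long the stop took.
    static long timeToStopMs(boolean fixed, long periodMs) throws InterruptedException {
        Server server = new Server(periodMs);
        Thread t = new Thread(server, "server-run-loop");
        t.start();
        Thread.sleep(100); // give run() time to enter sleep()
        long start = System.nanoTime();
        if (fixed) {
            server.stopFixed();
        } else {
            server.stopBroken();
        }
        t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("broken stop took " + timeToStopMs(false, 1000) + " ms"); // about the rest of the period
        System.out.println("fixed stop took " + timeToStopMs(true, 1000) + " ms");   // close to 0
    }
}
```

Setting a flag inside skipSleepCycle(), rather than only notifying, also covers a stop that arrives just before run() enters sleep(): the next sleep() returns immediately instead of missing the wakeup.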
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153983#comment-13153983 ] stack commented on HBASE-2856: -- You fellas want this in 0.92? I want to cut a 0.92 RC. I have had 0.92 tests passing up on Jenkins a few times in a row now, and all criticals and blockers are in. Should we wait? Or should we cut the RC and get this into the second RC (I'm sure there'll be one)? TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T, then the rest of the read is done at T+1 after a flush; thus the memstoreTS data is lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) into the HFile, thus allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
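The failure mode in the description can be modeled in a few lines. This is a simplified sketch under our own assumptions: each cell carries the memstoreTS of the write that produced it, a scanner only sees cells whose memstoreTS is at or below its read point, and a flush that loses memstoreTS is modeled by resetting it to 0. Reading column a before the flush and column b after it then mixes two writes, which is the torn row TestAcidGuarantee detects. Cell, read, flushDroppingTs and flushKeepingTs are hypothetical names; flushKeepingTs stands for the first proposed solution of persisting memstoreTS in the HFile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of read-point visibility across a flush; not HBase's scanner code.
public class ReadPointSketch {

    static final class Cell {
        final String column;
        final String value;
        final long memstoreTs; // write number of the transaction that wrote this cell

        Cell(String column, String value, long memstoreTs) {
            this.column = column;
            this.value = value;
            this.memstoreTs = memstoreTs;
        }
    }

    // Latest value per column visible to a scanner at readPoint; cells are in write order.
    static Map<String, String> read(List<Cell> cells, long readPoint) {
        Map<String, Cell> latest = new TreeMap<>();
        for (Cell c : cells) {
            if (c.memstoreTs > readPoint) {
                continue; // written by a transaction this reader must not see yet
            }
            Cell prev = latest.get(c.column);
            if (prev == null || c.memstoreTs >= prev.memstoreTs) {
                latest.put(c.column, c); // ties go to the later write
            }
        }
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, Cell> e : latest.entrySet()) {
            out.put(e.getKey(), e.getValue().value);
        }
        return out;
    }

    // Flush that loses memstoreTS: every flushed cell looks committed (the bug).
    static List<Cell> flushDroppingTs(List<Cell> memstore) {
        List<Cell> file = new ArrayList<>();
        for (Cell c : memstore) {
            file.add(new Cell(c.column, c.value, 0L));
        }
        return file;
    }

    // Flush that keeps memstoreTS in the file (the first proposed solution).
    static List<Cell> flushKeepingTs(List<Cell> memstore) {
        return new ArrayList<>(memstore);
    }

    public static void main(String[] args) {
        List<Cell> memstore = new ArrayList<>();
        memstore.add(new Cell("a", "v1", 1)); // write 1: committed
        memstore.add(new Cell("b", "v1", 1));
        memstore.add(new Cell("a", "v2", 2)); // write 2: not visible at read point 1
        memstore.add(new Cell("b", "v2", 2));
        long readPoint = 1;

        // Column a is read at time T from the memstore; column b at T+1, after a flush.
        String a = read(memstore, readPoint).get("a");
        String bBuggy = read(flushDroppingTs(memstore), readPoint).get("b");
        String bFixed = read(flushKeepingTs(memstore), readPoint).get("b");
        System.out.println(a + " " + bBuggy); // prints "v1 v2": a torn row
        System.out.println(a + " " + bFixed); // prints "v1 v1": consistent
    }
}
```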
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153985#comment-13153985 ] stack commented on HBASE-2856: -- Do all tests pass with the 0.92 version of this patch in place?
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153986#comment-13153986 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/#review3388 ---
src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/2895/#comment7574
Can we make this message clearer? Something like: Unexpected state: statename. Cannot transit znode state from currentState to OFFLINE.
- ramkrishna
On 2011-11-21 02:06:29, Jimmy Xiang wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2895/
bq. ---
bq. (Updated 2011-11-21 02:06:29)
bq. Review request for hbase, Todd Lipcon and Jonathan Robie.
bq. Summary
bq. ---
bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change.
bq. It addresses some issues raised during the code review when backporting this feature to CDH.
bq. This addresses bug HBASE-4820.
bq. https://issues.apache.org/jira/browse/HBASE-4820
bq. Diffs
bq. -
bq.   src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653
bq.   src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1
bq.   src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f
bq.   src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8
bq. Diff: https://reviews.apache.org/r/2895/diff
bq. Testing
bq. ---
bq. Ran unit tests. Non-flaky tests are green. Two client tests didn't pass; they are not related to this change.
bq. Thanks,
bq. Jimmy
Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch
While reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153999#comment-13153999 ] Hadoop QA commented on HBASE-4739: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504464/HBASE-4739_trail5.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -166 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 60 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/317//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/317//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/317//console
This message is automatically generated.
Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday. When the master died, it had just created the RIT znode for a region but had not yet told the RS to close it. When the master restarted, it saw the znode and started printing this:
{quote}
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
{quote}
The close is never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154001#comment-13154001 ] ramkrishna.s.vasudevan commented on HBASE-4739: --- This is similar to the initial version you had given. +1 with the change. As I said, the RS will not allow the transition to happen again anyway if it is already processing it. But we need to confirm whether we need to handle RegionAlreadyInTransitionException. It may be needed.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154002#comment-13154002 ] Lars Hofhansl commented on HBASE-2856: -- Re: 0.92, I was going by your comment above:
bq. If someone did it in next day or so, I'd be up for having it committed to 0.92 in time for first RC.
It's not entirely clean yet:
{noformat}
Results :

Failed tests:
  testClosing(org.apache.hadoop.hbase.client.TestHCM)
  testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide): expected:17576 but was:28064
  testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): Scanned more than expected (6000)
  testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): Scanned more than expected (6000)
  testSplitWhileBulkLoadPhase(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery)
  testGroupOrSplitPresplit(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery)
  testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction)
  testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction)
  testBasicSplit(org.apache.hadoop.hbase.regionserver.TestHRegion)

Tests in error:
  testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster): test timed out after 30 milliseconds

Tests run: 1065, Failures: 9, Errors: 1, Skipped: 7
{noformat}
I have no time to look at these tonight, though. But that probably points to another RC. It would sure be nice if the ACID guarantees that HBase claims were met in 0.92 :)
[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+
[ https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154004#comment-13154004 ] Todd Lipcon commented on HBASE-4830: We could ship with a config like -XX:OnOutOfMemoryError="kill -9 %p", or whatever that trick is...
Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+ --- Key: HBASE-4830 URL: https://issues.apache.org/jira/browse/HBASE-4830 Project: HBase Issue Type: Bug Reporter: stack Attachments: hbase-stack-regionserver-sv4r9s38.out
Running 0.20.205.1 (I was not at the tip of the branch), I ran into the following hung regionserver:
{code}
regionserver7003.logRoller daemon prio=10 tid=0x7fd98028f800 nid=0x61af in Object.wait() [0x7fd987bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
    - locked 0xf8656788 (a java.util.LinkedList)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
    - locked 0xf8656458 (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
    - locked 0xf8655998 (a org.apache.hadoop.io.SequenceFile$Writer)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
    - locked 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
{code}
Other threads look like this (here's a sample):
{code}
regionserver7003.logSyncer daemon prio=10 tid=0x7fd98025e000 nid=0x61ae waiting for monitor entry [0x7fd987cfb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
    at java.lang.Thread.run(Thread.java:662)

IPC Server handler 0 on 7003 daemon prio=10 tid=0x7fd98049b800 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
    at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
{code}
Looks like HDFS-1529? (Todd?)
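The dump shows logRoller holding the monitor 0xc443deb0 inside HLog.rollWriter() while it waits for DFS acks, so logSyncer and the IPC handlers calling HLog.append() sit in BLOCKED waiting for that same monitor. The sketch below reproduces only that locking pattern with plain JDK classes; the thread names and the CountDownLatch standing in for the never-arriving DFS ack are illustrative assumptions, not HBase or HDFS code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Reproduces the pattern from the dump: one thread waits on I/O while holding a monitor,
// so every other thread that needs that monitor shows up as BLOCKED.
class LockHoldDemo {
    public static void main(String[] args) {
        System.out.println("IPC handler state: " + observeHandlerState());
    }

    static Thread.State observeHandlerState() {
        final Object updateLock = new Object();                  // plays the role of 0xc443deb0 in the dump
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch ackArrived = new CountDownLatch(1); // stands in for the DFS ack

        Thread roller = new Thread(() -> {
            synchronized (updateLock) {                          // like HLog.rollWriter()
                lockHeld.countDown();
                awaitQuietly(ackArrived);                        // waits for the ack while holding the lock
            }
        }, "logRoller");
        Thread handler = new Thread(() -> {
            synchronized (updateLock) {                          // like HLog.append()
                // the append would happen here
            }
        }, "IPC handler");

        roller.start();
        awaitQuietly(lockHeld);
        handler.start();
        Thread.State state = pollForState(handler, Thread.State.BLOCKED, 5000);
        ackArrived.countDown();                                  // release the roller so the demo can finish
        joinQuietly(roller);
        joinQuietly(handler);
        return state;
    }

    // Polls t.getState() until it equals 'wanted' or the timeout expires; returns the last state seen.
    static Thread.State pollForState(Thread t, Thread.State wanted, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Thread.State s = t.getState();
        while (s != wanted && System.nanoTime() < deadline) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the real regionserver the ack never arrives, so nothing releases the roller and every writer stays BLOCKED; the demo releases the latch only so that it terminates.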
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154009#comment-13154009 ] Lars Hofhansl commented on HBASE-2856: -- @Jon: Thanks for uploading to RB, btw.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154010#comment-13154010 ] gaojinchao commented on HBASE-4739: --- I think we don't need to handle RegionAlreadyInTransitionException. We only need to update the timestamp of the RIT, which we have done. My reasons are:
1. The monitor timeout is 30 minutes, which is enough time to close a region.
2. If the RS throws RegionAlreadyInTransitionException, we need to update the timestamp of the RIT and wait for the next timeout.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154026#comment-13154026 ] ramkrishna.s.vasudevan commented on HBASE-4739: --- Yes Gao, that is what I meant when I said we need to handle RegionAlreadyInTransitionException: explicitly catch that exception in the catch block of unassign() and handle it by updating the timestamp of the RIT. Good work.
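The handling agreed on in this thread can be sketched as follows. Everything here is hypothetical: UnassignSketch, RegionServerStub, the ritTimestamps map and the local RegionAlreadyInTransitionException class are simplified stand-ins, not the actual AssignmentManager code, and TIMEOUT_MS uses the 30-minute monitor timeout mentioned by gaojinchao.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: unassign() refreshes the RIT timestamp when the RS is already closing the region.
class UnassignSketch {
    // Timeout-monitor period taken from the discussion (30 minutes).
    static final long TIMEOUT_MS = 30L * 60 * 1000;

    // Local stand-in for the exception a region server throws when the region is already in transition.
    static class RegionAlreadyInTransitionException extends Exception {
        RegionAlreadyInTransitionException(String region) {
            super(region + " is already in transition");
        }
    }

    // Local stand-in for the master-to-RS close call.
    interface RegionServerStub {
        void closeRegion(String region) throws RegionAlreadyInTransitionException;
    }

    // Region name -> time of the last update to its regions-in-transition entry.
    final Map<String, Long> ritTimestamps = new HashMap<>();

    void unassign(String region, RegionServerStub rs, long now) {
        try {
            rs.closeRegion(region);
            ritTimestamps.put(region, now);   // close request sent: restart the timeout
        } catch (RegionAlreadyInTransitionException e) {
            // The RS is already closing this region. Do not treat it as a failure:
            // refresh the timestamp and let the monitor check again after the next timeout.
            ritTimestamps.put(region, now);
        }
    }

    boolean isTimedOut(String region, long now) {
        Long ts = ritTimestamps.get(region);
        return ts != null && now - ts >= TIMEOUT_MS;
    }

    public static void main(String[] args) {
        UnassignSketch am = new UnassignSketch();
        am.ritTimestamps.put("r1", 0L);
        am.unassign("r1", r -> { throw new RegionAlreadyInTransitionException(r); }, 1000L);
        System.out.println("timed out right after the refresh? " + am.isTimedOut("r1", 1001L));
    }
}
```

Both branches refresh the timestamp on purpose: catching the exception explicitly keeps it from escaping unassign(), and the region then simply waits one more full timeout period before the monitor acts on it again.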