[jira] [Created] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
Enis Soztutar created HBASE-5849: Summary: On first cluster startup, RS aborts if root znode is not available Key: HBASE-5849 URL: https://issues.apache.org/jira/browse/HBASE-5849 Project: HBase Issue Type: Bug Components: master, regionserver, zookeeper Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar

When launching a fresh new cluster, the master has to be started first, which creates a race condition when the master and region servers are started at the same time. Master startup code is something like this:
- establish zk connection
- create root znodes in zk (/hbase)
- create ephemeral node for master /hbase/master

Region server startup code is something like this:
- establish zk connection
- check whether the root znode (/hbase) is there. If not, shut down.
- wait for the master to create the znode /hbase/master

So the problem is that on the very first launch of the cluster, the RS aborts at startup since the /hbase znode might not have been created yet (only the master creates it if needed). Since /hbase/ is not deleted on cluster shutdown, the order in which the servers are started does not matter on subsequent cluster starts. So this affects only first launches.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
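The race described above, and the wait-instead-of-abort behavior later proposed as the fix, can be sketched with a simulated znode. This is not HBase's actual startup code: an AtomicBoolean stands in for a real ZooKeeper exists() check, and all names and timings are illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BootOrderSketch {
    // Stands in for the /hbase znode; in real HBase this would be a
    // ZooKeeper exists() check against zookeeper.znode.parent.
    static final AtomicBoolean rootZnodeExists = new AtomicBoolean(false);

    // Old behavior: check once and abort if the root znode is missing.
    static boolean startRsAbortOnMissing() {
        return rootZnodeExists.get();
    }

    // Proposed behavior: block and retry until the master has created the
    // znode, up to a deadline, instead of aborting on the first check.
    static boolean startRsWaitForRoot(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (rootZnodeExists.get()) {
                return true;
            }
            Thread.sleep(10); // real code would set a ZooKeeper watch instead of polling
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // RS starts before the master: the old check aborts, the new one waits.
        boolean oldCheckPassed = startRsAbortOnMissing();

        Thread master = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            rootZnodeExists.set(true); // master creates /hbase
        });
        master.start();

        boolean newCheckPassed = startRsWaitForRoot(5000);
        master.join();
        System.out.println("old check passed: " + oldCheckPassed
            + ", new check passed: " + newCheckPassed);
    }
}
```

On subsequent starts the znode already exists, so both checks pass immediately; the retry loop only matters for the first launch.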
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259927#comment-13259927 ] Enis Soztutar commented on HBASE-5849: -- Upon further inspection, it seems the patch for HBASE-4138 added the check for the base znode to the region server startup code. While it makes sense to check znode.parent from the client side, we should not do that for the regionserver.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Attachment: HBASE-5849_v1.patch Attaching a simple patch. Applies to trunk and the 0.92 and 0.94 branches. I tested this with a pseudo-distributed setup on my laptop by first launching the regionserver and observing that it actually waits for the master to boot up instead of aborting. I'll try to come up with a boot order unit test shortly.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Attachment: HBASE-5849_v2.patch Thanks Stack for taking a look at this. I have added a unit test for the cluster boot order. To answer your earlier comment, I think the region server should just keep waiting until there is an active master.
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Status: Patch Available (was: Open) Rerunning hudson for patch v2.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13260791#comment-13260791 ] Enis Soztutar commented on HBASE-5849: -- Interesting that Hudson did not report any test failures. Let me dig into this.
[jira] [Commented] (HBASE-5342) Grant/Revoke global permissions
[ https://issues.apache.org/jira/browse/HBASE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261149#comment-13261149 ] Enis Soztutar commented on HBASE-5342: -- @Matteo, I do not plan to work on this in the near future, feel free to take a shot. As Gary mentioned, the infrastructure to manage and distribute ACL changes to region servers already exists; I think we should just reuse it. For the hbase shell, we just need to make the table argument optional, and change the AccessControlProtocol.grant()/revoke() methods to accept Permission objects rather than TablePermission objects.

Grant/Revoke global permissions --- Key: HBASE-5342 URL: https://issues.apache.org/jira/browse/HBASE-5342 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar

HBASE-3025 introduced simple ACLs based on coprocessors. It defines global/table/cf/cq level permissions. However, there is no way to grant/revoke global level permissions, other than the hbase.superuser conf setting.
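The API change suggested in the comment could look roughly like the following sketch. The class and method names (Permission, TablePermission, grant()) follow the discussion, but the fields and the in-memory ACL table are purely illustrative and are not HBase's actual security API.

```java
import java.util.ArrayList;
import java.util.List;

public class AclSketch {
    // Base permission: user and action only, no table scope, i.e. a global permission.
    static class Permission {
        final String user;
        final String action; // e.g. "READ", "WRITE", "ADMIN"
        Permission(String user, String action) { this.user = user; this.action = action; }
    }

    // Table-scoped permission, the type grant/revoke accept today per the discussion.
    static class TablePermission extends Permission {
        final String table;
        TablePermission(String user, String table, String action) {
            super(user, action);
            this.table = table;
        }
    }

    // Widening grant() to take the base type lets one entry point cover
    // both global and table-level grants.
    static final List<Permission> aclTable = new ArrayList<>();

    static void grant(Permission p) { aclTable.add(p); }

    static boolean isGlobal(Permission p) { return !(p instanceof TablePermission); }

    public static void main(String[] args) {
        grant(new Permission("alice", "ADMIN"));         // global grant
        grant(new TablePermission("bob", "t1", "READ")); // table grant
        System.out.println(isGlobal(aclTable.get(0)) + " " + isGlobal(aclTable.get(1)));
    }
}
```

A shell `grant 'alice', 'ADMIN'` with no table argument would then map to the base Permission case.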
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Attachment: HBASE-5849_v4.patch HBASE-5849_v4-0.92.patch HBASE-5849_v4.patch I have found two issues that caused timeouts in the 0.92 branch:
1. The hbase dir was not set up to use the temp dir under target/, but used the default one under /tmp/hadoop-${username}, so running the test on 0.92 causes the rs to not come up if you have dirty data under /tmp/.
2. Giving timeouts like @Test(timeout=xxx) causes the 0.92 master to not shut down properly. I could not inspect this further; there might be an issue with surefire.
As a result, I updated the patch to first boot up a mini dfs and set up the hbase dir. I also removed the timeouts (the test runner (maven) will time out instead if something goes wrong). All my tests for trunk, 0.94, and 0.92 seem to pass. @Ted, @Stack, can you please try the patch to see whether you can replicate? On an unrelated note, the ResourceChecker notifies that some of the daemon threads (like LruBlockCache.EvictionThread) are not shut down properly (even when using MiniHBaseCluster and shutting down properly). Any idea whether we should dig into that?
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Status: Patch Available (was: Reopened)
[jira] [Updated] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5849: - Attachment: HBASE-5849_v4.patch Reattaching for Jenkins.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261855#comment-13261855 ] Enis Soztutar commented on HBASE-4821: -- DD and I also want to commit some resources to developing/maintaining/running such tests. We are also willing to allocate some cluster resources to running the tests for extended periods of time. @Mikhail, do you have anything planned yet? To go further with this, I think a short test design doc would be a great start, wdyt? @Keith, @Stack, do you think we should port goraci inside hbase or bigtop? @Roman, I love the idea that bigtop provides services for deployment and running e2e (end to end) tests. But in my experience, maintaining the actual tests (code, logic, etc.) will be a lot easier if the code resides inside hbase. Does bigtop provide that kind of use case?

A fully automated comprehensive distributed integration test for HBase -- Key: HBASE-4821 URL: https://issues.apache.org/jira/browse/HBASE-4821 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical

To properly verify that a particular version of HBase is good for production deployment, we need a better way to do real cluster testing after incremental changes. Running unit tests is good, but we also need to deploy HBase to a cluster, run integration tests, load tests, Thrift server tests, kill some region servers, kill the master, and produce a report. All of this needs to happen in 20-30 minutes with minimal manual intervention. I think this way we can combine agile development with high stability of the codebase. I am envisioning a high-level framework written in a scripting language (e.g. Python) that would abstract external operations such as deploy to test cluster, kill a particular server, run load test A, run load test B (we already have a few kinds of load tests implemented in Java, and we could write a Thrift load test in Python). This tool should also produce intermediate output, allowing us to catch problems early and restart the test. No implementation has yet been done. Any ideas or suggestions are welcome.
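The external operations listed in the description could be abstracted behind a small driver interface along these lines. All names here are hypothetical, not an existing HBase or bigtop API; the point is only that a scripted scenario (deploy, load, failure injection, more load, report) becomes a short program against the interface.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegrationHarnessSketch {
    // Abstracts the external operations the description lists: deploy to a
    // test cluster, kill a particular server, run a named load test.
    interface ClusterDriver {
        void deploy();
        void killServer(String host);
        boolean runLoadTest(String name);
    }

    // A scripted scenario: deploy, apply load, inject a failure, apply load
    // again, and return intermediate results so problems are caught early.
    static List<String> runScenario(ClusterDriver driver) {
        List<String> report = new ArrayList<>();
        driver.deploy();
        report.add("loadA=" + driver.runLoadTest("loadA"));
        driver.killServer("rs1");
        report.add("loadB=" + driver.runLoadTest("loadB"));
        return report;
    }

    public static void main(String[] args) {
        // A no-op in-memory driver standing in for real cluster operations.
        ClusterDriver fake = new ClusterDriver() {
            public void deploy() {}
            public void killServer(String host) {}
            public boolean runLoadTest(String name) { return true; }
        };
        System.out.println(runScenario(fake));
    }
}
```

A real implementation of ClusterDriver would shell out to deployment tooling (or bigtop's services), which is exactly the shim layer discussed later in the thread.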
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13261973#comment-13261973 ] Enis Soztutar commented on HBASE-4821: -- Yeah, it makes sense. Agreed that we want to run HBase MR kinds of tests as both unit tests and #2 tests at a larger scale. What I actually wanted to ask was whether bigtop already provides such an API, or whether we should develop one in bigtop. One other consideration is to abstract away the data for the tests. When run on a local cluster, we want to finish in a reasonable time, but when run on a 5-node or a 100-node cluster, the tests should reasonably stress the cluster accordingly.
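Scaling the test data with cluster size, as suggested above, can be as simple as deriving the load from the node count. The numbers below are arbitrary illustrations, not tuned values:

```java
public class LoadScalingSketch {
    // Scale the number of rows the load test writes with cluster size, so a
    // local run stays short while a 100-node run still stresses the cluster.
    static long rowsForCluster(int nodes, long rowsPerNode) {
        if (nodes < 1) throw new IllegalArgumentException("need at least one node");
        return nodes * rowsPerNode;
    }

    public static void main(String[] args) {
        System.out.println(rowsForCluster(1, 100_000L));   // local pseudo-cluster
        System.out.println(rowsForCluster(100, 100_000L)); // large cluster
    }
}
```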
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262131#comment-13262131 ] Enis Soztutar commented on HBASE-4821: -- Yep, I was referring to a shim layer plus utilities to run against a deployed or local cluster. Let me check out what we have in bigtop.
[jira] [Commented] (HBASE-5849) On first cluster startup, RS aborts if root znode is not available
[ https://issues.apache.org/jira/browse/HBASE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13262137#comment-13262137 ] Enis Soztutar commented on HBASE-5849: -- Thanks all for pursuing this. From the failed Hudson builds: https://builds.apache.org/job/HBase-TRUNK-security/183/ https://builds.apache.org/job/HBase-TRUNK/2811/testReport/ https://builds.apache.org/job/HBase-0.92/390/ https://builds.apache.org/job/HBase-0.94-security/21/ None of the tests seem related. @Stack, for EvictionThread, I guess since the git repo is falling behind, I might not have your recent changes (I'm too lazy to check out from svn). Although I also saw some other daemon threads (like a couple of IPC Client threads, etc.). Let me dig into that later and see if we can improve on it. I'll open another jira if I find anything interesting.
[jira] [Created] (HBASE-5888) Clover profile in build
Enis Soztutar created HBASE-5888: Summary: Clover profile in build Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar

Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92 and 0.94, since we are also interested in test coverage for those branches.
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Attachment: hbase-clover_v1.patch Patch against trunk. I'll provide backport patches once we are settled. Replicating the patch comment: Profile for running clover. You need to have a clover license under ~/.clover.license for ${clover.version}, or you can provide the license with -Dmaven.clover.licenseLocation=/path/to/license. Committers can find the license under https://svn.apache.org/repos/private/committers/donated-licenses/clover/ Note that clover 2.6.3 does not run with maven 3, so you have to use maven2. The report will be generated under target/site/clover/index.html when you run MAVEN_OPTS=-Xmx2048m mvn clean test -Pclover site
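A Maven profile along the lines described might look like the sketch below. This is not the contents of hbase-clover_v1.patch; the plugin coordinates and configuration are assumptions based on the publicly documented maven-clover2-plugin, matched to the clover 2.6.3 version mentioned above.

```xml
<!-- Illustrative sketch only, not the actual patch. -->
<profile>
  <id>clover</id>
  <build>
    <plugins>
      <plugin>
        <groupId>com.atlassian.maven.plugins</groupId>
        <artifactId>maven-clover2-plugin</artifactId>
        <version>2.6.3</version>
        <configuration>
          <!-- Overridable on the command line with -Dmaven.clover.licenseLocation=... -->
          <licenseLocation>${user.home}/.clover.license</licenseLocation>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>instrument</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With a profile like this, `MAVEN_OPTS=-Xmx2048m mvn clean test -Pclover site` instruments the sources, runs the tests against the instrumented classes, and generates the HTML report under target/site/clover/.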
[jira] [Commented] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table
[ https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265212#comment-13265212 ] Enis Soztutar commented on HBASE-5385: -- Looks good. Can we add: 1. Audit logging AccessController.AUDITLOG 2. On preCreateTable and preAddColumn, ensure that the acl table is empty for the table / column. We might still have residual acl entries if smt goes wrong. If so, we should refuse creating a table by throwing a kind of access control exception. Andrew, any comments? Delete table/column should delete stored permissions on -acl- table - Key: HBASE-5385 URL: https://issues.apache.org/jira/browse/HBASE-5385 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Matteo Bertozzi Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch Deleting the table or a column does not cascade to the stored permissions at the -acl- table. We should also remove those permissions, otherwise, it can be a security leak, where freshly created tables contain permissions from previous same-named tables. We might also want to ensure, upon table creation, that no entries are already stored at the -acl- table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Attachment: HBASE-5358_v2.patch Updated the patch to ignore generated packages (thrift.generated, protobuf.generated), since they are skewing coverage results. I uploaded a sample report for 0.92 here: http://people.apache.org/~enis/hbase-clover/ Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5358_v2.patch, hbase-clover_v1.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Status: Patch Available (was: Open) Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5358_v2.patch, hbase-clover_v1.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Status: Open (was: Patch Available) Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-clover_v1.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Attachment: (was: HBASE-5358_v2.patch) Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-clover_v1.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Attachment: hbase-clover_v2.patch Uploaded wrong patch. This should be the one. Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-clover_v1.patch, hbase-clover_v2.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5888: - Status: Patch Available (was: Open) Clover profile in build --- Key: HBASE-5888 URL: https://issues.apache.org/jira/browse/HBASE-5888 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.92.2, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-clover_v1.patch, hbase-clover_v2.patch Clover is disabled right now. I would like to add a profile that enables clover reports. We can also backport this to 0.92, and 0.94, since we are also interested in test coverage for those branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5968) Proper html escaping for region names
Enis Soztutar created HBASE-5968: Summary: Proper html escaping for region names Key: HBASE-5968 URL: https://issues.apache.org/jira/browse/HBASE-5968 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar I noticed that we are not doing html escaping for the rs/master web interfaces, so you can end up generating html like: {code}
<tr>
<td>ci,,\xEEp/T\xBE\xC0,1336471826990.fc5a943e75ce8521b1ccdaf72d2c96c8.</td>
<td><a href="http://hrt24n06.cc1.ygridcore.net:60030/">hrt24n06.cc1.ygridcore.net:60030</a></td>
<td>,\xEEp/T\xBE\xC0</td>
<td>-n\xA8\xE0\x15\xDD\x80!</td>
<td>2966724</td>
</tr>
{code} This obviously does not render properly. Also, my crazy theory is that it can be a security risk, since the region name is computed from table rows, which are most of the time user input. Thus, if the rows contain a <script> tag, an onload= attribute, or similar, then that will be executed on the developer's browser, possibly having access to the dev environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
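The fix, wherever region names are written into the web UI, is to escape HTML metacharacters before emitting them. A minimal, illustrative sketch of such an escaper (this is not the actual patch; in practice one would likely reuse an existing escaping utility, and the class and method names here are assumptions):

```java
public class HtmlEscape {
    // Escape the HTML metacharacters that allow markup/script injection
    // when untrusted bytes (e.g. region names derived from row keys) are
    // written into an HTML page.
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A hypothetical region name containing markup from a user row key.
        String regionName = "t1,<script>alert(1)</script>,1336471826990.fc5a.";
        System.out.println("<td>" + HtmlEscape.escapeHtml(regionName) + "</td>");
    }
}
```

With this applied, the injected markup renders as inert text instead of executing in the browser.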
[jira] [Updated] (HBASE-5968) Proper html escaping for region names
[ https://issues.apache.org/jira/browse/HBASE-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5968: - Description: I noticed that we are not doing html escaping for the rs/master web interfaces, so you can end up generating html like: {code}
<tr>
<td>ci,,\xEEp/T\xBE\xC0,1336471826990.fc5a943e75ce8521b1ccdaf72d2c96c8.</td>
<td><a href="hostname">hostname</a></td>
<td>,\xEEp/T\xBE\xC0</td>
<td>-n\xA8\xE0\x15\xDD\x80!</td>
<td>2966724</td>
</tr>
{code} This obviously does not render properly. Also, my crazy theory is that it can be a security risk, since the region name is computed from table rows, which are most of the time user input. Thus, if the rows contain a <script> tag, an onload= attribute, or similar, then that will be executed on the developer's browser, possibly having access to the dev environment. was: I noticed that we are not doing html escaping for the rs/master web interfaces, so you can end up generating html like: {code}
<tr>
<td>ci,,\xEEp/T\xBE\xC0,1336471826990.fc5a943e75ce8521b1ccdaf72d2c96c8.</td>
<td><a href="http://hrt24n06.cc1.ygridcore.net:60030/">hrt24n06.cc1.ygridcore.net:60030</a></td>
<td>,\xEEp/T\xBE\xC0</td>
<td>-n\xA8\xE0\x15\xDD\x80!</td>
<td>2966724</td>
</tr>
{code} This obviously does not render properly. Also, my crazy theory is that it can be a security risk, since the region name is computed from table rows, which are most of the time user input. Thus, if the rows contain a <script> tag, an onload= attribute, or similar, then that will be executed on the developer's browser, possibly having access to the dev environment.
Proper html escaping for region names - Key: HBASE-5968 URL: https://issues.apache.org/jira/browse/HBASE-5968 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar I noticed that we are not doing html escaping for the rs/master web interfaces, so you can end up generating html like: {code}
<tr>
<td>ci,,\xEEp/T\xBE\xC0,1336471826990.fc5a943e75ce8521b1ccdaf72d2c96c8.</td>
<td><a href="hostname">hostname</a></td>
<td>,\xEEp/T\xBE\xC0</td>
<td>-n\xA8\xE0\x15\xDD\x80!</td>
<td>2966724</td>
</tr>
{code} This obviously does not render properly. Also, my crazy theory is that it can be a security risk, since the region name is computed from table rows, which are most of the time user input. Thus, if the rows contain a <script> tag, an onload= attribute, or similar, then that will be executed on the developer's browser, possibly having access to the dev environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271036#comment-13271036 ] Enis Soztutar commented on HBASE-5754: -- In one of my 0.92.x tests on a 10 node cluster, 250M inserts, I did manage to get the verify to fail: {code}
12/05/08 11:11:18 INFO mapred.JobClient: goraci.Verify$Counts
12/05/08 11:11:18 INFO mapred.JobClient: UNDEFINED=972506
12/05/08 11:11:18 INFO mapred.JobClient: REFERENCED=248051318
12/05/08 11:11:18 INFO mapred.JobClient: UNREFERENCED=972506
12/05/08 11:11:18 INFO mapred.JobClient: Map-Reduce Framework
12/05/08 11:11:18 INFO mapred.JobClient: Map input records=249023824
{code} Notice that map input records is 1M less than 250M, which indicates that the inputformat did not provide all records in the table. The missing rows all belong to a single region. I re-ran the test after a couple of hours, and it passed. But the failed test created 244 maps, instead of 246, which is the current region count, so I suspect there is something wrong in the split calculation or in the supposed transactional behavior for split/balance operations in the meta table. I am still inspecting the code and the logs, but any pointers are welcome. data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. 
Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not on the verify step. Here's the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5986) Clients can see holes in the META table when regions are being split
Enis Soztutar created HBASE-5986: Summary: Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner, or HTable.getStartEndKeys (used by the TableInputFormat), missing regions which have just been made offline, while the daughters are not added yet. This is also related to HBASE-4335. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5986: - Attachment: HBASE-5986-test_v1.patch Attaching a unit test to illustrate the problem. The test fails for me: when splitting the first region, the client sees 0 regions for some time. Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5986-test_v1.patch We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner, or HTable.getStartEndKeys (used by the TableInputFormat), missing regions which have just been made offline, while the daughters are not added yet. This is also related to HBASE-4335. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272918#comment-13272918 ] Enis Soztutar commented on HBASE-5986: -- Possible fixes I can think of: 1. Keep MetaScanner/MetaReader as non-consistent (as it is), but allow for a consistent view for getting table regions. Since single-row puts are atomic, when the parent region is mutated to be offline, the HRIs for the daughters are added to the row. So on MetaScanner.allTableRegions and similar calls, we can keep track of daughter regions from split parents and return them to the client. 2. Make MetaScanner consistent, in that, whenever it sees a split parent, it blocks until the daughters are available. 3. We have region-local transactions now, so if we ensure that the rows for the parent and daughters will be served from the same META region, then we can update all three rows atomically. Maybe we can come up with a META-specific split policy to ensure split regions go to the same META region. Thoughts? Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5986-test_v1.patch We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner, or HTable.getStartEndKeys (used by the TableInputFormat), missing regions which have just been made offline, while the daughters are not added yet. This is also related to HBASE-4335. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
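Whichever fix is chosen, the client-visible symptom is a hole in the chain of region boundaries returned by MetaScanner. A small, illustrative sketch of how a caller could detect such a hole (the String-based region representation and names here are assumptions for illustration, not HBase API):

```java
import java.util.List;

public class RegionChainCheck {
    // Each region is a {startKey, endKey} pair covering [startKey, endKey);
    // an empty string denotes the open boundary of the first/last region.
    // Returns true iff the regions cover the whole key space with no hole
    // or overlap, which is what MetaScanner callers implicitly assume.
    public static boolean isContiguous(List<String[]> regions) {
        if (regions.isEmpty()) return false;
        String expectedStart = "";
        for (String[] r : regions) {
            if (!r[0].equals(expectedStart)) return false; // hole or overlap
            expectedStart = r[1];
        }
        return expectedStart.equals(""); // last region must be open-ended
    }
}
```

A client (or the TableInputFormat split calculation) could call this after fetching the region list and re-scan META when it returns false, instead of silently skipping the rows of an in-flight split.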
[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272952#comment-13272952 ] Enis Soztutar commented on HBASE-5986: -- bq. I think we are assuming in many other places that META only has a single region. The ROOT is -by-design- one region, but META is not, right? bq. Is there another alternative, such as adding the daughter regions first, and then have HTable disentangle conflicts? I have thought about this as well, but then there is a time window in which you have both the parent and daughter regions online, and the parent not marked as split. So the client again has to resolve that the returned regions are overlapping. Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5986-test_v1.patch We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner, or HTable.getStartEndKeys (used by the TableInputFormat), missing regions which have just been made offline, while the daughters are not added yet. This is also related to HBASE-4335. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13272987#comment-13272987 ] Enis Soztutar commented on HBASE-5986: -- I have implemented approach 1 by adding split daughters to the returned map from MetaScanner.allTableRegions(). But then the problem is that we are returning regions which do not yet exist in the META table, so any subsequent getRegion call will fail. Thinking a bit more about 3, I think we already guarantee that the split parent and daughters fall into the same META region. Let's say we have two regions, region1 and region2, with start keys start_key*, and timestamps ts* respectively. Before the split: {code}
table start_key1 ts1 encoded_name1
table start_key2 ts2 encoded_name2
{code} Now, if we split region1, the daughters will be sorted after region1, and before region2: {code}
table start_key1 ts1 encoded_name1 offline split
table start_key1 ts3 encoded_name1
table mid_key1 ts3 encoded_name1
table start_key2 ts2 encoded_name2
{code} We know this since we have the invariants ts3 > ts1 (SplitTransaction.getDaughterRegionIdTimestamp()) and start_key1 < mid_key1 < start_key2. Even if we have a region boundary between start_key1 and start_key2 in the META table, the daughters will be co-located with the parent. The only exception is if, while the user table is being split, we have a concurrent split of the META table, and the new region boundary is chosen to be between the parent and daughters. With some effort, we can prevent this, but it seems highly unlikely. So, if my analysis is correct, option 3 seems like the best choice, since it will not complicate the meta scan code. The problem is that there is no internal API to do multi-row transactions other than using the coprocessor. Should we think of allowing that w/o coprocessors? @Lars, does HRegion.mutateRowsWithLock() guarantee that a concurrent scanner won't see partial changes? 
Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-5986-test_v1.patch We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner, or HTable.getStartEndKeys (used by the TableInputFormat), missing regions which have just been made offline, while the daughters are not added yet. This is also related to HBASE-4335. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
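The ordering argument in the comment above can be checked mechanically. A sketch using simplified META row keys of the form table,start_key,timestamp (the real keys are byte arrays with their own comparator; this only illustrates the lexicographic invariant, under the stated assumption ts3 > ts1):

```java
import java.util.Arrays;

public class MetaSortDemo {
    // Sorts simplified META row keys ("table,startKey,regionIdTimestamp")
    // lexicographically, the way the META table orders its rows.
    public static String[] sorted(String... rows) {
        String[] copy = rows.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // parent has ts1 = 100; daughters get ts3 = 300 > ts1;
        // the next region starts at key zzz.
        String[] rows = MetaSortDemo.sorted(
            "t1,zzz,200",   // next region
            "t1,mmm,300",   // daughter B (mid key)
            "t1,aaa,100",   // parent
            "t1,aaa,300");  // daughter A (same start key as parent)
        // Both daughters land between the parent and the next region,
        // so they are co-located with the parent in META.
        System.out.println(Arrays.toString(rows));
    }
}
```

The sorted order is parent, daughter A, daughter B, next region, matching the claim that the daughters always sort after the parent and before the following region.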
[jira] [Commented] (HBASE-6000) Cleanup where we keep .proto files
[ https://issues.apache.org/jira/browse/HBASE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276371#comment-13276371 ] Enis Soztutar commented on HBASE-6000: -- .proto files are source files, so +1 for putting them under src/main. bq. In my opinion it should not be necessary to have protoc installed to build HBase, just like it's not necessary to have the Thrift compiler available +1 to that. I think we should make out-of-the-box compilation as easy as possible. If we commit the generated sources under src/, it should be ok. Also +1 on having "generated" in the package name. We have some maven targets depending on that convention. Cleanup where we keep .proto files -- Key: HBASE-6000 URL: https://issues.apache.org/jira/browse/HBASE-6000 Project: HBase Issue Type: Bug Reporter: stack I see Andrew, for his pb work over in rest, has .proto files under src/main/resources. We should unify where these files live. The recently added .protos place them under src/main/protobuf. It's confusing. The thrift idl files are here under resources too. Seems like we should move src/main/protobuf under src/resources to be consistent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6025) Expose Hadoop Metrics through JSON Rest interface
[ https://issues.apache.org/jira/browse/HBASE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277279#comment-13277279 ] Enis Soztutar commented on HBASE-6025: -- How is this different than http://region-server:60030/jmx ? It works with hadoop-1.0.1+ (HBASE-5309) Expose Hadoop Metrics through JSON Rest interface - Key: HBASE-6025 URL: https://issues.apache.org/jira/browse/HBASE-6025 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6025) Expose Hadoop Dynamic Metrics through JSON Rest interface
[ https://issues.apache.org/jira/browse/HBASE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13277481#comment-13277481 ] Enis Soztutar commented on HBASE-6025: -- Makes sense. Thanks for changing the issue title. Is the plan to add them to jmx, which in turn will make them available under /jmx? Expose Hadoop Dynamic Metrics through JSON Rest interface - Key: HBASE-6025 URL: https://issues.apache.org/jira/browse/HBASE-6025 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6025) Expose Hadoop Dynamic Metrics through JSON Rest interface
[ https://issues.apache.org/jira/browse/HBASE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13278011#comment-13278011 ] Enis Soztutar commented on HBASE-6025: -- bq. You saying that what comes out of jmx will show in a hadoop webapp servlet mounted at /jmx? Yes, it comes from the Hadoop HttpServer class, which adds JMXJsonServlet to serve /jmx. That servlet finds all registered MBeans (including metrics) and exports them via json. bq. we ship 1.0.0 even in 0.94 which needs to be fixed hadoop-1.0.3 was recently released with fixes for non-oracle jvms and other bug fixes. We can switch to that. bq. We should add a 'metrics' link along the top beside the log level, thread dump, etc. servlets Agreed. I learned about this from the Hadoop folks. We should make it more visible. bq. Oh, it looks like the jmx per-region stuff is showing under /jmx because Elliott already added this to 0.94 and trunk In my near-trunk test, I saw the dynamic metrics json, but nothing reported underneath (I did not spend much time on it). If it works for you, then what you outlined seems like a good plan. Expose Hadoop Dynamic Metrics through JSON Rest interface - Key: HBASE-6025 URL: https://issues.apache.org/jira/browse/HBASE-6025 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6025) Expose Hadoop Dynamic Metrics through JSON Rest interface
[ https://issues.apache.org/jira/browse/HBASE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13278019#comment-13278019 ] Enis Soztutar commented on HBASE-6025: -- I see. Then, +1 for HBASE-5802. @Elliot, do you see the exception that Stack has pasted? Expose Hadoop Dynamic Metrics through JSON Rest interface - Key: HBASE-6025 URL: https://issues.apache.org/jira/browse/HBASE-6025 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6034) Upgrade Hadoop dependency for 0.92 branch
[ https://issues.apache.org/jira/browse/HBASE-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13278103#comment-13278103 ] Enis Soztutar commented on HBASE-6034: -- Shall we do this for 92, 94 and trunk? Upgrade Hadoop dependency for 0.92 branch - Key: HBASE-6034 URL: https://issues.apache.org/jira/browse/HBASE-6034 Project: HBase Issue Type: Task Affects Versions: 0.92.2 Reporter: Andrew Purtell Priority: Minor Attachments: 6034.092.txt, 6034.094.txt 0.92 branch currently depends on Hadoop 1.0.0, but this has been moved to the archive. The earliest release on www.apache.org/dist/ is 1.0.1. Consider moving up? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6025) Expose Hadoop Dynamic Metrics through JSON Rest interface
[ https://issues.apache.org/jira/browse/HBASE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6025: - Attachment: hbase-jmx.patch Attaching simple patch to add /jmx links to the rs/master web UI's. Expose Hadoop Dynamic Metrics through JSON Rest interface - Key: HBASE-6025 URL: https://issues.apache.org/jira/browse/HBASE-6025 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Attachments: hbase-jmx.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6009) Changes for HBASE-5209 are technically incompatible
[ https://issues.apache.org/jira/browse/HBASE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279075#comment-13279075 ] Enis Soztutar commented on HBASE-6009: -- +1 for the release note on 0.92.1. Changes for HBASE-5209 are technically incompatible --- Key: HBASE-6009 URL: https://issues.apache.org/jira/browse/HBASE-6009 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.1, 0.94.0 Reporter: David S. Wang The additions to add backup masters to ClusterStatus are technically incompatible between clients and servers. Older clients will basically not read the extra bits that the newer server pushes for the backup masters, thus screwing up the serialization for the next blob in the pipe. For the Writable, we can add a total size field for ClusterStatus at the beginning, or we can have start and end markers. I can make a patch for either approach; interested in whatever folks have to suggest. Would be good to get this in soon to limit the damage to 0.92.1 (don't know if we can get this in in time for 0.94.0). Either change will make us forward-compatible starting with when the change goes in, but will not fix the backwards incompatibility, which we will have to mark with a release note as there have already been releases with this change. Hopefully we can do this in a cleaner way when wire compat rolls around in 0.96.
[jira] [Created] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
Enis Soztutar created HBASE-6060: Summary: Regions's in OPENING state from failed regionservers takes a long time to recover Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process: - The master calls the RS to open the region. If the RS is offline, a new plan is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - The RS starts opening the region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign) - The RS transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - The RS then opens the region and changes the znode from OPENING to OPENED - When the RS is killed between the OPENING and OPENED states, zk shows the OPENING state and the master just waits for the RS to change the region state; but since the RS is down, that won't happen. - There is an AssignmentManager.TimeoutMonitor, which guards against exactly these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states. Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
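The timeout knob named in the description, hbase.master.assignment.timeoutmonitor.timeout, can be lowered in hbase-site.xml as a stop-gap while a proper fix is developed; the 180000 ms value below is purely illustrative:

```xml
<!-- hbase-site.xml: value is in milliseconds and illustrative only;
     the default is 30 minutes, as noted in the issue description -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value>
</property>
```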
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280469#comment-13280469 ] Enis Soztutar commented on HBASE-6060: -- Thanks Andrew for the pointer. Agreed that lowering the timeout can have deeper impacts. We should fix the issue properly instead.
[jira] [Updated] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-5986: - Attachment: HBASE-5986-0.94.patch HBASE-5986-0.92.patch Attaching patches for the 0.92 and 0.94 branches. They are direct ports of the v3 patch, but the 0.92 patch also includes the HRegionServer.getOnlineRegions(byte[] tableName) function copied directly from 0.94, since we need it. I discovered this issue when testing with 0.92, so I would like the fix to make it into that branch. One minor mishap on my part is that the v3 patch that went into trunk includes an unrelated change in RegionServerDynamicStatistics. The related issue is HBASE-6025. Although the change is trivial (changing RegionServerDynamicStatistics to extend the hbase-specific MetricsMBeanBase rather than the hadoop-specific MetricsDynamicMBeanBase), we may want to note this, or revert that part. The backport patches do not include this change. Sorry for the trouble, guys. Clients can see holes in the META table when regions are being split Key: HBASE-5986 URL: https://issues.apache.org/jira/browse/HBASE-5986 Project: HBase Issue Type: Bug Affects Versions: 0.92.1, 0.96.0, 0.94.1 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: 5986-v2.txt, HBASE-5986-0.92.patch, HBASE-5986-0.94.patch, HBASE-5986-test_v1.patch, HBASE-5986_v3.patch We found this issue when running large scale ingestion tests for HBASE-5754. The problem is that the .META. table updates are not atomic while splitting a region. In SplitTransaction, there is a time gap between marking the parent offline and adding the daughters to the META table. This can result in clients using MetaScanner or HTable.getStartEndKeys (used by the TableInputFormat) missing regions that have just been made offline but whose daughters are not added yet. This is also related to HBASE-4335.
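The race in the description can be illustrated with a toy model (plain Java, not HBase code): .META. is modeled as a sorted map from start key to end key, and the split removes the parent row before the daughter rows exist, so a scan in between sees a hole in the key space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the race: the .META. update during a split is not atomic,
// so a client scanning between "parent offlined" and "daughters inserted"
// observes a hole in the table's key space.
public class MetaHoleDemo {
    // start key -> end key for each online region of one table
    static TreeMap<String, String> meta = new TreeMap<>();

    // Returns the key ranges a MetaScanner-style client would observe.
    static List<String> scanRanges() {
        List<String> ranges = new ArrayList<>();
        meta.forEach((start, end) -> ranges.add(start + "-" + end));
        return ranges;
    }

    public static void main(String[] args) {
        meta.put("a", "m");
        meta.put("m", "z");

        // Split step 1: the parent [a,m) is marked offline (removed here)
        meta.remove("a");
        // A client scanning now misses all keys in [a,m): a hole
        System.out.println("during split: " + scanRanges()); // [m-z]

        // Split step 2 (later): the daughters are inserted
        meta.put("a", "g");
        meta.put("g", "m");
        System.out.println("after split:  " + scanRanges()); // [a-g, g-m, m-z]
    }
}
```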
[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283735#comment-13283735 ] Enis Soztutar commented on HBASE-5986: -- @Ted, I did run TestEndToEndSplitTransaction, but not the whole suite. Let me do that.
[jira] [Commented] (HBASE-5986) Clients can see holes in the META table when regions are being split
[ https://issues.apache.org/jira/browse/HBASE-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283881#comment-13283881 ] Enis Soztutar commented on HBASE-5986: -- Here are the test results for 0.94: {code} Tests run: 551, Failures: 0, Errors: 0, Skipped: 0 ... Tests run: 932, Failures: 1, Errors: 2, Skipped: 9 Failed tests: testShutdownSimpleFixup(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster): expected:1 but was:0 Tests in error: testDelayedRpcImmediateReturnValue(org.apache.hadoop.hbase.ipc.TestDelayedRpc): Call to /127.0.0.1:53586 failed on socket timeout exception: java.net.SocketTimeoutException: 1000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:53623 remote=/127.0.0.1:53586] testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster): Master not initialized after 200 seconds {code} I reran the failing tests locally with success, except TestLocalHBaseCluster. But it fails on 0.94 HEAD as well for me.
For 0.92: {code} Results : Failed tests: testMultipleResubmits(org.apache.hadoop.hbase.master.TestSplitLogManager) testcomputeHDFSBlocksDistribution(org.apache.hadoop.hbase.util.TestFSUtils) Tests in error: testClusterRestart(org.apache.hadoop.hbase.master.TestRestartCluster): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Failed delete of /homes/hortonde/enis/code/hbase-0.92/target/test-data/af023188-0b23-4f9d-a9bc-a074e94e57f8/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/7c59b6677ad46bf3f652a83de1e62bcb testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Target HLog directory already exists: /homes/hortonde/enis/code/hbase-0.92/target/test-data/af023188-0b23-4f9d-a9bc-a074e94e57f8/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/logs testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction) loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel): test timed out after 12 milliseconds Tests run: 1135, Failures: 2, Errors: 6, Skipped: 8 {code} I also ran those failed tests locally, with success. It seems we can go ahead with 0.92 and 0.94 if you don't have any concerns.
[jira] [Commented] (HBASE-6135) Style the Web UI to use Twitter's Bootstrap.
[ https://issues.apache.org/jira/browse/HBASE-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285910#comment-13285910 ] Enis Soztutar commented on HBASE-6135: -- +1 to bootstrap. Style the Web UI to use Twitter's Bootstrap. Key: HBASE-6135 URL: https://issues.apache.org/jira/browse/HBASE-6135 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Fix For: 0.96.0 Our web UI has lagged a little bit behind. While it's not a huge deal, it is one of the first things that new people see. As such, styling it a little bit better would put a good foot forward.
[jira] [Commented] (HBASE-6096) AccessController v2
[ https://issues.apache.org/jira/browse/HBASE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286125#comment-13286125 ] Enis Soztutar commented on HBASE-6096: -- bq. The superuser shortcut should be removed. We need something like a superuser, so that if somehow there is a mixup of grants, we can fix it. But as Andrew suggests, just using the service principal should be good enough. bq. Also we could drop the owner concept +1. It is better to manage all permissions from one place. AccessController v2 --- Key: HBASE-6096 URL: https://issues.apache.org/jira/browse/HBASE-6096 Project: HBase Issue Type: Umbrella Components: security Affects Versions: 0.96.0, 0.94.1 Reporter: Andrew Purtell Umbrella issue for iteration on the initial AccessController drop.
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286795#comment-13286795 ] Enis Soztutar commented on HBASE-6060: -- @Ramkrishna, that is great. I have also noticed regions in CLOSING staying in RIT as well, and, strangely enough, showing the master as their assigned server. Do you think that could be related?
[jira] [Commented] (HBASE-6152) Split abort is not handled properly
[ https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287838#comment-13287838 ] Enis Soztutar commented on HBASE-6152: -- I think the problem is that the master offlines the region at step 3; however, the parent region is then recovered and onlined by the RS, so all subsequent region transitions for it fail on the master. Split abort is not handled properly --- Key: HBASE-6152 URL: https://issues.apache.org/jira/browse/HBASE-6152 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Devaraj Das Assignee: Devaraj Das I ran into this: 1. The RegionServer started to split a region (R), but the split was taking a long time, and hence the split was aborted 2. As part of cleanup, the RS deleted the ZK node that it created initially for R 3. The master (AssignmentManager) noticed the node deletion, and made R offline 4. The RS recovered from the failure, and at some point of time, tried to do the split again. 5. The master got an event RS_ZK_REGION_SPLIT but the server gave an error like - Received SPLIT for region R from server RS but it doesn't exist anymore,.. 6. The RS apparently did the split successfully this time, but is stuck on the master to delete the znode for the region. It kept on saying - org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for R and it was stuck there forever.
[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288875#comment-13288875 ] Enis Soztutar commented on HBASE-6160: -- One option for a fix is to ensure that the META entry for the parent region is deleted before the daughter's META entry, and to do the META entry deletion recursively. META entries from daughters can be deleted before parent entries Key: HBASE-6160 URL: https://issues.apache.org/jira/browse/HBASE-6160 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar HBASE-5986 fixed an issue where the client sees the META entry for the parent, but not the children. However, after the fix, we have seen the following issue in tests: Region A is split to - B, C Region B is split to - D, E After some time, the META entry for B is deleted since it is not needed anymore, but the META entry for Region A stays in META (C still refers to it). In this case, the client throws RegionOfflineException for B.
[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13288941#comment-13288941 ] Enis Soztutar commented on HBASE-6160: -- The exception: {code} 12/06/04 06:50:41 ERROR security.UserGroupInformation: PriviledgedActionException as: cause:org.apache.hadoop.hbase.client.RegionOfflineException: Split daughter region TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d. cannot be found in META. Exception in thread main org.apache.hadoop.hbase.client.RegionOfflineException: Split daughter region TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d. cannot be found in META. at org.apache.hadoop.hbase.client.MetaScanner$BlockingMetaScannerVisitor.processRow(MetaScanner.java:433) at org.apache.hadoop.hbase.client.MetaScanner$TableMetaScannerVisitor.processRow(MetaScanner.java:490) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:227) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:57) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:136) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:361) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:133) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:108) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:86) at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:326) at org.apache.hadoop.hbase.client.HTable.getRegionLocations(HTable.java:499) at org.apache.hadoop.hbase.client.HTable.getStartEndKeys(HTable.java:452) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:132) at 
org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979) {code} So the region in question is {code} 50a4617eead34cad335a8dfa727d177d {code} and from the logs we see that {{25d9c4ff574a37bd95bf5e5be6d618dd}} is split into {{1dc74065583c67b3916c4ed158cb53fa}} and {{50a4617eead34cad335a8dfa727d177d}} {code} ./hbase-hbase-regionserver-ip-10-226-65-102.log:2012-06-04 04:56:02,855 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=TestLoadAndVerify_1338798130970,[\x02\x01\x00\x00\x00\x00\x00/71_0,1338799021182.25d9c4ff574a37bd95bf5e5be6d618dd., new regions: TestLoadAndVerify_1338798130970,[\x02\x01\x00\x00\x00\x00\x00/71_0,1338800158687.1dc74065583c67b3916c4ed158cb53fa., TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d.. Split took 4sec {code} After some time, {{50a4617eead34cad335a8dfa727d177d}} is further split into two: {code} ./hbase-hbase-regionserver-ip-10-226-65-102.log:2012-06-04 05:41:13,488 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d., new regions: TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338802866393.1628865d7fa8e9eec3a7d8073465296e., TestLoadAndVerify_1338798130970,]y\x04\x00\x00\x00\x00\x00/47_0,1338802866393.413cafe6c61426e26254c197e8c0a6ba.. Split took 7sec {code} Further time passes, and CatalogJanitor deletes the META entry for that region: {code} ./hbase-hbase-master-ip-10-144-69-91.log:2012-06-04 05:47:16,688 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Deleting region TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d.
because daughter splits no longer hold references ./hbase-hbase-master-ip-10-144-69-91.log:2012-06-04 05:47:18,103 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted daughters references, qualifier=splitA and qualifier=splitB, from parent TestLoadAndVerify_1338798130970,\\xA2\x04\x00\x00\x00\x00\x00/48_0,1338800158687.50a4617eead34cad335a8dfa727d177d. ./hbase-hbase-master-ip-10-144-69-91.log:2012-06-04 05:47:18,103 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: DELETING region hdfs://ip-10-10-50-98.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338798130970/50a4617eead34cad335a8dfa727d177d ./hbase-hbase-master-ip-10-144-69-91.log:2012-06-04 05:47:18,145 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region
[jira] [Updated] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6160: - Attachment: HBASE-6160_v1.patch Attaching a patch for trunk. - Changes CatalogJanitor to not delete split parents whose own parents are still in META. - Adds a test case
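The rule the patch adds can be sketched in a simplified model (hypothetical plain Java, not the actual CatalogJanitor code): a split parent's META row becomes eligible for deletion only once its daughters no longer hold references and its own parent row is gone from META, so ancestor rows are always cleaned up first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the CatalogJanitor check described above.
public class JanitorSketch {
    // region name -> name of the region it was split from (absent for originals)
    static Map<String, String> splitParentOf = new HashMap<>();
    // region rows currently present in .META.
    static Set<String> inMeta = new HashSet<>();

    static boolean canDelete(String region, boolean daughtersStillReference) {
        if (daughtersStillReference) return false;  // pre-existing rule
        // added rule: keep this row while its own parent row is still in META,
        // so a client never finds a child row whose ancestor chain is broken
        String parent = splitParentOf.get(region);
        return parent == null || !inMeta.contains(parent);
    }

    public static void main(String[] args) {
        // A is split into B and C; C still references A, so A stays in META
        inMeta.add("A"); inMeta.add("B"); inMeta.add("C");
        splitParentOf.put("B", "A"); splitParentOf.put("C", "A");

        System.out.println(canDelete("B", false)); // false: A is still in META
        inMeta.remove("A");
        System.out.println(canDelete("B", false)); // true once A is gone
    }
}
```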
[jira] [Updated] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6160: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289572#comment-13289572 ] Enis Soztutar commented on HBASE-6160: -- @Ramkrishna yes, ideally that is the case. But we may end up with this if, for example, the regions are non-uniform. I think we still have to prioritize the reference files in compaction, since they also prevent further splitting.
[jira] [Updated] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6160: - Attachment: HBASE-6160_v2.patch v2 patch addressing Ted's comments.
[jira] [Commented] (HBASE-6160) META entries from daughters can be deleted before parent entries
[ https://issues.apache.org/jira/browse/HBASE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289618#comment-13289618 ] Enis Soztutar commented on HBASE-6160: -- Thanks Stack, you beat me to the 0.92 and 0.94 patches. META entries from daughters can be deleted before parent entries Key: HBASE-6160 URL: https://issues.apache.org/jira/browse/HBASE-6160 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.92.2, 0.94.1 Attachments: HBASE-6160_v1.patch, HBASE-6160_v2.patch, HBASE-6160_v2.patch, HBASE-6160v2092.txt HBASE-5986 fixed an issue where the client sees the META entry for the parent, but not the children. However, after the fix, we have seen the following issue in tests: Region A is split to - B, C Region B is split to - D, E After some time, the META entry for B is deleted since it is no longer needed, but the META entry for Region A stays in META (C still refers to it). In this case, the client throws RegionOfflineException for B.
[jira] [Commented] (HBASE-6168) [replication] Add replication zookeeper state documentation to replication.html
[ https://issues.apache.org/jira/browse/HBASE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290580#comment-13290580 ] Enis Soztutar commented on HBASE-6168: -- Great doc. Minor issues: - the peer name does not have to be an integer, AFAIK. - we can add lock znodes for RS failover. [replication] Add replication zookeeper state documentation to replication.html --- Key: HBASE-6168 URL: https://issues.apache.org/jira/browse/HBASE-6168 Project: HBase Issue Type: Improvement Components: documentation, replication Affects Versions: 0.96.0 Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6168.patch, HBASE-6168v2.patch Add a detailed explanation about the zookeeper state that HBase replication maintains.
[jira] [Assigned] (HBASE-5372) Table mutation operations should check table level rights, not global rights
[ https://issues.apache.org/jira/browse/HBASE-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar reassigned HBASE-5372: Assignee: Laxman (was: Enis Soztutar) Sure, by all means go ahead; I'll assign the issue to you. Table mutation operations should check table level rights, not global rights - Key: HBASE-5372 URL: https://issues.apache.org/jira/browse/HBASE-5372 Project: HBase Issue Type: Sub-task Components: security Reporter: Enis Soztutar Assignee: Laxman getUserPermissions(tableName)/grant/revoke and drop/modify table operations should not check for global CREATE/ADMIN rights, but table CREATE/ADMIN rights. The reasoning is that if a user is able to admin or read from a table, she should be able to read the table's permissions. We can choose whether we want only READ or ADMIN permissions for getUserPermission(). Since we check for global permissions first for table permissions, configuring table access using global permissions will continue to work.
[jira] [Commented] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291488#comment-13291488 ] Enis Soztutar commented on HBASE-6060: -- This issue, and the other related issues Ram has recently fixed, make me very nervous about all the state combinations distributed between zk / meta / rs-memory and master-memory. After this is done, do you think we can come up with a simpler design? I do not have any particular idea, so just spitballing here. Regions in OPENING state from failed regionservers take a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1, 0.92.3 Attachments: 6060-94-v3.patch, 6060-94-v4.patch, 6060-94-v4_1.patch, 6060-94-v4_1.patch, 6060-trunk.patch, 6060-trunk.patch, 6060-trunk_2.patch, 6060-trunk_3.patch, HBASE-6060-92.patch, HBASE-6060-94.patch We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process: - master calls rs to open the region. If rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - RegionServer starts opening a region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign) - Rs transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - rs then opens the region, and changes the znode from OPENING to OPENED - when rs is killed between the OPENING and OPENED states, zk shows the OPENING state, and the master just waits for rs to change the region state, but since rs is down, that won't happen. 
- There is an AssignmentManager.TimeoutMonitor, which guards exactly against these kinds of conditions. It periodically checks (every 10 sec by default) the regions in transition to see whether they timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in Master does not reassign regions in the OPENING state, although it handles other states. Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
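The TimeoutMonitor decision described above can be sketched roughly as follows. This is a hypothetical simplification for illustration; the class, enum, and method names here do not match the actual AssignmentManager code:

```java
// Hypothetical sketch of the TimeoutMonitor decision described above:
// a region stuck in transition longer than the configured timeout is
// forced back into assignment. Names are illustrative only.
public class TimeoutMonitorSketch {
    // hbase.master.assignment.timeoutmonitor.timeout defaults to 30 min,
    // which explains the long recovery time observed in this issue.
    static final long DEFAULT_TIMEOUT_MS = 30 * 60 * 1000L;

    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    /** True if a region in transition has been stuck long enough to reassign. */
    static boolean shouldReassign(State state, long inStateSinceMs, long nowMs, long timeoutMs) {
        if (state == State.OPENED) return false;   // transition completed normally
        return nowMs - inStateSinceMs > timeoutMs; // stuck longer than the timeout
    }
}
```

With the 30 min default, a region orphaned in OPENING by a killed RS sits unassigned until this check finally fires, which is the behavior the issue wants to improve.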
[jira] [Created] (HBASE-6192) Document ACL matrix in the book
Enis Soztutar created HBASE-6192: Summary: Document ACL matrix in the book Key: HBASE-6192 URL: https://issues.apache.org/jira/browse/HBASE-6192 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.96.0 Reporter: Enis Soztutar We have an excellent matrix at https://issues.apache.org/jira/secure/attachment/12531252/Security-ACL%20Matrix.pdf for ACL. Once the changes are done, we can adapt that, put it in the book, and also add some more documentation about the new authorization features.
[jira] [Commented] (HBASE-5947) Check for valid user/table/family/qualifier and acl state
[ https://issues.apache.org/jira/browse/HBASE-5947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13292951#comment-13292951 ] Enis Soztutar commented on HBASE-5947: -- bq. No news on that... check for column qualifier require a deep scan or keeping ref-counted qualifiers somewhere. For qualifiers, I think it is fine not to enforce that they exist, but we should check for table / cf. For preCreateTable and postDelete, we have to do the scan on the ACL table, not on the actual table, no? Check for valid user/table/family/qualifier and acl state - Key: HBASE-5947 URL: https://issues.apache.org/jira/browse/HBASE-5947 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl HBase Shell grant/revoke doesn't check for a valid user or table/family/qualifier, so you can end up having rights for something that doesn't exist. We might also want to ensure, upon table/column creation, that no entries are already stored in the acl table. We might still have residual acl entries if something goes wrong in postDeleteTable(), postDeleteColumn().
[jira] [Commented] (HBASE-5947) Check for valid user/table/family/qualifier and acl state
[ https://issues.apache.org/jira/browse/HBASE-5947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13292959#comment-13292959 ] Enis Soztutar commented on HBASE-5947: -- Are we sure we want to check for users? Check for valid user/table/family/qualifier and acl state - Key: HBASE-5947 URL: https://issues.apache.org/jira/browse/HBASE-5947 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl HBase Shell grant/revoke doesn't check for a valid user or table/family/qualifier, so you can end up having rights for something that doesn't exist. We might also want to ensure, upon table/column creation, that no entries are already stored in the acl table. We might still have residual acl entries if something goes wrong in postDeleteTable(), postDeleteColumn().
[jira] [Commented] (HBASE-4391) Add ability to start RS as root and call mlockall
[ https://issues.apache.org/jira/browse/HBASE-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13292975#comment-13292975 ] Enis Soztutar commented on HBASE-4391: -- I've seen something similar in the accumulo code base: http://svn.apache.org/viewvc/accumulo/trunk/server/src/main/c%2B%2B/mlock/ http://svn.apache.org/viewvc/accumulo/trunk/server/src/main/java/org/apache/accumulo/server/tabletserver/MLock.java?view=log Add ability to start RS as root and call mlockall - Key: HBASE-4391 URL: https://issues.apache.org/jira/browse/HBASE-4391 Project: HBase Issue Type: New Feature Components: regionserver Affects Versions: 0.94.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.96.0 Attachments: HBASE-4391-v0.patch A common issue we've seen in practice is that users oversubscribe their region servers with too many MR tasks, etc. As soon as the machine starts swapping, the RS grinds to a halt, loses its ZK session, aborts, etc. This can be combated by starting the RS as root, calling mlockall(), and then setuid'ing down to the hbase user. We should not require this, but we should provide it as an option.
[jira] [Commented] (HBASE-5947) Check for valid user/table/family/qualifier and acl state
[ https://issues.apache.org/jira/browse/HBASE-5947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293075#comment-13293075 ] Enis Soztutar commented on HBASE-5947: -- Then let's reduce the scope of this issue to: - Check for table / cf existence in grant. Not sure about revoke, since we may end up in an inconsistent state between the ACL table and table metadata, so revoke can just remove what is available in the ACL table. - Ensure that no table/cf/qualifier-level permissions are stored in the ACL table in preCreateTable Check for valid user/table/family/qualifier and acl state - Key: HBASE-5947 URL: https://issues.apache.org/jira/browse/HBASE-5947 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl HBase Shell grant/revoke doesn't check for a valid user or table/family/qualifier, so you can end up having rights for something that doesn't exist. We might also want to ensure, upon table/column creation, that no entries are already stored in the acl table. We might still have residual acl entries if something goes wrong in postDeleteTable(), postDeleteColumn().
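The reduced scope above, validating table/cf existence at grant time but not at revoke time, could look roughly like the following. This is a hypothetical helper, not the actual AccessController hook signatures:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of grant-time validation: reject a grant whose
// target table or column family does not exist. Revoke deliberately
// skips this check, so stale ACL entries can still be removed after
// a table or family has been dropped.
public class GrantValidationSketch {
    /** @param schema table name -> set of column family names (stand-in for table metadata) */
    static void validateGrant(Map<String, Set<String>> schema, String table, String family) {
        Set<String> families = schema.get(table);
        if (families == null) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        if (family != null && !families.contains(family)) {
            throw new IllegalArgumentException("Unknown family: " + family + " in table " + table);
        }
    }
}
```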
[jira] [Created] (HBASE-6201) HBase integration/system tests
Enis Soztutar created HBASE-6201: Summary: HBase integration/system tests Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or the actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster. 4. 
Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool) and chaos-monkey-style fault tests. All suggestions welcome.
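The uniform mini-cluster / real-cluster interface suggested in point 2 might be sketched as below. All names here are hypothetical; this is not the eventual API (tracked later in HBASE-6241):

```java
// Hypothetical sketch of a uniform cluster abstraction: a test codes
// against one interface, and the harness supplies either a mini-cluster
// or a real-cluster implementation depending on whether an external
// site configuration was passed in (e.g. -Dconf=/etc/hbase/conf/).
public class ClusterAbstractionSketch {
    interface HBaseClusterInterface {
        void startRegionServer(String host);
        void killRegionServer(String host);
        boolean isDistributed(); // true for a real, already-deployed cluster
    }

    /** Pick the distributed implementation only when a conf dir was given. */
    static boolean useDistributedCluster(String confDir) {
        return confDir != null && !confDir.isEmpty();
    }
}
```

The point of the design is that ported tests like TestMasterFailover keep their semantics unchanged: against a mini cluster they behave as today, while against a real cluster the same calls translate into operations on actual daemons.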
[jira] [Updated] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6203: - Attachment: HBASE-6203_v1.patch Attaching a patch. Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Created] (HBASE-6203) Create hbase-it
Enis Soztutar created HBASE-6203: Summary: Create hbase-it Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293972#comment-13293972 ] Enis Soztutar commented on HBASE-6203: -- Some notes: {code} mvn verify {code} runs the tests under hbase-it, named IntegrationTestXXX. Note that {{mvn test}} does not run these tests. You can run just the integration tests by cd'ing into the hbase-it module, or use {code} mvn verify -Dskip-server-tests -Dskip-common-tests {code} You can also skip integration tests with {{-Dskip-integration-tests}}. Failsafe also honors {{-DskipTests}}. Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293981#comment-13293981 ] Enis Soztutar commented on HBASE-6201: -- bq. Bigtop provides a framework for integration tests that is, essentially, 'mvn verify'. Thanks for bringing this up. I know that bigtop provides a test framework for integration tests. From my perspective, I see hbase and bigtop sharing responsibility on the testing side, and we can work to define best practices for this; I would love to hear Bigtop's perspective as well. I completely agree that HBase code should not bother with deployments, cluster management services, smoke testing, or integration with other components (hive, pig, etc). That kind of functionality can belong in BigTop or similar projects. However, some core testing functionality is better managed by the HBase project. Let's consider the TestMasterFailover test. Right now it is a unit test, testing the internal state transitions when the master fails. However, we can extend this test to run from the client side, and see whether the transition is transparent when we kill the active master on an actual cluster. That kind of testing should be managed by HBase itself, because, although they would run from the client side, these kinds of tests are HBase-specific and better managed by HBase devs. Also, I do not expect BigTop to host a large number of test cases for all of the stack (right now 8 projects). Having said that, in this issue, we can come up with a way to interface with BigTop (and other projects, custom jenkins jobs, etc) so that these tests can use the underlying deployment, server management, etc services, and BigTop and others can just execute the HBase internal integration tests on the cluster. A simple way for this is for HBase to offer {{mvn verify}} to be consumed by BigTop, and those tests will use HBase's own scripts (and SSH, etc) for cluster/server management. 
Since BigTop configures the cluster to be usable by those, it should be ok. HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or the actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster. 4. 
Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool) and chaos-monkey-style fault tests. All suggestions welcome.
[jira] [Updated] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6053: - Issue Type: Sub-task (was: Bug) Parent: HBASE-6201 Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test
[jira] [Commented] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294010#comment-13294010 ] Enis Soztutar commented on HBASE-6053: -- TestRegionRebalancing assumes that there is 1 RS available, and adds other RSs afterwards. What happens when we run this on a 10/100 node cluster? We can have more RSs than initial regions. Should we also generalize the testing condition? Or will the test shut down every RS except for 1, and restart them afterwards? We can remove RandomKiller; it is not used for now. Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test
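One way to generalize the balance condition so it works for 1, 10, or 100 region servers is a slop-based check against the average load. This is a sketch of the idea, not the assertion TestRegionRebalancing actually uses:

```java
import java.util.Arrays;

// Hypothetical sketch of a generalized balance check: every region
// server's region count must fall within a slop factor of the cluster
// average, so the same assertion holds regardless of cluster size.
public class BalanceCheckSketch {
    static boolean isBalanced(int[] regionsPerServer, double slop) {
        double avg = Arrays.stream(regionsPerServer).average().orElse(0);
        int min = (int) Math.floor(avg * (1 - slop)); // lowest acceptable load
        int max = (int) Math.ceil(avg * (1 + slop));  // highest acceptable load
        return Arrays.stream(regionsPerServer).allMatch(n -> n >= min && n <= max);
    }
}
```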
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396326#comment-13396326 ] Enis Soztutar commented on HBASE-6201: -- I think your categorization and my comments above are telling the same thing; no confusion there. This umbrella issue is all about maintaining #2 kind of tests inside HBase. Now, the problem is how to best interface between HBase and Bigtop. My proposal is that hbase-it depends on itest-common, and uses it to interact with the servers. My understanding is that, even if you are not deploying the cluster with bigtop, as long as the /etc/init.d/ scripts are there, you should be fine. At this point, we only need daemon starting/stopping kind of functionality, assuming the cluster is already deployed. On the other side, if we provide a mvn verify in the hbase-it module to run the tests on the actual cluster, I assume BigTop can leverage this to carry out the tests. For refactoring, once the module and other bits are ready, we can move select tests from Bigtop to HBase. I'll open a subtask for that. HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. 
Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or the actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster. 4. Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool) and chaos-monkey-style fault tests. All suggestions welcome.
[jira] [Commented] (HBASE-6201) HBase integration/system tests
[ https://issues.apache.org/jira/browse/HBASE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396347#comment-13396347 ] Enis Soztutar commented on HBASE-6201: -- bq. Are you saying that you would like the tests themselves to get involved in the lifecycle of each service? Like bringing them up and down, etc? Yes. At the current state, most of our unit tests that are candidates to be upgraded to system tests do start a mini-cluster of n nodes, load some data, kill a few nodes, verify, etc. We are converting/reimplementing them to do the same things on the actual cluster. A particular test case, for example, starts 4 region servers, puts some data, kills 1 RS, checks whether the regions are balanced, kills one more, checks again, etc. Some basic functionality we can use from itest: - Starting / stopping / sending a signal to daemons (start a region server on host1, kill master on host2, etc). For both HBase and Hadoop processes. - Basic cluster/node discovery (give me the nodes running hmaster) - Run this command on host3 (SSH) HBase integration/system tests -- Key: HBASE-6201 URL: https://issues.apache.org/jira/browse/HBASE-6201 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Integration and general system tests have been discussed previously, and the conclusion is that we need to unify how we do release candidate testing (HBASE-6091). In this issue, I would like to discuss and agree on a general plan, and open subtickets for execution so that we can carry out most of the tests in HBASE-6091 automatically. Initially, here is what I have in mind: 1. Create hbase-it (or hbase-tests) containing a forward port of HBASE-4454 (without any tests). This will allow integration tests to be run with {code} mvn verify {code} 2. Add the ability to run all integration/system tests on a given cluster. 
Something like: {code} mvn verify -Dconf=/etc/hbase/conf/ {code} should run the test suite on the given cluster. (Right now we can launch some of the tests (TestAcidGuarantees) from the command line). Most of the system tests will be client side, and interface with the cluster through public APIs. We need a tool on top of MiniHBaseCluster, or to improve HBaseTestingUtility, so that tests can interface with the mini cluster or the actual cluster uniformly. 3. Port candidate unit tests to the integration tests module. Some of the candidates are: - TestAcidGuarantees / TestAtomicOperation - TestRegionBalancing (HBASE-6053) - TestFullLogReconstruction - TestMasterFailover - TestImportExport - TestMultiVersions / TestKeepDeletes - TestFromClientSide - TestShell and src/test/ruby - TestRollingRestart - Test**OnCluster - Balancer tests These tests should continue to be run as unit tests w/o any change in semantics. However, given an actual cluster, they should use that instead of spinning up a mini cluster. 4. Add more tests, especially long-running ingestion tests (goraci, BigTop's TestLoadAndVerify, LoadTestTool) and chaos-monkey-style fault tests. All suggestions welcome.
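The daemon-control primitives discussed in this thread (start/stop/signal a service on a host) could be expressed as a small interface. The names below are hypothetical, loosely modeled on what an itest-common-backed implementation might offer; the stub just records the commands a real implementation would run over SSH against the cluster's init scripts:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of daemon-control primitives for system tests:
// start/kill a service on a named host. A real implementation would
// shell out over SSH to /etc/init.d scripts; this stub only records
// the commands it would have issued, for illustration.
public class ClusterManagerSketch {
    enum Service { MASTER, REGIONSERVER, ZOOKEEPER }

    interface ClusterManager {
        void start(Service service, String host);
        void kill(Service service, String host);
    }

    /** Recording stub standing in for an SSH-backed implementation. */
    static class RecordingClusterManager implements ClusterManager {
        final List<String> commands = new ArrayList<>();
        public void start(Service s, String host) { commands.add("start " + s + " on " + host); }
        public void kill(Service s, String host) { commands.add("kill " + s + " on " + host); }
    }
}
```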
[jira] [Assigned] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar reassigned HBASE-6203: Assignee: Enis Soztutar Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6053) Enhance TestRegionRebalancing test to be a system test
[ https://issues.apache.org/jira/browse/HBASE-6053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397187#comment-13397187 ] Enis Soztutar commented on HBASE-6053: -- After some discussions, we realized that the patch is too big to handle. I've opened HBASE-6241 for tracking the HBaseCluster/MiniHBaseCluster/RealHBaseCluster-related changes. In this issue, we can track the TestRegionRebalancing-specific changes. Obviously this issue will depend on the new one. Enhance TestRegionRebalancing test to be a system test -- Key: HBASE-6053 URL: https://issues.apache.org/jira/browse/HBASE-6053 Project: HBase Issue Type: Sub-task Components: test Reporter: Devaraj Das Assignee: Devaraj Das Priority: Minor Attachments: 6053-1.patch, regionRebalancingSystemTest.txt TestRegionRebalancing can be converted to be a system test
[jira] [Commented] (HBASE-6205) Support an option to keep data of dropped table for some time
[ https://issues.apache.org/jira/browse/HBASE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400756#comment-13400756 ] Enis Soztutar commented on HBASE-6205: -- +1 on using the hdfs trash, it has all the properties we need (configurable, easy to use, and it works). We just need a way to reconstruct the table. Support an option to keep data of dropped table for some time - Key: HBASE-6205 URL: https://issues.apache.org/jira/browse/HBASE-6205 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6205.patch, HBASE-6205v2.patch, HBASE-6205v3.patch, HBASE-6205v4.patch, HBASE-6205v5.patch A user may drop a table accidentally because of erroneous code or other uncertain reasons. Unfortunately, it happened in our environment because one user made a mistake between the production cluster and the testing cluster. So I suggest supporting an option to keep the data of a dropped table for some time, e.g. 1 day. In the patch: We make a new dir named .trashtables in the root dir. In the DeleteTableHandler, we move files in the dropped table's dir to the trash table dir instead of deleting them directly. We also create a new class, TrashCleaner, which periodically checks for and cleans dropped tables that have timed out. The default keep time for dropped tables is 1 day, and the check period is 1 hour.
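The retention policy the patch describes — move dropped tables to a trash area, then purge them on a periodic check once the keep time expires — can be sketched as below. The class name `TrashCleanerSketch`, the in-memory map, and the method names are all illustrative; the real patch works on HDFS directories:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch of the TrashCleaner idea: dropped tables are recorded
// with a drop timestamp, and a periodic pass permanently deletes only the
// entries older than the keep time (1 day by default, checked hourly).
public class TrashCleanerSketch {
    static final long DEFAULT_KEEP_MS = 24L * 60 * 60 * 1000; // 1 day

    private final Map<String, Long> trash = new HashMap<>(); // table -> drop time

    // Called instead of deleting the table's files outright.
    public void moveToTrash(String table, long nowMs) {
        trash.put(table, nowMs);
    }

    // One periodic pass; returns how many tables were permanently deleted.
    public int clean(long nowMs, long keepMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = trash.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > keepMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public boolean inTrash(String table) {
        return trash.containsKey(table);
    }
}
```

The point of the timestamp-plus-periodic-scan design is that an accidental drop stays recoverable for the whole keep window, at the cost of the storage staying allocated until the next cleaner pass after expiry.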
[jira] [Commented] (HBASE-6205) Support an option to keep data of dropped table for some time
[ https://issues.apache.org/jira/browse/HBASE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400946#comment-13400946 ] Enis Soztutar commented on HBASE-6205: -- After an offline conversation with Ted and Jitendra, it seems that hdfs trash works only from the shell. One other concern is that trash is not exposed as a hadoop filesystem feature, so we would have to use the shell-equivalent commands to accomplish this, and it would work only on hdfs, not other file systems. The question of whether to implement an hbase-trash boils down to whether we want this to work with file systems other than hdfs, and to have more control over the retention policy. Support an option to keep data of dropped table for some time - Key: HBASE-6205 URL: https://issues.apache.org/jira/browse/HBASE-6205 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6205.patch, HBASE-6205v2.patch, HBASE-6205v3.patch, HBASE-6205v4.patch, HBASE-6205v5.patch A user may drop a table accidentally because of erroneous code or other uncertain reasons. Unfortunately, it happened in our environment because one user made a mistake between the production cluster and the testing cluster. So I suggest supporting an option to keep the data of a dropped table for some time, e.g. 1 day. In the patch: We make a new dir named .trashtables in the root dir. In the DeleteTableHandler, we move files in the dropped table's dir to the trash table dir instead of deleting them directly. We also create a new class, TrashCleaner, which periodically checks for and cleans dropped tables that have timed out. The default keep time for dropped tables is 1 day, and the check period is 1 hour.
[jira] [Updated] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6241: - Attachment: HBASE-6241_v0.2.patch Attaching a patch for an early view. I am still polishing stuff, but the bulk of the patch is pretty much done. I'll upload the candidate-for-review version once it is done. This is based on the patch for HBASE-6053, but does not include the TestRegionRebalancing changes. It requires HBASE-6201. Some high-level notes on the patch: - Uses the hbase-it module, and adds a new test there called IntegrationTestDataIngestWithChaosMonkey. This class runs LoadTestTool with a chaos monkey (http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html). The chaos monkey is very simple right now: it just selects a random RS, kills it, and restarts it. - Introduces HBaseCluster and RealHBaseCluster, and changes MiniHBaseCluster to extend HBaseCluster. - Adds a ClusterManager interface, and a default HBase shell scripts based HBaseClusterManager. These are internal classes, and tests do not refer to them directly, so we can improve on them, and maybe add another implementation when BIGTOP-635 is done. - I've tested the patch on a mini-cluster as well as an 8-node cluster. - Adds an IntegrationTestsDriver class as a driver for running integration tests from the command line. You can do bin/hbase --config hbase_conf_dir o.a.h.h.ITD to run all the integration tests on a real cluster. mvn verify runs them on a mini cluster. I'll open another issue for mvn verify on real clusters. HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch We need to abstract away the cluster interactions for system tests running on actual clusters. 
MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version.
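The "selects a random RS, kills it, and restarts it" loop described in the comment is simple enough to sketch. `ServerStub` and every name below are stand-ins for illustration; the real patch drives the cluster through the HBaseCluster/ClusterManager abstractions rather than in-process flags:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of the simple chaos monkey: pick a random region server,
// kill it, then bring it back, so the ingest workload running alongside
// must tolerate the outage.
public class ChaosMonkeySketch {
    static final class ServerStub {
        final String host;
        boolean running = true;
        ServerStub(String host) { this.host = host; }
    }

    private final List<ServerStub> servers = new ArrayList<>();
    private final Random random;

    private ChaosMonkeySketch(long seed) { this.random = new Random(seed); }

    public static ChaosMonkeySketch withHosts(long seed, String... hosts) {
        ChaosMonkeySketch monkey = new ChaosMonkeySketch(seed);
        for (String h : hosts) monkey.servers.add(new ServerStub(h));
        return monkey;
    }

    // One round of chaos: kill a random server, then restart it.
    public String killAndRestartOne() {
        ServerStub victim = servers.get(random.nextInt(servers.size()));
        victim.running = false; // kill
        victim.running = true;  // restart
        return victim.host;
    }

    public boolean allRunning() {
        return servers.stream().allMatch(s -> s.running);
    }
}
```

Seeding the random source, as above, keeps a failed chaos run reproducible — a useful property when the monkey is paired with a long-running ingest test.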
[jira] [Commented] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401847#comment-13401847 ] Enis Soztutar commented on HBASE-6241: -- bq. This JIRA is a sub-task of HBASE-6201 which doesn't have patch attached. bq. Can you clarify the above ? Sorry, it should be HBASE-6203. HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch We need to abstract away the cluster interactions for system tests running on actual clusters. MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version.
[jira] [Commented] (HBASE-6274) Proto files should be in the same place
[ https://issues.apache.org/jira/browse/HBASE-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401854#comment-13401854 ] Enis Soztutar commented on HBASE-6274: -- Jimmy, this looks like a duplicate of HBASE-6000? Proto files should be in the same place --- Key: HBASE-6274 URL: https://issues.apache.org/jira/browse/HBASE-6274 Project: HBase Issue Type: Improvement Affects Versions: 0.96.0 Reporter: Jimmy Xiang Priority: Trivial Fix For: 0.96.0 Currently, proto files are under hbase-server/src/main/protobuf and hbase-server/src/protobuf. It's better to put them together.
[jira] [Commented] (HBASE-5612) Data types for HBase values
[ https://issues.apache.org/jira/browse/HBASE-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13401899#comment-13401899 ] Enis Soztutar commented on HBASE-5612: -- At the recent HBase hackathon and the BOF sessions, we had some discussions about adding some kind of schemas/data types to hbase, and Ian gave a short talk about it. Beyond the use cases in this jira, having optional schema data has these advantages: - HBase internals can make use of data types (like block level encoding, comparators for sub-fields in keys, etc) - The HBase shell can make use of the data types, and display the data correctly - Hive/Pig can better map their own data types to hbase types, and their schemas to the hbase schema, instead of managing it themselves - Client-written coprocessors or system level coprocessors can do data validation according to the schema and data types. So, what I am trying to say is that we can start to think of a bigger picture for the data types, rather than doing something only for compression/block encoding. WDYT? Data types for HBase values --- Key: HBASE-5612 URL: https://issues.apache.org/jira/browse/HBASE-5612 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin In many real-life applications all values in a certain column family are of a certain data type, e.g. 64-bit integer. We could specify that in the column descriptor and enable data type-specific compression such as variable-length integer encoding.
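The variable-length integer encoding the issue description mentions is the classic example of type-aware compression: if a column family is declared to hold 64-bit integers, small values need only one or two bytes instead of a fixed eight. A minimal LEB128-style sketch, for illustration only (this is not HBase's actual encoding):

```java
import java.io.ByteArrayOutputStream;

// Sketch of varint encoding: 7 payload bits per byte, with the high bit
// set on every byte except the last as a continuation marker.
public class VarintSketch {
    public static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // more bytes follow
            value >>>= 7;
        }
        out.write((int) value); // final byte, high bit clear
        return out.toByteArray();
    }

    public static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }
}
```

A value like 300 takes 2 bytes instead of 8, while values up to 127 take a single byte — which is why declaring the type in the column descriptor, as the issue proposes, can pay off for integer-heavy families.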
[jira] [Commented] (HBASE-6205) Support an option to keep data of dropped table for some time
[ https://issues.apache.org/jira/browse/HBASE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402436#comment-13402436 ] Enis Soztutar commented on HBASE-6205: -- Considering this, HBASE-5547, and snapshots, it seems that we can decouple file management and region-file association. We can build a very lightweight file manager, and remove all file deletion code from the RS code. As in the bigtable design, we can keep the regions' current hfiles (and WALs) in META; when an RS flushes or rolls the log, it adds the file reference to META. Then for a snapshot or a backup, we just need a point-in-time snapshot of the META table. A master thread can periodically scan META, the META snapshots, and the hdfs directories, and delete files with 0 references based on a policy. And deleting a table will just take a META snapshot for the table, and delete the META entries afterwards. This META snapshot will be kept for a while (similar to the normal snapshot retention). WDYT, how crazy is this? Support an option to keep data of dropped table for some time - Key: HBASE-6205 URL: https://issues.apache.org/jira/browse/HBASE-6205 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6205.patch, HBASE-6205v2.patch, HBASE-6205v3.patch, HBASE-6205v4.patch, HBASE-6205v5.patch A user may drop a table accidentally because of erroneous code or other uncertain reasons. Unfortunately, it happened in our environment because one user made a mistake between the production cluster and the testing cluster. So I suggest supporting an option to keep the data of a dropped table for some time, e.g. 1 day. In the patch: We make a new dir named .trashtables in the root dir. In the DeleteTableHandler, we move files in the dropped table's dir to the trash table dir instead of deleting them directly. 
We also create a new class, TrashCleaner, which periodically checks for and cleans dropped tables that have timed out. The default keep time for dropped tables is 1 day, and the check period is 1 hour.
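The design floated in the comment above amounts to reference-counted garbage collection of store files: regions and snapshots register references in a catalog, and a periodic master chore deletes only files nobody still references. A toy sketch of that bookkeeping, with every name being illustrative:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of catalog-driven file GC: owners (regions, snapshots) hold
// references to files; a periodic scan deletes only zero-reference files.
public class FileRefGcSketch {
    private final Map<String, Set<String>> fileToOwners = new HashMap<>();

    // e.g. called when an RS flushes or a snapshot is taken.
    public void addReference(String file, String owner) {
        fileToOwners.computeIfAbsent(file, f -> new HashSet<>()).add(owner);
    }

    // e.g. called when a region drops a compacted-away file or a
    // retained snapshot expires.
    public void removeReference(String file, String owner) {
        Set<String> owners = fileToOwners.get(file);
        if (owners != null) owners.remove(owner);
    }

    // Periodic master pass: collect and drop files with zero references.
    public Set<String> collectGarbage() {
        Set<String> dead = new HashSet<>();
        fileToOwners.forEach((file, owners) -> {
            if (owners.isEmpty()) dead.add(file);
        });
        dead.forEach(fileToOwners::remove);
        return dead;
    }
}
```

Under this scheme, "dropping a table" only removes the table's references; the files survive as long as any retained META snapshot still points at them, which is exactly the recoverability property the trash proposal is after.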
[jira] [Commented] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403361#comment-13403361 ] Enis Soztutar commented on HBASE-6203: -- How about waiting for HBASE-6241 and committing this and that consecutively? Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6203) Create hbase-it
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403647#comment-13403647 ] Enis Soztutar commented on HBASE-6203: -- HBASE-6241 is currently a large patch, and I don't want to add more complexity to it. Let's keep the patches separate. For convenience though, I'll upload both merged and unmerged patches at HBASE-6241. Create hbase-it --- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6203_v1.patch Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13403661#comment-13403661 ] Enis Soztutar commented on HBASE-6241: -- @Ted thanks for comments. I've addressed most of them. I've uploaded an updated version of the patch: https://reviews.apache.org/r/5653/. I guess RB still does not post to jira. HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch We need to abstract away the cluster interactions for system tests running on actual clusters. MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version.
[jira] [Commented] (HBASE-6203) Create hbase-it module
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404170#comment-13404170 ] Enis Soztutar commented on HBASE-6203: -- @Jesse, Agreed. We now have a switch for every module, but -Donly-integration-tests might not be necessary since you can do {{cd hbase-it; mvn verify}}. With maven modules, I think the standard way to execute only one module is to cd into that module and execute. I'll add that to the doc. Do you think that would be enough? @Stack Thanks a lot for the docs, I totally missed that. failsafe is basically a fork of surefire, which only adds pre- and post-integration-test targets. We are not using them right now, but we can make use of those targets, for example for recovering the cluster after the test. I totally share your doubts about this failsafe/surefire thing, but I do not know of any better solution. Suggestions welcome :) mvn verify now executes the {{IntegrationTestXXX}} classes as unit tests. There is also an IntegrationTestsDriver class in the patch at HBASE-6241, which is executed by: {code} bin/hbase --config org.apache.hadoop.hbase.IntegrationTestsDriver {code} I'll open another subtask for HBASE-6201, for making {{mvn verify}} work with real clusters. I have checked how bigtop does it, and it seems they have: {code} bigtop-tests/test-artifacts/ -- contains actual test code bigtop-tests/test-execution/ -- contains code + configuration for executing the tests {code} In particular, if you look into {{bigtop-tests/test-execution/smokes/hbase/pom.xml}}, it passes HBASE_HOME, HBASE_CONF_DIR, etc. from env to failsafe. It works for bigtop, so I think we can make it work for our cases as well. 
Create hbase-it module -- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: HBASE-6203_v1.patch, it-doc.txt Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Commented] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404174#comment-13404174 ] Enis Soztutar commented on HBASE-6241: -- @Ted Thanks for trying it out. Did you run mvn verify at the top level, or by cd'ing into hbase-it? hbase-it depends on hbase-server, so it fetches hbase-common and other jars transitively, but you might have to do mvn install -DskipTests first. HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch, HBASE-6241_v1.patch We need to abstract away the cluster interactions for system tests running on actual clusters. MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version.
[jira] [Commented] (HBASE-6203) Create hbase-it module
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404331#comment-13404331 ] Enis Soztutar commented on HBASE-6203: -- bq. which is kind of a pain such is maven :) My only concern about the flag is that, to implement it, every module has to know about and honor the only-integration-tests parameter, which does not seem clean to me. I am not a maven guy; if you have a suggestion, I'll be more than happy to try it out. Can we instruct the reactor to run only test-compile for everything, but test just for hbase-it? Create hbase-it module -- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: HBASE-6203_v1.patch, it-doc.txt Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Assigned] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar reassigned HBASE-6302: Assignee: Enis Soztutar Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Bug Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 HBASE-6203 has attached the old IT doc with some mods. When we figure out how ITs are to be run, update it and apply the documentation under this issue. Making this a blocker against 0.96.
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13416003#comment-13416003 ] Enis Soztutar commented on HBASE-5754: -- @Lars, We have been running this for a while as nightlies, and apart from the reported HBASE-5986, HBASE-6060, and HBASE-6160, we did not run into more issues. All of them can be considered META issues w/o actual data loss. Let's see what Eric would say. data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not on the verify step. Here's the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly.
[jira] [Created] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
Enis Soztutar created HBASE-6400: Summary: Add getMasterAdmin() and getMasterMonitor() to HConnection Key: HBASE-6400 URL: https://issues.apache.org/jira/browse/HBASE-6400 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar HConnection used to have getMasterInterface(), but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin() and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin(), but I see no reason to leak keep alive classes to upper layers.
[jira] [Updated] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6400: - Attachment: HBASE-6400_v1.patch Attaching a very simple patch that does the task. BTW, I did need these for the patch at HBASE-6241. Add getMasterAdmin() and getMasterMonitor() to HConnection -- Key: HBASE-6400 URL: https://issues.apache.org/jira/browse/HBASE-6400 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6400_v1.patch HConnection used to have getMasterInterface(), but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin() and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin(), but I see no reason to leak keep alive classes to upper layers.
[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)
[ https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13416243#comment-13416243 ] Enis Soztutar commented on HBASE-5754: -- Just FYI in case, I've been working on adding long-running ingestion tests while randomly killing off servers, and other types of integration tests, over at HBASE-6241, HBASE-6201. Feel free to chime in. data lost with gora continuous ingest test (goraci) --- Key: HBASE-5754 URL: https://issues.apache.org/jira/browse/HBASE-5754 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Environment: 10 node test cluster Reporter: Eric Newton Assignee: stack Keith Turner re-wrote the accumulo continuous ingest test using gora, which has both hbase and accumulo back-ends. I put a billion entries into HBase, and ran the Verify map/reduce job. The verification failed because about 21K entries were missing. The goraci [README|https://github.com/keith-turner/goraci] explains the test, and how it detects missing data. I re-ran the test with 100 million entries, and it verified successfully. Both of the times I tested using a billion entries, the verification failed. If I run the verification step twice, the results are consistent, so the problem is probably not on the verify step. Here's the versions of the various packages: ||package||version|| |hadoop|0.20.205.0| |hbase|0.92.1| |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277| |goraci|https://github.com/ericnewton/goraci tagged 2012-04-08| The change I made to goraci was to configure it for hbase and to allow it to build properly.
[jira] [Commented] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418668#comment-13418668 ] Enis Soztutar commented on HBASE-6241: -- Thanks Stack for the review. I put up an updated patch at RB (after some sweet time-off :) ). HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch, HBASE-6241_v1.patch We need to abstract away the cluster interactions for system tests running on actual clusters. MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version.
[jira] [Updated] (HBASE-6203) Create hbase-it module
[ https://issues.apache.org/jira/browse/HBASE-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6203: - Resolution: Fixed Release Note: Adds a new module, hbase-it, which contains integration and system tests Status: Resolved (was: Patch Available) Resolving this issue as it has been committed. Create hbase-it module -- Key: HBASE-6203 URL: https://issues.apache.org/jira/browse/HBASE-6203 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: HBASE-6203_v1.patch, it-doc.txt Create hbase-it, as per parent issue, and re-introduce HBASE-4454
[jira] [Updated] (HBASE-6400) Add getMasterAdmin() and getMasterMonitor() to HConnection
[ https://issues.apache.org/jira/browse/HBASE-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6400: - Resolution: Fixed Status: Resolved (was: Patch Available) Resolving this, since it is committed. Add getMasterAdmin() and getMasterMonitor() to HConnection -- Key: HBASE-6400 URL: https://issues.apache.org/jira/browse/HBASE-6400 Project: HBase Issue Type: Improvement Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: 6400-v2.patch, HBASE-6400_v1.patch HConnection used to have getMaster() which returns HMasterInterface, but after HBASE-6039 it has been removed. I think we need to expose HConnection.getMasterAdmin() and getMasterMonitor() a la HConnection.getAdmin(), and getClient(). HConnectionImplementation has getKeepAliveMasterAdmin() but, I see no reason to leak keep alive classes to upper layers.
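The "don't leak keep-alive classes to upper layers" point above is an encapsulation pattern: the connection holds a keep-alive wrapper internally but hands callers only the plain protocol interface. A minimal sketch of that pattern, with made-up interface and class names (this is not the actual HConnection API):

```java
// Hypothetical sketch of the proposed accessors. The keep-alive wrapper
// stays an implementation detail of the connection; callers only ever
// see the plain protocol interfaces. All names are illustrative.
interface MasterAdminProtocol { /* DDL-style master operations */ }
interface MasterMonitorProtocol { /* cluster-status master operations */ }

// Internal wrappers that would also manage connection liveness; never exposed.
class KeepAliveMasterAdmin implements MasterAdminProtocol { }
class KeepAliveMasterMonitor implements MasterMonitorProtocol { }

class ConnectionSketch {
  private final KeepAliveMasterAdmin admin = new KeepAliveMasterAdmin();
  private final KeepAliveMasterMonitor monitor = new KeepAliveMasterMonitor();

  // Return types are the plain interfaces, so the keep-alive classes
  // do not appear anywhere in a caller's code.
  public MasterAdminProtocol getMasterAdmin() { return admin; }
  public MasterMonitorProtocol getMasterMonitor() { return monitor; }
}

public class ConnectionDemo {
  public static void main(String[] args) {
    ConnectionSketch conn = new ConnectionSketch();
    System.out.println(conn.getMasterAdmin() instanceof MasterAdminProtocol);
  }
}
```

With this shape, the keep-alive implementation can change freely without touching any code above the connection layer.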
[jira] [Created] (HBASE-6462) TestAcidGuarantees failed on trunk
Enis Soztutar created HBASE-6462: Summary: TestAcidGuarantees failed on trunk Key: HBASE-6462 URL: https://issues.apache.org/jira/browse/HBASE-6462 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Enis Soztutar I've seen TestAcidGurantees fail with: {code} testGetAtomicity(org.apache.hadoop.hbase.IntegrationTestAcidGuaranteesWithChaosMonkey) Time elapsed: 42.611 sec ERROR! java.lang.RuntimeException: Deferred at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:76) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.stop(MultithreadedTestUtil.java:103) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:298) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:248) at org.apache.hadoop.hbase.IntegrationTestAcidGuaranteesWithChaosMonkey.testGetAtomicity(IntegrationTestAcidGuaranteesWithChaosMonkey.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:234) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:133) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:114) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:188) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:166) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:86) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:101) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74) Caused by: java.lang.RuntimeException: Failed after 147200!Expected=\x1BT\xC0i\x0CW\x9B\x108\xA0Got: test_row_0/A:col0/1343328428704/Put/vlen=10/ts=0 val= \x1BT\xC0i\x0CW\x9B\x108\xA0 test_row_0/A:col1/1343328428704/Put/vlen=10/ts=0 val= \x1BT\xC0i\x0CW\x9B\x108\xA0 test_row_0/A:col10/1343328428704/Put/vlen=10/ts=0 val= \x1BT\xC0i\x0CW\x9B\x108\xA0 ... 
test_row_0/B:col0/1343328425510/Put/vlen=10/ts=0 val= 4G\xE1T\x1B\xFDa\x98\xAC\xB6 test_row_0/B:col1/1343328425510/Put/vlen=10/ts=0 val= 4G\xE1T\x1B\xFDa\x98\xAC\xB6 test_row_0/B:col10/1343328425510/Put/vlen=10/ts=0 val= ... test_row_0/C:col0/1343328425510/Put/vlen=10/ts=0 val= 4G\xE1T\x1B\xFDa\x98\xAC\xB6 test_row_0/C:col1/1343328425510/Put/vlen=10/ts=0 val= 4G\xE1T\x1B\xFDa\x98\xAC\xB6 test_row_0/C:col10/1343328425510/Put/vlen=10/ts=0 val= {code} Might be related to HBASE-2856, but I haven't had the time to check the root cause. The flusher thread was running.
[jira] [Commented] (HBASE-6462) TestAcidGuarantees failed on trunk
[ https://issues.apache.org/jira/browse/HBASE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423489#comment-13423489 ] Enis Soztutar commented on HBASE-6462: -- Another stack trace I got now has the 3rd CF with different values: {code} Caused by: java.lang.RuntimeException: Failed after 98200!Expected=\xFF[\x0B_\xAF\xCAQJ\xBDKGot: test_row_2/A:col0/1343337923715/Put/vlen=10/ts=0 val= \xFF[\x0B_\xAF\xCAQJ\xBDK test_row_2/A:col1/1343337923715/Put/vlen=10/ts=0 val= \xFF[\x0B_\xAF\xCAQJ\xBDK .. test_row_2/B:col8/1343337923715/Put/vlen=10/ts=0 val= \xFF[\x0B_\xAF\xCAQJ\xBDK test_row_2/B:col9/1343337923715/Put/vlen=10/ts=0 val= \xFF[\x0B_\xAF\xCAQJ\xBDK .. test_row_2/C:col0/1343337921472/Put/vlen=10/ts=0 val= \xEA\xD0\x15\xFB\xC0\xE7\xE3\xA0\xDB^ test_row_2/C:col1/1343337921472/Put/vlen=10/ts=0 val= \xEA\xD0\x15\xFB\xC0\xE7\xE3\xA0\xDB^ .. {code} TestAcidGuarantees failed on trunk -- Key: HBASE-6462 URL: https://issues.apache.org/jira/browse/HBASE-6462 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Enis Soztutar
[jira] [Created] (HBASE-6469) Failure on enable/disable table will cause table state in zk to be left as enabling/disabling until the master is restarted
Enis Soztutar created HBASE-6469: Summary: Failure on enable/disable table will cause table state in zk to be left as enabling/disabling until the master is restarted Key: HBASE-6469 URL: https://issues.apache.org/jira/browse/HBASE-6469 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.94.2 Reporter: Enis Soztutar Assignee: Enis Soztutar In the Enable/DisableTableHandler code, if something goes wrong during handling, the table state in zk is left as ENABLING / DISABLING. After that we cannot force any more action from the API or CLI, and the only recovery path is restarting the master. {code}
if (done) {
  // Flip the table to enabled.
  this.assignmentManager.getZKTable().setEnabledTable(this.tableNameStr);
  LOG.info("Table '" + this.tableNameStr + "' was successfully enabled. Status: done=" + done);
} else {
  LOG.warn("Table '" + this.tableNameStr + "' wasn't successfully enabled. Status: done=" + done);
}
{code} Here, if done is false, the table state is not changed. There is also no way to set skipTableStateCheck from the CLI / API. We have run into this issue a couple of times before.
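One way to avoid the stuck state described above would be to roll the znode back to its previous value when the handler fails, so the operation can simply be retried without a master restart. A minimal sketch of that idea, with a made-up enum and class (this is not the actual ZKTable code):

```java
// Hypothetical sketch: roll the table state back on failure instead of
// leaving ENABLING behind. Enum and class names are illustrative only.
enum TableState { ENABLED, DISABLED, ENABLING, DISABLING }

class ZkTableStateSketch {
  private TableState state = TableState.DISABLED;

  // Models the enable path: move to ENABLING, then either commit to
  // ENABLED or roll back to DISABLED if the handler failed.
  void enableTable(boolean handlerSucceeded) {
    state = TableState.ENABLING;
    if (handlerSucceeded) {
      state = TableState.ENABLED;
    } else {
      // Rolling back means a later enable attempt is not blocked by a
      // stale ENABLING znode, and no master restart is needed.
      state = TableState.DISABLED;
    }
  }

  TableState getState() { return state; }
}

public class TableStateDemo {
  public static void main(String[] args) {
    ZkTableStateSketch zk = new ZkTableStateSketch();
    zk.enableTable(false);  // handler fails
    System.out.println("after failed enable: " + zk.getState());
    zk.enableTable(true);   // retry succeeds without restarting anything
    System.out.println("after retry: " + zk.getState());
  }
}
```

The same rollback shape would apply symmetrically to the disable path (DISABLING back to ENABLED on failure).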
[jira] [Updated] (HBASE-6302) Document how to run integration tests
[ https://issues.apache.org/jira/browse/HBASE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6302: - Issue Type: Sub-task (was: Bug) Parent: HBASE-6201 Document how to run integration tests - Key: HBASE-6302 URL: https://issues.apache.org/jira/browse/HBASE-6302 Project: HBase Issue Type: Sub-task Components: documentation Reporter: stack Assignee: Enis Soztutar Priority: Blocker Fix For: 0.96.0 HBASE-6203 has attached the old IT doc with some mods. When we figure out how ITs are to be run, update it and apply the documentation under this issue. Making it a blocker against 0.96.