[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295453#comment-13295453 ] Andrew Purtell commented on HBASE-4821: --- This discussion was carried forward into HBASE-6201 ? > A fully automated comprehensive distributed integration test for HBase > -- > > Key: HBASE-4821 > URL: https://issues.apache.org/jira/browse/HBASE-4821 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin >Priority: Critical > > To properly verify that a particular version of HBase is good for production > deployment we need a better way to do real cluster testing after incremental > changes. Running unit tests is good, but we also need to deploy HBase to a > cluster, run integration tests, load tests, Thrift server tests, kill some > region servers, kill the master, and produce a report. All of this needs to > happen in 20-30 minutes with minimal manual intervention. I think this way we > can combine agile development with high stability of the codebase. I am > envisioning a high-level framework written in a scripting language (e.g. > Python) that would abstract external operations such as "deploy to test > cluster", "kill a particular server", "run load test A", "run load test B" > (we already have a few kinds of load tests implemented in Java, and we could > write a Thrift load test in Python). This tool should also produce > intermediate output, allowing to catch problems early and restart the test. > No implementation has yet been done. Any ideas or suggestions are welcome. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
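The framework proposed in the issue description (a script that strings together operations such as "deploy to test cluster", "kill a particular server" and "run load test A", prints intermediate output, and stops early on failure) could be sketched roughly as follows. This is only an illustration under stated assumptions: no implementation exists in the ticket, and `ClusterTestPlan`, `StepResult` and the step names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    detail: str = ""


@dataclass
class ClusterTestPlan:
    """Hypothetical ordered list of named steps (deploy, kill, load test, ...)."""
    steps: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)

    def add(self, name: str, action: Callable[[], None]) -> "ClusterTestPlan":
        self.steps.append((name, action))
        return self

    def run(self, report: Callable[[str], None] = print) -> List[StepResult]:
        results = []
        for name, action in self.steps:
            start = time.monotonic()
            try:
                action()
                result = StepResult(name, True, time.monotonic() - start)
            except Exception as exc:  # any failure ends the run
                result = StepResult(name, False, time.monotonic() - start, str(exc))
            results.append(result)
            status = "PASS" if result.ok else "FAIL"
            # Intermediate output, so problems are visible before the run ends.
            report(f"[{status}] {name} ({result.seconds:.1f}s) {result.detail}".rstrip())
            if not result.ok:
                break  # stop early so the test can be fixed and restarted
        return results
```

Each action would wrap a real operation (an ssh call, a Java load test, a Thrift client); here they are left as plain callables so the control flow stays visible.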
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262131#comment-13262131 ] Enis Soztutar commented on HBASE-4821: -- Yep, I was referring to a shim layer + utilities to run against deployed or local cluster. Let me check out what we have in bigtop.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261993#comment-13261993 ] Jonathan Hsieh commented on HBASE-4821: --- @Enis, I think you may be asking whether Bigtop already has a shim layer? There is a little utility class (o.a.bigtop.itest.hbase.util.HBaseTestUtil) that Bigtop HBase tests run against. The tests tend to be written so that they can run either as unit tests or from a main method. This could use a little polish -- maybe making a trimmed down o.a.h.hbase.HBaseTestingUtility that doesn't expose internals. https://github.com/apache/bigtop/blob/trunk/bigtop-tests/test-artifacts/hbase/src/main/groovy/org/apache/bigtop/itest/hbase/util/HBaseTestUtil.java
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261981#comment-13261981 ] Roman Shaposhnik commented on HBASE-4821: - @Enis, @Jon, yes you'd have to provide knobs in the tests as to what the desired data size is. TestLoadAndVerify already does that and all the others should follow.
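One way to picture such a data-size knob is a small settings object that scales the load with the cluster size. The sketch below is an assumption for illustration only: the environment variable names `TEST_NUM_NODES` and `TEST_ROWS_PER_NODE` are made up, and this is not how TestLoadAndVerify is actually configured.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LoadSpec:
    num_nodes: int
    rows_per_node: int

    @property
    def total_rows(self) -> int:
        # Total work grows with the cluster, so a 100-node run is stressed
        # proportionally more than a local one.
        return self.num_nodes * self.rows_per_node


def load_spec_from_env(env: Optional[Mapping[str, str]] = None) -> LoadSpec:
    """Read the (hypothetical) knobs, defaulting to a small local run."""
    env = os.environ if env is None else env
    num_nodes = int(env.get("TEST_NUM_NODES", "1"))
    rows_per_node = int(env.get("TEST_ROWS_PER_NODE", "1000"))
    if num_nodes < 1 or rows_per_node < 1:
        raise ValueError("node count and rows per node must be positive")
    return LoadSpec(num_nodes, rows_per_node)
```

With no variables set the test stays small enough for a local run; a 5-node or 100-node job would only need to export larger values.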
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261980#comment-13261980 ] Roman Shaposhnik commented on HBASE-4821: - @Enis, I'm not sure what API you are asking for. Bigtop currently provides a way of quickly deploying fully distributed clusters using puppet code. You really don't want to burden your tests with deployment logic -- hence the choice of a full-fledged CM system: puppet. We're already using that for nightly testing of the entire Hadoop stack on ~5-node fully distributed clusters that we spin up on demand on EC2. Here's a job that does that: http://bigtop01.cloudera.org:8080/view/Test/job/DeployCluster/ And here's what a test run looks like (bear with me while I fix a couple of things that got broken after Bigtop's trunk migration to Hadoop 2.X): http://bigtop01.cloudera.org:8080/view/Test/job/SmokeCluster/
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261973#comment-13261973 ] Enis Soztutar commented on HBASE-4821: -- Yeah, it makes sense. Agreed that we want to run HBase MR kind of tests both as unit tests and as #2 tests at a larger scale. What I actually wanted to ask was whether bigtop already provides such an API, or whether we should develop one in bigtop. One other consideration is to abstract away the data for the tests. When run in a local cluster, we want to finish in a reasonable time, but when run on a 5-node cluster or a 100-node cluster, the tests should reasonably stress the cluster accordingly.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261960#comment-13261960 ] Jonathan Hsieh commented on HBASE-4821: --- TestLoadAndVerify is a Bigtop test currently, but others that might fit into Roman's category #2 include any of the HBase MR tests or tool-sy tests like TestImportTsv, TestImportExport, (possibly the thrift/rest/avro servers) and some of the other long running external-api only tests like TestAcidGuarantee. Another purpose of the shim layer is to provide an abstraction layer so that the same code is used against a minicluster when run in an HBase context or against a real cluster in the Bigtop context. It would be a thinner interface than Mini*Cluster that does not expose internals. I haven't thought this out completely yet, but it could potentially be useful for dealing with Hadoop1 vs Hadoop2 issues as well.
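The shim idea above can be illustrated with a small interface that a test codes against, plus one backend per context. This is a hedged sketch, not the Bigtop or HBase API: `ClusterShim`, `LocalMiniClusterShim`, `DeployedClusterShim` and the `quorum` setting are hypothetical names, and the local backend only flips a flag where a real one would start a minicluster.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ClusterShim(ABC):
    """Thin, internals-free view of a cluster that functional tests rely on."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def connection_config(self) -> Dict[str, str]: ...


class LocalMiniClusterShim(ClusterShim):
    """Stand-in for an in-process minicluster owned by the test."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True  # a real shim would boot the minicluster here

    def stop(self) -> None:
        self.running = False

    def connection_config(self) -> Dict[str, str]:
        return {"quorum": "localhost"}


class DeployedClusterShim(ClusterShim):
    """Stand-in for an already deployed cluster; start/stop are no-ops."""

    def __init__(self, quorum: str) -> None:
        self._quorum = quorum

    def start(self) -> None:
        pass  # deployment is handled elsewhere (e.g. by a CM system)

    def stop(self) -> None:
        pass

    def connection_config(self) -> Dict[str, str]:
        return {"quorum": self._quorum}


def run_functional_test(shim: ClusterShim, body: Callable[[Dict[str, str]], T]) -> T:
    """Run the same test body against whichever backend is supplied."""
    shim.start()
    try:
        return body(shim.connection_config())
    finally:
        shim.stop()
```

The test body only ever sees the connection settings, which is what keeps it runnable in both an mvn verify context and against a distributed cluster.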
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261885#comment-13261885 ] Roman Shaposhnik commented on HBASE-4821: - @Enis, I had a really nice chat with Jon yesterday and we arrived at a common understanding that the tests in general fall into 3 distinct categories (please note that these categories classify test *implementation* and not whether the tests are used as part of mvn test, mvn verify or Bigtop's test infra -- more on that later):

# pure unit tests -- they reach into the guts of the implementation and use non-public APIs. There's absolutely no way to run that test code on anything but MiniHBase/MiniDFS/MiniMR. Bigtop has no role to play in helping the HBase community with developing/maintaining/executing those tests.
# HBase-specific functional tests -- these are the tests that only use public APIs and don't muck about with internals. They are, however, only concerned with HBase itself. IOW, a test that wants to verify that you can submit an Oozie workflow that has a Hive->HBASE->Pig pipeline does not fall into this category.
# Integration tests -- these are the multi-component tests that exercise not just HBase but a number of different components. The above example of the Oozie workflow falls into this category.

Here's how an ideal situation looks from Bigtop's perspective:

* you guys totally take care of #1 and you implement it as usual unit tests.
* Bigtop (with your help) takes care of #3. It simply makes no sense to reproduce the same infra at the HBase level.
* A proposal on #2 is this -- these tests belong to HBase. However, they have to be clearly marked as belonging to the functional class AND they have to utilize a very thin shim layer so you can use them in an mvn verify context and we can reuse them in Bigtop running against fully distributed beefy clusters.

At this point I'm convinced that TestLoadAndVerify should be the first example of this class and it should reside in the HBase codebase (yet be available to Bigtop). Let me know if this makes sense.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261855#comment-13261855 ] Enis Soztutar commented on HBASE-4821: -- DD and I also want to commit some resources into developing/maintaining/running such tests. We are also willing to allocate some cluster resources into running the tests for extended periods of time. @Mikhail, do you have anything planned yet? To go further with this, I think a short test design doc would be a great start, wdyt? @Keith, @Stack, do you think we should port goraci inside hbase or bigtop? @Roman, I love the idea that bigtop provides services for deployment, and running e2e (end to end) tests. But in my experience, maintaining the actual tests (code, logic, etc) will be a lot easier if the code resides inside hbase. Does bigtop provide that kind of use case?
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245753#comment-13245753 ] Keith Turner commented on HBASE-4821: - One other comment about random walk tests vs functional tests. Random walk will also test adding splits in many different combinations with other table operations. For example, a random walker may split, merge, update, split, split, delete, etc. A functional test for adding splits will not randomly mix table operations like this. It will generally always test the same table operations in the same order.
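The random walk idea discussed in this thread (test nodes linked in a graph, with a client that wanders through it so that operations are mixed in different orders) can be sketched as below. This is an illustration under assumptions, not the Accumulo framework: the graph is a plain dictionary rather than an XML file, the operation names are taken from the example in the comment, and the seed is added so a failing walk can be replayed.

```python
import random
from typing import Callable, Dict, List


def random_walk(
    graph: Dict[str, List[str]],
    actions: Dict[str, Callable[[], None]],
    start: str,
    steps: int,
    seed: int,
) -> List[str]:
    """Run the action at each visited node, then move to a random successor."""
    rng = random.Random(seed)  # seeded, so the same walk can be reproduced
    node = start
    path = []
    for _ in range(steps):
        actions[node]()
        path.append(node)
        successors = graph.get(node, [])
        if not successors:
            break  # dead end: nothing left to walk to
        node = rng.choice(successors)
    return path


if __name__ == "__main__":
    # Hypothetical table operations; each would call into the cluster.
    graph = {
        "split": ["merge", "update", "split"],
        "merge": ["update", "split"],
        "update": ["split", "delete", "update"],
        "delete": ["update", "split"],
    }
    actions = {name: (lambda: None) for name in graph}
    print(" -> ".join(random_walk(graph, actions, "split", 10, seed=42)))
```

A functional test corresponds to one fixed path through this graph; the walker covers many paths, which is where the extra bug-finding power described above comes from.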
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245705#comment-13245705 ] Keith Turner commented on HBASE-4821: - I noticed an earlier comment about Python code in the Accumulo test dir. This is the code in test/auto, and we call these functional tests. This code is probably similar to some HBase unit tests. It supports tests that run against a live instance of Accumulo. The test framework starts an instance of Accumulo, runs a Python or Java test against that instance, and then shuts the instance down. Running all of the functional tests takes 1 to 2 hours. This test framework was written before random walk and it ensures basic functionality works. For example, there's a test to verify that adding split points to a table continues to work. Since we implemented random walk, I have found myself writing a lot more random walk tests and fewer functional tests. The reason is that a functional test usually tests the feature when the system is in one state, whereas random walk tests the same feature with the system in many different states. For example, a random walk test that adds split points to a table will try to do that when the table and system are in many different states. It may try to add the split when a tablet/region is migrating, currently splitting, minor compacting, major compacting, offline, etc. So the likelihood of finding a bug with addsplits() in random walk is much greater than in the functional test. The functional test will detect if the feature is completely broken; random walk can detect if the feature is broken under certain circumstances.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245557#comment-13245557 ] Keith Turner commented on HBASE-4821: - Eric Newton has been experimenting with running goraci against HBase. One issue he ran into was that gora-hbase uses auto flush on every HTable connection. This really slowed down ingest. He modified the gora code locally so it would not do this. Eric posted a question on the gora user list asking why it behaved this way. The Gora API has a flush call.
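Why flushing on every write slows ingest can be seen with a toy writer that counts how many batches it sends. This is a generic sketch of client-side write buffering, not the HTable or Gora API: `BufferedWriter`, `send_batch` and `buffer_limit` are hypothetical names chosen for the illustration.

```python
from typing import Callable, List


class BufferedWriter:
    """Toy client that either flushes on every put or batches writes."""

    def __init__(
        self,
        send_batch: Callable[[List[str]], None],
        auto_flush: bool = False,
        buffer_limit: int = 1000,
    ) -> None:
        self._send_batch = send_batch  # stands in for one round trip to a server
        self.auto_flush = auto_flush
        self.buffer_limit = buffer_limit
        self._buffer: List[str] = []

    def put(self, row: str) -> None:
        self._buffer.append(row)
        # With auto flush on, every single put becomes its own round trip.
        if self.auto_flush or len(self._buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()
```

With auto flush enabled, N puts cost N round trips; with buffering and an explicit `flush()` at the end, the same rows go out in roughly N / `buffer_limit` batches, which matches the ingest slowdown reported above in spirit only.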
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245536#comment-13245536 ] stack commented on HBASE-4821: -- @Keith Welcome. Thanks for the nice fat comment. I'm not sure I want to run your randomwalk test if it's going to generate that many bugs in hbase (smile). Let us take a looksee at the continuous ingest. Good on you Keith.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245528#comment-13245528 ] Keith Turner commented on HBASE-4821:
I am an Accumulo developer; there is some cruft in our test dir. The two most successful cluster tests we have are continuous ingest and random walk. We have found lots of bugs w/ these tests. I wrote a Gora version of continuous ingest that should run against HBase. The readme on github has a nice description.
https://github.com/keith-turner/goraci/
The Accumulo version of continuous ingest can be found here.
http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/continuous/
This dir contains an old set of open office slides that also give an overview of continuous ingest. At the end of the slides is the beginning of the idea of the random walk test.
I am not sure if we have a nice description of random walk anywhere. It is a fairly simple test framework. You write test nodes in Java and link the nodes together in a graph using XML. You start a test client on each node in a cluster. The test client just does a random walk of the test graph. We have found a ton of bugs in 1.3 and 1.4 using random walk. Actually the Accumulo features page may be the only place we give an overview of randomwalk. I noticed that our random walk readme only tells you how to run it, not what it is. Below is a link to the random walk test, but like I said it's not very informative.
http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/randomwalk/
The actual Java code is at the link below. The framework and test node code is all here.
http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/src/server/src/main/java/org/apache/accumulo/server/test/randomwalk/
The short description of randomwalk I mentioned is here.
http://accumulo.apache.org/notable_features.html#testing
If anyone is interested in generalizing random walk so that HBase could use it too, let me know.
One last thing. We tested Accumulo for over a month on a 10 node cluster using Continuous ingest, Random Walk, and the Agitator. Below are some of the bugs we found during that time period.
[Bugs found in 1.4 testing|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+14_qa_bug]
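Editor's illustration: the random walk idea Keith describes (test nodes linked into a graph, with each client wandering the graph at random) can be sketched in a few lines. This is only a toy under stated assumptions, not Accumulo's framework: the real one defines nodes in Java and edges in XML, whereas here a plain dict stands in for both, and the node names, `GRAPH`, and `random_walk` are all hypothetical.

```python
import random

# Hypothetical test graph: each key is a test node, each value lists the
# nodes that may follow it. A dict replaces the Java classes and XML wiring.
GRAPH = {
    "create_table": ["write"],
    "write": ["write", "read", "delete_table"],
    "read": ["write", "read", "delete_table"],
    "delete_table": ["create_table"],
}


def random_walk(graph, start, steps, actions, rng):
    """Visit `steps` nodes, running each node's action; return the path taken."""
    path = []
    node = start
    for _ in range(steps):
        actions[node]()  # a failing action raises and aborts the walk
        path.append(node)
        node = rng.choice(graph[node])  # pick the next node at random
    return path


if __name__ == "__main__":
    # Placeholder actions that only print; real nodes would talk to a cluster.
    actions = {name: (lambda n=name: print("running", n)) for name in GRAPH}
    path = random_walk(GRAPH, "create_table", 10, actions, random.Random(42))
    print(" -> ".join(path))
```

Passing in a seeded `random.Random` is a design choice of this sketch: it lets a failing walk be replayed with the same sequence of nodes.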
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153385#comment-13153385 ] stack commented on HBASE-4821:
+1 on 'extremely easy to use'
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153319#comment-13153319 ] Mikhail Bautin commented on HBASE-4821:
Thanks everyone for your comments. I will read up on BigTop and Puppet. I am also fine with using a JVM-based language for load tests themselves, as long as there is a way to do something like "kill -9", which we can't really do in our unit tests. We could also try to reuse/modify the MiniHBaseCluster framework to talk to a real HBase cluster and script various distributed test scenarios in pure Java.
However, I want to emphasize one thing. Once configured, this HBase integration test tool should be extremely easy to use, as simple as: hbase_integration_test.sh . We might have to write some amount of nontrivial "glue" script code (e.g. in bash) to make that happen.
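Editor's illustration: one way to picture the tool discussed here is a small runner that executes named steps in order, reports after each one (the "intermediate output" the issue asks for), and stops at the first failure. The sketch below is hypothetical throughout: `run_scenario`, `kill_hard`, and the step names are invented for illustration, nothing here is HBase code, and `kill_hard` only covers a process on the local machine (a real cluster would need some remote mechanism, which is not shown).

```python
import os
import signal
import time


def kill_hard(pid):
    """Send SIGKILL, the signal behind `kill -9`, to a local process (POSIX only)."""
    os.kill(pid, signal.SIGKILL)


def run_scenario(steps, report=print):
    """Run (name, callable) steps in order, reporting after each one.

    Stops at the first failing step so a broken run can be caught and
    restarted early. Returns a list of (name, status) pairs.
    """
    results = []
    for name, step in steps:
        started = time.monotonic()
        try:
            step()
            status = "ok"
        except Exception as exc:  # any failure ends the scenario
            status = f"failed: {exc}"
        results.append((name, status))
        report(f"[{time.monotonic() - started:6.1f}s] {name}: {status}")
        if status != "ok":
            break
    return results


if __name__ == "__main__":
    # Placeholder steps; real ones would deploy, kill servers, and run load tests.
    run_scenario([
        ("deploy to test cluster", lambda: None),
        ("run load test A", lambda: None),
    ])
```

Keeping each operation behind a plain callable means the same runner works whether a step shells out to a script or calls into a JVM-based load test.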
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153256#comment-13153256 ] Konstantin Boudnik commented on HBASE-4821:
+1 on the BigTop approach: it has been proven time and again to be a consistent and repeatable environment for stacks' (no pun intended ;) integration testing. It has a well-thought-out separation of concerns in place (as has been pointed out by Roman).
Besides, I am not sure I am buying into the obsession with Python (or Ruby, etc.) when it comes to working with software written in JVM languages: why would one block oneself from access to all the nice things the platform provides? If a scripting language is desirable - and I can totally buy that - be Groovy with your Java apps ;)
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153237#comment-13153237 ] stack commented on HBASE-4821:
+1 on at least trying bigtop till we figure out why it just won't work for us.
Mikhail, Accumulo, the other bigtable clone, has an integration suite that is made of python and c hackings. You can check it out here: http://svn.apache.org/viewvc/incubator/accumulo/trunk/test/
I'd like a single integration suite/framework that could be run on amz, locally, on emc's 1k cluster that they are talking of donating to apache, etc. I'd like to help w/ this project.
[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase
[ https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153223#comment-13153223 ] Roman Shaposhnik commented on HBASE-4821:
Bigtop (http://incubator.apache.org/projects/bigtop.html) aims at providing a platform for the creation of exactly the kind of tests you're talking about *across* all the projects of a Hadoop stack. Granted, we're a young project and our test management framework is nowhere near the scope and quality of something like TestNG, but I think it will be helpful to invest in improving it.
To give you a quick idea of the baseline architecture, here's what the Bigtop testing framework assumes:
1. all tests are implemented as code running on top of the JVM. We don't care what language it is (Java, Groovy, Clojure, etc.) as long as at the end of the day there's a bunch of class files generated.
2. all tests are packaged/versioned as Maven artifacts
3. all test data is packaged/versioned as Maven artifacts
4. the entry point into test execution is JUnit/TestNG-style
5. tests NEVER concern themselves with deployment (we've got Puppet for that)
6. tests NEVER concern themselves with configuration (we've got Puppet for that)
If you think Bigtop can serve as a reasonable platform for what you're trying to accomplish, let's continue this discussion over at bigtop-dev@incubator (and Bigtop JIRAs).