[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-06-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295453#comment-13295453
 ] 

Andrew Purtell commented on HBASE-4821:
---

Was this discussion carried forward into HBASE-6201?

 A fully automated comprehensive distributed integration test for HBase
 --

 Key: HBASE-4821
 URL: https://issues.apache.org/jira/browse/HBASE-4821
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

 To properly verify that a particular version of HBase is good for production 
 deployment we need a better way to do real cluster testing after incremental 
 changes. Running unit tests is good, but we also need to deploy HBase to a 
 cluster, run integration tests, load tests, Thrift server tests, kill some 
 region servers, kill the master, and produce a report. All of this needs to 
 happen in 20-30 minutes with minimal manual intervention. I think this way we 
 can combine agile development with high stability of the codebase. I am 
 envisioning a high-level framework written in a scripting language (e.g. 
 Python) that would abstract external operations such as deploy to test 
 cluster, kill a particular server, run load test A, run load test B 
 (we already have a few kinds of load tests implemented in Java, and we could 
 write a Thrift load test in Python). This tool should also produce 
 intermediate output, allowing to catch problems early and restart the test.
 No implementation has yet been done. Any ideas or suggestions are welcome.
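
A minimal sketch of the kind of high-level driver described above, assuming each external operation (deploy, kill a server, run a load test) can be wrapped in a plain Python callable. The names ClusterTestRun, StepResult, add_step, run and report are hypothetical and exist only for illustration; nothing here talks to a real cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""


@dataclass
class ClusterTestRun:
    """Runs named steps in order and reports after each one (hypothetical API)."""
    steps: List[Tuple[str, Callable[[], str]]] = field(default_factory=list)
    results: List[StepResult] = field(default_factory=list)

    def add_step(self, name, action):
        # action is any zero-argument callable that returns a short message
        # on success and raises an exception on failure.
        self.steps.append((name, action))

    def run(self, stop_on_failure=True):
        for name, action in self.steps:
            try:
                result = StepResult(name, True, action())
            except Exception as exc:  # a failed step is recorded, not re-raised
                result = StepResult(name, False, str(exc))
            self.results.append(result)
            # Intermediate output: print as soon as each step finishes so a
            # problem can be caught early and the run restarted.
            status = "PASS" if result.ok else "FAIL"
            print(f"[{status}] {name}: {result.detail}")
            if not result.ok and stop_on_failure:
                break
        return self.results

    def report(self):
        passed = sum(1 for r in self.results if r.ok)
        return f"{passed}/{len(self.results)} steps passed"
```

In a real driver the callables would shell out to deployment scripts or start the existing Java load tests; here they are left to the caller so the control flow stays visible.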

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261855#comment-13261855
 ] 

Enis Soztutar commented on HBASE-4821:
--

DD and I also want to commit some resources to developing/maintaining/running 
such tests. We are also willing to allocate some cluster resources to 
running the tests for extended periods of time. 

@Mikhail, do you have anything planned yet? To go further with this, I think a 
short test design doc would be a great start, wdyt? 

@Keith, @Stack, do you think we should port goraci inside hbase or bigtop? 

@Roman, I love the idea that bigtop provides services for deployment and 
running end-to-end (e2e) tests. But in my experience, maintaining the actual 
tests (code, logic, etc.) will be a lot easier if the code resides inside hbase. 
Does bigtop support that kind of use case?





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261885#comment-13261885
 ] 

Roman Shaposhnik commented on HBASE-4821:
-

@Enis,

I had a really nice chat with Jon yesterday and we arrived at a common 
understanding that the tests in general fall into 3 distinct categories (please 
note that these categories classify test *implementation* and not whether they 
are used as part of the mvn test, mvn verify or Bigtop's test infra -- more on 
that later): 
  # pure unit tests -- they reach into the guts of the implementation and use 
non-public APIs. There's absolutely no way to run that test code on anything but 
MiniHBase/MiniDFS/MiniMR. Bigtop has no role to play in helping the HBase 
community with developing/maintaining/executing those tests.
  # HBase-specific functional tests -- these are the tests that only use public 
APIs and don't muck about with internals. They are, however, only concerned 
with HBase itself. IOW, a test that wants to verify that you can submit an 
Oozie workflow that has a Hive-HBASE-Pig pipeline does not fall into this 
category.
  # Integration tests -- these are the multi-component tests that exercise not 
just HBase but a number of different components. The above example of the Oozie 
workflow falls into this category.

Here's how an ideal situation looks from Bigtop's perspective: 
  * you guys totally take care of #1 and you implement it as usual unit tests. 
  * Bigtop (with your help) takes care of #3. It simply makes no sense to 
reproduce the same infra at the HBase level.
  * A proposal on #2 is this -- these tests belong to HBase. However, they have 
to be clearly marked as belonging to the functional class AND they have to 
utilize a very thin shim layer so you can use them in an mvn verify context and 
we can reuse them in Bigtop running against fully distributed beefy clusters. 
At this point I'm convinced that TestLoadAndVerify should be the first example 
of this class and it should reside in HBase codebase (yet be available to 
Bigtop).

Let me know if this makes sense.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261960#comment-13261960
 ] 

Jonathan Hsieh commented on HBASE-4821:
---

TestLoadAndVerify is a Bigtop test currently, but others that might fit into 
Roman's category #2 include any of the HBase MR tests or tool-sy tests like 
TestImportTsv, TestImportExport, (possibly the thrift/rest/avro servers) and 
some of the other long running external-api only tests like TestAcidGuarantee. 

Also another purpose of the shim layer is to provide an abstraction layer so 
the same code is used against a minicluster when run in an HBase context or 
against a real cluster in the Bigtop context.  It would be a thinner interface 
than Mini*Cluster that does not expose internals.  I haven't thought this out 
completely yet but it could potentially be useful for dealing with Hadoop1 vs 
Hadoop2 issues as well.
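
A rough sketch of the shim idea, assuming the test logic only needs a handful of public-API-style operations. ClusterShim, InMemoryCluster and load_and_verify are hypothetical names for illustration; the in-memory class stands in for a minicluster and is not HBase code, and load_and_verify is not the Bigtop TestLoadAndVerify.

```python
from abc import ABC, abstractmethod


class ClusterShim(ABC):
    """Thin view of a cluster that exposes no internals (hypothetical)."""

    @abstractmethod
    def put(self, table, row, value):
        """Write one value."""

    @abstractmethod
    def get(self, table, row):
        """Read one value, or None if the row is absent."""


class InMemoryCluster(ClusterShim):
    """Stand-in for a minicluster backend: a dict instead of real storage."""

    def __init__(self):
        self._tables = {}

    def put(self, table, row, value):
        self._tables.setdefault(table, {})[row] = value

    def get(self, table, row):
        return self._tables.get(table, {}).get(row)


def load_and_verify(cluster, table, num_rows):
    """Test logic that sees only the shim, so any backend can be plugged in.

    Returns the list of row indexes whose value could not be read back.
    """
    for i in range(num_rows):
        cluster.put(table, f"row-{i}", f"value-{i}")
    return [i for i in range(num_rows)
            if cluster.get(table, f"row-{i}") != f"value-{i}"]
```

A second ClusterShim subclass that talks to a deployed cluster would let the same load_and_verify function run unchanged in both contexts, which is the point of keeping the interface thin.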





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261973#comment-13261973
 ] 

Enis Soztutar commented on HBASE-4821:
--

Yeah, it makes sense. Agreed that we want to run HBase MR kind of tests as both 
unit tests and #2 tests at a larger scale. What I wanted to ask actually was 
whether bigtop already provides such an API, or shall we develop one in bigtop. 
One other consideration is to abstract away the data for the tests. When run in 
a local cluster, we want to finish in a reasonable time, but when run on a 
5-node cluster or a 100-node cluster, the tests should reasonably stress the 
cluster accordingly.
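
One way to picture such a data-size knob, as a sketch only: the function name rows_to_load, the default of one million rows per node, and the TEST_TOTAL_ROWS variable are all made up for illustration.

```python
import os


def rows_to_load(num_nodes, rows_per_node=1_000_000, env=os.environ):
    """Pick a data size that grows with the cluster.

    An explicit override (hypothetical variable TEST_TOTAL_ROWS) wins;
    otherwise the total scales linearly with the number of nodes.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    override = env.get("TEST_TOTAL_ROWS")
    if override is not None:
        return int(override)
    return rows_per_node * num_nodes
```

A local run would pass num_nodes=1 with a small rows_per_node to finish quickly, while a 100-node run would get 100 times the data without any change to the test body.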





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261981#comment-13261981
 ] 

Roman Shaposhnik commented on HBASE-4821:
-

@Enis, @Jon, yes you'd have to provide knobs in the tests as to what the 
desired data size is. TestLoadAndVerify already does that and all the others 
should follow.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261980#comment-13261980
 ] 

Roman Shaposhnik commented on HBASE-4821:
-

@Enis, I'm not sure which API you are asking for. Bigtop currently provides a 
way of quickly deploying fully distributed clusters using puppet code. You 
really don't want to burden your tests with deployment logic -- hence the choice 
of a full-fledged CM system: puppet.

We're already using that for nightly testing of the entire Hadoop stack on 
~5-node fully distributed clusters that we spin up on demand on EC2. Here's a 
job that does that:
   http://bigtop01.cloudera.org:8080/view/Test/job/DeployCluster/

And here's what a test run looks like (bear with me while I fix a couple of 
things that got broken after Bigtop's trunk migration to Hadoop 2.X):
   http://bigtop01.cloudera.org:8080/view/Test/job/SmokeCluster/





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261993#comment-13261993
 ] 

Jonathan Hsieh commented on HBASE-4821:
---

I think @Enis may be asking whether Bigtop already has a shim layer.  There is 
a little utility class (o.a.bigtop.itest.hbase.util.HBaseTestUtil) that Bigtop 
HBase tests run against.  The tests tend to be written either as unit tests or 
to be run as main.  This could use a little polish -- maybe making a trimmed 
down o.a.h.hbase.HBaseTestingUtility that doesn't expose internals.

https://github.com/apache/bigtop/blob/trunk/bigtop-tests/test-artifacts/hbase/src/main/groovy/org/apache/bigtop/itest/hbase/util/HBaseTestUtil.java





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-25 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262131#comment-13262131
 ] 

Enis Soztutar commented on HBASE-4821:
--

Yep, I was referring to a shim layer + utilities to run against deployed or 
local cluster. Let me check out what we have in bigtop. 





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245528#comment-13245528
 ] 

Keith Turner commented on HBASE-4821:
-

I am an Accumulo developer; there is some cruft in our test dir.  The two most 
successful cluster tests we have are continuous ingest and random walk.  We 
have found lots of bugs with these tests.  I wrote a Gora version of continuous 
ingest that should run against HBase.  The readme on github has a nice 
description.  

  https://github.com/keith-turner/goraci/

The accumulo version of continuous ingest can be found here.

  http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/continuous/

This dir contains an old set of open office slides that also give an overview 
of continuous ingest.  At the end of the slides is the beginning of the idea of 
the random walk test.  I am not sure if we have a nice description of random 
walk anywhere.  It is a fairly simple test framework.  You write test nodes in 
Java and link the nodes together in a graph using XML.  You start a test client 
on each node in a cluster.  The test client just does a random walk of the test 
graph.  We have found a ton of bugs in 1.3 and 1.4 using random walk.  
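
The random walk idea can be sketched in a few lines. This is only an illustration of the technique, not the Accumulo framework: there the test nodes are Java classes and the graph is defined in XML, whereas this sketch uses a Python dict for the graph and plain callables for the nodes, and the name random_walk is hypothetical.

```python
import random


def random_walk(graph, actions, start, steps, seed=None):
    """Walk a test graph at random, running each visited node's action.

    graph   maps a node name to the list of nodes reachable from it
    actions maps a node name to a zero-argument callable (the test step)
    Returns the list of node names in the order they were visited.
    """
    rng = random.Random(seed)  # a fixed seed makes a failing walk repeatable
    node = start
    visited = []
    for _ in range(steps):
        actions[node]()        # run the test step for the current node
        visited.append(node)
        neighbors = graph.get(node, [])
        if not neighbors:      # dead end: nothing further to walk to
            break
        node = rng.choice(neighbors)
    return visited
```

Because the next step is chosen at random, the same operation ends up being exercised after many different sequences of earlier operations, which is where the extra bug-finding power comes from.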

Actually the Accumulo features page may be the only place we give an overview 
of randomwalk.  I noticed that our random walk readme only tells you how to run 
it, not what it is.  Below is a link to the random walk test, but like I said 
it's not very informative.

  http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/randomwalk/

The actual Java code is at the link below.  The framework and test node code is 
all here.

  http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/src/server/src/main/java/org/apache/accumulo/server/test/randomwalk/

The short description of randomwalk I mentioned is here.

  http://accumulo.apache.org/notable_features.html#testing

If anyone is interested in generalizing random walk so that HBase could use it 
too, let me know.

One last thing.  We tested Accumulo for over a month on a 10 node cluster using 
Continuous ingest, Random Walk, and the Agitator.  Below are some of the bugs 
we found during that time period.

[Bugs found in 1.4 
testing|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+14_qa_bug]





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245536#comment-13245536
 ] 

stack commented on HBASE-4821:
--

@Keith Welcome.  Thanks for the nice fat comment.  I'm not sure I want to run 
your randomwalk test if it's going to generate that many bugs in hbase (smile).  
Let us take a looksee at the continuous ingest.  Good on you, Keith.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245557#comment-13245557
 ] 

Keith Turner commented on HBASE-4821:
-

Eric Newton has been experimenting with running goraci against HBase.  One 
issue he ran into was that gora-hbase uses auto flush on every HTable 
connection.  This really slowed down ingest.  He modified the gora code locally 
so it would not do this. Eric posted a question on the gora user list asking 
why it behaved this way.  The Gora API has a flush call.
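
To see why flushing on every write can slow ingest, here is a toy model, assuming each flush costs one round trip to the server. WriteBuffer and its parameters are hypothetical; this is not the Gora or HBase client code.

```python
class WriteBuffer:
    """Toy client-side write buffer (hypothetical, for illustration only)."""

    def __init__(self, send, auto_flush=True, buffer_limit=1000):
        self._send = send              # callable taking a list of rows
        self.auto_flush = auto_flush
        self.buffer_limit = buffer_limit
        self._buffer = []

    def put(self, row):
        self._buffer.append(row)
        # With auto flush on, every put becomes its own round trip.
        if self.auto_flush or len(self._buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()
```

With auto_flush=True, ten puts cause ten calls to send; with auto_flush=False and buffer_limit=4, the same ten puts followed by one explicit flush cause three.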





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245705#comment-13245705
 ] 

Keith Turner commented on HBASE-4821:
-

I noticed an early comment about Python code in the Accumulo test dir.  This is
the code in test/auto, and we call these functional tests.  This code is
probably similar to some HBase unit tests.  It supports tests that run against
a live instance of Accumulo.  The test framework starts an instance of Accumulo,
runs a Python or Java test against that instance, and then shuts the instance
down.  Running all of the functional tests takes 1 to 2 hours.

This test framework was written before random walk, and it ensures basic
functionality works.  For example, there's a test to verify that adding split
points to a table continues to work.  Since we implemented random walk, I
have found myself writing a lot more random walk tests and fewer functional
tests.  The reason is that a functional test usually tests the feature when
the system is in one state, whereas random walk tests the same feature with the
system in many different states.  For example, a random walk test that adds
split points to a table will try to do that when the table and system are in
many different states.  It may try to add the split when a tablet/region is
migrating, currently splitting, minor compacting, major compacting, offline,
etc.  So the likelihood of finding a bug with addsplits() in random walk is
much greater than in the functional tests.  The functional tests will detect if
the feature is completely broken; random walk can detect if the feature is
broken under certain circumstances.
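
To make the contrast concrete, here is a minimal sketch of the random walk idea in
Python.  It is not Accumulo's random walk framework: ToyTable, its states, and
random_walk() are hypothetical names invented for illustration, and the "table"
is just an in-memory object.  The point is only that the same operation (adding
a split) gets exercised while the system wanders through many states, with an
invariant checked after every step.

```python
import random

# Hypothetical states a tablet/region might be in; chosen for illustration only.
STATES = ["idle", "migrating", "splitting", "minor_compacting",
          "major_compacting", "offline"]


class ToyTable:
    """In-memory stand-in for a table. Not an Accumulo or HBase API."""

    def __init__(self):
        self.state = "idle"
        self.splits = set()

    def change_state(self, new_state):
        self.state = new_state

    def add_split(self, point):
        # A real system could misbehave in some states; the toy always succeeds.
        self.splits.add(point)


def random_walk(steps, seed):
    """Take `steps` random actions and check an invariant after each one."""
    rng = random.Random(seed)          # seeded so a failing walk can be replayed
    table = ToyTable()
    expected = set()                   # split points we asked for
    visited = set()                    # states in which add_split was exercised
    for _ in range(steps):
        if rng.random() < 0.5:
            table.change_state(rng.choice(STATES))
        else:
            point = rng.randrange(1000)
            table.add_split(point)
            expected.add(point)
            visited.add(table.state)
        # Invariant: every requested split exists, whatever state we are in.
        assert table.splits == expected, "split lost in state %s" % table.state
    return table, expected, visited
```

A functional test would correspond to calling add_split() once with the table
in the "idle" state.  The walk above calls it in whichever states the random
sequence happens to reach, and the seed makes any failure reproducible.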
 






[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2011-11-18 Thread Roman Shaposhnik (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153223#comment-13153223
 ] 

Roman Shaposhnik commented on HBASE-4821:
-

Bigtop (http://incubator.apache.org/projects/bigtop.html) aims at providing a
platform for the creation of exactly the kind of tests you're talking about
*across* all the projects of a Hadoop stack. Granted, we're a young project and
our test management framework is nowhere near the scope and quality of
something like TestNG, but I think it will be helpful to invest in improving
it. To give you a quick idea of the baseline architecture, here's what the
Bigtop testing framework assumes:
   1. all tests are implemented as code running on top of the JVM. We don't care
what language it is (Java, Groovy, Clojure, etc.) as long as at the end of the
day there's a bunch of class files generated.
   2. all tests are packaged/versioned as Maven artifacts
   3. all test data is packaged/versioned as Maven artifacts
   4. the entry point into test execution is JUnit/TestNG-style
   5. tests NEVER concern themselves with deployment (we've got Puppet for that)
   6. tests NEVER concern themselves with configuration (we've got Puppet for
that)

If you think Bigtop can serve as a reasonable platform for what you're trying
to accomplish, let's continue this discussion over at bigtop-dev@incubator (and
Bigtop JIRAs).





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2011-11-18 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153237#comment-13153237
 ] 

stack commented on HBASE-4821:
--

+1 on at least trying bigtop till we figure why it just won't work for us

Mikhail, Accumulo, the other bigtable clone, has an integration suite that is 
made of python and c hackings.  You can check it out here: 
http://svn.apache.org/viewvc/incubator/accumulo/trunk/test/

I'd like a single integration suite/framework that could be run on amz, 
locally, on emc's 1k cluster that they are talking of donating to apache, etc.

I'd like to help w/ this project.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2011-11-18 Thread Konstantin Boudnik (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153256#comment-13153256
 ] 

Konstantin Boudnik commented on HBASE-4821:
---

+1 on the BigTop approach: it has been proven time and again to be a consistent
and repeatable environment for stacks' (no pun intended ;) integration testing.
It has a well-thought-out separation of concerns in place (as Roman pointed out).

Besides, I am not sure I am buying into the obsession with Python (or Ruby, etc.)
when it comes to working with software written in JVM languages: why would one
cut oneself off from all the nice things the platform provides? If a
scripting language is desirable - and I can totally buy that - be Groovy with
your Java apps ;)





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2011-11-18 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153319#comment-13153319
 ] 

Mikhail Bautin commented on HBASE-4821:
---

Thanks everyone for your comments. I will read up on BigTop and Puppet. I am 
also fine with using a JVM-based language for load tests themselves, as long as 
there is a way to do something like kill -9, which we can't really do in our 
unit tests. We could also try to reuse/modify the MiniHBaseCluster framework to 
talk to a real HBase cluster and script various distributed test scenarios in 
pure Java.

However, I want to emphasize one thing. Once configured, this HBase integration 
test tool should be extremely easy to use, as simple as: 
hbase_integration_test.sh hbase_source_dir. We might have to write some 
amount of nontrivial glue script code (e.g. in bash) to make that happen.
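
As a rough illustration of the kind of high-level driver the issue description
asks for, here is a minimal Python sketch.  Nothing like this exists in HBase;
run_scenario() and the step names are hypothetical, and the placeholder steps
do nothing.  In a real tool each step would wrap an external operation (deploy,
kill -9 on a region server, a load test).  The sketch shows only the control
flow: run steps in order, print intermediate output as each one finishes, stop
at the first failure, and return a report.

```python
import time


def run_scenario(steps, log=print):
    """Run (name, action) steps in order; stop at the first failure.

    Each action is a zero-argument callable that raises on failure.
    Returns a list of (name, status) tuples, the final report.
    """
    report = []
    for name, action in steps:
        start = time.time()
        try:
            action()
            status = "PASS"
        except Exception as exc:
            status = "FAIL: %s" % exc
        elapsed = time.time() - start
        # Intermediate output, so problems are visible before the run ends.
        log("[%s] %s (%.2fs)" % (status, name, elapsed))
        report.append((name, status))
        if status != "PASS":
            break  # fail fast so the test can be fixed and restarted early
    return report


if __name__ == "__main__":
    # Placeholder steps; real ones would shell out to deploy/kill/load tools.
    scenario = [
        ("deploy to test cluster", lambda: None),
        ("run load test A", lambda: None),
        ("kill a region server", lambda: None),
        ("run load test B", lambda: None),
    ]
    run_scenario(scenario)
```

A one-line wrapper such as the hbase_integration_test.sh mentioned above could
then do little more than build the source directory and invoke a driver of this
shape.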
