On Sat, May 4, 2013 at 10:37 PM, Buddy Burden <barefootco...@gmail.com> wrote:
We have several databases, but unit tests definitely don't have their
own. Typically unit tests run either against the dev database, or the QA
database. Primarily, they run against whichever database the current
developer has their config pointed to. This has to be the case, since
sometimes we make modifications to the schema. If the unit tests all ran
against their own database, then my unit tests for my new feature involving
the schema change would necessarily fail. Or, contrariwise, if I make the
schema modification on the unit test database, then every other dev's unit
tests would fail. I suppose if we were using MySQL, it might be feasible
to create a new database on the fly for every unit test run. When you're
stuck with Oracle though ... not so much. :-/
Interesting... Developers in our project have a local copy of the
production database to work with, but our unit test runs always create a
database from scratch and run all schema migrations on it before running
the tests. Creating and migrating the unit test DB usually takes between 10
and 30 seconds, so setup time is not really an issue... We're currently on
MySQL but will be migrating to Oracle in the near future. Could you
elaborate on why this approach might not be viable on Oracle?
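For what it's worth, the create-and-migrate step is roughly this shape. (A hypothetical sketch, not our actual code: the DSN, the environment variables, and the NNN_name.sql naming convention for migration files are all assumptions.)

```perl
use strict;
use warnings;

# Apply migrations in numeric-prefix order (001_users.sql, 002_orders.sql,
# ...). The naming convention is an assumption for this sketch.
sub migration_order {
    my @files = @_;
    return sort {
        ( $a =~ /(\d+)/ )[0] <=> ( $b =~ /(\d+)/ )[0]
    } @files;
}

# Drop, recreate, and migrate a throwaway test database.
sub fresh_test_db {
    my ( $db_name, $migrations_dir ) = @_;

    require DBI;    # loaded lazily so the sort helper works without it
    my $dbh = DBI->connect( 'dbi:mysql:', $ENV{TEST_DB_USER},
        $ENV{TEST_DB_PASS}, { RaiseError => 1 } );

    $dbh->do("DROP DATABASE IF EXISTS $db_name");
    $dbh->do("CREATE DATABASE $db_name");
    $dbh->do("USE $db_name");

    for my $file ( migration_order( glob("$migrations_dir/*.sql") ) ) {
        open my $fh, '<', $file or die "can't read $file: $!";
        my $sql = do { local $/; <$fh> };    # slurp the whole file
        # Naive statement splitter -- fine for simple DDL files.
        $dbh->do($_) for grep { /\S/ } split /;\s*\n/, $sql;
    }
    return $dbh;
}
```

On Oracle the DROP/CREATE DATABASE part is the sticking point, since a "database" there is a much heavier object; the usual workaround is to create and drop a schema (user) per run instead.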
As to why we do this - I guess it's mainly history... We've only recently
cleaned up our tests to not rely on each other so we're only now getting to
a point where we can start running them in random order - let alone in
parallel... I guess the upsides of starting from a clean database are
mainly matters of convenience: single-digit IDs are easier to read than
ten-digit ones, and debugging failures is easier on a table with 10 rows
than on one with 10 million. The flip side, of course, as previously
mentioned, is that production code is expected to work in a dirty rather
than a clean environment...
Your points about parallelization and using it to flush out
locking/contention issues are interesting and something we haven't
really explored in our test setup, but could certainly benefit
from... (Having had our fair share of those issues in the past...)
/L
So all our unit tests just connect to whatever database you're currently
pointed at, and they all create their own data, and they all roll it back
at the end. In fact, our common test module (which is based on Test::Most)
does the rollback for you. In fact in fact, it won't allow you to commit.
So there's never anything to clean up.
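A stripped-down sketch of the commit-blocking idea: wrap the handle so commit always dies and a rollback happens when the test finishes. (ForbidCommit is a made-up name for illustration, not the real module, which does a great deal more than this.)

```perl
use strict;
use warnings;

# Hypothetical wrapper, not the real common test module: delegate every
# method to the underlying $dbh, except commit, which always dies.
package ForbidCommit;

sub new {
    my ( $class, $dbh ) = @_;
    return bless { dbh => $dbh }, $class;
}

# The one method we refuse to delegate.
sub commit { die "commits are not allowed in unit tests\n" }

our $AUTOLOAD;

sub AUTOLOAD {
    my $self = shift;
    ( my $method = $AUTOLOAD ) =~ s/.*:://;
    return $self->{dbh}->$method(@_);
}

# Roll back automatically when the wrapper goes out of scope,
# so there's nothing to clean up even if the test dies early.
sub DESTROY {
    my $self = shift;
    eval { $self->{dbh}->rollback };
}

package main;
```

Delegating through AUTOLOAD means the wrapper doesn't need to know the whole DBI API; only commit gets special-cased.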
As for leaving the data around for debugging purposes, we've never needed
that. The common test module exports a DBdump function that will dump
out whatever records you need. If you run into a problem with the data and
you need to see what the data is, you stick a DBdump in there. When you're
finished debugging, you either comment it out, or (better yet) just change
it from `diag DBdump` to `note DBdump` and that way you can get the dump
back any time just by adding -v to your prove.
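In Test::More terms the diag/note distinction looks like this. (DBdump here is just a stub standing in for the real exported function; the row data is made up.)

```perl
use strict;
use warnings;
use Test::More;

# Stub standing in for the real DBdump export -- the actual function
# dumps whatever records you ask it for from the database.
sub DBdump {
    return "id | email\n 1 | test\@example.com\n";
}

my $row = { id => 1, email => 'test@example.com' };
ok $row->{id}, 'inserted row came back';

# diag() always prints to the TAP stream; note() only shows up under
# `prove -v`, so you can leave it in once you're done debugging.
note DBdump();

done_testing;
```

Switching `diag` to `note` is a one-word change, which is why it beats commenting the dump out.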
AFAIK the only time anyone's ever asked me to make it possible for the
data to hang around afterwards was when the QA department was toying with
the idea of using the common test module to create test data for their
manual testing scenarios, but they eventually found another way around
that. Certainly no one's ever asked me to do so for a unit test. If they
did, there's a way to commit if you really really want to--I just don't
tell anyone what it is. ;-)
Our data generation routines generate randomized data for things that have
to be unique (e.g. email addresses) using modules such as String::Random.
In the unlikely event that it gets a collision, it just retries a few
times. If a completely randomly generated string isn't unique after, say,
10 tries, you've probably got a bigger problem anyway. Once it's inserted,
we pull it back out again using whatever unique key we generated, so we
don't ever have a need to count records or anything like that. Perhaps
count the number of records _attached_ to a record we inserted previously
in the test, but that obviously isn't impacted by having extra data in the
table.
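The retry loop is roughly this shape. (A self-contained sketch: random_email uses plain rand instead of String::Random so it runs anywhere, and the $insert callback stands in for whatever actually does the INSERT and reports a unique-key collision.)

```perl
use strict;
use warnings;

# Stand-in for String::Random: twelve random alphanumerics.
sub random_email {
    my @chars = ( 'a' .. 'z', '0' .. '9' );
    my $local = join '', map { $chars[ rand @chars ] } 1 .. 12;
    return $local . '@example.com';
}

# Hypothetical insert-with-retry: $insert->($email) should return true
# on success and false on a unique-key collision.
sub insert_unique {
    my ( $insert, $tries ) = @_;
    $tries ||= 10;
    for ( 1 .. $tries ) {
        my $email = random_email();
        return $email if $insert->($email);
    }
    # If a fully random string isn't unique after this many tries,
    # you've probably got a bigger problem anyway.
    die "couldn't generate a unique email in $tries tries\n";
}
```

Once the insert succeeds you pull the row back out by that same unique key, so there's never any need to count rows in the table.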
Unlike Mark, I won't say we _count_ on the random data being in the DB; we
just don't mind it. We only ever look at the data we just inserted. And,
since all unit test data is in a transaction (whether ours or someone
else's who happens to be running a unit test at the same time), the unit
tests can't conflict with each other, or with themselves (i.e. we do use
parallelization for all our unit tests). The only problems we ever see
with this approach are:
* The performance on the unit tests can be bad if lots and lots of things
are hitting the same tables at the same time.
* If the inserts or updates aren't judicious with their locking, some
tests can lock other tests out from accessing the table they want.
And the cool thing there is, both