On Sat, May 4, 2013 at 10:37 PM, Buddy Burden <barefootco...@gmail.com> wrote:
> We have several databases, but unit tests definitely don't have their
> own. Typically unit tests run either against the dev database, or the QA
> database. Primarily, they run against whichever database the current
> developer has their config pointed to. This has to be the case, since
> sometimes we make modifications to the schema. If the unit tests all ran
> against their own database, then my unit tests for my new feature
> involving the schema change would necessarily fail. Or, contrariwise, if
> I make the schema modification on the unit test database, then every
> other dev's unit tests would fail. I suppose if we were using MySQL, it
> might be feasible to create a new database on the fly for every unit
> test run. When you're stuck with Oracle though ... not so much. :-/

Interesting... Developers in our project have a local copy of the production database to work with, but our unit test runs always create a database from scratch and run all schema migrations on it before running the tests. Creating and migrating the unit test DB usually takes between 10 and 30 seconds, so setup time is not really an issue... We're currently on MySQL but will be migrating to Oracle in the near future. Could you elaborate on why this approach might not be viable on Oracle?

As to why we do this - I guess it's mainly history... We've only recently cleaned up our tests so they don't rely on each other, so we're only now getting to a point where we can start running them in random order - let alone in parallel... I guess the upsides of starting from a clean database are mainly matters of convenience: single-digit IDs are easier to read than ten-digit ones, and debugging failures is easier on a table with 10 rows instead of 10 million. The flip side, of course, as previously mentioned, is that production code is expected to work in a "dirty" rather than a "clean" environment...
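For illustration, our "create from scratch, then migrate" bootstrap amounts to something like the sketch below. It uses Python and SQLite so it's self-contained here; our real setup is MySQL (soon Oracle), and the table and migration statements are made up for the example - in practice the migrations are files on disk, applied in order.

```python
# Per-run scratch database: create it empty, then apply every schema
# migration in order, before any test touches it. SQLite stands in for
# the real DB engine; the DDL below is purely illustrative.
import sqlite3

MIGRATIONS = [
    # In a real project these would be numbered migration files on disk.
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)",
    "ALTER TABLE users ADD COLUMN created_at TEXT",
]

def fresh_test_db():
    """Create an empty database and run all schema migrations on it."""
    db = sqlite3.connect(":memory:")  # scratch DB, thrown away after the run
    for ddl in MIGRATIONS:
        db.execute(ddl)
    db.commit()
    return db

db = fresh_test_db()
cols = [row[1] for row in db.execute("PRAGMA table_info(users)")]
print(cols)  # → ['id', 'email', 'created_at']
```

The nice property is that every test run sees exactly the schema the current branch expects, including any migration the developer just wrote.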
Your points about parallelization, and about using it to flush out locking/contention issues, are interesting - that's something we haven't really explored in our test setup but could certainly benefit from... (Having had our fair share of those issues in the past...)

/L

> So all our unit tests just connect to whatever database you're currently
> pointed at, and they all create their own data, and they all roll it
> back at the end. In fact, our common test module (which is based on
> Test::Most) does the rollback for you. In fact in fact, it won't allow
> you to commit. So there's never anything to clean up.
>
> AFA leaving the data around for debugging purposes, we've never needed
> that. The common test module exports a "DBdump" function that will dump
> out whatever records you need. If you run into a problem with the data
> and you need to see what the data is, you stick a DBdump in there. When
> you're finished debugging, you either comment it out, or (better yet)
> just change it from `diag DBdump` to `note DBdump` and that way you can
> get the dump back any time just by adding -v to your prove.
>
> AFAIK the only time anyone's ever asked me to make it possible for the
> data to hang around afterwards was when the QA department was toying
> with the idea of using the common test module to create test data for
> their manual testing scenarios, but they eventually found another way
> around that. Certainly no one's ever asked me to do so for a unit test.
> If they did, there's a way to commit if you really really want to--I
> just don't tell anyone what it is. ;->
>
> Our data generation routines generate randomized data for things that
> have to be unique (e.g. email addresses) using modules such as
> String::Random. In the unlikely event that it gets a collision, it just
> retries a few times. If a completely randomly generated string isn't
> unique after, say, 10 tries, you've probably got a bigger problem
> anyway.
> Once it's inserted, we pull it back out again using whatever unique key
> we generated, so we don't ever have a need to count records or anything
> like that. Perhaps count the number of records _attached_ to a record
> we inserted previously in the test, but that obviously isn't impacted
> by having extra data in the table.
>
> Unlike Mark, I won't say we _count_ on the random data being in the DB;
> we just don't mind it. We only ever look at the data we just inserted.
> And, since all unit test data is in a transaction (whether ours or
> someone else's who happens to be running a unit test at the same time),
> the unit tests can't conflict with each other, or with themselves (i.e.
> we do use parallelization for all our unit tests). The only problems we
> ever see with this approach are:
>
> * The performance on the unit tests can be bad if lots and lots of
> things are hitting the same tables at the same time.
> * If the inserts or updates aren't judicious with their locking, some
> tests can lock other tests out from accessing the table they want.
>
> And the cool thing there is, both of those issues expose problems in
> the implementation that need fixing anyway: scalability problems and
> potential DB contention issues. So forcing people to fix those in order
> to make their unit tests run smoothly is a net gain.
>
> Anyways, just wanted to throw in yet another perspective.
>
>
> -- Buddy
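The always-roll-back discipline you describe might look roughly like this. Your actual module is Perl, built on Test::Most; this is just a Python/SQLite sketch of the same idea (the `RollbackHarness` name is mine), where every test runs inside a transaction that is unconditionally rolled back, and committing simply isn't offered.

```python
# Sketch of a test harness that wraps each test in a transaction and
# always rolls it back, so tests leave no data behind and can safely
# share a database with other developers' concurrent test runs.
import sqlite3

class RollbackHarness:
    def __init__(self, db):
        self.db = db

    def __enter__(self):
        self.db.execute("BEGIN")   # everything the test does stays in this txn
        return self.db

    def __exit__(self, *exc):
        self.db.rollback()         # unconditional: commit is not an option
        return False

db = sqlite3.connect(":memory:")
db.isolation_level = None          # manage transactions explicitly
db.execute("CREATE TABLE t (x INTEGER)")

with RollbackHarness(db) as conn:
    conn.execute("INSERT INTO t (x) VALUES (1)")
    inside = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

after = db.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(inside, after)  # → 1 0
```

Inside the harness the test sees its own insert; once it exits, the rollback means there is never anything to clean up.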
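And the "randomize, retry on collision, read back by the unique key" pattern - which you do with String::Random in Perl - can be sketched like this in Python (the helper names and the email format are illustrative, not from your code):

```python
# Generate a random unique value, retry a handful of times if it
# collides with existing data, then fetch the row back by that same key,
# so the test never needs to count rows or care what else is in the table.
import random
import sqlite3
import string

def random_email(rng):
    """Hypothetical stand-in for String::Random-style generation."""
    local = "".join(rng.choices(string.ascii_lowercase, k=12))
    return local + "@example.com"

def insert_unique_user(db, rng, tries=10):
    """Insert a user with a random unique email, retrying on collision."""
    for _ in range(tries):
        email = random_email(rng)
        try:
            db.execute("INSERT INTO users (email) VALUES (?)", (email,))
            return email
        except sqlite3.IntegrityError:
            continue  # collision: try another random value
    # If a random 12-char string collides 10 times in a row, you've
    # probably got a bigger problem anyway.
    raise RuntimeError("could not generate a unique email")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
rng = random.Random(42)

email = insert_unique_user(db, rng)
# Pull the record back out using the unique key we just generated.
row = db.execute("SELECT email FROM users WHERE email = ?", (email,)).fetchone()
print(row[0] == email)  # → True
```

Because each test only ever looks up the keys it generated itself, pre-existing rows in the table are irrelevant to the assertions.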