Ovid wrote:
> I've posted a trimmed down version of the custom 'Test::More' we use
> here:
> 
>   http://use.perl.org/~Ovid/journal/35363
> 
> I can't recall who was asking about this, but you can now do this:
> 
>   use Our::Test::More 'no_plan', 'fail';
> 
> If 'fail' is included in the import list, the test program will die
> immediately after the first failure.  VERY HANDY at times.

I've experimented with this idea in the past, using Test::Builder to replace
home-rolled "die on failure" assert()-style test suites.  Unfortunately
there's a major problem:

$ perl -wle 'use OurMore "fail", "no_plan";  is 23, 42'
not ok 1
#   Failed test at /usr/local/perl/5.8.8/lib/Test/More.pm line 329.
Test failed.  Halting at OurMore.pm line 44.
1..1

Dude, where's my diagnostics?

In Test::Builder, the diagnostics are printed *after* the test fails.  So
dying on ok() will kill those very important diagnostics.  Sure, you don't
have to read a big list of garbage, but now you don't have anything to read at
all!

Since the diagnostics are printed by a calling function outside of
Test::Builder's control (even if you cheated and wrapped all of Test::More,
there are all the Test modules on CPAN, too), I'd considered die-on-failure
impossible. [1]  The diagnostics are far more important.
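
To make the problem concrete, here's a simplified sketch (not Test::Builder's
actual internals; my_is() is just an illustrative stand-in for is()) of how a
comparison function only gets to print its diagnostics after ok() has already
returned.  A die-on-failure ok() never returns, so the diag() never happens:

use strict;
use Test::More 'no_plan';

sub my_is {
    my ($got, $expected, $name) = @_;

    # ok() reports the raw pass/fail.  A die-on-failure version would
    # die right here, before the caller can explain *what* failed.
    my $ok = ok( $got eq $expected, $name );

    # The useful part only comes afterwards, from the caller.
    diag("         got: '$got'\n    expected: '$expected'") unless $ok;

    return $ok;
}

my_is( 23, 42, 'the answer' );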


Now, getting into opinion, I really, really hate die on failure.  I had to use
a system that implemented it for a year (Ovid knows just what I'm talking
about) and I'd rather scroll up through an occasional burst of errors and
warnings than ever not be able to fully diagnose a bug because a test bailed
out before it was done giving me all the information I needed to fix it.  For
example, let's look at the ExtUtils::MakeMaker tests for generating a PPD file.

ok( open(PPD, 'Big-Dummy.ppd'), '  .ppd file generated' );
my $ppd_html;
{ local $/; $ppd_html = <PPD> }
close PPD;
like( $ppd_html, qr{^<SOFTPKG NAME="Big-Dummy" VERSION="0,01,0,0">}m,
                                                           '  <SOFTPKG>' );
like( $ppd_html, qr{^\s*<TITLE>Big-Dummy</TITLE>}m,        '  <TITLE>'   );
like( $ppd_html, qr{^\s*<ABSTRACT>Try "our" hot dog's</ABSTRACT>}m,
                                                           '  <ABSTRACT>');
like( $ppd_html,
      qr{^\s*<AUTHOR>Michael G Schwern &lt;[EMAIL PROTECTED]&gt;</AUTHOR>}m,
                                                           '  <AUTHOR>'  );
like( $ppd_html, qr{^\s*<IMPLEMENTATION>}m,          '  <IMPLEMENTATION>');
like( $ppd_html, qr{^\s*<DEPENDENCY NAME="strict" VERSION="0,0,0,0" />}m,
                                                           '  <DEPENDENCY>' );
like( $ppd_html, qr{^\s*<OS NAME="$Config{osname}" />}m,
                                                           '  <OS>'      );
my $archname = $Config{archname};
$archname .= "-". substr($Config{version},0,3) if $] >= 5.008;
like( $ppd_html, qr{^\s*<ARCHITECTURE NAME="$archname" />}m,
                                                           '  <ARCHITECTURE>');
like( $ppd_html, qr{^\s*<CODEBASE HREF="" />}m,            '  <CODEBASE>');
like( $ppd_html, qr{^\s*</IMPLEMENTATION>}m,           '  </IMPLEMENTATION>');
like( $ppd_html, qr{^\s*</SOFTPKG>}m,                      '  </SOFTPKG>');

Let's say the first like() fails.  So you go into the PPD code and fix that.
Rerun the test.  Oh, the second like failed.  Go into the PPD code and fix
that.  Oh, the fifth like failed.  Go into the PPD code and fix that...

Might it be faster and more useful to see all the related failures at once?

And then sometimes tests are combinatorial.  A failure of A means one thing
but A + B means another entirely.

Again, let's look at the MakeMaker test to see if files got installed.

ok( -e $files{'dummy.pm'},     '  Dummy.pm installed' );
ok( -e $files{'liar.pm'},      '  Liar.pm installed'  );
ok( -e $files{'program'},      '  program installed'  );
ok( -e $files{'.packlist'},    '  packlist created'   );
ok( -e $files{'perllocal.pod'},'  perllocal.pod created' );

If the first test fails, what does that mean?  Well, it could mean...

A)  Only Dummy.pm failed to get installed and it's a special case.
B)  None of the .pm files got installed, but everything else installed ok.
C)  None of the .pm files or the programs got installed, but the
    generated files are ok
D)  Nothing got installed and the whole thing is broken.

Each of these things suggests different debugging tactics.  But with a "die on
failure" system they all look exactly the same.


Oooh, and if you're the sort of person who likes to use the debugger, it's
jolly great fun to have the test suite just KILL THE PROGRAM when you want to
diagnose a post-failure problem.


There are two usual rebuttals.  The first is "well, just turn off
die-on-failure and rerun the test."  Ovid's system is at least capable of
being turned off; many hard-code "failure == die".  Unfortunately Ovid's
toggle is at the file level when it should be at the user level, since "do I
or do I not want to see the gobbledygook" is more a user preference.
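
As an illustration of what a user-level switch might look like (a sketch only;
TEST_DIE_ON_FAIL is a made-up environment variable, not an existing option), a
wrapper could leave the test files alone and let each person flip it in their
environment:

use strict;
use warnings;
use Test::Builder;

# Only wrap Test::Builder::ok() when the *user* asks for it, so the
# test files themselves never have to decide.
if ( $ENV{TEST_DIE_ON_FAIL} ) {
    my $original_ok = \&Test::Builder::ok;
    no warnings 'redefine';
    *Test::Builder::ok = sub {
        my $self   = shift;
        my $result = $original_ok->( $self, @_ );
        die "Test failed.  Halting" unless $result;
        return $result;
    };
}

Of course this still throws away the post-failure diagnostics described above;
it only moves the on/off decision from the file to the user.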

But we all know the problems with the "just rerun the tests" approach.

Maybe re-running the tests just isn't possible, or it's really slow to do so?
What if these are tests on a shipped module and all you've got is an email
with a cut & pasted report?  Now you've lost time waiting for the user to
rerun the tests with a special flag set... assuming you hear back at all.

What about heisenbugs?  Now you see a failure, now you don't.  Rerun it a
second time with all the diagnostics on and suddenly it passes.  Maybe you
need to run the entire 30 minute test suite before it happens.  Maybe it only
happens at 2:30am on a Sunday.  Maybe you tickled a memory leak.  This is why
it's so important to get as much information as you can on the first run; you
might not get a second.

The second rebuttal is typically something about how I should restructure my
tests so they go in the right order or turn five tests into one.  Well,
sometimes you can, sometimes you can't.  And I'm sure, after a lot more
thought and time than I care to put into it and with two scoops of hindsight,
you can do it.  But now I'm spending time carefully crafting tests to deal
with an artificial restriction.  Writing tests isn't about hindsight, or even
foresight.  It's about casting a net wide enough that it's going to catch the
bugs you have *and the bugs you don't yet have* and give you the information
to fix them.  You can't predict that very well, and die-on-failure forces you
to do it very well.


[1] "But I have a way", he said mysteriously.

-- 
184. When operating a military vehicle I may *not* attempt something
     “I saw in a cartoon”.
    -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
           http://skippyslist.com/?page_id=3
