Ovid wrote:

> The following line is giving me pause:
>
>     ok 9 Elegy 9B # TOdO
>
> That's an 'unexpectedly succeeded' test ('bonus', in the Test::Harness
> world).
I agree "bonus" is not a very good term. It was what was in the TH code when I dug out the TODO functionality. > Right now, if that read 'not ok # TODO', TAPx::Parser would have this: > > passed true > actual_passed false > todo_failed false > > But since it reads 'ok', I've reversed the sense: > > passed false > actual_passed true > todo_failed true > > In this case, Test::Harness and friends report that 'ok 9 # todo' is passing, > not failing, but I'm reporting the opposite result. I think my behavior is > more correct because I'm trying to write things so that someone who forgets > writes a bad harness will still see what's going on. For example, let's say > someone only wants to see failing tests: > > # the 'source' key is new. You no longer have to manually create a stream > my $parser = TAPx::Parser->new( { source => $test_file } ); > while ( my $result = $parser->next ) { > print $result->as_string if ! $parser->passed; > } > > With that, they'll never notice that 'ok # TODO' has unexpectedly succeeded > if I adopt the current behavior of Test::Harness unless they explicitly > remember to check the $parser->todo_failed method. > > I propose that 'ok # TODO' tests be reported as failures. I think it would > be good if Test::Harness also adopted this strategy because right now, if you > see that tests unexpectedly succeeded, you don't know which tests they are > and you have to try and grep through the TAP output manually. > > Thoughts? Is this going to break a lot of stuff (I suspect it might). Why do "ok # TODO" tests pass? Users will report "the tests failed and the CPAN shell won't install it and now I can't install <list of dependencies> and OMGWTF your shit is broke" when in fact it works even better than the author thought! Now wouldn't that suck. Users won't install your module (and thus any dependents) because the tests are reporting a failure when in fact everything is working fine. Automated installation systems (CPAN shell, packagers, etc...) won't install the module. They have no way of deciding what action they should take on a "everything passed but there were some passing TODO tests" because all they know is "make test" returned failure. In toolchain land, RFC 1925 Section 2.1 dominates. Thought: Writing your code so people are less likely to make a mistake: good. Writing your code to try and make up for a mistake: not so good. The idea of changing the parsing semantics for a "bad harness" makes my feet itch. The idea of making it less likely they'll forget to check a magic flag, good. Thought: If someone writes as dumb simple a displayer as your example above they probably don't care about reporting back passing TODO tests. Thought: > If TODO tests mark a complete feature and some parts of that feature pass but > I don't notice > that they've unexpectedly succeeded, I have no way of really trusting what > state they'll leave > my object in. Thus, I can't trust TODO tests. This is a display issue. You are writing the parser. Just because Test::Harness does this poorly doesn't mean everyone else will. The treatment of "ok" TODO tests is up to the displayer. If they want an ok TODO to be silently ignored they can do that. If they want it to cause a big hammer to pop out of the machine and bonk the developer on the head, they can do that. You're writing the parser. All you have to do is report enough information to the displayer to know when to swing the hammer. By putting display decisions into the parser you are limiting the possibilities of the display writer. 
The information needed is:

  * What was the actual result of the test, regardless of any directives?
    ("ok" or "not ok")

  * Were there any directives associated with that test, and what were
    they?

  * When everything is taken into account, is that test considered a pass
    or a fail by TAP? As far as TAP is concerned, TODO and SKIP tests
    always pass. [1]

I'll agree that putting the responsibility of checking for any special
directives onto the display writer is going to be prone to their
forgetting. So perhaps you can change your examples to discourage it:

    if ( $result->simply_passed ) {
        # normal "ok" with no directives
    }
    elsif ( $result->passed ) {
        # it has some sort of directive, deal with it
    }
    else {
        # failure
    }

Or if you're going with a callback model, have one callback for whether
the test logically passed or failed and another for dealing with
directives (sketched below).

[1] If you want to get technical, "not ok # SKIP" is currently undefined.
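Here's a minimal sketch of that callback split. Nothing in it is
TAPx::Parser's real interface; the hash-based results, field names, and
the dispatch loop are purely hypothetical glue, and the passed flags
follow TAP's view that TODO tests always pass:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %callbacks = (
        # Logical pass/fail as TAP sees it (TODO and SKIP count as passes).
        result => sub {
            my $result = shift;
            print "$result->{raw}\n" unless $result->{passed};
        },

        # Directives get a second look here; this is where the hammer
        # swings for an unexpectedly passing TODO.
        directive => sub {
            my $result = shift;
            warn "TODO unexpectedly passed: $result->{raw}\n"
                if $result->{directive} eq 'TODO' && $result->{actual_passed};
        },
    );

    # Stand-ins for what the parser would hand back.
    my @results = (
        { raw => 'ok 1',            passed => 1, actual_passed => 1, directive => '' },
        { raw => 'not ok 2',        passed => 0, actual_passed => 0, directive => '' },
        { raw => 'ok 3 # TODO',     passed => 1, actual_passed => 1, directive => 'TODO' },
        { raw => 'not ok 4 # TODO', passed => 1, actual_passed => 0, directive => 'TODO' },
    );

    for my $result (@results) {
        $callbacks{result}->($result);
        $callbacks{directive}->($result) if $result->{directive};
    }

Splitting it this way means a displayer that only cares about plain
failures never has to remember to check for directives, while one that
wants to bonk the developer has everything it needs.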