Ovid wrote:

> The following line is giving me pause:
>
>     ok 9 Elegy 9B # TOdO
>
> That's an 'unexpectedly succeeded' test ('bonus', in the Test::Harness
> world).
I agree "bonus" is not a very good term. It was what was in the TH code when I dug out the TODO functionality. > Right now, if that read 'not ok # TODO', TAPx::Parser would have this: > > passed true > actual_passed false > todo_failed false > > But since it reads 'ok', I've reversed the sense: > > passed false > actual_passed true > todo_failed true > > In this case, Test::Harness and friends report that 'ok 9 # todo' is passing, > not failing, but I'm reporting the opposite result. I think my behavior is > more correct because I'm trying to write things so that someone who forgets > writes a bad harness will still see what's going on. For example, let's say > someone only wants to see failing tests: > > # the 'source' key is new. You no longer have to manually create a stream > my $parser = TAPx::Parser->new( { source => $test_file } ); > while ( my $result = $parser->next ) { > print $result->as_string if ! $parser->passed; > } > > With that, they'll never notice that 'ok # TODO' has unexpectedly succeeded > if I adopt the current behavior of Test::Harness unless they explicitly > remember to check the $parser->todo_failed method. > > I propose that 'ok # TODO' tests be reported as failures. I think it would > be good if Test::Harness also adopted this strategy because right now, if you > see that tests unexpectedly succeeded, you don't know which tests they are > and you have to try and grep through the TAP output manually. > > Thoughts? Is this going to break a lot of stuff (I suspect it might). Why do "ok # TODO" tests pass? Users will report "the tests failed and the CPAN shell won't install it and now I can't install <list of dependencies> and OMGWTF your shit is broke" when in fact it works even better than the author thought! Now wouldn't that suck. Users won't install your module (and thus any dependents) because the tests are reporting a failure when in fact everything is working fine. Automated installation systems (CPAN shell, packagers, etc...) won't install the module. They have no way of deciding what action they should take on a "everything passed but there were some passing TODO tests" because all they know is "make test" returned failure. In toolchain land, RFC 1925 Section 2.1 dominates. Thought: Writing your code so people are less likely to make a mistake: good. Writing your code to try and make up for a mistake: not so good. The idea of changing the parsing semantics for a "bad harness" makes my feet itch. The idea of making it less likely they'll forget to check a magic flag, good. Thought: If someone writes as dumb simple a displayer as your example above they probably don't care about reporting back passing TODO tests. Thought: > If TODO tests mark a complete feature and some parts of that feature pass but > I don't notice > that they've unexpectedly succeeded, I have no way of really trusting what > state they'll leave > my object in. Thus, I can't trust TODO tests. This is a display issue. You are writing the parser. Just because Test::Harness does this poorly doesn't mean everyone else will. The treatment of "ok" TODO tests is up to the displayer. If they want an ok TODO to be silently ignored they can do that. If they want it to cause a big hammer to pop out of the machine and bonk the developer on the head, they can do that. You're writing the parser. All you have to do is report enough information to the displayer to know when to swing the hammer. By putting display decisions into the parser you are limiting the possibilities of the display writer. 
The information needed is:

  * What was the actual result of the test, regardless of any directives?
    ("ok" or "not ok")

  * Were there any directives associated with that test, and what were
    they?

  * When everything is taken into account, is that test considered a pass
    or a fail by TAP? As far as TAP is concerned, TODO and SKIP tests
    always pass. [1]

I'll agree that putting the responsibility of checking for any special
directives onto the display writer is going to be prone to their
forgetting. So perhaps you can change your examples to discourage it:

    if ( $result->simply_passed ) {
        # normal "ok" with no directives
    }
    elsif ( $result->passed ) {
        # it has some sort of directive, deal with it
    }
    else {
        # failure
    }

Or if you're going with a callback model, have one callback for whether
the test logically passed or failed and another for dealing with
directives (sketched below).

[1] If you want to get technical, "not ok # SKIP" is currently undefined.
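Here's a minimal sketch of that callback split. Nothing in it is
TAPx::Parser's real interface; the hash-based results, field names, and
the dispatch loop are purely hypothetical glue, and the passed flags
follow TAP's view that TODO tests always pass:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %callbacks = (
        # Logical pass/fail as TAP sees it (TODO and SKIP count as passes).
        result => sub {
            my $result = shift;
            print "$result->{raw}\n" unless $result->{passed};
        },

        # Directives get a second look here; this is where the hammer
        # swings for an unexpectedly passing TODO.
        directive => sub {
            my $result = shift;
            warn "TODO unexpectedly passed: $result->{raw}\n"
                if $result->{directive} eq 'TODO' && $result->{actual_passed};
        },
    );

    # Stand-ins for what the parser would hand back.
    my @results = (
        { raw => 'ok 1',            passed => 1, actual_passed => 1, directive => '' },
        { raw => 'not ok 2',        passed => 0, actual_passed => 0, directive => '' },
        { raw => 'ok 3 # TODO',     passed => 1, actual_passed => 1, directive => 'TODO' },
        { raw => 'not ok 4 # TODO', passed => 1, actual_passed => 0, directive => 'TODO' },
    );

    for my $result (@results) {
        $callbacks{result}->($result);
        $callbacks{directive}->($result) if $result->{directive};
    }

Splitting it this way means a displayer that only cares about plain
failures never has to remember to check for directives, while one that
wants to bonk the developer has everything it needs.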