I'm going to sum up this reply, because it got long but kept circling
back to the same themes.

* TODO tests provide you with information about what tests the author
  decided to ignore.
** Commented out tests provide you with NO information.
** Most TODO tests would have otherwise been commented out.

* How you interpret that information is up to you.
** Most folks don't care, so the default is to be quiet.

* The decision about what is success and what is failure lies with the
  author.
** There's nothing we can do to stop that.
** But TODO tests allow you to reinterpret the author's desires.

* TAP::Harness (aka Test::Harness 3) has fairly easy ways to control how
  TODO tests are interpreted.
** It could be made easier, especially WRT controlling "make test".
** CPAN::Reporter could be made aware of TODO passes.


Fergal Daly wrote:
> On 05/12/2007, Michael G Schwern <[EMAIL PROTECTED]> wrote:
>> This whole discussion has unhinged a bit from reality, so maybe you
>> can give some concrete examples of the problems you're talking about?
>> You obviously have some specific breakdowns in mind.
>
> I don't, I'm arguing against what has been put forward as good
> practice when there are other better practices that are approximately
> as easy and don't have the same downsides.
>
> In fairness though these bad practices were far more strongly
> advocated in the previous thread on this topic than in this one.

I don't know what thread that was, or if I was involved, so maybe I'm
not the best person to be arguing with.

>> The final choice, incrementing the dependency version to one that does
>> not yet exist, boils down to "it won't work". It's also ill advised to
>> anticipate that version X+1 will fix a given bug, as on more than one
>> occasion an anticipated bug has not been fixed in the next version.
>
> As I said earlier though, in Module::Build you have the option of
> saying version < X and then when it's finally fixed, you can say !X
> (and !X+1 if that didn't fix it).

Yep, rich dependencies are helpful.
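For the record, that looks something like this in a Build.PL (My::Module,
Foo and the version numbers are all invented for illustration):

    # Build.PL -- a minimal sketch; the module names and versions
    # are made up for illustration.
    use strict;
    use warnings;
    use Module::Build;

    Module::Build->new(
        module_name => 'My::Module',
        requires    => {
            # Any Foo except the broken 1.23; add ", != 1.24" later
            # if that release doesn't fix it either.
            'Foo' => '>= 1.00, != 1.23',
        },
    )->create_build_script;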
>> There is also the "I don't think feature X works in Y environment"
>> problem. For example, say you have something that depends on symlinks.
>> You could hard code your test to skip if on Windows or some such, but
>> that's often too broad. Maybe they'll add them in a later version, or
>> with a different filesystem (it's happened on VMS) or with some fancy
>> 3rd party hack. It's nice to get that information back.
>
> How do you get this information back? Unexpected passes are not
> reported to you. If you want to be informed about things like this a
> TODO is not a very good way to do it.

The TODO test is precisely the way to do it; it provides all the
information needed. We just don't have the infrastructure to report it
back.

As discussed before, what's needed is a higher resolution than just
"pass" and "fail" for the complete test run. That's the "Result:
PASS/TODO" discussed earlier. Things like CPAN::Reporter could then send
that information back to the author. It's a fairly trivial change for
Test::Harness. The important thing is that "report back" is no longer
locked to "fail".
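To make the symlink case concrete, a TODO version of that test might
look something like this (the platform check and test body are only a
sketch):

    use strict;
    use warnings;
    use Test::More tests => 1;

    TODO: {
        # Mark the test TODO only where symlink() is expected to
        # fail; everywhere else it runs as a normal test.
        local $TODO = "symlink() not expected to work on this platform"
            if $^O eq "MSWin32";

        # eval guards against symlink() croaking where unimplemented.
        my $created = eval { symlink "todo.t", "todo.link" };
        ok $created, "symlink() works";

        unlink "todo.link" if $created;
    }

If symlinks ever start working there, the test unexpectedly passes and
that shows up in the harness summary, which is exactly the information
you want back.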
>>> I'm talking about people converting tests that were working just fine
>>> to be TODO tests because the latest version of Foo (an external
>>> module) has a new bug. While Foo is broken, they don't want lots of
>>> bug reports from CPAN testers that they can't do anything about.
>>>
>>> This use of TODO allows you to silence the alarm and also gives you a
>>> way to spot when the alarm condition has passed. It's convenient for
>>> developers but it's 2 fingers to users who can now get false passes
>>> from the test suites,
>>
>> It still boils down to what known bugs the author is willing to
>> release with. Once the author has decided they don't want to hear
>> about a broken dependency, and that the breakage isn't important, the
>> damage is done. The TODO test is orthogonal.
>>
>> Again, consider the alternative, which is to comment the test out.
>> Then you have NO information.
>
> Who's "you"?

You == user.

> If you==user then a failing TODO test and commented out test are
> indistinguishable unless you go digging in the code or TAP stream.

As they say, "works as designed". The author decided the failures aren't
important. Don't like it? Take it up with the author. Most folks don't
care about that information, they just want the thing installed.

You (meaning Fergal Daly) can dig them out with some Test::Harness
hackery, and maybe that should be easier if you really care about it.
The important thing is that the information is there, encoded in the
tests, and you can get at it programmatically. The alternative is to
comment the failing test out, in which case you have *no* information
and those who are interested cannot get it out.

> A passing TODO is just confusing.

That's a function of how it's displayed. "UNEXPECTEDLY SUCCEEDED", I
agree, was confusing. No question. TH 3's display is more muted and no
more confusing than a skipped test. There is also the very clear "All
tests successful" and "Result: PASS" which should clear things up. More
importantly, when you're installing stuff via the CPAN shell it's all
whizzing by and the user is blissfully innocent.

There is, perhaps, an impedance mismatch here. The concern seems to be
known bugs, and there's an implication that a TODO test is somehow
dishonestly hiding known bugs from the user. The mismatch is that it's
not the job of the test suite to inform the user about known bugs.

> If you==author then the only time a TODO and a commented out test are
> distinguishable is when _you_ are running them and studying the
> output. Studying the output of tests that have some TODOs passing is
> not simple.

I really think you need to look at Test::Harness 3.

    $ prove todo.t
    todo......ok
    All tests successful.

    Test Summary Report
    -------------------
    todo.t (Wstat: 0 Tests: 2 Failed: 0)
      TODO passed:   1-2
    Files=1, Tests=2,  0 wallclock secs ( 0.01 usr +  0.01 sys =  0.02 CPU)
    Result: PASS

As the summary report is only displayed when something fails OR a TODO
test passes, it's real easy to spot. There's also a very clear
"todo_passed" method in TAP::Parser to tell you if any TODO tests
passed, if you want to automate it.
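Automating it might look something like this (an untested sketch; see
the TAP::Parser docs for the details of new()):

    use strict;
    use warnings;
    use TAP::Parser;

    # Run todo.t and parse the TAP it emits.
    my $parser = TAP::Parser->new( { source => "todo.t" } );
    $parser->run;

    # todo_passed() returns the numbers of the TODO tests which
    # unexpectedly passed.
    if ( my @unexpected = $parser->todo_passed ) {
        print "TODO tests now passing: @unexpected\n";
    }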
Could this be made even easier? Yes, probably with some Test::Harness
environment variable. The point is, TODO tests make the information
available. Digging it out and deciding what to do with it is another
problem, but YOU can now make that decision.

> A far easier way to be notified when Foo starts working is
> to write an explicit test for Foo's functionality and run it whenever
> you see a new Foo.

Humans are really awful at rote work, especially over long periods of
time, and I don't want to waste my brainpower remembering to manually
run special tests for special conditions. Bleh.

>> So I think the problem you're concerned with is poor release
>> decisions. TODO tests are just a tool being employed therein.
>
> The point is that you have no idea what functionality is important for
> your users. Disabling (with TODO or any other means) tests that test
> previously working functionality that might be critical for a given
> user is always a poor release decision in my book.

Your opinion, and generally mine, too. I do agree with the philosophy of
"version X+1 should be no worse than version X", but reality intervenes.
And authoring a general purpose testing library means being realistic,
not idealistic (while always nudging towards the idealistic).

If one does have to release with breakage, rather than have folks choose
between "release with failing tests and get lots of redundant reports
and stop automated installers" and "comment out failing tests (and
probably forget to uncomment them later)", TODO tests give a third, more
palatable option: the tests remain and are still active.

Consider also that you're not always blotting out previously working
functionality. Often you get a bug report for something that never quite
worked. Good practice says you write a test. If you can't fix it before
release, wrap it in a TODO test.

> You have no idea what version of Foo they're using

Well, you do with version dependency declarations, so you control the
range. New versions are, of course, open to breakage, but at some point
you have to trust something.

> or what strangeness is lurking in their environment.

Anything that might affect the code can be checked for and worked
around, or do whatever with as desired. Geez, it's the story of my life
with MakeMaker.

> If someone is going to use your
> module for real work, they should be able to run the full set of
> tests.

They do run the full set of tests; that's what a TODO test offers over a
SKIP! All the tests are still run; a TODO test doesn't change that. What
it does change is the idea of pass and fail.

I have no problem with leaving the control of what is success and
failure in the hands of the author. We can't actually take it away;
they'll just delete the offending tests if we do.

What TODO tests do give you, if I can beat this expired equine a little
more, is information. The information about which tests were TODO and
which weren't is in the TAP stream. You, if you really want to know, can
extract that information with TAP::Parser. At this point I'll leave it
as a SMOP for you to patch Test::Harness to make the reporting you want
possible.

> The test suite should not vary depending on what the latest uploaded
> version of Foo on CPAN does. Perhaps the reporting should vary and
> perhaps the toolchain's reaction to failing tests could be made
> smarter and that would remove the desire of developers to use TODO in
> this way,

I have no idea what might make a human, much less the toolchain,
"smarter" about a failing test. That's a terrifying idea to me, along
the lines of the "expected failure" ("Oh, don't worry about that failing
test. It happens all the time. Just ignore it.") that TODO tests are
exactly made to avoid. That's a swamp I have worked hard for years to
drain.

Given that TODO tests attempt to convey exactly that information, how to
interpret a failing test, and you're arguing *against* that... I'm not
sure what's going on.


--
Ahh email, my old friend. Do you know that revenge is a dish that is
best served cold? And it is very cold on the Internet!