Let me make something clear: I don't have a solution to this problem.  I'm
just finally getting a grip on what the problem actually is.  The last week
has shaken loose use cases and conditions that I hadn't thought about before
and that the TAP diagnostic syntax proposal does not cover.

What I do know is that merging the streams or sending diag() to STDOUT will
not work.  I've explained this a number of times.  I don't want to spend any
more time explaining why when the energy can be spent on a real fix.  I've
spent probably an hour on this post alone.


chromatic wrote:
>> Here's one.
>> http://www.mail-archive.com/perl-qa@perl.org/msg06694.html
> 
> Merely skimming that message reveals a falsehood:
> 
>       Test::Harness throws out all non-TAP stuff going to STDOUT.  This 
> includes 
> comments.  So if Test::Builder started sending its diagnostics to STDOUT 
> they'd disappear into the ether.
> 
> That hasn't been true for *nearly two years*.  I seem to recall patching 
> Test::Harness::Straps before the coffee stain book came out in July 2005.

I forgot, I'm on a mailing list.  Thou shalt not let any inaccuracy, no matter
how minor or totally inconsequential to the point being made, go uncorrected.

Yes, THS might capture them but TH does nothing with them and that's the
important thing for the user.


>> Here's another, referencing that first one.
>> http://www.nntp.perl.org/group/perl.qa/2006/09/msg7152.html
> 
>> And yet another one.
>> http://www.nntp.perl.org/group/perl.qa/2006/09/msg7153.html
> 
>> And that's just in September.  I feel like I have to refute this idea about
>> once a month.  Here we go again.
> 
> I remember that thread now, the one where people like Ovid, Aristotle, David, 
> Adrian, and chromatic all said "Uh, making one small change to Test::Builder 
> and fixing Test.pm too would improve 99% of the Perl world without making 
> life worse than it already is for anyone else".
> 
> Did I miss your response to everything there?

The same points remain.

1) It breaks displaying TODO tests.

2) Not everyone uses Test.pm and Test::Builder.  Really.  TH is one of the
most widely used modules in Perldom and most folks have no idea they're using 
it.

3) TAP comments on STDOUT have never been displayed.  This will break that
assumption, which has been around for about a decade.  Lots of folks take
advantage of it by printing to either STDOUT or STDERR for "don't display"
or "display".  You'll start displaying all sorts of comments that never were
before.

4) It requires a simultaneous Test::Harness, Test and Test::More upgrade.  I
don't want to even think of the dependency mobius strip that will be.

5) It couples the harness and the producer, violating one of the central
principles of TAP, the one that sets it apart from XUnit.


>> Piping all diagnostics to STDOUT solves nothing except maybe allowing
>> runtests to display warnings again.  You still can't tell the difference
>> between a comment (what currently is "# foo" printed to STDOUT) and a
>> failure diagnostic (what currently is "# foo" printed to STDERR) and
>> diagnostics associated with a TODO test (which is "# foo" printed to
>> STDOUT).
> 
> Test::Harness can identify TODO tests.  Test::Harness can even tell if a 
> diagnostic followed a TODO test.

I covered this in the original post.  The heuristics used to determine what
diagnostics are associated with what test are just that, heuristics.  There's
nothing which guarantees that 1) diagnostics must follow a test and 2) all
the diagnostics between two tests belong to the preceding test.  It's just
a convention.  You can easily lose important failure diagnostics this way.
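
To make that concrete, here's a rough sketch of the "file each comment under
the test before it" convention.  attach_diagnostics() is a made-up helper for
illustration, not anything in Test::Harness or the Straps, and the TAP lines
are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: attach each "# ..." line to the most recent test
# line, which is all the convention gives a parser to go on.
sub attach_diagnostics {
    my @lines = @_;
    my (@tests, @orphans);
    for my $line (@lines) {
        if ($line =~ /^(not )?ok\b\s*(\d+)?/) {
            push @tests, { number => $2, ok => !$1, diag => [] };
        }
        elsif ($line =~ /^#\s?(.*)/) {
            if (@tests) { push @{ $tests[-1]{diag} }, $1 }
            else        { push @orphans, $1 }    # comment before any test
        }
    }
    return (\@tests, \@orphans);
}

# This producer prints the diagnostic for test 2 *before* the test line.
# Nothing in TAP forbids that.
my ($tests, $orphans) = attach_diagnostics(
    "1..2",
    "ok 1",
    "# expected 42, got 23",
    "not ok 2",
);

for my $test (@$tests) {
    my $diag = join("; ", @{ $test->{diag} }) || "(no diagnostics)";
    printf "test %d: %s\n", $test->{number}, $diag;
}
```

The diagnostic was meant for test 2, but it gets filed under the passing
test 1 and the actual failure ends up with nothing attached.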


>> Consider the following.  Again.
>>
>> $ cat ~/tmp/foo.t
>> #!/usr/bin/perl -w
>>
>> $| = 1;
>>
>> print "1..2\n";
>> print "ok 1\n";
>> print "# This is not displayed\n";
>> print "not ok 2\n";
>> print STDERR "# This displayed.\n";
> 
> ...
> 
>> $ runtests ~/tmp/foo.t
>> /Users/schwern/tmp/foo......1/2 # This is not displayed
>> /Users/schwern/tmp/foo......2/2 # This displayed.
>> /Users/schwern/tmp/foo...... Failed 1/2 subtests
> 
> You didn't have to convince me that interleaving STDOUT and STDERR is a 
> mistake.  I already believed that.

That might not have been clear.  It was intended to show the effect of what
you're proposing on existing tests.  I used runtests because it does display
all diagnostics (except for its TODO heuristic).  It illustrates that the
STDOUT/STDERR split is an important one in determining what should be
displayed.  If you start displaying all diagnostics from STDOUT you're going
to display things which were never intended to be shown, and passing tests
will be noisy.  This goes against the principle that passing tests should be
quiet.


>> There are proposals on the table for machine parsable diagnostics on the
>> wiki but nothing to indicate free-form information to be displayed to the
>> user.
> 
> Currently:
>       - the test harness displays all free-form information to the user by 
> virtue 
> of not touching it at all

"Currently" as in what Test::Harness 2 currently does or in what Test::Harness
3 might do as a stop gap?  I'm going to assume the former.

That's not entirely true.  "Free-form information" (aka TAP diagnostics) is
not displayed by the harness.  The only reason diagnostics going to STDERR are
displayed is that the harness does nothing to STDERR; it has no knowledge
of it.
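
A minimal sketch of that arrangement, assuming the harness does nothing
fancier than read the script's STDOUT through a pipe.
lines_the_harness_sees() is a hypothetical stand-in, not the real
Test::Harness code, and the script body is the foo.t from earlier.

```perl
use strict;
use warnings;

# The foo.t from the earlier example, passed to perl -e.
my $script = q{
    $| = 1;
    print "1..2\n";
    print "ok 1\n";
    print "# This is not displayed\n";
    print "not ok 2\n";
    print STDERR "# This displayed.\n";
};

# Hypothetical stand-in for a harness: it only reads the child's STDOUT.
# The child's STDERR is inherited, so it goes wherever ours goes (usually
# the terminal) without the harness ever seeing it.
sub lines_the_harness_sees {
    my ($code) = @_;
    open my $fh, '-|', $^X, '-e', $code or die "can't run perl: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

my @seen     = lines_the_harness_sees($script);
my @comments = grep { /^#/ } @seen;

print "harness saw: $_\n" for @seen;
print "comments it could choose to show or drop: @comments\n";
```

The STDOUT comment is entirely at the harness's mercy.  The STDERR one shows
up on screen only because it never passes through the harness at all.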


> In the future:
>       - the test harness could display all free-form information to the user 
> by 
> virtue of it being identifiably free-form information

I don't quite agree.  There are two classes of information.  One is "this is
information which should be displayed as part of a normal, passing test",
stuff like "I'm going to run a really long test now" or "I'm going to try and
make an Internet connection".  The other is "this is just a comment that goes
in the TAP stream that might be useful for someone reading and debugging the
raw TAP".  One is normally displayed, one is not.


> Currently:
>       - no harness can take advantage of information sent to STDERR
>       - the existing Test::Harness::Straps already knows how to handle 
> diagnostic 
> information!
> 
> In the future:
>       - TAP producers that continue to send diagnostics to STDERR are no 
> worse off!
>       - TAP consumers that use Test::Harness::Straps can handle diagnostic 
> information

THS has no future; it's going away when TAP::Parser takes over as the guts of
TH 3.  It was an experiment and has served its purpose.


> Displaying ALL diagnostics is not only not harmful but it is consistent with 
> Test::Harness::TAP:

(Augh, double negative makes brain skip)

>       Diagnostics
> 
>        Additional information may be put into the testing output on separate
>        lines.  Diagnostic lines should begin with a "#", which the harness
>        must ignore, at least as far as analyzing the test results.  The har‐
>        ness is free, however, to display the diagnostics.

Test::Harness::TAP is not a bible but a draft.  That the harness is free to
display diagnostics does not mean that it's a good idea to do so by default.


>>> Problems solved:
>>>
>>>     - synchronization no longer an issue
>> Synching is still an issue as warnings will be out of sync.  C'est la
>> guerre.
> 
> Perl warnings?  The ones that go to STDERR by default?  How in the world do 
> they go through diag()?

That's the point, they don't.

I laid out why this is troublesome on the TAP::Parser mailing list.  It goes
like this.  Consider the following timeline of what's going on in the harness,
the script and the STDOUT and STDERR buffers.

    Harness              Script              STDOUT          STDERR

    start script
                         print header        1..2
    read line (1..2)
                         print test          not ok 1
    parse header
                         warn "Oh crap!"     not ok 1        Oh crap!
    read line (not ok 1)
                         print test          not ok 2
    parse test
                         warn "Hell!"        not ok 2        Hell!
    display 1st failure
                                             not ok 2
    read line (not ok 2)
    parse test
    display 2nd failure

While STDOUT is buffering up waiting for the harness to read and parse each
line, the test continues on.  STDERR has no such buffering; its results go
straight out to the screen.  So it's possible for a script to run two tests
and print their resulting warnings in the time the harness takes to write one.
Thus you see:

    Oh crap!
    Hell!
    test 1 failed
    test 2 failed

And one would assume the warnings are the result of the first test.  That's
the desync.  It plagues any program that treats STDOUT and STDERR as two
streams.  As Eric Wilhelm pointed out on the TAP::Parser list...

$ cat ~/tmp/test.out
#!/usr/bin/perl -w

$| = 1;
print "One\n";
print "Two\n";
print STDERR "Three\n";
print STDERR "Four\n";

$ perl ~/tmp/test.out | tee
Three
Four
One
Two

$ perl ~/tmp/test.out | tee
Three
One
Two
Four

tee can't even get it right.
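
The same thing bites a consumer that captures both streams itself.  Here's a
small sketch, assuming it opens the script with IPC::Open3 and keeps STDERR on
its own pipe.  capture_separately() and the four-line child are invented for
illustration.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# The child alternates between the two streams, unbuffered.
my $child = q{
    $| = 1;
    print "One\n";
    print STDERR "Two\n";
    print "Three\n";
    print STDERR "Four\n";
};

# Hypothetical consumer: one pipe per stream.
sub capture_separately {
    my ($code) = @_;
    my $err = gensym;    # open3 needs a real glob to keep STDERR separate
    my $pid = open3(my $in, my $out, $err, $^X, '-e', $code);
    close $in;

    # The output is tiny, so draining one pipe and then the other cannot
    # deadlock here.  A real consumer would need select() or similar.
    my @stdout = <$out>;
    my @stderr = <$err>;
    waitpid $pid, 0;

    chomp(@stdout, @stderr);
    return (\@stdout, \@stderr);
}

my ($stdout, $stderr) = capture_separately($child);
print "STDOUT: @$stdout\n";
print "STDERR: @$stderr\n";
```

Each stream is in order on its own, but nothing that comes out of the two
pipes says "Two" was printed between "One" and "Three".  Once the streams are
split, that information is gone.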


>> I will not have Test::Builder in bed with the harness.
> 
> Gonna be awful difficult for them to understand each other if they don't, 
> erm, 
> speak the same TAP.

Right now they do, and the TAP version number can help with future upgrades.

Furthermore, it's OK for a TAP producer (such as Test::Builder) to require a
certain version of Test::Harness (which is really saying "I require a parser
which can handle at least X version of TAP" but we have no direct way of a
Perl module saying that) but the other way around is a biiiig no no.  The
harness should have no knowledge of the internals of a test script.
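
For what the allowed direction looks like, here's a sketch of a producer
checking for a minimum harness.  The version number is a placeholder I made
up, and harness_is_new_enough() is a hypothetical helper.  Normally you'd just
declare the prerequisite at install time.

```perl
use strict;
use warnings;

# Placeholder minimum, purely for illustration.  A real producer would pick
# whichever release first parses the TAP it emits.
my $MIN_HARNESS = '2.00';

# Hypothetical helper: true if the installed Test::Harness is at least $min.
sub harness_is_new_enough {
    my ($min) = @_;
    my $ok = eval {
        require Test::Harness;
        Test::Harness->VERSION($min);    # dies if the installed one is older
        1;
    };
    return $ok ? 1 : 0;
}

print harness_is_new_enough($MIN_HARNESS)
    ? "harness is new enough\n"
    : "harness is missing or too old\n";
```

The producer asks about the harness.  The harness never asks about the
producer.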


>> I'm unconvinced this is something we can just make go away with a clever
>> trick.
> 
> And so the right solution is to figure out a clever, cross-platform, 
> cross-version way to both merge and unmerge the interleaved output of two 
> separate filehandles while rejecting all semantics that could possibly 
> disambiguate whether the output came from anywhere within the tests or 
> something completely unrelated elsewhere?
> 
> That is indeed not clever.

I'm not sure yet what the solution is, but I know that neither merging the
streams nor shunting diag() to STDOUT is it.  I've long since rejected both
and said why many times.  The more time we spend repeating the same arguments
about rejected solutions, the less we have to think about how to really fix
the problem.

One thing's for sure, I want to eliminate the TAP producer printing anything
to STDERR on purpose.  One stream.
