Let me make something clear: I don't have a solution to this problem. I'm just finally getting a grip on what the problem actually is. The last week has shaken loose use cases and conditions that I hadn't thought about before and that the TAP diagnostic syntax proposal does not cover.

What I do know is that merging the streams or sending diag() to STDOUT will not work. I've explained this a number of times. I don't want to spend any more time explaining why when the energy can be spent on a real fix. I've spent probably an hour on this post alone.

chromatic wrote:
>> Here's one.
>> http://www.mail-archive.com/perl-qa@perl.org/msg06694.html
>
> Merely skimming that message reveals a falsehood:
>
>   Test::Harness throws out all non-TAP stuff going to STDOUT.  This
>   includes comments.  So if Test::Builder started sending its
>   diagnostics to STDOUT they'd disappear into the ether.
>
> That hasn't been true for *nearly two years*.  I seem to recall patching
> Test::Harness::Straps before the coffee stain book came out in July 2005.

I forgot, I'm on a mailing list. Thou shalt not let any inaccuracy, no matter how minor or totally inconsequential to the point being made, go uncorrected.

Yes, THS might capture them but TH does nothing with them and that's the important thing for the user.

>> Here's another, referencing that first one.
>> http://www.nntp.perl.org/group/perl.qa/2006/09/msg7152.html
>
>> And yet another one.
>> http://www.nntp.perl.org/group/perl.qa/2006/09/msg7153.html
>
>> And that's just in September.  I feel like I have to refute this idea about
>> once a month.  Here we go again.
>
> I remember that thread now, the one where people like Ovid, Aristotle, David,
> Adrian, and chromatic all said "Uh, making one small change to Test::Builder
> and fixing Test.pm too would improve 99% of the Perl world without making
> life worse than it already is for anyone else".
>
> Did I miss your response to everything there?

The same points remain.

1) It breaks displaying TODO tests.

2) Not everyone uses Test.pm and Test::Builder. Really. TH is one of the most widely used modules in Perldom and most folks have no idea they're using it.

3) TAP comments on STDOUT have never been displayed. Displaying them will break an assumption which has been around for about a decade. Lots of folks take advantage of this by printing to either STDOUT or STDERR for "don't display" or "display". You'll start displaying all sorts of comments that never were before.

4) It requires a simultaneous Test::Harness, Test and Test::More upgrade. I don't want to even think of the dependency mobius strip that will be.

5) It couples the harness and the producer, violating one of the central principles of TAP, the one that sets it apart from XUnit.

>> Piping all diagnostics to STDOUT solves nothing except maybe allowing
>> runtests to display warnings again.  You still can't tell the difference
>> between a comment (what currently is "# foo" printed to STDOUT) and a
>> failure diagnostic (what currently is "# foo" printed to STDERR) and
>> diagnostics associated with a TODO test (which is "# foo" printed to
>> STDOUT).
>
> Test::Harness can identify TODO tests.  Test::Harness can even tell if a
> diagnostic followed a TODO test.

I covered this in the original post. The heuristics used to determine what diagnostics are associated with what test are just that, heuristics. There's nothing which guarantees that 1) diagnostics must follow a test and 2) all the diagnostics between two tests belong to the preceding test. It's just a convention. You can easily lose important failure diagnostics this way.

>> Consider the following.  Again.
>>
>> $ cat ~/tmp/foo.t
>> #!/usr/bin/perl -w
>>
>> $| = 1;
>>
>> print "1..2\n";
>> print "ok 1\n";
>> print "# This is not displayed\n";
>> print "not ok 2\n";
>> print STDERR "# This displayed.\n";
>
> ...
>
>> $ runtests ~/tmp/foo.t
>> /Users/schwern/tmp/foo......1/2 # This is not displayed
>> /Users/schwern/tmp/foo......2/2 # This displayed.
>> /Users/schwern/tmp/foo...... Failed 1/2 subtests
>
> You didn't have to convince me that interleaving STDOUT and STDERR is a
> mistake.  I already believed that.

That might not have been clear: it was intended to show the effect of what you're proposing on existing tests. I used runtests because it does display all diagnostics (except for its TODO heuristic). It illustrates that the STDOUT/STDERR split is an important one in determining what should be displayed. If you start displaying all diagnostics from STDOUT you're going to display things which were not intended to be displayed, and passing tests will be noisy. This goes against the principle that passing tests should be quiet.

>> There are proposals on the table for machine parsable diagnostics on the
>> wiki but nothing to indicate free-form information to be displayed to the
>> user.
>
> Currently:
>   - the test harness displays all free-form information to the user by
>     virtue of not touching it at all

"Currently" as in what Test::Harness 2 currently does, or as in what Test::Harness 3 might do as a stopgap? I'm going to assume the former.

That's not entirely true. "Free-form information" (aka TAP diagnostics) is not displayed by the harness. The only reason diagnostics going to STDERR are displayed is that the harness does nothing to STDERR; it has no knowledge of it.

> In the future:
>   - the test harness could display all free-form information to the user
>     by virtue of it being identifiably free-form information

I don't quite agree; there are two classes of information. One is "this is information which should be displayed as part of a normal, passing test", stuff like "I'm going to run a really long test now" or "I'm going to try and make an Internet connection". The other is "this is just a comment that goes in the TAP stream that might be useful for someone reading and debugging the raw TAP". One is normally displayed, one is not.

> Currently:
>   - no harness can take advantage of information sent to STDERR
>   - the existing Test::Harness::Straps already knows how to handle
>     diagnostic information!
>
> In the future:
>   - TAP producers that continue to send diagnostics to STDERR are no
>     worse off!
>   - TAP consumers that use Test::Harness::Straps can handle diagnostic
>     information

THS has no future; it's going away when TAP::Parser takes over as the guts of TH 3. It was an experiment and has served its purpose.

> Displaying ALL diagnostics is not only not harmful but it is consistent with
> Test::Harness::TAP:

(Augh, double negative makes brain skip)

>   Diagnostics
>
>   Additional information may be put into the testing output on separate
>   lines.  Diagnostic lines should begin with a "#", which the harness
>   must ignore, at least as far as analyzing the test results.  The
>   harness is free, however, to display the diagnostics.

Test::Harness::TAP is not a bible but a draft. That the harness is free to display diagnostics does not mean that it's a good idea to do so by default.

>>> Problems solved:
>>>
>>>   - synchronization no longer an issue
>> Synching is still an issue as warnings will be out of sync.  C'est le
>> guerre.
>
> Perl warnings?  The ones that go to STDERR by default?  How in the world do
> they go through diag()?

That's the point, they don't.

I laid out why this is troublesome on the TAP::Parser mailing list. It goes like this. Consider the following timeline of what's going on in the harness, the script and the STDOUT and STDERR buffers.

    Harness                 Script              STDOUT      STDERR
    start script            print header        1..1
    read line (1..1)        print test          not ok 1
    parse header            warn "Oh crap!"     not ok 1    Oh crap!
    read line (not ok 1)    print test          not ok 2
    parse test              warn "Hell!"        not ok 2    Hell!
    display 1st failure                         not ok 2
    read line (not ok 2)
    parse test
    display 2nd failure

While STDOUT is buffering up waiting for the harness to read and parse each line, the test continues on. STDERR has no such buffering; its results go straight out to the screen. So it's possible for a script to run two tests and print their resulting warnings in the time the harness takes to write one.

Thus you see:

    Oh crap!
    Hell!
    test 1 failed
    test 2 failed

And one would assume the warnings are the result of the first test. That's the desync. It plagues any program that treats STDOUT and STDERR as two streams. As Eric Willhelm pointed out on the TAP::Parser list...

    $ cat ~/tmp/test.out
    #!/usr/bin/perl -w

    $| = 1;

    print "One\n";
    print "Two\n";
    print STDERR "Three\n";
    print STDERR "Four\n";

    $ perl ~/tmp/test.out | tee
    Three
    Four
    One
    Two

    $ perl ~/tmp/test.out | tee
    Three
    One
    Two
    Four

tee can't even get it right.

>> I will not have Test::Builder in bed with the harness.
>
> Gonna be awful difficult for them to understand each other if they don't,
> erm, speak the same TAP.

Right now they do, and the TAP version number can help with future upgrades. Furthermore, it's OK for a TAP producer (such as Test::Builder) to require a certain version of Test::Harness (which is really saying "I require a parser which can handle at least X version of TAP", but we have no direct way of a Perl module saying that), but the other way around is a biiiig no-no. The harness should have no knowledge of the internals of a test script.

>> I'm unconvinced this is something we can just make go away with a clever
>> trick.
>
> And so the right solution is to figure out a clever, cross-platform,
> cross-version way to both merge and umerge the interleaved output of two
> separate filehandles while rejecting all semantics that could possibly
> disambiguate whether the output came from anywhere within the tests or
> something completely unrelated elsewhere?
>
> That is indeed not clever.

I'm not sure yet what the solution is, but I know that merging the streams and shunting diag() to STDOUT is not it. I've long since rejected both and said why many times. The more time we spend repeating the same arguments about rejected solutions, the less we have to think about how to really fix the problem.

One thing's for sure: I want to eliminate the TAP producer printing anything to STDERR on purpose. One stream.
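
For anyone who wants to poke at the desync described above without fighting real pipes, here is a rough, deterministic toy model of that timeline. It is only a sketch under made-up assumptions: the `simulate` function, the "tick" unit and the "harness shows:" prefix are all hypothetical and have nothing to do with how Test::Harness or TAP::Parser are actually written. The script emits one line per tick, STDERR lines land on the screen immediately, and STDOUT lines wait in a queue until the harness has spent a fixed number of ticks on each one.

```perl
#!/usr/bin/perl -w
use strict;

# Toy model only. $ticks_per_line (must be >= 1) is how many ticks the
# imaginary harness needs to read, parse and display one STDOUT line.
# Each script event is [ STREAM => 'text' ].
sub simulate {
    my ($ticks_per_line, @script) = @_;
    my (@pipe, @screen, $current);
    my $busy = 0;

    while (@script or @pipe or defined $current) {
        # The script keeps running no matter what the harness is doing.
        if (my $event = shift @script) {
            my ($stream, $text) = @$event;
            if ($stream eq 'STDERR') { push @screen, $text }  # straight to the screen
            else                     { push @pipe,   $text }  # waits for the harness
        }

        # The harness picks up one queued line at a time...
        if (!defined $current and @pipe) {
            $current = shift @pipe;
            $busy    = $ticks_per_line;
        }

        # ...and only displays it once it has finished working on it.
        if (defined $current and --$busy == 0) {
            push @screen, "harness shows: $current";
            undef $current;
        }
    }
    return @screen;
}

my @events = (
    [ STDOUT => '1..2'     ],
    [ STDOUT => 'not ok 1' ],
    [ STDERR => 'Oh crap!' ],
    [ STDOUT => 'not ok 2' ],
    [ STDERR => 'Hell!'    ],
);

print "$_\n" for simulate(3, @events);
```

With three ticks per line, both warnings reach the screen before the harness has shown the first failure. With `simulate(1, @events)` the harness keeps up and everything comes out in the order the script produced it, which is why the problem looks intermittent in practice.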