Re: [webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Dirk Pranke Mon, 20 Aug 2012 16:37:20 -0700

On Sat, Aug 18, 2012 at 8:31 PM, Filip Pizlo <fpi...@apple.com> wrote:
>
> On Aug 18, 2012, at 5:55 PM, Maciej Stachowiak <m...@apple.com> wrote:
>
>>
>> On Aug 18, 2012, at 5:11 PM, Filip Pizlo <fpi...@apple.com> wrote:
>>
>>> Maybe at this point we can agree to let Dirk land some variant of this with 
>>> whatever half-way sensible name (any of the options on the table are 
>>> decent) and see how it works?
>>>
>>> It seems that the only thing anyone is disagreeing over is naming and which 
>>> files to keep around, which is a much smaller set of differences than 
>>> status-quo versus any variant of this proposal.
>>
>> I agree that we should adopt some variant over the status quo. As you 
>> rightly noted, there are too many different ways to handle tests that 
>> deviate from the original expectation, and we have the opportunity to 
>> obsolete most of those ways with an approach that combines advantages of 
>> multiple current approaches.
>>
>> However, I fear that whatever names we pick for the first round will then be 
>> unchangeable due to status quo bias (which we see a lot of in test 
>> infrastructure discussions, indeed, even this one). And anyone arguing 
>> against change at that point will have a valid argument that a huge global 
>> rename of tests is a bad idea. So I think it's worth expending a little 
>> effort to find names that are good.
>>
>> Would you object to -expected-failure/-unexpected-pass as a naming scheme, 
>> along with the approach of keeping both around when they are used?
>
> I don't mind -expected-failure/-unexpected-pass, and I think that the 
> slightly added verbosity will make things clearer.  Would you also advocate 
> having the tooling mandate that the expected files are in either, but never 
> both, of these two states:
>
> 1) -expected.foo
> 2) -expected-failure.foo/-unexpected-pass.foo
>
> That is, if we're not in a failing state, the -expected suffix is what we 
> use.  Dirk, what do you think?  (And a possibly correct retort will be to 
> tell us that we're bikeshedding. ;-))
>
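
A minimal sketch of the "either, but never both" rule Filip describes
above, assuming the tooling sees the baselines for a test as a plain set
of file names. The function name baseline_state and the state labels are
hypothetical, and treating state 2 as requiring both files of the pair
is an assumption; the thread does not settle that.

```python
def baseline_state(test_name, ext, files):
    """Classify the checked-in baselines for one test.

    `files` is a set of file names found next to the test. Returns
    "single" for state 1, "failing-pair" for state 2, and "invalid" for
    anything else (including a mix of both states or no baseline).
    """
    has_expected = f"{test_name}-expected.{ext}" in files
    has_failure = f"{test_name}-expected-failure.{ext}" in files
    has_unexpected_pass = f"{test_name}-unexpected-pass.{ext}" in files

    # State 1: only -expected.foo is present.
    if has_expected and not (has_failure or has_unexpected_pass):
        return "single"
    # State 2 (assumed to need both files): the failing pair, no -expected.foo.
    if has_failure and has_unexpected_pass and not has_expected:
        return "failing-pair"
    # Everything else breaks the "either, but never both" rule.
    return "invalid"
```

A checker along these lines could run as a lint step, so that a stray
-expected file left next to a failing pair is caught before it lands.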

I think I'm lost :) I think this is partially because Maciej didn't
respond to my previous questions about this proposal, and partially
because I'm not actually sure which combinations you're now proposing
we have (there were something like twelve different variants :).

Perhaps someone can recap how they expect things to work and what the
extensions being proposed for each case are?

While I agree with Maciej's point that it would be nice if
"-expected.txt" referred to whatever we currently expect to happen, as
this discussion indicates, the definition of "expected" itself starts
to become unclear. This is partially why I only wanted there to be one
baseline allowed for a given test regardless of pass/fail/unknown
status.

The other (and IMO more serious) flaw with allowing more than one
baseline to exist at a time is that the one that isn't actually being
exercised is subject to bitrot, and hence it's not clear how relevant
it will stay. But it's hard to discuss this clearly without
referring to the different names and cases, and so I'll wait until
someone can recap first.

-- Dirk

> -F
>
>
>>
>> Regards,
>> Maciej
>>
>>>
>>> -Filip
>>>
>>> On Aug 18, 2012, at 2:01 PM, Maciej Stachowiak <m...@apple.com> wrote:
>>>
>>>>
>>>> On Aug 18, 2012, at 1:08 AM, Filip Pizlo <fpi...@apple.com> wrote:
>>>>
>>>>> I like your idea of having both the result-we-currently-expect and the 
>>>>> result-we-think-may-be-more-correct checked in.  I still prefer 
>>>>> Dirk's naming scheme though.
>>>>
>>>> I think if we had both checked in, the result-we-think-may-be-more-correct 
>>>> should be named something other than -expected, since it is not, in fact, 
>>>> expected. That was the basis of my naming scheme.
>>>>
>>>> I think I would be happy with any scheme that had both checked in, and 
>>>> matched the criteria that you never have a file named -expected that is 
>>>> unexpected. For example, there could be schemes with no file named 
>>>> expected. If you let it be verbose, you could have:
>>>>
>>>> Single result:
>>>>  foo-expected.txt
>>>>
>>>> Possibly-worse current result, possibly-better older result:
>>>>  foo-expected-failure.txt
>>>>  foo-unexpected-pass.txt
>>>>
>>>>>
>>>>> I get the notion that "expected" always means literally what it seems to 
>>>>> mean from the standpoint of whether the tooling is silent for the test 
>>>>> (actual == expected) or has something to say.
>>>>>
>>>>> But I think that if the tooling is behaving right, your concern that "a 
>>>>> test would fail if it did *not* match the "failing" result" would be 
>>>>> addressed: the tooling could be silent for actual == failing (if a 
>>>>> failing file exists) but notify you of an "unexpected pass" if actual == 
>>>>> expected.
>>>>
>>>> But if you match neither, you get a failure for not matching the "failing" 
>>>> result. That still strikes me as a little goofy. Not failing is failing, 
>>>> and getting the expected result is unexpected. I think my extra-verbose 
>>>> naming scheme above would better match what you suggest the tool UI would 
>>>> do. Maybe there is a more concise way to get the same point across.
>>>>
>>>> Regards,
>>>> Maciej
>>>>
>>
>
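
The reporting behaviour discussed in the quoted exchange between Filip
and Maciej can be sketched as follows. This is an illustration only:
classify_result and its return labels are hypothetical, and the
baselines are assumed to be passed in as a dict keyed by suffix, which
is not how any existing tool is known to work.

```python
def classify_result(actual, baselines):
    """Decide what the tooling would report for one test run.

    `baselines` maps a suffix ("expected", "expected-failure",
    "unexpected-pass") to the checked-in output for that suffix.
    """
    # Single-result case: plain match or mismatch against -expected.
    if "expected" in baselines:
        return "pass" if actual == baselines["expected"] else "fail"
    # Failing pair: matching the known-failing output keeps the tool silent.
    if actual == baselines.get("expected-failure"):
        return "expected failure"
    # Matching the possibly-better older result is worth a notification.
    if actual == baselines.get("unexpected-pass"):
        return "unexpected pass"
    # Matching neither is the case Maciej calls out: a real failure.
    return "fail"
```

With the -expected-failure/-unexpected-pass names, each label returned
here lines up with the file that was matched, which is the property
Maciej argues the naming should have.
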
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
