Re: [webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Filip Pizlo Sat, 18 Aug 2012 01:07:45 -0700

I like your idea of having both the result-we-currently-expect and the 
result-we-think-may-be-more-correct to be checked in.  I still prefer Dirk's 
naming scheme though.


I get the notion that "expected" always means literally what it seems to mean 
from the standpoint of whether the tooling is silent for the test (actual == 
expected) or has something to say.

But I think that if the tooling is behaving right, your concern that "a test 
would fail if it did *not* match the "failing" result" would be addressed: the 
tooling could be silent for actual == failing (if a failing file exists) but 
notify you of an "unexpected pass" if actual == expected.

-F



On Aug 18, 2012, at 1:02 AM, Maciej Stachowiak <[email protected]> wrote:

> 
> On Aug 17, 2012, at 8:02 PM, Filip Pizlo <[email protected]> wrote:
> 
>> So this is down to expected/failing and expected/previous?
> 
> The difference is not just one of names, but also of whether you keep both 
> around. One could imagine a proposal where you add -failing.txt and keep 
> -expected.txt. But that would be confusing because 
> 
> I forgot to mention another benefit of keeping the old expectation around, 
> which is that tools can tell you if you start producing it again. Though I 
> don't know how common it is for a test to be accidentally fixed.
> 
> Yet another is that we'd avoid the slowdown of having to look for two 
> different filenames for the current expected result.
> 
>> 
>> I must say that expected/failing feels less confusing. Easier to remember if 
>> I have to quickly recall what it means. 
> 
> I don't think it's really that obvious what it means. A "failing" result 
> would, if checked in, be what is expected, and a test would fail if it did 
> *not* match the "failing" result. That is hardly intuitive. You'd also have 
> to sometimes label things as "failing" when you don't actually know that it's 
> failing, just that it's different than the previous result.
> 
> I don't know if any of my suggestions for "a result from before that may or 
> may not be better than the current result" are that great. But I would much 
> prefer to always use -expected to refer to the result that is currently 
> expected.
> 
> Regards,
> Maciej
> 
> 
>> 
>> -Filip
>> 
>> On Aug 17, 2012, at 7:36 PM, Dirk Pranke <[email protected]> wrote:
>> 
>>> I'm not sure if I like this idea or not. A couple of observations/questions 
>>> ...
>>> 
>>> 1) I wouldn't want to call it '-correct' unless we were sure it was
>>> correct; '-previous' is better in that regard
>>> 
>>> 2) the issue with keeping a '-correct' in the tree is that it's quite
>>> possible for a previous correct expectation to need to change to a
>>> different expectation to still be correct. i.e., they get stale. I
>>> fear that this could quickly become more confusing than it's worth.
>>> It's also not clear to me when -previous gets updated or removed?
>>> 
>>> 3) It also feels like '-previous' is something that we can just as
>>> easily get from SVN/Git/whatever, in a completely foolproof and
>>> automatic way. I grant that it's easier to just do a side by side
>>> compare, but "diff against previous" isn't that hard to do and we
>>> could easily write a wrapper to make it trivial ...
>>> 
>>> 4) I'd want to work through the various branches in the workflow more
>>> before I felt comfortable with this. When I was coming up with my
>>> original proposal I originally wanted to allow -passing and -expected
>>> to live side-by-side, but all sorts of complications arose, so I'd be
>>> worried that we'd have them here, too.
>>> 
>>> -- Dirk
>>> 
>>> On Fri, Aug 17, 2012 at 6:51 PM, Ojan Vafai <[email protected]> wrote:
>>>> That matches my understanding. You proposed modification sounds fine to me.
>>>> 
>>>> On Fri, Aug 17, 2012 at 6:40 PM, Maciej Stachowiak <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> My understanding of the current proposal is this:
>>>>> 
>>>>> 1) This applies to tests that fail deterministically, for reasons other
>>>>> than a crash or hang.
>>>>> 2) If the test has a new result that you're confident is a progression (or
>>>>> neither better or worse), you simply update the -expected.txt file.
>>>>> 3) If the test has a new result that you're confident is a regression, you
>>>>> delete the -expected.txt file and check in the new results as 
>>>>> -failing.txt.
>>>>> 4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
>>>>> 5) We would stop using all other ways of marking tests that fail
>>>>> deterministically, including Skipped and the many things you could enter 
>>>>> in
>>>>> TestExpectations.
>>>>> 
>>>>> Is that correct?
>>>>> 
>>>>> If so, I'd like to suggest a minor modification. In place of point 3
>>>>> above, I propose the following:
>>>>> 
>>>>> 3) If the test has a new result that you're confident is a regression, you
>>>>> rename the -expected.txt file to -previous.txt (or maybe -correct.txt or
>>>>> -pre-expected.txt or something) and check in the new results as
>>>>> -expected.txt (unless there is already a -previous.txt, in which case just
>>>>> update -expected and leave -previous).
>>>>> 
>>>>> I propose this for the following reasons:
>>>>> 
>>>>> - It maintains the longstanding approach that -expected.txt reflects what
>>>>> is currently *expected* to happen, not whether it is right or wrong in 
>>>>> some
>>>>> abstract sense. It is an expectation, not a reference.
>>>>> - It still leaves a clear indication of tests that somebody needs to look
>>>>> at further, to determine if a regression occurred.
>>>>> - It leaves both old and new result in place for easy comparison by an
>>>>> expert.
>>>>> 
>>>>> Regards,
>>>>> Maciej
>>>>> 
>>>>> 
>>>>> On Aug 17, 2012, at 6:06 PM, Ojan Vafai <[email protected]> wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> 
>>>>> On Fri, Aug 17, 2012 at 5:36 PM, Ryosuke Niwa <[email protected]> wrote:
>>>>>> 
>>>>>> That's my expectation although we probably can't do that for flaky tests
>>>>>> :(
>>>>>> 
>>>>>> e.g. sometimes fails with image diff.
>>>>>> 
>>>>>> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <[email protected]> wrote:
>>>>>>> 
>>>>>>> +1, contingent upon the following: are we agreeing that all current uses
>>>>>>> of TEXT, IMAGE, and so forth in TestExpectations should be in the *very 
>>>>>>> near
>>>>>>> term* following Dirk's change be turned into -failing files?
>>>>>>> 
>>>>>>> -Filip
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <[email protected]> wrote:
>>>>>>> 
>>>>>>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Asserting a test case is 100% correct is nearly impossible for a large
>>>>>>>> percentage of tests. The main advantage it gives us is the ability to 
>>>>>>>> have
>>>>>>>> -expected mean "unsure".
>>>>>>>> 
>>>>>>>> Lets instead only add -failing (i.e. no -passing). Leaving -expected to
>>>>>>>> mean roughly what it does today to Chromium folk (roughly, as best we 
>>>>>>>> can
>>>>>>>> tell this test is passing). -failing means it's *probably* an incorrect
>>>>>>>> result but needs an expert to look at it to either mark it correct 
>>>>>>>> (i.e.
>>>>>>>> rename it to -expected) or figure out how the root cause of the bug.
>>>>>>>> 
>>>>>>>> This actually matches exactly what Chromium gardeners do today, except
>>>>>>>> instead of putting a line in TestExpectations/Skipped to look at 
>>>>>>>> later, they
>>>>>>>> checkin the -failing file to look at later, which has all the 
>>>>>>>> advantages
>>>>>>>> Dirk listed in the other thread.
>>>>>>> 
>>>>>>> 
>>>>>>> I'm much more comfortable with this proposal.
>>>>>>> 
>>>>>>> - Ryosuke
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> webkit-dev mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>>>> 
>>>>> _______________________________________________
>>>>> webkit-dev mailing list
>>>>> [email protected]
>>>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>> _______________________________________________
>>> webkit-dev mailing list
>>> [email protected]
>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>> _______________________________________________
>> webkit-dev mailing list
>> [email protected]
>> http://lists.webkit.org/mailman/listinfo/webkit-dev
> 

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev

Re: [webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Reply via email to