Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
Done in http://trac.webkit.org/changeset/120371 - Ryosuke ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Jun 6, 2012 10:36 PM, Peter Kasting pkast...@chromium.org wrote: On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote: Now that everyone knows the problem, I propose to rename FAIL to DIFF. FAIL should mean that the test fails, not that it fails with image, text, or image and text failures. DIFF, on the other hand, has no ambiguity. It can't be interpreted as timeout, crash, or pass but can easily be associated with image and text differences. I don't think DIFF is any better. It sounds like it means the output is different than what we wanted, thus it effectively means didn't pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would expect FAIL to. The output being different implies that we have an output, which isn't true when DRT/WTR crashes or times out. Personally I'd prefer to resolve this -- if we need to -- by removing FAIL entirely. Being explicit about your expectations isn't a bad thing. Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be small. People use FAIL when they don't know what to expect; e.g. adding or rebaselining tests. zit's utterky unreasonable to expect patch authors to add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline. I also think it's a bad practice to add test expections just to keep bots green. It's much better if the tests started to fail on the waterfall so that people who pay attention to bots can rebaseline them since most people forget about rebaselining tests once their patches are landed. - Ryosuke ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Wed, Jun 6, 2012 at 11:28 PM, Ryosuke Niwa rn...@webkit.org wrote: I don't think DIFF is any better. It sounds like it means the output is different than what we wanted, thus it effectively means didn't pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would expect FAIL to. The output being different implies that we have an output, which isn't true when DRT/WTR crashes or times out. Those are outputs as well. Or if you don't like the work output substitute outcome. Personally I'd prefer to resolve this -- if we need to -- by removing FAIL entirely. Being explicit about your expectations isn't a bad thing. Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be small. People use FAIL when they don't know what to expect; e.g. adding or rebaselining tests. zit's utterky unreasonable to expect patch authors to add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline. If you're rebaselining an existing test, presumably the test already has an expectation that matches what it's doing, or you can see what it's doing to write one. If the rebaselining tools can already figure out the right behavior when an expectation is FAIL, why don't we just make them work correctly regardless of what the expectation says? Or work correctly when the expectation is the empty string? Those would let people just write foo.html = and rebaseline which is even easier on them. I also think it's a bad practice to add test expections just to keep bots green. It's much better if the tests started to fail on the waterfall so that people who pay attention to bots can rebaseline them since most people forget about rebaselining tests once their patches are landed. I can't tell what you're arguing here. As far as I know nothing in what I was saying related to the question of the reason why people are adding expectations. PK ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Thu, Jun 7, 2012 at 11:03 AM, Peter Kasting pkast...@chromium.orgwrote: On Wed, Jun 6, 2012 at 11:28 PM, Ryosuke Niwa rn...@webkit.org wrote: I don't think DIFF is any better. It sounds like it means the output is different than what we wanted, thus it effectively means didn't pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would expect FAIL to. The output being different implies that we have an output, which isn't true when DRT/WTR crashes or times out. Those are outputs as well. Or if you don't like the work output substitute outcome. I don't buy it. Personally I'd prefer to resolve this -- if we need to -- by removing FAIL entirely. Being explicit about your expectations isn't a bad thing. Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be small. People use FAIL when they don't know what to expect; e.g. adding or rebaselining tests. zit's utterky unreasonable to expect patch authors to add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline. If you're rebaselining an existing test, presumably the test already has an expectation that matches what it's doing, or you can see what it's doing to write one. Not if the test was padding. I'm talking about the case where you're modifying WebCore and know that some tests are going to need rebaselines. People have advised in the past that patch authors add failing test expectations to TestExpectations files to avoid turning bots red. If the rebaselining tools can already figure out the right behavior when an expectation is FAIL, why don't we just make them work correctly regardless of what the expectation says? Or work correctly when the expectation is the empty string? Those would let people just write foo.html = and rebaseline which is even easier on them. That isn't really the issue. The problem is that people add test expectations pre-emptively to keep bots green but end up adding wrong expectations sometimes because FAIL doesn't encompass MISSING, and that's the failure you'll get on qt and gtk if you're adding expected results to mac. If you're adding expected results to just chromium linux, for example, then you'll get MISSING failure on all but chromium linux. I also think it's a bad practice to add test expections just to keep bots green. It's much better if the tests started to fail on the waterfall so that people who pay attention to bots can rebaseline them since most people forget about rebaselining tests once their patches are landed. I can't tell what you're arguing here. As far as I know nothing in what I was saying related to the question of the reason why people are adding expectations. It was nothing to do with your response. I was making a general statement. - Ryosuke ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Thu, Jun 7, 2012 at 12:33 PM, Ryosuke Niwa rn...@webkit.org wrote: Not if the test was padding. I'm talking about the case where you're modifying WebCore and know that some tests are going to need rebaselines. People have advised in the past that patch authors add failing test expectations to TestExpectations files to avoid turning bots red. I think this is bad advice. When I've been sheriff it seems like people who try this inevitably miss some tests and platforms anyway, get wrong expectations, etc. All this does is add more work for the submitter that is difficult to check ahead of time. We should just advise people to land and then fix (and be around on IRC/notify sheriffs about what's going on). It seems like this is what you were saying as your general statement as well. PK ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Thu, Jun 7, 2012 at 12:44 PM, Peter Kasting pkast...@chromium.orgwrote: On Thu, Jun 7, 2012 at 12:33 PM, Ryosuke Niwa rn...@webkit.org wrote: Not if the test was padding. I'm talking about the case where you're modifying WebCore and know that some tests are going to need rebaselines. People have advised in the past that patch authors add failing test expectations to TestExpectations files to avoid turning bots red. I think this is bad advice. When I've been sheriff it seems like people who try this inevitably miss some tests and platforms anyway, get wrong expectations, etc. All this does is add more work for the submitter that is difficult to check ahead of time. We should just advise people to land and then fix (and be around on IRC/notify sheriffs about what's going on). Yes. I'd strongly advocate for not adding test expectations. I've seen too many patch authors adding test expectations and then forgetting about them. However, there is a practical problem that the commit queue uses Chromium Linux port and rejects patches that need rebaselines unless the authors add test expectations. It seems like this is what you were saying as your general statement as well. Right. - Ryosuke ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Wed, Jun 6, 2012 at 10:36 PM, Peter Kasting pkast...@chromium.org wrote: On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote: Now that everyone knows the problem, I propose to rename FAIL to DIFF. FAIL should mean that the test fails, not that it fails with image, text, or image and text failures. DIFF, on the other hand, has no ambiguity. It can't be interpreted as timeout, crash, or pass but can easily be associated with image and text differences. I don't think DIFF is any better. It sounds like it means the output is different than what we wanted, thus it effectively means didn't pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would expect FAIL to. Personally I'd prefer to resolve this -- if we need to -- by removing FAIL entirely. Being explicit about your expectations isn't a bad thing. Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be small. FAIL was supposed to have been removed long ago in favor of TEXT/IMAGE/IMAGE+TEXT as necessary as Peter suggests, but I never got around to it :) I agree with his reasoning, and have not been convinced that not having to be specific about what is failing is a good thing. I also agree that we shouldn't TEXT/IMAGE/IMAGE+TEXT in the file at all for extended periods of time since you can just check in updated baselines. -- Dirk ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
[webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
Now that everyone knows the problem, I propose to rename FAIL to DIFF. FAIL should mean that the test fails, not that it fails with image, text, or image and text failures. DIFF, on the other hand, has no ambiguity. It can't be interpreted as timeout, crash, or pass but can easily be associated with image and text differences. - Ryosuke On Wed, Jun 6, 2012 at 9:42 PM, Ryosuke Niwa rn...@webkit.org wrote: Hi, As much as it doesn't make any sense to me, FAIL test expectation doesn't cover MISSING, CRASH, and TIMEOUT test expectations as it is currently interpreted by new-run-webkit-tests. Meaning that BUGRNIWA : some-pixel-test-i-am-adding.html = FAIL is logically equivalent to BUGRNIWA : some-pixel-test-i-am-adding.html = IMAGE TEXT IMAGE+TEXT Thus, if you're adding a pixel test and only adding the corresponding expected result to, say, Mac, then you need to add MISSING results to some platforms (e.g. Qt, GTK+, etc...) that cannot find this expected result as in: BUGRNIWA : some-pixel-test-i-am-adding.html = MISSING If you're expecting the test to fail with image, text, or image and text failures, or pass, then do: BUGRNIWA : some-pixel-test-i-am-adding.html = FAIL PASS If you're expecting the test to timeout, crash, or, fail (image, text, or image and text), or pass, then do: BUGRNIWA : some-pixel-test-i-am-adding.html = TIMEOUT CRASH FAIL PASS Best, Ryosuke Niwa Software Engineer Google Inc. ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)
On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote: Now that everyone knows the problem, I propose to rename FAIL to DIFF. FAIL should mean that the test fails, not that it fails with image, text, or image and text failures. DIFF, on the other hand, has no ambiguity. It can't be interpreted as timeout, crash, or pass but can easily be associated with image and text differences. I don't think DIFF is any better. It sounds like it means the output is different than what we wanted, thus it effectively means didn't pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would expect FAIL to. Personally I'd prefer to resolve this -- if we need to -- by removing FAIL entirely. Being explicit about your expectations isn't a bad thing. Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be small. PK ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev