Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-14 Thread Ryosuke Niwa
Done in http://trac.webkit.org/changeset/120371

- Ryosuke
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Ryosuke Niwa
On Jun 6, 2012 10:36 PM, Peter Kasting pkast...@chromium.org wrote:

 On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Now that everyone knows the problem, I propose to rename FAIL to DIFF.

 FAIL should mean that the test fails, not that it fails with image,
text, or image and text failures.

 DIFF, on the other hand, has no ambiguity. It can't be interpreted as
timeout, crash, or pass but can easily be associated with image and text
differences.


 I don't think DIFF is any better.  It sounds like it means the output is
different than what we wanted, thus it effectively means didn't pass,
and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would
expect FAIL to.

The output being different implies that we have an output, which isn't true
when DRT/WTR crashes or times out.

 Personally I'd prefer to resolve this -- if we need to -- by removing
FAIL entirely.  Being explicit about your expectations isn't a bad thing.
 Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems
likely to be small.

People use FAIL when they don't know what to expect; e.g. adding or
rebaselining tests. zit's utterky unreasonable to expect patch authors to
add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline.

I also think it's a bad practice to add test expections just to keep bots
green. It's much better if the tests started to fail on the waterfall so
that people who pay attention to bots can rebaseline them since most people
forget about rebaselining tests once their patches are landed.

- Ryosuke
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Peter Kasting
On Wed, Jun 6, 2012 at 11:28 PM, Ryosuke Niwa rn...@webkit.org wrote:

  I don't think DIFF is any better.  It sounds like it means the output is
 different than what we wanted, thus it effectively means didn't pass,
 and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would
 expect FAIL to.

 The output being different implies that we have an output, which isn't
 true when DRT/WTR crashes or times out.

Those are outputs as well.  Or if you don't like the work output
substitute outcome.

  Personally I'd prefer to resolve this -- if we need to -- by removing
 FAIL entirely.  Being explicit about your expectations isn't a bad thing.
  Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems
 likely to be small.

 People use FAIL when they don't know what to expect; e.g. adding or
 rebaselining tests. zit's utterky unreasonable to expect patch authors to
 add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline.

If you're rebaselining an existing test, presumably the test already has an
expectation that matches what it's doing, or you can see what it's doing to
write one.

If the rebaselining tools can already figure out the right behavior when an
expectation is FAIL, why don't we just make them work correctly regardless
of what the expectation says?  Or work correctly when the expectation is
the empty string?  Those would let people just write foo.html = and
rebaseline which is even easier on them.

 I also think it's a bad practice to add test expections just to keep bots
 green. It's much better if the tests started to fail on the waterfall so
 that people who pay attention to bots can rebaseline them since most people
 forget about rebaselining tests once their patches are landed.


I can't tell what you're arguing here.  As far as I know nothing in what I
was saying related to the question of the reason why people are adding
expectations.

PK
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Ryosuke Niwa
On Thu, Jun 7, 2012 at 11:03 AM, Peter Kasting pkast...@chromium.orgwrote:

 On Wed, Jun 6, 2012 at 11:28 PM, Ryosuke Niwa rn...@webkit.org wrote:

  I don't think DIFF is any better.  It sounds like it means the output
 is different than what we wanted, thus it effectively means didn't
 pass, and one would expect it to match MISSING/CRASH/TIMEOUT as much as
 one would expect FAIL to.

 The output being different implies that we have an output, which isn't
 true when DRT/WTR crashes or times out.

 Those are outputs as well.  Or if you don't like the work output
 substitute outcome.


I don't buy it.

   Personally I'd prefer to resolve this -- if we need to -- by removing
 FAIL entirely.  Being explicit about your expectations isn't a bad thing.
  Plus, the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems
 likely to be small.

 People use FAIL when they don't know what to expect; e.g. adding or
 rebaselining tests. zit's utterky unreasonable to expect patch authors to
 add TEXT IMAGE TEXT+IMAGE to every test they're expecting to rebaseline.

 If you're rebaselining an existing test, presumably the test already has
 an expectation that matches what it's doing, or you can see what it's doing
 to write one.


Not if the test was padding. I'm talking about the case where you're
modifying WebCore and know that some tests are going to need rebaselines.
People have advised in the past that patch authors add failing test
expectations to TestExpectations files to avoid turning bots red.

If the rebaselining tools can already figure out the right behavior when an
 expectation is FAIL, why don't we just make them work correctly regardless
 of what the expectation says?  Or work correctly when the expectation is
 the empty string?  Those would let people just write foo.html = and
 rebaseline which is even easier on them.


That isn't really the issue. The problem is that people add test
expectations pre-emptively to keep bots green but end up adding wrong
expectations sometimes because FAIL doesn't encompass MISSING, and that's
the failure you'll get on qt and gtk if you're adding expected results to
mac. If you're adding expected results to just chromium linux, for example,
then you'll get MISSING failure on all but chromium linux.

  I also think it's a bad practice to add test expections just to keep
 bots green. It's much better if the tests started to fail on the waterfall
 so that people who pay attention to bots can rebaseline them since most
 people forget about rebaselining tests once their patches are landed.

 I can't tell what you're arguing here.  As far as I know nothing in what I
 was saying related to the question of the reason why people are adding
 expectations.


It was nothing to do with your response. I was making a general statement.

- Ryosuke
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Peter Kasting
On Thu, Jun 7, 2012 at 12:33 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Not if the test was padding. I'm talking about the case where you're
 modifying WebCore and know that some tests are going to need rebaselines.
 People have advised in the past that patch authors add failing test
 expectations to TestExpectations files to avoid turning bots red.


I think this is bad advice.  When I've been sheriff it seems like people
who try this inevitably miss some tests and platforms anyway, get wrong
expectations, etc.  All this does is add more work for the submitter that
is difficult to check ahead of time.  We should just advise people to land
and then fix (and be around on IRC/notify sheriffs about what's going on).

It seems like this is what you were saying as your general statement as
well.

PK
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Ryosuke Niwa
On Thu, Jun 7, 2012 at 12:44 PM, Peter Kasting pkast...@chromium.orgwrote:

 On Thu, Jun 7, 2012 at 12:33 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Not if the test was padding. I'm talking about the case where you're
 modifying WebCore and know that some tests are going to need rebaselines.
 People have advised in the past that patch authors add failing test
 expectations to TestExpectations files to avoid turning bots red.


 I think this is bad advice.  When I've been sheriff it seems like people
 who try this inevitably miss some tests and platforms anyway, get wrong
 expectations, etc.  All this does is add more work for the submitter that
 is difficult to check ahead of time.  We should just advise people to land
 and then fix (and be around on IRC/notify sheriffs about what's going on).


Yes. I'd strongly advocate for not adding test expectations. I've seen too
many patch authors adding test expectations and then forgetting about them.

However, there is a practical problem that the commit queue uses Chromium
Linux port and rejects patches that need rebaselines unless the authors add
test expectations.

It seems like this is what you were saying as your general statement as
 well.


Right.

- Ryosuke
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-07 Thread Dirk Pranke
On Wed, Jun 6, 2012 at 10:36 PM, Peter Kasting pkast...@chromium.org wrote:
 On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Now that everyone knows the problem, I propose to rename FAIL to DIFF.

 FAIL should mean that the test fails, not that it fails with image, text,
 or image and text failures.

 DIFF, on the other hand, has no ambiguity. It can't be interpreted as
 timeout, crash, or pass but can easily be associated with image and text
 differences.


 I don't think DIFF is any better.  It sounds like it means the output is
 different than what we wanted, thus it effectively means didn't pass,
 and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would
 expect FAIL to.

 Personally I'd prefer to resolve this -- if we need to -- by removing FAIL
 entirely.  Being explicit about your expectations isn't a bad thing.  Plus,
 the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be
 small.

FAIL was supposed to have been removed long ago in favor of
TEXT/IMAGE/IMAGE+TEXT as necessary as Peter suggests, but I never got
around to it :) I agree with his reasoning, and have not been
convinced that not having to be specific about what is failing is a
good thing.

I also agree that we shouldn't TEXT/IMAGE/IMAGE+TEXT in the file at
all for extended periods of time since you can just check in updated
baselines.

-- Dirk
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


[webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-06 Thread Ryosuke Niwa
Now that everyone knows the problem, I propose to rename FAIL to DIFF.

FAIL should mean that the test fails, not that it fails with image, text,
or image and text failures.

DIFF, on the other hand, has no ambiguity. It can't be interpreted as
timeout, crash, or pass but can easily be associated with image and text
differences.

- Ryosuke

On Wed, Jun 6, 2012 at 9:42 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Hi,

 As much as it doesn't make any sense to me, FAIL test expectation doesn't
 cover MISSING, CRASH, and TIMEOUT test expectations as it is currently
 interpreted by new-run-webkit-tests. Meaning that

 BUGRNIWA : some-pixel-test-i-am-adding.html = FAIL

 is logically equivalent to

 BUGRNIWA : some-pixel-test-i-am-adding.html = IMAGE TEXT IMAGE+TEXT

 Thus, if you're adding a pixel test and only adding the corresponding
 expected result to, say, Mac, then you need to add MISSING results to some
 platforms (e.g. Qt, GTK+, etc...) that cannot find this expected result as
 in:

 BUGRNIWA : some-pixel-test-i-am-adding.html = MISSING

 If you're expecting the test to fail with image, text, or image and text
 failures, or pass, then do:

 BUGRNIWA : some-pixel-test-i-am-adding.html = FAIL PASS

 If you're expecting the test to timeout, crash, or, fail (image, text, or
 image and text), or pass, then do:

 BUGRNIWA : some-pixel-test-i-am-adding.html = TIMEOUT CRASH FAIL PASS

 Best,
 Ryosuke Niwa
 Software Engineer
 Google Inc.



___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Rename FAIL to DIFF Was (Re: PSA: FAIL test expectation does not encompass MISSING, CRASH, or TIMEOUT)

2012-06-06 Thread Peter Kasting
On Wed, Jun 6, 2012 at 10:22 PM, Ryosuke Niwa rn...@webkit.org wrote:

 Now that everyone knows the problem, I propose to rename FAIL to DIFF.

 FAIL should mean that the test fails, not that it fails with image, text,
 or image and text failures.

 DIFF, on the other hand, has no ambiguity. It can't be interpreted as
 timeout, crash, or pass but can easily be associated with image and text
 differences.


I don't think DIFF is any better.  It sounds like it means the output is
different than what we wanted, thus it effectively means didn't pass,
and one would expect it to match MISSING/CRASH/TIMEOUT as much as one would
expect FAIL to.

Personally I'd prefer to resolve this -- if we need to -- by removing FAIL
entirely.  Being explicit about your expectations isn't a bad thing.  Plus,
the number of cases that are truly TEXT IMAGE IMAGE+TEXT seems likely to be
small.

PK
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev