Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-19 Thread Mihai Parparita
FWIW, I needed NRWT to support --tolerance for something else today
(mainly because, when used with the Mac port, NRWT defaults to a 0.1
tolerance with no way to override it), so I added NRWT support for
it: http://webkit.org/b/47959.

Mihai
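
For context, what --tolerance gates is roughly the following: the harness diffs
the actual rendering against the checked-in baseline image and passes the test
if the difference stays within the given percentage. The Python sketch below is
only an illustration: the function names and the differing-pixel metric are
assumptions, and the real harness delegates the comparison to each port's
ImageDiff tool rather than computing it like this.

def percent_pixels_different(expected_rgba, actual_rgba):
    """Return the percentage of RGBA pixels (4 bytes each) that differ
    between two equal-sized raw images."""
    if len(expected_rgba) != len(actual_rgba):
        raise ValueError("baseline and actual image sizes differ")
    total = len(expected_rgba) // 4
    differing = sum(
        1
        for i in range(0, len(expected_rgba), 4)
        if expected_rgba[i:i + 4] != actual_rgba[i:i + 4]
    )
    return 100.0 * differing / total if total else 0.0

def images_match(expected_rgba, actual_rgba, tolerance=0.0):
    """A pixel test passes if the difference is within the tolerance;
    --tolerance 0 therefore demands an exact match."""
    return percent_pixels_different(expected_rgba, actual_rgba) <= tolerance

# One differing pixel out of four is a 25% difference, so it fails at
# tolerance 0 but passes at tolerance 25.
baseline = bytes([255, 0, 0, 255] * 4)
actual = bytes([255, 0, 0, 255] * 3 + [254, 0, 0, 255])
assert not images_match(baseline, actual, tolerance=0.0)
assert images_match(baseline, actual, tolerance=25.0)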

On Thu, Oct 14, 2010 at 2:44 PM, Dirk Pranke dpra...@chromium.org wrote:
 On Thu, Oct 14, 2010 at 9:06 AM, Ojan Vafai o...@chromium.org wrote:
 Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of
 --tolerance will be a lot of work, since it means making sure all the pixel results that
 currently pass also pass with --tolerance=0. While I would support someone
 doing that work, I don't think we should block moving to NRWT on it.

 Assuming we implement it only for the ports that currently use
 tolerance on old-run-webkit-tests, no, I wouldn't expect it to be
 hard. Dunno how much work it would be to implement tolerance on the
 chromium image_diff implementations (side note: it would be nice if
 these binaries weren't port-specific, but that's another topic).

 As to how many files we'd have to rebaseline for the base ports, I
 don't know how many there are compared to how many fail pixel tests,
 period. I'll run a couple tests and find out.

 -- Dirk

 Ojan
 On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser simon.fra...@apple.com wrote:

 I think the best solution to this pixel matching problem is ref tests.

 How practical would it be to use ref tests for SVG?

 Simon

 On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:

  Jeremy is correct; the Chromium port has seen real regressions that
  virtually no concept of a fuzzy match that I can imagine would've
  caught.
  new-run-webkit-tests doesn't currently support the tolerance concept
  at all, and I am inclined to argue that it shouldn't.
 
  However, I frequently am wrong about things, so it's quite possible
  that there are good arguments for supporting it that I'm not aware of.
  I'm not particularly interested in working on a tool that doesn't do
  what the group wants it to do, and I would like all of the other
  WebKit ports to be running pixel tests by default (and
  new-run-webkit-tests ;) ) since I think it catches bugs.
 
  As far as I know, the general sentiment on the list has been that we
  should be running pixel tests by default, and the reason that we
  aren't is largely due to the work involved in getting them back up to
  date and keeping them up to date. I'm sure that fuzzy matching reduces
  the work load, especially for the sort of mismatches caused by
  differences in the text antialiasing.
 
  In addition, I have heard concerns that we'd like to keep fuzzy
  matching because people might potentially get different results on
  machines with different hardware configurations, but I don't know that
  we have any confirmed cases of that (except for arguably the case of
  different code paths for gpu-accelerated rendering vs. unaccelerated
  rendering).
 
  If we made it easier to maintain the baselines (improved tooling like
  Chromium's rebaselining tool, added reftest support, etc.), are there
  still compelling reasons for supporting --tolerance-based testing as
  opposed to exact matching?
 
  -- Dirk
 
  On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org
  wrote:
  I'm not an expert on Pixel tests, but my understanding is that in
  Chromium
  (where we've always run with tolerance 0) we've seen real regressions
  that
  would have slipped by with something like tolerance 0.1.  When you have
  0 tolerance, it is more maintenance work, but if we can avoid
  regressions,
  it seems worth it.
  J
 
  On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
  zimmerm...@physik.rwth-aachen.de wrote:
 
  On 08.10.2010 at 19:53, Maciej Stachowiak wrote:
 
 
  On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
 
 
  On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
  On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
  Good evening webkit folks,
 
  I've finished landing svg/ pixel test baselines, which pass with
  --tolerance 0 on my 10.5 & 10.6 machines.
  As the pixel testing is very important for the SVG tests, I'd like
  to
  run them on the bots, experimentally, so we can catch regressions
  easily.
 
  Maybe someone with direct access to the leopard & snow leopard
  bots,
  could just run run-webkit-tests --tolerance 0 -p svg and mail me
  the
  results?
  If it passes, we could maybe run the pixel tests for the svg/
  subdirectory on these bots?
 
  Running pixel tests would be great, but can we really expect the
  results to be stable cross-platform with tolerance 0? Perhaps we
  should
  start with a higher tolerance level.
 
  Sure, we could do that. But I'd really like to get a feeling, for
  what's
  problematic first. If we see 95% of the SVG tests pass with
  --tolerance 0,
  and only a few need higher tolerances
  (64bit vs. 32bit aa differences, etc.), I could come up with a
  per-file
  pixel test tolerance extension to DRT, if it's needed.
 
  How about 

Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-14 Thread Ojan Vafai
Simon, are you suggesting that we should only use pixel results for ref
tests? If not, then we still need to come to a conclusion on this tolerance
issue.

Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of
--tolerance will be a lot of work, since it means making sure all the pixel results that
currently pass also pass with --tolerance=0. While I would support someone
doing that work, I don't think we should block moving to NRWT on it.

Ojan

On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser simon.fra...@apple.com wrote:

 I think the best solution to this pixel matching problem is ref tests.

 How practical would it be to use ref tests for SVG?

 Simon

 On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:

  Jeremy is correct; the Chromium port has seen real regressions that
  virtually no concept of a fuzzy match that I can imagine would've
  caught.
  new-run-webkit-tests doesn't currently support the tolerance concept
  at all, and I am inclined to argue that it shouldn't.
 
  However, I frequently am wrong about things, so it's quite possible
  that there are good arguments for supporting it that I'm not aware of.
  I'm not particularly interested in working on a tool that doesn't do
  what the group wants it to do, and I would like all of the other
  WebKit ports to be running pixel tests by default (and
  new-run-webkit-tests ;) ) since I think it catches bugs.
 
  As far as I know, the general sentiment on the list has been that we
  should be running pixel tests by default, and the reason that we
  aren't is largely due to the work involved in getting them back up to
  date and keeping them up to date. I'm sure that fuzzy matching reduces
  the work load, especially for the sort of mismatches caused by
  differences in the text antialiasing.
 
  In addition, I have heard concerns that we'd like to keep fuzzy
  matching because people might potentially get different results on
  machines with different hardware configurations, but I don't know that
  we have any confirmed cases of that (except for arguably the case of
  different code paths for gpu-accelerated rendering vs. unaccelerated
  rendering).
 
  If we made it easier to maintain the baselines (improved tooling like
  Chromium's rebaselining tool, added reftest support, etc.), are there
  still compelling reasons for supporting --tolerance-based testing as
  opposed to exact matching?
 
  -- Dirk
 
  On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org
 wrote:
  I'm not an expert on Pixel tests, but my understanding is that in
 Chromium
  (where we've always run with tolerance 0) we've seen real regressions
 that
  would have slipped by with something like tolerance 0.1.  When you have
  0 tolerance, it is more maintenance work, but if we can avoid
 regressions,
  it seems worth it.
  J
 
  On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
  zimmerm...@physik.rwth-aachen.de wrote:
 
  On 08.10.2010 at 19:53, Maciej Stachowiak wrote:
 
 
  On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
 
 
  On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
  On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
  Good evening webkit folks,
 
  I've finished landing svg/ pixel test baselines, which pass with
  --tolerance 0 on my 10.5 & 10.6 machines.
  As the pixel testing is very important for the SVG tests, I'd like
 to
  run them on the bots, experimentally, so we can catch regressions
 easily.
 
  Maybe someone with direct access to the leopard & snow leopard
 bots,
  could just run run-webkit-tests --tolerance 0 -p svg and mail me
 the
  results?
  If it passes, we could maybe run the pixel tests for the svg/
  subdirectory on these bots?
 
  Running pixel tests would be great, but can we really expect the
  results to be stable cross-platform with tolerance 0? Perhaps we
 should
  start with a higher tolerance level.
 
  Sure, we could do that. But I'd really like to get a feeling, for
 what's
  problematic first. If we see 95% of the SVG tests pass with
 --tolerance 0,
  and only a few need higher tolerances
  (64bit vs. 32bit aa differences, etc.), I could come up with a
 per-file
  pixel test tolerance extension to DRT, if it's needed.
 
  How about starting with just one build slave (say, Mac Leopard) that
  runs the pixel tests for SVG, with --tolerance 0 for a while. I'd be
 happy
  to identify the problems, and see
  if we can make it work, somehow :-)
 
  The problem I worry about is that on future Mac OS X releases,
 rendering
  of shapes may change in some tiny way that is not visible but enough
 to
  cause failures at tolerance 0. In the past, such false positives arose
 from
  time to time, which is one reason we added pixel test tolerance in the
 first
  place. I don't think running pixel tests on just one build slave will
 help
  us understand that risk.
 
  I think we'd just update the baseline to the newer OS X release, then,
  like it has been done for the tiger -> leopard, leopard -> snow leopard
  

Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-14 Thread Stephen White
I'm not sure if this could be made to work with SVG (might require some
additions to LayoutTestController), but Philip Taylor's canvas test suite
(in LayoutTests/canvas/philip) compares pixels programmatically in
JavaScript.  This has the major advantage that it doesn't require pixel
results, and allows for a per-test level of fuzziness/tolerance (if
required).  Obviously we would still want to have some tests remain pixel
tests, as these tests only cover a subset of pixels, but it might be a good
alternative to consider when writing new tests (especially for regressions,
where a single well-chosen pixel can often isolate the
problem).

Stephen
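
The approach Stephen describes boils down to asserting the colors of a few
deliberately chosen pixels, with a per-test tolerance, instead of diffing an
entire baseline image. The canvas/philip tests do this in JavaScript via
getImageData(); the Python sketch below only illustrates the idea, and the
helper names are assumptions rather than anything in the tree.

def channel_close(actual, expected, tolerance):
    """True if a single color channel is within +/- tolerance."""
    return abs(actual - expected) <= tolerance

def assert_pixel(image, width, x, y, expected_rgba, tolerance=0):
    """Check one pixel of a raw RGBA buffer against an expected color,
    allowing a per-test tolerance on each channel."""
    offset = 4 * (y * width + x)
    actual = tuple(image[offset:offset + 4])
    if not all(channel_close(a, e, tolerance)
               for a, e in zip(actual, expected_rgba)):
        raise AssertionError("pixel (%d, %d) is %r, expected %r +/- %d"
                             % (x, y, actual, expected_rgba, tolerance))

# A 2x1 "image": one red pixel, one almost-red pixel.
img = bytes([255, 0, 0, 255, 250, 0, 0, 255])
assert_pixel(img, width=2, x=0, y=0, expected_rgba=(255, 0, 0, 255))
assert_pixel(img, width=2, x=1, y=0, expected_rgba=(255, 0, 0, 255), tolerance=5)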

On Thu, Oct 14, 2010 at 12:06 PM, Ojan Vafai o...@chromium.org wrote:

 Simon, are you suggesting that we should only use pixel results for ref
 tests? If not, then we still need to come to a conclusion on this tolerance
 issue.

 Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid
 of --tolerance will be a lot of work, since it means making sure all the pixel results
 that currently pass also pass with --tolerance=0. While I would support
 someone doing that work, I don't think we should block moving to NRWT on it.

 Ojan

 On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser simon.fra...@apple.com wrote:

 I think the best solution to this pixel matching problem is ref tests.

 How practical would it be to use ref tests for SVG?

 Simon

 On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:

  Jeremy is correct; the Chromium port has seen real regressions that
  virtually no concept of a fuzzy match that I can imagine would've
  caught.
  new-run-webkit-tests doesn't currently support the tolerance concept
  at all, and I am inclined to argue that it shouldn't.
 
  However, I frequently am wrong about things, so it's quite possible
  that there are good arguments for supporting it that I'm not aware of.
  I'm not particularly interested in working on a tool that doesn't do
  what the group wants it to do, and I would like all of the other
  WebKit ports to be running pixel tests by default (and
  new-run-webkit-tests ;) ) since I think it catches bugs.
 
  As far as I know, the general sentiment on the list has been that we
  should be running pixel tests by default, and the reason that we
  aren't is largely due to the work involved in getting them back up to
  date and keeping them up to date. I'm sure that fuzzy matching reduces
  the work load, especially for the sort of mismatches caused by
  differences in the text antialiasing.
 
  In addition, I have heard concerns that we'd like to keep fuzzy
  matching because people might potentially get different results on
  machines with different hardware configurations, but I don't know that
  we have any confirmed cases of that (except for arguably the case of
  different code paths for gpu-accelerated rendering vs. unaccelerated
  rendering).
 
  If we made it easier to maintain the baselines (improved tooling like
  Chromium's rebaselining tool, added reftest support, etc.), are there
  still compelling reasons for supporting --tolerance-based testing as
  opposed to exact matching?
 
  -- Dirk
 
  On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org
 wrote:
  I'm not an expert on Pixel tests, but my understanding is that in
 Chromium
  (where we've always run with tolerance 0) we've seen real regressions
 that
  would have slipped by with something like tolerance 0.1.  When you have
  0 tolerance, it is more maintenance work, but if we can avoid
 regressions,
  it seems worth it.
  J
 
  On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
  zimmerm...@physik.rwth-aachen.de wrote:
 
  On 08.10.2010 at 19:53, Maciej Stachowiak wrote:
 
 
  On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
 
 
  On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
  On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
  Good evening webkit folks,
 
  I've finished landing svg/ pixel test baselines, which pass with
  --tolerance 0 on my 10.5 & 10.6 machines.
  As the pixel testing is very important for the SVG tests, I'd like
 to
  run them on the bots, experimentally, so we can catch regressions
 easily.
 
  Maybe someone with direct access to the leopard & snow leopard
 bots,
  could just run run-webkit-tests --tolerance 0 -p svg and mail me
 the
  results?
  If it passes, we could maybe run the pixel tests for the svg/
  subdirectory on these bots?
 
  Running pixel tests would be great, but can we really expect the
  results to be stable cross-platform with tolerance 0? Perhaps we
 should
  start with a higher tolerance level.
 
  Sure, we could do that. But I'd really like to get a feeling, for
 what's
  problematic first. If we see 95% of the SVG tests pass with
 --tolerance 0,
  and only a few need higher tolerances
  (64bit vs. 32bit aa differences, etc.), I could come up with a
 per-file
  pixel test tolerance extension to DRT, if it's needed.
 
  How about starting with just one build 

Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-14 Thread Simon Fraser
On Oct 14, 2010, at 9:06 AM, Ojan Vafai wrote:

 Simon, are you suggesting that we should only use pixel results for ref tests?

In an ideal world, yes. But we have such a huge body of existing tests that 
converting them all to ref tests
is a non-starter, so I agree that we need to resolve the tolerance issue.

However, at some point I'd like to see us get to a stage where new pixel tests 
must be ref tests.

Simon 
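
The reason ref tests sidestep the tolerance question is that the harness
renders both the test and its reference with the same engine on the same
machine and requires the two renderings to be identical, so there is no
checked-in baseline image that platform rendering differences can drift away
from. Below is a minimal harness-side sketch, with render_page() standing in
for whatever the port uses to produce pixels (DumpRenderTree in WebKit's
case); the names and file paths are illustrative, not an existing API.

def render_page(path):
    """Placeholder: render the page at `path` and return its raw RGBA
    pixels. In practice the port's DumpRenderTree does the rendering."""
    raise NotImplementedError

def run_ref_test(test_path, reference_path):
    """A ref test passes only if the test and its reference render
    pixel-for-pixel identically; no baseline image is stored."""
    return render_page(test_path) == render_page(reference_path)

# e.g. run_ref_test("svg/shapes/circle.svg", "svg/shapes/circle-expected.html")
# (hypothetical file names)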

 If not, then we still need to come to a conclusion on this tolerance issue.
 
 Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of
 --tolerance will be a lot of work, since it means making sure all the pixel results that
 currently pass also pass with --tolerance=0. While I would support someone 
 doing that work, I don't think we should block moving to NRWT on it.
 
 Ojan



Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-14 Thread Dirk Pranke
On Thu, Oct 14, 2010 at 9:06 AM, Ojan Vafai o...@chromium.org wrote:
 Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of
 --tolerance will be a lot of work, since it means making sure all the pixel results that
 currently pass also pass with --tolerance=0. While I would support someone
 doing that work, I don't think we should block moving to NRWT on it.

Assuming we implement it only for the ports that currently use
tolerance on old-run-webkit-tests, no, I wouldn't expect it to be
hard. Dunno how much work it would be to implement tolerance on the
chromium image_diff implementations (side note: it would be nice if
these binaries weren't port-specific, but that's another topic).

As to how many files we'd have to rebaseline for the base ports, I
don't know how many there are compared to how many fail pixel tests,
period. I'll run a couple tests and find out.

-- Dirk

 Ojan
 On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser simon.fra...@apple.com wrote:

 I think the best solution to this pixel matching problem is ref tests.

 How practical would it be to use ref tests for SVG?

 Simon

 On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:

  Jeremy is correct; the Chromium port has seen real regressions that
  virtually no concept of a fuzzy match that I can imagine would've
  caught.
  new-run-webkit-tests doesn't currently support the tolerance concept
  at all, and I am inclined to argue that it shouldn't.
 
  However, I frequently am wrong about things, so it's quite possible
  that there are good arguments for supporting it that I'm not aware of.
  I'm not particularly interested in working on a tool that doesn't do
  what the group wants it to do, and I would like all of the other
  WebKit ports to be running pixel tests by default (and
  new-run-webkit-tests ;) ) since I think it catches bugs.
 
  As far as I know, the general sentiment on the list has been that we
  should be running pixel tests by default, and the reason that we
  aren't is largely due to the work involved in getting them back up to
  date and keeping them up to date. I'm sure that fuzzy matching reduces
  the work load, especially for the sort of mismatches caused by
  differences in the text antialiasing.
 
  In addition, I have heard concerns that we'd like to keep fuzzy
  matching because people might potentially get different results on
  machines with different hardware configurations, but I don't know that
  we have any confirmed cases of that (except for arguably the case of
  different code paths for gpu-accelerated rendering vs. unaccelerated
  rendering).
 
  If we made it easier to maintain the baselines (improved tooling like
  Chromium's rebaselining tool, added reftest support, etc.), are there
  still compelling reasons for supporting --tolerance-based testing as
  opposed to exact matching?
 
  -- Dirk
 
  On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org
  wrote:
  I'm not an expert on Pixel tests, but my understanding is that in
  Chromium
  (where we've always run with tolerance 0) we've seen real regressions
  that
  would have slipped by with something like tolerance 0.1.  When you have
  0 tolerance, it is more maintenance work, but if we can avoid
  regressions,
  it seems worth it.
  J
 
  On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
  zimmerm...@physik.rwth-aachen.de wrote:
 
  On 08.10.2010 at 19:53, Maciej Stachowiak wrote:
 
 
  On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
 
 
  On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
  On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
  Good evening webkit folks,
 
  I've finished landing svg/ pixel test baselines, which pass with
  --tolerance 0 on my 10.5 & 10.6 machines.
  As the pixel testing is very important for the SVG tests, I'd like
  to
  run them on the bots, experimentally, so we can catch regressions
  easily.
 
  Maybe someone with direct access to the leopard & snow leopard
  bots,
  could just run run-webkit-tests --tolerance 0 -p svg and mail me
  the
  results?
  If it passes, we could maybe run the pixel tests for the svg/
  subdirectory on these bots?
 
  Running pixel tests would be great, but can we really expect the
  results to be stable cross-platform with tolerance 0? Perhaps we
  should
  start with a higher tolerance level.
 
  Sure, we could do that. But I'd really like to get a feeling, for
  what's
  problematic first. If we see 95% of the SVG tests pass with
  --tolerance 0,
  and only a few need higher tolerances
  (64bit vs. 32bit aa differences, etc.), I could come up with a
  per-file
  pixel test tolerance extension to DRT, if it's needed.
 
  How about starting with just one build slave (say, Mac Leopard) that
  runs the pixel tests for SVG, with --tolerance 0 for a while. I'd be
  happy
  to identify the problems, and see
  if we can make it work, somehow :-)
 
  The problem I worry about is that on future Mac OS X releases,
  rendering
  of shapes may change in some tiny 

[webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-08 Thread Dirk Pranke
Jeremy is correct; the Chromium port has seen real regressions that
virtually no concept of a fuzzy match that I can imagine would've
caught.
new-run-webkit-tests doesn't currently support the tolerance concept
at all, and I am inclined to argue that it shouldn't.

However, I frequently am wrong about things, so it's quite possible
that there are good arguments for supporting it that I'm not aware of.
I'm not particularly interested in working on a tool that doesn't do
what the group wants it to do, and I would like all of the other
WebKit ports to be running pixel tests by default (and
new-run-webkit-tests ;) ) since I think it catches bugs.

As far as I know, the general sentiment on the list has been that we
should be running pixel tests by default, and the reason that we
aren't is largely due to the work involved in getting them back up to
date and keeping them up to date. I'm sure that fuzzy matching reduces
the work load, especially for the sort of mismatches caused by
differences in the text antialiasing.

In addition, I have heard concerns that we'd like to keep fuzzy
matching because people might potentially get different results on
machines with different hardware configurations, but I don't know that
we have any confirmed cases of that (except for arguably the case of
different code paths for gpu-accelerated rendering vs. unaccelerated
rendering).

If we made it easier to maintain the baselines (improved tooling like
Chromium's rebaselining tool, added reftest support, etc.), are there
still compelling reasons for supporting --tolerance-based testing as
opposed to exact matching?

-- Dirk

On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org wrote:
 I'm not an expert on Pixel tests, but my understanding is that in Chromium
 (where we've always run with tolerance 0) we've seen real regressions that
 would have slipped by with something like tolerance 0.1.  When you have
 0 tolerance, it is more maintenance work, but if we can avoid regressions,
 it seems worth it.
 J

 On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
 zimmerm...@physik.rwth-aachen.de wrote:

 On 08.10.2010 at 19:53, Maciej Stachowiak wrote:


 On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:


 On 08.10.2010 at 00:44, Maciej Stachowiak wrote:


 On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:

 Good evening webkit folks,

 I've finished landing svg/ pixel test baselines, which pass with
 --tolerance 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to
 run them on the bots, experimentally, so we can catch regressions easily.

 Maybe someone with direct access to the leopard & snow leopard bots,
 could just run run-webkit-tests --tolerance 0 -p svg and mail me the
 results?
 If it passes, we could maybe run the pixel tests for the svg/
 subdirectory on these bots?

 Running pixel tests would be great, but can we really expect the
 results to be stable cross-platform with tolerance 0? Perhaps we should
 start with a higher tolerance level.

 Sure, we could do that. But I'd really like to get a feeling, for what's
 problematic first. If we see 95% of the SVG tests pass with --tolerance 0,
 and only a few need higher tolerances
 (64bit vs. 32bit aa differences, etc.), I could come up with a per-file
 pixel test tolerance extension to DRT, if it's needed.

 How about starting with just one build slave (say, Mac Leopard) that
 runs the pixel tests for SVG, with --tolerance 0 for a while. I'd be happy
 to identify the problems, and see
 if we can make it work, somehow :-)

 The problem I worry about is that on future Mac OS X releases, rendering
 of shapes may change in some tiny way that is not visible but enough to
 cause failures at tolerance 0. In the past, such false positives arose from
 time to time, which is one reason we added pixel test tolerance in the first
 place. I don't think running pixel tests on just one build slave will help
 us understand that risk.

 I think we'd just update the baseline to the newer OS X release, then,
 like it has been done for the tiger -> leopard, leopard -> snow leopard
 switch?
 platform/mac/ should always contain the newest release baseline; when
 there are differences on leopard, the results go into
 platform/mac-leopard/

 Why not start with some low but non-zero tolerance (0.1?) and see if we
 can at least make that work consistently, before we try the bolder step of
 tolerance 0?
 Also, and as a side note, we probably need to add more build slaves to
 run pixel tests at all, since just running the test suite without pixel
 tests is already slow enough that the testers are often significantly behind
 the builders.

 Well, I thought about just running the pixel tests for the svg/
 subdirectory as a separate step, hence my request for tolerance 0, as the
 baseline passes without problems at least on my & Dirk's machines already.
 I wouldn't want to argue for running 20,000+ pixel tests with tolerance 0 as
 a first step :-)
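
Niko's "per-file pixel test tolerance extension" could look something like the
following on the harness side: consult a per-test override before falling back
to whatever --tolerance was given globally. This is purely a sketch; the
override table, its contents and the function name are hypothetical, not an
existing DRT or run-webkit-tests feature.

DEFAULT_TOLERANCE = 0.0  # i.e. --tolerance 0

# Hypothetical per-test overrides, e.g. for known 32-bit vs. 64-bit AA noise.
PER_TEST_TOLERANCE = {
    "svg/example/antialiased-edges.svg": 0.2,  # made-up test name
}

def tolerance_for(test_name, command_line_tolerance=DEFAULT_TOLERANCE):
    """Return the tolerance to use for one test: a per-file override if one
    exists, otherwise the value passed on the command line."""
    return PER_TEST_TOLERANCE.get(test_name, command_line_tolerance)

print(tolerance_for("svg/example/antialiased-edges.svg"))     # 0.2
print(tolerance_for("svg/example/some-other-test.svg", 0.1))  # 0.1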

Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

2010-10-08 Thread Simon Fraser
I think the best solution to this pixel matching problem is ref tests.

How practical would it be to use ref tests for SVG?

Simon

On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:

 Jeremy is correct; the Chromium port has seen real regressions that
 virtually no concept of a fuzzy match that I can imagine would've
 caught.
 new-run-webkit-tests doesn't currently support the tolerance concept
 at all, and I am inclined to argue that it shouldn't.
 
 However, I frequently am wrong about things, so it's quite possible
 that there are good arguments for supporting it that I'm not aware of.
 I'm not particularly interested in working on a tool that doesn't do
 what the group wants it to do, and I would like all of the other
 WebKit ports to be running pixel tests by default (and
 new-run-webkit-tests ;) ) since I think it catches bugs.
 
 As far as I know, the general sentiment on the list has been that we
 should be running pixel tests by default, and the reason that we
 aren't is largely due to the work involved in getting them back up to
 date and keeping them up to date. I'm sure that fuzzy matching reduces
 the work load, especially for the sort of mismatches caused by
 differences in the text antialiasing.
 
 In addition, I have heard concerns that we'd like to keep fuzzy
 matching because people might potentially get different results on
 machines with different hardware configurations, but I don't know that
 we have any confirmed cases of that (except for arguably the case of
 different code paths for gpu-accelerated rendering vs. unaccelerated
 rendering).
 
 If we made it easier to maintain the baselines (improved tooling like
 Chromium's rebaselining tool, added reftest support, etc.), are there
 still compelling reasons for supporting --tolerance-based testing as
 opposed to exact matching?
 
 -- Dirk
 
 On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org wrote:
 I'm not an expert on Pixel tests, but my understanding is that in Chromium
 (where we've always run with tolerance 0) we've seen real regressions that
 would have slipped by with something like tolerance 0.1.  When you have
 0 tolerance, it is more maintenance work, but if we can avoid regressions,
 it seems worth it.
 J
 
 On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
 zimmerm...@physik.rwth-aachen.de wrote:
 
 On 08.10.2010 at 19:53, Maciej Stachowiak wrote:
 
 
 On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
 
 
 On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
 On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
 Good evening webkit folks,
 
 I've finished landing svg/ pixel test baselines, which pass with
 --tolerance 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to
 run them on the bots, experimentally, so we can catch regressions 
 easily.
 
 Maybe someone with direct access to the leopard & snow leopard bots,
 could just run run-webkit-tests --tolerance 0 -p svg and mail me the
 results?
 If it passes, we could maybe run the pixel tests for the svg/
 subdirectory on these bots?
 
 Running pixel tests would be great, but can we really expect the
 results to be stable cross-platform with tolerance 0? Perhaps we should
 start with a higher tolerance level.
 
 Sure, we could do that. But I'd really like to get a feeling, for what's
 problematic first. If we see 95% of the SVG tests pass with --tolerance 0,
 and only a few need higher tolerances
 (64bit vs. 32bit aa differences, etc.), I could come up with a per-file
 pixel test tolerance extension to DRT, if it's needed.
 
 How about starting with just one build slave (say, Mac Leopard) that
 runs the pixel tests for SVG, with --tolerance 0 for a while. I'd be happy
 to identify the problems, and see
 if we can make it work, somehow :-)
 
 The problem I worry about is that on future Mac OS X releases, rendering
 of shapes may change in some tiny way that is not visible but enough to
 cause failures at tolerance 0. In the past, such false positives arose from
 time to time, which is one reason we added pixel test tolerance in the 
 first
 place. I don't think running pixel tests on just one build slave will help
 us understand that risk.
 
 I think we'd just update the baseline to the newer OS X release, then,
 like it has been done for the tiger -> leopard, leopard -> snow leopard
 switch?
 platform/mac/ should always contain the newest release baseline; when
 there are differences on leopard, the results go into
 platform/mac-leopard/
 
 Why not start with some low but non-zero tolerance (0.1?) and see if we
 can at least make that work consistently, before we try the bolder step of
 tolerance 0?
 Also, and as a side note, we probably need to add more build slaves to
 run pixel tests at all, since just running the test suite without pixel
 tests is already slow enough that the testers are often significantly 
 behind
 the builders.
 
 Well, I thought about just running the pixel tests for the