Re: [webkit-dev] Pixel test experiment

2010-10-14 Thread Ojan Vafai
My experience is that having a non-zero tolerance makes maintaining the
pixel results *harder*. It makes it easier at first of course. But as more
and more tests only pass with a non-zero tolerance, it gets harder to figure
out if your change causes a regression (e.g. your change causes a pixel test
to fail, but when you look at the diff, it includes more changes than you
would expect from your patch).

Having no tolerance is a pain for sure, but it's much more black and white
and thus, it's usually much easier to reason about the correctness of a
change.

Ojan

On Tue, Oct 12, 2010 at 1:43 PM, James Robinson jam...@google.com wrote:

 To add a concrete data point, http://trac.webkit.org/changeset/69517 caused
 a number of SVG tests to fail.  It required 14 text rebaselines for Mac and
 a further two more for Leopard (done by Adam Barth).  In order to pass the
 pixel tests in Chromium, it required 1506 new pixel baselines (checked in by
 the very brave Albert Wong, http://trac.webkit.org/changeset/69543).  None
 of the rebaselining was done by the patch authors and in general I would not
 expect a patch author that didn't work in Chromium to be expected to update
 Chromium-specific baselines.  I'm a little skeptical of the claim that all
 SVG changes are run through the pixel tests given that to date none of the
 affected platform/mac SVG pixel baselines have been updated.  This sort of
 mass-rebaselining is required fairly regularly for minor changes in SVG and
 in other parts of the codebase.

 I'd really like for the bots to run the pixel tests on every run,
 preferably with 0 tolerance.  We catch a lot of regressions by running these
 tests on the Chromium bots that would probably otherwise go unnoticed.
  However there is a large maintenance cost associated with this coverage.
  We normally have two engineers (one in PST, one elsewhere in the world) who
 watch the Chromium bots to triage, suppress, and rebaseline tests as churn
 is introduced.

 Questions:
 - If the pixel tests were running either with a tolerance of 0 or 0.1, what
 would the expectation be for a patch like
 http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
 rebaselines?  Would the patch author be expected to update the baselines for
 the platform/mac port, or would someone else?  Thus far the Chromium folks
 have been the only ones actively maintaining the pixel baselines - which I
 think is entirely reasonable since we're the only ones trying to run the
 pixel tests on bots.

 - Do we have the tools and infrastructure needed to do mass rebaselines in
 WebKit currently?  We've built a number of tools to deal with the Chromium
 expectations, but since this has been a need unique to Chromium so far the
 tools only work for Chromium.

 - James


 On Fri, Oct 8, 2010 at 11:18 PM, Nikolas Zimmermann 
 zimmerm...@physik.rwth-aachen.de wrote:


 On 08.10.2010 at 20:14, Jeremy Orlow wrote:


  I'm not an expert on Pixel tests, but my understanding is that in
 Chromium (where we've always run with tolerance 0) we've seen real
 regressions that would have slipped by with something like tolerance 0.1.
  When you have 0 tolerance, it is more maintenance work, but if we can avoid
 regressions, it seems worth it.


 Well, that's why I initially argued for tolerance 0. Especially in SVG we
 had lots of regressions in the past that were below the 0.1 tolerance. I
 fully support --tolerance 0 as default.

 Dirk & me are also willing to investigate possible problem sources and
 minimize them.
 Reftests as Simon said, are a great thing, but it won't help with official
 test suites like the W3C one - it would be a huge amount of work to create
 reftests for all of these...


 Cheers,
 Niko

 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev





Re: [webkit-dev] Pixel test experiment

2010-10-13 Thread Nikolas Zimmermann


On 12.10.2010 at 22:43, James Robinson wrote:

To add a concrete data point, http://trac.webkit.org/changeset/69517  
caused a number of SVG tests to fail.  It required 14 text  
rebaselines for Mac and a further two more for Leopard (done by Adam  
Barth).  In order to pass the pixel tests in Chromium, it required  
1506 new pixel baselines (checked in by the very brave Albert Wong,
http://trac.webkit.org/changeset/69543).  None of the rebaselining
was done by the patch authors and in
general I would not expect a patch author that didn't work in  
Chromium to be expected to update Chromium-specific baselines.  I'm  
a little skeptical of the claim that all SVG changes are run through  
the pixel tests given that to date none of the affected platform/mac  
SVG pixel baselines have been updated.  This sort of
mass-rebaselining is required fairly regularly for minor changes in SVG
and in other parts of the codebase.


Dirk & me are running _every_ single SVG change through
run-webkit-tests --tolerance 0 -p svg.
Andreas Kling doesn't have access to a Mac, and Dirk & me agreed to
rebaseline all affected SVG tests after his comment. Dirk did the 10.6
baseline updates, and I added platform exceptions for 10.5 after Dirk's
commit.


You can trust me that we're running pixel tests.


I'd really like for the bots to run the pixel tests on every run,  
preferably with 0 tolerance.  We catch a lot of regressions by  
running these tests on the Chromium bots that would probably  
otherwise go unnoticed.  However there is a large maintenance cost  
associated with this coverage.  We normally have two engineers (one  
in PST, one elsewhere in the world) who watch the Chromium bots to  
triage, suppress, and rebaseline tests as churn is introduced.


Questions:
- If the pixel tests were running either with a tolerance of 0 or  
0.1, what would the expectation be for a patch like
http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
rebaselines?  Would the patch
author be expected to update the baselines for the platform/mac  
port, or would someone else?  Thus far the Chromium folks have been  
the only ones actively maintaining the pixel baselines - which I  
think is entirely reasonable since we're the only ones trying to run  
the pixel tests on bots.

As I said before, Dirk & me maintain the Mac pixel baselines for SVG.
If I had written the patch, I would have included the rebaselines,
though people like Andreas, who don't have access to pixel tests, have
to be able to produce patches as well.




- Do we have the tools and infrastructure needed to do mass  
rebaselines in WebKit currently?  We've built a number of tools to  
deal with the Chromium expectations, but since this has been a need  
unique to Chromium so far the tools only work for Chromium.

Yes, webkit-patch rebaseline, but I think Adam already mentioned it.

Cheers,
Niko



Re: [webkit-dev] Pixel test experiment

2010-10-12 Thread James Robinson
To add a concrete data point, http://trac.webkit.org/changeset/69517 caused
a number of SVG tests to fail.  It required 14 text rebaselines for Mac and
a further two more for Leopard (done by Adam Barth).  In order to pass the
pixel tests in Chromium, it required 1506 new pixel baselines (checked in by
the very brave Albert Wong, http://trac.webkit.org/changeset/69543).  None
of the rebaselining was done by the patch authors and in general I would not
expect a patch author that didn't work in Chromium to be expected to update
Chromium-specific baselines.  I'm a little skeptical of the claim that all
SVG changes are run through the pixel tests given that to date none of the
affected platform/mac SVG pixel baselines have been updated.  This sort of
mass-rebaselining is required fairly regularly for minor changes in SVG and
in other parts of the codebase.

I'd really like for the bots to run the pixel tests on every run, preferably
with 0 tolerance.  We catch a lot of regressions by running these tests on
the Chromium bots that would probably otherwise go unnoticed.  However there
is a large maintenance cost associated with this coverage.  We normally have
two engineers (one in PST, one elsewhere in the world) who watch the
Chromium bots to triage, suppress, and rebaseline tests as churn is
introduced.

Questions:
- If the pixel tests were running either with a tolerance of 0 or 0.1, what
would the expectation be for a patch like
http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
rebaselines?  Would the patch author be expected to update the baselines for
the platform/mac port, or would someone else?  Thus far the Chromium folks
have been the only ones actively maintaining the pixel baselines - which I
think is entirely reasonable since we're the only ones trying to run the
pixel tests on bots.

- Do we have the tools and infrastructure needed to do mass rebaselines in
WebKit currently?  We've built a number of tools to deal with the Chromium
expectations, but since this has been a need unique to Chromium so far the
tools only work for Chromium.

- James

On Fri, Oct 8, 2010 at 11:18 PM, Nikolas Zimmermann 
zimmerm...@physik.rwth-aachen.de wrote:


 On 08.10.2010 at 20:14, Jeremy Orlow wrote:


  I'm not an expert on Pixel tests, but my understanding is that in Chromium
 (where we've always run with tolerance 0) we've seen real regressions that
 would have slipped by with something like tolerance 0.1.  When you have 0
 tolerance, it is more maintenance work, but if we can avoid regressions, it
 seems worth it.


 Well, that's why I initially argued for tolerance 0. Especially in SVG we
 had lots of regressions in the past that were below the 0.1 tolerance. I
 fully support --tolerance 0 as default.

 Dirk & me are also willing to investigate possible problem sources and
 minimize them.
 Reftests as Simon said, are a great thing, but it won't help with official
 test suites like the W3C one - it would be a huge amount of work to create
 reftests for all of these...


 Cheers,
 Niko



Re: [webkit-dev] Pixel test experiment

2010-10-12 Thread Adam Barth
On Tue, Oct 12, 2010 at 1:43 PM, James Robinson jam...@google.com wrote:
 - Do we have the tools and infrastructure needed to do mass rebaselines in
 WebKit currently?  We've built a number of tools to deal with the Chromium
 expectations, but since this has been a need unique to Chromium so far the
 tools only work for Chromium.

webkit-patch has a very primitive rebaseline command.  There are
some bugs on file on making it better.

Adam


Re: [webkit-dev] Pixel test experiment

2010-10-12 Thread Jeremy Orlow
On Tue, Oct 12, 2010 at 1:43 PM, James Robinson jam...@google.com wrote:

 To add a concrete data point, http://trac.webkit.org/changeset/69517 caused
 a number of SVG tests to fail.  It required 14 text rebaselines for Mac and
 a further two more for Leopard (done by Adam Barth).  In order to pass the
 pixel tests in Chromium, it required 1506 new pixel baselines (checked in by
 the very brave Albert Wong, http://trac.webkit.org/changeset/69543).  None
 of the rebaselining was done by the patch authors and in general I would not
 expect a patch author that didn't work in Chromium to be expected to update
 Chromium-specific baselines.  I'm a little skeptical of the claim that all
 SVG changes are run through the pixel tests given that to date none of the
 affected platform/mac SVG pixel baselines have been updated.  This sort of
 mass-rebaselining is required fairly regularly for minor changes in SVG and
 in other parts of the codebase.

 I'd really like for the bots to run the pixel tests on every run,
 preferably with 0 tolerance.  We catch a lot of regressions by running these
 tests on the Chromium bots that would probably otherwise go unnoticed.
  However there is a large maintenance cost associated with this coverage.
  We normally have two engineers (one in PST, one elsewhere in the world) who
 watch the Chromium bots to triage, suppress, and rebaseline tests as churn
 is introduced.


This isn't to say that it's two full-time people's worth of work to keep
them up to date though.

Background: Pixel tests and Chromium tests (i.e. tests that touch the full
Chromium stack, performance tests, valgrind, etc) both rely on code in
Chromium's SVN repo which is why we don't have them on build.webkit.org.
 One of the major jobs of the WebKit gardener (which James mentioned) is
triaging these failures and fixing them, filing upstream bugs, backing out
the changes (when they are true regressions that cannot be easily fixed and
the author is not around), and/or rebaselining when appropriate.  Another major
job (and part of why we try to have near 24 hour coverage) is keeping the
build from breaking so that our bots don't go blind.  So the actual act of
rebaselining is only a small component of their job.


 Questions:
 - If the pixel tests were running either with a tolerance of 0 or 0.1, what
 would the expectation be for a patch like
 http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
 rebaselines?  Would the patch author be expected to update the baselines for
 the platform/mac port, or would someone else?  Thus far the Chromium folks
 have been the only ones actively maintaining the pixel baselines - which I
 think is entirely reasonable since we're the only ones trying to run the
 pixel tests on bots.

 - Do we have the tools and infrastructure needed to do mass rebaselines in
 WebKit currently?  We've built a number of tools to deal with the Chromium
 expectations, but since this has been a need unique to Chromium so far the
 tools only work for Chromium.

 - James

 On Fri, Oct 8, 2010 at 11:18 PM, Nikolas Zimmermann 
 zimmerm...@physik.rwth-aachen.de wrote:


 On 08.10.2010 at 20:14, Jeremy Orlow wrote:


  I'm not an expert on Pixel tests, but my understanding is that in
 Chromium (where we've always run with tolerance 0) we've seen real
 regressions that would have slipped by with something like tolerance 0.1.
  When you have 0 tolerance, it is more maintenance work, but if we can avoid
 regressions, it seems worth it.


 Well, that's why I initially argued for tolerance 0. Especially in SVG we
 had lots of regressions in the past that were below the 0.1 tolerance. I
 fully support --tolerance 0 as default.

 Dirk & me are also willing to investigate possible problem sources and
 minimize them.
 Reftests as Simon said, are a great thing, but it won't help with official
 test suites like the W3C one - it would be a huge amount of work to create
 reftests for all of these...


 Cheers,
 Niko



Re: [webkit-dev] Pixel test experiment

2010-10-12 Thread Dirk Schulze
Does it support pixel test updates? Is it possible to extend this tool if not?
This would limit the maintenance cost, and every committer should rebaseline Mac
if the change is a progression, or the difference is machine dependent
(but not OS dependent).

Dirk

On 12.10.2010 at 22:49, Adam Barth wrote:

 On Tue, Oct 12, 2010 at 1:43 PM, James Robinson jam...@google.com wrote:
 - Do we have the tools and infrastructure needed to do mass rebaselines in
 WebKit currently?  We've built a number of tools to deal with the Chromium
 expectations, but since this has been a need unique to Chromium so far the
 tools only work for Chromium.
 
 webkit-patch has a very primitive rebaseline command.  There are
 some bugs on file on making it better.
 
 Adam


Re: [webkit-dev] Pixel test experiment

2010-10-09 Thread Nikolas Zimmermann


On 08.10.2010 at 20:14, Jeremy Orlow wrote:

I'm not an expert on Pixel tests, but my understanding is that in  
Chromium (where we've always run with tolerance 0) we've seen real  
regressions that would have slipped by with something like tolerance  
0.1.  When you have 0 tolerance, it is more maintenance work, but if  
we can avoid regressions, it seems worth it.


Well, that's why I initially argued for tolerance 0. Especially in SVG  
we had lots of regressions in the past that were below the 0.1  
tolerance. I fully support --tolerance 0 as default.


Dirk & me are also willing to investigate possible problem sources and
minimize them.
Reftests as Simon said, are a great thing, but it won't help with  
official test suites like the W3C one - it would be a huge amount of  
work to create reftests for all of these...


Cheers,
Niko



Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Nikolas Zimmermann


On 08.10.2010 at 00:44, Maciej Stachowiak wrote:



On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:


Good evening webkit folks,

I've finished landing svg/ pixel test baselines, which pass with
--tolerance 0 on my 10.5 & 10.6 machines.
As the pixel testing is very important for the SVG tests, I'd like  
to run them on the bots, experimentally, so we can catch  
regressions easily.


Maybe someone with direct access to the leopard & snow leopard
bots, could just run run-webkit-tests --tolerance 0 -p svg and  
mail me the results?
If it passes, we could maybe run the pixel tests for the svg/  
subdirectory on these bots?


Running pixel tests would be great, but can we really expect the  
results to be stable cross-platform with tolerance 0? Perhaps we  
should start with a higher tolerance level.


Sure, we could do that. But I'd really like to get a feeling for
what's problematic first. If we see 95% of the SVG tests pass with
--tolerance 0, and only a few need higher tolerances
(64bit vs. 32bit AA differences, etc.), I could come up with a
per-file pixel test tolerance extension to DRT, if it's needed.


How about starting with just one build slave (say, Mac Leopard) that
runs the pixel tests for SVG with --tolerance 0 for a while? I'd be
happy to identify the problems, and see if we can make it work,
somehow :-)
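
To make the per-file tolerance idea above concrete, here is a minimal sketch in Python. It is not DRT code and does not reflect any existing WebKit option: the table TOLERANCE_OVERRIDES, the helper tolerance_for, and the test path inside the table are all hypothetical. The only assumption taken from the thread is that the default would be tolerance 0, with a few tests needing a higher value.

```python
# Hypothetical sketch: a default tolerance of 0 with per-file exceptions.
DEFAULT_TOLERANCE = 0.0

# Made-up example entry; a real table would list the few tests that are
# known to differ slightly (e.g. 32-bit vs. 64-bit antialiasing).
TOLERANCE_OVERRIDES = {
    "svg/custom/example-aa-curve.svg": 0.1,
}


def tolerance_for(test_path, overrides=TOLERANCE_OVERRIDES,
                  default=DEFAULT_TOLERANCE):
    """Return the tolerance to use for one test file."""
    return overrides.get(test_path, default)


if __name__ == "__main__":
    for path in ("svg/custom/example-aa-curve.svg", "svg/custom/other.svg"):
        print(path, tolerance_for(path))
```

Keeping the exceptions in one explicit table would leave every other test at tolerance 0, so a new difference in an unlisted test still shows up as a failure.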

Cheers,
Niko



Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Dirk Schulze
We missed many changes because of an existing tolerance level in the past. We
made a baseline for MacOS Leopard as well as Snow Leopard and I would activate
pixel tests just for those two bots. I don't expect any problems. Niko and I
run pixel tests on different machines and get the same results.

Dirk

On 08.10.2010 at 00:44, Maciej Stachowiak wrote:

 
 On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
 Good evening webkit folks,
 
 I've finished landing svg/ pixel test baselines, which pass with --tolerance 
 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to run 
 them on the bots, experimentally, so we can catch regressions easily.
 
 Maybe someone with direct access to the leopard & snow leopard bots, could
 just run run-webkit-tests --tolerance 0 -p svg and mail me the results?
 If it passes, we could maybe run the pixel tests for the svg/ subdirectory 
 on these bots?
 
 Running pixel tests would be great, but can we really expect the results to 
 be stable cross-platform with tolerance 0? Perhaps we should start with a 
 higher tolerance level.
 
 Regards,
 Maciej
 


Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Nikolas Zimmermann


On 07.10.2010 at 22:28, Evan Martin wrote:


Chromium also runs pixel tests (for all tests).  For SVG, I recall we
have problems where 32-bit and 64-bit code will end up drawing
(antialiasing) curves differently.  Does this sound familiar?  Do you
have any suggestions on how to address it?


This doesn't sound familiar, because I don't have many machines to  
test with. I'm mainly working on an older MacBook Pro, using 10.5, 32  
bits,
and a new MacBook Pro using 10.6 which is 64 bits. The pixel test  
baseline in platform/mac/svg is generated using the 10.6 machine, the  
10.5
baseline using the other 32 bit machine. If I recall correctly, I only  
saw a few AA differences, but many more font differences.


When we had pixel tests enabled in the past, I recall that e.g. the
Leopard bot passed them 100%, and I had several hundred tests failing
on my local Leopard machine. Do you guys have baselines that pass on
most developer machines (default installations of e.g. win, no
special fonts installed) _and_ the bot? Or do you always rely on the
pixel test results from the bots, and don't run pixel tests locally?


Cheers,
Niko



Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Maciej Stachowiak

On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:

 
 On 08.10.2010 at 00:44, Maciej Stachowiak wrote:
 
 
 On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
 
 Good evening webkit folks,
 
 I've finished landing svg/ pixel test baselines, which pass with 
 --tolerance 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to run 
 them on the bots, experimentally, so we can catch regressions easily.
 
 Maybe someone with direct access to the leopard & snow leopard bots, could
 just run run-webkit-tests --tolerance 0 -p svg and mail me the results?
 If it passes, we could maybe run the pixel tests for the svg/ subdirectory 
 on these bots?
 
 Running pixel tests would be great, but can we really expect the results to 
 be stable cross-platform with tolerance 0? Perhaps we should start with a 
 higher tolerance level.
 
 Sure, we could do that. But I'd really like to get a feeling, for what's 
 problematic first. If we see 95% of the SVG tests pass with --tolerance 0, 
 and only a few need higher tolerances
 (64bit vs. 32bit aa differences, etc.), I could come up with a per-file pixel 
 test tolerance extension to DRT, if it's needed.
 
 How about starting with just one build slave (say, Mac Leopard) that runs the
 pixel tests for SVG, with --tolerance 0 for a while. I'd be happy to identify 
 the problems, and see
 if we can make it work, somehow :-)

The problem I worry about is that on future Mac OS X releases, rendering of 
shapes may change in some tiny way that is not visible but enough to cause 
failures at tolerance 0. In the past, such false positives arose from time to 
time, which is one reason we added pixel test tolerance in the first place. I 
don't think running pixel tests on just one build slave will help us understand 
that risk.

Why not start with some low but non-zero tolerance (0.1?) and see if we can at 
least make that work consistently, before we try the bolder step of tolerance 0?

Also, and as a side note, we probably need to add more build slaves to run 
pixel tests at all, since just running the test suite without pixel tests is 
already slow enough that the testers are often significantly behind the 
builders.

Regards,
Maciej



Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Dirk Schulze

 The problem I worry about is that on future Mac OS X releases, rendering of 
 shapes may change in some tiny way that is not visible but enough to cause 
 failures at tolerance 0. In the past, such false positives arose from time to 
 time, which is one reason we added pixel test tolerance in the first place. I 
 don't think running pixel tests on just one build slave will help us 
 understand that risk.
 
 Why not start with some low but non-zero tolerance (0.1?) and see if we can 
 at least make that work consistently, before we try the bolder step of 
 tolerance 0?
 
 Also, and as a side note, we probably need to add more build slaves to run 
 pixel tests at all, since just running the test suite without pixel tests is 
 already slow enough that the testers are often significantly behind the 
 builders.
 
 Regards,
 Maciej

Running pixel tests with a tolerance of 0.1 is still better than not running
pixel tests at all. So if we get a consensus with a small tolerance, I'm fine
with that.
And yes, we might get problems with a new MacOS release. We have a lot of
differences (0.1%) between 10.5 and 10.6 right now.
But I don't see a problem with it as long as someone manages the results. Niko
and I are doing it for SVG on MacOSX 10.6 and will also continue on 10.5 for a
while.

Dirk


Re: [webkit-dev] Pixel test experiment

2010-10-08 Thread Jeremy Orlow
I'm not an expert on Pixel tests, but my understanding is that in Chromium
(where we've always run with tolerance 0) we've seen real regressions that
would have slipped by with something like tolerance 0.1.  When you have
0 tolerance, it is more maintenance work, but if we can avoid regressions,
it seems worth it.
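
As an illustration of how a small regression can slip under a non-zero tolerance, here is a minimal Python sketch. It assumes the tolerance is a percentage of differing pixels, which is a simplification: the real image comparison tool may weight differences by colour distance, and the names diff_percent and passes are made up for this example.

```python
# Simplified sketch: tolerance as "percentage of pixels that differ".
# This is an assumption for illustration, not WebKit's actual metric.

def diff_percent(expected, actual):
    """Percentage of pixels that differ between two equally sized images.

    Images are lists of rows; each pixel is an (r, g, b) tuple.
    """
    if len(expected) != len(actual) or any(
            len(row_e) != len(row_a) for row_e, row_a in zip(expected, actual)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_e, row_a in zip(expected, actual)
        for px_e, px_a in zip(row_e, row_a)
        if px_e != px_a
    )
    return 100.0 * differing / total


def passes(expected, actual, tolerance):
    """A test passes when the difference does not exceed the tolerance."""
    return diff_percent(expected, actual) <= tolerance


# A 100x100 white baseline and a result with a single wrong pixel.
WHITE = (255, 255, 255)
baseline = [[WHITE] * 100 for _ in range(100)]
regressed = [row[:] for row in baseline]
regressed[10][10] = (0, 0, 0)

if __name__ == "__main__":
    print(passes(baseline, regressed, 0.1))  # one pixel in 10000 is 0.01%
    print(passes(baseline, regressed, 0))
```

Under this simplified metric the single changed pixel is accepted at tolerance 0.1 and rejected at tolerance 0, which is the kind of case described above.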

J

On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann 
zimmerm...@physik.rwth-aachen.de wrote:


 On 08.10.2010 at 19:53, Maciej Stachowiak wrote:



 On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:


 On 08.10.2010 at 00:44, Maciej Stachowiak wrote:


 On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:

  Good evening webkit folks,

 I've finished landing svg/ pixel test baselines, which pass with
 --tolerance 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to
 run them on the bots, experimentally, so we can catch regressions easily.

 Maybe someone with direct access to the leopard & snow leopard bots,
 could just run run-webkit-tests --tolerance 0 -p svg and mail me the
 results?
 If it passes, we could maybe run the pixel tests for the svg/
 subdirectory on these bots?


 Running pixel tests would be great, but can we really expect the results
 to be stable cross-platform with tolerance 0? Perhaps we should start with
 a higher tolerance level.


 Sure, we could do that. But I'd really like to get a feeling, for what's
 problematic first. If we see 95% of the SVG tests pass with --tolerance 0,
 and only a few need higher tolerances
 (64bit vs. 32bit aa differences, etc.), I could come up with a per-file
 pixel test tolerance extension to DRT, if it's needed.

 How about starting with just one build slave (say, Mac Leopard) that runs
 the pixel tests for SVG, with --tolerance 0 for a while. I'd be happy to
 identify the problems, and see
 if we can make it work, somehow :-)


 The problem I worry about is that on future Mac OS X releases, rendering
 of shapes may change in some tiny way that is not visible but enough to
 cause failures at tolerance 0. In the past, such false positives arose from
 time to time, which is one reason we added pixel test tolerance in the first
 place. I don't think running pixel tests on just one build slave will help
 us understand that risk.


 I think we'd just update the baseline to the newer OS X release, then, like
 it has been done for the tiger -> leopard, leopard -> snow leopard switch?
 platform/mac/ should always contain the newest release baseline; when
 there are differences on leopard, the results go into
 platform/mac-leopard/
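
The directory layout described above amounts to a lookup order: the older-release directory first, then platform/mac/. A minimal Python sketch of that lookup follows. It is an illustration only, not the test harness's code: the function find_baseline, its parameters, and the "-expected.png" file naming are assumptions made for this example.

```python
import os

# Assumed search order for a Leopard machine, per the description above.
LEOPARD_SEARCH_PATH = ("platform/mac-leopard", "platform/mac")


def find_baseline(layout_tests_dir, test_path,
                  search_path=LEOPARD_SEARCH_PATH, exists=os.path.exists):
    """Return the first pixel baseline found for test_path, or None.

    The "-expected.png" suffix is an assumed naming convention.
    `exists` is injectable so the lookup can be tried without real files.
    """
    name = os.path.splitext(test_path)[0] + "-expected.png"
    for platform_dir in search_path:
        candidate = os.path.join(layout_tests_dir, platform_dir, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_baseline("LayoutTests", "svg/custom/example.svg"))
```

With this order, a Leopard-specific result overrides the newest-release baseline only for tests where the two releases actually differ.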


  Why not start with some low but non-zero tolerance (0.1?) and see if we
 can at least make that work consistently, before we try the bolder step of
 tolerance 0?
 Also, and as a side note, we probably need to add more build slaves to run
 pixel tests at all, since just running the test suite without pixel tests is
 already slow enough that the testers are often significantly behind the
 builders.


 Well, I thought about just running the pixel tests for the svg/
 subdirectory as a separate step, hence my request for tolerance 0, as the
 baseline passes without problems at least on my & Dirk's machines already.
 I wouldn't want to argue for running 20,000+ pixel tests with tolerance 0 as
 a first step :-) But the 1000 SVG tests might be fine with tolerance 0?

 Even tolerance 0.1 as default for SVG would be fine with me, as long as we
 can get the bots to run the SVG pixel tests :-)

 Cheers,
 Niko




Re: [webkit-dev] Pixel test experiment

2010-10-07 Thread Dirk Schulze
I strongly support pixel tests for SVG on the bots! Niko and I are working
hard to get SVG pixel-perfect at all times. We run pixel tests on every patch
we apply to the SVG code. And it would really help us if the bots blamed any
change that causes a pixel test to fail, or at least gave some kind of feedback.

Dirk

 Good evening webkit folks,
 
 I've finished landing svg/ pixel test baselines, which pass with --tolerance 
 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to run 
 them on the bots, experimentally, so we can catch regressions easily.
 
 Maybe someone with direct access to the leopard & snow leopard bots, could
 just run run-webkit-tests --tolerance 0 -p svg and mail me the results?
 If it passes, we could maybe run the pixel tests for the svg/ subdirectory on 
 these bots?
 
 Cheers,
 Niko
 


Re: [webkit-dev] Pixel test experiment

2010-10-07 Thread Jeremy Orlow
This does seem like a great idea.  The more pixel tests we can run on the
bots, the better!

J

On Thu, Oct 7, 2010 at 11:06 AM, Dirk Schulze k...@webkit.org wrote:

 I strongly support pixel tests for SVG on the bots! Niko and I are working
 hard to get SVG pixel-perfect at all times. We run pixel tests on every
 patch we apply to the SVG code. And it would really help us if the bots
 blamed any change that causes a pixel test to fail, or at least gave some
 kind of feedback.

 Dirk

  Good evening webkit folks,
 
  I've finished landing svg/ pixel test baselines, which pass with
 --tolerance 0 on my 10.5 & 10.6 machines.
  As the pixel testing is very important for the SVG tests, I'd like to run
 them on the bots, experimentally, so we can catch regressions easily.
 
  Maybe someone with direct access to the leopard & snow leopard bots,
 could just run run-webkit-tests --tolerance 0 -p svg and mail me the
 results?
  If it passes, we could maybe run the pixel tests for the svg/
 subdirectory on these bots?
 
  Cheers,
  Niko
 


Re: [webkit-dev] Pixel test experiment

2010-10-07 Thread Maciej Stachowiak

On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:

 Good evening webkit folks,
 
 I've finished landing svg/ pixel test baselines, which pass with --tolerance 
 0 on my 10.5 & 10.6 machines.
 As the pixel testing is very important for the SVG tests, I'd like to run 
 them on the bots, experimentally, so we can catch regressions easily.
 
 Maybe someone with direct access to the leopard & snow leopard bots, could
 just run run-webkit-tests --tolerance 0 -p svg and mail me the results?
 If it passes, we could maybe run the pixel tests for the svg/ subdirectory on 
 these bots?

Running pixel tests would be great, but can we really expect the results to be 
stable cross-platform with tolerance 0? Perhaps we should start with a higher 
tolerance level.

Regards,
Maciej
