Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
FWIW, I needed NRWT to support --tolerance for something else today (mainly because when using it with the Mac port, it defaults to 0.1 tolerance, with no way to override it), so I added NRWT support for it: http://webkit.org/b/47959.

Mihai
Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
Simon, are you suggesting that we should only use pixel results for ref tests? If not, then we still need to come to a conclusion on this tolerance issue.

Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of --tolerance will be a lot of work of making sure all the pixel results that currently pass also pass with --tolerance=0. While I would support someone doing that work, I don't think we should block moving to NRWT on it.

Ojan
Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
I'm not sure if this could be made to work with SVG (it might require some additions to LayoutTestController), but Philip Taylor's canvas test suite (in LayoutTests/canvas/philip) compares pixels programmatically in JavaScript. This has the major advantage that it doesn't require pixel results, and allows for a per-test level of fuzziness/tolerance (if required). Obviously we would still want to have some tests remain pixel tests, as these tests only cover a subset of pixels, but it might be a good alternative to consider when writing new tests (especially for regressions, where a single correctly chosen pixel can often correctly isolate the problem).

Stephen
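Stephen's suggestion can be illustrated with a small sketch. The canvas/philip suite checks a handful of chosen pixels in script rather than diffing a whole reference image; the function and data names below are hypothetical stand-ins for that idea, not the actual suite's API, assuming pixels are plain RGBA tuples.

```python
def pixel_matches(actual, expected, tolerance=0):
    """Compare two RGBA pixels channel-by-channel within a tolerance."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))

def check_pixels(image, expectations, tolerance=0):
    """image and expectations map (x, y) -> (r, g, b, a).
    Returns the list of coordinates whose pixel is out of tolerance."""
    return [pt for pt, expected in expectations.items()
            if not pixel_matches(image[pt], expected, tolerance)]

# A 2x1 "rendered" canvas: one red pixel, one almost-green pixel
# (off by one in the green channel, as antialiasing might produce).
canvas = {(0, 0): (255, 0, 0, 255), (1, 0): (0, 254, 0, 255)}
expected = {(0, 0): (255, 0, 0, 255), (1, 0): (0, 255, 0, 255)}

failures_strict = check_pixels(canvas, expected, tolerance=0)
failures_fuzzy = check_pixels(canvas, expected, tolerance=2)
```

The per-test tolerance parameter is what distinguishes this from a global --tolerance flag: each test decides how fuzzy its own checks may be.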
Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
On Oct 14, 2010, at 9:06 AM, Ojan Vafai wrote: Simon, are you suggesting that we should only use pixel results for ref tests?

In an ideal world, yes. But we have such a huge body of existing tests that converting them all to ref tests is a non-starter, so I agree that we need to resolve the tolerance issue. However, at some point I'd like to see us get to a stage where new pixel tests must be ref tests.

Simon

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
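The ref-test model Simon describes can be sketched in a few lines: a test passes when its rendering is pixel-identical to the rendering of a separate reference page, so no checked-in pixel baseline (and no tolerance) is needed. The `render` function below is a toy stand-in; a real harness renders both pages with the same engine in the same run.

```python
def render(page):
    """Toy stand-in renderer: deterministically maps markup to a
    2x4 pixel grid (1 = filled). Real reftests render real pages."""
    filled = 2 if "two" in page else 1
    return tuple(tuple(1 if x < filled else 0 for x in range(4))
                 for _ in range(2))

def reftest_passes(test_page, ref_page):
    # Exact equality is safe here: both pages render in the same engine
    # on the same machine, so any mismatch indicates a real difference,
    # not a platform rendering quirk.
    return render(test_page) == render(ref_page)
```

This is why ref tests sidestep the cross-platform stability problem: the baseline is produced by the same engine at test time, rather than stored from some earlier OS release.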
Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
On Thu, Oct 14, 2010 at 9:06 AM, Ojan Vafai o...@chromium.org wrote: Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of --tolerance will be a lot of work of making sure all the pixel results that currently pass also pass with --tolerance=0. While I would support someone doing that work, I don't think we should block moving to NRWT on it.

Assuming we implement it only for the ports that currently use tolerance on old-run-webkit-tests, no, I wouldn't expect it to be hard. Dunno how much work it would be to implement tolerance in the chromium image_diff implementations (side note: it would be nice if these binaries weren't port-specific, but that's another topic).

As to how many files we'd have to rebaseline for the base ports, I don't know how many there are compared to how many fail pixel tests, period. I'll run a couple tests and find out.

-- Dirk
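For readers unfamiliar with what --tolerance actually gates, here is a minimal sketch of the concept: count the pixels that differ between the actual and baseline images and pass if the percentage of differing pixels is at or below the tolerance. This illustrates the idea only; it is not the actual algorithm of the image_diff binaries or of old-run-webkit-tests, which may weight per-channel differences differently.

```python
def diff_percent(actual, baseline):
    """Both images are equal-length flat sequences of RGBA tuples.
    Returns the percentage of pixel positions that differ at all."""
    assert len(actual) == len(baseline)
    differing = sum(1 for a, b in zip(actual, baseline) if a != b)
    return 100.0 * differing / len(actual)

def images_match(actual, baseline, tolerance=0.0):
    # tolerance=0.0 is the exact-match regime Chromium runs with;
    # tolerance=0.1 is the Mac port's default mentioned in this thread.
    return diff_percent(actual, baseline) <= tolerance

# 1000-pixel images differing in exactly one pixel: a 0.1% difference,
# invisible to a human but a real failure at tolerance 0.
baseline = [(255, 255, 255, 255)] * 1000
actual = list(baseline)
actual[0] = (254, 255, 255, 255)
```

The thread's disagreement is precisely about that boundary case: a one-pixel antialiasing shift passes at 0.1 but fails at 0, and so would a genuine one-pixel regression.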
[webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
Jeremy is correct; the Chromium port has seen real regressions that virtually no concept of a fuzzy match that I can imagine would've caught. new-run-webkit-tests doesn't currently support the tolerance concept at all, and I am inclined to argue that it shouldn't. However, I frequently am wrong about things, so it's quite possible that there are good arguments for supporting it that I'm not aware of.

I'm not particularly interested in working on a tool that doesn't do what the group wants it to do, and I would like all of the other WebKit ports to be running pixel tests by default (and new-run-webkit-tests ;) ) since I think it catches bugs. As far as I know, the general sentiment on the list has been that we should be running pixel tests by default, and the reason that we aren't is largely due to the work involved in getting them back up to date and keeping them up to date. I'm sure that fuzzy matching reduces the workload, especially for the sort of mismatches caused by differences in text antialiasing.

In addition, I have heard concerns that we'd like to keep fuzzy matching because people might potentially get different results on machines with different hardware configurations, but I don't know that we have any confirmed cases of that (except for, arguably, the case of different code paths for gpu-accelerated rendering vs. unaccelerated rendering). If we made it easier to maintain the baselines (improved tooling like chromium's rebaselining tool, added reftest support, etc.), are there still compelling reasons for supporting --tolerance-based testing as opposed to exact matching?

-- Dirk

On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow jor...@chromium.org wrote:

I'm not an expert on pixel tests, but my understanding is that in Chromium (where we've always run with tolerance 0) we've seen real regressions that would have slipped by with something like tolerance 0.1. When you have 0 tolerance, it is more maintenance work, but if we can avoid regressions, it seems worth it.

J

On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann zimmerm...@physik.rwth-aachen.de wrote (quoting an exchange with Maciej Stachowiak from Oct 7-8):

[Zimmermann] Good evening webkit folks, I've finished landing svg/ pixel test baselines, which pass with --tolerance 0 on my 10.5 and 10.6 machines. As the pixel testing is very important for the SVG tests, I'd like to run them on the bots, experimentally, so we can catch regressions easily. Maybe someone with direct access to the Leopard and Snow Leopard bots could just run run-webkit-tests --tolerance 0 -p svg and mail me the results? If it passes, we could maybe run the pixel tests for the svg/ subdirectory on these bots?

[Maciej] Running pixel tests would be great, but can we really expect the results to be stable cross-platform with tolerance 0? Perhaps we should start with a higher tolerance level.

[Zimmermann] Sure, we could do that. But I'd really like to get a feeling for what's problematic first. If we see 95% of the SVG tests pass with --tolerance 0, and only a few need higher tolerances (64-bit vs. 32-bit antialiasing differences, etc.), I could come up with a per-file pixel test tolerance extension to DRT, if it's needed. How about starting with just one build slave (say, Mac Leopard) that runs the pixel tests for SVG with --tolerance 0 for a while? I'd be happy to identify the problems, and see if we can make it work, somehow :-)

[Maciej] The problem I worry about is that on future Mac OS X releases, rendering of shapes may change in some tiny way that is not visible but enough to cause failures at tolerance 0. In the past, such false positives arose from time to time, which is one reason we added pixel test tolerance in the first place. I don't think running pixel tests on just one build slave will help us understand that risk.

[Zimmermann] I think we'd just update the baseline to the newer OS X release then, like it has been done for the Tiger -> Leopard and Leopard -> Snow Leopard switches? platform/mac/ should always contain the newest release baseline; when there are differences on Leopard, the results go into platform/mac-leopard/.

[Maciej] Why not start with some low but non-zero tolerance (0.1?) and see if we can at least make that work consistently, before we try the bolder step of tolerance 0? Also, as a side note, we probably need to add more build slaves to run pixel tests at all, since just running the test suite without pixel tests is already slow enough that the testers are often significantly behind the builders.

[Zimmermann] Well, I thought about just running the pixel tests for the svg/ subdirectory as a separate step, hence my request for tolerance 0, as the baseline already passes without problems, at least on my and Dirk's machines. I wouldn't want to argue for running 20,000+ pixel tests with tolerance 0 as a first step :-)
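The baseline-fallback rule Nikolas describes can be sketched as a simple search-path lookup: a port first looks for expected results in its version-specific directory (platform/mac-leopard) and falls back to the shared one (platform/mac), which holds the newest release's results. The directory contents below are a toy stand-in for a real checkout, and `find_baseline` is a hypothetical name, not the harness's actual function.

```python
def find_baseline(test, search_path, checkout):
    """Return the first directory on the search path that contains a
    baseline for the test, or None if no port provides one."""
    for directory in search_path:
        if test in checkout.get(directory, set()):
            return directory
    return None

# Toy checkout: shape.svg renders differently on Leopard, so it has a
# Leopard-specific baseline; text.svg only has the shared (newest) one.
checkout = {
    "platform/mac-leopard": {"svg/shape.svg"},
    "platform/mac": {"svg/shape.svg", "svg/text.svg"},
}
leopard_search_path = ["platform/mac-leopard", "platform/mac"]
```

Under this scheme, an OS update means moving the new results into platform/mac/ and leaving the old release's differing results in its version-specific directory, which is the rebaselining work Maciej is weighing against a non-zero tolerance.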
Re: [webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)
I think the best solution to this pixel matching problem is ref tests. How practical would it be to use ref tests for SVG?

Simon