Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-23 Thread Alexey Proskuryakov

23 янв. 2015 г., в 10:01, Ryosuke Niwa  написал(а):

> We could add a new test expectation like ImageDiff to suppress these or we 
> could expose a new internals or testRunner methods to mark a test as such.

Good idea! An ImageHashMismatch expectation seems like a reasonable way to 
silence the noise.

Also, I am surprised that we weren't getting a specific message about size 
mismatch here, as Darin suggested. In fact, we used to have this very output on 
other tests until very recently (see 
).

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-23 Thread Simon Fraser
On Jan 23, 2015, at 10:01 AM, Ryosuke Niwa  wrote:
> 
> 
> On Friday, January 23, 2015, Darin Adler  > wrote:
> > On Jan 22, 2015, at 10:36 PM, Alexey Proskuryakov  > > wrote:
> >
> >> 22 янв. 2015 г., в 17:57, Darin Adler > 
> >> написал(а):
> >>
> >> What about the test I cited?
> >>
> >> svg/css/svg-resource-fragment-identifier-img-src.html
> >
> > This particular test is buggy - it is a hidpi test, so it dumps results as 
> > a 1600x1200 image, but its -expected.html is not hidpi, and is dumped as 
> > 800x600, so hashes are obviously different. I now filed 
> >  > > about this test.
> >
> > When we compare pixels, we draw both images into bitmaps of the same size, 
> > so they become similar enough for ImageDiff to consider them identical.
> >
> > Earlier today, Simon and I agreed that we should just silence the error 
> > message, because it only tells us about minor color differences that are 
> > inevitable when comparing compositing vs. non-compositing. Looks like it 
> > tells us about more actionable issues, too. Also, I just found 
> >  > >, and I think that its 
> > rationale applies in this case, too.
> >
> > So we should probably keep this error message. I'm not sure whether we 
> > should make it a hard error though.
> 
> I think that to improve things we should make an informative message for this 
> particular mistake where the expected file has the wrong size or resolution.
> 
> For me, the bad thing about the current message is not simply that it’s a 
> false positive, but that it’s an unclear error message that covers too many 
> different possibilities.
> 
> I’m not sure that keeping things as is will be a good strategy. A message 
> that is often expected but sometimes indicates a real problem is not good for 
> the project. The average engineer has no idea whether to ignore these or act 
> on them!

Agreed. The size difference certainly needs to be a failure.

> 
> We could add a new test expectation like ImageDiff to suppress these or we 
> could expose a new internals or testRunner methods to mark a test as such.

I don’t think that addresses the issue. Darin’s point is that if a developer 
sees this output (possibly when running tests after making a change), it needs 
to describe something actionable.

Just suppressing this output for existing tests doesn’t help when the output 
crops up unexpectedly.

Simon

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-23 Thread Ryosuke Niwa
On Friday, January 23, 2015, Darin Adler  wrote:

> > On Jan 22, 2015, at 10:36 PM, Alexey Proskuryakov  > wrote:
> >
> >> 22 янв. 2015 г., в 17:57, Darin Adler >
> написал(а):
> >>
> >> What about the test I cited?
> >>
> >> svg/css/svg-resource-fragment-identifier-img-src.html
> >
> > This particular test is buggy - it is a hidpi test, so it dumps results
> as a 1600x1200 image, but its -expected.html is not hidpi, and is dumped as
> 800x600, so hashes are obviously different. I now filed <
> https://bugs.webkit.org/show_bug.cgi?id=140815> about this test.
> >
> > When we compare pixels, we draw both images into bitmaps of the same
> size, so they become similar enough for ImageDiff to consider them
> identical.
> >
> > Earlier today, Simon and I agreed that we should just silence the error
> message, because it only tells us about minor color differences that are
> inevitable when comparing compositing vs. non-compositing. Looks like it
> tells us about more actionable issues, too. Also, I just found <
> https://bugs.webkit.org/show_bug.cgi?id=69444>, and I think that its
> rationale applies in this case, too.
> >
> > So we should probably keep this error message. I'm not sure whether we
> should make it a hard error though.
>
> I think that to improve things we should make an informative message for
> this particular mistake where the expected file has the wrong size or
> resolution.
>
> For me, the bad thing about the current message is not simply that it’s a
> false positive, but that it’s an unclear error message that covers too many
> different possibilities.
>
> I’m not sure that keeping things as is will be a good strategy. A message
> that is often expected but sometimes indicates a real problem is not good
> for the project. The average engineer has no idea whether to ignore these
> or act on them!


We could add a new test expectation like ImageDiff to suppress these or we
could expose a new internals or testRunner methods to mark a test as such.

 - R. Niwa


-- 
- R. Niwa
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-23 Thread Darin Adler
> On Jan 22, 2015, at 10:36 PM, Alexey Proskuryakov  wrote:
> 
>> 22 янв. 2015 г., в 17:57, Darin Adler  написал(а):
>> 
>> What about the test I cited?
>> 
>> svg/css/svg-resource-fragment-identifier-img-src.html
> 
> This particular test is buggy - it is a hidpi test, so it dumps results as a 
> 1600x1200 image, but its -expected.html is not hidpi, and is dumped as 
> 800x600, so hashes are obviously different. I now filed 
>  about this test.
> 
> When we compare pixels, we draw both images into bitmaps of the same size, so 
> they become similar enough for ImageDiff to consider them identical.
> 
> Earlier today, Simon and I agreed that we should just silence the error 
> message, because it only tells us about minor color differences that are 
> inevitable when comparing compositing vs. non-compositing. Looks like it 
> tells us about more actionable issues, too. Also, I just found 
> , and I think that its 
> rationale applies in this case, too.
> 
> So we should probably keep this error message. I'm not sure whether we should 
> make it a hard error though.

I think that to improve things we should make an informative message for this 
particular mistake where the expected file has the wrong size or resolution.

For me, the bad thing about the current message is not simply that it’s a false 
positive, but that it’s an unclear error message that covers too many different 
possibilities.

I’m not sure that keeping things as is will be a good strategy. A message that 
is often expected but sometimes indicates a real problem is not good for the 
project. The average engineer has no idea whether to ignore these or act on 
them!

— Darin
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Alexey Proskuryakov

> 22 янв. 2015 г., в 17:57, Darin Adler  написал(а):
> 
> What about the test I cited?
> 
> svg/css/svg-resource-fragment-identifier-img-src.html

This particular test is buggy - it is a hidpi test, so it dumps results as a 
1600x1200 image, but its -expected.html is not hidpi, and is dumped as 800x600, 
so hashes are obviously different. I now filed 
 about this test.

When we compare pixels, we draw both images into bitmaps of the same size, so 
they become similar enough for ImageDiff to consider them identical.

Earlier today, Simon and I agreed that we should just silence the error 
message, because it only tells us about minor color differences that are 
inevitable when comparing compositing vs. non-compositing. Looks like it tells 
us about more actionable issues, too. Also, I just found 
, and I think that its rationale 
applies in this case, too.

So we should probably keep this error message. I'm not sure whether we should 
make it a hard error though.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Darin Adler
> On Jan 22, 2015, at 11:30 AM, Simon Fraser  wrote:
> 
> For example, some of the tests in question render a green box either via 
> CALayers, or by painting, and there’s a slight color difference between the 
> two code paths.

What about the test I cited?

svg/css/svg-resource-fragment-identifier-img-src.html

I don’t think the layer vs. painting explains the difference. Is there any 
reason to expect that one to have a “close but not identical” result?

— Darin
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Dirk Pranke
On Thu, Jan 22, 2015 at 1:56 PM, Darin Adler  wrote:

> > On Jan 22, 2015, at 12:58 PM, Dirk Pranke  wrote:
> >
> > Of course, Chromium was more willing to incur the cost of keeping all of
> the pixel tests up to date when trivial things changed the output
>
> This comment, and much of the discussion, seems to be about pixel tests.
> But what does this have to do with reference tests?
>

The message has nothing to do with whether the test is a pixel test or a
ref test.

My reply was intended to explain why we had the "hashes didn't match but
diff passed" message, and the discrepancy around fuzzy matching; on the
Chromium ports, if run-webkit-tests had printed this message, it would've
served as a warning to the user/developer that there was something weird
going on.

If you all decide you wanted to keep fuzzy matching enabled (and hence that
message would happen regularly and could be safely ignored), I would either
delete the message outright or at least make it be a debug message.

You could also decide you wanted to do fuzzy matching only on ref tests, or
only on pixel tests, and change the logic accordingly, of course.

-- Dirk
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Darin Adler
> On Jan 22, 2015, at 12:58 PM, Dirk Pranke  wrote:
> 
> Of course, Chromium was more willing to incur the cost of keeping all of the 
> pixel tests up to date when trivial things changed the output

This comment, and much of the discussion, seems to be about pixel tests. But 
what does this have to do with reference tests?

— Darin
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Dirk Pranke
On Thu, Jan 22, 2015 at 11:48 AM, Alexey Proskuryakov  wrote:

>
> 22 янв. 2015 г., в 11:30, Simon Fraser 
> написал(а):
>
> > This happens when the expected and actual images are very close, but not
> identical. ImageDiff has some built-in rounding that effectively acts as a
> small tolerance, so the hashes are different, but ImageDiff incorrectly
> says the images are the same. For example, some of the tests in question
> render a green box either via CALayers, or by painting, and there’s a
> slight color difference between the two code paths.
> >
> > My preference for how to fix this would be to fix ImageDiff to remove
> its slight built-in tolerance, and then to expose testRunner API to allow a
> test to set an explicit tolerance. There are many cases where we’d like to
> use ref tests, but are unable to because of slight, justifiable rendering
> differences, and having an explicit tolerance would permit the use of ref
> tests in these cases.
>
> One thing about tolerance is that it is super confusing - are we talking
> about the number of pixels that are different, or about how different the
> pixels are? Also, a lot of failures only cause small differences in pixel
> results. Even a 100x100 box that becomes red instead of green is only a
> small portion of the 800x600 image, and it's even more the case for tests
> that check e.g. text rendering.
>
> It is not currently known what the root causes are for the tests that say
> "ref test hashes didn't match but diff passed". Given that the differences
> are very tiny, one guess is that even though compositing and
> non-compositing code paths are mathematically equivalent, there are
> different rendering steps taken, and rounding at each step adds up to
> slight differences. Another theory is that we have actual bugs, such as
> with color management.
>
> If it's just rounding differences, then the right thing to do is probably
> to silence the console output, keeping behavior the same otherwise.
>

Assuming nothing much has changed since I wrote that code, there are a
couple of ways you can get that message.

As you've found, the primary culprit is probably the fuzzy matching. We
turned off fuzzy matching on the Chromium ports because I never liked the
idea; I always was worried that it would mask real bugs.
As Alexey points out, the problem with fuzzy matching is that it's not
clear what exactly the definition of fuzziness is. What may be
appropriately fuzzy in one set of tests may be entirely inappropriate for
others.

Of course, Chromium was more willing to incur the cost of keeping all of
the pixel tests up to date when trivial things changed the output, to the
detriment of everyone's repo sizes and checkout times :).

There may be other things that can also cause mismatches; I remember Simon
bugging me from time to time when Chromium would check in a PNG with an
alpha channel, which would through off the checksum on the Mac ports (or
something like that). I don't know that I would expect you to be hitting
those these days, though.

Hopefully that provides a little bit of context,

-- Dirk
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Alexey Proskuryakov

22 янв. 2015 г., в 11:30, Simon Fraser  написал(а):

> This happens when the expected and actual images are very close, but not 
> identical. ImageDiff has some built-in rounding that effectively acts as a 
> small tolerance, so the hashes are different, but ImageDiff incorrectly says 
> the images are the same. For example, some of the tests in question render a 
> green box either via CALayers, or by painting, and there’s a slight color 
> difference between the two code paths.
> 
> My preference for how to fix this would be to fix ImageDiff to remove its 
> slight built-in tolerance, and then to expose testRunner API to allow a test 
> to set an explicit tolerance. There are many cases where we’d like to use ref 
> tests, but are unable to because of slight, justifiable rendering 
> differences, and having an explicit tolerance would permit the use of ref 
> tests in these cases.

One thing about tolerance is that it is super confusing - are we talking about 
the number of pixels that are different, or about how different the pixels are? 
Also, a lot of failures only cause small differences in pixel results. Even a 
100x100 box that becomes red instead of green is only a small portion of the 
800x600 image, and it's even more the case for tests that check e.g. text 
rendering.

It is not currently known what the root causes are for the tests that say "ref 
test hashes didn't match but diff passed". Given that the differences are very 
tiny, one guess is that even though compositing and non-compositing code paths 
are mathematically equivalent, there are different rendering steps taken, and 
rounding at each step adds up to slight differences. Another theory is that we 
have actual bugs, such as with color management.

If it's just rounding differences, then the right thing to do is probably to 
silence the console output, keeping behavior the same otherwise.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Simon Fraser

> On Jan 22, 2015, at 8:44 AM, Darin Adler  wrote:
> 
> I noticed that when running tests I get messages like this:
> 
>svg/css/svg-resource-fragment-identifier-img-src.html -> ref test hashes 
> didn't match but diff passed
> 
> I understand that we can have hash mismatches in actual checked in images 
> (expected.png files), and there, at least, I believe the fix might be to 
> regenerate the image. But for a ref test I am really unclear on this. And 
> further, I think it’s distracting to get messages like this unless you are 
> specifically working on the testing infrastructure.
> 
> Why should the average WebKit engineer care about this? Does this reflect a 
> real performance problem when running tests? If so, who would fix it and what 
> would that person do?

Alexey, Tim and I looked at this yesterday.

This happens when the expected and actual images are very close, but not 
identical. ImageDiff has some built-in rounding that effectively acts as a 
small tolerance, so the hashes are different, but ImageDiff incorrectly says 
the images are the same. For example, some of the tests in question render a 
green box either via CALayers, or by painting, and there’s a slight color 
difference between the two code paths.

My preference for how to fix this would be to fix ImageDiff to remove its 
slight built-in tolerance, and then to expose testRunner API to allow a test to 
set an explicit tolerance. There are many cases where we’d like to use ref 
tests, but are unable to because of slight, justifiable rendering differences, 
and having an explicit tolerance would permit the use of ref tests in these 
cases.

Simon

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


[webkit-dev] run-webkit-tests question; hashes when comparing ref test output

2015-01-22 Thread Darin Adler
I noticed that when running tests I get messages like this:

svg/css/svg-resource-fragment-identifier-img-src.html -> ref test hashes 
didn't match but diff passed

I understand that we can have hash mismatches in actual checked in images 
(expected.png files), and there, at least, I believe the fix might be to 
regenerate the image. But for a ref test I am really unclear on this. And 
further, I think it’s distracting to get messages like this unless you are 
specifically working on the testing infrastructure.

Why should the average WebKit engineer care about this? Does this reflect a 
real performance problem when running tests? If so, who would fix it and what 
would that person do?

— Darin
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev