Asserting that a test case is 100% correct is nearly impossible for a large percentage of tests. The main advantage that distinction gives us is the ability to have -expected mean "unsure".
Let's instead only add -failing (i.e. no -passing), leaving -expected to mean roughly what it does today to Chromium folk (roughly: as best we can tell, this test is passing). -failing means the result is *probably* incorrect but needs an expert to look at it, either to mark it correct (i.e. rename it to -expected) or to figure out the root cause of the bug. This matches exactly what Chromium gardeners do today, except that instead of putting a line in TestExpectations/Skipped to look at later, they check in the -failing file to look at later, which has all the advantages Dirk listed in the other thread. As with Dirk's proposal, having both a -failing and an -expected file in the same directory for the same test would be disallowed by the tooling.

The reason I like this is that it's more use-case driven. -failing is a clear todo list for anyone wanting to fix layout tests.

On Fri, Aug 17, 2012 at 11:52 AM, Dirk Pranke <dpra...@chromium.org> wrote:
> On Fri, Aug 17, 2012 at 11:29 AM, Ryosuke Niwa <rn...@webkit.org> wrote:
> > On Fri, Aug 17, 2012 at 11:06 AM, Dirk Pranke <dpra...@chromium.org> wrote:
> >>
> >> > On the other hand, the pixel test output that's correct to one expert
> >> > may not be correct to another expert. For example, I might think that
> >> > one editing test's output is correct because it shows the feature
> >> > we're testing in the test is working properly. But Stephen might
> >> > realize that this -expected.png contains an off-by-one Skia bug. So
> >> > categorizing -correct.png and -failure.png may require multiple
> >> > experts looking at the output, which may or may not be practical.
> >>
> >> Perhaps. Obviously (a) there's a limit to what you can do here, and
> >> (b) a test that requires multiple experts to verify its correctness
> >> is, IMO, a bad test :).
>
> > With that argument, almost all pixel tests are bad tests because pixel
> > tests in editing, for example, involve editing, rendering, and graphics
> > code.
>
> If in order to tell a pixel test is correct you need to be aware of
> how all of that stuff works, then, yes, it's a bad test. It can fail
> too many different ways, and is testing too many different bits of
> information. As Filip might suggest, it would be nice if we could
> split such tests up. That said, I will freely grant that in many cases
> we can't easily do better given the way things are currently
> structured, and splitting up such tests would be an enormous amount of
> work.
>
> If the pixel test is testing whether a rectangle is actually green or
> actually red, such a test is fine, doesn't need much subject matter
> expertise, and it is hard to imagine how you'd test such a thing some
> other way.
>
> > I don't think any single person can comprehend the entire stack to tell
> > with 100% confidence that the test result is exactly and precisely
> > correct.
>
> Sure. Such a high bar should be avoided.
>
> >> > I think we should just check in whatever result we're currently
> >> > seeing as -expected.png because we wouldn't at least have any
> >> > ambiguity in the process then. We just check in whatever we're
> >> > currently seeing and file a bug if we see a problem with the new
> >> > result and possibly roll out the patch after talking with the
> >> > author/reviewer.
> >>
> >> This is basically saying we should just follow the "existing
> >> non-Chromium" process, right?
>
> > Yes. In addition, doing so will significantly reduce the complexity of
> > the current process.
>
> >> This would seem to bring us back to step
> >> 1: it doesn't address the problem that I identified with the "existing
> >> non-Chromium" process, namely that a non-expert can't tell by looking
> >> at the checkout what tests are believed to be passing or not.
>
> > What is the use case of this? I've been working on WebKit for more than
> > 3 years, and I've never had to think about whether a test for an area
> > outside of my expertise has the correct output or not other than when I
> > was gardening. And having -correct / -failing wouldn't have helped me
> > know what the correct output was when I was gardening anyway, because
> > the new output may as well be a new -correct or -failing result.
>
> I've done this frequently when gardening, when simply trying to learn
> how a given chunk of code works and how a given chunk of tests work
> (or don't work), and when trying to get a sense of how well our
> product is or isn't passing tests.
>
> Perhaps this is the case because I tend to work more on infrastructure
> and testing, and look at stuff shallowly across the whole tree rather
> than in depth in particular areas as you do.
>
> >> I don't think searching bugzilla (as it is currently used) is a
> >> workable alternative.
>
> > Why not? Bugzilla is the tool we use to triage and track bugs. I don't
> > see a need for an alternative method to keep track of bugs.
>
> The way we currently use bugzilla, it is difficult if not impossible
> to find a concise and accurate list of all the failing layout tests
> meeting any sort of filename- or directory-based criteria (maybe you
> can do it just for editing, I don't know). The layout test summary
> reports that Ojan sends out to the chromium devs is an example of
> this: he generates that from the TestExpectations files; doing so from
> bugzilla is not currently feasible.
>
> Note that we could certainly extend bugzilla to make this easier, if
> there was consensus to do so (and I would be in favor of this, but
> that would also incur more process than we have today).
>
> - Dirk
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev
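P.S. For concreteness, the tooling check I mention above (disallowing both a -failing and an -expected baseline for the same test in the same directory) could be sketched roughly like this; the function name and `baseline_dir` parameter are illustrative, not actual webkitpy API:

```python
import os
from collections import defaultdict

def find_conflicting_baselines(baseline_dir):
    """Return test names that have both an -expected and a -failing
    baseline in the same directory; the tooling would reject these."""
    suffixes = ("-expected", "-failing")
    seen = defaultdict(set)  # test name -> set of suffixes present
    for filename in os.listdir(baseline_dir):
        root, _ext = os.path.splitext(filename)
        for suffix in suffixes:
            if root.endswith(suffix):
                seen[root[:-len(suffix)]].add(suffix)
    # A test conflicts only when every suffix is present for it.
    return sorted(name for name, found in seen.items()
                  if len(found) == len(suffixes))
```

A presubmit or lint step would fail whenever this returns a non-empty list.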
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev