On Fri, Aug 17, 2012 at 11:06 AM, Dirk Pranke <dpra...@chromium.org> wrote:
> > On the other hand, the pixel test output that's correct to one expert may
> > not be correct to another expert. For example, I might think that one
> > editing test's output is correct because it shows the feature we're testing
> > in the test is working properly. But Stephen might realize that this
> > -expected.png contains an off-by-one Skia bug. So categorizing -correct.png
> > and -failure.png may require multiple experts looking at the output, which
> > may or may not be practical.
>
> Perhaps. Obviously (a) there's a limit to what you can do here, and
> (b) a test that requires multiple experts to verify its correctness
> is, IMO, a bad test :).

With that argument, almost all pixel tests are bad tests, because pixel
tests in editing, for example, involve editing, rendering, and graphics
code. I don't think any single person can comprehend the entire stack well
enough to tell with 100% confidence that the test result is exactly and
precisely correct.

> > I think we should just check in whatever result we're currently seeing
> > as -expected.png because we wouldn't at least have any ambiguity in the
> > process then. We just check in whatever we're currently seeing, file a
> > bug if we see a problem with the new result, and possibly roll out the
> > patch after talking with the author/reviewer.
>
> This is basically saying we should just follow the "existing
> non-Chromium" process, right?

Yes. In addition, doing so will significantly reduce the complexity of the
current process.

> This would seem to bring us back to step 1: it doesn't address the
> problem that I identified with the "existing non-Chromium" process,
> namely that a non-expert can't tell by looking at the checkout what
> tests are believed to be passing or not.

What is the use case for this? I've been working on WebKit for more than
three years, and I've never had to think about whether a test for an area
outside of my expertise has the correct output or not, other than when I
was gardening.
And having -correct / -failing wouldn't have helped me know what the
correct output was when I was gardening anyway, because the new output may
just as well be a new -correct or -failing result.

> I don't think searching bugzilla (as it is currently used) is a
> workable alternative.

Why not? Bugzilla is the tool we use to triage and track bugs. I don't see
a need for an alternative method to keep track of bugs.

> > The new result we check in may not be 100% right, but experts — e.g. me
> > for editing and Stephen for Skia — can come in and look at recent
> > changes to triage any new failures.
> >
> > In fact, it might be worthwhile for us to invest our time in improving
> > tools to do this. For example, can we have a portal where I can see new
> > rebaselines that happened in LayoutTests/editing and
> > LayoutTests/platform/*/editing since the last time I visited the
> > portal? e.g. it could show a chronological timeline of baselines along
> > with a possible cause (a list of changesets, maybe?) of each rebaseline.
>
> We could build such a portal, sure. I would be interested to hear from
> others whether such a thing would be more or less useful than my
> proposal.
>
> Of course, you could just set up a watchlist for new expectations
> today. Probably not quite as polished as we could get with a portal,
> but dirt simple.

That might be useful as long as it has an option to give us a digest
instead of sending me an e-mail per commit.

- Ryosuke
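[Editorial aside: for what it's worth, a rough digest of recent rebaselines can already be approximated from a git checkout of the tree, without building a portal. This is only a sketch; the two-week window and the pathspecs under LayoutTests/editing are illustrative, not an established WebKit tool:]

```shell
# Sketch: list commits from the last two weeks that touched pixel
# baselines under LayoutTests/editing or any platform override of it,
# showing which -expected.png files each commit changed.
# Assumes you are inside a git checkout of the WebKit repository.
git log --since="2 weeks ago" \
    --pretty=format:'%h %ad %s' --date=short --name-only \
    -- 'LayoutTests/editing/*-expected.png' \
       'LayoutTests/platform/*/editing/*-expected.png'
```

Piping this into a periodic e-mail would give something close to the digest behavior discussed above, grouped by changeset rather than one message per commit.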
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev