Re: [webkit-dev] Process for importing third party tests
Thanks for the comments, Ryosuke. My replies are inline ...

On Tue, May 8, 2012 at 12:42 PM, Ryosuke Niwa cont...@rniwa.com wrote:

On Wed, Apr 25, 2012 at 2:18 PM, Dirk Pranke dpra...@chromium.org wrote:

1b. Run suite locally in WebKit directory
  * Ref Tests
    * Pass - good, submit it
    * Fail - create a testName-expected-failure.txt file

We don't currently generate an -expected-failure.txt for reftests; there's no concept of a baseline for a reftest at all.

We need such a concept because we want to be able to do:

Put differently, assuming the normal WebKit model of "the baseline is what we currently produce", we don't actually have a way of capturing what we currently produce for a reference test. I am also more than a little leery of mixing -expected.{txt,png} results with -expected.html results; I feel like it would be very confusing, and it would lose many of the advantages of reftests, since we'd presumably have to update the reference as often as we update pixel tests. [We could create a fake reftest that just contained the 800x600 pixel dump, but I'm not sure if that's better or not].

I don't think we will lose advantages. Without some sort of record of the current result, we will not be able to catch regressions. According to Maciej, we've caught many regressions by using conformance tests this way.

I have no doubt that you can catch regressions by using conformance tests this way. My concern -- which is expressed throughout and which you seemed to either miss or downplay -- is that adding more tests creates more work, especially for pixel tests (and it slows down build and test cycles, obviously). I don't think we should just add tests because they exist somewhere on the web; they may provide no additional coverage beyond the tests we already have. I feel like we need a stronger mechanism to either check that new test suites do cover more functionality or move to retire tests we already have.

To be clear, I am all for importing test suites when we believe they are comprehensive or do cover things we don't cover well now. But, for example, rather than having four different test suites for flexbox, I would rather see us have one good one.

ii. DRT / Pixel tests
  * Expectations: create render tree dumps for each test and use that output as the new test expectation
  * Potential regressions will be identified by changes to this output
  * Proposal (open to discussion) - stop the production of .PNGs for these imported tests
    * PROS
      * Avoid the increase to the overall size of the repository caused by all the new PNGs
      * Regressions will likely be caught by the render tree dumps
      * Avoid maintenance of all associated PNGs
    * CONS
      * Regressions may be missed without the use of .PNGs
      * May make test results harder to interpret

I'm not particularly a fan of this. I think each port should follow its own convention for pixel tests or not, i.e., if Chromium normally runs pixel tests, it should run all of these tests as pixel tests; if Gtk doesn't, then it should just check the render trees here as well.

Right. I think we should just generate PNG files.

Also, I was under the impression that (a) the W3C is mostly focused on ref tests going forward and (b) we had agreed in that discussion that we wouldn't import non-ref tests? Did something change in a discussion after that session?

No. We had agreed to import all tests regardless of whether they're reftests or not because non-reftests can still catch future regressions or progressions. And the number of PNG files added to the repository wasn't considered a valid counter-argument due to this utility.

Well, I certainly didn't agree to it :) My concern is not the # of PNGs so much as the cost of maintenance.

iii. JavaScript tests
  * Pass - good, submit it (along with new expected.txt file - W3C does not use an expected.txt file for JS tests)
  * Fail - Add to test_expectations file to avoid failures
  * Over time, individuals can clean up failing JS tests

If they don't have expected.txt files, how is failure determined? Why would we want to add failures to test_expectations.txt here but not for pixel tests or reftests? If anything, these text-only tests are *more* amenable to checking in the "what we do now, even if it's wrong" expectation.

I agree we should just generate expected.txt.

iv. Manual tests
  * Submit in their current form
  * Over time, convert to ref tests to be submitted back to W3C

I don't know what "submit in their current form" means ... doesn't submitting have to do with exporting tests (i.e., importing into the w3c repos), and we're talking about importing tests? Are Manual tests somehow different from the other non-ref tests?

I think what he meant is to just import as is. (They won't work as intended but we can't really do anything about it).

1. How should W3C
Re: [webkit-dev] Process for importing third party tests
On Tue, May 8, 2012 at 1:04 PM, Dirk Pranke dpra...@chromium.org wrote:

On Tue, May 8, 2012 at 12:42 PM, Ryosuke Niwa rn...@webkit.org wrote:

On Wed, Apr 25, 2012 at 2:18 PM, Dirk Pranke dpra...@chromium.org wrote:

I am also more than a little leery of mixing -expected.{txt,png} results with -expected.html results; I feel like it would be very confusing, and it would lose many of the advantages of reftests, since we'd presumably have to update the reference as often as we update pixel tests. [We could create a fake reftest that just contained the 800x600 pixel dump, but I'm not sure if that's better or not].

I don't think we will lose advantages. Without some sort of record of the current result, we will not be able to catch regressions. According to Maciej, we've caught many regressions by using conformance tests this way.

I have no doubt that you can catch regressions by using conformance tests this way. My concern -- which is expressed throughout and which you seemed to either miss or downplay -- is that adding more tests creates more work, especially for pixel tests (and it slows down build and test cycles, obviously). I don't think we should just add tests because they exist somewhere on the web; they may provide no additional coverage beyond the tests we already have.

Yes, that's why we want reviews. And that's why we should only import tests from W3C instead of other browser vendors. We expect W3C to have some guideline on avoiding test duplicates.

I feel like we need a stronger mechanism to either check that new test suites do cover more functionality or move to retire tests we already have.

We can't really verify automatically that two tests test the same functionality, etc. Also, the most general form of this question is undecidable. However, there are a couple of ways to mitigate this issue:

1. Upstream as many layout tests as possible to W3C
2. Delete duplicate layout tests as we import more tests from W3C

To be clear, I am all for importing test suites when we believe they are comprehensive or do cover things we don't cover well now.

This should probably be judged by individual reviewers.

But, for example, rather than having four different test suites for flexbox, I would rather see us have one good one.

Sure, but I'd say that's a really hard problem to solve.

Also, I was under the impression that (a) the W3C is mostly focused on ref tests going forward and (b) we had agreed in that discussion that we wouldn't import non-ref tests? Did something change in a discussion after that session?

No. We had agreed to import all tests regardless of whether they're reftests or not because non-reftests can still catch future regressions or progressions. And the number of PNG files added to the repository wasn't considered a valid counter-argument due to this utility.

Well, I certainly didn't agree to it :) My concern is not the # of PNGs so much as the cost of maintenance.

Sure, that's a valid concern, but the overwhelming majority of the people in the room (e.g. Darin, Maciej, etc.) seemed to agree that this is a good idea.

I don't understand your proposal about adding platform/webkit. Why do we want that? As far as I know, there are no files in W3C test directories that end with -expected.txt or -expected.png.

The idea would be that no webkit-specific files would live in the test directory, only files received from upstream. My thinking was that it would make importing new versions easier and it would be easier to understand what was ours vs. what was theirs. I don't feel that strongly about this, though, it was just an idea.

That'll be nice indeed. But if we're going this route, we should probably move all existing -expected.* to this directory as well. So this is probably a tangential issue.

Similar to:

By "ultimately move all existing tests", I assume you're including tests that are currently in LayoutTests that have not come from (or been submitted to) the W3C, e.g., the tests in fast/ ?

Yes. I think reorganizing our existing test tree is an entirely different discussion. I'm all for it, I just don't want to confuse it with the discussion about importing test suites.

- Ryosuke

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
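[Editorial sketch] The platform/webkit idea in the message above (keep only upstream files in the imported test directory and put WebKit-generated baselines elsewhere) could look roughly like the following. This is a minimal illustration, not an existing WebKit tool: the function name `plan_layout` and the exact directory layout under `platform/webkit` are assumptions, and it relies on the observation in the thread that no W3C files end with -expected.txt or -expected.png.

```python
import posixpath

# Suffixes the thread says do not occur in W3C test directories, so any file
# carrying one is assumed here to be a WebKit-generated baseline.
BASELINE_SUFFIXES = ("-expected.txt", "-expected.png")


def plan_layout(files, baseline_root="platform/webkit"):
    """Split paths into upstream files and relocated WebKit baselines.

    `baseline_root` is a hypothetical location; the thread only names
    platform/webkit as an idea, not a concrete layout.
    """
    upstream = []
    relocated = {}
    for path in files:
        if path.endswith(BASELINE_SUFFIXES):
            # Mirror the test's relative path under the baseline root.
            relocated[path] = posixpath.join(baseline_root, path)
        else:
            upstream.append(path)
    return upstream, relocated
```

With a layout like this, re-importing a newer upstream snapshot would only touch the paths in `upstream`, which is the "ours vs. theirs" separation described in the message.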
Re: [webkit-dev] Process for importing third party tests
On Mon, Apr 23, 2012 at 4:13 PM, Jacob Goldstein jac...@adobe.com wrote:

At the recent WebKit Contributors Meeting, a process was drafted for importing third party tests into WebKit. I created a wiki page that captures the process that we came up with here: http://trac.webkit.org/wiki/ImportingThirdPartyTests We'd like to get more input from the community on all aspects of this process. Please review and let's discuss further.

Hi Jacob, I've only had a chance to glance over the document thus far, but thanks for writing it up! Two initial comments:

1a. Import entire test suite(s) from W3C repository: pixel tests, ref tests, manual tests, JS tests

You should probably change "Import" to "download" or something more descriptive here, since the whole process is an import :)

1b. Run suite locally in WebKit directory
  * Ref Tests
    * Pass - good, submit it
    * Fail - create a testName-expected-failure.txt file

We don't currently generate an -expected-failure.txt for reftests; there's no concept of a baseline for a reftest at all. Put differently, assuming the normal WebKit model of "the baseline is what we currently produce", we don't actually have a way of capturing what we currently produce for a reference test. I am also more than a little leery of mixing -expected.{txt,png} results with -expected.html results; I feel like it would be very confusing, and it would lose many of the advantages of reftests, since we'd presumably have to update the reference as often as we update pixel tests. [We could create a fake reftest that just contained the 800x600 pixel dump, but I'm not sure if that's better or not].

ii. DRT / Pixel tests
  * Expectations: create render tree dumps for each test and use that output as the new test expectation
  * Potential regressions will be identified by changes to this output
  * Proposal (open to discussion) - stop the production of .PNGs for these imported tests
    * PROS
      * Avoid the increase to the overall size of the repository caused by all the new PNGs
      * Regressions will likely be caught by the render tree dumps
      * Avoid maintenance of all associated PNGs
    * CONS
      * Regressions may be missed without the use of .PNGs
      * May make test results harder to interpret

I'm not particularly a fan of this. I think each port should follow its own convention for pixel tests or not, i.e., if Chromium normally runs pixel tests, it should run all of these tests as pixel tests; if Gtk doesn't, then it should just check the render trees here as well.

Also, I was under the impression that (a) the W3C is mostly focused on ref tests going forward and (b) we had agreed in that discussion that we wouldn't import non-ref tests? Did something change in a discussion after that session?

iii. JavaScript tests
  * Pass - good, submit it (along with new expected.txt file - W3C does not use an expected.txt file for JS tests)
  * Fail - Add to test_expectations file to avoid failures
  * Over time, individuals can clean up failing JS tests

If they don't have expected.txt files, how is failure determined? Why would we want to add failures to test_expectations.txt here but not for pixel tests or reftests? If anything, these text-only tests are *more* amenable to checking in the "what we do now, even if it's wrong" expectation.

So, it seems like we have three different kinds of tests that you are suggesting we treat three different ways. You can probably guess that I don't like that :).

iv. Manual tests
  * Submit in their current form
  * Over time, convert to ref tests to be submitted back to W3C

I don't know what "submit in their current form" means ... doesn't submitting have to do with exporting tests (i.e., importing into the w3c repos), and we're talking about importing tests? Are Manual tests somehow different from the other non-ref tests?

1. How should W3C tests that fail in WebKit be handled?
  a. Failures should be checked in. Details in General Import Process above.

We discussed this in the session, but I don't see this in the notes; I would really like for us to move to a model in our repo where it's possible to look at the filename for the baselines and determine whether the baseline is believed to be correct, incorrect, or unknown, in addition to capturing what we currently do (these are independent axes). This might be a separate discussion -- and of course there are complications that arise with this -- but I would like to establish it before we go too far down the import path ... in particular, I think it will be difficult to convince the chromium devs to move fully off their current model of "checked in files are correct; if we currently do something different, we suppress that".

2. Should a set frequency be used for importing tests?
  a. No, frequency is up to the people who want to do this task.

I'm fine w/ this

3. Can the approval process for previously reviewed W3C tests be streamlined?
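[Editorial sketch] The concern above about mixing -expected.{txt,png} results with -expected.html results can be made concrete with a small classifier over the file suffixes named in the thread. This is only an illustration under stated assumptions: the labels "reftest", "pixel" and "text" and the helper names `classify` and `is_mixed` are hypothetical, not part of any WebKit tooling.

```python
import os

# Suffixes mentioned in the thread; the labels on the right are assumptions
# chosen for this sketch.
SUFFIX_KINDS = {
    "-expected.html": "reftest",
    "-expected.png": "pixel",
    "-expected.txt": "text",
}


def classify(test_path, files):
    """Return the set of expectation kinds present for one test."""
    stem, _ext = os.path.splitext(test_path)
    return {
        kind
        for suffix, kind in SUFFIX_KINDS.items()
        if stem + suffix in files
    }


def is_mixed(kinds):
    """True when a reference file coexists with a text or pixel baseline."""
    return "reftest" in kinds and len(kinds) > 1
```

A check like `is_mixed` would flag exactly the situation the message is leery of: a test that has both a reference file and a checked-in text or pixel result.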
[webkit-dev] Process for importing third party tests
At the recent WebKit Contributors Meeting, a process was drafted for importing third party tests into WebKit. I created a wiki page that captures the process that we came up with here: http://trac.webkit.org/wiki/ImportingThirdPartyTests

We'd like to get more input from the community on all aspects of this process. Please review and let's discuss further.