Re: [webkit-dev] Process for importing third party tests

2012-05-08 Thread Dirk Pranke
Thanks for the comments, Ryosuke. My replies are inline ...

On Tue, May 8, 2012 at 12:42 PM, Ryosuke Niwa cont...@rniwa.com wrote:
 On Wed, Apr 25, 2012 at 2:18 PM, Dirk Pranke dpra...@chromium.org wrote:

  1b. Run suite locally in WebKit directory
    * Ref Tests
      * Pass - good, submit it
      * Fail - create a testName-expected-failure.txt file

 We don't currently generate an -expected-failure.txt for reftests;
 there's no concept of a baseline for a reftest at all.


 We need such a concept because we want to be able to do:

 Put differently, assuming the normal WebKit model of the baseline is
 what we currently produce, we don't actually have a way of capturing
 what we currently produce for a reference test.


 I am also more than a little leery of mixing -expected.{txt,png}
 results with -expected.html results; I feel like it would be very
 confusing, and it would lose many of the advantages of reftests, since
we'd presumably have to update the reference as often as we
 update pixel tests. [We could create a fake reftest that just
 contained the 800x600 pixel dump, but I'm not sure if that's better or
 not].


 I don't think we will lose advantages. Without some record of the current
 result, we will not be able to catch regressions.
 According to Maciej, we've caught many regressions by using conformance
 tests this way.


I have no doubt that you can catch regressions by using conformance
tests this way. My concern -- which is expressed throughout and which
you seemed to either miss or downplay -- is that adding more tests
creates more work, especially for pixel tests (and it slows down build
and test cycles, obviously). I don't think we should just add tests
because they exist somewhere on the web; they may provide no
additional coverage beyond the tests we already have.

I feel like we need a stronger mechanism to either verify that new test
suites cover more functionality, or move toward retiring the tests we
already have.

To be clear, I am all for importing test suites when we believe they
are comprehensive or do cover things we don't cover well now. But, for
example, rather than having four different test suites for flexbox, I
would rather see us have one good one.

  ii. DRT / Pixel tests
    * Expectations: create render tree dumps for each test and use that
  output as the new test expectation
      * Potential regressions will be identified by changes to this output
    * Proposal (open to discussion) - stop the production of .PNGs for
  these imported tests
      * PROS
        * Avoid the increase to the overall size of the repository caused
  by all the new PNGs
        * Regressions will likely be caught by the render tree dumps
        * Avoid maintenance of all associated PNGs
      * CONS
        * Regressions may be missed without the use of .PNGs
        * May make test results harder to interpret

 I'm not particularly a fan of this. I think each port should follow
 its own convention for pixel tests or not; i.e., if Chromium normally
 runs pixel tests, it should run all of these tests as pixel tests; if
 Gtk doesn't, then they should just check the render trees here as
 well.


 Right. I think we should just generate PNG files.


 Also, I was under the impression that (a) the W3C is mostly focused on
 ref tests going forward and (b) we had agreed in that discussion that
 we wouldn't import non-ref tests? Did something change in a discussion
 after that session?


 No. We had agreed to import all tests regardless of whether they're reftests
 or not because non-reftests can still catch future regressions or
 progressions. And the number of PNG files added to the repository wasn't
 considered a valid counter-argument, given this utility.

Well, I certainly didn't agree to it :) My concern is not the # of
PNGs so much as the cost of maintenance.

  iii. JavaScript tests
    * Pass - good, submit it (along with a new expected.txt file - W3C does
  not use an expected.txt file for JS tests)
    * Fail - Add to test_expectations file to avoid failures
      * Over time, individuals can clean up failing JS tests

 If they don't have expected.txt files, how is failure determined?

 Why would we want to add failures to test_expectations.txt here but
 not for pixel tests or reftests? If anything, these text-only tests
 are *more* amenable to checking in what we currently produce, even if
 it's the wrong expectation.


 I agree we should just generate expected.txt.
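For what it's worth, "generate expected.txt" here would just mean capturing the harness's textual PASS/FAIL output, so failure is determined by diffing against that file like any other text test. A rough, from-memory illustration of what such a file looks like for a testharness.js-style test (the test names and messages below are made up):

```
This is a testharness.js-based test.
PASS document.title is supported
FAIL canvas.toBlob assert_equals: expected "image/png" but got "image/jpeg"
Harness: the test ran to completion.
```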

  iv. Manual tests
   * Submit in their current form
     * Over time, convert to ref tests to be submitted back to W3C

 I don't know what "submit in their current form" means ... doesn't
 submitting have to do with exporting tests (i.e., importing into the
 w3c repos), and we're talking about importing tests?

 Are Manual tests somehow different from the other non-ref tests?


 I think what he meant is to just import them as-is. (They won't work as intended
 but we can't really do anything about it).

  1. How should W3C 

Re: [webkit-dev] Process for importing third party tests

2012-05-08 Thread Ryosuke Niwa
On Tue, May 8, 2012 at 1:04 PM, Dirk Pranke dpra...@chromium.org wrote:

  On Tue, May 8, 2012 at 12:42 PM, Ryosuke Niwa rn...@webkit.org wrote:
  On Wed, Apr 25, 2012 at 2:18 PM, Dirk Pranke dpra...@chromium.org
 wrote:
  I am also more than a little leery of mixing -expected.{txt,png}
  results with -expected.html results; I feel like it would be very
  confusing, and it would lose many of the advantages of reftests, since
  we'd presumably have to update the reference as often as we
  update pixel tests. [We could create a fake reftest that just
  contained the 800x600 pixel dump, but I'm not sure if that's better or
  not].
 
  I don't think we will lose advantages. Without some record of the current
  result, we will not be able to catch regressions.
  According to Maciej, we've caught many regressions by using conformance
  tests this way.

 I have no doubt that you can catch regressions by using conformance
 tests this way. My concern -- which is expressed throughout and which
 you seemed to either miss or downplay -- is that adding more tests
 creates more work, especially for pixel tests (and it slows down build
 and test cycles, obviously). I don't think we should just add tests
 because they exist somewhere on the web; they may provide no
 additional coverage beyond the tests we already have.


Yes, that's why we want reviews. And that's why we should only import tests
from W3C instead of other browser vendors. We expect W3C to have some
guidelines for avoiding duplicate tests.

I feel like we need a stronger mechanism to either verify that new test
 suites cover more functionality, or move toward retiring the tests we
 already have.


We can't really verify automatically that two tests cover the same
functionality; the most general form of that question is undecidable.

However, there are a couple of ways to mitigate this issue:

   1. Upstream as many layout tests as possible to W3C
   2. Delete duplicate layout tests as we import more tests from W3C

To be clear, I am all for importing test suites when we believe they
 are comprehensive or do cover things we don't cover well now.


This should probably be judged by individual reviewers.

But, for example, rather than having four different test suites for
 flexbox, I
 would rather see us have one good one.


Sure, but I'd say that's a really hard problem to solve.

  Also, I was under the impression that (a) the W3C is mostly focused on
  ref tests going forward and (b) we had agreed in that discussion that
  we wouldn't import non-ref tests? Did something change in a discussion
  after that session?
 
  No. We had agreed to import all tests regardless of whether they're
 reftests
  or not because non-reftests can still catch future regressions or
  progressions. And the number of PNG files added to the repository wasn't
  considered a valid counter-argument, given this utility.

 Well, I certainly didn't agree to it :) My concern is not the # of
 PNGs so much as the cost of maintenance.


Sure, that's a valid concern but the overwhelming majority of the people in
the room (e.g. Darin, Maciej, etc...) seemed to agree that this is a good
idea.

 I don't understand your proposal about adding platform/webkit. Why do we
  want that? As far as I know, there are no files in W3C test directories
 that
  end with -expected.txt or -expected.png.

 The idea would be that no webkit-specific files would live in the test
 directory, only files received from upstream. My thinking was that it
 would make importing new versions easier and it would be easier to
 understand what was ours vs. what was theirs. I don't feel that
 strongly about this, though; it was just an idea.


That'll be nice indeed. But if we're going this route, we should probably
move all existing -expected.* to this directory as well. So this is
probably a tangential issue. Similar to:

 By "ultimately move all existing tests", I assume you're including
  tests that are currently in LayoutTests that have not come from (or
  been submitted to) the W3C, e.g., the tests in fast/ ?
 
  Yes.

 I think reorganizing our existing test tree is an entirely different
 discussion. I'm all for it, I just don't want to confuse it with the
 discussion about importing test suites.


- Ryosuke
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Process for importing third party tests

2012-04-25 Thread Dirk Pranke
On Mon, Apr 23, 2012 at 4:13 PM, Jacob Goldstein jac...@adobe.com wrote:
 At the recent WebKit Contributors Meeting, a process was drafted for
 importing third party tests into WebKit.

 I created a wiki page that captures the process that we came up with here:

 http://trac.webkit.org/wiki/ImportingThirdPartyTests

 We'd like to get more input from the community on all aspects of this
 process.

 Please review and let's discuss further.


Hi Jacob,

I've only had a chance to glance over the document thus far, but thanks
for writing it up!

Two initial comments:

 1a. Import entire test suite(s) from W3C repository: pixel tests, ref tests, 
 manual tests, JS tests

You should probably change "Import" to "download" or something more
descriptive here, since the whole process is an import :)

 1b. Run suite locally in WebKit directory
   * Ref Tests
 * Pass - good, submit it
 * Fail - create a testName-expected-failure.txt file

We don't currently generate an -expected-failure.txt for reftests;
there's no concept of a baseline for a reftest at all.

Put differently, assuming the normal WebKit model of the baseline is
what we currently produce, we don't actually have a way of capturing
what we currently produce for a reference test.

I am also more than a little leery of mixing -expected.{txt,png}
results with -expected.html results; I feel like it would be very
confusing, and it would lose many of the advantages of reftests, since
we'd presumably have to update the reference as often as we
update pixel tests. [We could create a fake reftest that just
contained the 800x600 pixel dump, but I'm not sure if that's better or
not].
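To make the "fake reftest" idea concrete, here is a hypothetical sketch of generating such a reference: it wraps a previously captured 800x600 pixel dump (-expected.png) in a generated reference page, so the reftest harness compares the live rendering against the stored dump. The function and file names are invented for illustration.

```python
import base64


def make_fake_reference(png_path, out_path):
    # Inline the dump as a data: URI so the reference file is self-contained.
    with open(png_path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('ascii')
    html = ('<!DOCTYPE html>\n'
            '<style>body { margin: 0; }</style>\n'
            '<img width="800" height="600" '
            'src="data:image/png;base64,%s">\n' % encoded)
    with open(out_path, 'w') as f:
        f.write(html)
```

Whether this is better than a plain -expected.png is exactly the open question above, since the reference still has to be regenerated whenever the pixel output legitimately changes.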

 ii. DRT / Pixel tests
   * Expectations: create render tree dumps for each test and use that output 
 as the new test expectation
 * Potential regressions will be identified by changes to this output
   * Proposal (open to discussion) - stop the production of .PNGs for these 
 imported tests
 * PROS
   * Avoid the increase to the overall size of the repository caused by 
 all the new PNGs
   * Regressions will likely be caught by the render tree dumps
   * Avoid maintenance of all associated PNGs
 * CONS
   * Regressions may be missed without the use of .PNGs
   * May make test results harder to interpret

I'm not particularly a fan of this. I think each port should follow
its own convention for pixel tests or not; i.e., if Chromium normally
runs pixel tests, it should run all of these tests as pixel tests; if
Gtk doesn't, then they should just check the render trees here as
well.

Also, I was under the impression that (a) the W3C is mostly focused on
ref tests going forward and (b) we had agreed in that discussion that
we wouldn't import non-ref tests? Did something change in a discussion
after that session?

 iii. JavaScript tests
   * Pass - good, submit it (along with a new expected.txt file - W3C does not
 use an expected.txt file for JS tests)
   * Fail - Add to test_expectations file to avoid failures
 * Over time, individuals can clean up failing JS tests

If they don't have expected.txt files, how is failure determined?

Why would we want to add failures to test_expectations.txt here but
not for pixel tests or reftests? If anything, these text-only tests
are *more* amenable to checking in what we currently produce, even if
it's the wrong expectation.
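As a sketch of what checking failures into test_expectations.txt would look like (this is the Chromium-style expectations syntax as I recall it; the bug numbers and paths below are made up for illustration):

```
// Imported W3C tests that currently fail; to be triaged over time.
BUGWK12345 : w3c/css3-flexbox/flex-align.html = TEXT
BUGWK12346 MAC : w3c/css3-flexbox/flex-order.html = IMAGE+TEXT
```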

So, it seems like we have three different kinds of tests that you are
suggesting we treat three different ways. You can probably guess that
I don't like that :).

 iv. Manual tests
  * Submit in their current form
* Over time, convert to ref tests to be submitted back to W3C

I don't know what "submit in their current form" means ... doesn't
submitting have to do with exporting tests (i.e., importing into the
w3c repos), and we're talking about importing tests?

Are Manual tests somehow different from the other non-ref tests?

 1. How should W3C tests that fail in WebKit be handled?
   a. Failures should be checked in. Details in General Import Process above.

We discussed this in the session, but I don't see this in the notes; I
would really like for us to move to a model in our repo where it's
possible to look at the filename for the baselines and determine
whether the baseline is believed to be correct, incorrect, or unknown,
in addition to capturing what we currently do (these are independent
axes).
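A sketch of what such a naming scheme could look like; the -expected-failure and -expected-unknown suffixes are invented here, not an existing WebKit convention. The point is that "what we currently produce" and "whether we believe it is correct" become independent axes, both readable from the file name.

```python
def baseline_status(filename):
    """Classify a checked-in baseline by a hypothetical naming convention."""
    if '-expected-failure.' in filename:
        return 'known incorrect'   # captures current output, known wrong
    if '-expected-unknown.' in filename:
        return 'unverified'        # captures current output, not yet triaged
    if '-expected.' in filename:
        return 'believed correct'
    return 'not a baseline'
```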

This might be a separate discussion -- and of course there are
complications that arise with this -- but I would like to establish it
before we go too far down the import path ... in particular, I think it
will be difficult to convince the chromium devs to move fully off
their current model of "checked-in files are correct"; if we currently
do something different, we suppress that.

 2. Should a set frequency be used for importing tests?
   a. No, frequency is up to the people who want to do this task.

I'm fine w/ this

 3. Can the approval process for previously reviewed W3C tests be streamlined?
 

[webkit-dev] Process for importing third party tests

2012-04-23 Thread Jacob Goldstein
At the recent WebKit Contributors Meeting, a process was drafted for importing 
third party tests into WebKit.

I created a wiki page that captures the process that we came up with here:

http://trac.webkit.org/wiki/ImportingThirdPartyTests

We'd like to get more input from the community on all aspects of this process.

Please review and let's discuss further.

