The typical approach used in situations that you describe is to rebase, not 
skip.  This avoids the problem of not knowing when the test started passing.  
Hence, I'm not sure what you're implying.  Maybe a better example would help.


On Aug 15, 2012, at 5:39 PM, Peter Kasting <pkast...@chromium.org> wrote:

> On Wed, Aug 15, 2012 at 5:00 PM, Filip Pizlo <fpi...@apple.com> wrote:
> I believe that the cognitive load is greater than any benefit from catching 
> bugs incidentally by continuing to run a (1-fail) or (3) test, and continuing 
> to evaluate whether or not the expectation matches some notions of desired 
> behavior.
> 
> As someone who has spent a lot of time maintaining Chromium's expectations, 
> this seems clearly false, if your proposed alternative is to stop running the 
> test.  This is because a very common course of events is for a test to begin 
> failing, and then later on return to passing.  We (Chromium) see this all the 
> time with e.g. Skia changes, where for example the Skia folks will rewrite 
> gradient handling to more perfectly match some spec and as a result dozens or 
> hundreds of tests, many not explicitly intended to be about gradient 
> handling, will change and possibly begin passing.
> 
> By contrast, if we aren't running a test, we don't know when the test begins 
> passing again (except by trying to run it).  The resulting effect is that 
> skipped tests tend to remain skipped.  Tests that remain skipped are no 
> better than no tests.  And even if such tests are periodically retested, once 
> a test's output changes, there is a large window of time where the test 
> wasn't running, making it difficult to pinpoint exactly what caused the 
> change and whether the resulting effect is intentional and beneficial.
> 
> If we ARE running a test, then when the results change, knowing whether the 
> existing result was thought to be correct or not is a critical part of a 
> sheriff's job in deciding what to do about the change.  This is one reason 
> why Chromium has never gone down the path of simply checking in failure 
> expectations, and something that Dirk's proposal explicitly tries to address 
> while still allowing ports that (IMO mistakenly) don't care to continue to 
> not care.
> 
> We already have some good tooling (e.g. garden-o-matic) that could be 
> extended to show and update the small amount of additional info Dirk is 
> proposing.  I am very skeptical of abstract claims that this proposal 
> inflates complexity and decreases productivity in the absence of actually 
> testing a real workflow using the tools that we sheriffs really use to 
> maintain tree greenness.
> 
> I would like to see this proposal tested to get concrete feedback instead of 
> arguments on principle.

I would not like to see our testing infrastructure get any more complicated 
than it already is, just because of a philosophical direction chosen 
unilaterally by one port.

> 
> PK

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
