On Wed, Aug 15, 2012 at 5:00 PM, Filip Pizlo <fpi...@apple.com> wrote:
> I believe that the cognitive load is greater than any benefit from
> catching bugs incidentally by continuing to run a (1-fail) or (3) test, and
> continuing to evaluate whether or not the expectation matches some notions
> of desired behavior.
>

As someone who has spent a lot of time maintaining Chromium's expectations, this seems clearly false, if your proposed alternative is to stop running the test. This is because a very common course of events is for a test to begin failing, and then later on return to passing. We (Chromium) see this all the time with e.g. Skia changes, where for example the Skia folks will rewrite gradient handling to more perfectly match some spec and as a result dozens or hundreds of tests, many not explicitly intended to be about gradient handling, will change and possibly begin passing.

By contrast, if we aren't running a test, we don't know when the test begins passing again (except by trying to run it). The resulting effect is that skipped tests tend to remain skipped. Tests that remain skipped are no better than no tests. And even if such tests are periodically retested, once a test's output changes, there is a large window of time where the test wasn't running, making it difficult to pinpoint exactly what caused the change and whether the resulting effect is intentional and beneficial.

If we ARE running a test, then when the results change, knowing whether the existing result was thought to be correct or not is a critical part of a sheriff's job in deciding what to do about the change. This is one reason why Chromium has never gone down the path of simply checking in failure expectations, and something that Dirk's proposal explicitly tries to address while still allowing ports that (IMO mistakenly) don't care to continue to not care.

We already have some good tooling (e.g. garden-o-matic) that could be extended to show and update the small amount of additional info Dirk is proposing.
I am very skeptical of abstract claims that this proposal inflates complexity and decreases productivity in the absence of actually testing a real workflow using the tools that we sheriffs really use to maintain tree greenness. I would like to see this proposal tested to get concrete feedback instead of arguments on principle.

PK
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev