On Oct 26, 2012, at 11:11 PM, Ryosuke Niwa <rn...@webkit.org> wrote:

> 
> I’m sure Antti, Alexey, and others who have worked on the loader and other 
> parts of WebKit are happy to write those tests or list the kind of things 
> they want to test. Heck, I don’t mind writing those tests if someone could 
> make a list.
> 
> I totally sympathize with the sentiment to reduce the test flakiness but 
> loader and cache code have historically been under-tested, and we’ve had a 
> number of bugs detected only by running non-loader tests consecutively.
> 
> On the contrary, we’ve had this DRT behavior for ages. Is there any reason we 
> can’t wait for another couple of weeks or months until we add more loader & 
> cache tests before making the behavior change?

I think the nature of loader and cache code is that it's very hard to make 
tests which always fail deterministically when regressions are introduced, as 
opposed to randomly. The reason for this is that bugs in these areas are often 
timing-dependent. I think this tendency to fail randomly is likely to hold 
whether the tests explicitly target the cache or merely exercise it 
incidentally in the course of other things.

Unfortunately, it's very tempting when a test is failing randomly to blame the 
test rather than to investigate whether there is an actual regression affecting 
it. And sometimes it really is the test's fault. But sometimes it is a genuine 
bug in the code. 

On the other hand, nondeterministic test failures make it harder to use test 
infrastructure in general.

These are difficult things to reconcile. The original philosophy of WebKit 
tests is to test end-to-end under relatively realistic conditions, but at the 
same time unpredictability makes it hard to stay at zero regressions.

I think making different ports do testing under different conditions makes it 
more likely that some contributors will introduce regressions without noticing, 
leaving it for others to clean up. So it's regrettable if we go that way 
because we are unable to reach consensus. Creating some special opt-in --antti 
mode would be even worse, as it's almost certain that failures would creep into 
a mode that nobody runs.

What I personally would most wish for is good tools to catch when a test starts 
failing nondeterministically, and to identify the revision where the failures 
began. The reason we hate random failures is that they are hard to track down 
and diagnose. But some types of bugs are unlikely to manifest in a purely 
deterministic way. It would be good if we had a reliable and useful way to 
catch those types of bugs.

Regards,
Maciej

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
