Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Wed, Oct 31, 2012 at 12:05 AM, Alexey Proskuryakov a...@apple.com wrote:

> This will mean that cache is always almost empty, and all resources in it are extremely fresh. I don't know if this would provide substantial additional test coverage over cleaning the cache all the time, or just completely disabling it in WebKitTestRunner.

Certain areas of coverage would improve. The code paths taken when a resource is restored from the memory cache can be quite different from the usual loading. Many operations (like script execution) happen synchronously if the resource is found from the cache. We reuse various decoded forms (bitmaps, stylesheets, jsc parse structures, likely more in the future). All data is available in single chunk. It is possible to write tests that detect these differences (and it is possible that some tests hit them accidentally). We would still lose coverage for things that depend on having lots of resources around like cache pruning.

antti

> - WBR, Alexey Proskuryakov

___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev
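A minimal sketch of the difference described above, where work happens synchronously on a cache hit and is deferred on a miss. This is a toy model and not WebKit's loader: `ToyLoader`, `run_pending` and `trace_one_load` are hypothetical names made up for the illustration, and the "network" is just a queue that is drained later.

```python
from collections import deque


class ToyLoader:
    """Toy model only: cache hits complete synchronously, misses are deferred."""

    def __init__(self):
        self.cache = {}
        self.pending = deque()

    def load(self, url, on_done):
        if url in self.cache:
            # Cache hit: the callback runs before load() returns.
            on_done(self.cache[url])
        else:
            # Cache miss: the callback runs on a later turn of the loop.
            self.pending.append((url, on_done))

    def run_pending(self):
        # Stand-in for the network delivering data some time later.
        while self.pending:
            url, on_done = self.pending.popleft()
            data = "body of " + url
            self.cache[url] = data
            on_done(data)


def trace_one_load(loader, url):
    """Record the order of events for a single load of `url`."""
    events = []
    loader.load(url, lambda data: events.append("script executed"))
    events.append("load() returned")
    loader.run_pending()
    return events


loader = ToyLoader()
cold = trace_one_load(loader, "a.js")  # first load, cache empty
warm = trace_one_load(loader, "a.js")  # second load, served from cache
print(cold)
print(warm)
```

The two traces list the same events in opposite order, which is the kind of difference a test can detect on purpose or trip over by accident.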
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Nov 2, 2012 at 12:33 PM, Antti Koivisto koivi...@iki.fi wrote:

> On Wed, Oct 31, 2012 at 12:05 AM, Alexey Proskuryakov a...@apple.com wrote:
>
>> This will mean that cache is always almost empty, and all resources in it are extremely fresh. I don't know if this would provide substantial additional test coverage over cleaning the cache all the time, or just completely disabling it in WebKitTestRunner.
>
> Certain areas of coverage would improve. The code paths taken when a resource is restored from the memory cache can be quite different from the usual loading. Many operations (like script execution) happen synchronously if the resource is found from the cache. We reuse various decoded forms (bitmaps, stylesheets, jsc parse structures, likely more in the future). All data is available in single chunk. It is possible to write tests that detect these differences (and it is possible that some tests hit them accidentally). We would still lose coverage for things that depend on having lots of resources around like cache pruning.

In this case, to improve code coverage, all tests should run twice: a first run with a clear cache, and a second run right after it to test the cached case.

Slava
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Oct 28, 2012, at 3:30 PM, Antti Koivisto koivi...@iki.fi wrote:

> We could clear the cache between tests but run each test twice in a row. Second run will then happen with deterministically pre-populated cache. That would both make things more predictable and improve our test coverage for cached cases. Unfortunately it would also slow down testing significantly, though less than 2x.

I actually really like this idea. Doing it this way would effectively run each test both completely uncached, and fully cached, which would be better test coverage than our current approach.

Can we get an estimate on what this would cost if applied to our whole test suite? Could we do it for just a subset of the tests?

(BTW I think this is better than the virtual test suite approach suggested by Dirk; running the test with all its resources cached from having loaded it immediately before is more reliable and better test coverage than running it later as part of some sequence that doesn't clear the cache.)

Does anyone strongly object to this approach? It seems way better to me than other options discussed on this thread.

Regards,
Maciej
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Mon, Oct 29, 2012 at 5:48 AM, Maciej Stachowiak m...@apple.com wrote:

> On Oct 28, 2012, at 10:09 PM, Dirk Pranke dpra...@chromium.org wrote:
>
>> On Sun, Oct 28, 2012 at 6:32 AM, Maciej Stachowiak m...@apple.com wrote:
>>
>>> I think the nature of loader and cache code is that it's very hard to make tests which always fail deterministically when regressions are introduced, as opposed to randomly. The reason for this is that bugs in these areas are often timing-dependent. I think it's likely this tendency to fail randomly will be the case whether or not the tests are trying to explicitly test the cache or are just incidentally doing so in the course of other things.
>>
>> I am not familiar with the loader and caching code in webkit, but I know enough about similar problem spaces to be puzzled by why it's impossible to write tests that can adequately test the code.
>
> Has anyone claimed that? I think "impossible to write tests that can adequately test the code" is not a position that anyone in this thread has taken, certainly not me above. My claim is only that many classes of loader and cache bugs, when first introduced, are likely to cause nondeterministic test failures. And further, this is likely to be the case even if tests are written to target that subsystem. That's not the same as saying adequate tests are impossible.

I'm sorry, I didn't mean impossible literally. Please strike that, as it sounds like it has just made a confusing situation worse.

But, you did claim that it would be very hard to make tests that always fail deterministically, and I don't see why that's true? Testing things that are timing-dependent only requires that you be able to control or simulate time. It may be that this is hard to do with layout tests, but it's pretty straightforward with unit tests that allow you to control the layers above and below the cache.

> It just means to have good testing of some areas of the code, we need a good way of dealing with nondeterministic failures.

This is backwards. If you *don't* have good testing, more of your failures are likely to show up sporadically, which leads you to want to build tools for them. Randomized testing is a helpful tool to use *alongside* focused testing to ensure coverage, but should not be used as a replacement.

>>> What I personally would most wish for is good tools to catch when a test starts failing nondeterministically, and to identify the revision where the failures began. The reason we hate random failures is that they are hard to track down and diagnose. But some types of bugs are unlikely to manifest in a purely deterministic way. It would be good if we had a reliable and useful way to catch those types of bugs.
>>
>> This is a fine idea -- and I'm always happy to talk about ways we can improve our test tooling, please feel free to start a separate thread on these issues -- but I don't want to lose sight of the main issue here.
>
> I think the problem I identified -- that it's overly hard to track down and diagnose regressions that cause tests to fail only part of the time -- is more important and more fundamental than any of the three problems that you cite below. Our test infrastructure ultimately exists to help us notice and promptly fix regressions, and for some types of regressions, namely those that do not manifest 100% of the time, it is not working so well. The problems you mention are all secondary consequences of that fundamental problem, in my opinion.

First of all, this isn't an either/or situation. We should be capable of addressing all of these issues in parallel.

Second, I don't see how the existence of bugs in the code, the lack of test isolation, or the lack of good test coverage for certain layers of the code follow from not having good tools to triage intermittent failures? That seems like putting the cart before the horse.

Third, are you familiar with the flakiness dashboard?

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40ToT%20-%20webkit.orgbuilder=Apple%20Lion%20Debug%20WK1%20(Tests)

Does it not do exactly what you're describing? Are there things that you would like added? If it would be helpful for us to have a meeting or something to help explain how this works, I'm sure we could set one up.

> - Maciej
>
>> It sounds like we've identified three existing problems - please correct me if I'm misstating them:
>>
>> 1. There appears to be a bug in the caching code that is causing tests for other parts of the system to fail randomly.
>> 2. DRT and WTR on some ports are implemented in a way that is causing the system to be more fragile than some of us would like it to be, and there doesn't seem to be an a priori need for this to be the case; indeed some ports already don't do this.
>> 3. We don't apparently have dedicated test coverage for caching and the loader that people think is good enough, and getting such tests might be hard.
>
> P.S. I do think your problem
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Oct 26, 2012, at 11:11 PM, Ryosuke Niwa rn...@webkit.org wrote:

> I’m sure Antti, Alexey, and others who have worked on the loader and other parts of WebKit are happy to write those tests or list the kind of things they want to test. Heck, I don’t mind writing those tests if someone could make a list.
>
> I totally sympathize with the sentiment to reduce the test flakiness but loader and cache code have historically been under-tested, and we’ve had a number of bugs detected only by running non-loader tests consecutively.
>
> On the contrary, we’ve had this DRT behavior for ages. Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?

I think the nature of loader and cache code is that it's very hard to make tests which always fail deterministically when regressions are introduced, as opposed to randomly. The reason for this is that bugs in these areas are often timing-dependent. I think it's likely this tendency to fail randomly will be the case whether or not the tests are trying to explicitly test the cache or are just incidentally doing so in the course of other things.

Unfortunately, it's very tempting when a test is failing randomly to blame the test rather than to investigate whether there is an actual regression affecting it. And sometimes it really is the test's fault. But sometimes it is a genuine bug in the code.

On the other hand, nondeterministic test failures make it harder to use test infrastructure in general.

These are difficult things to reconcile. The original philosophy of WebKit tests is to test end-to-end under relatively realistic conditions, but at the same time unpredictability makes it hard to stay at zero regressions.

I think making different ports do testing under different conditions makes it more likely that some contributors will introduce regressions without noticing, leaving it for others to clean up. So it's regrettable if we go that way because we are unable to reach consensus.

Creating some special opt-in --antti mode would be even worse, as it's almost certain that failures would creep into a mode that nobody runs.

What I personally would most wish for is good tools to catch when a test starts failing nondeterministically, and to identify the revision where the failures began. The reason we hate random failures is that they are hard to track down and diagnose. But some types of bugs are unlikely to manifest in a purely deterministic way. It would be good if we had a reliable and useful way to catch those types of bugs.

Regards,
Maciej
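A rough sketch of the kind of tool wished for above, assuming each revision's results for one test are recorded over several repeated runs. The function name, the data layout and the revision numbers are all invented for the illustration; this is not an existing WebKit or NRWT tool.

```python
def first_flaky_revision(history):
    """Return the first revision whose repeated runs disagree with each other.

    `history` is a list of (revision, runs) pairs sorted by revision, where
    `runs` is a list of booleans (True = pass) from repeated runs of one test.
    Returns None if every revision passes or fails consistently.
    """
    for revision, runs in history:
        # A mix of passes and failures at one revision means nondeterminism.
        if any(runs) and not all(runs):
            return revision
    return None


# Made-up sample data: the test is stable until 1003, then passes only sometimes.
sample_history = [
    (1001, [True, True, True, True]),
    (1002, [True, True, True, True]),
    (1003, [True, False, True, True]),
    (1004, [False, True, True, False]),
]
print(first_flaky_revision(sample_history))
```

The sketch only works if each revision is run more than once, which is itself part of the cost being discussed in this thread.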
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
We could clear the cache between tests but run each test twice in a row. Second run will then happen with deterministically pre-populated cache. That would both make things more predictable and improve our test coverage for cached cases. Unfortunately it would also slow down testing significantly, though less than 2x.

antti
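A minimal sketch of the proposal above, assuming the harness is handed a `run_test` callable and a `clear_cache` callable. Both are hypothetical hooks for the illustration and are not DRT, WTR or NRWT APIs; the fake cache below only records which tests have already been loaded.

```python
def run_suite_twice(tests, run_test, clear_cache):
    """Run every test twice in a row, clearing the cache only between tests.

    The first run of each test sees an empty cache. The second run sees a
    cache populated by exactly that test, so both runs are deterministic.
    """
    results = {}
    for name in tests:
        clear_cache()
        uncached_result = run_test(name)
        cached_result = run_test(name)
        results[name] = (uncached_result, cached_result)
    return results


# Toy stand-ins for the real harness hooks.
fake_cache = set()


def fake_run_test(name):
    state = "cached" if name in fake_cache else "uncached"
    fake_cache.add(name)
    return state


results = run_suite_twice(["a.html", "b.html"], fake_run_test, fake_cache.clear)
print(results)
```

Each test is seen once with an empty cache and once with a cache filled only by itself, whatever order the tests run in.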
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
>> We can live in one of two worlds:
>> 1) LayoutTests that concern themselves with specific network/loading concerns need to use unique URLs to refer to static data; or
>> 2) DRT clears JS-visible state between tests.
>>
>> The pros/cons seem clear to me:
>> Pro#1: loading/caching code is coincidentally tested by (unknown) tests that reuse URLs among themselves.
>> Con#1: requires additional cognitive load for all webkit developers; the only way to write a test that won't be affected by future addition of unrelated tests is to use unique URLs
>> Pro#2: principle of least-surprise is maintained; understanding DRT and reading a test (and not every other test) is enough to understand its behavior
>> Con#2: loading/caching code needs to be tested explicitly.
>>
>> IMO (Pro#2 + -Con#1) > (Pro#1 + -Con#2). Are you saying you believe the inequality goes a different way, or am I missing some other feature of your thesis?
>
> Yes, this is a fair description.

I'm going to assume you mean that yes, you believe the inequality goes the other way: (Pro#2 + -Con#1) < (Pro#1 + -Con#2)

> This accidental testing is not something to be neglected

I'm not neglecting it, I'm evaluating its benefit to be less than its cost.

To make concrete the cost/benefit tradeoff, would you add a random sleep() into DRT execution to detect timing-related bugs? It seems like a crazy thing to do, to me, but it would certainly catch timing-related bugs quite effectively. If you don't think we should do that, can you describe how you're evaluating cost/benefit in each of the cases and why you arrive at different conclusions?

(of course, adding such random sleeps under default-disabled flag control for bug investigation could make a lot of sense; but here I'm talking about what we do on the bots by default)

> It's not humanly possible to have tests for everything in advance.

Of course. But we should at least make it humanly possible to understand our tests as written :) Making understanding our tests not humanly possible isn't the way to make up for the not-humanly-possible nature of testing everything in every way. It just means we push off not knowing how much coverage we really have, and derive a false sense of security from the fact that bugs have been found in the past.

> I completely agree with Maciej's idea that we should think about ways to make non-deterministic failures easier to work with, so that they would lead to discovering the root cause more directly, and without the costs currently associated with it.

I have no problem with that, but I'm not sure how it relates to this thread unless one takes an XOR approach, in which case I guess I have low faith that the bigger problem Maciej highlights will be solved in a reasonable timeframe (weeks/months).

>>> Memory allocator state. Computer's real time clock. Hard drive's head position if you have a spinning hard drive, or SSD controller state if you have an SSD. HTTP cookies. Should I continue the list?
>>
>> These things are all outside of webkit.
>
> Yes, they are outside WebKit, but not outside WebKit control, if needed. Did you intend that to be an objection?

I imagine Balazs was pointing out that you included items that are not JS-visible in an answer to my question about things that are JS-visible. But that was part of an earlier fork of this thread that went nowhere, so let's let it go.

Cheers,
-a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Oct 26, 2012, at 11:11 PM, Ryosuke Niwa rn...@webkit.org wrote:

> I’m sure Antti, Alexey, and others who have worked on the loader and other parts of WebKit are happy to write those tests or list the kind of things they want to test. Heck, I don’t mind writing those tests if someone could make a list.
>
> I totally sympathize with the sentiment to reduce the test flakiness but loader and cache code have historically been under-tested, and we’ve had a number of bugs detected only by running non-loader tests consecutively.
>
> On the contrary, we’ve had this DRT behavior for ages. Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?

Please correct me if I'm misinformed, but it's been three months since this issue was first raised, and it doesn't sound like they've been writing those tests or are happy to do so, and despite people asking on this thread, they haven't been listing the kinds of tests they think they need. Have we actually made any progress here, or was the issue dropped until Ami raised it again? It seems like the latter to me ... again, please correct me if this is being actively worked on, because that would change the whole tenor of this debate.

On Sun, Oct 28, 2012 at 6:32 AM, Maciej Stachowiak m...@apple.com wrote:

> I think the nature of loader and cache code is that it's very hard to make tests which always fail deterministically when regressions are introduced, as opposed to randomly. The reason for this is that bugs in these areas are often timing-dependent. I think it's likely this tendency to fail randomly will be the case whether or not the tests are trying to explicitly test the cache or are just incidentally doing so in the course of other things.

I am not familiar with the loader and caching code in webkit, but I know enough about similar problem spaces to be puzzled by why it's impossible to write tests that can adequately test the code. Is the caching disk-based, and maybe running tests in parallel screwing with things? If so, then maybe the fact that we now run tests in parallel is why this is a problem now and hasn't been before? Or maybe the fact that a given process doesn't always see the same tests in the same order is the problem?

> Unfortunately, it's very tempting when a test is failing randomly to blame the test rather than to investigate whether there is an actual regression affecting it. And sometimes it really is the test's fault. But sometimes it is a genuine bug in the code.
>
> On the other hand, nondeterministic test failures make it harder to use test infrastructure in general.
>
> These are difficult things to reconcile. The original philosophy of WebKit tests is to test end-to-end under relatively realistic conditions, but at the same time unpredictability makes it hard to stay at zero regressions.

Exactly. Personally, the cost of unpredictability in the test infrastructure is so much higher than the value we're getting (implicitly) that this is a no-brainer to me. There are some tradeoffs (like running tests in parallel) that are worth it, but this isn't one of them. I am happy to explain further my thinking and standards if there's interest.

Hopefully that partially answers Alexey's questions about where we should draw the line in trying to make our tests deterministic and hermetic: do everything you reasonably can. We're not picking on caching here.

> I think making different ports do testing under different conditions makes it more likely that some contributors will introduce regressions without noticing, leaving it for others to clean up. So it's regrettable if we go that way because we are unable to reach consensus.

I agree that it is bad to have different ports behaving differently, and I would like to avoid that as well. I don't want any port suffering from flaky tests, but I also don't think it's reasonable to have one group force that on everyone else indefinitely, either.

I am also fine with having some way to test systems more non-deterministically in a way to expose more bugs, but that needs to be clearly separated from the other testing we do; it is an unfair cost to impose on the rest of the system otherwise and should be tolerated only if we have no other choice. We have other choices.

> Creating some special opt-in --antti mode would be even worse, as it's almost certain that failures would creep into a mode that nobody runs.

This comment (and Antti's suggestion, below) makes me think that you didn't understand my virtual test suite suggestion; that's not surprising, since Apple doesn't actually use this feature of NRWT yet.

A virtual test suite is a way of saying (re-)run the tests under directory X with command-line flags Y and Z, and put the results in a new directory. For example, Chromium runs all of the tests in fast/canvas twice, once normally using the regular software code path, and once with a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On 10/28/2012 08:25 PM, Ami Fischman wrote:

>>> We can live in one of two worlds:
>>> 1) LayoutTests that concern themselves with specific network/loading concerns need to use unique URLs to refer to static data; or
>>> 2) DRT clears JS-visible state between tests.
>>>
>>> The pros/cons seem clear to me:
>>> Pro#1: loading/caching code is coincidentally tested by (unknown) tests that reuse URLs among themselves.
>>> Con#1: requires additional cognitive load for all webkit developers; the only way to write a test that won't be affected by future addition of unrelated tests is to use unique URLs
>>> Pro#2: principle of least-surprise is maintained; understanding DRT and reading a test (and not every other test) is enough to understand its behavior
>>> Con#2: loading/caching code needs to be tested explicitly.
>>>
>>> IMO (Pro#2 + -Con#1) > (Pro#1 + -Con#2). Are you saying you believe the inequality goes a different way, or am I missing some other feature of your thesis?
>>
>> Yes, this is a fair description.
>
> I'm going to assume you mean that yes, you believe the inequality goes the other way: (Pro#2 + -Con#1) < (Pro#1 + -Con#2)
>
>> This accidental testing is not something to be neglected
>
> I'm not neglecting it, I'm evaluating its benefit to be less than its cost.
>
> To make concrete the cost/benefit tradeoff, would you add a random sleep() into DRT execution to detect timing-related bugs? It seems like a crazy thing to do, to me, but it would certainly catch timing-related bugs quite effectively. If you don't think we should do that, can you describe how you're evaluating cost/benefit in each of the cases and why you arrive at different conclusions?
>
> (of course, adding such random sleeps under default-disabled flag control for bug investigation could make a lot of sense; but here I'm talking about what we do on the bots by default)
>
>> It's not humanly possible to have tests for everything in advance.
>
> Of course. But we should at least make it humanly possible to understand our tests as written :) Making understanding our tests not humanly possible isn't the way to make up for the not-humanly-possible nature of testing everything in every way. It just means we push off not knowing how much coverage we really have, and derive a false sense of security from the fact that bugs have been found in the past.
>
>> I completely agree with Maciej's idea that we should think about ways to make non-deterministic failures easier to work with, so that they would lead to discovering the root cause more directly, and without the costs currently associated with it.
>
> I have no problem with that, but I'm not sure how it relates to this thread unless one takes an XOR approach, in which case I guess I have low faith that the bigger problem Maciej highlights will be solved in a reasonable timeframe (weeks/months).
>
>>>> Memory allocator state. Computer's real time clock. Hard drive's head position if you have a spinning hard drive, or SSD controller state if you have an SSD. HTTP cookies. Should I continue the list?
>>>
>>> These things are all outside of webkit.
>>
>> Yes, they are outside WebKit, but not outside WebKit control, if needed. Did you intend that to be an objection?
>
> I imagine Balazs was pointing out that you included items that are not JS-visible in an answer to my question about things that are JS-visible. But that was part of an earlier fork of this thread that went nowhere, so let's let it go.

I just meant that it is not feasible to force every external dependency to reset its state, nor do we want to. We just trust them. But the cache is in WebKit, and we can reset its state. So whether resetting the cache is a good or a bad idea, I think it has nothing to do with the fact that we cannot reset the OS and the hardware (and external libs, of course).
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Sun, Oct 28, 2012 at 2:47 PM, Ryosuke Niwa rn...@webkit.org wrote:

> On Sun, Oct 28, 2012 at 2:09 PM, Dirk Pranke dpra...@chromium.org wrote:
>
>> On Oct 26, 2012, at 11:11 PM, Ryosuke Niwa rn...@webkit.org wrote:
>>
>>> I’m sure Antti, Alexey, and others who have worked on the loader and other parts of WebKit are happy to write those tests or list the kind of things they want to test. Heck, I don’t mind writing those tests if someone could make a list.
>>>
>>> I totally sympathize with the sentiment to reduce the test flakiness but loader and cache code have historically been under-tested, and we’ve had a number of bugs detected only by running non-loader tests consecutively.
>>>
>>> On the contrary, we’ve had this DRT behavior for ages. Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?
>>
>> Please correct me if I'm misinformed, but it's been three months since this issue was first raised, and it doesn't sound like they've been writing those tests or are happy to do so, and despite people asking on this thread, they haven't been listing the kinds of tests they think they need.
>
> I don't think anyone else had suggested adding tests as an option or set a deadline until I suggested yesterday (or when I did in my original reply to the thread).
>
> In fact, since Ami posted his reply on October 26th 1:20AM (PST), many contributors from non-PST timezones haven't even had a chance to read his post during normal business hours. Given that I'd think it's totally unreasonable to land the patch as is without giving people reasonable amount of time (~one week) to respond to this thread.

Both you and Eric U suggested adding new tests for this in the original thread on 8/9; in fact, this whole issue got a fair amount of discussion then, and this round hasn't really added anything new.

I'm happy to wait a little longer if others want to come up with some other suggestions; I apologize if my previous response sounded like I was throwing down a gauntlet or otherwise not open to ideas; that was definitely not my intent. Rather, I was attempting to say that unless someone else has other ideas, the right path forward seems fairly clear to me and that I intended to proceed down it. Does that seem more reasonable to you?

-- Dirk
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Sun, Oct 28, 2012 at 4:37 PM, Dirk Pranke dpra...@chromium.org wrote:

> On Sun, Oct 28, 2012 at 2:47 PM, Ryosuke Niwa rn...@webkit.org wrote:
>
>> On Sun, Oct 28, 2012 at 2:09 PM, Dirk Pranke dpra...@chromium.org wrote:
>>
>>> On Oct 26, 2012, at 11:11 PM, Ryosuke Niwa rn...@webkit.org wrote:
>>>
>>>> I’m sure Antti, Alexey, and others who have worked on the loader and other parts of WebKit are happy to write those tests or list the kind of things they want to test. Heck, I don’t mind writing those tests if someone could make a list.
>>>>
>>>> I totally sympathize with the sentiment to reduce the test flakiness but loader and cache code have historically been under-tested, and we’ve had a number of bugs detected only by running non-loader tests consecutively.
>>>>
>>>> On the contrary, we’ve had this DRT behavior for ages. Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?
>>>
>>> Please correct me if I'm misinformed, but it's been three months since this issue was first raised, and it doesn't sound like they've been writing those tests or are happy to do so, and despite people asking on this thread, they haven't been listing the kinds of tests they think they need.
>>
>> I don't think anyone else had suggested adding tests as an option or set a deadline until I suggested yesterday (or when I did in my original reply to the thread).
>>
>> In fact, since Ami posted his reply on October 26th 1:20AM (PST), many contributors from non-PST timezones haven't even had a chance to read his post during normal business hours. Given that I'd think it's totally unreasonable to land the patch as is without giving people reasonable amount of time (~one week) to respond to this thread.
>
> Both you and Eric U suggested adding new tests for this in the original thread on 8/9; in fact, this whole issue got a fair amount of discussion then, and this round hasn't really added anything new.

Yeah, but I don't think it got much traction back then. Also, we didn't have any deadlines like weeks or months.

> I'm happy to wait a little longer if others want to come up with some other suggestions; I apologize if my previous response sounded like I was throwing down a gauntlet or otherwise not open to ideas; that was definitely not my intent. Rather, I was attempting to say that unless someone else has other ideas, the right path forward seems fairly clear to me and that I intended to proceed down it. Does that seem more reasonable to you?

Yes.

- Ryosuke
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
> There are lot of things remaining in the process across tests runs

What things remain in the process across test runs that are visible to DRT/JS?

As I've said before in this thread, it seems axiomatic to me that tests can only be reasoned about if they run in a pristine environment. This is why we TestShell::resetTestController() <http://trac.webkit.org/browser/trunk/Tools/DumpRenderTree/chromium/TestShell.cpp#L300>; so that a given test passes or fails the same way regardless of what other tests have run in the same process earlier. Given that we *do* reset the execution environment between tests, it seems arbitrary (and unworkable) to *not* reset the cache.

Cheers,
-a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On 27.10.2012, at 20:47, Ami Fischman fisch...@chromium.org wrote:

>> There are lot of things remaining in the process across tests runs
>
> What things remain in the process across test runs that are visible to DRT/JS?

Memory allocator state. Computer's real time clock. Hard drive's head position if you have a spinning hard drive, or SSD controller state if you have an SSD. HTTP cookies. Should I continue the list?

> As I've said before in this thread, it seems axiomatic to me that tests can only be reasoned about if they run in a pristine environment.

This is an empty statement. A computer always provides you with a pristine environment until its RAM or other storage starts randomly failing.

I would agree that tests would become useless if run on a machine with faulty RAM. But people working on the project have successfully reasoned about flaky test failures many times in the past.

> This is why we TestShell::resetTestController(); so that a given test passes or fails the same way regardless of what other tests have run in the same process earlier. Given that we *do* reset the execution environment between tests, it seems arbitrary (and unworkable) to *not* reset the cache.

I don't think that pure logic can prove the need. As mentioned before, cache is just an entirely arbitrary target from this point of view.

We do reset preferences that are temporarily changed by tests. This is basically modeled on user expectations - changing a preference is expected to change how your browser behaves, so it's OK for tests to depend on that. But visiting site A is not expected to affect behavior on site B, even though cache state was affected by site A.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
This thread stalled out because although there seemed to be majority agreement that hermetic/repeatable tests are a good thing, there was a requirement that all ports be updated to the new behavior at the same time, and I'm only competent to do the chromium DRT (see https://bugs.webkit.org/show_bug.cgi?id=93195 for details).

Is anyone interested in stepping up and doing the equivalent (clear caches between tests) for the mac and/or gtk ports' DRTs?

On Wed, Aug 8, 2012 at 2:35 PM, Dirk Pranke dpra...@chromium.org wrote:

> On Wed, Aug 8, 2012 at 10:47 AM, Ojan Vafai o...@chromium.org wrote:
>
>> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>>
>> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>>
>> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?
>
> Given that the way we run tests in parallel in NRWT means that different processes get different lists of tests each time, it sounds like we may be getting a fair amount of nondeterminism from the cache not being cleared between tests. That seems bad, so I'm in favor of clearing the cache :)
>
> -- Dirk
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Wed, Aug 8, 2012 at 9:54 PM, Eric U er...@google.com wrote:

> On Wed, Aug 8, 2012 at 11:43 AM, Alexey Proskuryakov a...@webkit.org wrote:
>
>> I can see some downsides to emptying the cache before each test:
>> - we won't be getting any test coverage for cache behavior when it hits non-trivial size;
>
> Then let's add a cache test explicitly for this. Otherwise we just have to hope it gets tested accidentally along the way.

Cache has subtle interactions with other things being tested (-> flakiness). More explicit cache tests would be nice but we can't hope to replicate all the accidental testing we now get. We are going to lose a large chunk of existing test coverage if we do this.

antti

>> - this may well make tests measurably slower;
>> - this will be yet another cause of subtle difference between platforms, as some will undoubtedly have this unimplemented for a long time.
>
> Both good points, but probably worth it, given the reliability improvement in the tests IMO.
>
> Eric
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 1:44 AM, Antti Koivisto koivi...@iki.fi wrote:

> Cache has subtle interactions with other things being tested (-> flakiness). More explicit cache tests would be nice but we can't hope to replicate all the accidental testing we now get. We are going to lose a large chunk of existing test coverage if we do this.

The reality is that this test coverage today shows up as flakiness and so is ignored anyway, meaning we don't actually have useful coverage here. Even when flakiness is investigated, the fix is to cache-bust using unique URL params, which just means we lose the coverage you describe for that test, anyway.

Brian notes in the bug that GTK and wk2 GTK+ are done. I believe that just leaves chromium and mac. Anyone wanting to step up to do mac, and, I guess, wk2 mac?

-a
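For readers unfamiliar with the workaround mentioned in this message, "cache-busting using unique URL params" can be sketched roughly as below. This is a minimal illustration in Python, not code from any WebKit test (actual layout tests would do the equivalent in JavaScript); the function name `cache_busted_url` and the parameter name `cachebust` are hypothetical.

```python
import itertools
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical process-wide counter; a timestamp or random token would also work.
_counter = itertools.count(1)


def cache_busted_url(url, param="cachebust"):
    """Return `url` with a unique query parameter appended.

    A cache keyed on the full URL has never seen the result, so the
    resource is always fetched fresh, whatever earlier tests loaded.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, str(next(_counter))))
    return urlunsplit(parts._replace(query=urlencode(query)))


first = cache_busted_url("http://127.0.0.1:8000/resources/poster.png")
second = cache_busted_url("http://127.0.0.1:8000/resources/poster.png")
print(first)
print(second)
```

As the message points out, a test fixed this way never exercises the cached path for that resource, so the accidental cache coverage is lost for that test either way.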
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
I don't know that there was consensus that every port had to be updated at the same time; in fact Balazs said Qt and EFL already clear the cache.

I think you should just land the change for Chromium and let others update their ports as needed. The value in reduced flakiness and more predictability outweighs anything else in my book. Test coverage that you can't explain or rely on doesn't count for much to me.

-- Dirk

On Fri, Oct 26, 2012 at 1:20 AM, Ami Fischman fisch...@chromium.org wrote:

> This thread stalled out because although there seemed to be majority agreement that hermetic/repeatable tests are a good thing, there was a requirement that all ports be updated to the new behavior at the same time, and I'm only competent to do the chromium DRT (see https://bugs.webkit.org/show_bug.cgi?id=93195 for details).
>
> Is anyone interested in stepping up and doing the equivalent (clear caches between tests) for the mac and/or gtk ports' DRTs?
>
> On Wed, Aug 8, 2012 at 2:35 PM, Dirk Pranke dpra...@chromium.org wrote:
>
>> On Wed, Aug 8, 2012 at 10:47 AM, Ojan Vafai o...@chromium.org wrote:
>>
>>> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>>>
>>> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>>>
>>> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?
>>
>> Given that the way we run tests in parallel in NRWT means that different processes get different lists of tests each time, it sounds like we may be getting a fair amount of nondeterminism from the cache not being cleared between tests. That seems bad, so I'm in favor of clearing the cache :)
>>
>> -- Dirk
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 6:09 PM, Ami Fischman fisch...@chromium.org wrote:

> The reality is that this test coverage today shows up as flakiness and so is ignored anyway, meaning we don't actually have useful coverage here. Even when flakiness is investigated, the fix is to cache-bust using unique URL params, which just means we lose the coverage you describe for that test, anyway.

When making cache-related changes I have frequently found bugs in my patches because some seemingly random test started failing and I investigated. Without the test coverage some of those bugs would probably now be in the tree.

antti

> Brian notes in the bug that GTK and wk2 GTK+ are done. I believe that just leaves chromium and mac. Anyone wanting to step up to do mac, and, I guess, wk2 mac?
>
> -a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 11:17 AM, Ryosuke Niwa rn...@webkit.org wrote:

> ...
> I agree this is a good change but it appears that we should add more cache/loader tests before changing DRT's behavior given that there are active contributors who rely on the current DRT behaviors to detect regressions.

Can we add a flag to control this behavior? Then Antti could run the tests without cache clearing when modifying things possibly related to the cache code. We could even run a separate cr-linux bot like we do for debug builds.

- E
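The flag proposed in this message could look roughly like the sketch below. It is a toy model in Python, not the actual DRT or run-webkit-tests option handling: the flag name `--no-clear-cache-between-tests`, the `run_tests` helper, and the set-based "cache" are all hypothetical and exist only to show the two modes side by side.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Toy test runner (illustrative only)")
    # Hypothetical flag name. Clearing is the default; the flag opts out of it.
    parser.add_argument(
        "--no-clear-cache-between-tests",
        dest="clear_cache_between_tests",
        action="store_false",
        default=True,
        help="keep cached resources from earlier tests (old behavior)",
    )
    return parser


def run_tests(tests, options, cache):
    """Simulate a run; return (test name, resources found in cache) per test."""
    log = []
    for name, resources in tests:
        if options.clear_cache_between_tests:
            cache.clear()
        hits = sorted(r for r in resources if r in cache)
        cache.update(resources)
        log.append((name, hits))
    return log


tests = [("a.html", {"img.png"}), ("b.html", {"img.png"})]
for argv in ([], ["--no-clear-cache-between-tests"]):
    print(argv, run_tests(tests, build_parser().parse_args(argv), set()))
```

With the default, neither test sees a cached `img.png`; with the opt-out flag, `b.html` finds the copy left behind by `a.html`, which is the accidental coverage a separate bot could keep exercising.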
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 11:33 AM, Elliott Sprehn espr...@chromium.org wrote:

> On Fri, Oct 26, 2012 at 11:17 AM, Ryosuke Niwa rn...@webkit.org wrote:
>
>> ...
>> I agree this is a good change but it appears that we should add more cache/loader tests before changing DRT's behavior given that there are active contributors who rely on the current DRT behaviors to detect regressions.
>
> Can we add a flag to control this behavior? Then Antti could run the tests without cache clearing when modifying things possibly related to the cache code. We could even run a separate cr-linux bot like we do for debug builds.

I think having a set of tests that tests loaders/caches explicitly is more useful.

- Ryosuke
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 11:38 AM, Ryosuke Niwa rn...@webkit.org wrote:

> On Fri, Oct 26, 2012 at 11:33 AM, Elliott Sprehn espr...@chromium.org wrote:
>
>> On Fri, Oct 26, 2012 at 11:17 AM, Ryosuke Niwa rn...@webkit.org wrote:
>>
>>> ...
>>> I agree this is a good change but it appears that we should add more cache/loader tests before changing DRT's behavior given that there are active contributors who rely on the current DRT behaviors to detect regressions.
>>
>> Can we add a flag to control this behavior? Then Antti could run the tests without cache clearing when modifying things possibly related to the cache code. We could even run a separate cr-linux bot like we do for debug builds.
>
> I think having a set of tests that tests loaders/caches explicitly is more useful.

I think having a set of tests for loaders and caches would be more useful as well, but I don't think it's fair to make that a requirement for changing the default behavior here, especially since it's not clear who all would be best suited to writing those tests or what the extent of that work is.

I think Elliott's suggestion is a good one. I think the overall cost to the project by having flakiness in the tests probably outweighs the value we get in mysterious additional coverage, and it seems like having a flag would be a good compromise.

-- Dirk

> - Ryosuke
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 11:17 AM, Ryosuke Niwa rn...@webkit.org wrote:

> I agree this is a good change but it appears that we should add more cache/loader tests before changing DRT's behavior given that there are active contributors who rely on the current DRT behaviors to detect regressions.

Not knowing the specifics of the regressions in question, I don't have any idea what these new cache-related tests would be.

-a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On 26.10.2012, at 11:04, Antti Koivisto koivi...@iki.fi wrote:

>> The reality is that this test coverage today shows up as flakiness and so is ignored anyway, meaning we don't actually have useful coverage here. Even when flakiness is investigated, the fix is to cache-bust using unique URL params, which just means we lose the coverage you describe for that test, anyway.

I think that this is the real issue here. Test flakiness is very important to investigate; this often leads to discovery of bad bugs, including security ones. The phrase "flaky test" often misplaces the blame.

> When making cache-related changes I have frequently found bugs in my patches because some seemingly random test started failing and I investigated. Without the test coverage some of those bugs would probably now be in the tree.

I agree with Antti. Finding regressions is what tests are for, and it would be difficult to make enough explicit tests to compensate for such loss of coverage. It would certainly be very unfortunate to lose test coverage without even an attempt to compensate for that.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
Should we add random sleeps to DRT? It'll certainly help find some regressions (and even security bugs). Of course the downside is that it makes tests non-repeatable and difficult to reason about.

I'm baffled by your priorities and don't know how to continue this conversation productively. Sorry.

Cheers, -a

On Fri, Oct 26, 2012 at 12:43 PM, Alexey Proskuryakov a...@webkit.org wrote:

> On 26.10.2012, at 11:04, Antti Koivisto koivi...@iki.fi wrote:
>
>>> The reality is that this test coverage today shows up as flakiness and so is ignored anyway, meaning we don't actually have useful coverage here. Even when flakiness is investigated, the fix is to cache-bust using unique URL params, which just means we lose the coverage you describe for that test, anyway.
>
> I think that this is the real issue here. Test flakiness is very important to investigate; this often leads to discovery of bad bugs, including security ones. The phrase "flaky test" often misplaces the blame.
>
>> When making cache-related changes I have frequently found bugs in my patches because some seemingly random test started failing and I investigated. Without the test coverage some of those bugs would probably now be in the tree.
>
> I agree with Antti. Finding regressions is what tests are for, and it would be difficult to make enough explicit tests to compensate for such loss of coverage. It would certainly be very unfortunate to lose test coverage without even an attempt to compensate for that.
>
> - WBR, Alexey Proskuryakov
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 11:43 AM, Dirk Pranke dpra...@chromium.org wrote:

> On Fri, Oct 26, 2012 at 11:38 AM, Ryosuke Niwa rn...@webkit.org wrote:
>
>> On Fri, Oct 26, 2012 at 11:33 AM, Elliott Sprehn espr...@chromium.org wrote:
>>
>>> On Fri, Oct 26, 2012 at 11:17 AM, Ryosuke Niwa rn...@webkit.org wrote:
>>>
>>>> ...
>>>> I agree this is a good change but it appears that we should add more cache/loader tests before changing DRT's behavior given that there are active contributors who rely on the current DRT behaviors to detect regressions.
>>>
>>> Can we add a flag to control this behavior? Then Antti could run the tests without cache clearing when modifying things possibly related to the cache code. We could even run a separate cr-linux bot like we do for debug builds.
>>
>> I think having a set of tests that tests loaders/caches explicitly is more useful.
>
> I think having a set of tests for loaders and caches would be more useful as well, but I don't think it's fair to make that a requirement for changing the default behavior here, especially since it's not clear who all would be best suited to writing those tests or what the extent of that work is.

I’m sure Antti, Alexey, and others who have worked on the loader and other parts of WebKit are happy to write those tests or list the kind of things they want to test. Heck, I don’t mind writing those tests if someone could make a list.

I totally sympathize with the sentiment to reduce the test flakiness but loader and cache code have historically been under-tested, and we’ve had a number of bugs detected only by running non-loader tests consecutively.

On the contrary, we’ve had this DRT behavior for ages. Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?

- Ryosuke
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Fri, Oct 26, 2012 at 2:11 PM, Ryosuke Niwa rn...@webkit.org wrote:

> Is there any reason we can’t wait for another couple of weeks or months until we add more loader cache tests before making the behavior change?

There is no time pressure here other than a desire to avoid this falling between the cracks and (continuing to) never being done. Is anyone signing up to write or enumerate the tests, who can do the work in the next weeks/months, but not immediately?

-a
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On 26.10.2012, at 14:57, Dirk Pranke dpra...@chromium.org wrote:

> Perhaps a slight variant of this is that we can agree to make the changes on the Chromium port to clear the cache (much like the Qt and EFL ports already do), and you can continue to not clear the cache on the Apple Mac port until you feel comfortable that you've added additional tests?

This means that when someone introduces flakiness into resource caching, it will be only seen on Apple Mac bots. How is this good for anyone? I personally find this unacceptable, as this will reduce the usefulness of Apple Mac bots.

The whole idea to clear cache between tests seems very arbitrary to me. There are a lot of things remaining in the process across test runs, and I'm not sure why you are picking on the one with the least explicit test coverage.

Historically, test flakiness appears to increase whenever we do anything to address it without actual investigation of the root cause. Not long ago, we could run tests without re-running flaky tests, and get 100% pass. Now, we have many more flaky tests, re-run them, but flakiness remains even after the second run. I don't think that this is a result of project scale change; I think that this is a result of the desire to get green bots without doing real WebCore work to fix underlying bugs.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
Actually, the Qt and EFL DRTs already do that.

On 08/08/2012 07:47 PM, Ojan Vafai wrote:

> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>
> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>
> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?
>
> Ojan
[webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
See https://bugs.webkit.org/show_bug.cgi?id=93195.

media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.

I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.

While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place.

Any objections?

Ojan
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
That sounds like a great idea to me. I was actually surprised when fischman told me we don't currently do this.

- Ryosuke

On Wed, Aug 8, 2012 at 10:47 AM, Ojan Vafai o...@chromium.org wrote:

> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>
> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>
> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?
>
> Ojan
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
I can see some downsides to emptying the cache before each test:
- we won't be getting any test coverage for cache behavior when it hits non-trivial size;
- this may well make tests measurably slower;
- this will be yet another cause of subtle difference between platforms, as some will undoubtedly have this unimplemented for a long time.

- WBR, Alexey Proskuryakov

On 08.08.2012, at 10:47, Ojan Vafai wrote:

> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>
> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>
> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?
>
> Ojan
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Wed, Aug 8, 2012 at 11:43 AM, Alexey Proskuryakov a...@webkit.org wrote:

> I can see some downsides to emptying the cache before each test:
> - we won't be getting any test coverage for cache behavior when it hits non-trivial size;

Then let's add a cache test explicitly for this. Otherwise we just have to hope it gets tested accidentally along the way.

> - this may well make tests measurably slower;
> - this will be yet another cause of subtle difference between platforms, as some will undoubtedly have this unimplemented for a long time.

Both good points, but probably worth it, given the reliability improvement in the tests IMO.

Eric
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Wed, Aug 8, 2012 at 11:43 AM, Alexey Proskuryakov a...@webkit.org wrote:

> I can see some downsides to emptying the cache before each test:
> - we won't be getting any test coverage for cache behavior when it hits non-trivial size;

We should have a separate test for that as Eric pointed out.

> - this may well make tests measurably slower;
> - this will be yet another cause of subtle difference between platforms, as some will undoubtedly have this unimplemented for a long time.

On the contrary, it may well improve the overall bot cycle time because flaky tests are run twice on new-run-webkit-tests, if we actually have many tests that are flaky because of this. We also parallelize tests, and resources are loaded from the disk (with cache), so I highly doubt this will be an issue.

- Ryosuke
Re: [webkit-dev] DRT/WTR should clear the cache at the beginning of each test?
On Wed, Aug 8, 2012 at 10:47 AM, Ojan Vafai o...@chromium.org wrote:

> See https://bugs.webkit.org/show_bug.cgi?id=93195. media/W3C/video/networkState/networkState_during_progress.html and media/video-poster-blocked-by-willsendrequest.html are flaky on all platforms because they behave differently if the loaded resource is cached. Every time I've taken a stab at reducing test flakiness, I've come across at least a few tests that pass when run as part of the test suite, but fail when run by themselves (or in parallel) because they accidentally expect an image or something to be in the cache.
>
> I think it would make the tests more maintainable if we cleared the cache before each test run. This is *not* before each page load though. So tests that do multiple page loads will still test cross-navigation caching behavior.
>
> While it's true that we could one-off fix each of these tests, it's usually very time-consuming to figure out that caching is the problem, and that's assuming anyone takes the time to look into why the test is flaky in the first place. Any objections?

Given that the way we run tests in parallel in NRWT means that different processes get different lists of tests each time, it sounds like we may be getting a fair amount of nondeterminism from the cache not being cleared between tests. That seems bad, so I'm in favor of clearing the cache :)

-- Dirk
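The nondeterminism described in this message can be illustrated with a small toy model. The sketch below is Python and is not NRWT or DRT code: the test names, the set-based "cache", and the helpers `run_shard` and `outcomes_across_orderings` are all hypothetical. It assumes each test loads one resource and that the test's observable behavior depends on whether that resource was already cached by an earlier test in the same process.

```python
import itertools


def run_shard(tests, clear_cache_between_tests):
    """Run `tests` in order in one simulated process; return {name: outcome}."""
    cache = set()
    results = {}
    for name, resource in tests:
        if clear_cache_between_tests:
            cache.clear()
        results[name] = "cached-load" if resource in cache else "network-load"
        cache.add(resource)
    return results


def outcomes_across_orderings(tests, clear_cache_between_tests):
    """Collect every outcome each test can show over all possible orderings."""
    seen = {name: set() for name, _ in tests}
    for order in itertools.permutations(tests):
        for name, outcome in run_shard(order, clear_cache_between_tests).items():
            seen[name].add(outcome)
    return seen


tests = [
    ("poster-a.html", "poster.png"),
    ("poster-b.html", "poster.png"),  # shares a resource with poster-a.html
    ("other.html", "other.png"),
]
print(outcomes_across_orderings(tests, clear_cache_between_tests=False))
print(outcomes_across_orderings(tests, clear_cache_between_tests=True))
```

Without clearing, the two tests that share `poster.png` can each show either outcome depending on which one happens to run first in the process. With clearing, every test shows only `network-load` regardless of ordering.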