On Wed, Aug 15, 2012 at 5:00 PM, Filip Pizlo <fpi...@apple.com> wrote:
>
> On Aug 15, 2012, at 4:02 PM, Dirk Pranke <dpra...@chromium.org> wrote:
>
>> On Wed, Aug 15, 2012 at 3:06 PM, Filip Pizlo <fpi...@apple.com> wrote:
>>> Apparently I was somewhat unclear. Let me restate. We have the following
>>> mechanisms available when a test fails:
>>>
>>> 1) Check in a new -expected.* file.
>>>
>>> 2) Modify the test.
>>>
>>> 3) Modify a TestExpectations file.
>>>
>>> 4) Add the test to a Skipped file.
>>>
>>> 5) Remove the test entirely.
>>>
>>> I have no problem with (1) unless it is intended to mark the test as
>>> expected-to-fail-but-not-crash. I agree that using -expected.* to
>>> accomplish what TestExpectations accomplishes is not valuable, but I
>>> further believe that even TestExpectations is not valuable.
>>>
>>> I broadly prefer (2) whenever possible.
>>>
>>> I believe that (3) and (4) are redundant, and I don't buy the value of (3).
>>>
>>> I don't like (5) but we should probably do more of it for tests that have a
>>> chronically low signal-to-noise ratio.
>>>
>>
>> Thank you for clarifying. I had actually written an almost identical
>> list but didn't send it, so I think we're on the same page at least as
>> far as understanding the problem goes ...
>>
>> So, I would describe my suggestion as an improved variant of the kind
>> of (1) that can be used as "expected-to-fail-but-not-crash" (which
>> I'll call 1-fail), and that we would use this in cases where we use
>> (3), (4), or (1-fail) today.
>>
>> I would also agree that we should do (2) where possible, but I don't
>> think this is easily possible for a large class of tests, especially
>> pixel tests, although I am currently working on other things that will
>> hopefully help here.
>>
>> Chromium certainly does a lot of (3) today, and some (1-fail). Other
>> ports definitely use (1-fail) or (4) today, because (2) is rarely
>> possible for many, many tests.
>>
>> We know that doing (1-fail), (3), or (4) causes real maintenance woes
>> down the road, but also that doing (1-fail) or (3) catches real
>> problems that simply skipping the test would not -- at some cost.
>> Whether the benefit is worth the cost is not known, of course, but I
>> believe it is. I am hoping that my suggestion will have a lower
>> overall cost than doing (1-fail) or (3).
>
> I also believe that the trade-off is known, and specifically, I believe
> that having any tests in the (1-fail) or (3) states is more costly than
> having them in (4) or (5).
>
>>
>>> You're proposing a new mechanism. I'm arguing that given the sheer number
>>> of tests, and the overheads associated with maintaining them, (4) is the
>>> broadly more productive strategy in terms of bugs-fixed/person-hours. And,
>>> increasing the number of mechanisms for dealing with tests by 20% is likely
>>> to reduce overall productivity rather than helping anyone.
>>>
>>
>> Why do you believe this to be true? I'm not being flippant here ... I
>> think this is a very plausible argument, and it may well be true, but
>> I don't know what criteria we would use to evaluate it. Some
>> of the possible factors are:
>>
>> * the complexity of the test infrastructure and the cognitive load it
>>   introduces on developers
>> * the cost of bugs that are missed because we're skipping the tests
>>   intended to catch those bugs
>> * the cost of looking at "regressions" and trying to figure out if the
>>   regression is something you care about or not
>> * the cost of looking at the "-expected" results and trying to figure
>>   out if what is "expected" is correct or not
>>
>> There may be others as well, but the last three are all very real in
>> my experience, and I believe they significantly outweigh the first
>> one, but I don't know how to objectively assess that (and I don't
>> think it's even possible since different people/teams/ports will weigh
>> these things differently).
>
> I believe that the cognitive load is greater than any benefit from catching
> bugs incidentally by continuing to run a (1-fail) or (3) test, and continuing
> to evaluate whether or not the expectation matches some notions of desired
> behavior.
>
> And therein lies one possible source of disagreement.
>
Yes :)

> But there is another source of disagreement: would adding a sixth facility
> that overlaps with (1-fail) or (3) help? No, I don't believe it would. It's
> just another mechanism leading to more possible arguments about which
> mechanism is better.

Perhaps. I think it will, obviously, or I wouldn't be proposing this in
the first place. I welcome other opinions on this as well ...

-- 
Dirk

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev