On Fri, May 18, 2012 at 12:43 AM, Maciej Stachowiak <[email protected]> wrote:
> I guess we do. I think there is no point to saying PASS, because if a test
> crashes, hangs or is skipped it's meaningless. And if it does not crash and
> none of those other things apply, then it shouldn't be listed. But I could
> be missing something.
I'm not sure what you mean by "if a test crashes, hangs, or is skipped it's meaningless". Do you mean that we can't conclude whether it was a text/image/whatever failure? That is certainly true.

Although we don't normally specify passing tests, obviously, there are two cases where it's useful. The first is in conjunction with a line applying to a whole directory, in order to specify suppressions for all tests except a few. For example:

  fast/html = FAIL
  fast/html/article-element.html = PASS

The second is to indicate that a test is flaky, e.g., PASS TIMEOUT to indicate that a test may run fine and may time out.

> I think when there are regressions which do not cause a crash or hang, we
> should be checking in a new expectation, so further regressions to either
> text or image would be detected. More detail to come in an upcoming more
> comprehensive proposal.

How long should a test be allowed to produce output different from the checked-in expectation? I think the answer should probably be at least as long as the cycle times for the bots (so, upwards of an hour or two), and it's reasonable to give developers a chance to triage failures (which probably doesn't need to be longer than the cycle time; after that, the change should be either reverted or new expectations checked in, right?).

During that time, should the bots be red? If you don't want the bots to be red, you add an entry to the expectations file, right? (I assume the answer is yes here ...)

Are you saying that you don't care to detect changes in the output between that point and the time the change is either reverted or new expectations are checked in? Or that you don't care if a TEXT failure becomes a TEXT+IMAGE failure, but other changes (e.g., IMAGE -> TEXT) may be more interesting?

>> If one of the text tests or the image tests will fail but maybe not both,
>> that means the test is nondeterministic, so it should be marked as flaky and
>> its results should not affect greenness of the bots, so long as it does not
>> hang or crash. It doesn't seem like we currently have a FLAKY result
>> expectation based on the bots, you are supposed to indicate it by listing
>> all possible kinds of failures, but that seems unhelpful. Also, a flaky test
>> that sometimes entirely passes on multiple runs in a row will turn the bots
>> red, which seems bad. Let's just have FLAKY state instead where we don't get
>> upset whether the test passes or fails.

Just to rephrase this to make sure we're on the same page ... there are two kinds of flakiness. The first is intra-run: a test which fails but, when immediately retried by NRWT, passes. Such flakiness does not cause the bot to turn red (but will be reported as "unexpected flakiness"). Note that CRASHes are never retried.

The second is inter-run: a test may pass most runs, but sometimes fail. If you mark the test as PASS IMAGE (or whatever), it will not turn the bot red. I believe this is the behavior you want, right?

You then say that "listing all possible kinds of failures ... seems unhelpful". Why? Surely it is interesting if a test that previously had intermittent pixel failures starts having text failures as well (or, worse, crashes or times out, as rniwa points out)? Wouldn't you want that to turn the bot red?

> Tests that could randomly crash or time out should probably not be run until
> fixed.

Are you saying that marking a test as "PASS CRASH" should automatically skip the test (much as WONTFIX would be equivalent to SKIP)? Or are you suggesting that people should mark the test as SKIP instead?
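(For concreteness, the two alternatives would look something like the following in the expectations file; the test path is made up, and I'm assuming SKIP is still spelled as a modifier in front of the colon:

  fast/dom/sometimes-crashes.html = PASS CRASH
  SKIP : fast/dom/sometimes-crashes.html = PASS CRASH

The first line would implicitly skip the test under your proposal; the second skips it explicitly.)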
On a related note, would you say that a test that deterministically (i.e., reliably) crashes should also be skipped?

> Tests that randomly fail in one of several ways, I am not sure it is super
> useful to list what files might be affected but nothing else.

I'm not sure I'm parsing this properly. By "what files might be affected", do you mean indicating TEXT/IMAGE/both? As noted above, I think some changes in behavior are interesting ...

> If a test gets one of N results, bugzilla is a fine way to document that in
> full detail.

We currently display the different expected outcomes in the flakiness dashboard. If we were to move this information to a bug without exposing some way of retrieving it, that would be a step backwards, I think. Perhaps we could use bug keywords for this?

-- Dirk

