Re: Chronically flaky tests

2020-08-04 Thread Robert Bradshaw
I'm in favor of a quarantine job whose tests are called out prominently as "possibly broken" in the release notes. As a follow-up, +1 to exploring better tooling to track, at a fine-grained level, exactly how flaky these tests are (and hopefully detect if/when they go from flaky to just plain

Re: Chronically flaky tests

2020-08-04 Thread Tyson Hamilton
On Thu, Jul 30, 2020 at 6:24 PM Ahmet Altay wrote: > I like: > *Include ignored or quarantined tests in the release notes* > *Run flaky tests only in postcommit* (related? *Separate flaky tests into quarantine job*) The quarantine job would allow them to run in presubmit still, we would

Re: Chronically flaky tests

2020-08-04 Thread Etienne Chauchot
Hi all, +1 on pinging the assigned person. The flakes I know of (ESIO and CassandraIO) are due to the load on the CI server. These IOs are tested using real embedded backends because those backends are complex and we need relevant tests. Countermeasures have been taken (retrial

Re: Chronically flaky tests

2020-07-30 Thread Ahmet Altay
I like: *Include ignored or quarantined tests in the release notes* *Run flaky tests only in postcommit* (related? *Separate flaky tests into quarantine job*) *Require link to Jira to rerun a test* I am concerned about: *Add Gradle or Jenkins plugin to retry flaky tests* - because it is a

Re: Chronically flaky tests

2020-07-24 Thread Kenneth Knowles
Adding https://testautonation.com/analyse-test-results-deflake-flaky-tests/ to the list, which seems to be a more powerful test-history tool. On Fri, Jul 24, 2020 at 1:51 PM Kenneth Knowles wrote: > Had some off-list chats to brainstorm and I wanted to bring ideas back to the dev@ list for

Re: Chronically flaky tests

2020-07-24 Thread Kenneth Knowles
Had some off-list chats to brainstorm and I wanted to bring ideas back to the dev@ list for consideration. A lot can be combined. I would really like to have a section in the release notes. I like the idea of banishing flakes from pre-commit (since you can't tell easily if it was a real failure

Re: Chronically flaky tests

2020-07-20 Thread Brian Hulette
> I think we are missing a way of checking that we are making progress on P1 issues. For example, P0 issues block releases, and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what would be a

Re: Chronically flaky tests

2020-07-17 Thread Ahmet Altay
Another idea: could we replace our "Retest X" phrases with "Retest X (Reason)" phrases? With this change a PR author will have to look at the failed test logs. They could catch new flakiness introduced by their PR, file a JIRA for flakiness that was not noted before, or ping an existing JIRA

Re: Chronically flaky tests

2020-07-17 Thread Tyson Hamilton
Adding retries can be beneficial in two ways: unblocking a PR and collecting metrics about the flakes. If we also had a flaky test leaderboard that showed which tests are the most flaky, then we could take action on them. Encouraging someone from the community to fix the flaky test is another
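
If retry attempts were recorded per test, such a leaderboard could be derived directly from them. A minimal Python sketch, assuming a hypothetical per-run record format (none of these names come from Beam's actual CI tooling), ranking tests by how often they passed only after a retry:

    # Hypothetical sketch: rank tests by how often they needed more than one
    # attempt to pass, given per-run results collected elsewhere.
    from collections import Counter

    def flaky_leaderboard(runs, top_n=10):
        """runs: iterable of dicts like
        {"test": "ParDoLifecycleTest.testTeardown", "attempts": 2, "passed": True}."""
        flake_counts = Counter(
            r["test"] for r in runs if r["passed"] and r["attempts"] > 1)
        return flake_counts.most_common(top_n)

    if __name__ == "__main__":
        sample = [
            {"test": "ParDoLifecycleTest.testTeardown", "attempts": 3, "passed": True},
            {"test": "ParDoLifecycleTest.testTeardown", "attempts": 2, "passed": True},
            {"test": "WordCountIT.test_wordcount", "attempts": 1, "passed": True},
        ]
        for test, flakes in flaky_leaderboard(sample):
            print("%3d  %s" % (flakes, test))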

Re: Chronically flaky tests

2020-07-16 Thread Luke Cwik
What do other Apache projects do to address this issue? On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay wrote: > I agree with the comments in this thread. > - If we are not re-enabling tests, or do not have a plan to re-enable them, disabling tests only provides us temporary

Re: Chronically flaky tests

2020-07-16 Thread Ahmet Altay
I agree with the comments in this thread. - If we are not re-enabling tests, or do not have a plan to re-enable them, disabling tests only provides us temporary relief until users eventually hit the issues that the disabled tests would have caught. - I feel similarly about retries. It is

Re: Chronically flaky tests

2020-07-16 Thread Valentyn Tymofieiev
I think the original discussion[1] on introducing tenacity might answer that question. [1] https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E On Thu, Jul 16, 2020 at 10:48 AM Rui Wang wrote: > Is there an observation that

Re: Chronically flaky tests

2020-07-16 Thread Rui Wang
Is there an observation that enabling tenacity improves the development experience on the Python SDK? E.g., less wait time to get a PR passing and merged? Or it might be a matter of choosing the right number of retries to align with the "flakiness" of a test? -Rui On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev
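
As a back-of-envelope answer to the "right number of retries" question, assuming each attempt fails independently with probability p (load-correlated flakes break this assumption), the spurious-failure rate drops geometrically with each retry:

    # Probability that a flaky-but-otherwise-passing test is still reported as
    # failed after `attempts` independent tries, each failing with probability p.
    def spurious_failure_rate(p, attempts):
        return p ** attempts

    for attempts in (1, 2, 3):
        print(attempts, round(spurious_failure_rate(0.1, attempts), 6))
    # With p = 0.1: 0.1 -> 0.01 -> 0.001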

Re: Chronically flaky tests

2020-07-16 Thread Valentyn Tymofieiev
We used tenacity[1] to retry some unit tests for which we understood the nature of flakiness. [1] https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156 On Thu, Jul 16, 2020 at 10:25 AM
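
For reference, the pattern looks roughly like the following sketch; the decorator arguments here are illustrative, not copied from the linked Beam test:

    # Illustrative use of tenacity on a flaky unit test.
    import random
    import unittest

    from tenacity import retry, stop_after_attempt

    class FlakyExampleTest(unittest.TestCase):

        @retry(reraise=True, stop=stop_after_attempt(3))
        def test_sometimes_flaky(self):
            # Stand-in for a test whose flakiness is understood and bounded.
            self.assertLess(random.random(), 0.9)

    if __name__ == "__main__":
        unittest.main()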

Re: Chronically flaky tests

2020-07-16 Thread Kenneth Knowles
Didn't we use something like that flaky retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP though. As Luke says, that is not so easy to make happen. Having a way to make P1 bugs more visible in an ongoing

Re: Chronically flaky tests

2020-07-16 Thread Luke Cwik
I don't think I have seen tests that were previously disabled become re-enabled. It seems as though we have ~60 disabled tests in Java and ~15 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing features, so they are unrelated to flakiness. On Thu, Jul 16, 2020 at 8:49 AM

Re: Chronically flaky tests

2020-07-16 Thread Gleb Kanterov
There is something called test-retry-gradle-plugin [1]. It retries tests if they fail and has different modes for handling flaky tests. Did we ever try or consider using it? [1]: https://github.com/gradle/test-retry-gradle-plugin On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov wrote: > I agree

Re: Chronically flaky tests

2020-07-16 Thread Gleb Kanterov
I agree with what Ahmet is saying. I can share my perspective: recently I had to retrigger a build 6 times due to flaky tests, and each retrigger took an hour of waiting time. I've seen examples of automatic tracking of flaky tests, where a test is considered flaky if it both fails and succeeds for
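
That kind of automatic tracking can be sketched very simply. Assuming a hypothetical result log of (test, revision, passed) tuples, a test is flagged as flaky when the same revision has recorded both a pass and a fail:

    # Illustrative sketch of the heuristic described above; the data shape is
    # hypothetical, not an existing Beam tool.
    from collections import defaultdict

    def find_flaky_tests(results):
        """results: iterable of (test_name, revision, passed) tuples."""
        outcomes = defaultdict(set)
        for test, revision, passed in results:
            outcomes[(test, revision)].add(passed)
        return sorted({test for (test, _), seen in outcomes.items()
                       if seen == {True, False}})

    if __name__ == "__main__":
        history = [
            ("ParDoLifecycleTest.testTeardown", "abc123", False),
            ("ParDoLifecycleTest.testTeardown", "abc123", True),
            ("WordCountIT.test_wordcount", "abc123", True),
        ]
        print(find_flaky_tests(history))  # ['ParDoLifecycleTest.testTeardown']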

Re: Chronically flaky tests

2020-07-15 Thread Ahmet Altay
I think it would be reasonable to disable/sickbay any flaky test that is actively blocking people. The collective cost of flaky tests for such a large group of contributors is very significant. Most of these issues are unassigned. IMO, it makes sense to assign these issues to the most relevant person

Re: Chronically flaky tests

2020-07-15 Thread Kenneth Knowles
The situation is much worse than that IMO. My experience of the last few days is that a large portion of time went to *just connecting failing runs with the corresponding Jira tickets or filing new ones*. Summarized on PRs: - https://github.com/apache/beam/pull/12272#issuecomment-659050891 -

Chronically flaky tests

2020-07-15 Thread Andrew Pilloud
We have two test suites that are responsible for a large percentage of our flaky tests, and both have had bugs open for about a year without being fixed. These suites are ParDoLifecycleTest (BEAM-8101) in Java and BigQueryWriteIntegrationTests in