Re: Chronically flaky tests

2020-08-04 Thread Robert Bradshaw
I'm in favor of a quarantine job whose tests are called out
prominently as "possibly broken" in the release notes. As a follow-up,
+1 to exploring better tooling to track, at a fine-grained level,
exactly how flaky these tests are (and hopefully detect if/when they go
from flaky to just plain broken).

On Tue, Aug 4, 2020 at 7:25 AM Etienne Chauchot  wrote:
>
> Hi all,
>
> +1 on pinging the assigned person.
>
> For the flakes I know of (ESIO and CassandraIO), they are due to the load of 
> the CI server. These IOs are tested using real embedded backends because 
> those backends are complex and we need relevant tests.
>
> Countermeasures have been taken (retries inside the tests that are sensitive 
> to load, accepting ranges of values rather than exact numbers, calling 
> internal backend mechanisms to force a refresh when load prevented the 
> backend from doing so ...).

Yes, certain tests with external dependencies should do their own
internal retries. If that is not sufficient, they should probably be
quarantined.
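
For example, a minimal sketch of the kind of bounded in-test retry I mean
(illustrative Python only; "eventually" and the assertion helper in the usage
comment are made-up names, and the real IO tests in question are Java):

import time

def eventually(check, attempts=5, delay_secs=2.0):
    """Retry a load-sensitive assertion a bounded number of times.

    `check` is a zero-argument callable that raises AssertionError while the
    embedded backend has not caught up yet; the final failure is re-raised so
    a genuine breakage still fails the test.
    """
    for attempt in range(attempts):
        try:
            return check()
        except AssertionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay_secs)

# Usage inside a test against an embedded backend (hypothetical helper):
# eventually(lambda: assert_row_count(backend, expected=1000))

The point is that the retry is scoped to the one assertion we know is
load-sensitive, not to the whole test or the whole job.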

> I recently got pinged by Ahmet (thanks to him!) about a flakiness that I did 
> not see. This seems to me the correct way to go. Systematically retrying 
> tests with a CI mechanism or disabling tests seems to me a risky workaround 
> that just gets the problem off our minds.
>
> Etienne
>
> On 20/07/2020 20:58, Brian Hulette wrote:
>
> > I think we are missing a way for checking that we are making progress on P1 
> > issues. For example, P0 issues block releases and this obviously results in 
> > fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have 
> > a similar process for flaky tests. I do not know what would be a good 
> > policy. One suggestion is to ping (email/slack) assignees of issues. I 
> > recently missed a flaky issue that was assigned to me. A ping like that 
> > would have reminded me. And if an assignee cannot help/does not have the 
> > time, we can try to find a new assignee.
>
> Yeah I think this is something we should address. With the new jira 
> automation at least assignees should get an email notification after 30 days 
> because of a jira comment like [1], but that's too long to let a test 
> continue to flake. Could Beam Jira Bot ping every N days for P1s that aren't 
> making progress?
>
> That wouldn't help us with P1s that have no assignee, or are assigned to 
> overloaded people. It seems we'd need some kind of dashboard or report to 
> capture those.
>
> [1] 
> https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918
>
> On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay  wrote:
>>
>> Another idea, could we change our "Retest X" phrases with "Retest X 
>> (Reason)" phrases? With this change a PR author will have to look at failed 
>> test logs. They could catch new flakiness introduced by their PR, file a 
>> JIRA for a flakiness that was not noted before, or ping an existing JIRA 
>> issue/raise its severity. On the downside this will require PR authors to do 
>> more.
>>
>> On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton  wrote:
>>>
>>> Adding retries can be beneficial in two ways, unblocking a PR, and 
>>> collecting metrics about the flakes.
>>
>>
>> Makes sense. I think we will still need to have a plan to remove retries 
>> similar to re-enabling disabled tests.
>>
>>>
>>>
>>> If we also had a flaky test leaderboard that showed which tests are the 
>>> most flaky, then we could take action on them. Encouraging someone from the 
>>> community to fix the flaky test is another issue.
>>>
>>> The test status matrix of tests that is on the GitHub landing page could 
>>> show flake level to communicate to users which modules are losing a 
>>> trustable test signal. Maybe this shows up as a flake % or a code coverage 
>>> % that decreases due to disabled flaky tests.
>>
>>
>> +1 to a dashboard that will show a "leaderboard" of flaky tests.
>>
>>>
>>>
>>> I didn't look for plugins, just dreaming up some options.
>>>
>>>
>>>
>>>
>>> On Thu, Jul 16, 2020, 5:58 PM Luke Cwik  wrote:

 What do other Apache projects do to address this issue?

 On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay  wrote:
>
> I agree with the comments in this thread.
> - If we are not re-enabling tests back again or we do not have a plan to 
> re-enable them again, disabling tests only provides us temporary relief 
> until eventually users find issues instead of disabled tests.
> - I feel similarly about retries. It is reasonable to add retries for 
> reasons we understand. Adding retries to avoid flakes is similar to 
> disabling tests. They might hide real issues.
>
> I think we are missing a way for checking that we are making progress on 
> P1 issues. For example, P0 issues block releases and this obviously 
> results in fixing/triaging/addressing P0 issues at least every 6 weeks. 
> We do not have a similar process for flaky tests. I do n

Re: Chronically flaky tests

2020-08-04 Thread Tyson Hamilton
On Thu, Jul 30, 2020 at 6:24 PM Ahmet Altay  wrote:

> I like:
> *Include ignored or quarantined tests in the release notes*
> *Run flaky tests only in postcommit* (related? *Separate flaky tests into
> quarantine job*)
>

The quarantine job would allow them to run in presubmit still; we would
just not use it to determine the health of a PR or block submission.
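
As a sketch of what that split could look like on the Python side (assuming
pytest and a hypothetical "quarantine" marker; the Java suites would need an
equivalent JUnit category or Gradle filter):

# conftest.py -- register the (hypothetical) quarantine marker.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "quarantine: known-flaky test, run only in the non-blocking quarantine job")

# some_io_test.py -- tag a known-flaky test instead of deleting or skipping it.
import pytest

@pytest.mark.quarantine
def test_estimated_size_bytes():
    ...  # the flaky assertion stays here and keeps producing a signal

The blocking presubmit would then run pytest -m "not quarantine" while the
quarantine job runs pytest -m quarantine, so the test still executes on every
PR without gating it.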


> *Require link to Jira to rerun a test*
>
> I am concerned about:
> *Add Gradle or Jenkins plugin to retry flaky tests* - because it is a
> convenient place for real bugs to hide.
>

This concern has come up a few times now so I feel like this is a route we
shouldn't pursue further.


>
> I do not know much about:
> *Consider Gradle Enterprise*
> https://testautonation.com/analyse-test-results-deflake-flaky-tests/
>

There is a subscription fee for Gradle Enterprise, but it offers a lot of
support for flaky tests and other metrics. I have a meeting to talk with
them on August 7th about the pricing model for open source projects. From
what I understand, last time we spoke with them they didn't have a good
model for open source projects and the fee was tied to the number of
developers in the project.


>
>
> Thank you for putting this list! I believe even if we can commit to doing
> some of these we would have a much healthier project. If we can build
> consensus on implementing, I will be happy to work on some of them.
>
> On Fri, Jul 24, 2020 at 1:54 PM Kenneth Knowles  wrote:
>
>> Adding
>> https://testautonation.com/analyse-test-results-deflake-flaky-tests/ to
>> the list which seems a more powerful test history tool.
>>
>> On Fri, Jul 24, 2020 at 1:51 PM Kenneth Knowles  wrote:
>>
>>> Had some off-list chats to brainstorm and I wanted to bring ideas back
>>> to the dev@ list for consideration. A lot can be combined. I would
>>> really like to have a section in the release notes. I like the idea of
>>> banishing flakes from pre-commit (since you can't tell easily if it was a
>>> real failure caused by the PR) and auto-retrying in post-commit (so we can
>>> gather data on exactly what is flaking without a lot of manual
>>> investigation).
>>>
>>> *Include ignored or quarantined tests in the release notes*
>>> Pro:
>>>  - Users are aware of what is not being tested so may be silently broken
>>>  - It forces discussion of ignored tests to be part of our community
>>> processes
>>> Con:
>>>  - It may look bad if the list is large (this is actually also a Pro
>>> because if it looks bad, it is bad)
>>>
>>> *Run flaky tests only in postcommit*
>>> Pro:
>>>  - isolates the bad signal so pre-commit is not affected
>>>  - saves pointless re-runs in pre-commit
>>>  - keeps a signal in post-commit that we can watch, instead of losing it
>>> completely when we disable a test
>>>  - maybe keeps the flaky tests in job related to what they are testing
>>> Con:
>>>  - we have to really watch post-commit or flakes can turn into failures
>>>
>>> *Separate flaky tests into quarantine job*
>>> Pro:
>>>  - gain signal for healthy tests, as with disabling or running in
>>> post-commit
>>>  - also saves pointless re-runs
>>> Con:
>>>  - may collect bad tests so that we never look at it so it is the same
>>> as disabling the test
>>>  - lots of unrelated tests grouped into signal instead of focused on
>>> health of a particular component
>>>
>>> *Add Gradle or Jenkins plugin to retry flaky tests*
>>> https://blog.gradle.org/gradle-flaky-test-retry-plugin
>>> https://plugins.jenkins.io/flaky-test-handler/
>>> Pro:
>>>  - easier than Jiras with human pasting links; works with moving flakes
>>> to post-commit
>>>  - get a somewhat automated view of flakiness, whether in pre-commit or
>>> post-commit
>>>  - don't get stopped by flakiness
>>> Con:
>>>  - maybe too easy to ignore flakes; we should add all flakes (not just
>>> disabled or quarantined) to the release notes
>>>  - sometimes flakes are actual bugs (like concurrency) so treating this
>>> as OK is not desirable
>>>  - without Jiras, no automated release notes
>>>  - Jenkins: retry only will work at job level because it needs Maven to
>>> retry only failed (I think)
>>>  - Jenkins: some of our jobs may have duplicate test names (but might
>>> already be fixed)
>>>
>>> *Consider Gradle Enterprise*
>>> Pro:
>>>  - get Gradle scan granularity of flake data (and other stuff)
>>>  - also gives module-level health which we do not have today
>>> Con:
>>>  - cost and administrative burden unknown
>>>  - we probably have to do some small work to make our jobs compatible
>>> with their history tracking
>>>
>>> *Require link to Jira to rerun a test*
>>> Instead of saying "Run Java PreCommit" you have to link to the bug
>>> relating to the failure.
>>> Pro:
>>>  - forces investigation
>>>  - helps others find out about issues
>>> Con:
>>>  - adds a lot of manual work, or requires automation (which will
>>> probably be ad hoc and fragile)
>>>
>>> Kenn
>>>
>>> On Mon, Jul 20, 2020 at 11:59 AM Brian Hulette 
>>> wrote:
>>>

Re: Chronically flaky tests

2020-08-04 Thread Etienne Chauchot

Hi all,

+1 on pinging the assigned person.

For the flakes I know of (ESIO and CassandraIO), they are due to the 
load of the CI server. These IOs are tested using real embedded backends 
because those backends are complex and we need relevant tests.


Countermeasures have been taken (retries inside the tests that are sensitive 
to load, accepting ranges of values rather than exact numbers, calling 
internal backend mechanisms to force a refresh when load prevented the 
backend from doing so ...).


I recently got pinged by Ahmet (thanks to him!) about a flakiness that I 
did not see. This seems to me the correct way to go. Systematically 
retrying tests with a CI mechanism or disabling tests seems to me a risky 
workaround that just gets the problem off our minds.


Etienne

On 20/07/2020 20:58, Brian Hulette wrote:
> I think we are missing a way for checking that we are making 
progress on P1 issues. For example, P0 issues block releases and this 
obviously results in fixing/triaging/addressing P0 issues at least 
every 6 weeks. We do not have a similar process for flaky tests. I do 
not know what would be a good policy. One suggestion is to ping 
(email/slack) assignees of issues. I recently missed a flaky issue 
that was assigned to me. A ping like that would have reminded me. And 
if an assignee cannot help/does not have the time, we can try to find 
a new assignee.


Yeah I think this is something we should address. With the new jira 
automation at least assignees should get an email notification after 
30 days because of a jira comment like [1], but that's too long to let 
a test continue to flake. Could Beam Jira Bot ping every N days for 
P1s that aren't making progress?


That wouldn't help us with P1s that have no assignee, or are assigned 
to overloaded people. It seems we'd need some kind of dashboard or 
report to capture those.


[1] 
https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918


On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay wrote:


Another idea, could we change our "Retest X" phrases with "Retest
X (Reason)" phrases? With this change a PR author will have to
look at failed test logs. They could catch new flakiness
introduced by their PR, file a JIRA for a flakiness that was not
noted before, or ping an existing JIRA issue/raise its severity.
On the downside this will require PR authors to do more.

On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton wrote:

Adding retries can be beneficial in two ways, unblocking a PR,
and collecting metrics about the flakes.


Makes sense. I think we will still need to have a plan to remove
retries similar to re-enabling disabled tests.


If we also had a flaky test leaderboard that showed which
tests are the most flaky, then we could take action on them.
Encouraging someone from the community to fix the flaky test
is another issue.

The test status matrix of tests that is on the GitHub landing
page could show flake level to communicate to users which
modules are losing a trustable test signal. Maybe this shows
up as a flake % or a code coverage % that decreases due to
disabled flaky tests.


+1 to a dashboard that will show a "leaderboard" of flaky tests.


I didn't look for plugins, just dreaming up some options.




On Thu, Jul 16, 2020, 5:58 PM Luke Cwik wrote:

What do other Apache projects do to address this issue?

On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay wrote:

I agree with the comments in this thread.
- If we are not re-enabling tests back again or we do
not have a plan to re-enable them again, disabling
tests only provides us temporary relief until
eventually users find issues instead of disabled tests.
- I feel similarly about retries. It is reasonable to
add retries for reasons we understand. Adding retries
to avoid flakes is similar to disabling tests. They
might hide real issues.

I think we are missing a way for checking that we are
making progress on P1 issues. For example, P0 issues
block releases and this obviously results in
fixing/triaging/addressing P0 issues at least every 6
weeks. We do not have a similar process for flaky
tests. I do not know what would be a good policy. One
suggestion is to ping (email/slack) assignees of
issues. I recently missed a flaky issue that was
assigned to me. A ping like that would have reminded
me. And if an assignee cannot help/does not ha

Re: Chronically flaky tests

2020-07-30 Thread Ahmet Altay
I like:
*Include ignored or quarantined tests in the release notes*
*Run flaky tests only in postcommit* (related? *Separate flaky tests into
quarantine job*)
*Require link to Jira to rerun a test*

I am concerned about:
*Add Gradle or Jenkins plugin to retry flaky tests* - because it is a
convenient place for real bugs to hide.

I do not know much about:
*Consider Gradle Enterprise*
https://testautonation.com/analyse-test-results-deflake-flaky-tests/

Thank you for putting this list together! I believe that even if we can commit
to doing only some of these, we would have a much healthier project. If we can
build consensus on implementing them, I will be happy to work on some of them.

On Fri, Jul 24, 2020 at 1:54 PM Kenneth Knowles  wrote:

> Adding
> https://testautonation.com/analyse-test-results-deflake-flaky-tests/ to
> the list which seems a more powerful test history tool.
>
> On Fri, Jul 24, 2020 at 1:51 PM Kenneth Knowles  wrote:
>
>> Had some off-list chats to brainstorm and I wanted to bring ideas back to
>> the dev@ list for consideration. A lot can be combined. I would really
>> like to have a section in the release notes. I like the idea of banishing
>> flakes from pre-commit (since you can't tell easily if it was a real
>> failure caused by the PR) and auto-retrying in post-commit (so we can
>> gather data on exactly what is flaking without a lot of manual
>> investigation).
>>
>> *Include ignored or quarantined tests in the release notes*
>> Pro:
>>  - Users are aware of what is not being tested so may be silently broken
>>  - It forces discussion of ignored tests to be part of our community
>> processes
>> Con:
>>  - It may look bad if the list is large (this is actually also a Pro
>> because if it looks bad, it is bad)
>>
>> *Run flaky tests only in postcommit*
>> Pro:
>>  - isolates the bad signal so pre-commit is not affected
>>  - saves pointless re-runs in pre-commit
>>  - keeps a signal in post-commit that we can watch, instead of losing it
>> completely when we disable a test
>>  - maybe keeps the flaky tests in job related to what they are testing
>> Con:
>>  - we have to really watch post-commit or flakes can turn into failures
>>
>> *Separate flaky tests into quarantine job*
>> Pro:
>>  - gain signal for healthy tests, as with disabling or running in
>> post-commit
>>  - also saves pointless re-runs
>> Con:
>>  - may collect bad tests so that we never look at it so it is the same as
>> disabling the test
>>  - lots of unrelated tests grouped into signal instead of focused on
>> health of a particular component
>>
>> *Add Gradle or Jenkins plugin to retry flaky tests*
>> https://blog.gradle.org/gradle-flaky-test-retry-plugin
>> https://plugins.jenkins.io/flaky-test-handler/
>> Pro:
>>  - easier than Jiras with human pasting links; works with moving flakes
>> to post-commit
>>  - get a somewhat automated view of flakiness, whether in pre-commit or
>> post-commit
>>  - don't get stopped by flakiness
>> Con:
>>  - maybe too easy to ignore flakes; we should add all flakes (not just
>> disabled or quarantined) to the release notes
>>  - sometimes flakes are actual bugs (like concurrency) so treating this
>> as OK is not desirable
>>  - without Jiras, no automated release notes
>>  - Jenkins: retry only will work at job level because it needs Maven to
>> retry only failed (I think)
>>  - Jenkins: some of our jobs may have duplicate test names (but might
>> already be fixed)
>>
>> *Consider Gradle Enterprise*
>> Pro:
>>  - get Gradle scan granularity of flake data (and other stuff)
>>  - also gives module-level health which we do not have today
>> Con:
>>  - cost and administrative burden unknown
>>  - we probably have to do some small work to make our jobs compatible
>> with their history tracking
>>
>> *Require link to Jira to rerun a test*
>> Instead of saying "Run Java PreCommit" you have to link to the bug
>> relating to the failure.
>> Pro:
>>  - forces investigation
>>  - helps others find out about issues
>> Con:
>>  - adds a lot of manual work, or requires automation (which will probably
>> be ad hoc and fragile)
>>
>> Kenn
>>
>> On Mon, Jul 20, 2020 at 11:59 AM Brian Hulette 
>> wrote:
>>
>>> > I think we are missing a way for checking that we are making progress
>>> on P1 issues. For example, P0 issues block releases and this obviously
>>> results in fixing/triaging/addressing P0 issues at least every 6 weeks. We
>>> do not have a similar process for flaky tests. I do not know what would be
>>> a good policy. One suggestion is to ping (email/slack) assignees of issues.
>>> I recently missed a flaky issue that was assigned to me. A ping like that
>>> would have reminded me. And if an assignee cannot help/does not have the
>>> time, we can try to find a new assignee.
>>>
>>> Yeah I think this is something we should address. With the new jira
>>> automation at least assignees should get an email notification after 30
>>> days because of a jira comment like [1], but that's too long to let a test
>>> conti

Re: Chronically flaky tests

2020-07-24 Thread Kenneth Knowles
Adding https://testautonation.com/analyse-test-results-deflake-flaky-tests/ to
the list, which seems to be a more powerful test-history tool.

On Fri, Jul 24, 2020 at 1:51 PM Kenneth Knowles  wrote:

> Had some off-list chats to brainstorm and I wanted to bring ideas back to
> the dev@ list for consideration. A lot can be combined. I would really
> like to have a section in the release notes. I like the idea of banishing
> flakes from pre-commit (since you can't tell easily if it was a real
> failure caused by the PR) and auto-retrying in post-commit (so we can
> gather data on exactly what is flaking without a lot of manual
> investigation).
>
> *Include ignored or quarantined tests in the release notes*
> Pro:
>  - Users are aware of what is not being tested so may be silently broken
>  - It forces discussion of ignored tests to be part of our community
> processes
> Con:
>  - It may look bad if the list is large (this is actually also a Pro
> because if it looks bad, it is bad)
>
> *Run flaky tests only in postcommit*
> Pro:
>  - isolates the bad signal so pre-commit is not affected
>  - saves pointless re-runs in pre-commit
>  - keeps a signal in post-commit that we can watch, instead of losing it
> completely when we disable a test
>  - maybe keeps the flaky tests in job related to what they are testing
> Con:
>  - we have to really watch post-commit or flakes can turn into failures
>
> *Separate flaky tests into quarantine job*
> Pro:
>  - gain signal for healthy tests, as with disabling or running in
> post-commit
>  - also saves pointless re-runs
> Con:
>  - may collect bad tests so that we never look at it so it is the same as
> disabling the test
>  - lots of unrelated tests grouped into signal instead of focused on
> health of a particular component
>
> *Add Gradle or Jenkins plugin to retry flaky tests*
> https://blog.gradle.org/gradle-flaky-test-retry-plugin
> https://plugins.jenkins.io/flaky-test-handler/
> Pro:
>  - easier than Jiras with human pasting links; works with moving flakes to
> post-commit
>  - get a somewhat automated view of flakiness, whether in pre-commit or
> post-commit
>  - don't get stopped by flakiness
> Con:
>  - maybe too easy to ignore flakes; we should add all flakes (not just
> disabled or quarantined) to the release notes
>  - sometimes flakes are actual bugs (like concurrency) so treating this as
> OK is not desirable
>  - without Jiras, no automated release notes
>  - Jenkins: retry only will work at job level because it needs Maven to
> retry only failed (I think)
>  - Jenkins: some of our jobs may have duplicate test names (but might
> already be fixed)
>
> *Consider Gradle Enterprise*
> Pro:
>  - get Gradle scan granularity of flake data (and other stuff)
>  - also gives module-level health which we do not have today
> Con:
>  - cost and administrative burden unknown
>  - we probably have to do some small work to make our jobs compatible with
> their history tracking
>
> *Require link to Jira to rerun a test*
> Instead of saying "Run Java PreCommit" you have to link to the bug
> relating to the failure.
> Pro:
>  - forces investigation
>  - helps others find out about issues
> Con:
>  - adds a lot of manual work, or requires automation (which will probably
> be ad hoc and fragile)
>
> Kenn
>
> On Mon, Jul 20, 2020 at 11:59 AM Brian Hulette 
> wrote:
>
>> > I think we are missing a way for checking that we are making progress
>> on P1 issues. For example, P0 issues block releases and this obviously
>> results in fixing/triaging/addressing P0 issues at least every 6 weeks. We
>> do not have a similar process for flaky tests. I do not know what would be
>> a good policy. One suggestion is to ping (email/slack) assignees of issues.
>> I recently missed a flaky issue that was assigned to me. A ping like that
>> would have reminded me. And if an assignee cannot help/does not have the
>> time, we can try to find a new assignee.
>>
>> Yeah I think this is something we should address. With the new jira
>> automation at least assignees should get an email notification after 30
>> days because of a jira comment like [1], but that's too long to let a test
>> continue to flake. Could Beam Jira Bot ping every N days for P1s that
>> aren't making progress?
>>
>> That wouldn't help us with P1s that have no assignee, or are assigned to
>> overloaded people. It seems we'd need some kind of dashboard or report to
>> capture those.
>>
>> [1]
>> https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918
>>
>> On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay  wrote:
>>
>>> Another idea, could we change our "Retest X" phrases with "Retest X
>>> (Reason)" phrases? With this change a PR author will have to look at failed
>>> test logs. They could catch new flakiness introduced by their PR, file a
>>> JIRA for a flakiness that was not noted before, or ping an existing JIRA
>>> issue/raise its sev

Re: Chronically flaky tests

2020-07-24 Thread Kenneth Knowles
Had some off-list chats to brainstorm and I wanted to bring ideas back to
the dev@ list for consideration. A lot can be combined. I would really like
to have a section in the release notes. I like the idea of banishing flakes
from pre-commit (since you can't tell easily if it was a real failure
caused by the PR) and auto-retrying in post-commit (so we can gather data
on exactly what is flaking without a lot of manual investigation).

*Include ignored or quarantined tests in the release notes*
Pro:
 - Users are aware of what is not being tested so may be silently broken
 - It forces discussion of ignored tests to be part of our community
processes
Con:
 - It may look bad if the list is large (this is actually also a Pro
because if it looks bad, it is bad)

*Run flaky tests only in postcommit*
Pro:
 - isolates the bad signal so pre-commit is not affected
 - saves pointless re-runs in pre-commit
 - keeps a signal in post-commit that we can watch, instead of losing it
completely when we disable a test
 - maybe keeps the flaky tests in a job related to what they are testing
Con:
 - we have to really watch post-commit or flakes can turn into failures

*Separate flaky tests into quarantine job*
Pro:
 - gain signal for healthy tests, as with disabling or running in
post-commit
 - also saves pointless re-runs
Con:
 - may collect bad tests that we never look at, making it the same as
disabling the test
 - lots of unrelated tests grouped into one signal instead of a signal focused
on the health of a particular component

*Add Gradle or Jenkins plugin to retry flaky tests*
https://blog.gradle.org/gradle-flaky-test-retry-plugin
https://plugins.jenkins.io/flaky-test-handler/
Pro:
 - easier than Jiras with humans pasting links; works with moving flakes to
post-commit
 - get a somewhat automated view of flakiness, whether in pre-commit or
post-commit
 - don't get stopped by flakiness
Con:
 - maybe too easy to ignore flakes; we should add all flakes (not just
disabled or quarantined) to the release notes
 - sometimes flakes are actual bugs (like concurrency) so treating this as
OK is not desirable
 - without Jiras, no automated release notes
 - Jenkins: retry will only work at the job level because it needs Maven to
retry only the failed tests (I think)
 - Jenkins: some of our jobs may have duplicate test names (but might
already be fixed)

*Consider Gradle Enterprise*
Pro:
 - get Gradle scan granularity of flake data (and other stuff)
 - also gives module-level health which we do not have today
Con:
 - cost and administrative burden unknown
 - we probably have to do some small work to make our jobs compatible with
their history tracking

*Require link to Jira to rerun a test*
Instead of saying "Run Java PreCommit" you have to link to the bug relating
to the failure.
Pro:
 - forces investigation
 - helps others find out about issues
Con:
 - adds a lot of manual work, or requires automation (which will probably
be ad hoc and fragile)

Kenn

On Mon, Jul 20, 2020 at 11:59 AM Brian Hulette  wrote:

> > I think we are missing a way for checking that we are making progress on
> P1 issues. For example, P0 issues block releases and this obviously results
> in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
> have a similar process for flaky tests. I do not know what would be a good
> policy. One suggestion is to ping (email/slack) assignees of issues. I
> recently missed a flaky issue that was assigned to me. A ping like that
> would have reminded me. And if an assignee cannot help/does not have the
> time, we can try to find a new assignee.
>
> Yeah I think this is something we should address. With the new jira
> automation at least assignees should get an email notification after 30
> days because of a jira comment like [1], but that's too long to let a test
> continue to flake. Could Beam Jira Bot ping every N days for P1s that
> aren't making progress?
>
> That wouldn't help us with P1s that have no assignee, or are assigned to
> overloaded people. It seems we'd need some kind of dashboard or report to
> capture those.
>
> [1]
> https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918
>
> On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay  wrote:
>
>> Another idea, could we change our "Retest X" phrases with "Retest X
>> (Reason)" phrases? With this change a PR author will have to look at failed
>> test logs. They could catch new flakiness introduced by their PR, file a
>> JIRA for a flakiness that was not noted before, or ping an existing JIRA
>> issue/raise its severity. On the downside this will require PR authors to
>> do more.
>>
>> On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton 
>> wrote:
>>
>>> Adding retries can be beneficial in two ways, unblocking a PR, and
>>> collecting metrics about the flakes.
>>>
>>
>> Makes sense. I think we will still need to have a plan to remove retries
>> similar to re-enabling disabled tests.
>>
>>

Re: Chronically flaky tests

2020-07-20 Thread Brian Hulette
> I think we are missing a way for checking that we are making progress on
P1 issues. For example, P0 issues block releases and this obviously results
in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
have a similar process for flaky tests. I do not know what would be a good
policy. One suggestion is to ping (email/slack) assignees of issues. I
recently missed a flaky issue that was assigned to me. A ping like that
would have reminded me. And if an assignee cannot help/does not have the
time, we can try to find a new assignee.

Yeah, I think this is something we should address. With the new Jira
automation, assignees should at least get an email notification after 30
days via a Jira comment like [1], but that's too long to let a test
continue to flake. Could Beam Jira Bot ping every N days for P1s that
aren't making progress?
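
For what it's worth, the query such a bot (or even a one-off report) would
need is small. A rough sketch using the Jira REST search API, where the JQL
filter (priority name, "flake" label, 14-day staleness window) is a guess at
what we would actually want:

import requests

SEARCH_URL = "https://issues.apache.org/jira/rest/api/2/search"
# Assumed filter: unresolved P1 flake issues with no update in 14 days.
JQL = ('project = BEAM AND priority = P1 AND labels = flake '
       'AND resolution = Unresolved AND updated <= -14d')

resp = requests.get(SEARCH_URL,
                    params={"jql": JQL, "fields": "assignee,summary",
                            "maxResults": 100})
resp.raise_for_status()
for issue in resp.json().get("issues", []):
    assignee = issue["fields"]["assignee"]
    owner = assignee["displayName"] if assignee else "UNASSIGNED"
    print(issue["key"], owner, issue["fields"]["summary"], sep="\t")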

That wouldn't help us with P1s that have no assignee, or are assigned to
overloaded people. It seems we'd need some kind of dashboard or report to
capture those.

[1]
https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918

On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay  wrote:

> Another idea, could we change our "Retest X" phrases with "Retest X
> (Reason)" phrases? With this change a PR author will have to look at failed
> test logs. They could catch new flakiness introduced by their PR, file a
> JIRA for a flakiness that was not noted before, or ping an existing JIRA
> issue/raise its severity. On the downside this will require PR authors to
> do more.
>
> On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton  wrote:
>
>> Adding retries can be beneficial in two ways, unblocking a PR, and
>> collecting metrics about the flakes.
>>
>
> Makes sense. I think we will still need to have a plan to remove retries
> similar to re-enabling disabled tests.
>
>
>>
>> If we also had a flaky test leaderboard that showed which tests are the
>> most flaky, then we could take action on them. Encouraging someone from the
>> community to fix the flaky test is another issue.
>>
>> The test status matrix of tests that is on the GitHub landing page could
>> show flake level to communicate to users which modules are losing a
>> trustable test signal. Maybe this shows up as a flake % or a code coverage
>> % that decreases due to disabled flaky tests.
>>
>
> +1 to a dashboard that will show a "leaderboard" of flaky tests.
>
>
>>
>> I didn't look for plugins, just dreaming up some options.
>>
>>
>>
>>
>> On Thu, Jul 16, 2020, 5:58 PM Luke Cwik  wrote:
>>
>>> What do other Apache projects do to address this issue?
>>>
>>> On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay  wrote:
>>>
 I agree with the comments in this thread.
 - If we are not re-enabling tests back again or we do not have a plan
 to re-enable them again, disabling tests only provides us temporary relief
 until eventually users find issues instead of disabled tests.
 - I feel similarly about retries. It is reasonable to add retries for
 reasons we understand. Adding retries to avoid flakes is similar to
 disabling tests. They might hide real issues.

 I think we are missing a way for checking that we are making progress
 on P1 issues. For example, P0 issues block releases and this obviously
 results in fixing/triaging/addressing P0 issues at least every 6 weeks. We
 do not have a similar process for flaky tests. I do not know what would be
 a good policy. One suggestion is to ping (email/slack) assignees of issues.
 I recently missed a flaky issue that was assigned to me. A ping like that
 would have reminded me. And if an assignee cannot help/does not have the
 time, we can try to find a new assignee.

 Ahmet


 On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> I think the original discussion[1] on introducing tenacity might
> answer that question.
>
> [1]
> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>
> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:
>
>> Is there an observation that enabling tenacity improves the
>> development experience on Python SDK? E.g. less wait time to get PR pass
>> and merged? Or it might be a matter of a right number of retry to align
>> with the "flakiness" of a test?
>>
>>
>> -Rui
>>
>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> We used tenacity[1] to retry some unit tests for which we understood
>>> the nature of flakiness.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>>>
>>> On Thu, Jul 16, 2020 at 10:25 A

Re: Chronically flaky tests

2020-07-17 Thread Ahmet Altay
Another idea: could we replace our "Retest X" phrases with "Retest X
(Reason)" phrases? With this change a PR author would have to look at the
failed test logs. They could catch new flakiness introduced by their PR, file
a JIRA for flakiness that was not noted before, or ping an existing JIRA
issue/raise its severity. On the downside, this will require PR authors to
do more.

On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton  wrote:

> Adding retries can be beneficial in two ways, unblocking a PR, and
> collecting metrics about the flakes.
>

Makes sense. I think we will still need to have a plan to remove retries
similar to re-enabling disabled tests.


>
> If we also had a flaky test leaderboard that showed which tests are the
> most flaky, then we could take action on them. Encouraging someone from the
> community to fix the flaky test is another issue.
>
> The test status matrix of tests that is on the GitHub landing page could
> show flake level to communicate to users which modules are losing a
> trustable test signal. Maybe this shows up as a flake % or a code coverage
> % that decreases due to disabled flaky tests.
>

+1 to a dashboard that will show a "leaderboard" of flaky tests.


>
> I didn't look for plugins, just dreaming up some options.
>
>
>
>
> On Thu, Jul 16, 2020, 5:58 PM Luke Cwik  wrote:
>
>> What do other Apache projects do to address this issue?
>>
>> On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay  wrote:
>>
>>> I agree with the comments in this thread.
>>> - If we are not re-enabling tests back again or we do not have a plan to
>>> re-enable them again, disabling tests only provides us temporary relief
>>> until eventually users find issues instead of disabled tests.
>>> - I feel similarly about retries. It is reasonable to add retries for
>>> reasons we understand. Adding retries to avoid flakes is similar to
>>> disabling tests. They might hide real issues.
>>>
>>> I think we are missing a way for checking that we are making progress on
>>> P1 issues. For example, P0 issues block releases and this obviously results
>>> in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
>>> have a similar process for flaky tests. I do not know what would be a good
>>> policy. One suggestion is to ping (email/slack) assignees of issues. I
>>> recently missed a flaky issue that was assigned to me. A ping like that
>>> would have reminded me. And if an assignee cannot help/does not have the
>>> time, we can try to find a new assignee.
>>>
>>> Ahmet
>>>
>>>
>>> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 I think the original discussion[1] on introducing tenacity might answer
 that question.

 [1]
 https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E

 On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:

> Is there an observation that enabling tenacity improves the
> development experience on Python SDK? E.g. less wait time to get PR pass
> and merged? Or it might be a matter of a right number of retry to align
> with the "flakiness" of a test?
>
>
> -Rui
>
> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> We used tenacity[1] to retry some unit tests for which we understood
>> the nature of flakiness.
>>
>> [1]
>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>>
>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles 
>> wrote:
>>
>>> Didn't we use something like that flaky retry plugin for Python
>>> tests at some point? Adding retries may be preferable to disabling the
>>> test. We need a process to remove the retries ASAP though. As Luke says
>>> that is not so easy to make happen. Having a way to make P1 bugs more
>>> visible in an ongoing way may help.
>>>
>>> Kenn
>>>
>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>>>
 I don't think I have seen tests that were previously disabled
 become re-enabled.

 It seems as though we have about ~60 disabled tests in Java and ~15
 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to 
 missing
 features so unrelated to being a flake.

 On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov 
 wrote:

> There is something called test-retry-gradle-plugin [1]. It retries
> tests if they fail, and have different modes to handle flaky tests. 
> Did we
> ever try or consider using it?
>
> [1]: https://github.com/gradle/test-retry-gradle-plugin
>
> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov 
> wrote:
>
>> I agree with what Ahmet is saying. I can share my pers

Re: Chronically flaky tests

2020-07-17 Thread Tyson Hamilton
Adding retries can be beneficial in two ways: unblocking a PR, and
collecting metrics about the flakes.

If we also had a flaky test leaderboard that showed which tests are the
most flaky, then we could take action on them. Encouraging someone from the
community to fix the flaky test is another issue.
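
A first cut of the leaderboard could simply count, per test, the commits at
which it both passed and failed (the "same git SHA" definition of a flake
mentioned earlier in the thread). A rough sketch, assuming we can export
(test name, git SHA, passed) rows from the CI history:

from collections import defaultdict

def flaky_leaderboard(results):
    """results: iterable of (test_name, git_sha, passed) tuples from CI runs."""
    outcomes = defaultdict(lambda: defaultdict(set))
    for test, sha, passed in results:
        outcomes[test][sha].add(bool(passed))
    # A test is flaky at a SHA if we saw both a pass and a failure there.
    counts = {test: sum(1 for seen in by_sha.values() if len(seen) == 2)
              for test, by_sha in outcomes.items()}
    return sorted(((n, test) for test, n in counts.items() if n), reverse=True)

# for flake_count, test in flaky_leaderboard(load_ci_results()):  # hypothetical loader
#     print(flake_count, test)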

The test status matrix on the GitHub landing page could show flake levels
to communicate to users which modules are losing a trustworthy test signal.
Maybe this shows up as a flake % or a code-coverage % that decreases due to
disabled flaky tests.

I didn't look for plugins, just dreaming up some options.




On Thu, Jul 16, 2020, 5:58 PM Luke Cwik  wrote:

> What do other Apache projects do to address this issue?
>
> On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay  wrote:
>
>> I agree with the comments in this thread.
>> - If we are not re-enabling tests back again or we do not have a plan to
>> re-enable them again, disabling tests only provides us temporary relief
>> until eventually users find issues instead of disabled tests.
>> - I feel similarly about retries. It is reasonable to add retries for
>> reasons we understand. Adding retries to avoid flakes is similar to
>> disabling tests. They might hide real issues.
>>
>> I think we are missing a way for checking that we are making progress on
>> P1 issues. For example, P0 issues block releases and this obviously results
>> in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
>> have a similar process for flaky tests. I do not know what would be a good
>> policy. One suggestion is to ping (email/slack) assignees of issues. I
>> recently missed a flaky issue that was assigned to me. A ping like that
>> would have reminded me. And if an assignee cannot help/does not have the
>> time, we can try to find a new assignee.
>>
>> Ahmet
>>
>>
>> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> I think the original discussion[1] on introducing tenacity might answer
>>> that question.
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>>>
>>> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:
>>>
 Is there an observation that enabling tenacity improves the
 development experience on Python SDK? E.g. less wait time to get PR pass
 and merged? Or it might be a matter of a right number of retry to align
 with the "flakiness" of a test?


 -Rui

 On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> We used tenacity[1] to retry some unit tests for which we understood
> the nature of flakiness.
>
> [1]
> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>
> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles 
> wrote:
>
>> Didn't we use something like that flaky retry plugin for Python tests
>> at some point? Adding retries may be preferable to disabling the test. We
>> need a process to remove the retries ASAP though. As Luke says that is 
>> not
>> so easy to make happen. Having a way to make P1 bugs more visible in an
>> ongoing way may help.
>>
>> Kenn
>>
>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>>
>>> I don't think I have seen tests that were previously disabled become
>>> re-enabled.
>>>
>>> It seems as though we have about ~60 disabled tests in Java and ~15
>>> in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to 
>>> missing
>>> features so unrelated to being a flake.
>>>
>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov 
>>> wrote:
>>>
 There is something called test-retry-gradle-plugin [1]. It retries
 tests if they fail, and have different modes to handle flaky tests. 
 Did we
 ever try or consider using it?

 [1]: https://github.com/gradle/test-retry-gradle-plugin

 On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov 
 wrote:

> I agree with what Ahmet is saying. I can share my perspective,
> recently I had to retrigger build 6 times due to flaky tests, and each
> retrigger took one hour of waiting time.
>
> I've seen examples of automatic tracking of flaky tests, where a
> test is considered flaky if both fails and succeeds for the same git 
> SHA.
> Not sure if there is anything we can enable to get this automatically.
>
> /Gleb
>
> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay 
> wrote:
>
>> I think it will be reasonable to disable/sickbay any flaky test
>> that is actively blocking people. Collective cost of flaky tests for 
>> such a
>> large group of contributors is very signifi

Re: Chronically flaky tests

2020-07-16 Thread Luke Cwik
What do other Apache projects do to address this issue?

On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay  wrote:

> I agree with the comments in this thread.
> - If we are not re-enabling tests back again or we do not have a plan to
> re-enable them again, disabling tests only provides us temporary relief
> until eventually users find issues instead of disabled tests.
> - I feel similarly about retries. It is reasonable to add retries for
> reasons we understand. Adding retries to avoid flakes is similar to
> disabling tests. They might hide real issues.
>
> I think we are missing a way for checking that we are making progress on
> P1 issues. For example, P0 issues block releases and this obviously results
> in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
> have a similar process for flaky tests. I do not know what would be a good
> policy. One suggestion is to ping (email/slack) assignees of issues. I
> recently missed a flaky issue that was assigned to me. A ping like that
> would have reminded me. And if an assignee cannot help/does not have the
> time, we can try to find a new assignee.
>
> Ahmet
>
>
> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev 
> wrote:
>
>> I think the original discussion[1] on introducing tenacity might answer
>> that question.
>>
>> [1]
>> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>>
>> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:
>>
>>> Is there an observation that enabling tenacity improves the
>>> development experience on Python SDK? E.g. less wait time to get PR pass
>>> and merged? Or it might be a matter of a right number of retry to align
>>> with the "flakiness" of a test?
>>>
>>>
>>> -Rui
>>>
>>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 We used tenacity[1] to retry some unit tests for which we understood
 the nature of flakiness.

 [1]
 https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156

 On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles 
 wrote:

> Didn't we use something like that flaky retry plugin for Python tests
> at some point? Adding retries may be preferable to disabling the test. We
> need a process to remove the retries ASAP though. As Luke says that is not
> so easy to make happen. Having a way to make P1 bugs more visible in an
> ongoing way may help.
>
> Kenn
>
> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>
>> I don't think I have seen tests that were previously disabled become
>> re-enabled.
>>
>> It seems as though we have about ~60 disabled tests in Java and ~15
>> in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
>> features so unrelated to being a flake.
>>
>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov 
>> wrote:
>>
>>> There is something called test-retry-gradle-plugin [1]. It retries
>>> tests if they fail, and have different modes to handle flaky tests. Did 
>>> we
>>> ever try or consider using it?
>>>
>>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>>
>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov 
>>> wrote:
>>>
 I agree with what Ahmet is saying. I can share my perspective,
 recently I had to retrigger build 6 times due to flaky tests, and each
 retrigger took one hour of waiting time.

 I've seen examples of automatic tracking of flaky tests, where a
 test is considered flaky if both fails and succeeds for the same git 
 SHA.
 Not sure if there is anything we can enable to get this automatically.

 /Gleb

 On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay 
 wrote:

> I think it will be reasonable to disable/sickbay any flaky test
> that is actively blocking people. Collective cost of flaky tests for 
> such a
> large group of contributors is very significant.
>
> Most of these issues are unassigned. IMO, it makes sense to assign
> these issues to the most relevant person (who added the test/who 
> generally
> maintains those components). Those people can either fix and 
> re-enable the
> tests, or remove them if they no longer provide valuable signals.
>
> Ahmet
>
> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
> wrote:
>
>> The situation is much worse than that IMO. My experience of the
>> last few days is that a large portion of time went to *just 
>> connecting
>> failing runs with the corresponding Jira tickets or filing new ones*.
>>
>> Summarized on PRs:
>>
>>  -
>

Re: Chronically flaky tests

2020-07-16 Thread Ahmet Altay
I agree with the comments in this thread.
- If we are not re-enabling disabled tests, or do not have a plan to
re-enable them, disabling tests only provides us temporary relief until
eventually users find the issues instead of the disabled tests finding them.
- I feel similarly about retries. It is reasonable to add retries for
reasons we understand. Adding retries just to avoid flakes is similar to
disabling tests: they might hide real issues.

I think we are missing a way for checking that we are making progress on P1
issues. For example, P0 issues block releases and this obviously results in
fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have
a similar process for flaky tests. I do not know what would be a good
policy. One suggestion is to ping (email/slack) assignees of issues. I
recently missed a flaky issue that was assigned to me. A ping like that
would have reminded me. And if an assignee cannot help/does not have the
time, we can try to find a new assignee.

Ahmet


On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev 
wrote:

> I think the original discussion[1] on introducing tenacity might answer
> that question.
>
> [1]
> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>
> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:
>
>> Is there an observation that enabling tenacity improves the
>> development experience on Python SDK? E.g. less wait time to get PR pass
>> and merged? Or it might be a matter of a right number of retry to align
>> with the "flakiness" of a test?
>>
>>
>> -Rui
>>
>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> We used tenacity[1] to retry some unit tests for which we understood the
>>> nature of flakiness.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>>>
>>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles 
>>> wrote:
>>>
 Didn't we use something like that flaky retry plugin for Python tests
 at some point? Adding retries may be preferable to disabling the test. We
 need a process to remove the retries ASAP though. As Luke says that is not
 so easy to make happen. Having a way to make P1 bugs more visible in an
 ongoing way may help.

 Kenn

 On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:

> I don't think I have seen tests that were previously disabled become
> re-enabled.
>
> It seems as though we have about ~60 disabled tests in Java and ~15 in
> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
> features so unrelated to being a flake.
>
> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov 
> wrote:
>
>> There is something called test-retry-gradle-plugin [1]. It retries
>> tests if they fail, and have different modes to handle flaky tests. Did 
>> we
>> ever try or consider using it?
>>
>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>
>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov 
>> wrote:
>>
>>> I agree with what Ahmet is saying. I can share my perspective,
>>> recently I had to retrigger build 6 times due to flaky tests, and each
>>> retrigger took one hour of waiting time.
>>>
>>> I've seen examples of automatic tracking of flaky tests, where a
>>> test is considered flaky if both fails and succeeds for the same git 
>>> SHA.
>>> Not sure if there is anything we can enable to get this automatically.
>>>
>>> /Gleb
>>>
>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay 
>>> wrote:
>>>
 I think it will be reasonable to disable/sickbay any flaky test
 that is actively blocking people. Collective cost of flaky tests for 
 such a
 large group of contributors is very significant.

 Most of these issues are unassigned. IMO, it makes sense to assign
 these issues to the most relevant person (who added the test/who 
 generally
 maintains those components). Those people can either fix and re-enable 
 the
 tests, or remove them if they no longer provide valuable signals.

 Ahmet

 On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
 wrote:

> The situation is much worse than that IMO. My experience of the
> last few days is that a large portion of time went to *just connecting
> failing runs with the corresponding Jira tickets or filing new ones*.
>
> Summarized on PRs:
>
>  -
> https://github.com/apache/beam/pull/12272#issuecomment-659050891
>  -
> https://github.com/apache/beam/pull/12273#issuecomment-659070317
>  -
> https://github.com/apache/beam/pull/12225#issuecomment-656973073
>  -
> https://

Re: Chronically flaky tests

2020-07-16 Thread Valentyn Tymofieiev
I think the original discussion[1] on introducing tenacity might answer
that question.

[1]
https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
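
For context, the pattern there is roughly the following (a sketch, not the
exact code in fn_runner_test.py; the test body is a made-up placeholder):

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(2))
def test_case_with_understood_flake():
    # The retry is targeted: we understand why this occasionally fails (e.g.
    # a race in environment teardown), so a small bounded retry is acceptable.
    run_pipeline_and_check_output()  # hypothetical stand-in for the real test body

The key point from that discussion was that the retry is added only where the
cause of the flakiness is understood, not as a blanket policy.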

On Thu, Jul 16, 2020 at 10:48 AM Rui Wang  wrote:

> Is there an observation that enabling tenacity improves the
> development experience on Python SDK? E.g. less wait time to get PR pass
> and merged? Or it might be a matter of a right number of retry to align
> with the "flakiness" of a test?
>
>
> -Rui
>
> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev 
> wrote:
>
>> We used tenacity[1] to retry some unit tests for which we understood the
>> nature of flakiness.
>>
>> [1]
>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>>
>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles  wrote:
>>
>>> Didn't we use something like that flaky retry plugin for Python tests at
>>> some point? Adding retries may be preferable to disabling the test. We need
>>> a process to remove the retries ASAP though. As Luke says that is not so
>>> easy to make happen. Having a way to make P1 bugs more visible in an
>>> ongoing way may help.
>>>
>>> Kenn
>>>
>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>>>
 I don't think I have seen tests that were previously disabled become
 re-enabled.

 It seems as though we have about ~60 disabled tests in Java and ~15 in
 Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
 features so unrelated to being a flake.

 On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov  wrote:

> There is something called test-retry-gradle-plugin [1]. It retries
> tests if they fail, and have different modes to handle flaky tests. Did we
> ever try or consider using it?
>
> [1]: https://github.com/gradle/test-retry-gradle-plugin
>
> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov 
> wrote:
>
>> I agree with what Ahmet is saying. I can share my perspective,
>> recently I had to retrigger build 6 times due to flaky tests, and each
>> retrigger took one hour of waiting time.
>>
>> I've seen examples of automatic tracking of flaky tests, where a test
>> is considered flaky if both fails and succeeds for the same git SHA. Not
>> sure if there is anything we can enable to get this automatically.
>>
>> /Gleb
>>
>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:
>>
>>> I think it will be reasonable to disable/sickbay any flaky test that
>>> is actively blocking people. Collective cost of flaky tests for such a
>>> large group of contributors is very significant.
>>>
>>> Most of these issues are unassigned. IMO, it makes sense to assign
>>> these issues to the most relevant person (who added the test/who 
>>> generally
>>> maintains those components). Those people can either fix and re-enable 
>>> the
>>> tests, or remove them if they no longer provide valuable signals.
>>>
>>> Ahmet
>>>
>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
>>> wrote:
>>>
 The situation is much worse than that IMO. My experience of the
 last few days is that a large portion of time went to *just connecting
 failing runs with the corresponding Jira tickets or filing new ones*.

 Summarized on PRs:

  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
  - https://github.com/apache/beam/pull/12216#issuecomment-657799415

 The tickets:

  - https://issues.apache.org/jira/browse/BEAM-10460
 SparkPortableExecutionTest
  - https://issues.apache.org/jira/browse/BEAM-10471
 CassandraIOTest > testEstimatedSizeBytes
  - https://issues.apache.org/jira/browse/BEAM-10504
 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
 > @BeforeClass (classmethod)
  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
  - https://issues.apache.org/jira/browse/BEAM-10506
 SplunkEventWriterTest
  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
 ParDoLifecycleTest
  - https://issues.apa

Re: Chronically flaky tests

2020-07-16 Thread Rui Wang
Is there an observation that enabling tenacity improves the
development experience on the Python SDK? E.g. less wait time to get a PR
to pass and get merged? Or might it be a matter of choosing the right number
of retries to align with the "flakiness" of a test?


-Rui

On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev 
wrote:

> We used tenacity[1] to retry some unit tests for which we understood the
> nature of flakiness.
>
> [1]
> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>
> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles  wrote:
>
>> Didn't we use something like that flaky retry plugin for Python tests at
>> some point? Adding retries may be preferable to disabling the test. We need
>> a process to remove the retries ASAP though. As Luke says that is not so
>> easy to make happen. Having a way to make P1 bugs more visible in an
>> ongoing way may help.
>>
>> Kenn
>>
>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>>
>>> I don't think I have seen tests that were previously disabled become
>>> re-enabled.
>>>
>>> It seems as though we have ~60 disabled tests in Java and ~15 in
>>> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
>>> features, so they are unrelated to flakiness.
>>>
>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov  wrote:
>>>
 There is something called test-retry-gradle-plugin [1]. It retries
 tests if they fail, and has different modes to handle flaky tests. Did we
 ever try or consider using it?

 [1]: https://github.com/gradle/test-retry-gradle-plugin

 On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov  wrote:

> I agree with what Ahmet is saying. I can share my perspective,
> recently I had to retrigger the build 6 times due to flaky tests, and each
> retrigger took one hour of waiting time.
>
> I've seen examples of automatic tracking of flaky tests, where a test
> is considered flaky if it both fails and succeeds for the same git SHA. Not
> sure if there is anything we can enable to get this automatically.
>
> /Gleb
>
> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:
>
>> I think it will be reasonable to disable/sickbay any flaky test that
>> is actively blocking people. Collective cost of flaky tests for such a
>> large group of contributors is very significant.
>>
>> Most of these issues are unassigned. IMO, it makes sense to assign
>> these issues to the most relevant person (who added the test/who 
>> generally
>> maintains those components). Those people can either fix and re-enable 
>> the
>> tests, or remove them if they no longer provide valuable signals.
>>
>> Ahmet
>>
>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
>> wrote:
>>
>>> The situation is much worse than that IMO. My experience of the last
>>> few days is that a large portion of time went to *just connecting 
>>> failing
>>> runs with the corresponding Jira tickets or filing new ones*.
>>>
>>> Summarized on PRs:
>>>
>>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>
>>> The tickets:
>>>
>>>  - https://issues.apache.org/jira/browse/BEAM-10460
>>> SparkPortableExecutionTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest
>>> > testEstimatedSizeBytes
>>>  - https://issues.apache.org/jira/browse/BEAM-10504
>>> ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>>> > @BeforeClass (classmethod)
>>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10506
>>> SplunkEventWriterTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>>> ParDoLifecycleTest
>>>  - https://issues.apache.org/jira/browse/BEAM-9187
>>> DefaultJobBundleFactoryTest
>>>
>>> Here are our P1 test flake bugs:
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>
>>> I

Re: Chronically flaky tests

2020-07-16 Thread Valentyn Tymofieiev
We used tenacity[1] to retry some unit tests for which we understood the
nature of flakiness.

[1]
https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
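
For context, here is a minimal sketch of the tenacity approach, assuming a
unittest-style test whose known flake is a transient ConnectionError; the test
name, exception type, and retry parameters are illustrative and are not taken
from the linked Beam test:

# Illustrative sketch only: names and parameters are hypothetical, not Beam's.
import random
import unittest

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed


class FlakyBackendTest(unittest.TestCase):

  # Retry up to 3 times, only on the exception we understand to be transient,
  # waiting 1 second between attempts; any other failure surfaces immediately.
  @retry(
      retry=retry_if_exception_type(ConnectionError),
      stop=stop_after_attempt(3),
      wait=wait_fixed(1),
      reraise=True)
  def test_reads_from_embedded_backend(self):
    if random.random() < 0.3:  # stand-in for a load-dependent transient failure
      raise ConnectionError('embedded backend not ready yet')
    self.assertTrue(True)


if __name__ == '__main__':
  unittest.main()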

On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles  wrote:

> Didn't we use something like that flaky retry plugin for Python tests at
> some point? Adding retries may be preferable to disabling the test. We need
> a process to remove the retries ASAP though. As Luke says that is not so
> easy to make happen. Having a way to make P1 bugs more visible in an
> ongoing way may help.
>
> Kenn
>
> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:
>
>> I don't think I have seen tests that were previously disabled become
>> re-enabled.
>>
>> It seems as though we have ~60 disabled tests in Java and ~15 in
>> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
>> features, so they are unrelated to flakiness.
>>
>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov  wrote:
>>
>>> There is something called test-retry-gradle-plugin [1]. It retries tests
>>> if they fail, and has different modes to handle flaky tests. Did we ever
>>> try or consider using it?
>>>
>>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>>
>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov  wrote:
>>>
 I agree with what Ahmet is saying. I can share my perspective, recently
 I had to retrigger the build 6 times due to flaky tests, and each retrigger
 took one hour of waiting time.

 I've seen examples of automatic tracking of flaky tests, where a test
 is considered flaky if it both fails and succeeds for the same git SHA. Not
 sure if there is anything we can enable to get this automatically.

 /Gleb

 On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:

> I think it will be reasonable to disable/sickbay any flaky test that
> is actively blocking people. Collective cost of flaky tests for such a
> large group of contributors is very significant.
>
> Most of these issues are unassigned. IMO, it makes sense to assign
> these issues to the most relevant person (who added the test/who generally
> maintains those components). Those people can either fix and re-enable the
> tests, or remove them if they no longer provide valuable signals.
>
> Ahmet
>
> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
> wrote:
>
>> The situation is much worse than that IMO. My experience of the last
>> few days is that a large portion of time went to *just connecting failing
>> runs with the corresponding Jira tickets or filing new ones*.
>>
>> Summarized on PRs:
>>
>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>
>> The tickets:
>>
>>  - https://issues.apache.org/jira/browse/BEAM-10460
>> SparkPortableExecutionTest
>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest
>> > testEstimatedSizeBytes
>>  - https://issues.apache.org/jira/browse/BEAM-10504
>> ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>> > @BeforeClass (classmethod)
>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>  - https://issues.apache.org/jira/browse/BEAM-10506
>> SplunkEventWriterTest
>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>> ParDoLifecycleTest
>>  - https://issues.apache.org/jira/browse/BEAM-9187
>> DefaultJobBundleFactoryTest
>>
>> Here are our P1 test flake bugs:
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>
>> It seems quite a few of them are actively hindering people right now.
>>
>> Kenn
>>
>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
>> wrote:
>>
>>> We have two test suites that are responsible for a large percentage
>>> of our flaky tests and  both have bugs open for about a year without 
>>> being
>>> fixed. These suites are ParDoLifecycleTest (BEAM-8101
>>> 

Re: Chronically flaky tests

2020-07-16 Thread Kenneth Knowles
Didn't we use something like that flaky retry plugin for Python tests at
some point? Adding retries may be preferable to disabling the test. We need
a process to remove the retries ASAP though. As Luke says that is not so
easy to make happen. Having a way to make P1 bugs more visible in an
ongoing way may help.

Kenn
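
For comparison, a test-runner-level alternative is a rerun-on-failure marker.
A minimal sketch, assuming pytest with the pytest-rerunfailures plugin (which
may or may not be the plugin referred to above); the test body and rerun counts
are illustrative:

# Illustrative sketch only; assumes pytest plus pytest-rerunfailures
# (pip install pytest-rerunfailures). Test body and counts are hypothetical.
import random

import pytest


@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_occasionally_times_out():
  # Stand-in for a test that intermittently fails under CI load; pytest will
  # rerun it up to 3 times, 2 seconds apart, before reporting a failure.
  assert random.random() > 0.1

Either way, the same caveat applies: retries added this way still need a
tracked process for removing them once the underlying flake is fixed.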

On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik  wrote:

> I don't think I have seen tests that were previously disabled become
> re-enabled.
>
> It seems as though we have about ~60 disabled tests in Java and ~15 in
> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
> features so unrelated to being a flake.
>
> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov  wrote:
>
>> There is something called test-retry-gradle-plugin [1]. It retries tests
>> if they fail, and has different modes to handle flaky tests. Did we ever
>> try or consider using it?
>>
>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>
>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov  wrote:
>>
>>> I agree with what Ahmet is saying. I can share my perspective, recently
>>> I had to retrigger the build 6 times due to flaky tests, and each retrigger
>>> took one hour of waiting time.
>>>
>>> I've seen examples of automatic tracking of flaky tests, where a test is
>>> considered flaky if it both fails and succeeds for the same git SHA. Not sure
>>> if there is anything we can enable to get this automatically.
>>>
>>> /Gleb
>>>
>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:
>>>
 I think it will be reasonable to disable/sickbay any flaky test that is
 actively blocking people. Collective cost of flaky tests for such a large
 group of contributors is very significant.

 Most of these issues are unassigned. IMO, it makes sense to assign
 these issues to the most relevant person (who added the test/who generally
 maintains those components). Those people can either fix and re-enable the
 tests, or remove them if they no longer provide valuable signals.

 Ahmet

 On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles 
 wrote:

> The situation is much worse than that IMO. My experience of the last
> few days is that a large portion of time went to *just connecting failing
> runs with the corresponding Jira tickets or filing new ones*.
>
> Summarized on PRs:
>
>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>
> The tickets:
>
>  - https://issues.apache.org/jira/browse/BEAM-10460
> SparkPortableExecutionTest
>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
> testEstimatedSizeBytes
>  - https://issues.apache.org/jira/browse/BEAM-10504
> ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
> > @BeforeClass (classmethod)
>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>  - https://issues.apache.org/jira/browse/BEAM-10506
> SplunkEventWriterTest
>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
> ParDoLifecycleTest
>  - https://issues.apache.org/jira/browse/BEAM-9187
> DefaultJobBundleFactoryTest
>
> Here are our P1 test flake bugs:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
> It seems quite a few of them are actively hindering people right now.
>
> Kenn
>
> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
> wrote:
>
>> We have two test suites that are responsible for a large percentage
>> of our flaky tests and  both have bugs open for about a year without 
>> being
>> fixed. These suites are ParDoLifecycleTest (BEAM-8101
>> ) in Java
>> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
>> , py2 BEAM-9232
>> , old duplicate
>> BEAM-8197 ).
>>
>> Are there any volunteers to look into these issues? What can we do to
>> mitigate the flakiness u

Re: Chronically flaky tests

2020-07-16 Thread Luke Cwik
I don't think I have seen tests that were previously disabled become
re-enabled.

It seems as though we have ~60 disabled tests in Java and ~15 in
Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
features, so they are unrelated to flakiness.

On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov  wrote:

> There is something called test-retry-gradle-plugin [1]. It retries tests
> if they fail, and has different modes to handle flaky tests. Did we ever
> try or consider using it?
>
> [1]: https://github.com/gradle/test-retry-gradle-plugin
>
> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov  wrote:
>
>> I agree with what Ahmet is saying. I can share my perspective, recently I
>> had to retrigger the build 6 times due to flaky tests, and each retrigger took
>> one hour of waiting time.
>>
>> I've seen examples of automatic tracking of flaky tests, where a test is
>> considered flaky if it both fails and succeeds for the same git SHA. Not sure
>> if there is anything we can enable to get this automatically.
>>
>> /Gleb
>>
>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:
>>
>>> I think it will be reasonable to disable/sickbay any flaky test that is
>>> actively blocking people. Collective cost of flaky tests for such a large
>>> group of contributors is very significant.
>>>
>>> Most of these issues are unassigned. IMO, it makes sense to assign these
>>> issues to the most relevant person (who added the test/who generally
>>> maintains those components). Those people can either fix and re-enable the
>>> tests, or remove them if they no longer provide valuable signals.
>>>
>>> Ahmet
>>>
>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles  wrote:
>>>
 The situation is much worse than that IMO. My experience of the last
 few days is that a large portion of time went to *just connecting failing
 runs with the corresponding Jira tickets or filing new ones*.

 Summarized on PRs:

  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
  - https://github.com/apache/beam/pull/12216#issuecomment-657799415

 The tickets:

  - https://issues.apache.org/jira/browse/BEAM-10460
 SparkPortableExecutionTest
  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
 testEstimatedSizeBytes
  - https://issues.apache.org/jira/browse/BEAM-10504
 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
 > @BeforeClass (classmethod)
  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
  - https://issues.apache.org/jira/browse/BEAM-10506
 SplunkEventWriterTest
  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
 ParDoLifecycleTest
  - https://issues.apache.org/jira/browse/BEAM-9187
 DefaultJobBundleFactoryTest

 Here are our P1 test flake bugs:
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

 It seems quite a few of them are actively hindering people right now.

 Kenn

 On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
 wrote:

> We have two test suites that are responsible for a large percentage of
> our flaky tests and  both have bugs open for about a year without being
> fixed. These suites are ParDoLifecycleTest (BEAM-8101
> ) in Java
> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
> , py2 BEAM-9232
> , old duplicate
> BEAM-8197 ).
>
> Are there any volunteers to look into these issues? What can we do to
> mitigate the flakiness until someone has time to investigate?
>
> Andrew
>



Re: Chronically flaky tests

2020-07-16 Thread Gleb Kanterov
There is something called test-retry-gradle-plugin [1]. It retries tests if
they fail, and has different modes to handle flaky tests. Did we ever try
or consider using it?

[1]: https://github.com/gradle/test-retry-gradle-plugin

On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov  wrote:

> I agree with what Ahmet is saying. I can share my perspective, recently I
> had to retrigger the build 6 times due to flaky tests, and each retrigger took
> one hour of waiting time.
>
> I've seen examples of automatic tracking of flaky tests, where a test is
> considered flaky if it both fails and succeeds for the same git SHA. Not sure
> if there is anything we can enable to get this automatically.
>
> /Gleb
>
> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:
>
>> I think it will be reasonable to disable/sickbay any flaky test that is
>> actively blocking people. Collective cost of flaky tests for such a large
>> group of contributors is very significant.
>>
>> Most of these issues are unassigned. IMO, it makes sense to assign these
>> issues to the most relevant person (who added the test/who generally
>> maintains those components). Those people can either fix and re-enable the
>> tests, or remove them if they no longer provide valuable signals.
>>
>> Ahmet
>>
>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles  wrote:
>>
>>> The situation is much worse than that IMO. My experience of the last few
>>> days is that a large portion of time went to *just connecting failing runs
>>> with the corresponding Jira tickets or filing new ones*.
>>>
>>> Summarized on PRs:
>>>
>>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>
>>> The tickets:
>>>
>>>  - https://issues.apache.org/jira/browse/BEAM-10460
>>> SparkPortableExecutionTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
>>> testEstimatedSizeBytes
>>>  - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest
>>> > testWriteFullAddressing and testWriteWithIndexFn
>>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>>> > @BeforeClass (classmethod)
>>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10506
>>> SplunkEventWriterTest
>>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>>> ParDoLifecycleTest
>>>  - https://issues.apache.org/jira/browse/BEAM-9187
>>> DefaultJobBundleFactoryTest
>>>
>>> Here are our P1 test flake bugs:
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>
>>> It seems quite a few of them are actively hindering people right now.
>>>
>>> Kenn
>>>
>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
>>> wrote:
>>>
 We have two test suites that are responsible for a large percentage of
 our flaky tests and  both have bugs open for about a year without being
 fixed. These suites are ParDoLifecycleTest (BEAM-8101
 ) in Java
 and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
 , py2 BEAM-9232
 , old duplicate
 BEAM-8197 ).

 Are there any volunteers to look into these issues? What can we do to
 mitigate the flakiness until someone has time to investigate?

 Andrew

>>>


Re: Chronically flaky tests

2020-07-16 Thread Gleb Kanterov
I agree with what Ahmet is saying. I can share my perspective, recently I
had to retrigger the build 6 times due to flaky tests, and each retrigger took
one hour of waiting time.

I've seen examples of automatic tracking of flaky tests, where a test is
considered flaky if it both fails and succeeds for the same git SHA. Not sure
if there is anything we can enable to get this automatically.

/Gleb
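
The same-SHA heuristic described above is straightforward to automate if CI
results can be exported; a minimal sketch (the (sha, test, passed) input format
and names are hypothetical, not an existing Beam or Jenkins tool):

# Illustrative sketch only: the (sha, test, passed) input format is assumed.
from collections import defaultdict
from typing import Iterable, Set, Tuple


def find_flaky_tests(results: Iterable[Tuple[str, str, bool]]) -> Set[str]:
  """Flags tests that both passed and failed for the same git SHA."""
  outcomes = defaultdict(set)  # (sha, test_name) -> set of observed outcomes
  for sha, test_name, passed in results:
    outcomes[(sha, test_name)].add(passed)
  return {test for (_, test), seen in outcomes.items() if len(seen) > 1}


if __name__ == '__main__':
  runs = [
      ('abc123', 'test_a', True),
      ('abc123', 'test_b', False),
      ('abc123', 'test_b', True),   # retriggered run on the same commit
      ('def456', 'test_b', True),
  ]
  print(find_flaky_tests(runs))  # prints {'test_b'}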

On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay  wrote:

> I think it will be reasonable to disable/sickbay any flaky test that is
> actively blocking people. Collective cost of flaky tests for such a large
> group of contributors is very significant.
>
> Most of these issues are unassigned. IMO, it makes sense to assign these
> issues to the most relevant person (who added the test/who generally
> maintains those components). Those people can either fix and re-enable the
> tests, or remove them if they no longer provide valuable signals.
>
> Ahmet
>
> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles  wrote:
>
>> The situation is much worse than that IMO. My experience of the last few
>> days is that a large portion of time went to *just connecting failing runs
>> with the corresponding Jira tickets or filing new ones*.
>>
>> Summarized on PRs:
>>
>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>
>> The tickets:
>>
>>  - https://issues.apache.org/jira/browse/BEAM-10460
>> SparkPortableExecutionTest
>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
>> testEstimatedSizeBytes
>>  - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest
>> > testWriteFullAddressing and testWriteWithIndexFn
>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>> > @BeforeClass (classmethod)
>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>  - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>> ParDoLifecycleTest
>>  - https://issues.apache.org/jira/browse/BEAM-9187
>> DefaultJobBundleFactoryTest
>>
>> Here are our P1 test flake bugs:
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>
>> It seems quite a few of them are actively hindering people right now.
>>
>> Kenn
>>
>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
>> wrote:
>>
>>> We have two test suites that are responsible for a large percentage of
>>> our flaky tests and  both have bugs open for about a year without being
>>> fixed. These suites are ParDoLifecycleTest (BEAM-8101
>>> ) in Java
>>> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
>>> , py2 BEAM-9232
>>> , old duplicate
>>> BEAM-8197 ).
>>>
>>> Are there any volunteers to look into these issues? What can we do to
>>> mitigate the flakiness until someone has time to investigate?
>>>
>>> Andrew
>>>
>>


Re: Chronically flaky tests

2020-07-15 Thread Ahmet Altay
I think it will be reasonable to disable/sickbay any flaky test that is
actively blocking people. Collective cost of flaky tests for such a large
group of contributors is very significant.

Most of these issues are unassigned. IMO, it makes sense to assign these
issues to the most relevant person (who added the test/who generally
maintains those components). Those people can either fix and re-enable the
tests, or remove them if they no longer provide valuable signals.

Ahmet

On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles  wrote:

> The situation is much worse than that IMO. My experience of the last few
> days is that a large portion of time went to *just connecting failing runs
> with the corresponding Jira tickets or filing new ones*.
>
> Summarized on PRs:
>
>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>
> The tickets:
>
>  - https://issues.apache.org/jira/browse/BEAM-10460
> SparkPortableExecutionTest
>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
> testEstimatedSizeBytes
>  - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest >
> testWriteFullAddressing and testWriteWithIndexFn
>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
> > @BeforeClass (classmethod)
>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>  - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
> ParDoLifecycleTest
>  - https://issues.apache.org/jira/browse/BEAM-9187
> DefaultJobBundleFactoryTest
>
> Here are our P1 test flake bugs:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
> It seems quite a few of them are actively hindering people right now.
>
> Kenn
>
> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud 
> wrote:
>
>> We have two test suites that are responsible for a large percentage of
>> our flaky tests and  both have bugs open for about a year without being
>> fixed. These suites are ParDoLifecycleTest (BEAM-8101
>> ) in Java
>> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
>> , py2 BEAM-9232
>> , old duplicate
>> BEAM-8197 ).
>>
>> Are there any volunteers to look into these issues? What can we do to
>> mitigate the flakiness until someone has time to investigate?
>>
>> Andrew
>>
>


Re: Chronically flaky tests

2020-07-15 Thread Kenneth Knowles
The situation is much worse than that IMO. My experience of the last few
days is that a large portion of time went to *just connecting failing runs
with the corresponding Jira tickets or filing new ones*.

Summarized on PRs:

 - https://github.com/apache/beam/pull/12272#issuecomment-659050891
 - https://github.com/apache/beam/pull/12273#issuecomment-659070317
 - https://github.com/apache/beam/pull/12225#issuecomment-656973073
 - https://github.com/apache/beam/pull/12225#issuecomment-657743373
 - https://github.com/apache/beam/pull/12224#issuecomment-657744481
 - https://github.com/apache/beam/pull/12216#issuecomment-657735289
 - https://github.com/apache/beam/pull/12216#issuecomment-657780781
 - https://github.com/apache/beam/pull/12216#issuecomment-657799415

The tickets:

 - https://issues.apache.org/jira/browse/BEAM-10460
SparkPortableExecutionTest
 - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
testEstimatedSizeBytes
 - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest >
testWriteFullAddressing and testWriteWithIndexFn
 - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
 - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
> @BeforeClass (classmethod)
 - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
 - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
 - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
ParDoLifecycleTest
 - https://issues.apache.org/jira/browse/BEAM-9187
DefaultJobBundleFactoryTest

Here are our P1 test flake bugs:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

It seems quite a few of them are actively hindering people right now.

Kenn

On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud  wrote:

> We have two test suites that are responsible for a large percentage of our
> flaky tests and  both have bugs open for about a year without being fixed.
> These suites are ParDoLifecycleTest (BEAM-8101
> ) in Java
> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
> , py2 BEAM-9232
> , old duplicate BEAM-8197
> ).
>
> Are there any volunteers to look into these issues? What can we do to
> mitigate the flakiness until someone has time to investigate?
>
> Andrew
>


Chronically flaky tests

2020-07-15 Thread Andrew Pilloud
We have two test suites that are responsible for a large percentage of our
flaky tests and  both have bugs open for about a year without being fixed.
These suites are ParDoLifecycleTest (BEAM-8101
) in Java
and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
, py2 BEAM-9232
, old duplicate BEAM-8197
).

Are there any volunteers to look into these issues? What can we do to
mitigate the flakiness until someone has time to investigate?

Andrew