Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-28 Thread Sophie Blee-Goldman
Thanks for that link Ismael -- I'd never seen it before. That's super useful!

In fact, it helped me realize another area where we could really improve
things. I took a look at the worst offenders on that chart and was
debugging the 2nd most flaky Streams test when I expanded the timeframe and
saw that it actually went from failing 0% of the time to being, well, the
2nd worst in all of Streams. The specific date it started failing made it
incredibly straightforward to find the culprit, so again, thanks for the
resource Ismael!

It's lucky this discussion picked up when it did because if I'd waited
another week to inspect this test, the 90-day retention would have eaten
the initial spike, and it would have been much, much harder to debug (the
culprit was not at all obvious without this hint)

But I think it's really concerning that we went almost three months without
noticing that a new commit had caused a formerly-stable test to start failing so
often. Obviously it's not practical to expect everyone to remember which
tests are flaky and which ones are not (yet), so it would be great if we
could set up some kind of alerting on sudden increases in test failure
rates. Have we ever looked into this?
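
To make that idea concrete, here is a rough, purely illustrative sketch of
what such a check could look like: a small Groovy script that compares
per-test failure rates between two locally downloaded batches of JUnit XML
reports (say, the previous window vs. the current one) and flags any test
whose failure rate jumped sharply. The directory arguments, the 20-point
threshold, and the assumption that the reports have already been pulled down
from CI are all placeholders, not a description of our actual setup:

    import groovy.xml.XmlSlurper

    // Compute per-test failure rates from a directory tree of JUnit XML reports.
    def failureRates(File dir) {
        def runs = [:].withDefault { 0 }
        def failures = [:].withDefault { 0 }
        dir.eachFileRecurse { f ->
            if (!f.name.endsWith('.xml')) return
            def report = new XmlSlurper().parse(f)
            // '**' walks the whole document, so <testsuites> wrappers are handled too.
            report.'**'.findAll { it.name() == 'testcase' }.each { tc ->
                def name = "${tc.@classname}.${tc.@name}"
                runs[name] += 1
                if (tc.failure.size() > 0 || tc.error.size() > 0) failures[name] += 1
            }
        }
        runs.collectEntries { name, n -> [(name): failures[name] / (double) n] }
    }

    def before = failureRates(new File(args[0]))   // e.g. reports from the older window
    def after = failureRates(new File(args[1]))    // e.g. reports from the recent window
    after.each { name, rate ->
        def previous = before.getOrDefault(name, 0.0d)
        if (rate - previous > 0.2) {               // alert on a jump of more than 20 points
            printf('%s: %.0f%% -> %.0f%%%n', name, previous * 100, rate * 100)
        }
    }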

On Tue, Nov 28, 2023 at 8:26 AM Chris Egerton 
wrote:

> Hi David,
>
> Thanks for continuing to drive this.
>
> Yes, ignoreFailures does come with some risk and it's only suitable as an
> intermediate step as part of a long-term plan that eventually gates all
> merges on builds with passing tests. I brought it up because it would allow
> us to immediately prevent any non-green builds from being merged. It also
> wouldn't practically change much in the review process; right now we have
> to manually check test results for every PR because the majority of builds
> are yellow due to flakiness, and that manual step would still be necessary
> if we started using ignoreFailures. If there's a way to allow yellow and
> green builds to be merged then there's no need to use ignoreFailures as
> that option would be strictly inferior.
>
> Cheers,
>
> Chris
>
> On Tue, Nov 28, 2023 at 2:53 AM David Jacot 
> wrote:
>
> > Hi all,
> >
> > I am still experimenting with reducing the noise of flaky tests in build
> > results. I should have results to share early next week.
> >
> > Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
> > it seems risky because the build may be green but with failed tests, no?
> >
> > I would also like to make it clear that the current rule applies until we
> > agree on a way forward here. At minimum, I think that a build should be
> > yellow for all the combinations and the failed tests should have been
> > triaged to ensure that they are not related to the changes. We should not
> > merge when a build is red or has not completed.
> >
> > Best,
> > David
> >
> > On Sat, Nov 25, 2023 at 5:25 AM Chris Egerton 
> > wrote:
> >
> > > Hi all,
> > >
> > > There's a lot to catch up on here but I wanted to clarify something.
> > > Regarding this comment from Sophie:
> > >
> > >
> > > > Yet multiple people in this thread so
> > > far have voiced support for "gating merges on the successful completion
> > of
> > > all parts of the build except tests". Just to be totally clear, I
> really
> > > don't think that was ever in question -- though it certainly doesn't
> hurt
> > > to remind everyone.
> > >
> > > > So, this thread is not about whether or not to merge with failing
> > > *builds, *but it's
> > > whether it should be acceptable to merge with failing *tests.*
> > >
> > >
> > > I think there's a misunderstanding here. I was suggesting
> > > programmatic gating, not manual. If we could disable these types of
> > changes
> > > from being merged, instead of relying on committers to check and
> > interpret
> > > Jenkins results, that'd be a quick win IMO. And, because of the
> > > already-discussed issues with flaky tests, it seemed difficult to
> disable
> > > PRs from being merged with failing tests--just for other parts of the
> > > build.
> > >
> > > However, I think the retry logic brought up by David could be
> sufficient
> > to
> > > skip that kind of intermediate step and allow us to just start
> > > programmatically disabling PR merges if the build (including tests) fails.
> > > But if anyone's interested, we can still prevent failing tests from
> > failing
> > > the build with the ignoreFailures property [1].
> > >
> > > [1] -
> > >
> > >
> >
> https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:
> > >
> > > > I think it breaks the Jenkins output otherwise. Feel free to test it
> > via
> > > a
> > > > PR.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Nov 22, 2023, 12:42 AM David Jacot
>  > >
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > No, I was not aware of KAFKA-12216. My understanding is that w

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-28 Thread Chris Egerton
Hi David,

Thanks for continuing to drive this.

Yes, ignoreFailures does come with some risk and it's only suitable as an
intermediate step as part of a long-term plan that eventually gates all
merges on builds with passing tests. I brought it up because it would allow
us to immediately prevent any non-green builds from being merged. It also
wouldn't practically change much in the review process; right now we have
to manually check test results for every PR because the majority of builds
are yellow due to flakiness, and that manual step would still be necessary
if we started using ignoreFailures. If there's a way to allow yellow and
green builds to be merged then there's no need to use ignoreFailures as
that option would be strictly inferior.

Cheers,

Chris

On Tue, Nov 28, 2023 at 2:53 AM David Jacot 
wrote:

> Hi all,
>
> I am still experimenting with reducing the noise of flaky tests in build
> results. I should have results to share early next week.
>
> Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
> it seems risky because the build may be green but with failed tests, no?
>
> I would also like to make it clear that the current rule applies until we
> agree on a way forward here. At minimum, I think that a build should be
> yellow for all the combinations and the failed tests should have been
> triaged to ensure that they are not related to the changes. We should not
> merge when a build is red or has not completed.
>
> Best,
> David
>
> On Sat, Nov 25, 2023 at 5:25 AM Chris Egerton 
> wrote:
>
> > Hi all,
> >
> > There's a lot to catch up on here but I wanted to clarify something.
> > Regarding this comment from Sophie:
> >
> >
> > > Yet multiple people in this thread so
> > far have voiced support for "gating merges on the successful completion
> of
> > all parts of the build except tests". Just to be totally clear, I really
> > don't think that was ever in question -- though it certainly doesn't hurt
> > to remind everyone.
> >
> > > So, this thread is not about whether or not to merge with failing
> > *builds, *but it's
> > whether it should be acceptable to merge with failing *tests.*
> >
> >
> > I think there's a misunderstanding here. I was suggesting
> > programmatic gating, not manual. If we could disable these types of
> changes
> > from being merged, instead of relying on committers to check and
> interpret
> > Jenkins results, that'd be a quick win IMO. And, because of the
> > already-discussed issues with flaky tests, it seemed difficult to disable
> > PRs from being merged with failing tests--just for other parts of the
> > build.
> >
> > However, I think the retry logic brought up by David could be sufficient
> to
> > skip that kind of intermediate step and allow us to just start
> > programmatically disabling PR merges if the build (including tests) fails.
> > But if anyone's interested, we can still prevent failing tests from
> failing
> > the build with the ignoreFailures property [1].
> >
> > [1] -
> >
> >
> https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:
> >
> > > I think it breaks the Jenkins output otherwise. Feel free to test it
> via
> > a
> > > PR.
> > >
> > > Ismael
> > >
> > > On Wed, Nov 22, 2023, 12:42 AM David Jacot  >
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > No, I was not aware of KAFKA-12216. My understanding is that we could
> > > still
> > > > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > > > gradle enterprise for this. Or do you think that reporting the flaky
> > > tests
> > > > in the build results is required?
> > > >
> > > > David
> > > >
> > > > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma 
> wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Did you take a look at
> > > https://issues.apache.org/jira/browse/KAFKA-12216
> > > > ?
> > > > > I
> > > > > looked into this option already (yes, there isn't much that we
> > haven't
> > > > > considered in this space).
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
> > >  > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks for the good discussion and all the comments. Overall, it
> > > seems
> > > > > that
> > > > > > we all agree on the bad state of our CI. That's a good first
> step!
> > > > > >
> > > > > > I have talked to a few folks this week about it and it seems that
> > > many
> > > > > > folks (including me) are not comfortable with merging PRs at the
> > > moment
> > > > > > because the results of our builds are so bad. I had 40+ failed
> > tests
> > > in
> > > > > one
> > > > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > > > productive
> > > > > > with this.
> > > > > >
> > > > > > Personally, I really want to move towards requiring a green build
> > to
> > > > > merge

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-27 Thread David Jacot
Hi all,

I am still experimenting with reducing the noise of flaky tests in build
results. I should have results to share early next week.

Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
it seems risky because the build may be green but with failed tests, no?

I would also like to make it clear that the current rule applies until we
agree on a way forward here. At minimum, I think that a build should be
yellow for all the combinations and the failed tests should have been
triaged to ensure that they are not related to the changes. We should not
merge when a build is red or has not completed.

Best,
David

On Sat, Nov 25, 2023 at 5:25 AM Chris Egerton 
wrote:

> Hi all,
>
> There's a lot to catch up on here but I wanted to clarify something.
> Regarding this comment from Sophie:
>
>
> > Yet multiple people in this thread so
> far have voiced support for "gating merges on the successful completion of
> all parts of the build except tests". Just to be totally clear, I really
> don't think that was ever in question -- though it certainly doesn't hurt
> to remind everyone.
>
> > So, this thread is not about whether or not to merge with failing
> *builds, *but it's
> whether it should be acceptable to merge with failing *tests.*
>
>
> I think there's a misunderstanding here. I was suggesting
> programmatic gating, not manual. If we could disable these types of changes
> from being merged, instead of relying on committers to check and interpret
> Jenkins results, that'd be a quick win IMO. And, because of the
> already-discussed issues with flaky tests, it seemed difficult to disable
> PRs from being merged with failing tests--just for other parts of the
> build.
>
> However, I think the retry logic brought up by David could be sufficient to
> skip that kind of intermediate step and allow us to just start
> programmatically disabling PR merges if the build (including tests) fails.
> But if anyone's interested, we can still prevent failing tests from failing
> the build with the ignoreFailures property [1].
>
> [1] -
>
> https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
>
> Cheers,
>
> Chris
>
> On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:
>
> > I think it breaks the Jenkins output otherwise. Feel free to test it via
> a
> > PR.
> >
> > Ismael
> >
> > On Wed, Nov 22, 2023, 12:42 AM David Jacot 
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > No, I was not aware of KAFKA-12216. My understanding is that we could
> > still
> > > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > > gradle enterprise for this. Or do you think that reporting the flaky
> > tests
> > > in the build results is required?
> > >
> > > David
> > >
> > > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:
> > >
> > > > Hi David,
> > > >
> > > > Did you take a look at
> > https://issues.apache.org/jira/browse/KAFKA-12216
> > > ?
> > > > I
> > > > looked into this option already (yes, there isn't much that we
> haven't
> > > > considered in this space).
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for the good discussion and all the comments. Overall, it
> > seems
> > > > that
> > > > > we all agree on the bad state of our CI. That's a good first step!
> > > > >
> > > > > I have talked to a few folks this week about it and it seems that
> > many
> > > > > folks (including me) are not comfortable with merging PRs at the
> > moment
> > > > > because the results of our builds are so bad. I had 40+ failed
> tests
> > in
> > > > one
> > > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > > productive
> > > > > with this.
> > > > >
> > > > > Personally, I really want to move towards requiring a green build
> to
> > > > merge
> > > > > to trunk because this is a clear and binary signal. I agree that we
> > > need
> > > > to
> > > > > stabilize the builds before we could even require this so here is
> my
> > > > > proposal.
> > > > >
> > > > > 1) We could leverage the `reports.junitXml.mergeReruns` option in
> > > gradle.
> > > > > From the doc [1]:
> > > > >
> > > > > > When mergeReruns is enabled, if a test fails but is then retried and
> > > > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > > > <failure>, within one <testcase>. This is effectively the reporting
> > > > > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > > > > your CI server understands this format, it will indicate that the test was
> > > > > flaky. If it does not, it will indicate that the test succeeded as it
> > > > > will ignore the <flakyFailure> information. If the test does not succeed
> > > > > (i.e. it fails for every retry), it will be indicated as having failed
> > > > > whether your tool understands this format or not.
> > > > > > When mergeReruns is dis

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-24 Thread Chris Egerton
Hi all,

There's a lot to catch up on here but I wanted to clarify something.
Regarding this comment from Sophie:


> Yet multiple people in this thread so
far have voiced support for "gating merges on the successful completion of
all parts of the build except tests". Just to be totally clear, I really
don't think that was ever in question -- though it certainly doesn't hurt
to remind everyone.

> So, this thread is not about whether or not to merge with failing
*builds, *but it's
whether it should be acceptable to merge with failing *tests.*


I think there's a misunderstanding here. I was suggesting
programmatic gating, not manual. If we could disable these types of changes
from being merged, instead of relying on committers to check and interpret
Jenkins results, that'd be a quick win IMO. And, because of the
already-discussed issues with flaky tests, it seemed difficult to disable
PRs from being merged with failing tests--just for other parts of the build.

However, I think the retry logic brought up by David could be sufficient to
skip that kind of intermediate step and allow us to just start
programmatically disabling PR merges if the build (including tests) fails.
But if anyone's interested, we can still prevent failing tests from failing
the build with the ignoreFailures property [1].

[1] -
https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
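
For illustration only -- this is just a sketch of that property, not a
concrete proposal for Kafka's actual build.gradle -- enabling it would look
roughly like:

    tasks.withType(Test).configureEach {
        // Test failures are still recorded in the JUnit XML/HTML reports,
        // but the test task itself -- and therefore the overall build --
        // is not marked as failed.
        ignoreFailures = true
    }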

Cheers,

Chris

On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:

> I think it breaks the Jenkins output otherwise. Feel free to test it via a
> PR.
>
> Ismael
>
> On Wed, Nov 22, 2023, 12:42 AM David Jacot 
> wrote:
>
> > Hi Ismael,
> >
> > No, I was not aware of KAFKA-12216. My understanding is that we could
> still
> > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > gradle enterprise for this. Or do you think that reporting the flaky
> tests
> > in the build results is required?
> >
> > David
> >
> > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:
> >
> > > Hi David,
> > >
> > > Did you take a look at
> https://issues.apache.org/jira/browse/KAFKA-12216
> > ?
> > > I
> > > looked into this option already (yes, there isn't much that we haven't
> > > considered in this space).
> > >
> > > Ismael
> > >
> > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
>  > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the good discussion and all the comments. Overall, it
> seems
> > > that
> > > > we all agree on the bad state of our CI. That's a good first step!
> > > >
> > > > I have talked to a few folks this week about it and it seems that
> many
> > > > folks (including me) are not comfortable with merging PRs at the
> moment
> > > > because the results of our builds are so bad. I had 40+ failed tests
> in
> > > one
> > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > productive
> > > > with this.
> > > >
> > > > Personally, I really want to move towards requiring a green build to
> > > merge
> > > > to trunk because this is a clear and binary signal. I agree that we
> > need
> > > to
> > > > stabilize the builds before we could even require this so here is my
> > > > proposal.
> > > >
> > > > 1) We could leverage the `reports.junitXml.mergeReruns` option in
> > gradle.
> > > > From the doc [1]:
> > > >
> > > > > When mergeReruns is enabled, if a test fails but is then retried and
> > > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > > <failure>, within one <testcase>. This is effectively the reporting
> > > > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > > > your CI server understands this format, it will indicate that the test was
> > > > flaky. If it does not, it will indicate that the test succeeded as it
> > > > will ignore the <flakyFailure> information. If the test does not succeed
> > > > (i.e. it fails for every retry), it will be indicated as having failed
> > > > whether your tool understands this format or not.
> > > > > When mergeReruns is disabled (the default), each execution of a test will
> > > > be listed as a separate test case.
> > > >
> > > > It would not resolve all the flaky tests for sure but it would at
> least
> > > > reduce the noise. I see this as a means to get to green builds
> faster.
> > I
> > > > played a bit with this setting and I discovered [2]. I was hoping
> that
> > > [3]
> > > > could help to resolve it but I need to confirm.
> > > >
> > > > 2) I suppose that we would still have flaky tests preventing us from
> > > > getting a green build even with the setting in place. For those, I
> > think
> > > > that we need to review them one by one and decide whether we want to
> > fix
> > > or
> > > > disable them. This is a short term effort to help us get to green
> > builds.
> > > >
> > > > 3) When we get to a point where we can get green builds consistently,
> > we
> > > > could enforce it.
> > > >
> > > > 4) Flaky tests won't disappear with thi

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread Ismael Juma
I think it breaks the Jenkins output otherwise. Feel free to test it via a
PR.

Ismael

On Wed, Nov 22, 2023, 12:42 AM David Jacot 
wrote:

> Hi Ismael,
>
> No, I was not aware of KAFKA-12216. My understanding is that we could still
> do it without the JUnitFlakyTestDataPublisher plugin and we could use
> gradle enterprise for this. Or do you think that reporting the flaky tests
> in the build results is required?
>
> David
>
> On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:
>
> > Hi David,
> >
> > Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216
> ?
> > I
> > looked into this option already (yes, there isn't much that we haven't
> > considered in this space).
> >
> > Ismael
> >
> > On Wed, Nov 22, 2023 at 12:24 AM David Jacot  >
> > wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the good discussion and all the comments. Overall, it seems
> > that
> > > we all agree on the bad state of our CI. That's a good first step!
> > >
> > > I have talked to a few folks this week about it and it seems that many
> > > folks (including me) are not comfortable with merging PRs at the moment
> > > because the results of our builds are so bad. I had 40+ failed tests in
> > one
> > > of my PRs, all unrelated to my changes. It is really hard to be
> > productive
> > > with this.
> > >
> > > Personally, I really want to move towards requiring a green build to
> > merge
> > > to trunk because this is a clear and binary signal. I agree that we
> need
> > to
> > > stabilize the builds before we could even require this so here is my
> > > proposal.
> > >
> > > 1) We could leverage the `reports.junitXml.mergeReruns` option in
> gradle.
> > > From the doc [1]:
> > >
> > > > When mergeReruns is enabled, if a test fails but is then retried and
> > > succeeds, its failures will be recorded as <flakyFailure> instead of
> > > <failure>, within one <testcase>. This is effectively the reporting
> > > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > > your CI server understands this format, it will indicate that the test was
> > > flaky. If it does not, it will indicate that the test succeeded as it
> > > will ignore the <flakyFailure> information. If the test does not succeed
> > > (i.e. it fails for every retry), it will be indicated as having failed
> > > whether your tool understands this format or not.
> > > > When mergeReruns is disabled (the default), each execution of a test will
> > > be listed as a separate test case.
> > >
> > > It would not resolve all the flaky tests for sure but it would at least
> > > reduce the noise. I see this as a means to get to green builds faster.
> I
> > > played a bit with this setting and I discovered [2]. I was hoping that
> > [3]
> > > could help to resolve it but I need to confirm.
> > >
> > > 2) I suppose that we would still have flaky tests preventing us from
> > > getting a green build even with the setting in place. For those, I
> think
> > > that we need to review them one by one and decide whether we want to
> fix
> > or
> > > disable them. This is a short term effort to help us get to green
> builds.
> > >
> > > 3) When we get to a point where we can get green builds consistently,
> we
> > > could enforce it.
> > >
> > > 4) Flaky tests won't disappear with this. They are just hidden.
> > Therefore,
> > > we also need a process to review the flaky tests and address them.
> Here,
> > I
> > > think that we could leverage the dashboard shared by Ismael. One
> > > possibility would be to review it regularly and decide for each test
> > > whether it should be fixed, disabled or even removed.
> > >
> > > Please let me know what you think.
> > >
> > > Another angle that we could consider is improving the CI infrastructure
> > as
> > > well. I think that many of those flaky tests are due to overloaded
> > Jenkins
> > > workers. We should perhaps discuss with the infra team to see whether
> we
> > > could do something there.
> > >
> > > Best,
> > > David
> > >
> > > [1]
> > >
> https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> > > [2] https://github.com/gradle/gradle/issues/23324
> > > [3] https://github.com/apache/kafka/pull/14687
> > >
> > >
> > > On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:
> > >
> > > > Hi,
> > > >
> > > > We have a dashboard already:
> > > >
> > > >
> > > >
> > > >
> > >
> >
> https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> > > >
> > > > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков  >
> > > > wrote:
> > > >
> > > >> Hello guys.
> > > >>
> > > >> I want to tell you about one more approach to dealing with flaky tests.
> > > >> We adopted this approach in the Apache Ignite community, so maybe it can be
> > > >> helpful for Kafka as well.
> > > >>
> > > >> TL;DR: The Apache Ignite community has a tool that provides statistics on
> > > >> tests and can 

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread David Jacot
Hi Ismael,

No, I was not aware of KAFKA-12216. My understanding is that we could still
do it without the JUnitFlakyTestDataPublisher plugin and we could use
gradle enterprise for this. Or do you think that reporting the flaky tests
in the build results is required?

David

On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:

> Hi David,
>
> Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216?
> I
> looked into this option already (yes, there isn't much that we haven't
> considered in this space).
>
> Ismael
>
> On Wed, Nov 22, 2023 at 12:24 AM David Jacot 
> wrote:
>
> > Hi all,
> >
> > Thanks for the good discussion and all the comments. Overall, it seems
> that
> > we all agree on the bad state of our CI. That's a good first step!
> >
> > I have talked to a few folks this week about it and it seems that many
> > folks (including me) are not comfortable with merging PRs at the moment
> > because the results of our builds are so bad. I had 40+ failed tests in
> one
> > of my PRs, all unrelated to my changes. It is really hard to be
> productive
> > with this.
> >
> > Personally, I really want to move towards requiring a green build to
> merge
> > to trunk because this is a clear and binary signal. I agree that we need
> to
> > stabilize the builds before we could even require this so here is my
> > proposal.
> >
> > 1) We could leverage the `reports.junitXml.mergeReruns` option in gradle.
> > From the doc [1]:
> >
> > > When mergeReruns is enabled, if a test fails but is then retried and
> > succeeds, its failures will be recorded as <flakyFailure> instead of
> > <failure>, within one <testcase>. This is effectively the reporting
> > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > your CI server understands this format, it will indicate that the test was
> > flaky. If it does not, it will indicate that the test succeeded as it
> > will ignore the <flakyFailure> information. If the test does not succeed
> > (i.e. it fails for every retry), it will be indicated as having failed
> > whether your tool understands this format or not.
> > > When mergeReruns is disabled (the default), each execution of a test will
> > be listed as a separate test case.
> >
> > It would not resolve all the flaky tests for sure but it would at least
> > reduce the noise. I see this as a means to get to green builds faster. I
> > played a bit with this setting and I discovered [2]. I was hoping that
> [3]
> > could help to resolve it but I need to confirm.
> >
> > 2) I suppose that we would still have flaky tests preventing us from
> > getting a green build even with the setting in place. For those, I think
> > that we need to review them one by one and decide whether we want to fix
> or
> > disable them. This is a short term effort to help us get to green builds.
> >
> > 3) When we get to a point where we can get green builds consistently, we
> > could enforce it.
> >
> > 4) Flaky tests won't disappear with this. They are just hidden.
> Therefore,
> > we also need a process to review the flaky tests and address them. Here,
> I
> > think that we could leverage the dashboard shared by Ismael. One
> > possibility would be to review it regularly and decide for each test
> > whether it should be fixed, disabled or even removed.
> >
> > Please let me know what you think.
> >
> > Another angle that we could consider is improving the CI infrastructure
> as
> > well. I think that many of those flaky tests are due to overloaded
> Jenkins
> > workers. We should perhaps discuss with the infra team to see whether we
> > could do something there.
> >
> > Best,
> > David
> >
> > [1]
> > https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> > [2] https://github.com/gradle/gradle/issues/23324
> > [3] https://github.com/apache/kafka/pull/14687
> >
> >
> > On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:
> >
> > > Hi,
> > >
> > > We have a dashboard already:
> > >
> > >
> > >
> > >
> >
> https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> > >
> > > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков 
> > > wrote:
> > >
> > >> Hello guys.
> > >>
> > >> I want to tell you about one more approach to dealing with flaky tests.
> > >> We adopted this approach in the Apache Ignite community, so maybe it can be
> > >> helpful for Kafka as well.
> > >>
> > >> TL;DR: The Apache Ignite community has a tool that provides statistics on
> > >> tests and can tell if a PR introduces new failures.
> > >>
> > >> Apache Ignite has many tests.
> > >> The latest «Run All» contains around 75k.
> > >> Most tests are integration-style, so the number of flaky ones is
> > >> significant.
> > >>
> > >> We built a tool - Team City Bot [1]
> > >> that provides combined statistics on flaky tests [2].
> > >>
> > >> This tool can compare the results of Run All for a PR and for master.
> > >> If all is OK, one 

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread Ismael Juma
Hi David,

Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216? I
looked into this option already (yes, there isn't much that we haven't
considered in this space).

Ismael

On Wed, Nov 22, 2023 at 12:24 AM David Jacot 
wrote:

> Hi all,
>
> Thanks for the good discussion and all the comments. Overall, it seems that
> we all agree on the bad state of our CI. That's a good first step!
>
> I have talked to a few folks this week about it and it seems that many
> folks (including me) are not comfortable with merging PRs at the moment
> because the results of our builds are so bad. I had 40+ failed tests in one
> of my PRs, all unrelated to my changes. It is really hard to be productive
> with this.
>
> Personally, I really want to move towards requiring a green build to merge
> to trunk because this is a clear and binary signal. I agree that we need to
> stabilize the builds before we could even require this so here is my
> proposal.
>
> 1) We could leverage the `reports.junitXml.mergeReruns` option in gradle.
> From the doc [1]:
>
> > When mergeReruns is enabled, if a test fails but is then retried and
> succeeds, its failures will be recorded as <flakyFailure> instead of
> <failure>, within one <testcase>. This is effectively the reporting
> produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> your CI server understands this format, it will indicate that the test was
> flaky. If it does not, it will indicate that the test succeeded as it
> will ignore the <flakyFailure> information. If the test does not succeed
> (i.e. it fails for every retry), it will be indicated as having failed
> whether your tool understands this format or not.
> > When mergeReruns is disabled (the default), each execution of a test will
> be listed as a separate test case.
>
> It would not resolve all the flaky tests for sure but it would at least
> reduce the noise. I see this as a means to get to green builds faster. I
> played a bit with this setting and I discovered [2]. I was hoping that [3]
> could help to resolve it but I need to confirm.
>
> 2) I suppose that we would still have flaky tests preventing us from
> getting a green build even with the setting in place. For those, I think
> that we need to review them one by one and decide whether we want to fix or
> disable them. This is a short term effort to help us get to green builds.
>
> 3) When we get to a point where we can get green builds consistently, we
> could enforce it.
>
> 4) Flaky tests won't disappear with this. They are just hidden. Therefore,
> we also need a process to review the flaky tests and address them. Here, I
> think that we could leverage the dashboard shared by Ismael. One
> possibility would be to review it regularly and decide for each test
> whether it should be fixed, disabled or even removed.
>
> Please let me know what you think.
>
> Another angle that we could consider is improving the CI infrastructure as
> well. I think that many of those flaky tests are due to overloaded Jenkins
> workers. We should perhaps discuss with the infra team to see whether we
> could do something there.
>
> Best,
> David
>
> [1]
> https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> [2] https://github.com/gradle/gradle/issues/23324
> [3] https://github.com/apache/kafka/pull/14687
>
>
> On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:
>
> > Hi,
> >
> > We have a dashboard already:
> >
> >
> >
> >
> https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
> >
> > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков 
> > wrote:
> >
> >> Hello guys.
> >>
> >> I want to tell you about one more approach to dealing with flaky tests.
> >> We adopted this approach in the Apache Ignite community, so maybe it can be
> >> helpful for Kafka as well.
> >>
> >> TL;DR: The Apache Ignite community has a tool that provides statistics on
> >> tests and can tell if a PR introduces new failures.
> >>
> >> Apache Ignite has many tests.
> >> The latest «Run All» contains around 75k.
> >> Most tests are integration-style, so the number of flaky ones is
> >> significant.
> >>
> >> We built a tool - Team City Bot [1]
> >> that provides combined statistics on flaky tests [2].
> >>
> >> This tool can compare the results of Run All for a PR and for master.
> >> If all is OK, one can comment on the Jira ticket with a visa from the bot [3].
> >>
> >> A visa is a quality proof of a PR for Ignite committers.
> >> And we can sort out the most flaky tests and prioritize fixes using the bot's
> >> statistics [2].
> >>
> >> The TC bot is integrated with TeamCity only, for now.
> >> But, if the Kafka community is interested, we can try to integrate it with
> >> Jenkins.
> >>
> >> [1] https://github.com/apache/ignite-teamcity-bot
> >> [2]
> https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> >> [3]
> >>
> https://issues.apache.org/jira/browse/IGNITE-19950?focusedComment

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread David Jacot
Hi all,

Thanks for the good discussion and all the comments. Overall, it seems that
we all agree on the bad state of our CI. That's a good first step!

I have talked to a few folks this week about it and it seems that many
folks (including me) are not comfortable with merging PRs at the moment
because the results of our builds are so bad. I had 40+ failed tests in one
of my PRs, all unrelated to my changes. It is really hard to be productive
with this.

Personally, I really want to move towards requiring a green build to merge
to trunk because this is a clear and binary signal. I agree that we need to
stabilize the builds before we could even require this so here is my
proposal.

1) We could leverage the `reports.junitXml.mergeReruns` option in gradle.
From the doc [1]:

> When mergeReruns is enabled, if a test fails but is then retried and
succeeds, its failures will be recorded as <flakyFailure> instead of
<failure>, within one <testcase>. This is effectively the reporting
produced by the surefire plugin of Apache Maven™ when enabling reruns. If
your CI server understands this format, it will indicate that the test was
flaky. If it does not, it will indicate that the test succeeded as it
will ignore the <flakyFailure> information. If the test does not succeed
(i.e. it fails for every retry), it will be indicated as having failed
whether your tool understands this format or not.
> When mergeReruns is disabled (the default), each execution of a test will
be listed as a separate test case.

It would not resolve all the flaky tests for sure but it would at least
reduce the noise. I see this as a means to get to green builds faster. I
played a bit with this setting and I discovered [2]. I was hoping that [3]
could help to resolve it but I need to confirm.
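
Just to illustrate the shape of the change (a rough sketch, not the exact
configuration I am experimenting with), the retries plus merged reporting
could look something like this, assuming the org.gradle.test-retry plugin is
used to produce the reruns:

    plugins {
        id 'org.gradle.test-retry' version '1.5.0'   // example version only
    }

    tasks.withType(Test).configureEach {
        retry {
            maxRetries = 1     // re-run a failed test once
            maxFailures = 10   // stop retrying if many tests fail (likely a real breakage)
        }
        // Report a test that passed on retry as one <testcase> with
        // <flakyFailure> entries rather than as a separate failed execution.
        reports.junitXml.mergeReruns = true
    }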

2) I suppose that we would still have flaky tests preventing us from
getting a green build even with the setting in place. For those, I think
that we need to review them one by one and decide whether we want to fix or
disable them. This is a short term effort to help us get to green builds.

3) When we get to a point where we can get green builds consistently, we
could enforce it.

4) Flaky tests won't disappear with this. They are just hidden. Therefore,
we also need a process to review the flaky tests and address them. Here, I
think that we could leverage the dashboard shared by Ismael. One
possibility would be to review it regularly and decide for each test
whether it should be fixed, disabled or even removed.

Please let me know what you think.

Another angle that we could consider is improving the CI infrastructure as
well. I think that many of those flaky tests are due to overloaded Jenkins
workers. We should perhaps discuss with the infra team to see whether we
could do something there.

Best,
David

[1] https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
[2] https://github.com/gradle/gradle/issues/23324
[3] https://github.com/apache/kafka/pull/14687


On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:

> Hi,
>
> We have a dashboard already:
>
>
>
> https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY
>
> On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков 
> wrote:
>
>> Hello guys.
>>
>> I want to tell you about one more approach to dealing with flaky tests.
>> We adopted this approach in the Apache Ignite community, so maybe it can be
>> helpful for Kafka as well.
>>
>> TL;DR: The Apache Ignite community has a tool that provides statistics on
>> tests and can tell if a PR introduces new failures.
>>
>> Apache Ignite has many tests.
>> The latest «Run All» contains around 75k.
>> Most tests are integration-style, so the number of flaky ones is
>> significant.
>>
>> We built a tool - Team City Bot [1]
>> that provides combined statistics on flaky tests [2].
>>
>> This tool can compare the results of Run All for a PR and for master.
>> If all is OK, one can comment on the Jira ticket with a visa from the bot [3].
>>
>> A visa is a quality proof of a PR for Ignite committers.
>> And we can sort out the most flaky tests and prioritize fixes using the bot's
>> statistics [2].
>>
>> The TC bot is integrated with TeamCity only, for now.
>> But, if the Kafka community is interested, we can try to integrate it with
>> Jenkins.
>>
>> [1] https://github.com/apache/ignite-teamcity-bot
>> [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
>> [3]
>> https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
>>
>>
>>
>> > 15 нояб. 2023 г., в 09:18, Ismael Juma  написал(а):
>> >
>> > To use the pain analogy, people seem to have really good painkillers and
>> > hence they somehow don't feel the pain already. ;)
>> >
>> > The reality is that important and high quality tests will get fixed.
>> Poor
>> > quality tests (low signal to noise ratio) might not get fixed a

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-21 Thread Ismael Juma
Hi,

We have a dashboard already:


https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.sortField=FLAKY

On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков  wrote:

> Hello guys.
>
> I want to tell you about one more approach to dealing with flaky tests.
> We adopted this approach in the Apache Ignite community, so maybe it can be
> helpful for Kafka as well.
>
> TL;DR: The Apache Ignite community has a tool that provides statistics on
> tests and can tell if a PR introduces new failures.
>
> Apache Ignite has many tests.
> The latest «Run All» contains around 75k.
> Most tests are integration-style, so the number of flaky ones is
> significant.
>
> We built a tool - Team City Bot [1]
> that provides combined statistics on flaky tests [2].
>
> This tool can compare the results of Run All for a PR and for master.
> If all is OK, one can comment on the Jira ticket with a visa from the bot [3].
>
> A visa is a quality proof of a PR for Ignite committers.
> And we can sort out the most flaky tests and prioritize fixes using the bot's
> statistics [2].
>
> The TC bot is integrated with TeamCity only, for now.
> But, if the Kafka community is interested, we can try to integrate it with Jenkins.
>
> [1] https://github.com/apache/ignite-teamcity-bot
> [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
> [3]
> https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
>
>
>
> > 15 нояб. 2023 г., в 09:18, Ismael Juma  написал(а):
> >
> > To use the pain analogy, people seem to have really good painkillers and
> > hence they somehow don't feel the pain already. ;)
> >
> > The reality is that important and high quality tests will get fixed. Poor
> > quality tests (low signal to noise ratio) might not get fixed and that's
> ok.
> >
> > I'm not opposed to marking the tests as release blockers as a starting
> > point, but I'm saying it's fine if people triage them and decide they are
> > not blockers. In fact, that has already happened in the past.
> >
> > Ismael
> >
> > On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax 
> wrote:
> >
> >> I agree on the test gap argument. However, my worry is, if we don't
> >> "force the pain", it won't get fixed at all. -- I also know, that we try
> >> to find an working approach for many years...
> >>
> >> My take is that if we disable a test and file a non-blocking Jira, it's
> >> basically the same as just deleting the test all together and never talk
> >> about it again. -- I believe, this is not want we aim for, but we aim
> >> for good test coverage and a way to get these test fixed?
> >>
> >> Thus IMHO we need some forcing function (either keep the tests and feel
> >> the pain on every PR), or disable the test and file a blocker JIRA so
> >> the pain surfaces on a release forcing us to do something about it.
> >>
> >> If there is no forcing function, it basically means we are willing to
> >> accept test gaps forever.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/14/23 9:09 PM, Ismael Juma wrote:
> >>> Matthias,
> >>>
> >>> Flaky tests are worse than useless. I know engineers find it hard to
> >>> disable them because of the supposed test gap (I find it hard too), but
> >> the
> >>> truth is that the test gap is already there! No-one blocks merging PRs
> on
> >>> flaky tests, but they do get used to ignoring build failures.
> >>>
> >>> The current approach has been attempted for nearly a decade and it has
> >>> never worked. I think we should try something different.
> >>>
> >>> When it comes to marking flaky tests as release blockers, I don't think
> >>> this should be done as a general rule. We should instead assess on a
> case
> >>> by case basis, same way we do it for bugs.
> >>>
> >>> Ismael
> >>>
> >>> On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax 
> >> wrote:
> >>>
>  Thanks for starting this discussion David! I totally agree to "no"!
> 
>  I think there is no excuse whatsoever for merging PRs with compilation
>  errors (except an honest mistake for conflicting PRs that got merged
>  interleaved). -- Every committer must(!) check the Jenkins status
> before
>  merging to avoid such an issue.
> 
>  Similar for actual permanently broken tests. If there is no green
> build,
>  and the same test failed across multiple Jenkins runs, a committer
>  should detect this and cannot merge a PR.
> 
>  Given the current state of the CI pipeline, it seems possible to get
>  green runs, and thus I support the policy (that we actually always
> had)
>  to only merge if there is at least one green build. If committers got
>  sloppy about this, we need to call it out and put a hold on this
> >> practice.
> 
>  (The only exception from the above policy would be a very unstable
>  status for which getting a green build

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-21 Thread Matthias J. Sax

Thanks Sophie. Overall I agree with you.

I think 50% is too high as a general rule, and I believe something like 
30% might be more appropriate (going much lower might become too aggressive, 
given the infrastructure at hand).


The difficult part about a policy like this is that we don't really 
have statistics about it, so it will be up to the reviewers to keep 
monitoring and raising an "alert" (on the dev mailing list?) if we see 
too many failing builds.


How to achieve this? Personally, I think it would be best to agree on a 
"bug bash sprint". Not sure if folks are willing to sign up for this? 
The main question might be "when"? We recently discussed test stability in 
our team, and we don't see capacity now, but we want to invest more in Q1. 
(Btw: this issue goes beyond the build and also affects system tests.)



How long do we have to take action in the above cases?


You propose one day, but that's tricky? In the end, if a test is flaky, 
we won't know right away, would we?



How do we track failures?


I also filed tickets and added comments to tests in the past when they 
re-failed. While labor intensive, it seems that worked best in the past. 
Justine mentioned Gradle Enterprise (I did not look into it yet, but 
maybe it can help to reduce manual labor)?



How often does a test need to fail to be considered flaky enough to take action?


Guess it's a judgement call, and I don't have a strong opinion. But I 
agree that we might want to write down a rule. But again, a rule only 
makes sense if we have data. If reviewers don't pay attention and don't 
comment on tickets so we can count how often a test fails, any number we 
put down won't be helpful.



In the end, to me it boils down to the willingness of all of us to 
tackle it, and to _make time_ to address flaky tests. On our side (i.e., 
the KS teams at Confluent), we want to put more time aside for this in 
quarterly planning, but without reliable data it's hard to 
know which tests to spend time on for the biggest bang for the buck. 
Thus, to me the second cornerstone is to put the manual labor into 
tracking the frequency of flaky tests, which is a group effort.



-Matthias


On 11/21/23 1:33 PM, Sophie Blee-Goldman wrote:

For some concrete data, here are the stats for the latest build on two
community PRs I am currently reviewing:

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14648/16/tests
- 18 unrelated test failures
- 13 unique tests
- only 1 out of 4 JDK builds were green with all tests passing

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14735/4/tests
- 44(!) unrelated test failures
- not even going to count up the unique tests because there were too many
- 0 out of 4 JDK builds were green
- this particular build seemed to have a number of infra/timeout issues, so
it may have exacerbated the flakiness beyond the "usual", although as
others have noted, unstable infra is not uncommon

My point is, we clearly have a long way to go if we want to start enforcing
this policy and have any hope of merging any PRs and not driving away our
community of contributors.

This isn't meant to discourage anyone, actually the opposite: if we want to
start gating PRs on passing builds, we need to start tackling flaky tests
now!

On Tue, Nov 21, 2023 at 1:19 PM Sophie Blee-Goldman 
wrote:


In the interest of moving things forward, here is what we would need (in
my opinion) to start enforcing this:

1. Get overall build failure percentage under a certain threshold
   1. What is an acceptable number here?
   2. How do we achieve this: wait until all of them are fixed,
   disable everything that's flaky right away, etc
2. Come up with concrete policy rules so there's no confusion. I think
we need to agree on answers for these questions at least:
1. What happens if a new PR introduces a new test that is revealed to
   be flaky?
   2. What happens if a new PR makes an old test become flaky?
   3. How long do we have to take action in the above cases?
   4. How do we track failures?
   5. How often does a test need to fail to be considered flaky enough
   to take action?

Here's my take on these questions, but would love to hear from others:

1.1) Imo the failure rate has to be under 50% at the very highest. At 50
we still have half the builds failing, but 75% would pass after just one
retry, and 87.5% after two retries. Is that acceptable? Maybe we should aim
to kick things off with a higher success rate to give ourselves some wiggle
room over time, and have 50% be the absolute maximum failure rate -- if it
ever gets beyond that we trigger an emergency response (whether that be
blocking all feature work until the test failures are addressed, ending
this policy, etc)
1.2) I think we'd probably all agree that there's no way we'll be able to
triage all of the currently flaky tests in a reasonable time frame, but I'm
also wary of 

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-21 Thread Sophie Blee-Goldman
For some concrete data, here are the stats for the latest build on two
community PRs I am currently reviewing:

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14648/16/tests
- 18 unrelated test failures
- 13 unique tests
- only 1 out of 4 JDK builds were green with all tests passing

https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14735/4/tests
- 44(!) unrelated test failures
- not even going to count up the unique tests because there were too many
- 0 out of 4 JDK builds were green
- this particular build seemed to have a number of infra/timeout issues, so
it may have exacerbated the flakiness beyond the "usual", although as
others have noted, unstable infra is not uncommon

My point is, we clearly have a long way to go if we want to start enforcing
this policy and have any hope of merging any PRs and not driving away our
community of contributors.

This isn't meant to discourage anyone, actually the opposite: if we want to
start gating PRs on passing builds, we need to start tackling flaky tests
now!

On Tue, Nov 21, 2023 at 1:19 PM Sophie Blee-Goldman 
wrote:

> In the interest of moving things forward, here is what we would need (in
> my opinion) to start enforcing this:
>
>1. Get overall build failure percentage under a certain threshold
>   1. What is an acceptable number here?
>   2. How do we achieve this: wait until all of them are fixed,
>   disable everything that's flaky right away, etc
>2. Come up with concrete policy rules so there's no confusion. I think
>we need to agree on answers for these questions at least:
>1. What happens if a new PR introduces a new test that is revealed to
>   be flaky?
>   2. What happens if a new PR makes an old test become flaky?
>   3. How long do we have to take action in the above cases?
>   4. How do we track failures?
>   5. How often does a test need to fail to be considered flaky enough
>   to take action?
>
> Here's my take on these questions, but would love to hear from others:
>
> 1.1) Imo the failure rate has to be under 50% at the very highest. At 50
> we still have half the builds failing, but 75% would pass after just one
> retry, and 87.5% after two retries. Is that acceptable? Maybe we should aim
> to kick things off with a higher success rate to give ourselves some wiggle
> room over time, and have 50% be the absolute maximum failure rate -- if it
> ever gets beyond that we trigger an emergency response (whether that be
> blocking all feature work until the test failures are addressed, ending
> this policy, etc)
> 1.2) I think we'd probably all agree that there's no way we'll be able to
> triage all of the currently flaky tests in a reasonable time frame, but I'm
> also wary of just disabling everything and forgetting them. Should we
> consider making all of the currently-disabled tests a 3.7 release blocker?
> Should we wait until after 3.7 to disable them and implement this policy to
> give us a longer time to address them? Will we just procrastinate until the
> next release deadline anyways? Many open questions here
>
> 2.1) We can either revert the entire PR or just disable the specific
> test(s) that are failing. Personally I'm in favor of giving a short window
> (see point #3) for the author to attempt a fix or else disable the specific
> failing test, otherwise we revert the entire commit. This should guarantee
> that tests aren't just introduced, disabled, and never looked at again. My
> main concern here is that sometimes reverting a commit can mean resolving
> merge conflicts that not everyone will have the context to do properly.
> This is just another reason to keep the window short.
> 2.2) Again I think we should allow a (short) window of time for someone
> with context to triage and take action, but after that, we need to either
> revert the PR or disable the test. Neither of those feels like a good
> option, since new changes can often induce timeout-related failures in my
> experience, which are solely a problem with the test and not with the
> change. Maybe if the test is failing very frequently/most of the time, then
> we should revert the change, but if it becomes only occasionally flaky,
> then we should disable the test, file a ticket, and mark it a blocker until
> it can be triaged (the triaging can of course include demoting it from a
> blocker if someone with context determines the test itself is at fault and
> offers little value).
> 2.3) I think any longer than a day (or over the weekend) starts to become
> an issue. Of course, we can/will be more lenient with tests that fail less
> often, especially since it might just take a while to notice
> 2.4) Presumably the best way is to file tickets for every failure and
> comment on that ticket for each subsequent failure. I used to do this quite
> religiously, but I have to admit, it was actually a non-trivial timesuck.
> If we can get to a low enough leve

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-21 Thread Sophie Blee-Goldman
In the interest of moving things forward, here is what we would need (in my
opinion) to start enforcing this:

   1. Get overall build failure percentage under a certain threshold
  1. What is an acceptable number here?
  2. How do we achieve this: wait until all of them are fixed, disable
  everything that's flaky right away, etc
   2. Come up with concrete policy rules so there's no confusion. I think
   we need to agree on answers for these questions at least:
   1. What happens if a new PR introduces a new test that is revealed to be
  flaky?
  2. What happens if a new PR makes an old test become flaky?
  3. How long do we have to take action in the above cases?
  4. How do we track failures?
  5. How often does a test need to fail to be considered flaky enough
  to take action?

Here's my take on these questions, but would love to hear from others:

1.1) IMO the failure rate has to be under 50% at the very highest. At 50% we
still have half the builds failing on the first try, but 75% would pass after
just one retry, and 87.5% after two retries. Is that acceptable? Maybe we should aim
to kick things off with a higher success rate to give ourselves some wiggle
room over time, and have 50% be the absolute maximum failure rate -- if it
ever gets beyond that we trigger an emergency response (whether that be
blocking all feature work until the test failures are addressed, ending
this policy, etc)
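
To make that arithmetic concrete, here is a quick back-of-the-envelope sketch
(assuming build failures are independent and plugging in the hypothetical 50%
rate; the class name is just a placeholder):

    public class RetryMath {
        public static void main(String[] args) {
            double failureRate = 0.5; // hypothetical overall build failure rate
            for (int retries = 0; retries <= 2; retries++) {
                // a PR stays blocked only if the first attempt and every retry fail
                double passRate = 1 - Math.pow(failureRate, retries + 1);
                System.out.printf("pass within %d retries: %.1f%%%n", retries, passRate * 100);
            }
            // prints 50.0%, 75.0% and 87.5%, matching the numbers above
        }
    }
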
1.2) I think we'd probably all agree that there's no way we'll be able to
triage all of the currently flaky tests in a reasonable time frame, but I'm
also wary of just disabling everything and forgetting them. Should we
consider making all of the currently-disabled tests a 3.7 release blocker?
Should we wait until after 3.7 to disable them and implement this policy to
give us a longer time to address them? Will we just procrastinate until the
next release deadline anyways? Many open questions here

2.1) We can either revert the entire PR or just disable the specific
test(s) that are failing. Personally I'm in favor of giving a short window
(see point #3) for the author to attempt a fix or else disable the specific
failing test, otherwise we revert the entire commit. This should guarantee
that tests aren't just introduced, disabled, and never looked at again. My
main concern here is that sometimes reverting a commit can mean resolving
merge conflicts that not everyone will have the context to do properly.
This is just another reason to keep the window short.
2.2) Again I think we should allow a (short) window of time for someone
with context to triage and take action, but after that, we need to either
revert the PR or disable the test. Neither of those feels like a good
option, since new changes can often induce timeout-related failures in my
experience, which are solely a problem with the test and not with the
change. Maybe if the test is failing very frequently/most of the time, then
we should revert the change, but if it becomes only occasionally flaky,
then we should disable the test, file a ticket, and mark it a blocker until
it can be triaged (the triaging can of course include demoting it from a
blocker if someone with context determines the test itself is at fault and
offers little value).
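
For illustration, disabling a flaky test while it stays tracked as a blocker
could look roughly like this (a minimal sketch assuming JUnit 5; the test
class and ticket number are hypothetical placeholders):

    import org.junit.jupiter.api.Disabled;
    import org.junit.jupiter.api.Test;

    class SomeFlakyIntegrationTest {

        @Disabled("KAFKA-XXXXX: flaky, disabled pending triage; tracked as a blocker")
        @Test
        void shouldEventuallyDoTheThing() {
            // the original test body stays in place so it can be re-enabled
            // once the ticket is resolved
        }
    }
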
2.3) I think any longer than a day (or over the weekend) starts to become
an issue. Of course, we can/will be more lenient with tests that fail less
often, especially since it might just take a while to notice
2.4) Presumably the best way is to file tickets for every failure and
comment on that ticket for each subsequent failure. I used to do this quite
religiously, but I have to admit, it was actually a non-trivial timesuck.
If we can get to a low enough level of failure, however, this should be
more reasonable. We just need to trust everyone to pay attention and follow
up on failures they see. And will probably need to come up with a system to
avoid overcounting by different reviewers of the same PR/build
2.5) This one is actually the most tricky in my opinion. I think everyone
would agree that if a test is failing on all or most of the JDK runs within
a single build, it is problematic. And probably most of us would agree that
if it fails even once per build on almost all PRs, we need to take action.
On the other hand, a single test that fails once every 50 builds might feel
acceptable, but if we have 100 such tests, suddenly it's extremely
difficult to get a clean build. So what's the threshold for action?
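
Rough numbers for that last scenario (a sketch assuming the failures are
independent; the 1-in-50 rate and the count of 100 tests are the hypothetical
figures above):

    public class CleanBuildOdds {
        public static void main(String[] args) {
            double perTestFailureRate = 1.0 / 50; // each test fails ~2% of the time
            int flakyTests = 100;
            double greenBuildChance = Math.pow(1 - perTestFailureRate, flakyTests);
            System.out.printf("chance of an all-green build: %.1f%%%n", greenBuildChance * 100);
            // roughly 13%, i.e. only about one in seven builds would be fully green
        }
    }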

So...any thoughts on how to implement this policy? And when to actually
start enforcing it?

On Tue, Nov 21, 2023 at 1:18 PM Sophie Blee-Goldman 
wrote:

> So...where did we land on this?
>
> To take a few steps back and make sure we're all talking about the same
> thing, I want to mention that I myself was responsible for merging a PR
> that broke the build a few weeks ago. There was a warning that only
> appeared in some of the versions, but when checking on the results I
> immediately navigated 

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-21 Thread Sophie Blee-Goldman
So...where did we land on this?

To take a few steps back and make sure we're all talking about the same
thing, I want to mention that I myself was responsible for merging a PR
that broke the build a few weeks ago. There was a warning that only
appeared in some of the versions, but when checking on the results I
immediately navigated to the "Tests" page before letting the main build
page load. I just assumed that if there were test failures that must mean
the build itself was fine, and with this tunnel-vision I missed that a
subset of JDK builds had failed.

I'm mentioning this now not to make excuses, but because I really hope that
it goes without saying that this was an honest (though sloppy) mistake, and
that no one genuinely believes it is acceptable to merge any PR that causes
any of the builds to fail to compile. Yet multiple people in this thread so
far have voiced support for "gating merges on the successful completion of
all parts of the build except tests". Just to be totally clear, I really
don't think that was ever in question -- though it certainly doesn't hurt
to remind everyone.

So, this thread is not about whether or not to merge with failing
*builds, *but it's
whether it should be acceptable to merge with failing *tests.* It seems
like we're in agreement for the most part that the current system isn't
great, but I don't think we can (or rather, should) start enforcing the new
rule to only merge fully-green builds until we can actually get to a
reasonable percentage of green builds. Do we have any stats on how often
builds are failing right now due to flaky tests? Just speaking from
personal experience, it's been a while since I've seen a truly green build
with all tests passing.

Lastly, just to play devil's advocate for a moment: before we commit to
this, I think we need to consider how such a policy would impact the
contributor experience. It's incredibly disheartening to go through all the
work of submitting a PR and getting it reviewed, only to be blocked from merging
at the last minute due to test failures completely outside that
contributor's control. We already struggle just getting some folks, new
contributors in particular, all the way through the review process without
abandoning their PRs. I do think we've been doing a better job lately, but
it's a cycle that ebbs and flows, and most community PRs still take an
admirable degree of patience and persistence just to get enough reviews to
reach the point of merging. So I just think we need to be careful not to
make this situation even worse by having to wait for a green build. I'm
just worried about our ability to stay on top of disabling tests,
especially if we need to wait for someone with enough context to make a
judgement call. Can we really rely on everyone to drop everything at any
time to check on a failing test? At the same time I would be hesitant to be
overly aggressive about reverting/disabling tests without having someone
who understands the context take a look.

That said, I do think it's worth trying this out as an experiment, as long
as we can be clear to frustrated contributors that this isn't necessarily
the new policy from here on out if it isn't going well.

On Thu, Nov 16, 2023 at 3:32 AM Igor Soarez  wrote:

> Hi all,
>
> I think at least one of those is my fault, apologies.
> I'll try to make sure all my tests are passing from now on.
>
> It doesn't help that GitHub always shows that the tests have failed,
> even when they have not. I suspect this is because Jenkins always
> marks the builds as unstable, even when all tests pass, because
> the "Archive JUnit-formatted test results" step seems to persistently
> fail with "[Checks API] No suitable checks publisher found.".
> e.g.
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14770/1/pipeline/
>
> Can we get rid of that persistent failure and actually mark successful
> test runs as green?
>
> --
> Igor
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-16 Thread Igor Soarez
Hi all,

I think at least one of those is my fault, apologies.
I'll try to make sure all my tests are passing from now on.

It doesn't help that GitHub always shows that the tests have failed,
even when they have not. I suspect this is because Jenkins always
marks the builds as unstable, even when all tests pass, because
the "Archive JUnit-formatted test results" step seems to persistently
fail with "[Checks API] No suitable checks publisher found.".
e.g. 
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14770/1/pipeline/

Can we get rid of that persistent failure and actually mark successful test 
runs as green?

--
Igor


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Николай Ижиков
Hello guys.

I want to tell you about one more approach to dealing with flaky tests.
We adopted this approach in the Apache Ignite community, so maybe it can be
helpful for Kafka as well.

TL;DR: the Apache Ignite community has a tool that provides test statistics
and can tell whether a PR introduces new failures.

Apache Ignite has many tests.
The latest «Run All» contains around 75k.
Most of the tests are integration-style, so the number of flaky ones is
significant.

We built a tool - the TeamCity Bot [1] -
that provides combined statistics on flaky tests [2].

The tool can compare the Run All results of a PR against master.
If everything is OK, one can comment on the Jira ticket with a visa from the
bot [3].

A visa is a proof of PR quality for Ignite committers.
We can also sort out the most flaky tests and prioritize fixes with the bot
statistics [2].

The TC Bot is integrated with TeamCity only, for now.
But if the Kafka community is interested, we can try to integrate it with
Jenkins.

[1] https://github.com/apache/ignite-teamcity-bot
[2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master&count=10
[3] 
https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394



> On Nov 15, 2023, at 09:18, Ismael Juma  wrote:
> 
> To use the pain analogy, people seem to have really good painkillers and
> hence they somehow don't feel the pain already. ;)
> 
> The reality is that important and high quality tests will get fixed. Poor
> quality tests (low signal to noise ratio) might not get fixed and that's ok.
> 
> I'm not opposed to marking the tests as release blockers as a starting
> point, but I'm saying it's fine if people triage them and decide they are
> not blockers. In fact, that has already happened in the past.
> 
> Ismael
> 
> On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax  wrote:
> 
>> I agree on the test gap argument. However, my worry is, if we don't
>> "force the pain", it won't get fixed at all. -- I also know, that we try
>> to find an working approach for many years...
>> 
>> My take is that if we disable a test and file a non-blocking Jira, it's
>> basically the same as just deleting the test all together and never talk
>> about it again. -- I believe, this is not want we aim for, but we aim
>> for good test coverage and a way to get these test fixed?
>> 
>> Thus IMHO we need some forcing function (either keep the tests and feel
>> the pain on every PR), or disable the test and file a blocker JIRA so
>> the pain surfaces on a release forcing us to do something about it.
>> 
>> If there is no forcing function, it basically means we are willing to
>> accept test gaps forever.
>> 
>> 
>> -Matthias
>> 
>> On 11/14/23 9:09 PM, Ismael Juma wrote:
>>> Matthias,
>>> 
>>> Flaky tests are worse than useless. I know engineers find it hard to
>>> disable them because of the supposed test gap (I find it hard too), but
>> the
>>> truth is that the test gap is already there! No-one blocks merging PRs on
>>> flaky tests, but they do get used to ignoring build failures.
>>> 
>>> The current approach has been attempted for nearly a decade and it has
>>> never worked. I think we should try something different.
>>> 
>>> When it comes to marking flaky tests as release blockers, I don't think
>>> this should be done as a general rule. We should instead assess on a case
>>> by case basis, same way we do it for bugs.
>>> 
>>> Ismael
>>> 
>>> On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax 
>> wrote:
>>> 
 Thanks for starting this discussion David! I totally agree to "no"!
 
 I think there is no excuse whatsoever for merging PRs with compilation
 errors (except an honest mistake for conflicting PRs that got merged
 interleaved). -- Every committer must(!) check the Jenkins status before
 merging to avoid such an issue.
 
 Similar for actual permanently broken tests. If there is no green build,
 and the same test failed across multiple Jenkins runs, a committer
 should detect this and cannot merge a PR.
 
 Given the current state of the CI pipeline, it seems possible to get
 green runs, and thus I support the policy (that we actually always had)
 to only merge if there is at least one green build. If committers got
 sloppy about this, we need to call it out and put a hold on this
>> practice.
 
 (The only exception from the above policy would be a very unstable
 status for which getting a green build is not possible at all, due to
 too many flaky tests -- for this case, I would accept to merge even
 there is no green build, but committer need to manually ensure that
 every test did pass in at least one test run. -- We had this in the
 past, but we I don't think we are in such a bad situation right now).
 
 About disabling tests: I was never a fan of this, because in my
 experience those tests are not fixed any time soon. Especially, because
 we do not consider such ti

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Ismael Juma
To use the pain analogy, people seem to have really good painkillers and
hence they somehow don't feel the pain already. ;)

The reality is that important and high quality tests will get fixed. Poor
quality tests (low signal to noise ratio) might not get fixed and that's ok.

I'm not opposed to marking the tests as release blockers as a starting
point, but I'm saying it's fine if people triage them and decide they are
not blockers. In fact, that has already happened in the past.

Ismael

On Tue, Nov 14, 2023 at 10:02 PM Matthias J. Sax  wrote:

> I agree on the test gap argument. However, my worry is, if we don't
> "force the pain", it won't get fixed at all. -- I also know, that we try
> to find an working approach for many years...
>
> My take is that if we disable a test and file a non-blocking Jira, it's
> basically the same as just deleting the test all together and never talk
> about it again. -- I believe, this is not want we aim for, but we aim
> for good test coverage and a way to get these test fixed?
>
> Thus IMHO we need some forcing function (either keep the tests and feel
> the pain on every PR), or disable the test and file a blocker JIRA so
> the pain surfaces on a release forcing us to do something about it.
>
> If there is no forcing function, it basically means we are willing to
> accept test gaps forever.
>
>
> -Matthias
>
> On 11/14/23 9:09 PM, Ismael Juma wrote:
> > Matthias,
> >
> > Flaky tests are worse than useless. I know engineers find it hard to
> > disable them because of the supposed test gap (I find it hard too), but
> the
> > truth is that the test gap is already there! No-one blocks merging PRs on
> > flaky tests, but they do get used to ignoring build failures.
> >
> > The current approach has been attempted for nearly a decade and it has
> > never worked. I think we should try something different.
> >
> > When it comes to marking flaky tests as release blockers, I don't think
> > this should be done as a general rule. We should instead assess on a case
> > by case basis, same way we do it for bugs.
> >
> > Ismael
> >
> > On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax 
> wrote:
> >
> >> Thanks for starting this discussion David! I totally agree to "no"!
> >>
> >> I think there is no excuse whatsoever for merging PRs with compilation
> >> errors (except an honest mistake for conflicting PRs that got merged
> >> interleaved). -- Every committer must(!) check the Jenkins status before
> >> merging to avoid such an issue.
> >>
> >> Similar for actual permanently broken tests. If there is no green build,
> >> and the same test failed across multiple Jenkins runs, a committer
> >> should detect this and cannot merge a PR.
> >>
> >> Given the current state of the CI pipeline, it seems possible to get
> >> green runs, and thus I support the policy (that we actually always had)
> >> to only merge if there is at least one green build. If committers got
> >> sloppy about this, we need to call it out and put a hold on this
> practice.
> >>
> >> (The only exception from the above policy would be a very unstable
> >> status for which getting a green build is not possible at all, due to
> >> too many flaky tests -- for this case, I would accept to merge even
> >> there is no green build, but committer need to manually ensure that
> >> every test did pass in at least one test run. -- We had this in the
> >> past, but we I don't think we are in such a bad situation right now).
> >>
> >> About disabling tests: I was never a fan of this, because in my
> >> experience those tests are not fixed any time soon. Especially, because
> >> we do not consider such tickets as release blockers. To me, seeing tests
> >> fails on PR build is actually a good forcing function for people to feel
> >> the pain, and thus get motivated to make time to fix the tests.
> >>
> >> I have to admit that I was a little bit sloppy paying attention to flaky
> >> tests recently, and I highly appreciate this effort. Also thanks to
> >> everyone how actually filed a ticket! IMHO, we should file a ticket for
> >> every flaky test, and also keep adding comments each time we see a test
> >> fail to be able to track the frequency at which a tests fails, so we can
> >> fix the most pressing ones first.
> >>
> >> Te me, the best forcing function to get test stabilized is to file
> >> tickets and consider them release blockers. Disabling tests does not
> >> really help much IMHO to tackle the problem (we can of course still
> >> disable them to get noise out of the system, but it would only introduce
> >> testing gaps for the time being and also does not help to figure out how
> >> often a test fails, so it's not a solution to the problem IMHO)
> >>
> >>
> >> -Matthias
> >>
> >> On 11/13/23 11:40 PM, Sagar wrote:
> >>> Hi Divij,
> >>>
> >>> I think this proposal overall makes sense. My only nit sort of a
> >> suggestion
> >>> is that let's also consider a label called newbie++[1] for flaky tests
> if
> >>> we are considering add

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Chris Egerton
One potential forcing function could be to allow an unconditional revert of
any commit (including testing and non-testing changes) that can be shown to
have introduced test flakiness. This could at least prevent us from
accruing more flaky tests, though it would obviously not be viable for
existing tests that touch on already-released features.

I also liked the earlier suggestion that we immediately begin gating merges
on the successful completion of all parts of the build except tests. This
would prevent the kinds of compilation/Checkstyle errors that kicked off
this discussion.

On Wed, Nov 15, 2023, 15:02 Matthias J. Sax  wrote:

> I agree on the test gap argument. However, my worry is, if we don't
> "force the pain", it won't get fixed at all. -- I also know, that we try
> to find an working approach for many years...
>
> My take is that if we disable a test and file a non-blocking Jira, it's
> basically the same as just deleting the test all together and never talk
> about it again. -- I believe, this is not want we aim for, but we aim
> for good test coverage and a way to get these test fixed?
>
> Thus IMHO we need some forcing function (either keep the tests and feel
> the pain on every PR), or disable the test and file a blocker JIRA so
> the pain surfaces on a release forcing us to do something about it.
>
> If there is no forcing function, it basically means we are willing to
> accept test gaps forever.
>
>
> -Matthias
>
> On 11/14/23 9:09 PM, Ismael Juma wrote:
> > Matthias,
> >
> > Flaky tests are worse than useless. I know engineers find it hard to
> > disable them because of the supposed test gap (I find it hard too), but
> the
> > truth is that the test gap is already there! No-one blocks merging PRs on
> > flaky tests, but they do get used to ignoring build failures.
> >
> > The current approach has been attempted for nearly a decade and it has
> > never worked. I think we should try something different.
> >
> > When it comes to marking flaky tests as release blockers, I don't think
> > this should be done as a general rule. We should instead assess on a case
> > by case basis, same way we do it for bugs.
> >
> > Ismael
> >
> > On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax 
> wrote:
> >
> >> Thanks for starting this discussion David! I totally agree to "no"!
> >>
> >> I think there is no excuse whatsoever for merging PRs with compilation
> >> errors (except an honest mistake for conflicting PRs that got merged
> >> interleaved). -- Every committer must(!) check the Jenkins status before
> >> merging to avoid such an issue.
> >>
> >> Similar for actual permanently broken tests. If there is no green build,
> >> and the same test failed across multiple Jenkins runs, a committer
> >> should detect this and cannot merge a PR.
> >>
> >> Given the current state of the CI pipeline, it seems possible to get
> >> green runs, and thus I support the policy (that we actually always had)
> >> to only merge if there is at least one green build. If committers got
> >> sloppy about this, we need to call it out and put a hold on this
> practice.
> >>
> >> (The only exception from the above policy would be a very unstable
> >> status for which getting a green build is not possible at all, due to
> >> too many flaky tests -- for this case, I would accept to merge even
> >> there is no green build, but committer need to manually ensure that
> >> every test did pass in at least one test run. -- We had this in the
> >> past, but we I don't think we are in such a bad situation right now).
> >>
> >> About disabling tests: I was never a fan of this, because in my
> >> experience those tests are not fixed any time soon. Especially, because
> >> we do not consider such tickets as release blockers. To me, seeing tests
> >> fails on PR build is actually a good forcing function for people to feel
> >> the pain, and thus get motivated to make time to fix the tests.
> >>
> >> I have to admit that I was a little bit sloppy paying attention to flaky
> >> tests recently, and I highly appreciate this effort. Also thanks to
> >> everyone how actually filed a ticket! IMHO, we should file a ticket for
> >> every flaky test, and also keep adding comments each time we see a test
> >> fail to be able to track the frequency at which a tests fails, so we can
> >> fix the most pressing ones first.
> >>
> >> Te me, the best forcing function to get test stabilized is to file
> >> tickets and consider them release blockers. Disabling tests does not
> >> really help much IMHO to tackle the problem (we can of course still
> >> disable them to get noise out of the system, but it would only introduce
> >> testing gaps for the time being and also does not help to figure out how
> >> often a test fails, so it's not a solution to the problem IMHO)
> >>
> >>
> >> -Matthias
> >>
> >> On 11/13/23 11:40 PM, Sagar wrote:
> >>> Hi Divij,
> >>>
> >>> I think this proposal overall makes sense. My only nit sort of a
> >> suggestion
> >>> is that let's

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Matthias J. Sax
I agree on the test gap argument. However, my worry is that if we don't
"force the pain", it won't get fixed at all. -- I also know that we have been
trying to find a working approach for many years...


My take is that if we disable a test and file a non-blocking Jira, it's
basically the same as just deleting the test altogether and never talking
about it again. -- I believe this is not what we aim for; we aim for good
test coverage and a way to get these tests fixed.


Thus IMHO we need some forcing function (either keep the tests and feel 
the pain on every PR), or disable the test and file a blocker JIRA so 
the pain surfaces on a release forcing us to do something about it.


If there is no forcing function, it basically means we are willing to 
accept test gaps forever.



-Matthias

On 11/14/23 9:09 PM, Ismael Juma wrote:

Matthias,

Flaky tests are worse than useless. I know engineers find it hard to
disable them because of the supposed test gap (I find it hard too), but the
truth is that the test gap is already there! No-one blocks merging PRs on
flaky tests, but they do get used to ignoring build failures.

The current approach has been attempted for nearly a decade and it has
never worked. I think we should try something different.

When it comes to marking flaky tests as release blockers, I don't think
this should be done as a general rule. We should instead assess on a case
by case basis, same way we do it for bugs.

Ismael

On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax  wrote:


Thanks for starting this discussion David! I totally agree to "no"!

I think there is no excuse whatsoever for merging PRs with compilation
errors (except an honest mistake for conflicting PRs that got merged
interleaved). -- Every committer must(!) check the Jenkins status before
merging to avoid such an issue.

Similar for actual permanently broken tests. If there is no green build,
and the same test failed across multiple Jenkins runs, a committer
should detect this and cannot merge a PR.

Given the current state of the CI pipeline, it seems possible to get
green runs, and thus I support the policy (that we actually always had)
to only merge if there is at least one green build. If committers got
sloppy about this, we need to call it out and put a hold on this practice.

(The only exception from the above policy would be a very unstable
status for which getting a green build is not possible at all, due to
too many flaky tests -- for this case, I would accept to merge even
there is no green build, but committer need to manually ensure that
every test did pass in at least one test run. -- We had this in the
past, but we I don't think we are in such a bad situation right now).

About disabling tests: I was never a fan of this, because in my
experience those tests are not fixed any time soon. Especially, because
we do not consider such tickets as release blockers. To me, seeing tests
fails on PR build is actually a good forcing function for people to feel
the pain, and thus get motivated to make time to fix the tests.

I have to admit that I was a little bit sloppy paying attention to flaky
tests recently, and I highly appreciate this effort. Also thanks to
everyone how actually filed a ticket! IMHO, we should file a ticket for
every flaky test, and also keep adding comments each time we see a test
fail to be able to track the frequency at which a tests fails, so we can
fix the most pressing ones first.

Te me, the best forcing function to get test stabilized is to file
tickets and consider them release blockers. Disabling tests does not
really help much IMHO to tackle the problem (we can of course still
disable them to get noise out of the system, but it would only introduce
testing gaps for the time being and also does not help to figure out how
often a test fails, so it's not a solution to the problem IMHO)


-Matthias

On 11/13/23 11:40 PM, Sagar wrote:

Hi Divij,

I think this proposal overall makes sense. My only nit sort of a

suggestion

is that let's also consider a label called newbie++[1] for flaky tests if
we are considering adding newbie as a label. I think some of the flaky
tests need familiarity with the codebase or the test setup so as a first
time contributor, it might be difficult. newbie++ IMO covers that aspect.

[1]


https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22


Let me know what you think.

Thanks!
Sagar.

On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya 
wrote:


   Please, do it.

We can use specific labels to effectively filter those tickets.

We already have a label and a way to discover flaky tests. They are

tagged

with the label "flaky-test" [1]. There is also a label "newbie" [2]

meant

for folks who are new to Apache Kafka code base.
My suggestion is to send a broader email to the community (since many

will

miss details in this thread) and call for action for committers to
volunteer as "shepherds" for these tickets. I ca

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Ismael Juma
Matthias,

Flaky tests are worse than useless. I know engineers find it hard to
disable them because of the supposed test gap (I find it hard too), but the
truth is that the test gap is already there! No-one blocks merging PRs on
flaky tests, but they do get used to ignoring build failures.

The current approach has been attempted for nearly a decade and it has
never worked. I think we should try something different.

When it comes to marking flaky tests as release blockers, I don't think
this should be done as a general rule. We should instead assess on a case
by case basis, same way we do it for bugs.

Ismael

On Tue, Nov 14, 2023 at 5:02 PM Matthias J. Sax  wrote:

> Thanks for starting this discussion David! I totally agree to "no"!
>
> I think there is no excuse whatsoever for merging PRs with compilation
> errors (except an honest mistake for conflicting PRs that got merged
> interleaved). -- Every committer must(!) check the Jenkins status before
> merging to avoid such an issue.
>
> Similar for actual permanently broken tests. If there is no green build,
> and the same test failed across multiple Jenkins runs, a committer
> should detect this and cannot merge a PR.
>
> Given the current state of the CI pipeline, it seems possible to get
> green runs, and thus I support the policy (that we actually always had)
> to only merge if there is at least one green build. If committers got
> sloppy about this, we need to call it out and put a hold on this practice.
>
> (The only exception from the above policy would be a very unstable
> status for which getting a green build is not possible at all, due to
> too many flaky tests -- for this case, I would accept to merge even
> there is no green build, but committer need to manually ensure that
> every test did pass in at least one test run. -- We had this in the
> past, but we I don't think we are in such a bad situation right now).
>
> About disabling tests: I was never a fan of this, because in my
> experience those tests are not fixed any time soon. Especially, because
> we do not consider such tickets as release blockers. To me, seeing tests
> fails on PR build is actually a good forcing function for people to feel
> the pain, and thus get motivated to make time to fix the tests.
>
> I have to admit that I was a little bit sloppy paying attention to flaky
> tests recently, and I highly appreciate this effort. Also thanks to
> everyone how actually filed a ticket! IMHO, we should file a ticket for
> every flaky test, and also keep adding comments each time we see a test
> fail to be able to track the frequency at which a tests fails, so we can
> fix the most pressing ones first.
>
> Te me, the best forcing function to get test stabilized is to file
> tickets and consider them release blockers. Disabling tests does not
> really help much IMHO to tackle the problem (we can of course still
> disable them to get noise out of the system, but it would only introduce
> testing gaps for the time being and also does not help to figure out how
> often a test fails, so it's not a solution to the problem IMHO)
>
>
> -Matthias
>
> On 11/13/23 11:40 PM, Sagar wrote:
> > Hi Divij,
> >
> > I think this proposal overall makes sense. My only nit sort of a
> suggestion
> > is that let's also consider a label called newbie++[1] for flaky tests if
> > we are considering adding newbie as a label. I think some of the flaky
> > tests need familiarity with the codebase or the test setup so as a first
> > time contributor, it might be difficult. newbie++ IMO covers that aspect.
> >
> > [1]
> >
> https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> >
> > Let me know what you think.
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya 
> > wrote:
> >
> >>>   Please, do it.
> >> We can use specific labels to effectively filter those tickets.
> >>
> >> We already have a label and a way to discover flaky tests. They are
> tagged
> >> with the label "flaky-test" [1]. There is also a label "newbie" [2]
> meant
> >> for folks who are new to Apache Kafka code base.
> >> My suggestion is to send a broader email to the community (since many
> will
> >> miss details in this thread) and call for action for committers to
> >> volunteer as "shepherds" for these tickets. I can send one out once we
> have
> >> some consensus wrt next steps in this thread.
> >>
> >>
> >> [1]
> >>
> >>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> >>
> >>
> >> [2] https://kafka.apache.org/contributing -> Finding a project to work
> on
> >>
> >>
> >> Divij Vaidya
> >>
> >>
> >>
> >> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
> >> wrote:
> >>
> >>>
>  To kickstart this

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Justine Olshan
Thanks Matthias -- I agree that there are lots of flaky test JIRAs but not
a forcing function to fix them.
One option is to mark them as blockers as you alluded to.

One other thing to keep in mind is that we have the Gradle Enterprise tool
to track flaky tests!
https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.values=trunk

I think there are some false positives that may come up if we look at PR
builds too, but we may be able to use tags to filter things out.

Justine

On Tue, Nov 14, 2023 at 5:01 PM Matthias J. Sax  wrote:

> Thanks for starting this discussion David! I totally agree to "no"!
>
> I think there is no excuse whatsoever for merging PRs with compilation
> errors (except an honest mistake for conflicting PRs that got merged
> interleaved). -- Every committer must(!) check the Jenkins status before
> merging to avoid such an issue.
>
> Similar for actual permanently broken tests. If there is no green build,
> and the same test failed across multiple Jenkins runs, a committer
> should detect this and cannot merge a PR.
>
> Given the current state of the CI pipeline, it seems possible to get
> green runs, and thus I support the policy (that we actually always had)
> to only merge if there is at least one green build. If committers got
> sloppy about this, we need to call it out and put a hold on this practice.
>
> (The only exception from the above policy would be a very unstable
> status for which getting a green build is not possible at all, due to
> too many flaky tests -- for this case, I would accept to merge even
> there is no green build, but committer need to manually ensure that
> every test did pass in at least one test run. -- We had this in the
> past, but we I don't think we are in such a bad situation right now).
>
> About disabling tests: I was never a fan of this, because in my
> experience those tests are not fixed any time soon. Especially, because
> we do not consider such tickets as release blockers. To me, seeing tests
> fails on PR build is actually a good forcing function for people to feel
> the pain, and thus get motivated to make time to fix the tests.
>
> I have to admit that I was a little bit sloppy paying attention to flaky
> tests recently, and I highly appreciate this effort. Also thanks to
> everyone how actually filed a ticket! IMHO, we should file a ticket for
> every flaky test, and also keep adding comments each time we see a test
> fail to be able to track the frequency at which a tests fails, so we can
> fix the most pressing ones first.
>
> Te me, the best forcing function to get test stabilized is to file
> tickets and consider them release blockers. Disabling tests does not
> really help much IMHO to tackle the problem (we can of course still
> disable them to get noise out of the system, but it would only introduce
> testing gaps for the time being and also does not help to figure out how
> often a test fails, so it's not a solution to the problem IMHO)
>
>
> -Matthias
>
> On 11/13/23 11:40 PM, Sagar wrote:
> > Hi Divij,
> >
> > I think this proposal overall makes sense. My only nit sort of a
> suggestion
> > is that let's also consider a label called newbie++[1] for flaky tests if
> > we are considering adding newbie as a label. I think some of the flaky
> > tests need familiarity with the codebase or the test setup so as a first
> > time contributor, it might be difficult. newbie++ IMO covers that aspect.
> >
> > [1]
> >
> https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22
> >
> > Let me know what you think.
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya 
> > wrote:
> >
> >>>   Please, do it.
> >> We can use specific labels to effectively filter those tickets.
> >>
> >> We already have a label and a way to discover flaky tests. They are
> tagged
> >> with the label "flaky-test" [1]. There is also a label "newbie" [2]
> meant
> >> for folks who are new to Apache Kafka code base.
> >> My suggestion is to send a broader email to the community (since many
> will
> >> miss details in this thread) and call for action for committers to
> >> volunteer as "shepherds" for these tickets. I can send one out once we
> have
> >> some consensus wrt next steps in this thread.
> >>
> >>
> >> [1]
> >>
> >>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
> >>
> >>
> >> [2] https://kafka.apache.org/contributing -> Finding a project to work
> on
> >>
> >>
> >> Divij Vaidya
> >>
> >>
> >>
> >> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
> >> wrote:
> >>
> >>>
>  To kickstart this effort, we can publish a list of such tickets in the
> >>> communi

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-14 Thread Matthias J. Sax

Thanks for starting this discussion David! I totally agree to "no"!

I think there is no excuse whatsoever for merging PRs with compilation 
errors (except an honest mistake for conflicting PRs that got merged 
interleaved). -- Every committer must(!) check the Jenkins status before 
merging to avoid such an issue.


Similar for actual permanently broken tests. If there is no green build, 
and the same test failed across multiple Jenkins runs, a committer 
should detect this and must not merge the PR.


Given the current state of the CI pipeline, it seems possible to get 
green runs, and thus I support the policy (that we actually always had) 
to only merge if there is at least one green build. If committers got 
sloppy about this, we need to call it out and put a hold on this practice.


(The only exception from the above policy would be a very unstable 
status for which getting a green build is not possible at all, due to 
too many flaky tests -- in this case, I would accept merging even if
there is no green build, but committers need to manually ensure that
every test did pass in at least one test run. -- We had this in the
past, but I don't think we are in such a bad situation right now).


About disabling tests: I was never a fan of this, because in my 
experience those tests are not fixed any time soon. Especially, because 
we do not consider such tickets as release blockers. To me, seeing tests
fail on PR builds is actually a good forcing function for people to feel
the pain, and thus get motivated to make time to fix the tests.


I have to admit that I was a little bit sloppy paying attention to flaky 
tests recently, and I highly appreciate this effort. Also thanks to 
everyone who actually filed a ticket! IMHO, we should file a ticket for
every flaky test, and also keep adding comments each time we see a test 
fail, to be able to track the frequency at which a test fails, so we can
fix the most pressing ones first.


To me, the best forcing function to get tests stabilized is to file
tickets and consider them release blockers. Disabling tests does not 
really help much IMHO to tackle the problem (we can of course still 
disable them to get noise out of the system, but it would only introduce 
testing gaps for the time being and also does not help to figure out how 
often a test fails, so it's not a solution to the problem IMHO)



-Matthias

On 11/13/23 11:40 PM, Sagar wrote:

Hi Divij,

I think this proposal overall makes sense. My only nit sort of a suggestion
is that let's also consider a label called newbie++[1] for flaky tests if
we are considering adding newbie as a label. I think some of the flaky
tests need familiarity with the codebase or the test setup so as a first
time contributor, it might be difficult. newbie++ IMO covers that aspect.

[1]
https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22

Let me know what you think.

Thanks!
Sagar.

On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya 
wrote:


  Please, do it.

We can use specific labels to effectively filter those tickets.

We already have a label and a way to discover flaky tests. They are tagged
with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
for folks who are new to Apache Kafka code base.
My suggestion is to send a broader email to the community (since many will
miss details in this thread) and call for action for committers to
volunteer as "shepherds" for these tickets. I can send one out once we have
some consensus wrt next steps in this thread.


[1]

https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC


[2] https://kafka.apache.org/contributing -> Finding a project to work on


Divij Vaidya



On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
wrote:




To kickstart this effort, we can publish a list of such tickets in the

community and assign one or more committers the role of a «shepherd" for
each ticket.

Please, do it.
We can use specific label to effectively filter those tickets.


13 нояб. 2023 г., в 15:16, Divij Vaidya 

написал(а):


Thanks for bringing this up David.

My primary concern revolves around the possibility that the currently
disabled tests may remain inactive indefinitely. We currently have
unresolved JIRA tickets for flaky tests that have been pending for an
extended period. I am inclined to support the idea of disabling these

tests

temporarily and merging changes only when the build is successful,

provided

there is a clear plan for re-enabling them in the future.

To address this issue, I propose the following measures:

1\ Foster a supportive environment for new contributors within the
community, encouraging them to take on tickets associated with flaky

tests.

This

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Sagar
Hi Divij,

I think this proposal overall makes sense. My only nit sort of a suggestion
is that let's also consider a label called newbie++[1] for flaky tests if
we are considering adding newbie as a label. I think some of the flaky
tests need familiarity with the codebase or the test setup so as a first
time contributor, it might be difficult. newbie++ IMO covers that aspect.

[1]
https://issues.apache.org/jira/browse/KAFKA-15406?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20%22newbie%2B%2B%22

Let me know what you think.

Thanks!
Sagar.

On Mon, Nov 13, 2023 at 9:11 PM Divij Vaidya 
wrote:

> >  Please, do it.
> We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
>
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
> wrote:
>
> >
> > > To kickstart this effort, we can publish a list of such tickets in the
> > community and assign one or more committers the role of a «shepherd" for
> > each ticket.
> >
> > Please, do it.
> > We can use specific label to effectively filter those tickets.
> >
> > > 13 нояб. 2023 г., в 15:16, Divij Vaidya 
> > написал(а):
> > >
> > > Thanks for bringing this up David.
> > >
> > > My primary concern revolves around the possibility that the currently
> > > disabled tests may remain inactive indefinitely. We currently have
> > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > extended period. I am inclined to support the idea of disabling these
> > tests
> > > temporarily and merging changes only when the build is successful,
> > provided
> > > there is a clear plan for re-enabling them in the future.
> > >
> > > To address this issue, I propose the following measures:
> > >
> > > 1\ Foster a supportive environment for new contributors within the
> > > community, encouraging them to take on tickets associated with flaky
> > tests.
> > > This initiative would require individuals familiar with the relevant
> code
> > > to offer guidance to those undertaking these tasks. Committers should
> > > prioritize reviewing and addressing these tickets within their
> available
> > > bandwidth. To kickstart this effort, we can publish a list of such
> > tickets
> > > in the community and assign one or more committers the role of a
> > "shepherd"
> > > for each ticket.
> > >
> > > 2\ Implement a policy to block minor version releases until the Release
> > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > in
> > > our testing coverage. The RM may rely on Subject Matter Experts (SMEs)
> in
> > > the specific code areas to provide assurance before giving the green
> > light
> > > for a release.
> > >
> > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > > Integration (CI) system. This goal should encompass projects such as
> > > refining our test suite to eliminate flakiness and addressing
> > > infrastructure issues if necessary. By publishing this goal, we create
> a
> > > shared vision for the community in 2024, fostering alignment on our
> > > objectives. This alignment will aid in prioritizing tasks for community
> > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > 
> > > wrote:
> > >
> > >> I will say that I have also seen tests that seem to be more flaky
> > >> intermittently. It may be ok for some time and suddenly the CI is
> > >> overloaded and we see issues.
> > >> I have also seen the CI struggling with running out of space recently,
> > so I
> > >> wonder if we can also try to improve things on that front.
> > >>
> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > week.
> > >> I'm happy to try to get to green builds, but everyone needs to be on
> > board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> https://issues.apache.org/jira/browse/KAFKA-14806
> > >> https://issues.apache.org/jira/browse/KAFKA-14249
> > >> https://issues.apache.org/jira/browse/KAFKA-15798
> > >> https://issues.apache.org/jira/browse/KAFKA-15797
> > >> https://

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Philip Nee
Hi David et al,

I agree with all the suggestions, but from what I've seen, the flaky tests
tend to get ignored, and I'm afraid that disabling them would leave them
forgotten.  If the Jira tickets are accurate, we've got plenty of tickets
that have been open for more than 2 years.
I do think Divij's call is a good initiative, but keep in mind that tackling
these flaky tests can take significant time and effort - outside of one's
full-time job.  I think the very least we can do, for the near term, is to
ensure there are no red builds - I have seen quite a few PRs get merged
with a broken build, breaking trunk.

P

On Mon, Nov 13, 2023 at 7:41 AM Divij Vaidya 
wrote:

> >  Please, do it.
> We can use specific labels to effectively filter those tickets.
>
> We already have a label and a way to discover flaky tests. They are tagged
> with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
> for folks who are new to Apache Kafka code base.
> My suggestion is to send a broader email to the community (since many will
> miss details in this thread) and call for action for committers to
> volunteer as "shepherds" for these tickets. I can send one out once we have
> some consensus wrt next steps in this thread.
>
>
> [1]
>
> https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
>
> [2] https://kafka.apache.org/contributing -> Finding a project to work on
>
>
> Divij Vaidya
>
>
>
> On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков 
> wrote:
>
> >
> > > To kickstart this effort, we can publish a list of such tickets in the
> > community and assign one or more committers the role of a «shepherd" for
> > each ticket.
> >
> > Please, do it.
> > We can use specific label to effectively filter those tickets.
> >
> > > 13 нояб. 2023 г., в 15:16, Divij Vaidya 
> > написал(а):
> > >
> > > Thanks for bringing this up David.
> > >
> > > My primary concern revolves around the possibility that the currently
> > > disabled tests may remain inactive indefinitely. We currently have
> > > unresolved JIRA tickets for flaky tests that have been pending for an
> > > extended period. I am inclined to support the idea of disabling these
> > tests
> > > temporarily and merging changes only when the build is successful,
> > provided
> > > there is a clear plan for re-enabling them in the future.
> > >
> > > To address this issue, I propose the following measures:
> > >
> > > 1\ Foster a supportive environment for new contributors within the
> > > community, encouraging them to take on tickets associated with flaky
> > tests.
> > > This initiative would require individuals familiar with the relevant
> code
> > > to offer guidance to those undertaking these tasks. Committers should
> > > prioritize reviewing and addressing these tickets within their
> available
> > > bandwidth. To kickstart this effort, we can publish a list of such
> > tickets
> > > in the community and assign one or more committers the role of a
> > "shepherd"
> > > for each ticket.
> > >
> > > 2\ Implement a policy to block minor version releases until the Release
> > > Manager (RM) is satisfied that the disabled tests do not result in gaps
> > in
> > > our testing coverage. The RM may rely on Subject Matter Experts (SMEs)
> in
> > > the specific code areas to provide assurance before giving the green
> > light
> > > for a release.
> > >
> > > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > > Integration (CI) system. This goal should encompass projects such as
> > > refining our test suite to eliminate flakiness and addressing
> > > infrastructure issues if necessary. By publishing this goal, we create
> a
> > > shared vision for the community in 2024, fostering alignment on our
> > > objectives. This alignment will aid in prioritizing tasks for community
> > > members and guide reviewers in allocating their bandwidth effectively.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> > 
> > > wrote:
> > >
> > >> I will say that I have also seen tests that seem to be more flaky
> > >> intermittently. It may be ok for some time and suddenly the CI is
> > >> overloaded and we see issues.
> > >> I have also seen the CI struggling with running out of space recently,
> > so I
> > >> wonder if we can also try to improve things on that front.
> > >>
> > >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> > week.
> > >> I'm happy to try to get to green builds, but everyone needs to be on
> > board.
> > >>
> > >> https://issues.apache.org/jira/browse/KAFKA-15529
> > >> https://issues.

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Divij Vaidya
>  Please, do it.
We can use specific labels to effectively filter those tickets.

We already have a label and a way to discover flaky tests. They are tagged
with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
for folks who are new to Apache Kafka code base.
My suggestion is to send a broader email to the community (since many will
miss details in this thread) and call for action for committers to
volunteer as "shepherds" for these tickets. I can send one out once we have
some consensus wrt next steps in this thread.


[1]
https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flaky-test%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC


[2] https://kafka.apache.org/contributing -> Finding a project to work on


Divij Vaidya



On Mon, Nov 13, 2023 at 4:24 PM Николай Ижиков  wrote:

>
> > To kickstart this effort, we can publish a list of such tickets in the
> community and assign one or more committers the role of a «shepherd" for
> each ticket.
>
> Please, do it.
> We can use specific label to effectively filter those tickets.
>
> > 13 нояб. 2023 г., в 15:16, Divij Vaidya 
> написал(а):
> >
> > Thanks for bringing this up David.
> >
> > My primary concern revolves around the possibility that the currently
> > disabled tests may remain inactive indefinitely. We currently have
> > unresolved JIRA tickets for flaky tests that have been pending for an
> > extended period. I am inclined to support the idea of disabling these
> tests
> > temporarily and merging changes only when the build is successful,
> provided
> > there is a clear plan for re-enabling them in the future.
> >
> > To address this issue, I propose the following measures:
> >
> > 1\ Foster a supportive environment for new contributors within the
> > community, encouraging them to take on tickets associated with flaky
> tests.
> > This initiative would require individuals familiar with the relevant code
> > to offer guidance to those undertaking these tasks. Committers should
> > prioritize reviewing and addressing these tickets within their available
> > bandwidth. To kickstart this effort, we can publish a list of such
> tickets
> > in the community and assign one or more committers the role of a
> "shepherd"
> > for each ticket.
> >
> > 2\ Implement a policy to block minor version releases until the Release
> > Manager (RM) is satisfied that the disabled tests do not result in gaps
> in
> > our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
> > the specific code areas to provide assurance before giving the green
> light
> > for a release.
> >
> > 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> > Integration (CI) system. This goal should encompass projects such as
> > refining our test suite to eliminate flakiness and addressing
> > infrastructure issues if necessary. By publishing this goal, we create a
> > shared vision for the community in 2024, fostering alignment on our
> > objectives. This alignment will aid in prioritizing tasks for community
> > members and guide reviewers in allocating their bandwidth effectively.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan
> 
> > wrote:
> >
> >> I will say that I have also seen tests that seem to be more flaky
> >> intermittently. It may be ok for some time and suddenly the CI is
> >> overloaded and we see issues.
> >> I have also seen the CI struggling with running out of space recently,
> so I
> >> wonder if we can also try to improve things on that front.
> >>
> >> FWIW, I noticed, filed, or commented on several flaky test JIRAs last
> week.
> >> I'm happy to try to get to green builds, but everyone needs to be on
> board.
> >>
> >> https://issues.apache.org/jira/browse/KAFKA-15529
> >> https://issues.apache.org/jira/browse/KAFKA-14806
> >> https://issues.apache.org/jira/browse/KAFKA-14249
> >> https://issues.apache.org/jira/browse/KAFKA-15798
> >> https://issues.apache.org/jira/browse/KAFKA-15797
> >> https://issues.apache.org/jira/browse/KAFKA-15690
> >> https://issues.apache.org/jira/browse/KAFKA-15699
> >> https://issues.apache.org/jira/browse/KAFKA-15772
> >> https://issues.apache.org/jira/browse/KAFKA-15759
> >> https://issues.apache.org/jira/browse/KAFKA-15760
> >> https://issues.apache.org/jira/browse/KAFKA-15700
> >>
> >> I've also seen that kraft transactions tests often flakily see that the
> >> producer id is not allocated and times out.
> >> I can file a JIRA for that too.
> >>
> >> Hopefully this is a place we can start from.
> >>
> >> Justine
> >>
> >> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
> >>
> >>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
> >> wrote:
> >>>
>  In other words, I’m biased to think that new flakiness indicates
>  non-deterministic bugs more often than it indicates a bad test.

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Николай Ижиков


> To kickstart this effort, we can publish a list of such tickets in the 
> community and assign one or more committers the role of a "shepherd" for each
> ticket.

Please, do it.
We can use a specific label to filter those tickets effectively.

> On 13 Nov 2023, at 15:16, Divij Vaidya wrote:
> 
> Thanks for bringing this up David.
> 
> My primary concern revolves around the possibility that the currently
> disabled tests may remain inactive indefinitely. We currently have
> unresolved JIRA tickets for flaky tests that have been pending for an
> extended period. I am inclined to support the idea of disabling these tests
> temporarily and merging changes only when the build is successful, provided
> there is a clear plan for re-enabling them in the future.
> 
> To address this issue, I propose the following measures:
> 
> 1\ Foster a supportive environment for new contributors within the
> community, encouraging them to take on tickets associated with flaky tests.
> This initiative would require individuals familiar with the relevant code
> to offer guidance to those undertaking these tasks. Committers should
> prioritize reviewing and addressing these tickets within their available
> bandwidth. To kickstart this effort, we can publish a list of such tickets
> in the community and assign one or more committers the role of a "shepherd"
> for each ticket.
> 
> 2\ Implement a policy to block minor version releases until the Release
> Manager (RM) is satisfied that the disabled tests do not result in gaps in
> our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
> the specific code areas to provide assurance before giving the green light
> for a release.
> 
> 3\ Set a community-wide goal for 2024 to achieve a stable Continuous
> Integration (CI) system. This goal should encompass projects such as
> refining our test suite to eliminate flakiness and addressing
> infrastructure issues if necessary. By publishing this goal, we create a
> shared vision for the community in 2024, fostering alignment on our
> objectives. This alignment will aid in prioritizing tasks for community
> members and guide reviewers in allocating their bandwidth effectively.
> 
> --
> Divij Vaidya
> 
> 
> 
> On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan 
> wrote:
> 
>> I will say that I have also seen tests that seem to be more flaky
>> intermittently. It may be ok for some time and suddenly the CI is
>> overloaded and we see issues.
>> I have also seen the CI struggling with running out of space recently, so I
>> wonder if we can also try to improve things on that front.
>> 
>> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
>> I'm happy to try to get to green builds, but everyone needs to be on board.
>> 
>> https://issues.apache.org/jira/browse/KAFKA-15529
>> https://issues.apache.org/jira/browse/KAFKA-14806
>> https://issues.apache.org/jira/browse/KAFKA-14249
>> https://issues.apache.org/jira/browse/KAFKA-15798
>> https://issues.apache.org/jira/browse/KAFKA-15797
>> https://issues.apache.org/jira/browse/KAFKA-15690
>> https://issues.apache.org/jira/browse/KAFKA-15699
>> https://issues.apache.org/jira/browse/KAFKA-15772
>> https://issues.apache.org/jira/browse/KAFKA-15759
>> https://issues.apache.org/jira/browse/KAFKA-15760
>> https://issues.apache.org/jira/browse/KAFKA-15700
>> 
>> I've also seen that kraft transactions tests often flakily see that the
>> producer id is not allocated and times out.
>> I can file a JIRA for that too.
>> 
>> Hopefully this is a place we can start from.
>> 
>> Justine
>> 
>> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
>> 
>>> On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
>> wrote:
>>> 
 In other words, I’m biased to think that new flakiness indicates
 non-deterministic bugs more often than it indicates a bad test.
 
>>> 
>>> My experience is exactly the opposite. As someone who has tracked many of
>>> the flaky fixes, the vast majority of the time they are an issue with the
>>> test.
>>> 
>>> Ismael
>>> 
>> 



Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-13 Thread Divij Vaidya
Thanks for bringing this up David.

My primary concern revolves around the possibility that the currently
disabled tests may remain inactive indefinitely. We currently have
unresolved JIRA tickets for flaky tests that have been pending for an
extended period. I am inclined to support the idea of disabling these tests
temporarily and merging changes only when the build is successful, provided
there is a clear plan for re-enabling them in the future.

To address this issue, I propose the following measures:

1\ Foster a supportive environment for new contributors within the
community, encouraging them to take on tickets associated with flaky tests.
This initiative would require individuals familiar with the relevant code
to offer guidance to those undertaking these tasks. Committers should
prioritize reviewing and addressing these tickets within their available
bandwidth. To kickstart this effort, we can publish a list of such tickets
in the community and assign one or more committers the role of a "shepherd"
for each ticket.

2\ Implement a policy to block minor version releases until the Release
Manager (RM) is satisfied that the disabled tests do not result in gaps in
our testing coverage. The RM may rely on Subject Matter Experts (SMEs) in
the specific code areas to provide assurance before giving the green light
for a release.

3\ Set a community-wide goal for 2024 to achieve a stable Continuous
Integration (CI) system. This goal should encompass projects such as
refining our test suite to eliminate flakiness and addressing
infrastructure issues if necessary. By publishing this goal, we create a
shared vision for the community in 2024, fostering alignment on our
objectives. This alignment will aid in prioritizing tasks for community
members and guide reviewers in allocating their bandwidth effectively.

--
Divij Vaidya



On Sun, Nov 12, 2023 at 2:58 AM Justine Olshan 
wrote:

> I will say that I have also seen tests that seem to be more flaky
> intermittently. It may be ok for some time and suddenly the CI is
> overloaded and we see issues.
> I have also seen the CI struggling with running out of space recently, so I
> wonder if we can also try to improve things on that front.
>
> FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
> I'm happy to try to get to green builds, but everyone needs to be on board.
>
> https://issues.apache.org/jira/browse/KAFKA-15529
> https://issues.apache.org/jira/browse/KAFKA-14806
> https://issues.apache.org/jira/browse/KAFKA-14249
> https://issues.apache.org/jira/browse/KAFKA-15798
> https://issues.apache.org/jira/browse/KAFKA-15797
> https://issues.apache.org/jira/browse/KAFKA-15690
> https://issues.apache.org/jira/browse/KAFKA-15699
> https://issues.apache.org/jira/browse/KAFKA-15772
> https://issues.apache.org/jira/browse/KAFKA-15759
> https://issues.apache.org/jira/browse/KAFKA-15760
> https://issues.apache.org/jira/browse/KAFKA-15700
>
> I've also seen that kraft transactions tests often flakily see that the
> producer id is not allocated and times out.
> I can file a JIRA for that too.
>
> Hopefully this is a place we can start from.
>
> Justine
>
> On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:
>
> > On Sat, Nov 11, 2023 at 10:32 AM John Roesler 
> wrote:
> >
> > > In other words, I’m biased to think that new flakiness indicates
> > > non-deterministic bugs more often than it indicates a bad test.
> > >
> >
> > My experience is exactly the opposite. As someone who has tracked many of
> > the flaky fixes, the vast majority of the time they are an issue with the
> > test.
> >
> > Ismael
> >
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread Justine Olshan
I will say that I have also seen tests that seem to be flaky only
intermittently. They may be fine for some time, and then suddenly the CI is
overloaded and we see issues.
I have also seen the CI struggling with running out of space recently, so I
wonder if we can also try to improve things on that front.

FWIW, I noticed, filed, or commented on several flaky test JIRAs last week.
I'm happy to try to get to green builds, but everyone needs to be on board.

https://issues.apache.org/jira/browse/KAFKA-15529
https://issues.apache.org/jira/browse/KAFKA-14806
https://issues.apache.org/jira/browse/KAFKA-14249
https://issues.apache.org/jira/browse/KAFKA-15798
https://issues.apache.org/jira/browse/KAFKA-15797
https://issues.apache.org/jira/browse/KAFKA-15690
https://issues.apache.org/jira/browse/KAFKA-15699
https://issues.apache.org/jira/browse/KAFKA-15772
https://issues.apache.org/jira/browse/KAFKA-15759
https://issues.apache.org/jira/browse/KAFKA-15760
https://issues.apache.org/jira/browse/KAFKA-15700

I've also seen that the KRaft transactions tests often fail flakily because
the producer ID is not allocated and the test times out.
I can file a JIRA for that too.

Hopefully this is a place we can start from.

Justine

On Sat, Nov 11, 2023 at 11:35 AM Ismael Juma  wrote:

> On Sat, Nov 11, 2023 at 10:32 AM John Roesler  wrote:
>
> > In other words, I’m biased to think that new flakiness indicates
> > non-deterministic bugs more often than it indicates a bad test.
> >
>
> My experience is exactly the opposite. As someone who has tracked many of
> the flaky fixes, the vast majority of the time they are an issue with the
> test.
>
> Ismael
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread Ismael Juma
On Sat, Nov 11, 2023 at 10:32 AM John Roesler  wrote:

> In other words, I’m biased to think that new flakiness indicates
> non-deterministic bugs more often than it indicates a bad test.
>

My experience is exactly the opposite. As someone who has tracked many of
the flaky fixes, the vast majority of the time they are an issue with the
test.

Ismael


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread John Roesler
Thanks, all,

This proposal sounds good to me.

We could try to just disable the current flaky tests as a one-time step and
configure GitHub to only allow merging green builds as a safeguard going forward.
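
(Presumably that configuration would go through the project's .asf.yaml file,
since GitHub settings for ASF repositories are managed that way; its branch
protection section can require specific status checks to pass before a merge,
though the exact check names would have to match whatever our CI reports.)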

By definition, flaky tests may not fail during the buggy PR build itself, but
they should make themselves known soon afterwards, in the nightly builds and
subsequent PRs. As long as committers are empowered to identify and revert
flakiness-inducing PRs, they should be able to unblock their subsequent PRs.

In other words, I’m biased to think that new flakiness indicates 
non-deterministic bugs more often than it indicates a bad test.

But whatever we do, it’s better than merging red builds. 

Thanks,
John

On Sat, Nov 11, 2023, at 10:48, Ismael Juma wrote:
> One more thing:
>
> 3. If test flakiness is introduced by a recent PR, it's appropriate to
> revert said PR vs disabling the flaky tests.
>
> Ismael
>
> On Sat, Nov 11, 2023, 8:45 AM Ismael Juma  wrote:
>
>> Hi David,
>>
>> I would be fine with:
>> 1. Only allowing merges if the build is green
>> 2. Disabling all flaky tests that aren't fixed within a week. That is, if
>> a test is flaky for more than a week, it should be automatically disabled
>> (it doesn't add any value since it gets ignored).
>>
>> We need both to make this work, if you just do step 1, then we will be
>> stuck with no ability to merge anything.
>>
>> Ismael
>>
>>
>>
>> On Sat, Nov 11, 2023, 2:02 AM David Jacot 
>> wrote:
>>
>>> Hi all,
>>>
>>> The state of our CI worries me a lot. Just this week, we merged two PRs
>>> with compilation errors and one PR introducing persistent failures. This
>>> really hurts the quality and the velocity of the project and it basically
>>> defeats the purpose of having a CI because we tend to ignore it nowadays.
>>>
>>> Should we continue to merge without a green build? No! We should not so I
>>> propose to prevent merging a pull request without a green build. This is a
>>> really simple and bold move that will prevent us from introducing
>>> regressions and will improve the overall health of the project. At the
>>> same
>>> time, I think that we should disable all the known flaky tests, raise
>>> jiras
>>> for them, find an owner for each of them, and fix them.
>>>
>>> What do you think?
>>>
>>> Best,
>>> David
>>>
>>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread Ismael Juma
One more thing:

3. If test flakiness is introduced by a recent PR, it's appropriate to
revert that PR rather than disable the flaky tests.
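
(In practice that would presumably just be a git revert of the offending
commit on trunk, assuming our usual squash-and-merge workflow, followed by a
fresh PR once the flakiness is understood.)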

Ismael

On Sat, Nov 11, 2023, 8:45 AM Ismael Juma  wrote:

> Hi David,
>
> I would be fine with:
> 1. Only allowing merges if the build is green
> 2. Disabling all flaky tests that aren't fixed within a week. That is, if
> a test is flaky for more than a week, it should be automatically disabled
> (it doesn't add any value since it gets ignored).
>
> We need both to make this work, if you just do step 1, then we will be
> stuck with no ability to merge anything.
>
> Ismael
>
>
>
> On Sat, Nov 11, 2023, 2:02 AM David Jacot 
> wrote:
>
>> Hi all,
>>
>> The state of our CI worries me a lot. Just this week, we merged two PRs
>> with compilation errors and one PR introducing persistent failures. This
>> really hurts the quality and the velocity of the project and it basically
>> defeats the purpose of having a CI because we tend to ignore it nowadays.
>>
>> Should we continue to merge without a green build? No! We should not so I
>> propose to prevent merging a pull request without a green build. This is a
>> really simple and bold move that will prevent us from introducing
>> regressions and will improve the overall health of the project. At the
>> same
>> time, I think that we should disable all the known flaky tests, raise
>> jiras
>> for them, find an owner for each of them, and fix them.
>>
>> What do you think?
>>
>> Best,
>> David
>>
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread Ismael Juma
Hi David,

I would be fine with:
1. Only allowing merges if the build is green
2. Disabling all flaky tests that aren't fixed within a week. That is, if a
test is flaky for more than a week, it should be automatically disabled (it
doesn't add any value since it gets ignored).

We need both to make this work; if we only do step 1, then we will be
stuck with no ability to merge anything.
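
For illustration, disabling a flaky test while keeping a pointer back to its
JIRA could look roughly like the sketch below (JUnit 5 assumed; the class,
test method, and ticket number are placeholders, not real ones):

    import org.junit.jupiter.api.Disabled;
    import org.junit.jupiter.api.Test;

    class SomeFlakyIntegrationTest {

        // Placeholder test; KAFKA-XXXXX stands in for the real ticket number.
        @Disabled("KAFKA-XXXXX: flaky, disabled until the linked JIRA is fixed")
        @Test
        void testSomethingThatFlakes() {
            // test body omitted
        }
    }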

Ismael



On Sat, Nov 11, 2023, 2:02 AM David Jacot 
wrote:

> Hi all,
>
> The state of our CI worries me a lot. Just this week, we merged two PRs
> with compilation errors and one PR introducing persistent failures. This
> really hurts the quality and the velocity of the project and it basically
> defeats the purpose of having a CI because we tend to ignore it nowadays.
>
> Should we continue to merge without a green build? No! We should not so I
> propose to prevent merging a pull request without a green build. This is a
> really simple and bold move that will prevent us from introducing
> regressions and will improve the overall health of the project. At the same
> time, I think that we should disable all the known flaky tests, raise jiras
> for them, find an owner for each of them, and fix them.
>
> What do you think?
>
> Best,
> David
>


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread Sagar
Hi David,

Thanks for bringing this point up and also for creating the revert PRs. I
think there has been an effort in the community to fix a lot of flaky
tests (like around MirrorMaker). I also agree that we shouldn't merge PRs
without green builds and should instead look to disable flaky tests. For
example, I did a quick search for some of the common (and sometimes
longstanding) flaky tests and this is a brief list. Some of them have JIRAs
associated with them as well =>

1)
org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated
=> https://issues.apache.org/jira/browse/KAFKA-8115

2) kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize =>
https://issues.apache.org/jira/browse/KAFKA-15421

3)
org.apache.kafka.controller.QuorumControllerTest.testFenceMultipleBrokers
(Couldn't find JIRA).

4)
integration.kafka.server.FetchFromFollowerIntegrationTest.testRackAwareRangeAssignor
 => https://issues.apache.org/jira/browse/KAFKA-15020

5) EosIntegrationTest => https://issues.apache.org/jira/browse/KAFKA-15690

The only question is where we draw the line between disabling a genuinely
flaky test and trying to fix it. I feel that could get confusing at times,
especially if the flaky test involved is in an unrelated part of the code
(like a flaky Connect test on a Group Coordinator or Streams PR).

Thanks!
Sagar.


On Sat, Nov 11, 2023 at 3:31 PM David Jacot 
wrote:

> Hi all,
>
> The state of our CI worries me a lot. Just this week, we merged two PRs
> with compilation errors and one PR introducing persistent failures. This
> really hurts the quality and the velocity of the project and it basically
> defeats the purpose of having a CI because we tend to ignore it nowadays.
>
> Should we continue to merge without a green build? No! We should not so I
> propose to prevent merging a pull request without a green build. This is a
> really simple and bold move that will prevent us from introducing
> regressions and will improve the overall health of the project. At the same
> time, I think that we should disable all the known flaky tests, raise jiras
> for them, find an owner for each of them, and fix them.
>
> What do you think?
>
> Best,
> David
>


[DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread David Jacot
Hi all,

The state of our CI worries me a lot. Just this week, we merged two PRs
with compilation errors and one PR that introduced persistent failures. This
really hurts the quality and the velocity of the project, and it basically
defeats the purpose of having a CI because we tend to ignore it nowadays.

Should we continue to merge without a green build? No! We should not, so I
propose preventing the merging of any pull request without a green build.
This is a really simple and bold move that will prevent us from introducing
regressions and will improve the overall health of the project. At the same
time, I think that we should disable all the known flaky tests, raise JIRAs
for them, find an owner for each of them, and fix them.

What do you think?

Best,
David