Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19842 )

Change subject: [tests] Fix flakiness in log-rolling-itest.cc
......................................................................


Patch Set 2:

> > Patch Set 1:
 > >
 > > > > > Patch Set 1:
 > >  > > >
 > >  > > > > Patch Set 1:
 > >  > > > >
 > >  > > > > (1 comment)
 > >  > > >
 > >  > > > Running git bisect now, I'll update this patch with my
 > >  > findings.
 > >  > >
 > >  > > Finished running git bisect, this commit: 
 > > https://github.com/apache/kudu/commit/555854178b9b498701619f4bb0dbbbbeab8e69e7
 > >  > >
 > >  > > was the first bad commit where the test started to become
 > flaky.
 > >  > > Looks like there were some changes in ServerBase::Init()
 > that may
 > >  > > have caused the initialization process to be longer. For
 > context,
 > >  > > the thread that cleans up the excess log files is
 > >  > ServerBase::StartExcessGlogDeleterThread().
 > >  >
 > >  > Interesting: that doesn't seem to be the case, at least in my
 > >  > testing.
 > >  >
 > >  > I did a test run of the log-rolling-itest for the git commit
 > >  > 555854178 and commit just before.  In both cases, all 256
 > tests
 > >  > passed for both commits, so my testing show that 555854178
 > isn't
 > >  > the changelist that introduced the issue.
 > >  >
 > >  > Test run for b00431df6:
 > >  > http://dist-test.cloudera.org//job?job_id=aserbin.1683691780.85751
 > >  >
 > >  > Test run for 555854178:
 > >  > http://dist-test.cloudera.org//job?job_id=aserbin.1683695402.105720
 > >
 > > And running the test with the very latest sources is also passing
 > in my case, interesting:
 > >
 > > http://dist-test.cloudera.org//job?job_id=aserbin.1683697731.120325
 > >
 > > I was running those tests using the following command:
 > >
 > > ../../build-support/dist_test.py loop -n 256 -- $test_fpath
 > --stress_cpu_threads=16 $@
 >
 > Interesting, what's the default value for --stress_cpu_threads?
 > Could that parameter being set to 16 affect the result of this
 > test? For context, the command I ran was
 > ../../build-support/dist_test.py loop -n 100 ./bin/log-rolling-itest

I guess there original timeout (30) seconds was in the grey area, and since the 
changelist 555854178 introduced an extra check for time source (that might add 
up to 1 extra second to the startup time), the ASSERT_EVENTUALLY started 
failing.

At least, introducing this fix doesn't hurt, and allows to have less flakiness 
in the test.

Thank you for the fix!


--
To view, visit http://gerrit.cloudera.org:8080/19842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d33559fa01c26d8c8b3ad12b8822d2cdd760fae
Gerrit-Change-Number: 19842
Gerrit-PatchSet: 2
Gerrit-Owner: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Comment-Date: Mon, 22 May 2023 16:10:40 +0000
Gerrit-HasComments: No

Reply via email to