Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19842 )
Change subject: [tests] Fix flakiness in log-rolling-itest.cc ...................................................................... Patch Set 2: > > Patch Set 1: > > > > > > > Patch Set 1: > > > > > > > > > > > Patch Set 1: > > > > > > > > > > > > (1 comment) > > > > > > > > > > Running git bisect now, I'll update this patch with my > > > findings. > > > > > > > > Finished running git bisect, this commit: > > https://github.com/apache/kudu/commit/555854178b9b498701619f4bb0dbbbbeab8e69e7 > > > > > > > > was the first bad commit where the test started to become > flaky. > > > > Looks like there were some changes in ServerBase::Init() > that may > > > > have caused the initialization process to be longer. For > context, > > > > the thread that cleans up the excess log files is > > > ServerBase::StartExcessGlogDeleterThread(). > > > > > > Interesting: that doesn't seem to be the case, at least in my > > > testing. > > > > > > I did a test run of the log-rolling-itest for the git commit > > > 555854178 and commit just before. In both cases, all 256 > tests > > > passed for both commits, so my testing show that 555854178 > isn't > > > the changelist that introduced the issue. > > > > > > Test run for b00431df6: > > > http://dist-test.cloudera.org//job?job_id=aserbin.1683691780.85751 > > > > > > Test run for 555854178: > > > http://dist-test.cloudera.org//job?job_id=aserbin.1683695402.105720 > > > > And running the test with the very latest sources is also passing > in my case, interesting: > > > > http://dist-test.cloudera.org//job?job_id=aserbin.1683697731.120325 > > > > I was running those tests using the following command: > > > > ../../build-support/dist_test.py loop -n 256 -- $test_fpath > --stress_cpu_threads=16 $@ > > Interesting, what's the default value for --stress_cpu_threads? > Could that parameter being set to 16 affect the result of this > test? For context, the command I ran was > ../../build-support/dist_test.py loop -n 100 ./bin/log-rolling-itest I guess there original timeout (30) seconds was in the grey area, and since the changelist 555854178 introduced an extra check for time source (that might add up to 1 extra second to the startup time), the ASSERT_EVENTUALLY started failing. At least, introducing this fix doesn't hurt, and allows to have less flakiness in the test. Thank you for the fix! -- To view, visit http://gerrit.cloudera.org:8080/19842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d33559fa01c26d8c8b3ad12b8822d2cdd760fae Gerrit-Change-Number: 19842 Gerrit-PatchSet: 2 Gerrit-Owner: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Comment-Date: Mon, 22 May 2023 16:10:40 +0000 Gerrit-HasComments: No
