Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20739 )
Change subject: KUDU-3524 Fix crash when sending periodic keep-alive requests
......................................................................


Patch Set 6:

(2 comments)

Thank you very much for the fix! I built client-test with this patch and ran the test many times, following the instructions in KUDU-3524. The test is stable and doesn't crash now. Before this fix it failed in 100% of the runs on my macOS laptop.

http://gerrit.cloudera.org:8080/#/c/20739/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20739/3//COMMIT_MSG@10
PS3, Line 10: eriodical
> This case only happens in macOS. The community server is linux. And this ca
I'm not sure this happens only on macOS. However, I could not reproduce the exact crash when running the test on Linux as prescribed in KUDU-3524. Maybe it manifests itself a bit differently there? I'm not sure, and I didn't spend any time investigating this on Linux. The test failed in 100% of runs on macOS, with the same stack trace under the debugger, and that was evidence enough of a bug for me.

However, I know there are many failures of the KeepAlivePeriodically/KeepAlivePeriodicallyTest.TestScannerKeepAlivePeriodicallyScannerTolerate/1 test on the dist-test dashboard. In fact, most of the client-test failures there are caused by that test:
http://dist-test.cloudera.org:8080/test_drilldown?test_name=client-test

One example is at http://dist-test.cloudera.org:8080/diagnose?key=0c43a1ce-9a29-11ee-b18e-0242ac110002
The test failed because of a timeout, but you can see that an abnormal condition was detected and the stack traces of all threads were printed. That probably masks the issue on Linux. On macOS we don't have similar tracing, so the process simply crashes, which makes the issue easier to understand and troubleshoot.

http://gerrit.cloudera.org:8080/#/c/20739/3//COMMIT_MSG@10
PS3, Line 10: eriodical
> Could you please add a unit test to reproduce this? And make sure the fix w
We have had many failures in dist-test recently, after the new keep-alive functionality was introduced:
http://dist-test.cloudera.org:8080/test_drilldown?test_name=client-test

I guess most of those failures are caused by this bug, but maybe it manifests itself differently on Linux? I didn't do any investigation on Linux. Still, I bet it's the same issue, and with this fix the flakiness in client-test will go down. In other words, we already have tests that fail on Linux because of this bug. The failure just looks a bit different there :)

I also verified that the test, built from code with this patch, no longer crashes on macOS, so the issue is fixed.

--
To view, visit http://gerrit.cloudera.org:8080/20739

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I130db970a091cdf7689245a79dc4ea445d1f739f
Gerrit-Change-Number: 20739
Gerrit-PatchSet: 6
Gerrit-Owner: Wang Xixu <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Ádám Bakai <[email protected]>
Gerrit-Comment-Date: Fri, 15 Dec 2023 05:08:10 +0000
Gerrit-HasComments: Yes
