Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )
Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
> Did you run any practical workloads to see other maintenance ops in action
Thank you for your review!

Yes, I did. This patch has been running in our clusters for a long time. With it, the maintenance manager does schedule operations other than flush ops while under memory pressure. I also have a test, related to KUDU-3488, that exercises various maintenance manager policies, including not_flush_memory_prob, and it shows that the not-flush mechanism works.

Sometimes the memory usage of a tablet server stays at about 60% (the memory pressure threshold) because the write workload and the flush capacity of the maintenance manager are almost equal. In that case the operations with a high perf score cannot run. The performance of the tablet server then gets lower and lower, which leads to higher memory usage, and eventually a vicious circle appears.

A user might want to turn this on if they see the situation described above and need to find a balance between performance and memory. Note that the probability decreases as the memory usage gets closer to 80% (the memory soft limit), so in most cases the memory usage won't get too high.
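As a rough illustration of the decreasing probability mentioned above (this is not the code in the patch), here is a minimal sketch. It assumes a linear decay between the 60% memory pressure threshold and the 80% soft limit; the actual curve used by the patch may differ. The helper names EffectiveNotFlushProb and ShouldSkipFlush and their parameters are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <random>

// Hypothetical helper, not Kudu code. Scales the configured probability of
// skipping a flush down to zero as memory usage approaches the soft limit.
// ASSUMPTION: the decay is linear between pressure_pct and soft_limit_pct.
double EffectiveNotFlushProb(double base_prob, double used_pct,
                             double pressure_pct, double soft_limit_pct) {
  // At or above the soft limit, never skip the flush.
  if (used_pct >= soft_limit_pct) return 0.0;
  // At or below the pressure threshold, use the configured value as-is.
  if (used_pct <= pressure_pct) return base_prob;
  // Fraction of the pressure-to-soft-limit range that is still free.
  const double headroom =
      (soft_limit_pct - used_pct) / (soft_limit_pct - pressure_pct);
  return std::max(0.0, std::min(1.0, base_prob * headroom));
}

// Hypothetical helper, not Kudu code. Returns true when the scheduler should
// pass over the flush op this round and pick the op with the best perf score.
// prob must be within [0, 1].
bool ShouldSkipFlush(double prob, std::mt19937& rng) {
  std::bernoulli_distribution skip(prob);
  return skip(rng);
}
```

With a configured value of 0.5, this sketch skips the flush half of the time at 60% usage, a quarter of the time at 70%, and never at 80% or above, which matches the intent that flushing takes over again as usage nears the soft limit.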

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> You solution only distinguishes the memory related ops and non-memory relat
Thanks for your comment!

I think the ops we want to run already have a high perf score. The only reason they can't run is that FindBestOp always picks flush ops when the server is under memory pressure. For now, I think the perf score mechanism is able to find the best op to run once the configurations and table priorities have been tuned.


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
> If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per pr
Exactly, I will commit another patch with more information. Thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Comment-Date: Tue, 25 Jul 2023 07:33:19 +0000
Gerrit-HasComments: Yes
