Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
> Did you run any practical workloads to see other maintenance ops in action
Thank you for your review!
Yes, I did. This patch has been running in our clusters for a long time, and 
the maintenance manager does schedule operations other than flush ops while 
under memory pressure.
I also have a test, which is part of KUDU-3488, that exercises various 
maintenance manager policies, including this not_flush_memory_prob, and it 
shows that the not-flush mechanism works.
Sometimes the memory usage of a tablet server stays at about 60% (the memory 
pressure threshold) because the write workload and the flush capacity of the 
maintenance manager are almost equal. In that case the ops with high perf 
scores cannot run, so the performance of the tablet server keeps degrading, 
which leads to higher memory usage and eventually a vicious circle.
A user might want to turn this on if they see the situation described above 
and need to find a balance between performance and memory usage.
Also, the probability decreases as the memory usage gets close to 80% (the 
memory soft limit), so in most cases the memory usage won't get too high.
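To illustrate the decreasing probability, here is a minimal sketch of one way 
it could be computed. The linear decay, the function name and the parameter 
names are assumptions made for illustration; they are not taken from the 
patch. Only the 60% and 80% thresholds come from the description above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not the code in this patch.
// base_prob:      configured probability (e.g. not_flush_memory_prob), in [0, 1].
// used_pct:       current memory usage, in percent.
// pressure_pct:   memory pressure threshold, in percent (60 in the example).
// soft_limit_pct: memory soft limit, in percent (80 in the example).
// Assumes a linear decay from base_prob at the pressure threshold down to 0
// at the soft limit.
double EffectiveNotFlushProb(double base_prob, double used_pct,
                             double pressure_pct, double soft_limit_pct) {
  assert(pressure_pct < soft_limit_pct);
  double frac = (soft_limit_pct - used_pct) / (soft_limit_pct - pressure_pct);
  frac = std::max(0.0, std::min(1.0, frac));  // keep the factor within [0, 1]
  return std::max(0.0, std::min(1.0, base_prob)) * frac;
}
```

With a configured value of 0.5, this sketch yields 0.5 at 60% usage, 0.25 at 
70%, and 0 at 80% or above, so flush ops are always chosen once the soft limit 
is reached.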


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> You solution only distinguishes the memory related ops and non-memory relat
Thanks for your comment!
I think the ops we want to run already have a high perf score; the only reason 
they can't run is that FindBestOp always picks flush ops when under memory 
pressure. For now, I think the perf score mechanism is able to find the best 
op to run once the configurations and table priorities are tuned.
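The idea can be sketched as follows. This is a simplified illustration, not 
the real FindBestOp: the OpInfo struct, the PickOp function and its parameters 
are hypothetical names, and the real scheduler considers more inputs than 
shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the stats of a maintenance op.
struct OpInfo {
  std::string name;
  int64_t ram_anchored;  // bytes released if the op runs (0 for non-flush ops)
  double perf_score;     // estimated performance improvement
};

// Returns the chosen op, or nullptr if there are no ops.
// Under memory pressure the op releasing the most memory normally wins, but
// with probability not_flush_prob the choice falls through to the op with the
// best perf score.
const OpInfo* PickOp(const std::vector<OpInfo>& ops, bool under_memory_pressure,
                     double not_flush_prob, std::mt19937& rng) {
  assert(not_flush_prob >= 0.0 && not_flush_prob <= 1.0);
  const OpInfo* best_mem = nullptr;
  const OpInfo* best_perf = nullptr;
  for (const auto& op : ops) {
    if (op.ram_anchored > 0 &&
        (best_mem == nullptr || op.ram_anchored > best_mem->ram_anchored)) {
      best_mem = &op;
    }
    if (best_perf == nullptr || op.perf_score > best_perf->perf_score) {
      best_perf = &op;
    }
  }
  if (under_memory_pressure && best_mem != nullptr) {
    std::bernoulli_distribution skip_flush(not_flush_prob);
    if (!skip_flush(rng)) return best_mem;  // usual case: flush to free memory
  }
  return best_perf;  // no pressure, or the flush was skipped this round
}
```

With a probability of 0 the sketch keeps the existing behaviour (a flush is 
always picked under memory pressure), which matches the default of the flag.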


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
> If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per pr
Exactly. I will upload another patch with more information.
Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Comment-Date: Tue, 25 Jul 2023 07:33:19 +0000
Gerrit-HasComments: Yes
