[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
> Did you run any practical workloads to see other maintenance ops in action
Thank you for your review!
Yes, I did. This patch has been running in our clusters for a long time. The maintenance manager does schedule operations other than flush ops while under memory pressure.
I have a test, related to KUDU-3488, that exercises various maintenance manager policies, including this not_flush_memory_prob. It shows that the not-flush mechanism works.

Sometimes the memory usage of a tablet server stays at about 60% (the memory pressure threshold) because the write workload and the flush capacity of the maintenance manager are almost equal, in which case the ops with high perf scores cannot run. Eventually the performance of the tablet server gets lower and lower, leading to higher memory usage, and finally a vicious circle appears.
A user might want to turn this on if they find that the situation described above has occurred and they need to find a balance between performance and memory.
Actually, the probability decreases as the memory usage gets closer to 80% (the memory soft limit), so the memory usage mostly won't get too high.
http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> Your solution only distinguishes the memory related ops and non-memory relat
Thanks for your comment!
I think the ops we want to run already have a high perf score; the only reason they can't run is that FindBestOp always picks flush ops when under memory pressure.
For now, I think the perf score mechanism is able to find the best op to run once the configurations and table priorities have been tuned.

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
> If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per pr
Exactly, I will commit another patch with more information. Thanks!

--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Ashwani Raina
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Tue, 25 Jul 2023 07:33:19 +
Gerrit-HasComments: Yes
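
Editor's sketch of the scheduling choice discussed in the reply above: under memory pressure a flush op is normally preferred, and the not-flush draw occasionally lets the op with the best perf score run instead. This is a simplified illustration, not the code under review. The Op struct, the PickOp function and its parameters are hypothetical names, and the real FindBestOp weighs more factors (for example WAL retention) than are modelled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a maintenance op (not Kudu's MaintenanceOp).
struct Op {
  std::string name;
  double perf_score;     // estimated performance benefit of running the op
  int64_t ram_anchored;  // bytes of memory the op would release; 0 if none
};

// Returns the index of the op to run, or std::nullopt if no op is worth running.
// 'draw' is a uniform random number in [0, 1); it is passed in by the caller so
// that the decision is deterministic and easy to test.
std::optional<std::size_t> PickOp(const std::vector<Op>& ops,
                                  bool under_memory_pressure,
                                  double not_flush_prob,
                                  double draw) {
  // With probability not_flush_prob the flush preference is skipped.
  const bool skip_flush = draw < not_flush_prob;

  if (under_memory_pressure && !skip_flush) {
    // Prefer the op that releases the most memory.
    std::optional<std::size_t> best_mem;
    for (std::size_t i = 0; i < ops.size(); ++i) {
      if (ops[i].ram_anchored > 0 &&
          (!best_mem || ops[i].ram_anchored > ops[*best_mem].ram_anchored)) {
        best_mem = i;
      }
    }
    if (best_mem) return best_mem;
  }

  // Otherwise fall back to the op with the highest positive perf score.
  std::optional<std::size_t> best_perf;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    if (ops[i].perf_score > 0 &&
        (!best_perf || ops[i].perf_score > ops[*best_perf].perf_score)) {
      best_perf = i;
    }
  }
  return best_perf;
}
```

With not_flush_prob set to 0 the sketch always picks a memory-releasing op under pressure, which matches the behavior the reply describes for the default value. A caller would typically obtain 'draw' from a std::uniform_real_distribution<double> over [0, 1).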
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#8).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch adds a flag that gives other ops a chance to run while the server is under memory pressure. The mechanism applies when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher the memory usage, the higher the probability of flushing MRS/DMS.

For example, with
  memory_pressure_percentage = 60%
  memory_limit_soft_percentage = 80%
the probability of not flushing MRS/DMS equals the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory usage increases, the probability gradually decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/8
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Ashwani Raina
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
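
Editor's sketch of the arithmetic in the commit message above. The message only says the probability "gradually decreases" from not_flush_memory_prob at the pressure threshold to 0 at the soft limit; the sketch assumes a linear ramp, which may not be what the patch implements. The function name NotFlushProbability and its parameter names are hypothetical.

```cpp
#include <algorithm>

// Probability of NOT choosing a flush op at a given memory usage, assuming a
// linear decay (an assumption; the commit message does not state the shape).
// All three percentages use the same scale, e.g. 60.0 for 60%.
// The function is meant to be called only while under memory pressure; usage
// values outside [pressure_pct, soft_limit_pct] are clamped to that window.
double NotFlushProbability(double usage_pct,
                           double pressure_pct,
                           double soft_limit_pct,
                           double base_prob) {
  if (soft_limit_pct <= pressure_pct) {
    return 0.0;  // degenerate window: never skip flushing
  }
  const double usage = std::clamp(usage_pct, pressure_pct, soft_limit_pct);
  // Fraction of the window still left before the soft limit: 1 at the
  // pressure threshold, 0 at the soft limit.
  const double remaining =
      (soft_limit_pct - usage) / (soft_limit_pct - pressure_pct);
  return base_prob * remaining;
}
```

Under this linear assumption, with the values from the commit message (60% and 80%, base 0.2), the probability of skipping a flush is 0.2 at 60% usage, 0.1 at 70%, and 0 at 80% and above.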
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
Did you run any practical workloads to see other maintenance ops in action when memory usage is high? I want to understand the impact of not choosing flush ops when memory usage is high for a long time. Also, what factors should be kept in mind when deciding to increase or decrease the probability of running flush ops?

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
If this is going to be 0 (i.e. DRS/MRS flush ops will be scheduled as per priority), you might want to think about providing some hint to the user that it is time to avoid flushing DRS and MRS when under memory pressure.

--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Ashwani Raina
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Fri, 14 Jul 2023 13:51:53 +
Gerrit-HasComments: Yes
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
Your solution only distinguishes between memory-related ops and non-memory-related ops. How about assigning every op a priority, so that you can adjust the priority of each type of op to decide which op should be executed first?

--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Thu, 13 Jul 2023 08:50:42 +
Gerrit-HasComments: Yes
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#7).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch adds a flag that gives other ops a chance to run while the server is under memory pressure. The mechanism applies when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher the memory usage, the higher the probability of flushing MRS/DMS.

For example, with
  memory_pressure_percentage = 60%
  memory_limit_soft_percentage = 80%
the probability of not flushing MRS/DMS equals the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory usage increases, the probability gradually decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 101 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/7
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#6).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch adds a flag that gives other ops a chance to run while the server is under memory pressure. The mechanism applies when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher the memory usage, the higher the probability of flushing MRS/DMS.

For example, with
  memory_pressure_percentage = 60%
  memory_limit_soft_percentage = 80%
the probability of not flushing MRS/DMS equals the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory usage increases, the probability gradually decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/6
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 6
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

Patch Set 5:

(17 comments)

> Patch Set 5: Verified-1
>
> Build Failed
>
> http://jenkins.kudu.apache.org/job/kudu-gerrit/28183/ : FAILURE

The unit test TestPrioritizeLogRetentionUnderMemoryPressure passes on my PC.

http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG
Commit Message:

PS3:
> Please follow this generic git guideline to properly format the commit mess
I changed the title, but I'm still not sure it's OK.

http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG@7
PS3, Line 7: -3407: Give a chance to do other maintenance operations while serv
> Do you have any measurements/results that show how much improvement this pr
This patch has been running in our clusters (more than 100 clusters) for over half a year. It does make a difference: performance does not drop too low even if a cluster stays under high memory pressure for a very long time. According to the web page and logs, the tablet server schedules other operations such as CompactRowsetOp and MajorCompactOp, which have a big impact on the performance of the tablet server.

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc@919
PS3, Line 919: FLAGS_not_flush_memory_prob = 1;
> This test scenario covers just one corner case. It would be great to add m
About the test: there is another test covering the whole maintenance manager, which checks which operations are run under various policies, as you mentioned in JIRA. It is based on this patch, so I am posting this patch first.
http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ressure
> memory pressure
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ends o
> the server
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: r
> run
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@380
PS3, Line 380: he flag no
> Please describe the parameter and the return value.
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: 0,
> What sort of testing has been done to make sure setting the default value t
That's the default value in our clusters and it works well, but still I changed it to 0.

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> Consider adding a validator for this flag: IIUC, the only allowed values ar
Added validator for not_flush_memory_prob and also data_gc_prioritization_prob.
http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@105
PS3, Line 105: "
> memory pressure
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@106
PS3, Line 106: server
> memory pressure
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@107
PS3, Line 107: d there are oth
> to run
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519:
> even if under memory pressure
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519:
> What are 'performance ops'?
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522: // anchors the most WALs (the op should also free memory).
             : //
> Would it would be easier to comprehend if calling memory_pressure_func_() f
That would be much better.

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@715
PS3, Line 715: void Main
> nit: the indent here should be 4 spaces per Kudu's C++ code style
Done

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@716
PS3, Line 716: running
> ditto
Done

--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot
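
Editor's sketch of the kind of flag validator mentioned in the message above for not_flush_memory_prob. The quoted review comment is cut off before it states the allowed values, so the range [0, 1] used here is an assumption based on the flag being a probability. The function name ValidateProbability is hypothetical, and the validator actually added in the patch may differ.

```cpp
#include <cmath>
#include <cstdio>

// Returns true if 'value' is usable as a probability.
// ASSUMPTION: the valid range is [0, 1]; the thread does not spell it out.
// The (flag name, value) signature mirrors the shape gflags expects for a
// validator of a double flag, so a function like this could be registered
// for the flag; the registration itself is not shown here.
bool ValidateProbability(const char* flagname, double value) {
  if (std::isfinite(value) && value >= 0.0 && value <= 1.0) {
    return true;
  }
  std::fprintf(stderr, "%s must be in the range [0, 1], got %g\n",
               flagname, value);
  return false;
}
```

Rejecting out-of-range values at startup keeps a mistyped setting such as 20 (meant as 20%) from silently disabling flush preference altogether.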
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#5).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch adds a flag that gives other ops a chance to run while the server is under memory pressure. The mechanism applies when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher the memory usage, the higher the probability of flushing MRS/DMS.

For example, with
  memory_pressure_percentage = 60%
  memory_limit_soft_percentage = 80%
the probability of not flushing MRS/DMS equals the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory usage increases, the probability gradually decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/5
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#4).

Change subject: KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

This patch adds a flag that gives other ops a chance to run while the server is under memory pressure. The mechanism applies when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher the memory usage, the higher the probability of flushing MRS/DMS.

For example, with
  memory_pressure_percentage = 60%
  memory_limit_soft_percentage = 80%
the probability of not flushing MRS/DMS equals the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory usage increases, the probability gradually decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 85 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/4
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 4
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng