[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@18
PS14, Line 18: usage
nit: usage is

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: this
nit: the

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: higher memory
nit: higher the memory

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: flush
             : MRS/DMS
nit: MRS/DMS flushes

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@7
PS17, Line 7: not always flush even if
nit: Avoid unchecked scheduling of flush operations

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: this higher probability to do flush
nit: higher the probability to flush

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: The higher
nit: Higher the

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27:
nit: remove whitespace

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27: not_flush_memory_prob
nit: you mean "run_non_memory_ops_prob"?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@110
PS17, Line 110: maintainer
nit: maybe you could use "system admin" or "user"? There has to be some mechanism by which the user/admin is made aware that this flag can be used to regulate the scheduling of flush operations when the node is under memory pressure.
Maybe you could point to some metrics, or to a recommended action via a warning log, visible to the user/admin, that denote these conditions and suggest turning this on or increasing the probability.

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@111
PS17, Line 111: the performance is decreasing.
nit: "there is a significant degradation in performance."

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: FlushOrNot
FlushOrNot is confusing when the return type is bool: the name does not make it evident whether this method returns true or false when a flush is a go. Maybe rename it to "ProceedWithFlush"?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: bool MaintenanceManager::FlushOrNot(double* used_memory_percentage) {
nit: Add a comment above that describes the purpose of this method, its out parameter, and its return value.

--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Ashwani Raina
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 07:30:14 +
Gerrit-HasComments: Yes
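As a sketch of the naming and documentation suggestions above, here is how a boolean decision helper could read when its name states what `true` means and its comment covers the purpose, the out parameter, and the return value. Everything in it (the free function, the `MemoryPressureFunc` alias, the `skip_probability` and `random_draw` parameters) is hypothetical and is not the patch's actual `MaintenanceManager::FlushOrNot`; passing the random draw in as a parameter is an assumption made only to keep the example deterministic.

```cpp
#include <cassert>
#include <functional>

// Hypothetical alias: returns true if the process is under memory pressure
// and stores the current memory usage (in percent) into the out parameter.
using MemoryPressureFunc = std::function<bool(double* used_memory_percentage)>;

// Decides whether the scheduler should proceed with a memory-freeing flush
// (MRS/DMS) on this iteration.
//
// 'memory_pressure_func' reports memory pressure and the current usage.
// 'skip_probability' is the chance, in [0, 1], of letting a non-memory
//   operation run instead of a flush.
// 'random_draw' is a uniform sample in [0, 1); taking it as a parameter keeps
//   the function deterministic in tests.
// 'used_memory_percentage' (out) receives the usage reported by the check;
//   it must not be null.
//
// Returns true if a flush should be scheduled now; false if the server is not
// under memory pressure or if a non-memory operation gets a turn instead.
bool ProceedWithFlush(const MemoryPressureFunc& memory_pressure_func,
                      double skip_probability, double random_draw,
                      double* used_memory_percentage) {
  assert(used_memory_percentage != nullptr);
  if (!memory_pressure_func(used_memory_percentage)) {
    return false;  // Not under memory pressure: no reason to force a flush.
  }
  // A draw below 'skip_probability' hands this turn to a non-memory op.
  return random_draw >= skip_probability;
}
```

With a name like this, a call site such as `if (ProceedWithFlush(...))` reads unambiguously as "a flush goes ahead".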
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@151
PS17, Line 151: DLOG(INFO) << "Re-registering op " << this->name();
Consider removing this once you are done debugging/troubleshooting the new test: it isn't used by the test, but it pollutes the output.

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@314
PS17, Line 314: int64_t
nit: why not use the MonoDelta type here?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@315
PS17, Line 315: Perform count;
nit: How many times the operation has been run.

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@989
PS17, Line 989: not assert anything
Is it possible to assert on the range of the expected counters, etc.? Given that the amount of memory is pre-determined and doesn't depend on the actual amount of memory a node has, I'd expect the scenario to preserve some particular traits even if its behavior is driven by stochastic factors. Running the scenario multiple times should converge to a pretty narrow range of the result counters, no?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@379
PS17, Line 379: server pressure
the memory pressure

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@386
PS17, Line 386: bool FlushOrNot(double* used_memory_percentage);
nit: you could use a free-form style when documenting methods and fields in a non-public API; the doxygen format isn't a requirement here

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522: // Look at ops that we can run quickly that free up log retention.
             : if (low_io_most_logs_retai >
That would be much better. Alternatively, updating the `memory_pressure_func_` with the new function could be a way to go. Basically, having fewer parts in the condition would be better.

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@105
PS17, Line 105: we
nit: Who's "we" here? It's better to be specific about who does what.

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@108
PS17, Line 108: ran
run

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@136
PS17, Line 136: Simulated memory usage
What is the unit of the usage?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@539
PS17, Line 539: in
under

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@542
PS17, Line 542: FlushOrNot
Alternatively, consider calling the memory_pressure_func_() functor as it is, but pointing it to a different method/function as necessary?

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@732
PS17, Line 732: if (PREDICT_FALSE(FLAGS_memory_simulate_for_test != 0.0)) {
              :   *used_memory_percentage = FLAGS_memory_simulate_for_test;
              : }
Instead of introducing this extra test flag and the extra logic, consider setting memory_pressure_func_ via the MaintenanceManager::set_memory_pressure_func_for_tests() method for specific tests, similar to what's done in MaintenanceManagerTest::StartManager()? The extra level of indirection introduced by MaintenanceManager::memory_pressure_func_ exists for exactly this purpose, so introducing an extra flag looks like overkill when that provision is already there.

--
Gerrit-MessageType: comment
Gerrit-PatchSet: 17
Gerrit-Comment-Date: Wed, 06 Sep 2023 23:46:58 +
Gerrit-HasComments: Yes
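Alexey's two test-related suggestions (inject the memory reading through a functor rather than a test-only flag, and assert on a range of counters rather than nothing) can be combined in a small sketch. `ToyScheduler`, its members, and `CountSkippedFlushes` are hypothetical stand-ins, not Kudu's MaintenanceManager API; the hook only mirrors the name of `set_memory_pressure_func_for_tests()`, and the linear decay of the skip probability between the two thresholds is an assumption. With the injected reading fixed at 70% (thresholds 60% and 80%, base probability 0.2), the skip probability is 0.1, so the number of skipped flushes in 10,000 iterations is binomial with mean 1000 and standard deviation 30; a band of plus or minus 150 is loose yet still catches a broken formula.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <random>
#include <utility>

// Hypothetical, simplified scheduler used only to illustrate the testing
// technique; it is NOT Kudu's MaintenanceManager.
class ToyScheduler {
 public:
  // Returns true if under memory pressure and stores usage (percent) in *pct.
  using MemoryPressureFunc = std::function<bool(double*)>;

  ToyScheduler(double pressure_pct, double soft_limit_pct, double base_prob,
               unsigned int seed)
      : pressure_pct_(pressure_pct),
        soft_limit_pct_(soft_limit_pct),
        base_prob_(base_prob),
        rng_(seed) {
    assert(pressure_pct_ < soft_limit_pct_);
    assert(base_prob_ >= 0.0 && base_prob_ <= 1.0);
  }

  // Test hook: swap in a fixed memory reading instead of adding a
  // test-only flag.
  void set_memory_pressure_func_for_tests(MemoryPressureFunc func) {
    memory_pressure_func_ = std::move(func);
  }

  // Returns true if this iteration should schedule a memory-freeing flush.
  bool ProceedWithFlush() {
    double used_pct = 0.0;
    if (!memory_pressure_func_(&used_pct)) {
      return false;  // No memory pressure: nothing forces a flush.
    }
    // Assumption: the skip probability decays linearly to 0 at the soft limit.
    const double clamped = std::clamp(used_pct, pressure_pct_, soft_limit_pct_);
    const double skip_prob = base_prob_ * (soft_limit_pct_ - clamped) /
                             (soft_limit_pct_ - pressure_pct_);
    std::bernoulli_distribution skip_flush(skip_prob);
    return !skip_flush(rng_);
  }

 private:
  double pressure_pct_;
  double soft_limit_pct_;
  double base_prob_;
  std::mt19937 rng_;  // Seeded, so a given run is reproducible.
  // Default stand-in for the real check: never under memory pressure.
  MemoryPressureFunc memory_pressure_func_ = [](double* pct) {
    *pct = 0.0;
    return false;
  };
};

// Runs 'iterations' decisions and returns how many did not schedule a flush.
int CountSkippedFlushes(ToyScheduler* scheduler, int iterations) {
  int skipped = 0;
  for (int i = 0; i < iterations; ++i) {
    if (!scheduler->ProceedWithFlush()) {
      ++skipped;
    }
  }
  return skipped;
}
```

A test would inject the reading, run the loop, and check that `CountSkippedFlushes()` falls inside the expected band (and is exactly 0 at the soft limit) instead of asserting nothing.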
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

Patch Set 17: Code-Review+1

--
Gerrit-MessageType: comment
Gerrit-PatchSet: 17
Gerrit-Comment-Date: Thu, 31 Aug 2023 07:52:22 +
Gerrit-HasComments: No
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#17).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/17

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 17
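The scaling described in the commit message can be written out as a formula. The message only says the probability "gradually decreases", so the linear interpolation below is an assumption for illustration, not necessarily what the patch implements. Here $u$ is the current memory usage in percent, $P$ is memory_pressure_percentage, $S$ is memory_limit_soft_percentage, and $p_0$ is the configured probability (not_flush_memory_prob, renamed run_non_memory_ops_prob during review); per the commit message, the mechanism only applies for $P \le u \le S$.

```latex
p_{\text{skip}}(u) = p_0 \cdot \frac{S - u}{S - P}, \qquad P \le u \le S

% Worked example with P = 60, S = 80 and p_0 = 0.2 (the value cited in PS16):
p_{\text{skip}}(60) = 0.2 \cdot \tfrac{20}{20} = 0.2, \quad
p_{\text{skip}}(70) = 0.2 \cdot \tfrac{10}{20} = 0.1, \quad
p_{\text{skip}}(80) = 0.2 \cdot \tfrac{0}{20} = 0
```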
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#16).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage is between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/16

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 16
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#15).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/15

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 15
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

Patch Set 14:

(7 comments)

I added a test that shows which operation the maintenance manager chooses under various policies and workloads; it is also relevant to KUDU-3488. With this test, we can exercise all the operations and flags, including the flag newly added in this patch.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336:
> The default value is 0, why set it again?
Done

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382: /// @param [out] used_memory_percentage
> nit: used_memory_percentage
Done

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: run_non_memory_ops_pr
> A positive flag is easier to understand. How about: run_non_memo
Done

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: mory operations "
> non-memory operations waiting to be ran.
Done

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: ead of flushing ops. This might be needed to turn "
             : "on if maintainer found that the tablet server is under memory "
             : "pressure for a long time and
> I think your purpose is to strike a balance between running memory and non-me
It is computed from the memory usage. The maintainer sets a base value, and the exact probability is calculated from that value and the current memory usage. I think memory usage is the only convincing factor; other factors, like the perf scores of operations, are not really reliable.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: pressure for a
> Reading or writing performance?
Both, I think. The rowset height keeps growing and the redo files keep getting larger, so both write and read performance might degrade.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(run_non_memory_ops_prob, experimental
> Should it also be tagged as a 'runtime' flag?
Done

--
Gerrit-MessageType: comment
Gerrit-PatchSet: 14
Gerrit-Comment-Date: Thu, 24 Aug 2023 09:05:17 +
Gerrit-HasComments: Yes
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#14).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than WAL gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/14

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 14
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#13).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/13

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 13
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#12).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 212 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/12

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 12
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#11).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of the tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/11

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 11
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#10).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usages of tservers might be 60% ~ 80% for a long time. During this time the maintenance manager will not run any operation other than wal gc and MRS/DRS flushes, which will make the performance of tservers worse and worse and eventually break due to OOM.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/10

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 10
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG@8
PS9, Line 8:
           : This patch add an argument to give a chance to do other ops while
           : server is under memory pressure.
           :
The background of this patch should also be introduced.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336: FLAGS_not_flush_memory_prob = 0.0;
The default value is 0, why set it again?

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382: /// @param [out] capacity_pct
nit: used_memory_percentage

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: not_flush_memory_prob
A positive flag is easier to understand. How about: run_non_memory_ops_prob?

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: operations to run
non-memory operations waiting to be ran.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: This might be needed to turn on if maintainer found "
             : "that the tablet server is under memory pressure for a long time and "
             : "the performance is decreasing
I think your purpose is to strike a balance between running memory and non-memory operations. But it is hard for the maintainer to decide the value of not_flush_memory_prob. How about computing not_flush_memory_prob automatically, and using a switch to turn this feature on?

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: the performance
Reading or writing performance?

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(not_flush_memory_prob, experimental);
Should it also be tagged as a 'runtime' flag?

--
Gerrit-MessageType: comment
Gerrit-PatchSet: 9
Gerrit-Comment-Date: Fri, 04 Aug 2023 08:29:20 +
Gerrit-HasComments: Yes
[kudu-CR] KUDU-3407 not always flush even if under memory pressure.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20166 to look at the new patch set (#9).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

This patch add an argument to give a chance to do other ops while server is under memory pressure. This mechanism works when the memory usage between memory_pressure_percentage and memory_limit_soft_percentage. The higher memory usage is, this higher probability to do flush MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS is the value of not_flush_memory_prob, which is 0.2 by default, when the memory usage is 60%. As the memory increases, it gradually decreases to 0, when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/9

--
Gerrit-MessageType: newpatchset
Gerrit-PatchSet: 9