[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-09-12 Thread Ashwani Raina (Code Review)
Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@18
PS14, Line 18: usage
nit: usage is


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: this
nit: the


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: higher memory
nit: higher the memory


http://gerrit.cloudera.org:8080/#/c/20166/14//COMMIT_MSG@20
PS14, Line 20: flush
 : MRS/DMS
nit: MRS/DMS flushes


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@7
PS17, Line 7: not always flush even if
nit: Avoid unchecked scheduling of flush operations


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: this higher probability to do flush
nit: higher the probability to flush


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@20
PS17, Line 20: The higher
nit: Higher the


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27:
nit: remove whitespace


http://gerrit.cloudera.org:8080/#/c/20166/17//COMMIT_MSG@27
PS17, Line 27: not_flush_memory_prob
nit: you mean "run_non_memory_ops_prob" ?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@110
PS17, Line 110: maintainer
nit: maybe you could use "system admin" or "user"?

There has to be some mechanism by which the user/admin is made aware that this 
flag can be used to regulate the scheduling of flush operations when the node is 
under memory pressure.

Maybe you could point to some metrics, or a recommended action via a warning 
log, visible to the user/admin that indicate these conditions and 
suggest turning this on or increasing the probability.
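One possible shape for such a warning, offered only as a hedged sketch: a hypothetical `SustainedPressureTracker` (not part of Kudu or of this patch) that reports once per pressure episode when memory pressure has lasted longer than a chosen threshold. That is the point at which a warning log or metric could suggest tuning the flag. The caller passes in the current time so that the logic stays deterministic in tests.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not Kudu code: detects memory pressure that has
// persisted longer than `threshold`.
class SustainedPressureTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SustainedPressureTracker(Clock::duration threshold)
      : threshold_(threshold) {
    assert(threshold > Clock::duration::zero());
  }

  // Records one observation. Returns true exactly once per pressure episode,
  // when the episode first reaches the threshold (a point to emit a warning).
  bool Observe(bool under_pressure, Clock::time_point now) {
    if (!under_pressure) {
      in_episode_ = false;  // Pressure is gone: reset the episode.
      warned_ = false;
      return false;
    }
    if (!in_episode_) {
      in_episode_ = true;
      episode_start_ = now;
    }
    if (!warned_ && now - episode_start_ >= threshold_) {
      warned_ = true;
      return true;
    }
    return false;
  }

 private:
  Clock::duration threshold_;
  Clock::time_point episode_start_{};
  bool in_episode_ = false;
  bool warned_ = false;
};
```

A caller would invoke `Observe()` once per scheduling round and log a warning (or bump a metric) when it returns true. The threshold is an arbitrary tuning knob in this sketch.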


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@111
PS17, Line 111: the performance is decreasing.
nit: "there is a significant degradation in performance."


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: FlushOrNot
FlushOrNot is a confusing name for a method that returns bool. It is not 
evident from the name whether this method returns true or false when a flush 
should proceed.

Maybe rename to "ProceedWithFlush" ?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@730
PS17, Line 730: bool MaintenanceManager::FlushOrNot(double* used_memory_percentage) {
nit: Add a comment above it that describes the purpose of this method, its out 
parameter, and its return value.
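A sketch of what the suggested rename plus a free-form doc comment could look like. `MemoryFlushPolicy`, its constructor, and the injected callbacks are hypothetical stand-ins for illustration, not the patch's code; the point is the name `ProceedWithFlush` and a comment that spells out the out parameter and the meaning of `true`/`false`.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <utility>

// Hypothetical class for illustration only; the constructor and injected
// callbacks are assumptions, not the patch's actual code.
class MemoryFlushPolicy {
 public:
  // memory_usage_pct: returns the current memory usage, in percent.
  // non_flush_prob: maps a memory usage to a probability in [0, 1] of
  //   skipping the flush.
  MemoryFlushPolicy(std::function<double()> memory_usage_pct,
                    double pressure_pct,
                    std::function<double(double)> non_flush_prob,
                    unsigned seed)
      : memory_usage_pct_(std::move(memory_usage_pct)),
        pressure_pct_(pressure_pct),
        non_flush_prob_(std::move(non_flush_prob)),
        rng_(seed) {
    assert(memory_usage_pct_ && non_flush_prob_);
  }

  // Decides whether to proceed with a memory-freeing (MRS/DMS) flush.
  //
  // used_memory_percentage (out): receives the current memory usage, in
  //   percent, whatever the decision.
  // Returns true if the server is under memory pressure and a flush should be
  // scheduled now; returns false if there is no memory pressure, or if this
  // round is handed to a non-memory op instead.
  bool ProceedWithFlush(double* used_memory_percentage) {
    const double used = memory_usage_pct_();
    *used_memory_percentage = used;
    if (used < pressure_pct_) {
      return false;  // Not under memory pressure.
    }
    std::bernoulli_distribution skip_flush(non_flush_prob_(used));
    return !skip_flush(rng_);
  }

 private:
  std::function<double()> memory_usage_pct_;
  double pressure_pct_;
  std::function<double(double)> non_flush_prob_;
  std::mt19937 rng_;
};
```

With this naming, a `true` result reads naturally at the call site ("proceed with flush"), which addresses the ambiguity of `FlushOrNot`.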



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Tue, 12 Sep 2023 07:30:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-09-06 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@151
PS17, Line 151: DLOG(INFO) << "Re-registering op " << this->name();
Consider removing this once done debugging/troubleshooting the new test -- it 
isn't used by the test, but pollutes the output.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@314
PS17, Line 314: int64_t
nit: why not use the MonoDelta type here?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@315
PS17, Line 315: Perform count;
nit: How many times the operation has been run.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager-test.cc@989
PS17, Line 989: not assert anything
Is it possible to assert on the range of the expected counters, etc.?  Given that 
the amount of memory is pre-determined and doesn't depend on the actual amount 
of memory that a node has, I'd expect that the scenario should preserve some 
particular traits even if its behavior is driven by some stochastic factors.  
Running the scenario multiple times should converge to a pretty narrow range of 
the result counters, no?
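A minimal sketch of the kind of range assertion being asked for, assuming each scheduling round skips the flush independently with a probability p that is fixed by the scenario's simulated memory usage. `CountSkippedFlushes` and `WithinBinomialRange` are hypothetical helpers. The fixed seed makes a given build reproducible, and checking a band of k standard deviations around the binomial mean (rather than an exact count) keeps the check stable across standard libraries, whose distribution implementations may differ.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: simulates `trials` scheduling rounds, each of which
// skips the flush with probability p, and returns how many rounds skipped it.
int CountSkippedFlushes(double p, int trials, unsigned seed) {
  assert(p >= 0.0 && p <= 1.0 && trials >= 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution skip_flush(p);
  int skipped = 0;
  for (int i = 0; i < trials; ++i) {
    if (skip_flush(rng)) {
      ++skipped;
    }
  }
  return skipped;
}

// Returns true if `observed` lies within k standard deviations of the
// binomial mean trials * p.
bool WithinBinomialRange(int observed, double p, int trials, double k) {
  const double mean = trials * p;
  const double stddev = std::sqrt(trials * p * (1.0 - p));
  return std::fabs(observed - mean) <= k * stddev;
}
```

In a gtest-based test the same bounds would become an `ASSERT_GE`/`ASSERT_LE` pair on the counter; a larger k trades sensitivity for fewer spurious failures.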


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@379
PS17, Line 379: server pressure
the memory pressure


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.h@386
PS17, Line 386:   bool FlushOrNot(double* used_memory_percentage);
nit: you could use a free format in a non-public API when documenting methods 
and fields; the doxygen format isn't a requirement here


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522:   // Look at ops that we can run quickly that free up log retention.
 :   if (low_io_most_logs_retai
> That would be much better.
Alternatively, updating `memory_pressure_func_` with the new function could 
be a way to go.  Basically, having fewer parts in the condition would be better.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@105
PS17, Line 105: we
nit: Who's "we" here?  It's better to be more specific about who does what here.


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@108
PS17, Line 108: ran
run


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@136
PS17, Line 136: Simulated memory usage
What is the unit of the usage?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@539
PS17, Line 539: in
under


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@542
PS17, Line 542: FlushOrNot
Alternatively, how about calling the memory_pressure_func_() functor as it is, but 
assigning it to a different method/function, as necessary?


http://gerrit.cloudera.org:8080/#/c/20166/17/src/kudu/util/maintenance_manager.cc@732
PS17, Line 732: if (PREDICT_FALSE(FLAGS_memory_simulate_for_test != 0.0)) {
  :   *used_memory_percentage = FLAGS_memory_simulate_for_test;
  : }
Instead of introducing this extra test flag and adding this extra logic, 
consider setting the memory_pressure_func_ via the 
MaintenanceManager::set_memory_pressure_func_for_tests() method for specific 
tests, similar to what's done in MaintenanceManagerTest::StartManager()?

The extra level of indirection introduced by 
MaintenanceManager::memory_pressure_func_ is there just for this particular 
purpose, so introducing an extra flag looks like overkill when that 
provision already exists.
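A self-contained sketch of the pattern described above: the scheduler only calls a memory-pressure functor, and tests swap that functor instead of adding a test-only flag. `MiniMaintenanceManager` is a hypothetical stand-in that mirrors the names mentioned in the comment (`memory_pressure_func_`, `set_memory_pressure_func_for_tests()`); it is not Kudu's `MaintenanceManager`, and its exact signatures are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in that mirrors the names used in the review comment;
// it is not Kudu's MaintenanceManager and its signatures are assumptions.
class MiniMaintenanceManager {
 public:
  // Returns true if the server is under memory pressure and writes the
  // current memory usage (in percent) to *used_pct.
  using MemoryPressureFunc = std::function<bool(double*)>;

  explicit MiniMaintenanceManager(MemoryPressureFunc func)
      : memory_pressure_func_(std::move(func)) {
    assert(memory_pressure_func_);
  }

  // Test hook: replaces the functor, so no test-only flag is needed to
  // simulate a given memory usage.
  void set_memory_pressure_func_for_tests(MemoryPressureFunc func) {
    assert(func);
    memory_pressure_func_ = std::move(func);
  }

  // The scheduler consults only the functor, so the scheduling condition
  // stays a single check regardless of how pressure is decided.
  std::string PickOpKind() {
    double used_pct = 0.0;
    return memory_pressure_func_(&used_pct) ? "flush" : "other";
  }

 private:
  MemoryPressureFunc memory_pressure_func_;
};
```

Production code would construct it with the real memory check (or with a new probabilistic decision bound to the same functor type); a test installs a lambda that reports a fixed usage such as 75%.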



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Wed, 06 Sep 2023 23:46:58 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-31 Thread Wang Xixu (Code Review)
Wang Xixu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..


Patch Set 17: Code-Review+1


--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Thu, 31 Aug 2023 07:52:22 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-29 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#17).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob when the memory usage is 60%. As the memory
usage increases, this probability gradually decreases, reaching 0
when the memory usage is 80%.
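A minimal C++ sketch of the probability described above. It assumes the decrease between the two thresholds is linear (the message only says "gradually"), and `NonFlushProbability`/`ShouldFlush` are hypothetical names; `max_prob` stands in for the value of `not_flush_memory_prob`. With a flag value of 0.2 and the thresholds from the example, the sketch gives 0.2 at 60% usage, 0.1 at 70%, and 0 at 80% or above.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical helper: probability of skipping an MRS/DMS flush so that a
// non-memory op can run instead. Assumes a linear decrease from max_prob at
// pressure_pct to 0 at soft_limit_pct. Callers are expected to consult it only
// when used_pct >= pressure_pct (i.e. under memory pressure).
double NonFlushProbability(double used_pct, double pressure_pct,
                           double soft_limit_pct, double max_prob) {
  assert(max_prob >= 0.0 && max_prob <= 1.0);
  if (soft_limit_pct <= pressure_pct) {
    return 0.0;  // Degenerate window: always flush.
  }
  // 0 at the pressure threshold, 1 at (or beyond) the soft limit.
  double t = (used_pct - pressure_pct) / (soft_limit_pct - pressure_pct);
  t = std::min(std::max(t, 0.0), 1.0);
  return max_prob * (1.0 - t);
}

// Hypothetical decision step: returns true if a flush should be scheduled,
// false if this round is given to a non-memory op.
bool ShouldFlush(double used_pct, double pressure_pct, double soft_limit_pct,
                 double max_prob, std::mt19937& rng) {
  std::bernoulli_distribution skip_flush(
      NonFlushProbability(used_pct, pressure_pct, soft_limit_pct, max_prob));
  return !skip_flush(rng);
}
```

Clamping keeps the result a valid probability even if usage overshoots the soft limit, where the sketch always flushes.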

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/17
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 17
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-29 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#16).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/16
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 16
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-29 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#15).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 210 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/15
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 15
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-24 Thread Song Jiacheng (Code Review)
Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..


Patch Set 14:

(7 comments)

I added a test showing which operation the maintenance manager chooses under 
various policies and workloads; this test is related to KUDU-3488.
With this test, all the operations and flags can be exercised, including the 
flag newly added in this patch.

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336:
> The default value is 0, why set it again?
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382:   /// @param [out] used_memory_percentage
> nit: used_memory_percentage
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: run_non_memory_ops_pr
> It is easy to be understand to use a positive flag. How about: run_non_memo
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: mory operations "
> non-memory operations waiting to be ran.
Done


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: ead of flushing ops. This might be needed to turn "
 :   "on if maintainer found that the tablet server is under memory "
 :   "pressure for a long time and
> I think your purpose is to make a balance between running memory and non-me
It is computed from the memory usage. The maintainer may set a certain value, 
and the exact probability is calculated from that value and the memory usage. 
I think the only convincing factor is memory usage; other factors, like the 
perf scores of operations, are not really reliable.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: pressure for a
> Reading or wring performance?
Both, I think. Row set height is getting higher, redo files are getting larger, 
so both write performance and read performance might be worse.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(run_non_memory_ops_prob, experimental
> It should also add 'runtime' flag?
Done



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 14
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Thu, 24 Aug 2023 09:05:17 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-24 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#14).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/14
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 14
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-24 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#13).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 213 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/13
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 13
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-24 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#12).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 212 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/12
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 12
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-23 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#11).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/11
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 11
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-23 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#10).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

In some clusters, the memory usage of tservers may stay between 60%
and 80% for a long time. During this time the maintenance manager
does not run any operations other than WAL GC and MRS/DMS flushes,
which makes the performance of the tservers worse and worse until
they eventually fail due to OOM.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 207 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/10
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 10
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-08-04 Thread Wang Xixu (Code Review)
Wang Xixu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407 not always flush even if under memory pressure.
..


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20166/9//COMMIT_MSG@8
PS9, Line 8:
   : This patch add an argument to give a chance to do other ops while
   : server is under memory pressure.
   :
The background of this patch should also be described.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager-test.cc@336
PS9, Line 336:   FLAGS_not_flush_memory_prob = 0.0;
The default value is 0, why set it again?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.h@382
PS9, Line 382:   /// @param [out] capacity_pct
nit: used_memory_percentage


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@104
PS9, Line 104: not_flush_memory_prob
A positive flag is easier to understand. How about: 
run_non_memory_ops_prob?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@107
PS9, Line 107: operations to run
non-memory operations waiting to be ran.


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@109
PS9, Line 109: This might be needed to turn on if maintainer found "
 :   "that the tablet server is under memory pressure for a long time and "
 :   "the performance is decreasing
I think your purpose is to strike a balance between running memory and non-memory 
operations. But it is hard for the maintainer to decide the value of 
not_flush_memory_prob. How about computing not_flush_memory_prob automatically, 
and using a switch to turn on this feature?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@111
PS9, Line 111: the performance
Reading or writing performance?


http://gerrit.cloudera.org:8080/#/c/20166/9/src/kudu/util/maintenance_manager.cc@112
PS9, Line 112: TAG_FLAG(not_flush_memory_prob, experimental);
Should it also be tagged with the 'runtime' flag?



--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 9
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Fri, 04 Aug 2023 08:29:20 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407 not always flush even if under memory pressure.

2023-07-25 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#9).

Change subject: KUDU-3407 not always flush even if under memory pressure.
..

KUDU-3407 not always flush even if under memory pressure.

This patch adds a flag that gives other ops a chance to run while
the server is under memory pressure.

This mechanism works when the memory usage is between
memory_pressure_percentage and memory_limit_soft_percentage.
The higher the memory usage, the higher the probability of flushing
MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
The probability of not flushing MRS/DMS equals the value of
not_flush_memory_prob (0.2 by default) when the memory usage is
60%. As the memory usage increases, this probability gradually
decreases, reaching 0 when the memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/9
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 9
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>