[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-25 Thread Song Jiacheng (Code Review)
Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
> Did you run any practical workloads to see other maintenance ops in action
Thank you for your review!
Yes, I did. This patch has been running in our clusters for a long time, and
the maintenance manager does schedule operations other than flush ops while
under memory pressure.
I have another test, part of KUDU-3488, that exercises various maintenance
manager policies, including not_flush_memory_prob, and it shows that the
not-flush mechanism works.
Sometimes the memory usage of tablet servers stays at about 60% (the memory
pressure threshold) because the write workload and the flush throughput of the
maintenance manager are almost equal. In that case, operations with high perf
scores cannot run. The performance of the tablet server then keeps degrading,
which leads to even higher memory usage, and a vicious circle appears.
A user might want to turn this on after noticing the situation described above,
and would then need to find a balance between performance and memory usage.
Note that the probability decreases as memory usage approaches 80% (the memory
soft limit), so in most cases memory usage won't get too high.


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> You solution only distinguishes the memory related ops and non-memory relat
Thanks for your comment!
I think the ops we want to run already have high perf scores; the only reason
they can't run is that FindBestOp always picks flush ops when under memory
pressure. For now, I think the perf score mechanism is able to find the best op
to run once the configuration and table priorities are tuned.


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
> If this is going to be 0(i.e. DRS/MRS flush ops will be scheduled as per pr
Exactly. I will post another patch with more information.
Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Tue, 25 Jul 2023 07:33:19 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-17 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Wang Xixu,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#8).

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is 
under memory pressure.

This patch adds a flag, not_flush_memory_prob, that gives other ops a chance
to run while the server is under memory pressure.

This mechanism applies when memory usage is between memory_pressure_percentage
and memory_limit_soft_percentage. The higher the memory usage, the higher the
probability of flushing MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
When memory usage is 60%, the probability of not flushing MRS/DMS equals
not_flush_memory_prob (for example, 0.2). As memory usage increases, this
probability gradually decreases, reaching 0 when memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 103 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/8
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 8
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-14 Thread Ashwani Raina (Code Review)
Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager-test.cc@922
PS7, Line 922: TEST_F(MaintenanceManagerTest, TestNotFlushMemory) {
Did you run any practical workloads to see other maintenance ops in action when 
memory usage is high?

I want to understand the impact of not choosing flush ops when memory usage is
high for a long time.

Also, what factors should be kept in mind when deciding to increase/decrease the
probability of running flush ops?


http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: 0
If this is going to be 0 (i.e. DRS/MRS flush ops will be scheduled as per
priority), you might want to think about providing some hint to the user that
it is time to avoid flushing DRS and MRS when under memory pressure.



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Ashwani Raina 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Fri, 14 Jul 2023 13:51:53 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-13 Thread Wang Xixu (Code Review)
Wang Xixu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/7/src/kudu/util/maintenance_manager.cc@104
PS7, Line 104: DEFINE_double(not_flush_memory_prob, 0,
Your solution only distinguishes between memory-related ops and
non-memory-related ops. How about assigning every op a priority, so you can
adjust the priority of each type of op to decide which op should be executed
first?



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
Gerrit-Comment-Date: Thu, 13 Jul 2023 08:50:42 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-12 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#7).

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is 
under memory pressure.

This patch adds a flag, not_flush_memory_prob, that gives other ops a chance
to run while the server is under memory pressure.

This mechanism applies when memory usage is between memory_pressure_percentage
and memory_limit_soft_percentage. The higher the memory usage, the higher the
probability of flushing MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
When memory usage is 60%, the probability of not flushing MRS/DMS equals
not_flush_memory_prob (for example, 0.2). As memory usage increases, this
probability gradually decreases, reaching 0 when memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 101 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/7
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 7
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-12 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#6).

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is 
under memory pressure.

This patch adds a flag, not_flush_memory_prob, that gives other ops a chance
to run while the server is under memory pressure.

This mechanism applies when memory usage is between memory_pressure_percentage
and memory_limit_soft_percentage. The higher the memory usage, the higher the
probability of flushing MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
When memory usage is 60%, the probability of not flushing MRS/DMS equals
not_flush_memory_prob (for example, 0.2). As memory usage increases, this
probability gradually decreases, reaching 0 when memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/6
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 6
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-12 Thread Song Jiacheng (Code Review)
Song Jiacheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20166 )

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..


Patch Set 5:

(17 comments)

> Patch Set 5: Verified-1
>
> Build Failed
>
> http://jenkins.kudu.apache.org/job/kudu-gerrit/28183/ : FAILURE

The unit test TestPrioritizeLogRetentionUnderMemoryPressure passes on my PC.

http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG
Commit Message:

PS3:
> Please follow this generic git guideline to properly format the commit mess
Changed the title, but I'm still not sure it's OK.


http://gerrit.cloudera.org:8080/#/c/20166/3//COMMIT_MSG@7
PS3, Line 7: -3407: Give a chance to do other maintenance operations while serv
> Do you have any measurements/results that show how much improvement this pr
This patch has been running in our clusters (more than 100 of them) for over
half a year. It does make a difference: performance doesn't drop too low even
if a cluster stays under high memory pressure for a very long time. According
to the web pages and logs, tablet servers schedule other operations like
CompactRowsetOp and MajorCompactOp, which have a big impact on tablet server
performance.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc
File src/kudu/util/maintenance_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager-test.cc@919
PS3, Line 919:  FLAGS_not_flush_memory_prob = 1;
> This test scenario covers just one corner case.  It would be great to add m
About the test: there is another test covering the whole maintenance manager,
which checks which operations run under various policies, as you mentioned in
the JIRA. But it's based on this patch, so I'm posting this patch first.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h
File src/kudu/util/maintenance_manager.h:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ressure
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379: ends o
> the server
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@379
PS3, Line 379:  r
> run
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.h@380
PS3, Line 380: he flag no
> Please describe the parameter and the return value.
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: 0,
> What sort of testing has been done to make sure setting the default value t
That's the default value in our clusters and it works well, but I still
changed it to 0.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@104
PS3, Line 104: DEFINE_double(not_flush_memory_prob, 0,
> Consider adding a validator for this flag: IIUC, the only allowed values ar
Added validators for not_flush_memory_prob and data_gc_prioritization_prob.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@105
PS3, Line 105: "
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@106
PS3, Line 106:  server
> memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@107
PS3, Line 107: d there are oth
> to run
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519:
> even if under memory pressure
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@519
PS3, Line 519:
> What are 'performance ops'?
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@522
PS3, Line 522:   // anchors the most WALs (the op should also free memory).
 :   //
> Would it would be easier to comprehend if calling memory_pressure_func_() f
That would be much better.


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@715
PS3, Line 715: void Main
> nit: the indent here should be 4 spaces per Kudu's C++ code style
Done


http://gerrit.cloudera.org:8080/#/c/20166/3/src/kudu/util/maintenance_manager.cc@716
PS3, Line 716:   running
> ditto
Done



--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot 

[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-12 Thread Song Jiacheng (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#5).

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is 
under memory pressure.

This patch adds a flag, not_flush_memory_prob, that gives other ops a chance
to run while the server is under memory pressure.

This mechanism applies when memory usage is between memory_pressure_percentage
and memory_limit_soft_percentage. The higher the memory usage, the higher the
probability of flushing MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
When memory usage is 60%, the probability of not flushing MRS/DMS equals
not_flush_memory_prob (for example, 0.2). As memory usage increases, this
probability gradually decreases, reaching 0 when memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 87 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/5
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 5
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng 
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] KUDU-3407: Give a chance to do other maintenance operations while server is under memory pressure.

2023-07-12 Thread Song Jiacheng (Code Review)
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20166

to look at the new patch set (#4).

Change subject: KUDU-3407: Give a chance to do other maintenance operations 
while server is under memory pressure.
..

KUDU-3407: Give a chance to do other maintenance operations while server is 
under memory pressure.

This patch adds a flag, not_flush_memory_prob, that gives other ops a chance
to run while the server is under memory pressure.

This mechanism applies when memory usage is between memory_pressure_percentage
and memory_limit_soft_percentage. The higher the memory usage, the higher the
probability of flushing MRS/DMS.

e.g.
memory_pressure_percentage = 60%
memory_limit_soft_percentage = 80%
When memory usage is 60%, the probability of not flushing MRS/DMS equals
not_flush_memory_prob (for example, 0.2). As memory usage increases, this
probability gradually decreases, reaching 0 when memory usage is 80%.

Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 85 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/20166/4
--
To view, visit http://gerrit.cloudera.org:8080/20166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idc2fd3a850cf99d54ef2980211b712468440ed80
Gerrit-Change-Number: 20166
Gerrit-PatchSet: 4
Gerrit-Owner: Song Jiacheng 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Song Jiacheng