Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21316 )

Change subject: [metrics] Add metrics for create and delete op time
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21316/1//COMMIT_MSG@11
PS1, Line 11: These monitoring metrics will be very helpful for analyzing
            : issues related to high CPU usage.
> Thank you very much for your response.
Yes: in general that makes sense, of course.  I was just trying to say that 
creating/deleting a tablet replica is mostly disk IO and takes few CPU cycles.

Adding metrics here and there might be a guessing game.  If the goal is to spot 
CPU bottlenecks, I can also recommend using built-in tracing: 
https://kudu.apache.org/docs/troubleshooting.html#kudu_tracing

In addition, running 'htop -p <kudu_tserver_pid>', using strace, etc. could 
pinpoint the particular threads that consume a lot of CPU.


http://gerrit.cloudera.org:8080/#/c/21316/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21316/4//COMMIT_MSG@11
PS4, Line 11: analyzing
            : issues related to high CPU usage
How could stats on the duration of creating/deleting a tablet replica help with 
analyzing high CPU usage scenarios?  FWIW, most of the activity while 
creating/deleting a tablet is attributed to disk IO.

If you are interested in tracking the CPU usage of some activity (not attributed 
to IO wait times), I'd recommend taking a look at the 'reactor_load_percent' 
metric.

I hope this helps.


http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@273
PS4, Line 273:
What is the significance of this 'on the current node' part?  All the tablet 
metrics are attributed to the node where the tablet replica is hosted, no?  If 
so, maybe drop this part?


http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@274
PS4, Line 274:
Why kInfo, not kDebug?  Looking at metrics like 'tablets_opening_time_startup', 
it seems this sort of metric is something that would be used mostly for 
troubleshooting.


http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@281
PS4, Line 281:
Ditto: maybe kDebug is a better choice here?


http://gerrit.cloudera.org:8080/#/c/21316/4/src/kudu/tserver/ts_tablet_manager.cc@1168
PS4, Line 1168:
Shouldn't the delete_tablet_run_time_ metric be updated before returning here as 
well?
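
One way to cover every return path without repeating the update is a scoped 
timer that records the elapsed time in its destructor.  Below is a minimal, 
self-contained sketch of that idea.  DurationRecorder, ScopedDurationTimer, and 
DeleteTabletSketch are hypothetical names made up for illustration: they are 
not Kudu classes and not the code in this patch, and the recorder is only a 
stand-in for whatever metric type delete_tablet_run_time_ actually is.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a duration metric (NOT Kudu's metric classes).
class DurationRecorder {
 public:
  void Increment(int64_t micros) { samples_.push_back(micros); }
  std::size_t count() const { return samples_.size(); }

 private:
  std::vector<int64_t> samples_;
};

// Hypothetical RAII helper: records the elapsed time when it goes out of
// scope, so early returns are covered without extra bookkeeping.
class ScopedDurationTimer {
 public:
  explicit ScopedDurationTimer(DurationRecorder* rec)
      : rec_(rec), start_(std::chrono::steady_clock::now()) {
    assert(rec_ != nullptr);
  }

  ~ScopedDurationTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    rec_->Increment(
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
  }

  ScopedDurationTimer(const ScopedDurationTimer&) = delete;
  ScopedDurationTimer& operator=(const ScopedDurationTimer&) = delete;

 private:
  DurationRecorder* rec_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical delete routine with an early return, mirroring the shape of
// the code path in question.
bool DeleteTabletSketch(bool already_deleted, DurationRecorder* rec) {
  ScopedDurationTimer timer(rec);
  if (already_deleted) {
    return false;  // Early return: the destructor still records the duration.
  }
  // ... the actual deletion work would go here ...
  return true;
}
```

With this shape, both the early-return path and the normal path add exactly one 
sample to the recorder.  Whether failed or no-op deletions should be counted in 
the metric at all is a separate decision for the patch.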



--
To view, visit http://gerrit.cloudera.org:8080/21316
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bd52013caa94a33143cb16ff3831a49b74bac4
Gerrit-Change-Number: 21316
Gerrit-PatchSet: 1
Gerrit-Owner: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: KeDeng <kdeng...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 07 May 2024 07:26:16 +0000
Gerrit-HasComments: Yes
