Abhishek Chennaka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17903 )

Change subject: KUDU-1959 - Implement server startup progress page for tablet 
and master servers
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17903/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17903/2//COMMIT_MSG@7
PS2, Line 7: Implement server startup progress page for tablet and master 
servers
> While the headers may reuse jira numbers, each patch should have unique des
Done


http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/fs/fs_manager.cc@112
PS2, Line 112:
             : METRIC_DEFINE_gauge_int64(server, 
log_block_manager_containers_processing_time_startup,
             :                           "Time taken to open all log block 
containers during server startup",
             :                           kudu::MetricUnit::kMilliseconds,
             :                           "The total time taken by the server to 
open all the container"
             :                           "files during the startup"
> See my other thoughts on elapsed time metrics.
Making this metric debug as discussed in 
https://gerrit.cloudera.org/#/c/17903/2/src/kudu/tserver/ts_tablet_manager.cc@1382


http://gerrit.cloudera.org:8080/#/c/17903/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17903/1/src/kudu/fs/log_block_manager.cc@2561
PS1, Line 2561: ->set_value(containers_total->load())
> Rather than loading from the atomic (which isn't free), how about just incr
Done


http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/server/startup_path_handler.cc
File src/kudu/server/startup_path_handler.cc:

http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/server/startup_path_handler.cc@147
PS2, Line 147:
> Seem's we're missing the tablets processed counter. Though see my below com
Done


http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/server/startup_path_handler.cc@142
PS2, Line 142:
             : int32 StartupPathHandler::StartupProgressPercentageMetric() {
             :   int percent = 0;
             :   percent += (init_progress_.IsStopped()) ? 100 : 0;
             :   percent += (read_filesystem_progress_.IsStopped()) ? 100 : 0;
             :   percent += (start_rpc_server_progress_.IsStopped()) ? 100 : 0;
             :   if (is_tablet_server_) {
             :     if (containers_total_ == 0 && 
read_data_directories_progress_.IsStopped()) {
             :       percent += 100;
             :     } else if (containers_total_ != 0) {
             :       percent += containers_processed_ * 100 / containers_total_;
             :     }
             :   } else {
             :     percent += (initialize_master_catalog_progress_.IsStopped()) 
? 100 : 0;
             :   }
             :   return percent/4;
             : }
> I see. Another approach would be to use METRIC_state() to define a counter
So instead of a value between [0,100] the idea is to have it between [0-4], is 
that correct? In that case we can have it as a steps_remaining counter and 
count down, too.


http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/server/startup_path_handler.cc@159
PS2, Line 159:
             : int64_t StartupPathHandler::StartupProgressTimeElapsedMetric() {
             :   int64_t time_elapsed = 0;
             :   time_elapsed += init_progress_.TimeElapsed().ToMilliseconds();
             :   time_elapsed += 
read_filesystem_progress_.TimeElapsed().ToMilliseconds();
             :   if (is_tablet_server_) {
             :     time_elapsed += 
start_tablets_progress_.TimeElapsed().ToMilliseconds();
             :   } else {
             :     time_elapsed += 
initialize_master_catalog_progress_.TimeElapsed().ToMilliseconds();
             :   }
             :   time_elapsed += 
start_rpc_server_progress_.TimeElapsed().ToMilliseconds();
             :   return time_elapsed;
             : }
             :
             :
> Makes sense. Please add that reasoning to the commit message, i.e. the "why
Done


http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17903/2/src/kudu/tserver/ts_tablet_manager.cc@1381
PS2, Line 1381:     ++*tablets_processed;
              :     tablets_num_opened_startup_->Increment();
> I see your point. Though while it seems useful, maybe it's not _that_ usefu
Sounds good. Making all the time metrics related to containers and tablets 
debug level from info level i.e.:
tablets_opening_time_startup and
log_block_manager_containers_processing_time_startup



--
To view, visit http://gerrit.cloudera.org:8080/17903
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0a508c3baf0a0d77baf75f36f7bb305a6ad821e1
Gerrit-Change-Number: 17903
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 19 Oct 2021 04:06:41 +0000
Gerrit-HasComments: Yes

Reply via email to