Yuqi Du has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18679 )

Change subject: [master] KUDU-3389 support turning on/off auto rebalancer at 
runtime
......................................................................


Patch Set 33:

(4 comments)

Thanks your crs.

http://gerrit.cloudera.org:8080/#/c/18679/31//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18679/31//COMMIT_MSG@14
PS31, Line 14: There are many independent, idle and isolated threads(which do
             : period tasks), they don't participant in thread pool. This patch 
start
             : to optimize them to use an unified thread pool to manager them.
> I don't understand what these threads are. Before this patch we only have o
eg:

[root@e2bac9768674 kudu]# ag Thread::Create src/kudu/ | grep -v test | grep -v 
src/kudu/util/threadpool.cc
src/kudu/clock/builtin_ntp.cc:651:      Thread::Create("ntp", "ntp client", 
[this]() { this->PollThread(); }, &thread_),
src/kudu/hms/hms_catalog.cc:132:  RETURN_NOT_OK(Thread::Create("hms_catalog", 
"fetch_uuid",
src/kudu/master/hms_notification_log_listener.cc:98:  return 
kudu::Thread::Create("catalog manager", "hms-notification-log-listener",
src/kudu/master/master.cc:452:  return Thread::Create("master",
src/kudu/master/catalog_manager.cc:787:  
RETURN_NOT_OK(kudu::Thread::Create("catalog manager", "bgtasks",
src/kudu/rpc/acceptor_pool.cc:96:    Status s = kudu::Thread::Create("acceptor 
pool", "acceptor",
src/kudu/rpc/reactor.cc:193:  return kudu::Thread::Create("reactor", "rpc 
reactor",
src/kudu/rpc/reactor.cc:660:Status ReactorThread::CreateClientSocket(int 
family, Socket* sock) {
src/kudu/rpc/result_tracker.cc:462:  CHECK_OK(Thread::Create("server", 
"result-tracker",
src/kudu/rpc/service_pool.cc:93:    CHECK_OK(kudu::Thread::Create(
src/kudu/security/init.cc:318:  RETURN_NOT_OK(Thread::Create("kerberos", 
"reacquire thread",
src/kudu/server/diagnostics_log.cc:144:  Status s = Thread::Create("server", 
"diag-logger",
src/kudu/server/server_base.cc:837:  return Thread::Create("server", 
"excess-log-deleter",
src/kudu/server/server_base.cc:885:  return Thread::Create("server", 
"tcmalloc-memory-gc",
src/kudu/subprocess/server.cc:130:  RETURN_NOT_OK(Thread::Create("subprocess", 
"start",
src/kudu/subprocess/server.cc:153:    
RETURN_NOT_OK(Thread::Create("subprocess", "responder",
src/kudu/subprocess/server.cc:157:  RETURN_NOT_OK(Thread::Create("subprocess", 
"reader",
src/kudu/subprocess/server.cc:160:  RETURN_NOT_OK(Thread::Create("subprocess", 
"writer",
src/kudu/subprocess/server.cc:163:  return Thread::Create("subprocess", 
"deadline-checker",
src/kudu/tools/tool_action_local_replica.cc:265:    
RETURN_NOT_OK(Thread::Create("tool-tablet-copy", "check-progress",
src/kudu/tserver/heartbeater.cc:669:  return 
kudu::Thread::Create("heartbeater", "heartbeat",
src/kudu/tserver/scanners.cc:118:  RETURN_NOT_OK(Thread::Create("scanners", 
"removal_thread",
src/kudu/tserver/tablet_copy_service.cc:114:  
CHECK_OK(Thread::Create("tablet-copy", "tc-session-exp",
src/kudu/util/cloud/instance_detector.cc:76:    RETURN_NOT_OK(Thread::Create(
src/kudu/util/debug/trace_event_impl.cc:1364:      Status s = 
Thread::CreateWithFlags(
src/kudu/util/file_cache.cc:504:  return Thread::Create("cache", 
Substitute("$0-evict", cache_name_),
src/kudu/util/kernel_stack_watchdog.cc:71:  CHECK_OK(Thread::CreateWithFlags(
src/kudu/util/maintenance_manager.cc:209:  return Thread::Create("maintenance", 
"maintenance_scheduler",
src/kudu/util/minidump.cc:288:  return Thread::Create("minidump", 
"sigusr1-handler",
src/kudu/util/pstack_watcher.cc:53:  CHECK_OK(Thread::Create("pstack_watcher", 
"pstack_watcher",
src/kudu/util/thread.h:118:  // Flags passed to Thread::CreateWithFlags().
src/kudu/util/ttl_cache.h:139:      CHECK_OK(Thread::Create(


I can fix some words for the patch.
most of the threads can be added to scheduler thread pool, that's need refactor 
continue.


http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.h@1260
PS31, Line 1260:
> Will it be used to run only rebalance tasks in the future?
No , it can be used by other tasks. Such as LeaderRebalance, catalog background 
threads...


http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc@160
PS31, Line 160: DEFINE_int32(scheduler_pool_max_thread_num, 4,
> Does it mean we can have 4 concurrent rebalance tasks at most? And why the
We use thread_pool_token SERIAL mode, so it's not concurrent.
the thread pool thread_num [1, 4].  4 more than 1, the scheduler pool would be 
used be other tasks.
So we need a number a little greater than 1.


http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc@1645
PS31, Line 1645:   state_ = kClosing;
               :   }
               :
               :   // Shutdown the Catalog Manager background thread
               :   if (background_tasks_) {
               :     background_tasks_->Shutdown();
               :   }
> Should we call auto_rebalancer_->Shutdown() before calling scheduler_pool_-
You are reasonable.
The shutdown order should not change, because 'line 1646' to shutdown 
scheduler_thread in the threadpool.


There are 2 problems.

1) As you said, the writes may cause confuse. I plan add a patch to split 
scheduler_thread and shutdown it alone rather than the thread pool, that's will 
make easily understand.
2) Review the codes, there is a bug when extreme situation, as shutdown the 
scheduler thread pool and delete the scheduler_thread, at the same time 
Schedule a task may cause a coredump. Next patch I fix it.  See: 
https://gerrit.cloudera.org/c/18867/



--
To view, visit http://gerrit.cloudera.org:8080/18679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifbcea6392b783843f89e019f1141dd8905366de5
Gerrit-Change-Number: 18679
Gerrit-PatchSet: 33
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Fri, 19 Aug 2022 08:59:14 +0000
Gerrit-HasComments: Yes

Reply via email to