Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18679 )
Change subject: [master] KUDU-3389 support turning on/off auto rebalancer at runtime ...................................................................... Patch Set 33: (4 comments) Thanks your crs. http://gerrit.cloudera.org:8080/#/c/18679/31//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18679/31//COMMIT_MSG@14 PS31, Line 14: There are many independent, idle and isolated threads(which do : period tasks), they don't participant in thread pool. This patch start : to optimize them to use an unified thread pool to manager them. > I don't understand what these threads are. Before this patch we only have o eg: [root@e2bac9768674 kudu]# ag Thread::Create src/kudu/ | grep -v test | grep -v src/kudu/util/threadpool.cc src/kudu/clock/builtin_ntp.cc:651: Thread::Create("ntp", "ntp client", [this]() { this->PollThread(); }, &thread_), src/kudu/hms/hms_catalog.cc:132: RETURN_NOT_OK(Thread::Create("hms_catalog", "fetch_uuid", src/kudu/master/hms_notification_log_listener.cc:98: return kudu::Thread::Create("catalog manager", "hms-notification-log-listener", src/kudu/master/master.cc:452: return Thread::Create("master", src/kudu/master/catalog_manager.cc:787: RETURN_NOT_OK(kudu::Thread::Create("catalog manager", "bgtasks", src/kudu/rpc/acceptor_pool.cc:96: Status s = kudu::Thread::Create("acceptor pool", "acceptor", src/kudu/rpc/reactor.cc:193: return kudu::Thread::Create("reactor", "rpc reactor", src/kudu/rpc/reactor.cc:660:Status ReactorThread::CreateClientSocket(int family, Socket* sock) { src/kudu/rpc/result_tracker.cc:462: CHECK_OK(Thread::Create("server", "result-tracker", src/kudu/rpc/service_pool.cc:93: CHECK_OK(kudu::Thread::Create( src/kudu/security/init.cc:318: RETURN_NOT_OK(Thread::Create("kerberos", "reacquire thread", src/kudu/server/diagnostics_log.cc:144: Status s = Thread::Create("server", "diag-logger", src/kudu/server/server_base.cc:837: return Thread::Create("server", "excess-log-deleter", src/kudu/server/server_base.cc:885: return Thread::Create("server", "tcmalloc-memory-gc", src/kudu/subprocess/server.cc:130: RETURN_NOT_OK(Thread::Create("subprocess", "start", src/kudu/subprocess/server.cc:153: RETURN_NOT_OK(Thread::Create("subprocess", "responder", src/kudu/subprocess/server.cc:157: RETURN_NOT_OK(Thread::Create("subprocess", "reader", src/kudu/subprocess/server.cc:160: RETURN_NOT_OK(Thread::Create("subprocess", "writer", src/kudu/subprocess/server.cc:163: return Thread::Create("subprocess", "deadline-checker", src/kudu/tools/tool_action_local_replica.cc:265: RETURN_NOT_OK(Thread::Create("tool-tablet-copy", "check-progress", src/kudu/tserver/heartbeater.cc:669: return kudu::Thread::Create("heartbeater", "heartbeat", src/kudu/tserver/scanners.cc:118: RETURN_NOT_OK(Thread::Create("scanners", "removal_thread", src/kudu/tserver/tablet_copy_service.cc:114: CHECK_OK(Thread::Create("tablet-copy", "tc-session-exp", src/kudu/util/cloud/instance_detector.cc:76: RETURN_NOT_OK(Thread::Create( src/kudu/util/debug/trace_event_impl.cc:1364: Status s = Thread::CreateWithFlags( src/kudu/util/file_cache.cc:504: return Thread::Create("cache", Substitute("$0-evict", cache_name_), src/kudu/util/kernel_stack_watchdog.cc:71: CHECK_OK(Thread::CreateWithFlags( src/kudu/util/maintenance_manager.cc:209: return Thread::Create("maintenance", "maintenance_scheduler", src/kudu/util/minidump.cc:288: return Thread::Create("minidump", "sigusr1-handler", src/kudu/util/pstack_watcher.cc:53: CHECK_OK(Thread::Create("pstack_watcher", "pstack_watcher", src/kudu/util/thread.h:118: // Flags passed to Thread::CreateWithFlags(). src/kudu/util/ttl_cache.h:139: CHECK_OK(Thread::Create( I can fix some words for the patch. most of the threads can be added to scheduler thread pool, that's need refactor continue. http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.h@1260 PS31, Line 1260: > Will it be used to run only rebalance tasks in the future? No , it can be used by other tasks. Such as LeaderRebalance, catalog background threads... http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc@160 PS31, Line 160: DEFINE_int32(scheduler_pool_max_thread_num, 4, > Does it mean we can have 4 concurrent rebalance tasks at most? And why the We use thread_pool_token SERIAL mode, so it's not concurrent. the thread pool thread_num [1, 4]. 4 more than 1, the scheduler pool would be used be other tasks. So we need a number a little greater than 1. http://gerrit.cloudera.org:8080/#/c/18679/31/src/kudu/master/catalog_manager.cc@1645 PS31, Line 1645: state_ = kClosing; : } : : // Shutdown the Catalog Manager background thread : if (background_tasks_) { : background_tasks_->Shutdown(); : } > Should we call auto_rebalancer_->Shutdown() before calling scheduler_pool_- You are reasonable. The shutdown order should not change, because 'line 1646' to shutdown scheduler_thread in the threadpool. There are 2 problems. 1) As you said, the writes may cause confuse. I plan add a patch to split scheduler_thread and shutdown it alone rather than the thread pool, that's will make easily understand. 2) Review the codes, there is a bug when extreme situation, as shutdown the scheduler thread pool and delete the scheduler_thread, at the same time Schedule a task may cause a coredump. Next patch I fix it. See: https://gerrit.cloudera.org/c/18867/ -- To view, visit http://gerrit.cloudera.org:8080/18679 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbcea6392b783843f89e019f1141dd8905366de5 Gerrit-Change-Number: 18679 Gerrit-PatchSet: 33 Gerrit-Owner: Yuqi Du <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Fri, 19 Aug 2022 08:59:14 +0000 Gerrit-HasComments: Yes
