Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/18867 )
Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown ...................................................................... Patch Set 17: (13 comments) Thanks for your crs. I have fixed them and you can review them again. http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9 PS16, Line 9: When shutting down a thread pool with Schedule > nit: When shutting down a thread pool with SchedulerThread Done http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@11 PS16, Line 11: uled on the tok > nit: a task is scheduled on the token Done http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9 PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread pool's variable : 'scheduler_' is deleted and set to nullptr, at the same time, if thread pool : token is not shutdown and a task is scheduled on the toke > Since the ThreadPoolToken is created after ThreadPool is created, when we n The first question, yes. The shutdown order is needed by before, not the patch added. So if thread pool has shut down, token invoke 'schedule/Submit' , core dump may happen. http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@14 PS16, Line 14: fix > nit: fixes Done http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc File src/kudu/util/threadpool-test.cc: http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@159 PS16, Line 159: SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs)); : // make sure a > Is it essential to have both SleepFor() and pool_->Wait() here? Wouldn't j OK, I will add some comments. 'SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));' make sure the line 154 task scheduled in SchedulerThread. http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@256 PS16, Line 256: ASSERT_OK(pool_->Schedule( : nullptr, [&counter]() { SimpleTaskMetho > How essential is this extra wait? It seems there is pool_->Wait() below, s make sure the delayed task execute, pool_->Wait() do not wait. I add a comment. I make an experiment, at cpu 32 cores, --stress_cpu_threads=4000 is ok. http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h File src/kudu/util/threadpool.h: http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@187 PS16, Line 187: class SchedulerTask { : public: : SchedulerTask(ThreadPoolToken* thread_pool_token, std::function<void()> func) : > Is it possible to move this into the 'private' section of the class? change it to 'class'. http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@612 PS16, Line 612: const ThreadPoolMetrics metrics_; > Can the 'scheduler' be a unique_ptr so we don't need to explicitly delete i I will adding a document, because of we should control the deleting order. http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc File src/kudu/util/threadpool.cc: http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678 PS13, Line 678: ntil)) { > Done WaitForScheduler is removed. http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678 PS13, Line 678: : return false; > Done WaitForScheduler is removed. http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680 PS13, Line 680: } > This sounds like a good idea -- if you find this is needed elsewhere but no ok http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc File src/kudu/util/threadpool.cc: http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@263 PS16, Line 263: MonoTime execute_time = MonoTime::Now(); : execute_time.AddDelta(MonoDelta::FromMilliseconds(delay_ms)); : return pool_->Schedule(this, std::move(f), execute_time); : } : : voi > nit: Maybe we can move this check into ThreadPool::Schedule? ok http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@540 PS16, Line 540: scheduler_->Schedule(token, std::move(f), execute_time); > Should this be wrapped into RETURN_NOT_OK()? Maybe, just The function 'scheduler_->Schedule()' can return status, and it must return Status::OK(); Fixing it is ok, but it seems no strong reason, someone may like return 'void'. What do you think? I'll do this according to your reply. ` void Schedule(ThreadPoolToken* token, std::function<void()> f, const MonoTime& execute_time) { MutexLock unique_lock(mutex_); future_tasks_.insert({execute_time, SchedulerTask({token, std::move(f)})}); } ` -- To view, visit http://gerrit.cloudera.org:8080/18867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357 Gerrit-Change-Number: 18867 Gerrit-PatchSet: 17 Gerrit-Owner: Yuqi Du <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Fri, 07 Oct 2022 12:39:03 +0000 Gerrit-HasComments: Yes
