Yuqi Du has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18867 )

Change subject: [threadpool] Fix unsafe behaviour when SchedulerThread shutdown
......................................................................


Patch Set 17:

(13 comments)

Thanks for your crs. I have fixed them and you can review them again.

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with Schedule
> nit: When shutting down a thread pool with SchedulerThread
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@11
PS16, Line 11: uled on the tok
> nit: a task is scheduled on the token
Done


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@9
PS16, Line 9: When shutting down a thread pool with SchedulerThread, thread 
pool's variable
            : 'scheduler_' is deleted and set to nullptr, at the same time, if 
thread pool
            : token is not shutdown and a task is scheduled on the toke
> Since the ThreadPoolToken is created after ThreadPool is created, when we n
The first question, yes. The shutdown order is needed by before, not the patch 
added.

So if thread pool has shut down, token invoke 'schedule/Submit' , core dump may 
happen.


http://gerrit.cloudera.org:8080/#/c/18867/16//COMMIT_MSG@14
PS16, Line 14: fix
> nit: fixes
Done


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc
File src/kudu/util/threadpool-test.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@159
PS16, Line 159:   SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));
              :   // make sure a
> Is it essential to have both SleepFor() and pool_->Wait() here?  Wouldn't j
OK, I will add some comments.

'SleepFor(MonoDelta::FromMilliseconds(2 * kDelayMs));'  make sure the line 154 
task scheduled in SchedulerThread.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool-test.cc@256
PS16, Line 256:   ASSERT_OK(pool_->Schedule(
              :       nullptr, [&counter]() { SimpleTaskMetho
> How essential is this extra wait?  It seems there is pool_->Wait() below, s
make sure the delayed task execute, pool_->Wait()  do not wait. I add a comment.

I make an experiment, at cpu 32 cores, --stress_cpu_threads=4000 is ok.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h
File src/kudu/util/threadpool.h:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@187
PS16, Line 187:   class SchedulerTask {
              :    public:
              :     SchedulerTask(ThreadPoolToken* thread_pool_token, 
std::function<void()> func)
              :
> Is it possible to move this into the 'private' section of the class?
change it to 'class'.


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.h@612
PS16, Line 612:   const ThreadPoolMetrics metrics_;
> Can the 'scheduler' be a unique_ptr so we don't need to explicitly delete i
I will adding a document, because of we should control the deleting order.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678: ntil)) {
> Done
WaitForScheduler is removed.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@678
PS13, Line 678:
              :       return false;
> Done
WaitForScheduler is removed.


http://gerrit.cloudera.org:8080/#/c/18867/13/src/kudu/util/threadpool.cc@680
PS13, Line 680:   }
> This sounds like a good idea -- if you find this is needed elsewhere but no
ok


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc
File src/kudu/util/threadpool.cc:

http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@263
PS16, Line 263: MonoTime execute_time = MonoTime::Now();
              :   execute_time.AddDelta(MonoDelta::FromMilliseconds(delay_ms));
              :   return pool_->Schedule(this, std::move(f), execute_time);
              : }
              :
              : voi
> nit: Maybe we can move this check into ThreadPool::Schedule?
ok


http://gerrit.cloudera.org:8080/#/c/18867/16/src/kudu/util/threadpool.cc@540
PS16, Line 540: scheduler_->Schedule(token, std::move(f), execute_time);
> Should this be wrapped into RETURN_NOT_OK()?  Maybe, just
The function 'scheduler_->Schedule()' can return status, and it must return 
Status::OK();
Fixing it is ok, but it seems no strong reason,
someone may like return 'void'.

What do you think?  I'll do this according to your reply.

`
  void Schedule(ThreadPoolToken* token,
                std::function<void()> f,
                const MonoTime& execute_time) {
    MutexLock unique_lock(mutex_);
    future_tasks_.insert({execute_time, SchedulerTask({token, std::move(f)})});
  }
`



--
To view, visit http://gerrit.cloudera.org:8080/18867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I021422f7e51e1007c5bdbc877ab445f70ba12357
Gerrit-Change-Number: 18867
Gerrit-PatchSet: 17
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Fri, 07 Oct 2022 12:39:03 +0000
Gerrit-HasComments: Yes

Reply via email to