Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6946/6//COMMIT_MSG
Commit Message:

PS6, Line 8: 
           : This patch consolidates the two thread pools used in Raft
           : consensus: the observer and "Raft" (for lack of a better name)
           : pools. The former runs tasks for infrequent events such as term
           : and config changes while the latter is used for periodic events
           : such as processing heartbeat responses.
           : 
           : In this patch, each per-replica pool is consolidated into a
           : single per-server pool. Sequence tokens are used to ensure that
           : tasks belonging to a single replica are executed serially and in
           : order. For the "Raft" pool, a single sequence token is shared by
           : the PeerManager and RaftConsensus; this mirrors the existing
           : sharing behavior and is safe because token operations are
           : thread-safe.
           : 
           : The non-default max_threads may come as a surprise, but David
           : showed me how tasks submitted to either pool may sometimes lead
           : to an fsync (mostly via cmeta flushes). As such, it isn't safe
           : to use the default num_cpus upper bound, as that may cause such
           : IO-based tasks to block other CPU-only tasks (or worse, periodic
           : tasks like heartbeat response processing).
           : 
           : What per-replica threads are not addressed by this patch?
           : - Failure detection thread: a threadpool isn't the right model
           :   for these. Something like a hash wheel timer would be ideal.
           : - Prepare thread pool (max_threads=1): these could be
           :   consolidated too, but their metrics are per-replica, and
           :   sequence tokens don't (yet) support that. Meaning, if they
           :   were consolidated now, the metrics would also consolidate and
           :   would be less useful as a result.
> would be good to have a few runs of raft_consensus-itest on dist-test with 
I made a bunch of changes to the threadpool slot implementation and to this 
consolidation patch. I ran raft_consensus-itest on dist-test in DEBUG and slow 
mode.

Out of 1000 runs, only 1 failed:
  F0603 02:23:28.156342 15190 test_workload.cc:220] Check failed: row_count >= 
expected_row_count (7382 vs. 7384)

The full log is available here: https://filebin.ca/3OloKjkUOdaK
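For reference, the serialization behavior the commit message describes can be sketched like this. This is a minimal illustrative pool and token, not Kudu's actual ThreadPool/ThreadPoolToken implementation; all names here are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// A minimal fixed-size thread pool (stand-in for the per-server pool).
class Pool {
 public:
  explicit Pool(int n) {
    for (int i = 0; i < n; i++) {
      workers_.emplace_back([this] {
        for (;;) {
          std::function<void()> t;
          {
            std::unique_lock<std::mutex> l(mu_);
            cv_.wait(l, [this] { return stop_ || !q_.empty(); });
            if (stop_ && q_.empty()) return;
            t = std::move(q_.front());
            q_.pop_front();
          }
          t();
        }
      });
    }
  }
  ~Pool() {
    { std::lock_guard<std::mutex> l(mu_); stop_ = true; }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  void Submit(std::function<void()> t) {
    { std::lock_guard<std::mutex> l(mu_); q_.push_back(std::move(t)); }
    cv_.notify_one();
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

// A "sequence token": tasks submitted through one token run serially and
// in submission order, even though the underlying pool is multi-threaded.
// Different tokens (i.e. different replicas) still run concurrently.
class SerialToken {
 public:
  explicit SerialToken(Pool* pool) : pool_(pool) {}
  void Submit(std::function<void()> t) {
    std::lock_guard<std::mutex> l(mu_);
    q_.push_back(std::move(t));
    if (!active_) {
      // Only one drain task per token is ever in flight; that's what
      // guarantees serial execution.
      active_ = true;
      pool_->Submit([this] { RunOne(); });
    }
  }
  // Blocks until all tasks submitted so far have finished.
  void Wait() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return q_.empty() && !active_; });
  }
 private:
  void RunOne() {
    std::function<void()> t;
    {
      std::lock_guard<std::mutex> l(mu_);
      t = std::move(q_.front());
      q_.pop_front();
    }
    t();
    std::lock_guard<std::mutex> l(mu_);
    if (!q_.empty()) {
      pool_->Submit([this] { RunOne(); });  // chain the next task
    } else {
      active_ = false;
      cv_.notify_all();
    }
  }
  Pool* pool_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool active_ = false;
};
```

Sharing one such token between two submitters (as PeerManager and RaftConsensus share the "Raft" pool token) is safe here because Submit() is fully guarded by the token's mutex.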


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS6, Line 114: ThreadPool* raft_observers_pool
> noticed that, for consensus peers, you're passing the token whereas here yo
Done


PS6, Line 917: raft_observers_pool_token_.Wait()
> shutdown on a tp prevents more ops from being submitted while this doesn't,
Yes, this was the race we discussed last week.

I've since added a proper token shutdown routine and used it here.


http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS2, Line 434:   // The pool token which executes observe
> remove? also maybe add some info about the experiments you made where you f
I've removed the TODO.

I don't think it's safe to conclude that running notifications in parallel is 
unsafe, at least not based on my dist-test runs. All of the failures were row 
count related, and the runs took place before you merged your fix for the 
semi-obscure bug that was producing bad row counts.


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS6, Line 192: raft_pool
> on the peer manager side you call this request_* something. mind using the 
Done


PS6, Line 1743: raft_pool_token_->Wait();
> again, theres a change in semantics from "close the tp and wait for the ong
As we discussed last week, this particular semantic change wasn't dangerous 
because the "last task in a slot may destroy its token" exception I added to 
ThreadPool ensured that when RaftConsensus was destroyed, it was via its last 
task. RaftConsensus could still submit useless tasks to the token after 
Shutdown(), but they'd no-op and eventually lead to the object's destruction.

In any case this is now moot because I added proper token shutdown behavior, so 
token-based submission should now have identical semantics.
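The Wait()-vs-Shutdown() distinction can be sketched like this (illustrative only; the class and its methods are hypothetical, not the actual ThreadPoolToken API):

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// A token with both semantics discussed above: Wait() drains but leaves
// the token open (the race), while Shutdown() also rejects new tasks.
class Token {
 public:
  Token() : worker_([this] { Run(); }) {}
  ~Token() {
    Shutdown();
    worker_.join();
  }

  // Returns false once the token has been shut down.
  bool Submit(std::function<void()> t) {
    std::lock_guard<std::mutex> l(mu_);
    if (closed_) return false;
    q_.push_back(std::move(t));
    cv_.notify_all();
    return true;
  }

  // Waits for queued tasks to drain, but does NOT stop new submissions:
  // another thread may Submit() right after Wait() returns.
  void Wait() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return q_.empty() && !running_; });
  }

  // Stops accepting tasks, then waits for in-flight ones to finish.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      closed_ = true;
    }
    cv_.notify_all();
    Wait();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> t;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return;  // closed and fully drained
        t = std::move(q_.front());
        q_.pop_front();
        running_ = true;
      }
      t();
      std::lock_guard<std::mutex> l(mu_);
      running_ = false;
      cv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool closed_ = false;
  bool running_ = false;
  std::thread worker_;
};
```

With this shape, replacing `raft_pool_token_->Wait()` with a Shutdown()-style call closes the submission window, which is why token-based submission now matches whole-pool shutdown semantics.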


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

PS6, Line 255: queue-observers-pool
> use "raft_observers_pool" or something
Now moot: I further consolidated the two pools into one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
