Adar Dembo has posted comments on this change. Change subject: consensus: consolidate Raft thread pools ......................................................................
Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/6946/6//COMMIT_MSG Commit Message: PS6, Line 8: : This patch consolidates the two thread pools used in Raft consensus: the : observer and "Raft" (for lack of a better name) pools. The former runs tasks : for infrequent events such as term and config changes while the latter is : used for periodic events such as processing heartbeat responses. : : In this patch, each per-replica pool is consolidated into a single : per-server pool. Sequence tokens are used to ensure that tasks belonging to : a single replica are executed serially and in order. For the "Raft" pool, a : single sequence token is shared by the PeerManager and RaftConsensus; this : mirrors the existing sharing behavior and is safe because token operations : are thread-safe. : : The non-default max_threads may come as a surprise, but David showed me how : tasks submitted to either pool may sometimes lead to an fsync (mostly via : cmeta flushes). As such, it isn't safe to use the default num_cpus upper : bound, as that may cause such IO-based tasks to block other CPU-only tasks : (or worse, periodic tasks like heartbeat response processing). : : What per-replica threads are not addressed by this patch? : - Failure detection thread: a threadpool isn't the right model : for these. Something like a hash wheel timer would be ideal. : - Prepare thread pool (max_threads=1): these could be consolidated too, but : their metrics are per-replica, and sequence tokens don't (yet) support : that. Meaning, if they were consolidated now, the metrics would also : consolidate and would be less useful as a result. > would be good to have a few runs of raft_consensus-itest on dist-test with I made a bunch of changes to the threadpool slot implementation and to this consolidation patch. I ran raft_consensus-itest on dist-test in DEBUG and slow mode. Out of 1000 runs, only 1 failed: F0603 02:23:28.156342 15190 test_workload.cc:220] Check failed: row_count >= expected_row_count (7382 vs. 7384) The full log is available here: https://filebin.ca/3OloKjkUOdaK http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: PS6, Line 114: ThreadPool* raft_observers_pool > noticed that, for consensus peers, you're passing the token whereas here yo Done PS6, Line 917: raft_observers_pool_token_.Wait() > shutdown on a tp prevents more ops from being submitted while this doesn't, Yes, this was the race we discussed last week. I've since added a proper token shutdown routine and used it here. http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: PS2, Line 434: // The pool token which executes observe > remove? also maybe add some info about the experiments you made where you f I've removed the TODO. I don't think it's safe to conclude that running notifications in parallel is unsafe, at least not based on my dist-test runs. All of the failures were row count related, and I the runs took place before you merged your fix for the semi-obscure bug that was producing bad row counts. http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS6, Line 192: raft_pool > on the peer manager side you call this request_* something. mind using the Done PS6, Line 1743: raft_pool_token_->Wait(); > again, theres a change in semantics from "close the tp and wait for the ong As we discussed last week, this particular semantic change wasn't dangerous because the "last task in a slot may destroy its token" exception I added to ThreadPool ensured that when RaftConsensus was destroyed, it was via its last task. RaftConsensus could still submit useless tasks to the token after Shutdown(), but they'd no-op and eventually lead to the object's destruction. In any case this is now moot because I added proper token shutdown behavior, so token-based submission should now have identical semantics. http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/server/server_base.cc File src/kudu/server/server_base.cc: PS6, Line 255: queue-observers-pool > use "raft_observers_pool" or something Now moot: I further consolidated the two pools into one. -- To view, visit http://gerrit.cloudera.org:8080/6946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
