[kudu-CR] threadpool: new test for pools with no max threads
Hello Dan Burkert, David Ribeiro Alves, Alexey Serbin,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/6945

to look at the new patch set (#8).

Change subject: threadpool: new test for pools with no max_threads
..

threadpool: new test for pools with no max_threads

This test ensures that a pool created with effectively no max_threads works
as expected. That is:
1. Tokenless tasks should trigger the creation of a new thread.
2. Token-based tasks can create new threads, but only up to the number of
   tokens submitted against.

I intend to use this "feature" to consolidate some Raft thread pools where a
num_cpus upper bound may be undesirable (i.e. where tasks submitted to the
pools may result in blocking IO).

Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4
---
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.h
2 files changed, 70 insertions(+), 17 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/6945/8
--
To view, visit http://gerrit.cloudera.org:8080/6945
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37d0bd0a05098bcc1a81ec9f1ac33e74e98781e4
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Todd Lipcon
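[Editor's note: the thread-growth rule this commit message describes can be sketched as a small policy function. This is a hypothetical illustration, not Kudu's actual threadpool code; the function name and parameters are invented, and it assumes max_threads is effectively unbounded (e.g. INT_MAX) for the pool under test.]

```cpp
#include <climits>

// Sketch of the growth rule: with no effective max_threads cap, a tokenless
// submission always grows the pool when no worker is idle, while token-based
// submissions grow it only up to one thread per token with queued work.
int ThreadsToCreate(bool tokenless, int idle_threads, int num_threads,
                    int active_tokens, int max_threads) {
  if (num_threads >= max_threads) return 0;    // hard cap reached
  if (idle_threads > 0) return 0;              // an idle worker can take the task
  if (tokenless) return 1;                     // tokenless work: always grow
  return num_threads < active_tokens ? 1 : 0;  // at most one thread per token
}
```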
[kudu-CR] server: move apply pool into ServerBase
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/6984

to look at the new patch set (#5).

Change subject: server: move apply_pool into ServerBase
..

server: move apply_pool into ServerBase

The server-wide apply_pool was instantiated in two places: SysCatalogTable
(for the master) and TsTabletManager (for the tserver). This commit moves it
into ServerBase and unifies the instantiation.

Note: the apply pool semantics have not changed. Some interesting side
effects:
1. The master will now generate apply pool metrics.
2. The apply pool is shut down a little later in server shutdown than it was
   before, though I don't see any issues with this.

Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/sys_catalog.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
8 files changed, 48 insertions(+), 58 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/6984/5
--
To view, visit http://gerrit.cloudera.org:8080/6984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7ffc886269aa6531517a52fef29c4408a925aed
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon
[kudu-CR] threadpool: token-based task sequencing
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/6874

to look at the new patch set (#8).

Change subject: threadpool: token-based task sequencing
..

threadpool: token-based task sequencing

This patch adds task sequencing to util/threadpool. Task sequencing allows M
contexts to share a single pool with N threads while also ensuring that the
pool executes tasks belonging to each context in order. Previously this was
only achievable via "one singleton pool per context", which grossly inflated
the total number of threads and overall state.

A group of tasks is sequenced by submission to a dedicated "slot". A client
obtains exclusive access to a slot via AllocateTokenSlot(), which returns a
"token" that can be used for task submission/waiting. When the token goes out
of scope, the slot is returned to the pool. The token may also be shut down
before being destroyed, which has semantics equivalent to shutting down the
pool itself (i.e. tasks may no longer be submitted with the token).

Some notes:
- Slots and tokens are mapped 1-1, so they could theoretically be combined,
  but I prefer this separation of concerns.
- I evaluated two other implementations. In one, tokens had an implicit
  lifecycle that was automatically managed by the threadpool. While simpler
  for clients, the threadpool was less efficient, with more allocations and
  deallocations in each submission. In the other, token submission was built
  using regular task submission. This afforded a certain separation between
  the "core" of the threadpool and the token logic, but it complicated
  locking and tracking of queued tasks.
- I tried to keep submission (whether token-based or tokenless) fast. Just
  the change from std::list to std::deque for the main queue ought to improve
  performance of tokenless submissions. The next bottleneck is likely to be
  lock contention on the global threadpool lock.
Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70
---
M src/kudu/util/debug-util.h
M src/kudu/util/threadpool-test.cc
M src/kudu/util/threadpool.cc
M src/kudu/util/threadpool.h
4 files changed, 1,168 insertions(+), 75 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/6874/8
--
To view, visit http://gerrit.cloudera.org:8080/6874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If46dc34212027b6ea5dbc2ead7c7af8be57f2c70
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Dan Burkert
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon
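[Editor's note: the "M contexts share one pool, each context's tasks run in order" mechanism described above can be sketched as follows. This is a hypothetical, simplified illustration of the technique (per-token FIFO queues plus a shared run queue of "runnable" tokens), not Kudu's actual ThreadPool implementation; the class and method names are invented.]

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// A pool of N workers where tasks submitted through the same token run
// serially and in order: at most one worker holds a given token at a time.
class MiniPool {
 public:
  struct Token {
    std::deque<std::function<void()>> queue;
    bool active = false;  // true while a worker owns this token
  };

  explicit MiniPool(int num_threads) {
    for (int i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  std::shared_ptr<Token> AllocateToken() { return std::make_shared<Token>(); }

  void Submit(const std::shared_ptr<Token>& t, std::function<void()> task) {
    std::lock_guard<std::mutex> l(mu_);
    t->queue.push_back(std::move(task));
    if (!t->active) {
      // No worker owns this token yet: make it runnable so exactly one
      // worker picks it up. This is what serializes same-token tasks.
      t->active = true;
      runnable_.push_back(t);
      cv_.notify_one();
    }
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stopped_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();  // workers drain remaining tasks first
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (true) {
      cv_.wait(l, [this] { return stopped_ || !runnable_.empty(); });
      if (runnable_.empty()) return;  // stopped and nothing left to run
      auto t = runnable_.front();
      runnable_.pop_front();
      auto task = std::move(t->queue.front());
      t->queue.pop_front();
      l.unlock();
      task();  // run outside the lock; tokenless work would skip the token
      l.lock();
      if (!t->queue.empty()) {
        runnable_.push_back(t);  // more queued work: hand the token back
        cv_.notify_one();
      } else {
        t->active = false;       // token is idle again
      }
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::shared_ptr<Token>> runnable_;
  std::vector<std::thread> workers_;
  bool stopped_ = false;
};
```

Because a token is only ever on the run queue once, tasks sharing a token never run concurrently even on a multi-threaded pool, while tasks from different tokens interleave freely across the N workers.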
[kudu-CR] consensus: consolidate Raft thread pools
Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
..

Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6946/6//COMMIT_MSG
Commit Message:

PS6, Line 8:
: This patch consolidates the two thread pools used in Raft consensus: the
: observer and "Raft" (for lack of a better name) pools. The former runs tasks
: for infrequent events such as term and config changes, while the latter is
: used for periodic events such as processing heartbeat responses.
:
: In this patch, each per-replica pool is consolidated into a single
: per-server pool. Sequence tokens are used to ensure that tasks belonging to
: a single replica are executed serially and in order. For the "Raft" pool, a
: single sequence token is shared by the PeerManager and RaftConsensus; this
: mirrors the existing sharing behavior and is safe because token operations
: are thread-safe.
:
: The non-default max_threads may come as a surprise, but David showed me how
: tasks submitted to either pool may sometimes lead to an fsync (mostly via
: cmeta flushes). As such, it isn't safe to use the default num_cpus upper
: bound, as that may cause such IO-based tasks to block other CPU-only tasks
: (or worse, periodic tasks like heartbeat response processing).
:
: What per-replica threads are not addressed by this patch?
: - Failure detection thread: a threadpool isn't the right model for these.
:   Something like a hash wheel timer would be ideal.
: - Prepare thread pool (max_threads=1): these could be consolidated too, but
:   their metrics are per-replica, and sequence tokens don't (yet) support
:   that. Meaning, if they were consolidated now, the metrics would also
:   consolidate and would be less useful as a result.

> would be good to have a few runs of raft_consensus-itest on dist-test with

I made a bunch of changes to the threadpool slot implementation and to this
consolidation patch. I ran raft_consensus-itest on dist-test in DEBUG and
slow mode.
Out of 1000 runs, only 1 failed:

  F0603 02:23:28.156342 15190 test_workload.cc:220] Check failed: row_count >= expected_row_count (7382 vs. 7384)

The full log is available here: https://filebin.ca/3OloKjkUOdaK

http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS6, Line 114: ThreadPool* raft_observers_pool
> noticed that, for consensus peers, you're passing the token whereas here yo

Done

PS6, Line 917: raft_observers_pool_token_.Wait()
> shutdown on a tp prevents more ops from being submitted while this doesn't,

Yes, this was the race we discussed last week. I've since added a proper
token shutdown routine and used it here.

http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS2, Line 434: // The pool token which executes observe
> remove? also maybe add some info about the experiments you made where you f

I've removed the TODO. I don't think it's safe to conclude that running
notifications in parallel is unsafe, at least not based on my dist-test runs.
All of the failures were row count related, and the runs took place before
you merged your fix for the semi-obscure bug that was producing bad row
counts.

http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS6, Line 192: raft_pool
> on the peer manager side you call this request_* something. mind using the

Done

PS6, Line 1743: raft_pool_token_->Wait();
> again, theres a change in semantics from "close the tp and wait for the ong

As we discussed last week, this particular semantic change wasn't dangerous,
because the "last task in a slot may destroy its token" exception I added to
ThreadPool ensured that when RaftConsensus was destroyed, it was via its last
task. RaftConsensus could still submit useless tasks to the token after
Shutdown(), but they'd no-op and eventually lead to the object's destruction.
In any case, this is now moot because I added proper token shutdown behavior,
so token-based submission should now have identical semantics.

http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

PS6, Line 255: queue-observers-pool
> use "raft_observers_pool" or something

Now moot: I further consolidated the two pools into one.

--
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
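[Editor's note: the review thread above contrasts Token::Wait() with the "proper token shutdown" that was added. The distinction can be sketched as follows; this is a hypothetical single-threaded illustration (invented class and method names, tasks drained inline rather than by pool workers), not Kudu's actual API: Wait() only waits for queued work, so a racing Submit() can still slip in afterwards, whereas Shutdown() first rejects further submissions and then drains.]

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Sketch of token shutdown semantics: once Shutdown() runs, Submit() fails,
// closing the race where a task is submitted after a plain Wait() returns.
class TokenSketch {
 public:
  bool Submit(std::function<void()> task) {
    std::lock_guard<std::mutex> l(mu_);
    if (shut_down_) return false;  // token shut down: reject, like the pool would
    queue_.push_back(std::move(task));
    return true;
  }

  // Flip the rejection flag *first*, then drain what was already queued.
  // (In the real pool the drain is a wait on worker threads; this sketch
  // runs the remaining tasks inline for simplicity.)
  void Shutdown() {
    std::deque<std::function<void()>> drained;
    {
      std::lock_guard<std::mutex> l(mu_);
      shut_down_ = true;  // no further submissions from this point on
      drained.swap(queue_);
    }
    for (auto& t : drained) t();
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> queue_;
  bool shut_down_ = false;
};
```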