Song Jiacheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/21076 )
Change subject: KUDU-3557 Simple way to change maximum thread cache size of tcmalloc. ...................................................................... Patch Set 10: (19 comments) > Patch Set 10: Code-Review+1 > > Looks good to me! The only thing to fix is the IWYU issue below: > > 00:52:58 >>> Fixing #includes in > '/home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/memory_gc-itest.cc' > 00:52:58 @@ -18,6 +18,7 @@ > 00:52:58 #include <cstdint> > 00:52:58 #include <functional> > 00:52:58 #include <memory> > 00:52:58 +#include <ostream> > 00:52:58 #include <string> > 00:52:58 #include <utility> > 00:52:58 #include <vector> > 00:52:58 IWYU would have edited 1 files on your behalf. Thanks for all the reviews! http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@7 PS3, Line 7: Simple way to change maximum thread cache size > What if somebody sets the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment I think the setting is via this flag. SetNumericProperty is a runtime function, it could be called anytime. I have added a warning to the flag comment. http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@10 PS3, Line 10: Making the total thread cache size larger : could highly decrease the lock contention. > Is there a way to add a small test to validate the results? We do already I tried adding a test that start tablet servers with different values of this flag. But I think we could not ensure that the test result is as expected. http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@12 PS3, Line 12: at the : first time, but it could request more from unclaimed cache space : or steal from other thread cache each time it's doing garbag > I guess it's all originated from the way how the GC process manages the per I know this mechanism in tcmalloc. It actually just get 64K from unclaimed_cache_space_ if unclaimed_cache_space_ is > 0. The size of unclaimed_cache_space_ is tcmalloc_max_total_thread_cache_bytes - (actual used by all thread cache). We could observe the metric tcmalloc_current_total_thread_cache_bytes, the stealing will never happen if tcmalloc_current_total_thread_cache_bytes always less than tcmalloc_max_total_thread_cache_bytes. Added some details to the commit message. Thanks! http://gerrit.cloudera.org:8080/#/c/21076/3//COMMIT_MSG@17 PS3, Line 17: : KUDU-3557 s > Thank you for adding those graphs in KUDU-3557. For those graphs, could yo It's actually the metric spinlock_contention_time, I will make it clearer. http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc File src/kudu/integration-tests/memory_gc-itest.cc: http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@135 PS7, Line 135: TEST_F(MemoryGcITest, TestLockContentionInVariousThreadCacheSize) { > Thank you for adding this test. Thank you for the clarification about the metric 'spinlock_contention_time', I thought it includes the spinlock in tcmalloc. It turns out that Kudu and tcmalloc are using different implemention of spinlock. So actually the impact on decreasing lock contention in tamalloc by increasing the max total cache of thread caches is still not visible. But the metric spinlock_contention_time does show the performance currently. So I guess the flag still helpful. I have set the flags to 1MB, 8MB and 64MB, and the test still fails in a certain test. So I will change the assertion to log. http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@146 PS7, Line 146: Set --max_total_thread_cache_bytes to 8MB for the second tablet server. > nit: Set --max_total_thread_cache_bytes to 32MB for the second tablet serve Done http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@152 PS7, Line 152: Set --max_total_thread_cache_bytes to 64MB for the third tablet server. > nit: Set --max_total_thread_cache_bytes to 128MB for the third tablet serve Done http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@159 PS7, Line 159: int64_t total_size = 0; > Why to have this ASSERT_EVENTUALLY? Isn't the setting becomes effective im I'm not sure the tserver will be started at once. Removing this. http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@184 PS7, Line 184: estWorkload workload(cluster_.g > nit: Write some data to be scanned later on. These comments are from the codes above, changing those also. http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@200 PS7, Line 200: estWorkload workload(cluster_.get()); > nit: Additional memory is allocated during scan operations below. Done http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@207 PS7, Line 207: kload.StopAndJoin(); > nit: Run the scan workload for a long time to let it allocate/deallocate a Done http://gerrit.cloudera.org:8080/#/c/21076/7/src/kudu/integration-tests/memory_gc-itest.cc@213 PS7, Line 213: &ME > I'm not sure this ASSERT_EVENTUALLY is needed here. Indeed, removing it. http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc File src/kudu/server/server_base.cc: http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@309 PS3, Line 309: 128 * 10 > Just another thought: given that Kudu servers have many threads by design a The max total size by default is computed as follow: static const size_t kMaxThreadCacheSize = 4 << 20; static const size_t kDefaultOverallThreadCacheSize = 8u * kMaxThreadCacheSize; I think it is 32MB, not sure if I am wrong. And before setting the flag, the default value of our cluster is 32MB. Of course 128MB is better, I just want to keep the original value by default. Done. http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@310 PS3, Line 310: across all thread cach > Some words are missing? Sorry, done. http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@311 PS3, Line 311: in tcm > a greater value Done http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@311 PS3, Line 311: lloc. Increasing this value helps in reducing lock contention > Well, those requests are always frequent even in non-stressful workloads an Done http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@312 PS3, Line 312: "in tcmalloc for memory-intensive workloads. WARNING: This flag will " > Please also add the 'experimental' tag as well. Done http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@313 PS3, Line 313: "cover the TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable. " > Does it make sense to add a validator for the flag? It seems in the gperft Yes, it does make sense. Done. http://gerrit.cloudera.org:8080/#/c/21076/3/src/kudu/server/server_base.cc@809 PS3, Line 809: LOG(INFO) << "enabling wall clock jump detection"; : } : clock_.reset(new clock::HybridClock( : metric_entity_, threshold_usec, std::move(im))); : } > Does it makes sense to add logging about the inability to apply the setting Done -- To view, visit http://gerrit.cloudera.org:8080/21076 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3cb8c6ed6a8f24c63258ae65f8b841fe74b75308 Gerrit-Change-Number: 21076 Gerrit-PatchSet: 10 Gerrit-Owner: Song Jiacheng <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Song Jiacheng <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 01 Mar 2024 06:12:24 +0000 Gerrit-HasComments: Yes
