Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14555 )
Change subject: KUDU-2977 sharding block map to speed up tserver startup
......................................................................

Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14555/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14555/1//COMMIT_MSG@11
PS1, Line 11:
> Have you run the benchmark multiple times? Are the numbers stable? Could yo
Yes, I ran the benchmark multiple times and the numbers are pretty stable. I ran the StartupBenchmark test on 2 servers and compared the old and new versions; the 'real time' cost logged by LOG_TIMING is as follows (in seconds):

Server 1:
block count    old      new
100,000        0.0881   0.0938
1,000,000      1.177    1.0485
2,000,000      2.5675   2.1233
4,000,000      6.0655   4.6847
10,000,000     16.9558  13.9073

Server 2:
block count    old      new
100,000        0.0939   0.1006
1,000,000      1.3432   1.222
2,000,000      2.9359   2.6752
4,000,000      6.8357   5.3497
10,000,000     19.8975  14.8417

As the block count grows, startup performance improves more with the new version.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc@1001
PS1, Line 1001: // - minimal number of containers, each of which is entirely full
> Remove this.
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc@1017
PS1, Line 1017:
> Nit: prepend with /* force= */ so it's more clear what this argument means.
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@311
PS1, Line 311: bool AddLogBlock(LogBlockRefPtr lb);
> By renaming this it becomes an overload of the existing AddLogBlock, which
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@408
PS1, Line 408:
> If the size of each of these three vectors is the same, I'd replace them wi
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@410
PS1, Line 410: struct ManagedBlocks {
> Why do the entries here and in blocks_by_block_id_arr_ need shared ownershi
unique_ptr is OK. Done.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@1861
PS1, Line 1861: static const uint64_t kBlockMapMask = kBlockMapChunk - 1;
> Would be interesting to see how performance (and overhead) changes as this
I also ran StartupBenchmark on 2 servers with different kBlockMapChunk values; the time cost is as follows (in seconds):

sharding shift  server1  server2
1               5.383    6.2373
2               5.1782   5.5581
3               4.9536   5.4049
4               4.2259   5.3289
5               4.8093   5.5196
6               4.983    5.5974
7               5.5863   6.0348

So I think 4 shift bits (16 shards) is best, and I changed it to '1 << 4'.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2288
PS1, Line 2288: Status s = RemoveLogBlock(block_id, &lb);
> If block_ids is large (i.e. we're deleting a large tablet), this will resul
I did some benchmarking of RemoveLogBlocks (in LogBlockManagerTest.StartupBenchmark I added some code to remove batches of blocks in a loop); the difference is very small.
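For context, the two batch-removal strategies compared in the results below ("lock once" per shard vs "lock N times", once per block) can be sketched over a toy sharded map like this. All names here are my illustrative stand-ins, not Kudu's actual code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Toy sharded map: the low 4 bits of the block id pick the shard,
// mirroring the kBlockMapChunk = 1 << 4 choice above.
constexpr uint64_t kNumShards = 1 << 4;
constexpr uint64_t kShardMask = kNumShards - 1;

struct Shard {
  std::mutex lock;
  std::unordered_map<uint64_t, int> map;
};
using Shards = std::array<Shard, kNumShards>;

// "Lock once": one pass per shard, holding that shard's lock while
// scanning the whole batch for ids belonging to it. Fewer lock/unlock
// operations, but each critical section is long.
size_t RemoveLockOnce(Shards& shards, const std::vector<uint64_t>& ids) {
  size_t removed = 0;
  for (uint64_t i = 0; i < kNumShards; ++i) {
    std::lock_guard<std::mutex> l(shards[i].lock);
    for (uint64_t id : ids) {
      if ((id & kShardMask) != i) continue;
      removed += shards[i].map.erase(id);
    }
  }
  return removed;
}

// "Lock N times": lock the owning shard once per block. More lock
// operations, but each critical section is tiny, so concurrent
// removers on other shards are blocked only briefly.
size_t RemoveLockPerBlock(Shards& shards, const std::vector<uint64_t>& ids) {
  size_t removed = 0;
  for (uint64_t id : ids) {
    Shard& s = shards[id & kShardMask];
    std::lock_guard<std::mutex> l(s.lock);
    removed += s.map.erase(id);
  }
  return removed;
}
```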
Test results are as follows (time cost per batch, in seconds):

batch size  old version  lock once   lock N times
1,000       0.00384425   0.00398775  0.0040495
5,000       0.0188475    0.0197075   0.01955875
10,000      0.037975     0.039765    0.0388825
20,000      0.076725     0.083855    0.08087
40,000      0.15567      0.16672     0.16199
100,000     0.407225     0.414       0.395775

* lock once:

  for (int i = 0; i < kBlockMapChunk; ++i) {
    std::lock_guard<simple_spinlock> l(*managed_blocks_[i].lock);
    for (const auto& block_id : block_ids) {
      int index = block_id.id() & kBlockMapMask;
      if (index != i) continue;
      LogBlockRefPtr lb;
      Status s = RemoveLogBlock(i, block_id, &lb);
      ...
    }
  }

* lock N times: the current code

There may also be the same situation as what I mentioned in https://gerrit.cloudera.org/c/14555/1/src/kudu/fs/log_block_manager.cc#2609: it's common to remove several tablets at once, e.g. when removing a range partition that has many hash partitions, or dropping a table. If we hold a lock while doing something that lasts a long time, threads that arrive later will wait in a queue to acquire the lock, so I think keeping each locking period shorter is better.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2609
PS1, Line 2609: if (!AddLogBlock(std::move(e.second))) {
> This isn't super helpful as what we really care about is total startup time
I use it to compare with the old version.
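To illustrate why per-shard locking gives the better concurrency visible in the timings below, here is a toy, self-contained sketch of concurrent insertion into a sharded map. All names are mine, not Kudu's actual code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Toy demonstration of why per-shard locking lets concurrent
// AddLogBlock-style inserts proceed mostly in parallel: ids spread
// across shards, so threads rarely contend on the same lock.
class ShardedMap {
 public:
  static constexpr uint64_t kShards = 1 << 4;

  // Insert under the owning shard's lock only; returns false on duplicate.
  bool Add(uint64_t id, uint64_t v) {
    Shard& s = shards_[id & (kShards - 1)];
    std::lock_guard<std::mutex> l(s.lock);
    return s.map.emplace(id, v).second;
  }

  size_t size() {
    size_t n = 0;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> l(s.lock);
      n += s.map.size();
    }
    return n;
  }

 private:
  struct Shard {
    std::mutex lock;
    std::unordered_map<uint64_t, uint64_t> map;
  };
  std::array<Shard, kShards> shards_;
};

// Several "loader" threads insert disjoint id ranges concurrently,
// analogous to one AddLogBlock caller per data directory.
size_t ConcurrentInsert(ShardedMap& m, int num_threads, uint64_t per_thread) {
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&m, t, per_thread] {
      for (uint64_t i = 0; i < per_thread; ++i) {
        m.Add(static_cast<uint64_t>(t) * per_thread + i, i);
      }
    });
  }
  for (std::thread& th : threads) th.join();
  return m.size();
}
```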
When I ran LogBlockManagerTest.StartupBenchmark with 10,000,000 blocks, the old version logged like:

I1028 16:46:50.621170 4339 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 1.288s user 1.113s sys 0.175s
I1028 16:46:51.779841 4346 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 2.317s user 1.018s sys 0.140s
I1028 16:46:52.913904 4340 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 3.426s user 0.942s sys 0.187s
I1028 16:46:54.350420 4342 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 4.832s user 1.266s sys 0.171s
I1028 16:46:55.632331 4338 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 6.167s user 1.026s sys 0.253s
I1028 16:46:56.918051 4343 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 7.442s user 1.043s sys 0.244s
I1028 16:46:58.794180 4345 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 9.376s user 1.659s sys 0.218s
I1028 16:47:00.056476 4344 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 10.535s user 1.045s sys 0.216s

and the new version logged like:

I1028 16:42:13.491647 95613 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.268s user 2.292s sys 0.574s
I1028 16:42:13.502215 95615 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.321s user 2.078s sys 0.532s
I1028 16:42:13.609458 95611 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.256s user 2.112s sys 0.540s
I1028 16:42:13.649394 95612 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.295s user 2.046s sys 0.579s
I1028 16:42:13.655992 95614 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.322s user 2.333s sys 0.520s
I1028 16:42:13.659255 95616 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.476s user 2.228s sys 0.556s
I1028 16:42:13.674238 95617 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.208s user 2.206s sys 0.520s
I1028 16:42:13.685796 95610 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.326s user 2.157s sys 0.474s

That
said, the new method has better concurrency. I will remove it after testing.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2612
PS1, Line 2612: << " which already is alive from another container when "
> Likewise this can result in a lot of lock activity due to the number of ent
If we take the lock once per shard before inserting into that shard's sub-map, concurrency suffers, as I mentioned in https://gerrit.cloudera.org/c/14555/1/src/kudu/fs/log_block_manager.cc#2609. Test results are as follows (in seconds):

block count  current  lock once per sub-map
100,000      0.0938   0.0939
1,000,000    1.0485   1.1477
2,000,000    2.1233   2.5293
4,000,000    4.6847   5.447
10,000,000   13.9073  15.8155

--
To view, visit http://gerrit.cloudera.org:8080/14555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0d5c13e051a2c1d6cfd1c9ad7db8a3cd195459d
Gerrit-Change-Number: 14555
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Thu, 31 Oct 2019 08:12:07 +0000
Gerrit-HasComments: Yes
