Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/14555 )
Change subject: KUDU-2977 sharding block map to speed up tserver startup
......................................................................

Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14555/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14555/1//COMMIT_MSG@11
PS1, Line 11:
> Have you run the benchmark multiple times? Are the numbers stable? Could yo
Yes, I ran the benchmark multiple times and the numbers are pretty stable. I ran the StartupBenchmark test on 2 servers and compared the old and new versions; the 'real time' cost logged by LOG_TIMING is as follows (in seconds):

Server 1:
block count    old      new
100,000        0.0881   0.0938
1,000,000      1.177    1.0485
2,000,000      2.5675   2.1233
4,000,000      6.0655   4.6847
10,000,000     16.9558  13.9073

Server 2:
block count    old      new
100,000        0.0939   0.1006
1,000,000      1.3432   1.222
2,000,000      2.9359   2.6752
4,000,000      6.8357   5.3497
10,000,000     19.8975  14.8417

As the block count grows, startup performance improves more with the new version.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc@1001
PS1, Line 1001: // - minimal number of containers, each of which is entirely full
> Remove this.
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager-test.cc@1017
PS1, Line 1017:
> Nit: prepend with /* force= */ so it's more clear what this argument means.
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@311
PS1, Line 311: bool AddLogBlock(LogBlockRefPtr lb);
> By renaming this it becomes an overload of the existing AddLogBlock, which
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@408
PS1, Line 408:
> If the size of each of these three vectors is the same, I'd replace them wi
Done

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.h@410
PS1, Line 410: struct ManagedBlocks {
> Why do the entries here and in blocks_by_block_id_arr_ need shared ownershi
unique_ptr is OK. Done.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@1861
PS1, Line 1861: static const uint64_t kBlockMapMask = kBlockMapChunk - 1;
> Would be interesting to see how performance (and overhead) changes as this
I also ran StartupBenchmark on 2 servers with different kBlockMapChunk values; the time cost is as follows (in seconds):

sharding shift  server1  server2
1               5.383    6.2373
2               5.1782   5.5581
3               4.9536   5.4049
4               4.2259   5.3289
5               4.8093   5.5196
6               4.983    5.5974
7               5.5863   6.0348

So I think 4 shift bits (16 shards) is best, and I changed it to '1 << 4'.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2288
PS1, Line 2288: Status s = RemoveLogBlock(block_id, &lb);
> If block_ids is large (i.e. we're deleting a large tablet), this will resul
I did some benchmarking of RemoveLogBlocks (in LogBlockManagerTest.StartupBenchmark I added some code to remove batches of blocks in a loop); the difference is very small.
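For context, the two batch-removal strategies compared in the results below ("lock once" per shard vs "lock N times", once per block) can be sketched over a toy sharded map like this. All names here are my illustrative stand-ins, not Kudu's actual code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Toy sharded map: the low 4 bits of the block id pick the shard,
// mirroring the kBlockMapChunk = 1 << 4 choice above.
constexpr uint64_t kNumShards = 1 << 4;
constexpr uint64_t kShardMask = kNumShards - 1;

struct Shard {
  std::mutex lock;
  std::unordered_map<uint64_t, int> map;
};
using Shards = std::array<Shard, kNumShards>;

// "Lock once": one pass per shard, holding that shard's lock while
// scanning the whole batch for ids belonging to it. Fewer lock/unlock
// operations, but each critical section is long.
size_t RemoveLockOnce(Shards& shards, const std::vector<uint64_t>& ids) {
  size_t removed = 0;
  for (uint64_t i = 0; i < kNumShards; ++i) {
    std::lock_guard<std::mutex> l(shards[i].lock);
    for (uint64_t id : ids) {
      if ((id & kShardMask) != i) continue;
      removed += shards[i].map.erase(id);
    }
  }
  return removed;
}

// "Lock N times": lock the owning shard once per block. More lock
// operations, but each critical section is tiny, so concurrent
// removers on other shards are blocked only briefly.
size_t RemoveLockPerBlock(Shards& shards, const std::vector<uint64_t>& ids) {
  size_t removed = 0;
  for (uint64_t id : ids) {
    Shard& s = shards[id & kShardMask];
    std::lock_guard<std::mutex> l(s.lock);
    removed += s.map.erase(id);
  }
  return removed;
}
```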
Test results are as follows (time cost per batch, in seconds):

batch size  old version  lock once   lock N times
1,000       0.00384425   0.00398775  0.0040495
5,000       0.0188475    0.0197075   0.01955875
10,000      0.037975     0.039765    0.0388825
20,000      0.076725     0.083855    0.08087
40,000      0.15567      0.16672     0.16199
100,000     0.407225     0.414       0.395775

* lock once:

  for (int i = 0; i < kBlockMapChunk; ++i) {
    std::lock_guard<simple_spinlock> l(*managed_blocks_[i].lock);
    for (const auto& block_id : block_ids) {
      int index = block_id.id() & kBlockMapMask;
      if (index != i) continue;
      LogBlockRefPtr lb;
      Status s = RemoveLogBlock(i, block_id, &lb);
      ...
    }
  }

* lock N times: the current code

There may also be the same situation as what I mentioned in https://gerrit.cloudera.org/c/14555/1/src/kudu/fs/log_block_manager.cc#2609: it's common to remove several tablets at once, e.g. when removing a range partition that has many hash partitions, or dropping a table. If we hold a lock while doing something that lasts a long time, threads that arrive later will wait in a queue to acquire the lock, so I think keeping each locking period shorter is better.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2609
PS1, Line 2609: if (!AddLogBlock(std::move(e.second))) {
> This isn't super helpful as what we really care about is total startup time
I use it to compare with the old version.
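To illustrate why per-shard locking gives the better concurrency visible in the timings below, here is a toy, self-contained sketch of concurrent insertion into a sharded map. All names are mine, not Kudu's actual code:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Toy demonstration of why per-shard locking lets concurrent
// AddLogBlock-style inserts proceed mostly in parallel: ids spread
// across shards, so threads rarely contend on the same lock.
class ShardedMap {
 public:
  static constexpr uint64_t kShards = 1 << 4;

  // Insert under the owning shard's lock only; returns false on duplicate.
  bool Add(uint64_t id, uint64_t v) {
    Shard& s = shards_[id & (kShards - 1)];
    std::lock_guard<std::mutex> l(s.lock);
    return s.map.emplace(id, v).second;
  }

  size_t size() {
    size_t n = 0;
    for (Shard& s : shards_) {
      std::lock_guard<std::mutex> l(s.lock);
      n += s.map.size();
    }
    return n;
  }

 private:
  struct Shard {
    std::mutex lock;
    std::unordered_map<uint64_t, uint64_t> map;
  };
  std::array<Shard, kShards> shards_;
};

// Several "loader" threads insert disjoint id ranges concurrently,
// analogous to one AddLogBlock caller per data directory.
size_t ConcurrentInsert(ShardedMap& m, int num_threads, uint64_t per_thread) {
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&m, t, per_thread] {
      for (uint64_t i = 0; i < per_thread; ++i) {
        m.Add(static_cast<uint64_t>(t) * per_thread + i, i);
      }
    });
  }
  for (std::thread& th : threads) th.join();
  return m.size();
}
```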
When I ran LogBlockManagerTest.StartupBenchmark with 10,000,000 blocks, the old version logged like:

I1028 16:46:50.621170 4339 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 1.288s user 1.113s sys 0.175s
I1028 16:46:51.779841 4346 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 2.317s user 1.018s sys 0.140s
I1028 16:46:52.913904 4340 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 3.426s user 0.942s sys 0.187s
I1028 16:46:54.350420 4342 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 4.832s user 1.266s sys 0.171s
I1028 16:46:55.632331 4338 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 6.167s user 1.026s sys 0.253s
I1028 16:46:56.918051 4343 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 7.442s user 1.043s sys 0.244s
I1028 16:46:58.794180 4345 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 9.376s user 1.659s sys 0.218s
I1028 16:47:00.056476 4344 log_block_manager.cc:2588] Time spent AddLogBlocks...: real 10.535s user 1.045s sys 0.216s

and the new version logged like:

I1028 16:42:13.491647 95613 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.268s user 2.292s sys 0.574s
I1028 16:42:13.502215 95615 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.321s user 2.078s sys 0.532s
I1028 16:42:13.609458 95611 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.256s user 2.112s sys 0.540s
I1028 16:42:13.649394 95612 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.295s user 2.046s sys 0.579s
I1028 16:42:13.655992 95614 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.322s user 2.333s sys 0.520s
I1028 16:42:13.659255 95616 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.476s user 2.228s sys 0.556s
I1028 16:42:13.674238 95617 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.208s user 2.206s sys 0.520s
I1028 16:42:13.685796 95610 log_block_manager.cc:2609] Time spent AddLogBlock...: real 4.326s user 2.157s sys 0.474s

That
said, the new method has better concurrency. I will remove it after testing.

http://gerrit.cloudera.org:8080/#/c/14555/1/src/kudu/fs/log_block_manager.cc@2612
PS1, Line 2612: << " which already is alive from another container when "
> Likewise this can result in a lot of lock activity due to the number of ent
If we take the lock once per shard before inserting into that shard's sub-map, concurrency suffers, as I mentioned in https://gerrit.cloudera.org/c/14555/1/src/kudu/fs/log_block_manager.cc#2609. Test results are as follows (in seconds):

block count  current  lock once per sub-map
100,000      0.0938   0.0939
1,000,000    1.0485   1.1477
2,000,000    2.1233   2.5293
4,000,000    4.6847   5.447
10,000,000   13.9073  15.8155

--
To view, visit http://gerrit.cloudera.org:8080/14555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0d5c13e051a2c1d6cfd1c9ad7db8a3cd195459d
Gerrit-Change-Number: 14555
Gerrit-PatchSet: 2
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Thu, 31 Oct 2019 08:12:07 +0000
Gerrit-HasComments: Yes
