Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: switch from google::sparse_hash_map to 
sparsepp
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8007/1//COMMIT_MSG
Commit Message:

> Forgot to ask: we originally switched to sparse_hash_map because of it's ex
Memory usage is slightly worse (maybe 10%) than sparse_hash_map but still way 
better than std::unordered_map. Generally the block map isn't the worst 
consumer of memory.

The original commit said that 1M blocks took 9M with sparse_hash_map vs 24M 
with unordered_map. I'd expect sparsepp to use maybe 11M based on the graphs at 
https://github.com/greg7mdp/sparsepp/blob/master/bench.md

The bigger win of these maps vs unordered_map is that they don't allocate a 
single big array for buckets. Instead they do a lot of smaller allocations. 
This helps avoid fragmentation issues, etc.

I'll add some color to the commit message.


Line 14: This improved startup time 7-8x on a real host with ~11M blocks:
> To be clear, this improvement is just from switching to sparsepp? Or with t
Just this. The copy-vs-move change is a big win because the copies involved 
refcounts (i.e. atomic operations), and those probably inhibited a lot of 
memory load speculation, etc., with this random-access memory workload.
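
To illustrate the copy-vs-move point, here is a minimal C++ sketch. It assumes 
std::shared_ptr and std::unordered_map as stand-ins for the refcounted pointer 
and map type actually used in the block manager; Block, BlockMap, InsertByCopy, 
and InsertByMove are hypothetical names, not Kudu code. Copying a shared 
pointer into the map bumps the reference count (typically an atomic increment), 
while moving it transfers ownership and leaves the count alone.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a refcounted block; not Kudu's actual type.
struct Block {
  explicit Block(long block_id) : id(block_id) {}
  long id;
};

// Assumption: a std::unordered_map keyed by block id stands in for the
// real block map.
using BlockMap = std::unordered_map<long, std::shared_ptr<Block>>;

// Inserts by copy. The map entry becomes a second owner, so the shared
// reference count is incremented. Returns the resulting owner count.
long InsertByCopy(BlockMap* map, const std::shared_ptr<Block>& block) {
  assert(block != nullptr);
  map->emplace(block->id, block);
  return block.use_count();
}

// Inserts by move. Ownership is handed to the map entry, so the reference
// count is not touched. Returns the resulting owner count.
long InsertByMove(BlockMap* map, std::shared_ptr<Block> block) {
  assert(block != nullptr);
  const long id = block->id;  // Read the key before the pointer is moved from.
  map->emplace(id, std::move(block));
  return map->at(id).use_count();
}
```

With a freshly created pointer, InsertByCopy returns 2 (the caller plus the 
map), whereas InsertByMove, called with std::move on the caller's pointer, 
returns 1 (the map only).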


http://gerrit.cloudera.org:8080/#/c/8007/1/thirdparty/download-thirdparty.sh
File thirdparty/download-thirdparty.sh:

PS1, Line 328: SPARSEPP_PATCHLEVEL=0
             : delete_if_wrong_patchlevel $SPARSEPP_SOURCE $SPARSEPP_PATCHLEVEL
             : if [ ! -d "$SPARSEPP_SOURCE" ]; then
             :   fetch_and_expand sparsepp-${SPARSEPP_VERSION}.tar.gz
             :   pushd $SPARSEPP_SOURCE
             :   touch patchlevel-$SPARSEPP_PATCHLEVEL
             :   popd
             : fi
> If there's not one patch, you can omit the patchlevel-0 boilerplate, as wel
I thought we wanted to start putting these in so that, if in the future we add 
a patch, and then rebase back to this patchlevel 0, it will be smart enough to 
rm and rebuild.


http://gerrit.cloudera.org:8080/#/c/8007/1/thirdparty/vars.sh
File thirdparty/vars.sh:

Line 193: # (from https://github.com/toddlipcon/sparsepp)
> Why are we pulling from your fork of sparsepp and not the main repo? Have y
Yeah, but since then they've pulled my commit in, so I'll switch back to 
upstream.


-- 
To view, visit http://gerrit.cloudera.org:8080/8007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7397f9cd418782caecf8b2dae2c7bfe2c0e6215c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes