Todd Lipcon has posted comments on this change.

Change subject: log_block_manager: switch from google::sparse_hash_map to sparsepp
......................................................................

Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8007/1//COMMIT_MSG
Commit Message:

> Forgot to ask: we originally switched to sparse_hash_map because of it's ex

It's slightly worse (maybe 10%) than sparse_hash_map but still way better than std::unordered_map. Generally the block map isn't the worst consumer of memory. The original commit said that 1M blocks took 9M with sparse_hash_map vs 24M with unordered_map. I'd expect this one to use maybe 11M, based on the graphs at https://github.com/greg7mdp/sparsepp/blob/master/bench.md

The bigger win of these maps vs unordered_map is that they don't allocate one big array for buckets. Instead they do a lot of smaller allocations, which helps avoid fragmentation issues, etc.

I'll add some color to the commit message.


Line 14: This improved startup time 7-8x on a real host with ~11M blocks:
> To be clear, this improvement is just from switching to sparsepp? Or with t

Just this. The copy-vs-move is a big win because the copies involved refcounts (i.e., atomic operations), and those probably inhibited a lot of memory load speculation, etc., with this random-access memory workload.


http://gerrit.cloudera.org:8080/#/c/8007/1/thirdparty/download-thirdparty.sh
File thirdparty/download-thirdparty.sh:

PS1, Line 328: SPARSEPP_PATCHLEVEL=0
             : delete_if_wrong_patchlevel $SPARSEPP_SOURCE $SPARSEPP_PATCHLEVEL
             : if [ ! -d "$SPARSEPP_SOURCE" ]; then
             :   fetch_and_expand sparsepp-${SPARSEPP_VERSION}.tar.gz
             :   pushd $SPARSEPP_SOURCE
             :   touch patchlevel-$SPARSEPP_PATCHLEVEL
             :   popd
             : fi
> If there's not one patch, you can omit the patchlevel-0 boilerplate, as wel

I thought we wanted to start putting these in so that if we add a patch in the future and then rebase back to this patchlevel 0, the script will be smart enough to rm the directory and rebuild.

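The patchlevel argument above can be sketched roughly as follows. This is only an illustration, assuming the helper checks for a patchlevel-N marker file and removes the directory when it is missing; the real delete_if_wrong_patchlevel and fetch_and_expand in download-thirdparty.sh may differ. The names sketch_delete_if_wrong_patchlevel and sketch_fetch and the temporary directory are hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-in for the real helper: remove a source directory
# unless it contains the marker file for the expected patchlevel.
sketch_delete_if_wrong_patchlevel() {
  local dir=$1
  local patchlevel=$2
  if [ -d "$dir" ] && [ ! -f "$dir/patchlevel-$patchlevel" ]; then
    echo "$dir is not at patchlevel $patchlevel; removing it" >&2
    rm -rf "$dir"
  fi
}

# Hypothetical stand-in for the fetch step plus the marker-file boilerplate.
sketch_fetch() {
  local dir=$1
  local patchlevel=$2
  sketch_delete_if_wrong_patchlevel "$dir" "$patchlevel"
  if [ ! -d "$dir" ]; then
    mkdir -p "$dir"                      # the real script downloads and unpacks here
    touch "$dir/patchlevel-$patchlevel"  # record which patchlevel this tree is at
  fi
}

# Walk through the scenario from the reply above.
work=$(mktemp -d)
src="$work/sparsepp"
sketch_fetch "$src" 0   # fresh fetch at patchlevel 0
sketch_fetch "$src" 1   # a patch is added later: stale tree is removed and refetched
sketch_fetch "$src" 0   # rebasing back to patchlevel 0 also forces a refetch
ls "$src"
rm -rf "$work"
```

Without the patchlevel-0 marker, the last step would have nothing to check, so a tree left over from patchlevel 1 would be kept as-is after rebasing back.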
http://gerrit.cloudera.org:8080/#/c/8007/1/thirdparty/vars.sh
File thirdparty/vars.sh:

Line 193: # (from https://github.com/toddlipcon/sparsepp)
> Why are we pulling from your fork of sparsepp and not the main repo? Have y

Yeah, but since then they pulled my commit in, so I'll revert to the upstream repo.


--
To view, visit http://gerrit.cloudera.org:8080/8007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7397f9cd418782caecf8b2dae2c7bfe2c0e6215c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes