Todd Lipcon has submitted this change and it was merged.

Change subject: thirdparty: patch tcmalloc to improve AllocLarge performance

thirdparty: patch tcmalloc to improve AllocLarge performance

This pulls in an upstream tcmalloc pull request[1] which addresses O(n)
behavior in the large allocation path. Without this patch, each large
allocation (>=1MB) does a linear scan of all large free spans in the heap
to find the best fit, which is very expensive, especially as the heap
grows more fragmented over time.
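
To illustrate the difference, here is a minimal sketch of a linear best-fit
scan next to an ordered lookup. This is not the upstream patch itself: Span,
SpanLess, BestFitLinear and BestFitOrdered are hypothetical names made up for
this example, and keeping free spans in a std::set ordered by (length, start)
is an assumption about one way to avoid the scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a free span: a run of contiguous pages.
struct Span {
  std::uintptr_t start;  // first page number of the run
  std::size_t length;    // number of pages in the run
};

// Order by length, then by start address, so that the first element not
// smaller than a request is the best fit at the lowest address.
struct SpanLess {
  bool operator()(const Span& a, const Span& b) const {
    if (a.length != b.length) return a.length < b.length;
    return a.start < b.start;
  }
};

// O(n): visit every free span and remember the smallest one that fits.
const Span* BestFitLinear(const std::vector<Span>& free_spans,
                          std::size_t want_pages) {
  assert(want_pages > 0);
  const Span* best = nullptr;
  for (const Span& s : free_spans) {
    if (s.length < want_pages) continue;  // too small for the request
    if (best == nullptr || SpanLess()(s, *best)) best = &s;
  }
  return best;
}

// O(log n): a single lower_bound on the ordered set finds the same span.
const Span* BestFitOrdered(const std::set<Span, SpanLess>& free_spans,
                           std::size_t want_pages) {
  assert(want_pages > 0);
  auto it = free_spans.lower_bound(Span{0, want_pages});
  return it == free_spans.end() ? nullptr : &*it;
}
```

Both functions return the same span for the same set of free spans. The
ordered version pays O(log n) on every insert and erase instead of O(n) on
every large allocation, which is the better trade when large allocations are
frequent and the free list is long.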

I tested this using YCSB workload C (100% random read) on an in-memory dataset
(10M rows on 6 nodes). As noted in KUDU-1465, scanners currently always
allocate 1MB buffers as a starting point even if the scan will only return a
small amount of data. These 1MB allocations hit the AllocLarge path heavily,
and without this patch, AllocLarge was the top CPU consumer.

Prior to this fix, the workload managed about 20k reads/second. With the
fix, the workload averaged around 85k reads/second, roughly a 4x improvement.

We should still fix KUDU-1465 to allocate smaller buffers, since small
allocations will always be faster (and less wasteful), but with this tcmalloc
fix, the performance issue is much less pronounced. The fix will also likely
help any other places that we might be making large allocations.


Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <>
M thirdparty/
3 files changed, 571 insertions(+), 51 deletions(-)

  Kudu Jenkins: Verified
  Dan Burkert: Looks good to me, approved

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4abffbd1cb02f99ebda1628d98e5c342ccb7f0f9
Gerrit-Change-Number: 9392
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <>
Gerrit-Reviewer: Dan Burkert <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <>