Re: better malloc strategies

Howard Chu Wed, 22 Nov 2006 17:06:27 -0800

David Boreham wrote:

Howard Chu wrote:

Kinda interesting - with hoard this shows us processing 23000 entries per second in the single-threaded case, vs only 3521 per second with glibc malloc.


It is possible that you are seeing the syndrome that I wrote about here:
http://www.bozemanpass.com/info/linux/malloc/Linux_Heap_Contention.html
AFAIK the poorly behaving code is still present in today's glibc malloc.
(the problem affects UMich-derived LDAP servers because entries in
the cache comprise a significant proportion of the heap traffic, and they
tend to get allocated by a different thread than frees them when the
cache fills up). malloc exhibits lock contention when blocks are freed
by a different thread than allocated them (more exactly when the threads
hash to different arenas).

Yes, that's a factor I had considered. One of the approaches I had tested was tagging each entry with the thread ID that allocated it, and deferring the frees until the owning thread comes back to dispose of them. I wasn't really happy with this approach because it meant the entry cache size would not be strictly regulated, though on a busy server it's likely that all of the frees would happen in reasonably short time.
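For readers who want to picture the deferred-free idea, here is a minimal sketch in C. It is not the code that was tested; the names (entry, defer_list, entry_release, entry_drain), the fixed MAX_THREADS slot table, and the use of a small integer slot index in place of a real thread ID are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16          /* hypothetical fixed number of worker slots */

/* Hypothetical cache entry that remembers which worker slot allocated it. */
typedef struct entry {
    int owner;                  /* slot index of the allocating thread */
    struct entry *next;         /* link used only while parked on a deferred list */
} entry;

/* One deferred-free list per worker slot, each with its own lock. */
typedef struct {
    pthread_mutex_t lock;
    entry *head;
    size_t count;               /* entries parked here, i.e. memory not yet freed */
} defer_list;

static defer_list deferred[MAX_THREADS];

void defer_init(void) {
    for (int i = 0; i < MAX_THREADS; i++) {
        pthread_mutex_init(&deferred[i].lock, NULL);
        deferred[i].head = NULL;
        deferred[i].count = 0;
    }
}

/* Allocate an entry and tag it with the caller's slot. */
entry *entry_alloc(int self) {
    assert(self >= 0 && self < MAX_THREADS);
    entry *e = malloc(sizeof *e);
    if (e) {
        e->owner = self;
        e->next = NULL;
    }
    return e;
}

/* Any thread may release an entry. The owner frees it directly;
 * everyone else parks it on the owner's list instead of calling free(). */
void entry_release(entry *e, int self) {
    assert(e != NULL);
    if (e->owner == self) {
        free(e);
        return;
    }
    defer_list *d = &deferred[e->owner];
    pthread_mutex_lock(&d->lock);
    e->next = d->head;
    d->head = e;
    d->count++;
    pthread_mutex_unlock(&d->lock);
}

/* The owning thread calls this when it next comes through.
 * It detaches the whole list under the lock, then frees outside it.
 * Returns the number of entries freed. */
size_t entry_drain(int self) {
    assert(self >= 0 && self < MAX_THREADS);
    defer_list *d = &deferred[self];
    pthread_mutex_lock(&d->lock);
    entry *e = d->head;
    d->head = NULL;
    d->count = 0;
    pthread_mutex_unlock(&d->lock);

    size_t n = 0;
    while (e) {
        entry *next = e->next;
        free(e);
        e = next;
        n++;
    }
    return n;
}
```

The count field makes the drawback visible: anything parked there has left the cache but is still allocated until the owner calls entry_drain(), so the real memory footprint can exceed the configured cache size.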

Still, free() contention alone doesn't explain the huge performance gap in the single-threaded case. However, judging from the fact that the very first glibc search often runs faster than the second and subsequent ones, I'd say that there are other significant costs in glibc free(). The most likely one is that glibc returns memory to the OS too frequently. Hoard and Umem also return memory to the OS, but they keep more extensive free lists. Tcmalloc never returns memory to the OS.
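One way to probe that hypothesis (not something done in the tests above) would be to tell glibc to hold on to freed memory and see whether the gap narrows. The sketch below uses glibc's mallopt() with M_TRIM_THRESHOLD and M_TOP_PAD; the helper name keep_freed_memory and any values passed to it are assumptions chosen for illustration, not tuned recommendations.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_TOP_PAD */

/* Hypothetical helper: ask glibc malloc to keep more freed memory
 * instead of trimming the heap back to the OS.
 *   trim_threshold: free space allowed at the top of the heap before trimming
 *   top_pad:        extra space requested each time the heap is grown
 * Returns 1 if both settings were accepted, 0 otherwise. */
int keep_freed_memory(int trim_threshold, int top_pad) {
    assert(trim_threshold >= 0 && top_pad >= 0);
    /* mallopt() returns 1 on success and 0 on error. */
    int ok = mallopt(M_TRIM_THRESHOLD, trim_threshold);
    ok &= mallopt(M_TOP_PAD, top_pad);
    return ok;
}
```

Calling something like keep_freed_memory(256 * 1024 * 1024, 16 * 1024 * 1024) early in startup, before the heavy allocation begins, and rerunning the same searches would show how much of the slowdown comes from trimming rather than from free() itself.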

What's also interesting is that for Hoard, Umem, and Tcmalloc, the multi-threaded query times are consistently about 2x slower than the single-threaded case. The 2x slowdown makes sense since it's only a dual-core CPU and it's doing 4x as much work. This kinda says that the cost of malloc is overshadowed by the overhead of thread scheduling.

But for glibc the multi-threaded times are actually faster than the single-threaded case. Since glibc's costs are being partially hidden it seems that the multiple threads are actually getting some benefit from the entry cache here. Still the difference between glibc and the other allocators is so dramatic that this little difference is just academic.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/
