Howard Chu writes:
> The obvious fix is to adopt the same strategies that tcmalloc uses. (And
> unfortunately we can't simply rely on tcmalloc always being available, or
> always being stable in a given environment.)

Good, though I'd like to see these slapd re-implementations of system
features (like malloc) #ifdeffed with a fallback to the system feature.
Then one can compile with -D<revert to system feature> either when that
one is as good or better than slapd's, or to simplify debugging.
Configure can guess about it too, e.g. it can detect tcmalloc.

The new entry_free() plus tcmalloc may be better than plain tcmalloc,
I don't know.  It retains the global mutex though, which presumably is
or someday will be a pessimization compared to _some_ malloc out there.

> I.e., use per-thread cached free
> lists. We maintain some small number of free objects per thread; this
> per-thread free list can be used without locking. When the number of free
> objects on a given thread exceeds a particular threshold

...or there is no thread key for the mutex (e.g. when the current thread
is not from the thread pool)...

Might be convenient to let slapd register init-thread and cleanup-thread
functions in the thread pool.  These could create/destroy these mutexes,
and maybe some other per-thread slapd variables too.  (Preferably the
init function would be able to fail and cause the pool thread to die,
but that'd mess up the pool logic which assumes once a thread has been
created it will be able to handle submitted tasks.  Except slapd often
doesn't check for malloc/mutex_init success anyway, so demanding success
would be no worse than what slapd does now.)

> then we obtain the
> global lock to return some number of objects to the global list.
>
> In practice this threshold can be very small - any given thread typically
> needs no more than 4 entries at a time. (ModDN is the worst case at 3 entries
> locked at once. LDAP TXNs would distort this figure but not in any critical
> fashion.) For attributes the typical usage is much more variable, but any
> number we pick will be an improvement over the current code.

Add a few more for overlays, in particular syncrepl.
Otherwise even a single overlay doing entry_dup() reduces performance.

-- 
Hallvard
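
To make the scheme discussed above concrete, here is a rough sketch of a per-thread cached free list with a global fallback list, written against plain pthreads. It is not slapd code: the names (obj, obj_alloc, obj_free, LOCAL_MAX, USE_SYSTEM_MALLOC) and the threshold values are made up for illustration, and a real version would presumably go through the ldap_pvt_thread wrappers and the thread pool's keys instead. Compiling with -DUSE_SYSTEM_MALLOC gives the kind of fallback to the system allocator suggested earlier in this message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object type; a real one would carry a payload. */
typedef struct obj { struct obj *next; } obj;

#ifdef USE_SYSTEM_MALLOC
/* Fallback: no caching at all, just the system allocator. */
obj *obj_alloc(void) { return calloc(1, sizeof(obj)); }
void obj_free(obj *o) { free(o); }
#else
#define LOCAL_MAX 4              /* assumed per-thread threshold */
#define LOCAL_KEEP (LOCAL_MAX/2) /* objects kept locally after a flush */

typedef struct { obj *head; int count; } local_cache;

static obj *global_free;         /* protected by global_mutex */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_key_t cache_key;
static pthread_once_t cache_once = PTHREAD_ONCE_INIT;
static int cache_key_ok;

/* Move all but `keep` cached objects to the global list, under the lock. */
static void cache_flush(local_cache *lc, int keep) {
    pthread_mutex_lock(&global_mutex);
    while (lc->count > keep) {
        obj *o = lc->head;
        lc->head = o->next;
        lc->count--;
        o->next = global_free;
        global_free = o;
    }
    pthread_mutex_unlock(&global_mutex);
}

/* Thread-exit destructor: hand everything back, then drop the cache. */
static void cache_destroy(void *p) {
    cache_flush(p, 0);
    free(p);
}

static void cache_key_init(void) {
    cache_key_ok = (pthread_key_create(&cache_key, cache_destroy) == 0);
}

/* Returns this thread's cache, or NULL if none could be set up. */
static local_cache *cache_get(void) {
    local_cache *lc;
    pthread_once(&cache_once, cache_key_init);
    if (!cache_key_ok) return NULL;
    lc = pthread_getspecific(cache_key);
    if (lc == NULL) {
        lc = calloc(1, sizeof(*lc));
        if (lc != NULL && pthread_setspecific(cache_key, lc) != 0) {
            free(lc);
            lc = NULL;
        }
    }
    return lc;
}

obj *obj_alloc(void) {
    local_cache *lc = cache_get();
    obj *o = NULL;
    if (lc != NULL && lc->head != NULL) {   /* fast path, no locking */
        o = lc->head;
        lc->head = o->next;
        lc->count--;
    } else {                                /* slow path: global list */
        pthread_mutex_lock(&global_mutex);
        if (global_free != NULL) {
            o = global_free;
            global_free = o->next;
        }
        pthread_mutex_unlock(&global_mutex);
        if (o == NULL) o = malloc(sizeof(*o));
    }
    if (o != NULL) o->next = NULL;
    return o;
}

void obj_free(obj *o) {
    local_cache *lc = cache_get();
    if (o == NULL) return;
    if (lc == NULL) {            /* no per-thread cache: use global list */
        pthread_mutex_lock(&global_mutex);
        o->next = global_free;
        global_free = o;
        pthread_mutex_unlock(&global_mutex);
        return;
    }
    o->next = lc->head;
    lc->head = o;
    if (++lc->count > LOCAL_MAX) cache_flush(lc, LOCAL_KEEP);
}

/* Helpers for inspection only. */
int local_count(void) {
    local_cache *lc = cache_get();
    return lc != NULL ? lc->count : 0;
}

int global_count(void) {
    int n = 0;
    pthread_mutex_lock(&global_mutex);
    for (obj *o = global_free; o != NULL; o = o->next) n++;
    pthread_mutex_unlock(&global_mutex);
    return n;
}
#endif
```

In this sketch a thread without a usable cache simply falls through to the global list, which corresponds to the "no thread key" case mentioned above. The pthread key destructor stands in for the cleanup-thread function; an init-thread hook in the pool could create the cache up front instead of lazily in cache_get().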