I am sponsoring this case for Rick Weisner.
Requested release binding: Patch
Modified man pages are in the case's materials directory and diffs
are at the end of this proposal.
Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI
This information is Copyright (c) 2010, Oracle and/or its affiliates. All
rights reserved.
1. Introduction
1.1. Project/Component Working Name:
Performance Improvements for libmtmalloc
1.2. Name of Document Author/Supplier:
Author: Rick Weisner
1.3 Date of This Document:
08 June, 2010
4. Technical Description
Template Version: @(#)sac_nextcase 1.70 05/10/10 SMI
1.3 Date of This Document:
01 June, 2010
4. Technical Description
SUMMARY
Under the following two situations libmtmalloc has shown
poor scalability.
1. When there are large numbers of allocating threads.
(see CR6922229)
and
2. When the allocation size is larger than 64 KB.
(see CR6555149)
We will remedy the above scalability issues by:
1) Using atomic operations to eliminate the cache lock in
libmtmalloc.
2) Providing a mechanism whereby the parent lock can also
be eliminated for threads whose ID is less than 2 * the number
of CPUs.
3) Making the maximum cacheable request size tunable via an
environment variable.
BACKGROUND
libmtmalloc organizes available address space into buckets.
Each thread that calls malloc is assigned a bucket based
upon its thread ID. The per-bucket parent lock controls
the use of each bucket. Each bucket is a list of caches
based on size. Each list is protected by a cache lock.
Applications with a large number of allocating threads may
have their performance limited by contention for these locks.
Applications of this sort are not unusual in the Telco space.
Larger allocation sizes are also becoming more common. With
64-bit applications, terabytes of memory, and hundreds of
threads, it is advantageous to be able to adjust the
maximum cacheable request size to better suit the needs
of the application.
PROBLEM
A customer's application did not perform as needed on a
Netra T5440. DTrace indicated lock contention relating to
memory allocation in libmtmalloc. The customer supplied
code that gave dramatic performance increases by
eliminating the "cache" locks and "parent" locks from
libmtmalloc and replacing them with atomic operations.
The customer's code was not thread-safe in general but was
promising.
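As an illustration of what replacing a cache lock with atomic
operations can look like, the following is a minimal sketch. It is
not the customer's code and not the libmtmalloc implementation. It
assumes a hypothetical cache whose free blocks are tracked in a
32-bit bitmap, and it uses C11 <stdatomic.h> rather than the Solaris
atomic_ops(3C) routines. The names cache_sketch_t, cache_claim and
cache_release are invented for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical cache header: one bit per block, 1 = free.
 * The real libmtmalloc cache layout is not shown in this case.
 */
typedef struct {
	_Atomic uint32_t free_mask;
} cache_sketch_t;

/*
 * Claim one free block without taking a lock.
 * Returns the block index (0..31), or -1 if no block is free.
 */
static int
cache_claim(cache_sketch_t *cp)
{
	uint32_t old = atomic_load(&cp->free_mask);

	while (old != 0) {
		uint32_t bit = old & (~old + 1);	/* lowest set bit */

		/* On failure, old is reloaded with the current mask. */
		if (atomic_compare_exchange_weak(&cp->free_mask, &old,
		    old & ~bit)) {
			int idx = 0;

			while ((bit >>= 1) != 0)
				idx++;
			return (idx);
		}
	}
	return (-1);
}

/* Return a block to the cache by setting its bit again. */
static void
cache_release(cache_sketch_t *cp, int idx)
{
	atomic_fetch_or(&cp->free_mask, (uint32_t)1 << idx);
}
```

A thread that loses the compare-and-swap race simply retries with
the refreshed mask, so no thread ever blocks while holding the cache.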
In a different case the customer states:
We observed that db is hitting oversize_lock mutex due to the
memory needed to be allocated is more than MAX_CACHED.
Sometimes acquiring the oversize_lock mutex is taking more
than 2sec, causing the db performance to degrade. (see 6555149)
PROPOSAL
1) Eliminate the cache lock by using atomic operations.
2) Add a new option to mallocctl(3MALLOC) that activates
the use of exclusive buckets for threads whose ID is < 2 *
the number of CPUs.
The value argument associated with the mallocctl option is
ignored.
The use of exclusive buckets can also be activated by setting
an environment variable named MTEXCLUSIVE.
The environment variable is needed for situations where the
source code is unavailable. It will also assist in performance
analysis.
Once the option has been set there is no facility
to 'unset' it.
3) Introduce the environment variable MTMAXCACHE, which
sets the maximum request size that is cached. It accepts
values from 16 to 21. The default is 16, which means that requests
smaller than 2^^16 bytes (64 KB) are cached. A value of 21 supports
cached request sizes of up to 2 MB (2^^21).
If MTMAXCACHE is set to a value outside this range, the
nearer bound is used instead: 16 for smaller values and 21 for
larger ones.
It is necessary to use an environment variable instead of
a mallocctl interface because the MTMAXCACHE value must be determined
before malloc_init calls setup_caches.
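The MTMAXCACHE range handling described in item 3 could be sketched
roughly as follows. This is an illustration under stated assumptions,
not the code under review: the helper names and the MTMAXCACHE_MIN
and MTMAXCACHE_MAX macros are hypothetical, and falling back to the
default for a non-numeric value is an assumption, since the case only
specifies the behavior for out-of-range numbers.

```c
#include <stdlib.h>

#define	MTMAXCACHE_MIN	16	/* default: cache requests below 2^^16 */
#define	MTMAXCACHE_MAX	21	/* upper bound: 2^^21, i.e. 2 MB */

/*
 * Turn the text of MTMAXCACHE into an exponent in [16, 21].
 * A missing or non-numeric value yields the default (an assumption).
 */
static int
parse_maxcache(const char *val)
{
	char *end;
	long n;

	if (val == NULL)
		return (MTMAXCACHE_MIN);
	n = strtol(val, &end, 10);
	if (end == val)
		return (MTMAXCACHE_MIN);
	if (n < MTMAXCACHE_MIN)
		return (MTMAXCACHE_MIN);
	if (n > MTMAXCACHE_MAX)
		return (MTMAXCACHE_MAX);
	return ((int)n);
}

/* Cacheable request size limit in bytes for the current environment. */
static size_t
max_cached_from_env(void)
{
	return ((size_t)1 << parse_maxcache(getenv("MTMAXCACHE")));
}
```

Reading the variable once, before any cache is laid out, mirrors the
ordering constraint given above for malloc_init and setup_caches.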
DETAILS
The code has been developed and tested in 64-bit mode on
Solaris 10 u6 on a Netra T5440. The test harness uses a
configurable number of allocation threads, a configurable
sample count, and a configurable "maximum" allocation size.
Each allocation thread performs a configurable number of random
or fixed-size allocations between 8 bytes and the requested "max"
allocation size + 1/2 the "max" allocation size.
A freeing thread then releases the allocations while the
allocating thread performs a fresh set of allocations.
In initial testing with "stock" libmtmalloc it was possible to do
6300 64-bit operations per second on the T5440. With the "atomic"
library this increases to 15000, about 2.4 times the stock rate.
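The size selection used by such a harness might look like the sketch
below. The actual test harness is not part of this case, so everything
here is an assumption: the function names, the xorshift generator, and
the choice that "fixed" mode always returns the upper bound.

```c
#include <stddef.h>
#include <stdint.h>

#define	MIN_ALLOC	8	/* smallest request size, in bytes */

/*
 * Small xorshift generator so each thread can keep its own state.
 * The state must be seeded with a non-zero value.
 */
static uint32_t
next_rand(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return (x);
}

/*
 * Return a request size in [8, max + max/2].
 * With fixed != 0 the upper bound itself is returned every time.
 */
static size_t
pick_size(size_t max, int fixed, uint32_t *state)
{
	size_t hi = max + max / 2;

	if (hi < MIN_ALLOC)
		hi = MIN_ALLOC;
	if (fixed)
		return (hi);
	return (MIN_ALLOC + next_rand(state) % (hi - MIN_ALLOC + 1));
}
```

Keeping the generator state per thread avoids adding a shared lock to
the harness itself, which would otherwise distort the measurement.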
COMMENTS
Exported Interfaces:
MTEXCLUSIVE    Committed    Option for mallocctl(3MALLOC).
MTEXCLUSIVE    Committed    Shell environment variable. If set,
                            the effect is the same as if
                            mallocctl was called with the
                            option MTEXCLUSIVE.
MTMAXCACHE     Committed    Shell environment variable. If set,
                            the value sets the maximum cacheable
                            request size to 2^^MTMAXCACHE.
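To make the effect of MTEXCLUSIVE concrete, here is a rough sketch of
bucket selection with and without the option. The bucket counts
(2*NCPUS by default, 4*NCPUS when exclusive) and the "thread ID less
than 2*NCPUS" rule come from the man page text in this case. The
modulo mapping, the placement of the shared buckets in the upper half,
and the name pick_bucket are assumptions made only for illustration.

```c
/*
 * Pick a bucket index for a thread.
 * Default: 2*ncpus shared buckets, chosen by thread ID.
 * Exclusive mode: 4*ncpus buckets; IDs below 2*ncpus each own one.
 * *needs_lock tells the caller whether the parent lock is required.
 */
static unsigned
pick_bucket(unsigned tid, unsigned ncpus, int exclusive, int *needs_lock)
{
	unsigned shared = 2 * ncpus;

	if (exclusive && tid < shared) {
		/* Only this thread ever maps here, so no lock is needed. */
		*needs_lock = 0;
		return (tid);
	}
	*needs_lock = 1;
	if (exclusive)
		return (shared + tid % shared);	/* assumed: upper half */
	return (tid % shared);
}
```

Under these assumptions, on a 4-CPU system thread 3 owns bucket 3 in
exclusive mode, while thread 13 shares bucket 13 with any other thread
whose ID maps to it and still takes the parent lock.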
Reference:
6922229 libmtmalloc would benefit from atomic operations
6555149 poor performance with libmtmalloc compared to libc
6956786 Provide a tunable to tweak the MAX_CACHED threshold
in libmtmalloc
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
Man page diffs:
*** libmtmalloc.man Thu Jun 3 15:46:52 2010
--- new_libmtmalloc.man Thu Jun 3 16:09:44 2010
***************
*** 28,34 ****
--- 28,58 ----
mallocctl memalign
realloc valloc
+ ENVIRONMENT VARIABLES
+ MTEXCLUSIVE By default, libmtmalloc allocates 2*NCPUS
+ buckets from which allocations occur.
+ Threads share buckets based on their thread
+ id. If MTEXCLUSIVE is enabled, then 4*NCPUS
+ buckets are used. Threads with thread id less
+ than 2*NCPUS receive an exclusive bucket and
+ thus do not need to use locks. Allocation
+ performance for these buckets may be dramatically
+ increased. Once enabled, MTEXCLUSIVE cannot be
+ disabled. This feature can be enabled by
+ setting the environment variable MTEXCLUSIVE to
+ any value. Alternatively, it can be enabled by
+ a call to mallocctl (see mallocctl(3MALLOC)).
+ MTMAXCACHE By default, allocations less than 2^^16 bytes
+ are allocated from buckets indexed by thread id.
+ Using this environment variable, the size limit for
+ cached allocations can be raised to 2^^17,
+ 2^^18, 2^^19, 2^^20, or 2^^21 bytes by
+ setting MTMAXCACHE to 17, 18, 19, 20, or 21.
+ If MTMAXCACHE is set to less than 16, it is
+ reset to 16. If MTMAXCACHE is set to more than
+ 21, it is reset to 21. This all occurs
+ silently.
FILES
/usr/lib/libmtmalloc.so.1
*** mallocctl.man Thu Jun 3 15:37:18 2010
--- new_mallocctl.man Thu Jun 3 15:45:41 2010
***************
*** 164,170 ****
--- 164,183 ----
256. The default value is 9. This value
is multiplied by 8192.
+ MTEXCLUSIVE By default, libmtmalloc allocates 2*NCPUS
+ buckets from which allocations occur.
+ Threads share buckets based on their thread
+ id. If MTEXCLUSIVE is enabled, then 4*NCPUS
+ buckets are used. Threads with thread id less
+ than 2*NCPUS receive an exclusive bucket and
+ thus do not need to use locks. Allocation
+ performance for these buckets may be dramatically
+ increased. Once enabled, MTEXCLUSIVE cannot be
+ disabled. This feature can also be enabled by
+ setting the environment variable MTEXCLUSIVE to
+ any value.
+
RETURN VALUES
If there is no available memory, malloc(), realloc(),
memalign(), and valloc() return a null pointer. When real-
***************
*** 224,230 ****
brk(2), getrlimit(2), bsdmalloc(3MALLOC), dlopen(3C),
malloc(3C), malloc(3MALLOC), mapmalloc(3MALLOC),
signal.h(3HEAD), umem_alloc(3MALLOC), watchmalloc(3MALLOC),
! attributes(5)
WARNINGS
Undefined results will occur if the size requested for a
--- 237,243 ----
brk(2), getrlimit(2), bsdmalloc(3MALLOC), dlopen(3C),
malloc(3C), malloc(3MALLOC), mapmalloc(3MALLOC),
signal.h(3HEAD), umem_alloc(3MALLOC), watchmalloc(3MALLOC),
! libmtmalloc(3LIB), attributes(5)
WARNINGS
Undefined results will occur if the size requested for a