[SLUB 0/2] SLUB: The unqueued slab allocator V6
[PATCH] SLUB The unqueued slab allocator v6

Note that the definition of the return type of ksize() currently differs
between mm and Linus' tree. The patch conforms to mm. This patch also
needs sprint_symbol() support from mm.

V5->V6:
- Straighten out various coding issues, among other things to make the hot
  path clearer in slab_alloc and slab_free. This adds more gotos. Sigh.
- Detailed alloc / free tracking, including pid, cpu and time of alloc /
  free, if SLAB_STORE_USER is enabled or slub_debug=U is specified on boot
  (see the example snippet after this changelog).
- sysfs support via /sys/slab. Drop /proc/slubinfo support. Include a
  slabinfo tool that produces output similar to what /proc/slabinfo does.
  The tool needs to be made more sophisticated to allow control of the
  various slub options at runtime. It currently reports total slab sizes,
  slab fragmentation and slab effectiveness (actual object use vs. slab
  space use).
- Runtime debug option changes per slab via /sys/slab/<slabcache>. All
  slab debug options can be configured via sysfs provided that no objects
  have been allocated yet.
- Deal with the i386 use of slab page structs. The main patch disables
  slub for i386 (CONFIG_ARCH_USES_SLAB_PAGE_STRUCT). A special patch then
  removes the page sized slabs and removes that setting. See the caveats
  in that patch for further details.

V4->V5:
- Single object slabs only for slabs > slub_max_order; otherwise generate
  sufficient objects to avoid frequent use of the page allocator. This is
  necessary to compensate for the fragmentation caused by frequent uses of
  the page allocator. We exempt slabs of PAGE_SIZE from this rule since
  multi object slabs require the use of fields that are in use on i386 and
  x86_64. See the quicklist patchset for a way to fix that issue and a
  patch to get rid of the PAGE_SIZE special casing.
- Drop the pass through to the page allocator because the page allocator
  fragments memory. The buffering through large order allocations is done
  in SLUB. Infrequent larger order allocations cause less fragmentation
  than frequent small order allocations.
- We need to update object sizes when merging slabs; otherwise kzalloc
  will not initialize the full object (this caused the failures on various
  platforms).
- Do the padding checks before the redzone checks so that we get messages
  about corruption of the whole slab and not about a single object.

V3->V4:
- Rename /proc/slabinfo to /proc/slubinfo. We have a different format
  after all.
- More bug fixes and stabilization of the diagnostic functions. This
  finally seems to be something that works wherever we test it.
- Serialize kmem_cache_create and kmem_cache_destroy via slub_lock
  (Adrian's idea).
- Add two new modifications (separate patches) to guarantee a minimum
  number of objects per slab and to pass through large allocations.

V2->V3:
- Debugging and diagnostic support. This is runtime enabled, not compile
  time enabled. Runtime debugging can be controlled via kernel boot
  options on an individual slab cache basis or globally.
- Slab trace support (for individual slab caches).
- Resiliency support: if basic sanity checks are enabled (via the F boot
  option, for example) then SLUB will do its best to perform diagnostics
  and then continue (i.e. mark corrupted objects as used).
- Fix up numerous issues, including the clash of SLUB's use of page flags
  with the i386 arch's use for pmds and pgds (which are managed as slab
  caches, sigh).
- Dynamic per CPU array sizing.
- Explain the SLUB slabcache flags.

V1->V2:
- Fix up various issues. Tested on i386 UP, X86_64 SMP and ia64 NUMA.
- Provide NUMA support by splitting the partial lists per node.
- Better slab cache merge support (now at around 50% of slabs).
- List slab cache aliases if slab caches are merged.
- Updated description of the /proc/slabinfo output.
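To illustrate the SLAB_STORE_USER item above: a cache owner can also ask
for this tracking on a single cache at creation time instead of enabling
it globally with slub_debug=U. The snippet below is only a sketch with
made-up names (my_struct, my_cache); the kmem_cache_create() prototype has
changed across kernel versions, and the ctor-only form used here may not
match the tree this patch is against.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

/* Hypothetical object type, used only for this example. */
struct my_struct {
	int id;
	char payload[120];
};

static struct kmem_cache *my_cache;

static int __init my_example_init(void)
{
	/*
	 * SLAB_STORE_USER asks the allocator to record the pid, cpu and
	 * time of the last alloc / free of every object in this cache,
	 * i.e. the same tracking that slub_debug=U enables globally.
	 */
	my_cache = kmem_cache_create("my_struct_cache",
				     sizeof(struct my_struct),
				     0, SLAB_STORE_USER, NULL);
	return my_cache ? 0 : -ENOMEM;
}

static void __exit my_example_exit(void)
{
	kmem_cache_destroy(my_cache);
}

module_init(my_example_init);
module_exit(my_example_exit);
MODULE_LICENSE("GPL");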
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

A particular concern was the complex management of the numerous object
queues in SLAB. SLUB has no such queues. Instead we dedicate a slab to
each allocating CPU and use objects from that slab directly instead of
queueing them up (a simplified sketch of this scheme follows below).

B. Storage overhead of object queues

SLAB object queues exist per node and per CPU. The alien cache queue even
has a queue array that contains a queue for each processor on each node.
For very large systems the number of queues and the number of objects that
may be caught in those queues grows exponentially. On our systems with 1k
nodes / processors we have several gigabytes tied up just for storing
references to objects in those queues. This does not include the objects
that could be on those queues. One fears that the whole memory of the
machine could one day be consumed by those queues.

C. SLAB meta data overhead

SLAB has overhead at the beginning of each slab. This means that data
cannot be naturally aligned at the beginning of a slab block. SLUB keeps
all of its meta data in the page struct instead, so objects can be
naturally aligned within the slab.
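To make point A concrete, here is a minimal userspace model of the idea:
each CPU owns an active slab whose free objects are chained into a free
list, and allocation simply pops the next object from that slab rather
than from any object queue. All names (slab_page, cpu_slab, slab_alloc)
are invented for this sketch; it ignores locking, partial/full slab lists,
NUMA and the other slow paths, so it is not SLUB's actual code.

#include <stdlib.h>

#define SLAB_BYTES 4096
#define NR_CPUS 4

/* One page-sized slab; free objects are chained through their first word. */
struct slab_page {
	void *freelist;          /* next free object in this slab */
	unsigned int inuse;      /* objects currently handed out */
	char objects[SLAB_BYTES];
};

/* Each CPU allocates from its own active slab -- there are no object queues. */
static struct slab_page *cpu_slab[NR_CPUS];

static struct slab_page *new_slab(size_t size)
{
	struct slab_page *s = malloc(sizeof(*s));
	char *p;

	if (!s)
		return NULL;
	s->inuse = 0;
	s->freelist = s->objects;
	/* Link every object to the next one; the last link is NULL. */
	for (p = s->objects; p + 2 * size <= s->objects + SLAB_BYTES; p += size)
		*(void **)p = p + size;
	*(void **)p = NULL;
	return s;
}

/* Fast path: pop the first free object off this CPU's active slab. */
static void *slab_alloc(int cpu, size_t size)
{
	struct slab_page *s = cpu_slab[cpu];
	void *object;

	if (!s || !s->freelist) {
		/* Slow path: the active slab is exhausted, grab a new one.
		 * (Real SLUB would first look at partially used slabs.) */
		s = new_slab(size);
		if (!s)
			return NULL;
		cpu_slab[cpu] = s;
	}
	object = s->freelist;
	s->freelist = *(void **)object;   /* advance to the next free object */
	s->inuse++;
	return object;
}

/* Freeing pushes the object straight back onto its slab's free list. */
static void slab_free(struct slab_page *s, void *object)
{
	*(void **)object = s->freelist;
	s->freelist = object;
	s->inuse--;
}

Because a free object lives either on its slab's free list or with its
user, there is nothing like the per-node / per-CPU queue arrays of point B
to grow with the size of the machine.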