Title: [174109] trunk/Source/bmalloc
Revision: 174109
Author: [email protected]
Date: 2014-09-30 11:37:45 -0700 (Tue, 30 Sep 2014)

Log Message

bmalloc: cleaned up fast path vs slow path
https://bugs.webkit.org/show_bug.cgi?id=137081

Reviewed by Sam Weinig.

Might be a 1% speedup on MallocBench. Also cleans up the code a bit.

* bmalloc/Allocator.cpp:
(bmalloc::Allocator::Allocator): Merged the small and medium range
caches, just like the small and medium allocators. Ranges are abstract
objects that don't really care whether they hold small or medium objects,
so they don't need to be segregated.
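In sketch form, the merge replaces two segregated cache arrays with one array indexed by size class. All names and constants below are illustrative stand-ins, not bmalloc's real values:

```cpp
// Illustrative sketch of the merge. Constants are hypothetical, not
// bmalloc's real configuration.
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t alignment = 16;   // hypothetical
constexpr size_t smallMax  = 256;  // hypothetical
constexpr size_t mediumMax = 1024; // hypothetical

// Maps an allocation size to its size-class index.
constexpr size_t sizeClass(size_t size) { return (size - 1) / alignment; }

struct BumpRange {
    char* begin;
    unsigned short objectCount;
};

using BumpRangeCache = std::vector<BumpRange>; // stand-in for FixedVector

// Before: separate arrays of SmallBumpRangeCache and MediumBumpRangeCache.
// After: one array spanning every size class up to mediumMax, since a
// range doesn't care whether it holds small or medium objects.
std::array<BumpRangeCache, mediumMax / alignment> bumpRangeCaches;
```

The cache array covers both segments uniformly; only the refill path (below) still needs to know small vs medium.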

(bmalloc::Allocator::scavenge): Ditto.

(bmalloc::Allocator::allocateBumpRangeSlowCase):
(bmalloc::Allocator::allocateBumpRange): Same thing here, except that
we do care a tiny bit, because we need to specify small vs medium when
allocating new ranges from the heap, to ensure that the heap allocates
from the right segment of VM.

(bmalloc::Allocator::allocateLarge):
(bmalloc::Allocator::allocateXLarge): NO_INLINE because this was clouding
up the fast path. Large allocation performance is dominated by allocation
logic and initialization, so inlining it doesn't help.
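The out-of-lining technique can be sketched as follows. The attribute spelling is what a NO_INLINE macro commonly expands to on GCC/Clang; bmalloc's actual definition and functions may differ:

```cpp
// Sketch of keeping a cold allocation path out of line. Everything here
// is a simplified stand-in, not bmalloc's real code.
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(__GNUC__) || defined(__clang__)
#define NO_INLINE __attribute__((noinline))
#else
#define NO_INLINE
#endif

// Cold path: large allocations are dominated by their own work, so an
// out-of-line call costs little and keeps the caller's hot code compact.
NO_INLINE void* allocateLarge(size_t size)
{
    return std::malloc(size);
}

// Hot path: small sizes are bump-allocated inline; only large requests
// pay for the out-of-line call.
inline void* allocate(size_t size)
{
    static char bump[1024];
    static size_t offset = 0;
    if (size <= 64 && offset + size <= sizeof(bump))
        return &bump[(offset += size) - size];
    return allocateLarge(size);
}
```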

(bmalloc::Allocator::allocateSlowCase): Slow path got a bit cleaner since
it doesn't need to distinguish small vs medium objects.

(bmalloc::Allocator::allocateSmallBumpRange): Deleted.
(bmalloc::Allocator::allocateMediumBumpRange): Deleted.

* bmalloc/Allocator.h:
* bmalloc/BumpRange.h:

* bmalloc/Cache.cpp:
(bmalloc::Cache::allocateSlowCase): Deleted.
(bmalloc::Cache::deallocateSlowCase): Deleted.
* bmalloc/Cache.h:
(bmalloc::Cache::allocate):
(bmalloc::Cache::deallocate):
(bmalloc::Cache::allocateFastCase): Deleted.
(bmalloc::Cache::deallocateFastCase): Deleted. Removed the Cache slow
paths. The downside to this change is that the fast path branches to two
distinct failure cases instead of one. The upside is that the slow path
doesn't need to re-read the segment register, which is not as cheap as a
normal register, and it doesn't need to do an extra level of function 
call. Seems to be worth it.
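The reshaped fast path looks roughly like this sketch: one thread-local read, with a null cache branching straight to its own out-of-line slow case instead of funneling through a shared slow path that re-reads the thread-local. Classes and names here are simplified stand-ins:

```cpp
// Sketch of a per-thread cache fast path. Simplified stand-ins, not
// bmalloc's real classes.
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Cache {
    void* allocate(size_t size) { return std::malloc(size); } // stand-in
};

thread_local Cache* cachePtr = nullptr;

// Out-of-line in spirit: runs only on a thread's first allocation, when
// the per-thread cache doesn't exist yet.
void* allocateSlowCaseNullCache(size_t size)
{
    static thread_local Cache cache;
    cachePtr = &cache;
    return cachePtr->allocate(size);
}

// Fast path: the thread-local (segment-register-backed) pointer is read
// exactly once, then control goes straight into the allocator.
inline void* cacheAllocate(size_t size)
{
    Cache* cache = cachePtr;
    if (!cache)
        return allocateSlowCaseNullCache(size);
    return cache->allocate(size);
}
```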

* bmalloc/Deallocator.h:
* bmalloc/Heap.cpp:
(bmalloc::Heap::refillSmallBumpRangeCache):
(bmalloc::Heap::refillMediumBumpRangeCache):
* bmalloc/Heap.h: Updated for interface changes.

* bmalloc/Sizes.h: The most ranges a cache will hold is the number of
small lines in a page / 2, since any other free lines will coalesce
with their neighbors.
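As a worked example with hypothetical page and line sizes (not bmalloc's real constants): free lines that touch coalesce into one range, so the worst case is alternating free and used lines, giving at most half the lines as disjoint ranges.

```cpp
// Hypothetical sizes, chosen only to make the arithmetic concrete.
#include <cstddef>

constexpr size_t vmPageSize    = 4096; // hypothetical
constexpr size_t smallLineSize = 256;  // hypothetical
constexpr size_t linesPerPage  = vmPageSize / smallLineSize; // 16 lines

// Alternating free/used lines is the worst case: any two adjacent free
// lines would coalesce into a single range.
constexpr size_t bumpRangeCacheCapacity = linesPerPage / 2;

static_assert(bumpRangeCacheCapacity == 8,
              "16 alternating lines yield at most 8 free ranges");
```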

Modified Paths

trunk/Source/bmalloc/ChangeLog
trunk/Source/bmalloc/bmalloc/Allocator.cpp
trunk/Source/bmalloc/bmalloc/Allocator.h
trunk/Source/bmalloc/bmalloc/BumpRange.h
trunk/Source/bmalloc/bmalloc/Cache.cpp
trunk/Source/bmalloc/bmalloc/Cache.h
trunk/Source/bmalloc/bmalloc/Deallocator.h
trunk/Source/bmalloc/bmalloc/Heap.cpp
trunk/Source/bmalloc/bmalloc/Heap.h
trunk/Source/bmalloc/bmalloc/Sizes.h

Diff

Modified: trunk/Source/bmalloc/ChangeLog (174108 => 174109)


--- trunk/Source/bmalloc/ChangeLog	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/ChangeLog	2014-09-30 18:37:45 UTC (rev 174109)
@@ -1,3 +1,64 @@
+2014-09-24  Geoffrey Garen  <[email protected]>
+
+        bmalloc: cleaned up fast path vs slow path
+        https://bugs.webkit.org/show_bug.cgi?id=137081
+
+        Reviewed by Sam Weinig.
+
+        Might be a 1% speedup on MallocBench. Also cleans up the code a bit.
+
+        * bmalloc/Allocator.cpp:
+        (bmalloc::Allocator::Allocator): Merged the small and medium range
+        caches, just like the small and medium allocators. Ranges are abstract
+        objects that don't really care whether they hold small or medium objects,
+        so they don't need to be segregated.
+
+        (bmalloc::Allocator::scavenge): Ditto.
+
+        (bmalloc::Allocator::allocateBumpRangeSlowCase):
+        (bmalloc::Allocator::allocateBumpRange): Same thing here, except that
+        we do care a tiny bit, because we need to specify small vs medium when
+        allocating new ranges from the heap, to ensure that the heap allocates
+        from the right segment of VM.
+
+        (bmalloc::Allocator::allocateLarge):
+        (bmalloc::Allocator::allocateXLarge): NO_INLINE because this was clouding
+        up the fast path. Large allocation performance is dominated by allocation
+        logic and initialization, so inlining it doesn't help.
+
+        (bmalloc::Allocator::allocateSlowCase): Slow path got a bit cleaner since
+        it doesn't need to distinguish small vs medium objects.
+
+        (bmalloc::Allocator::allocateSmallBumpRange): Deleted.
+        (bmalloc::Allocator::allocateMediumBumpRange): Deleted.
+
+        * bmalloc/Allocator.h:
+        * bmalloc/BumpRange.h:
+
+        * bmalloc/Cache.cpp:
+        (bmalloc::Cache::allocateSlowCase): Deleted.
+        (bmalloc::Cache::deallocateSlowCase): Deleted.
+        * bmalloc/Cache.h:
+        (bmalloc::Cache::allocate):
+        (bmalloc::Cache::deallocate):
+        (bmalloc::Cache::allocateFastCase): Deleted.
+        (bmalloc::Cache::deallocateFastCase): Deleted. Removed the Cache slow
+        paths. The downside to this change is that the fast path branches to two
+        distinct failure cases instead of one. The upside is that the slow path
+        doesn't need to re-read the segment register, which is not as cheap as a
+        normal register, and it doesn't need to do an extra level of function 
+        call. Seems to be worth it.
+
+        * bmalloc/Deallocator.h:
+        * bmalloc/Heap.cpp:
+        (bmalloc::Heap::refillSmallBumpRangeCache):
+        (bmalloc::Heap::refillMediumBumpRangeCache):
+        * bmalloc/Heap.h: Updated for interface changes.
+
+        * bmalloc/Sizes.h: The most ranges a cache will hold is the number of
+        small lines in a page / 2, since any other free lines will coalesce
+        with their neighbors.
+
 2014-09-23  Geoffrey Garen  <[email protected]>
 
         Rolled out r173346.

Modified: trunk/Source/bmalloc/bmalloc/Allocator.cpp (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Allocator.cpp	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Allocator.cpp	2014-09-30 18:37:45 UTC (rev 174109)
@@ -38,11 +38,8 @@
 Allocator::Allocator(Deallocator& deallocator)
     : m_deallocator(deallocator)
 {
-    for (unsigned short size = alignment; size <= smallMax; size += alignment)
+    for (unsigned short size = alignment; size <= mediumMax; size += alignment)
         m_bumpAllocators[sizeClass(size)].init(size);
-
-    for (unsigned short size = smallMax + alignment; size <= mediumMax; size += alignment)
-        m_bumpAllocators[sizeClass(size)].init(size);
 }
 
 Allocator::~Allocator()
@@ -52,9 +49,9 @@
 
 void Allocator::scavenge()
 {
-    for (unsigned short i = alignment; i <= smallMax; i += alignment) {
+    for (unsigned short i = alignment; i <= mediumMax; i += alignment) {
         BumpAllocator& allocator = m_bumpAllocators[sizeClass(i)];
-        SmallBumpRangeCache& bumpRangeCache = m_smallBumpRangeCaches[sizeClass(i)];
+        BumpRangeCache& bumpRangeCache = m_bumpRangeCaches[sizeClass(i)];
 
         while (allocator.canAllocate())
             m_deallocator.deallocate(allocator.allocate());
@@ -67,54 +64,37 @@
 
         allocator.clear();
     }
-
-    for (unsigned short i = smallMax + alignment; i <= mediumMax; i += alignment) {
-        BumpAllocator& allocator = m_bumpAllocators[sizeClass(i)];
-        MediumBumpRangeCache& bumpRangeCache = m_mediumBumpRangeCaches[sizeClass(i)];
-
-        while (allocator.canAllocate())
-            m_deallocator.deallocate(allocator.allocate());
-
-        while (bumpRangeCache.size()) {
-            allocator.refill(bumpRangeCache.pop());
-            while (allocator.canAllocate())
-                m_deallocator.deallocate(allocator.allocate());
-        }
-
-        allocator.clear();
-    }
 }
 
-BumpRange Allocator::allocateSmallBumpRange(size_t sizeClass)
+NO_INLINE BumpRange Allocator::allocateBumpRangeSlowCase(size_t sizeClass)
 {
-    SmallBumpRangeCache& bumpRangeCache = m_smallBumpRangeCaches[sizeClass];
-    if (!bumpRangeCache.size()) {
-        std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
+    BumpRangeCache& bumpRangeCache = m_bumpRangeCaches[sizeClass];
+
+    std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
+    if (sizeClass <= bmalloc::sizeClass(smallMax))
         PerProcess<Heap>::getFastCase()->refillSmallBumpRangeCache(lock, sizeClass, bumpRangeCache);
-    }
+    else
+        PerProcess<Heap>::getFastCase()->refillMediumBumpRangeCache(lock, sizeClass, bumpRangeCache);
 
     return bumpRangeCache.pop();
 }
 
-BumpRange Allocator::allocateMediumBumpRange(size_t sizeClass)
+INLINE BumpRange Allocator::allocateBumpRange(size_t sizeClass)
 {
-    MediumBumpRangeCache& bumpRangeCache = m_mediumBumpRangeCaches[sizeClass];
-    if (!bumpRangeCache.size()) {
-        std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
-        PerProcess<Heap>::getFastCase()->refillMediumBumpRangeCache(lock, sizeClass, bumpRangeCache);
-    }
-
+    BumpRangeCache& bumpRangeCache = m_bumpRangeCaches[sizeClass];
+    if (!bumpRangeCache.size())
+        return allocateBumpRangeSlowCase(sizeClass);
     return bumpRangeCache.pop();
 }
 
-void* Allocator::allocateLarge(size_t size)
+NO_INLINE void* Allocator::allocateLarge(size_t size)
 {
     size = roundUpToMultipleOf<largeAlignment>(size);
     std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
     return PerProcess<Heap>::getFastCase()->allocateLarge(lock, size);
 }
 
-void* Allocator::allocateXLarge(size_t size)
+NO_INLINE void* Allocator::allocateXLarge(size_t size)
 {
     size = roundUpToMultipleOf<largeAlignment>(size);
     std::lock_guard<StaticMutex> lock(PerProcess<Heap>::mutex());
@@ -123,26 +103,13 @@
 
 void* Allocator::allocateSlowCase(size_t size)
 {
-IF_DEBUG(
-    void* dummy;
-    BASSERT(!allocateFastCase(size, dummy));
-)
-
     if (size <= mediumMax) {
         size_t sizeClass = bmalloc::sizeClass(size);
         BumpAllocator& allocator = m_bumpAllocators[sizeClass];
-
-        if (allocator.size() <= smallMax) {
-            allocator.refill(allocateSmallBumpRange(sizeClass));
-            return allocator.allocate();
-        }
-
-        if (allocator.size() <= mediumMax) {
-            allocator.refill(allocateMediumBumpRange(sizeClass));
-            return allocator.allocate();
-        }
+        allocator.refill(allocateBumpRange(sizeClass));
+        return allocator.allocate();
     }
-    
+
     if (size <= largeMax)
         return allocateLarge(size);
 

Modified: trunk/Source/bmalloc/bmalloc/Allocator.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Allocator.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Allocator.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -45,27 +45,23 @@
     ~Allocator();
 
     void* allocate(size_t);
-    bool allocateFastCase(size_t, void*&);
-    void* allocateSlowCase(size_t);
-    
     void scavenge();
 
 private:
-    void* allocateFastCase(BumpAllocator&);
-
+    bool allocateFastCase(size_t, void*&);
+    void* allocateSlowCase(size_t);
+    
     void* allocateMedium(size_t);
     void* allocateLarge(size_t);
     void* allocateXLarge(size_t);
     
-    BumpRange allocateSmallBumpRange(size_t sizeClass);
-    BumpRange allocateMediumBumpRange(size_t sizeClass);
+    BumpRange allocateBumpRange(size_t sizeClass);
+    BumpRange allocateBumpRangeSlowCase(size_t sizeClass);
     
     Deallocator& m_deallocator;
 
     std::array<BumpAllocator, mediumMax / alignment> m_bumpAllocators;
-
-    std::array<SmallBumpRangeCache, smallMax / alignment> m_smallBumpRangeCaches;
-    std::array<MediumBumpRangeCache, mediumMax / alignment> m_mediumBumpRangeCaches;
+    std::array<BumpRangeCache, mediumMax / alignment> m_bumpRangeCaches;
 };
 
 inline bool Allocator::allocateFastCase(size_t size, void*& object)

Modified: trunk/Source/bmalloc/bmalloc/BumpRange.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/BumpRange.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/BumpRange.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -37,8 +37,7 @@
     unsigned short objectCount;
 };
 
-typedef FixedVector<BumpRange, smallRangeCacheCapacity> SmallBumpRangeCache;
-typedef FixedVector<BumpRange, mediumRangeCacheCapacity> MediumBumpRangeCache;
+typedef FixedVector<BumpRange, bumpRangeCacheCapacity> BumpRangeCache;
 
 } // namespace bmalloc
 

Modified: trunk/Source/bmalloc/bmalloc/Cache.cpp (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Cache.cpp	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Cache.cpp	2014-09-30 18:37:45 UTC (rev 174109)
@@ -54,27 +54,11 @@
     m_deallocator.scavenge();
 }
 
-NO_INLINE void* Cache::allocateSlowCase(size_t size)
-{
-    Cache* cache = PerThread<Cache>::getFastCase();
-    if (!cache)
-        return allocateSlowCaseNullCache(size);
-    return cache->allocator().allocateSlowCase(size);
-}
-
 NO_INLINE void* Cache::allocateSlowCaseNullCache(size_t size)
 {
     return PerThread<Cache>::getSlowCase()->allocator().allocate(size);
 }
 
-NO_INLINE void Cache::deallocateSlowCase(void* object)
-{
-    Cache* cache = PerThread<Cache>::getFastCase();
-    if (!cache)
-        return deallocateSlowCaseNullCache(object);
-    cache->deallocator().deallocateSlowCase(object);
-}
-
 NO_INLINE void Cache::deallocateSlowCaseNullCache(void* object)
 {
     PerThread<Cache>::getSlowCase()->deallocator().deallocate(object);

Modified: trunk/Source/bmalloc/bmalloc/Cache.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Cache.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Cache.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -50,48 +50,29 @@
     void scavenge();
 
 private:
-    static bool allocateFastCase(size_t, void*&);
-    static void* allocateSlowCase(size_t);
     static void* allocateSlowCaseNullCache(size_t);
-
-    static bool deallocateFastCase(void*);
-    static void deallocateSlowCase(void*);
     static void deallocateSlowCaseNullCache(void*);
 
     Deallocator m_deallocator;
     Allocator m_allocator;
 };
 
-inline bool Cache::allocateFastCase(size_t size, void*& object)
+inline void* Cache::allocate(size_t size)
 {
     Cache* cache = PerThread<Cache>::getFastCase();
     if (!cache)
-        return false;
-    return cache->allocator().allocateFastCase(size, object);
+        return allocateSlowCaseNullCache(size);
+    return cache->allocator().allocate(size);
 }
 
-inline bool Cache::deallocateFastCase(void* object)
+inline void Cache::deallocate(void* object)
 {
     Cache* cache = PerThread<Cache>::getFastCase();
     if (!cache)
-        return false;
-    return cache->deallocator().deallocateFastCase(object);
+        return deallocateSlowCaseNullCache(object);
+    return cache->deallocator().deallocate(object);
 }
 
-inline void* Cache::allocate(size_t size)
-{
-    void* object;
-    if (!allocateFastCase(size, object))
-        return allocateSlowCase(size);
-    return object;
-}
-
-inline void Cache::deallocate(void* object)
-{
-    if (!deallocateFastCase(object))
-        deallocateSlowCase(object);
-}
-
 } // namespace bmalloc
 
 #endif // Cache_h

Modified: trunk/Source/bmalloc/bmalloc/Deallocator.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Deallocator.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Deallocator.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -41,12 +41,12 @@
     ~Deallocator();
 
     void deallocate(void*);
+    void scavenge();
+    
+private:
     bool deallocateFastCase(void*);
     void deallocateSlowCase(void*);
 
-    void scavenge();
-    
-private:
     void deallocateLarge(void*);
     void deallocateXLarge(void*);
     void processObjectLog();

Modified: trunk/Source/bmalloc/bmalloc/Heap.cpp (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Heap.cpp	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Heap.cpp	2014-09-30 18:37:45 UTC (rev 174109)
@@ -152,7 +152,7 @@
     }
 }
 
-void Heap::refillSmallBumpRangeCache(std::lock_guard<StaticMutex>& lock, size_t sizeClass, SmallBumpRangeCache& rangeCache)
+void Heap::refillSmallBumpRangeCache(std::lock_guard<StaticMutex>& lock, size_t sizeClass, BumpRangeCache& rangeCache)
 {
     BASSERT(!rangeCache.size());
     SmallPage* page = allocateSmallPage(lock, sizeClass);
@@ -189,7 +189,7 @@
     }
 }
 
-void Heap::refillMediumBumpRangeCache(std::lock_guard<StaticMutex>& lock, size_t sizeClass, MediumBumpRangeCache& rangeCache)
+void Heap::refillMediumBumpRangeCache(std::lock_guard<StaticMutex>& lock, size_t sizeClass, BumpRangeCache& rangeCache)
 {
     MediumPage* page = allocateMediumPage(lock, sizeClass);
     BASSERT(!rangeCache.size());

Modified: trunk/Source/bmalloc/bmalloc/Heap.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Heap.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Heap.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -50,10 +50,10 @@
 public:
     Heap(std::lock_guard<StaticMutex>&);
 
-    void refillSmallBumpRangeCache(std::lock_guard<StaticMutex>&, size_t sizeClass, SmallBumpRangeCache&);
+    void refillSmallBumpRangeCache(std::lock_guard<StaticMutex>&, size_t sizeClass, BumpRangeCache&);
     void derefSmallLine(std::lock_guard<StaticMutex>&, SmallLine*);
 
-    void refillMediumBumpRangeCache(std::lock_guard<StaticMutex>&, size_t sizeClass, MediumBumpRangeCache&);
+    void refillMediumBumpRangeCache(std::lock_guard<StaticMutex>&, size_t sizeClass, BumpRangeCache&);
     void derefMediumLine(std::lock_guard<StaticMutex>&, MediumLine*);
 
     void* allocateLarge(std::lock_guard<StaticMutex>&, size_t);

Modified: trunk/Source/bmalloc/bmalloc/Sizes.h (174108 => 174109)


--- trunk/Source/bmalloc/bmalloc/Sizes.h	2014-09-30 17:27:10 UTC (rev 174108)
+++ trunk/Source/bmalloc/bmalloc/Sizes.h	2014-09-30 18:37:45 UTC (rev 174109)
@@ -91,9 +91,7 @@
     static const uintptr_t smallOrMediumSmallTypeMask = smallType ^ mediumType; // Only valid if object is known to be small or medium.
 
     static const size_t deallocatorLogCapacity = 256;
-
-    static const size_t smallRangeCacheCapacity = vmPageSize / smallLineSize;
-    static const size_t mediumRangeCacheCapacity = vmPageSize / mediumLineSize;
+    static const size_t bumpRangeCacheCapacity = vmPageSize / smallLineSize / 2;
     
     static const std::chrono::milliseconds scavengeSleepDuration = std::chrono::milliseconds(512);
 
_______________________________________________
webkit-changes mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-changes
