Revision: 198376
Author: [email protected]
Date: 2016-03-17 21:06:59 -0700 (Thu, 17 Mar 2016)
Log Message
[JSC] Make CSE's ImpureData faster when dealing with large blocks
https://bugs.webkit.org/show_bug.cgi?id=155594
Patch by Benjamin Poulain <[email protected]> on 2016-03-17
Reviewed by Filip Pizlo.
Source/_javascript_Core:
In some tests with large blocks, the time spent in DFG's LocalCSE
can be over 10% of the total compile time.
In those cases, LocalCSE is completely dominated by handling large
blocks.
This patch addresses the most obvious hot spots in ImpureData's handling.
Initially, most of the time was going into HashTable::rehash().
The reason is that the <HeapLocation, LazyNode> buckets are gigantic.
The hash table easily grows to several kilobytes, and the CPU
was spending more time moving memory around than doing useful work.
To solve that, I moved the pairs to the heap. The table itself
just contains a unique_ptr to each value. This keeps the table
reasonably small, and the extra alloc/dealloc is paid for by the faster rehash().
Once addImpure() was faster, the next big bottleneck was clobber().
Each clobber() has to walk the entire map and test every value.
That loop was where most of the remaining time was going.
Most calls to clobber() come from two kinds: SideState and Stack.
SideState is easy: it is never def()'d, so we can always skip it.
Stack is disjoint from Heap, so we can also track it separately.
Splitting the map in two reduced the overhead. The maps are:
- Stack
- Heap
Having Stack alone was not enough for many blocks. In some cases,
you have a ton of SetLocal/GetLocal, and keeping Stack separate
makes no difference.
To solve that, I split Stack in two: a map addressed by AbstractHeap
plus unique HeapLocation, and a fallback map for everything else.
Since most Stack heaps are not TOP and have a unique HeapLocation per
AbstractHeap, clobber() is O(1) in most cases.
I could achieve the same result with a custom hash structure,
but I don't think it is worth the effort; in most cases,
m_fallbackStackMap has a size of zero or one.
This patch introduces a lot of coupling between CSE and AbstractHeap.
To reduce the risk of bugs, the old map is still maintained in debug builds,
and each step checks that its results match the new implementation.
A new validation step also verifies the strong assumptions made by CSE:
- SideState and World are never def()'d.
- We never write heap TOP; we only write specific heap locations.
* dfg/DFGCSEPhase.cpp:
* dfg/DFGHeapLocation.h:
* dfg/DFGLazyNode.h:
(JSC::DFG::LazyNode::hash):
Source/WTF:
* wtf/HashSet.h:
(WTF::V>::removeIf):
Modified: trunk/Source/_javascript_Core/ChangeLog (198375 => 198376)
--- trunk/Source/_javascript_Core/ChangeLog 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/_javascript_Core/ChangeLog 2016-03-18 04:06:59 UTC (rev 198376)
@@ -1,3 +1,66 @@
+2016-03-17 Benjamin Poulain <[email protected]>
+
+ [JSC] Make CSE's ImpureData faster when dealing with large blocks
+ https://bugs.webkit.org/show_bug.cgi?id=155594
+
+ Reviewed by Filip Pizlo.
+
+ In some tests with large blocks, the time spent in DFG's LocalCSE
+ can be over 10% of the total compile time.
+ In those cases, LocalCSE is completely dominated by handling large
+ blocks.
+
+ This patch addresses the most obvious hot spots in ImpureData's handling.
+
+ Initially, most of the time was going into HashTable::rehash().
+ The reason is that the <HeapLocation, LazyNode> buckets are gigantic.
+ The hash table easily grows to several kilobytes, and the CPU
+ was spending more time moving memory around than doing useful work.
+
+ To solve that, I moved the pairs to the heap. The table itself
+ just contains a unique_ptr to each value. This keeps the table
+ reasonably small, and the extra alloc/dealloc is paid for by the faster rehash().
+
+ Once addImpure() was faster, the next big bottleneck was clobber().
+ Each clobber() has to walk the entire map and test every value.
+ That loop was where most of the remaining time was going.
+
+ Most calls to clobber() come from two kinds: SideState and Stack.
+
+ SideState is easy: it is never def()'d, so we can always skip it.
+
+ Stack is disjoint from Heap, so we can also track it separately.
+
+ Splitting the map in two reduced the overhead. The maps are:
+ - Stack
+ - Heap
+
+ Having Stack alone was not enough for many blocks. In some cases,
+ you have a ton of SetLocal/GetLocal, and keeping Stack separate
+ makes no difference.
+
+ To solve that, I split Stack in two: a map addressed by AbstractHeap
+ plus unique HeapLocation, and a fallback map for everything else.
+ Since most Stack heaps are not TOP and have a unique HeapLocation per
+ AbstractHeap, clobber() is O(1) in most cases.
+
+ I could achieve the same result with a custom hash structure,
+ but I don't think it is worth the effort; in most cases,
+ m_fallbackStackMap has a size of zero or one.
+
+ This patch introduces a lot of coupling between CSE and AbstractHeap.
+ To reduce the risk of bugs, the old map is still maintained in debug builds,
+ and each step checks that its results match the new implementation.
+
+ A new validation step also verifies the strong assumptions made by CSE:
+ - SideState and World are never def()'d.
+ - We never write heap TOP; we only write specific heap locations.
+
+ * dfg/DFGCSEPhase.cpp:
+ * dfg/DFGHeapLocation.h:
+ * dfg/DFGLazyNode.h:
+ (JSC::DFG::LazyNode::hash):
+
2016-03-17 Saam barati <[email protected]>
Implement SmallPtrSet and integrate it into the Parser
Modified: trunk/Source/_javascript_Core/dfg/DFGCSEPhase.cpp (198375 => 198376)
--- trunk/Source/_javascript_Core/dfg/DFGCSEPhase.cpp 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/_javascript_Core/dfg/DFGCSEPhase.cpp 2016-03-18 04:06:59 UTC (rev 198376)
@@ -51,28 +51,231 @@
const bool verbose = false;
-class ClobberFilter {
+class ImpureDataSlot {
+ WTF_MAKE_NONCOPYABLE(ImpureDataSlot);
+ WTF_MAKE_FAST_ALLOCATED;
public:
- ClobberFilter(AbstractHeap heap)
- : m_heap(heap)
+ HeapLocation key;
+ LazyNode value;
+ unsigned hash;
+};
+
+struct ImpureDataSlotHash : public DefaultHash<std::unique_ptr<ImpureDataSlot>>::Hash {
+ static unsigned hash(const std::unique_ptr<ImpureDataSlot>& key)
{
+ return key->hash;
}
-
- bool operator()(const ImpureMap::KeyValuePairType& pair) const
+
+ static bool equal(const std::unique_ptr<ImpureDataSlot>& a, const std::unique_ptr<ImpureDataSlot>& b)
{
- return m_heap.overlaps(pair.key.heap());
+ // ImpureDataSlots are unique per table per HeapLocation. This lets us compare keys
+ // by just comparing the pointers of the unique ImpureDataSlots.
+ ASSERT(a != b || a->key == b->key);
+ return a == b;
}
-
+};
+
+struct ImpureDataTranslator {
+ static unsigned hash(const HeapLocation& key)
+ {
+ return key.hash();
+ }
+
+ static bool equal(const std::unique_ptr<ImpureDataSlot>& slot, const HeapLocation& key)
+ {
+ if (!slot)
+ return false;
+ if (HashTraits<std::unique_ptr<ImpureDataSlot>>::isDeletedValue(slot))
+ return false;
+ return slot->key == key;
+ }
+
+ static void translate(std::unique_ptr<ImpureDataSlot>& slot, const HeapLocation& key, unsigned hashCode)
+ {
+ new (NotNull, std::addressof(slot)) std::unique_ptr<ImpureDataSlot>(new ImpureDataSlot {key, LazyNode(), hashCode});
+ }
+};
+
+class ImpureMap {
+ WTF_MAKE_FAST_ALLOCATED;
+ WTF_MAKE_NONCOPYABLE(ImpureMap);
+public:
+ ImpureMap() = default;
+
+ ImpureMap(ImpureMap&& other)
+ {
+ m_heapMap.swap(other.m_heapMap);
+ m_abstractHeapStackMap.swap(other.m_abstractHeapStackMap);
+ m_fallbackStackMap.swap(other.m_fallbackStackMap);
+#if !defined(NDEBUG)
+ m_debugImpureData.swap(other.m_debugImpureData);
+#endif
+ }
+
+ const ImpureDataSlot* add(const HeapLocation& location, const LazyNode& node)
+ {
+ const ImpureDataSlot* result = addImpl(location, node);
+
+#if !defined(NDEBUG)
+ auto addResult = m_debugImpureData.add(location, node);
+ ASSERT(!!result == !addResult.isNewEntry);
+#endif
+ return result;
+ }
+
+ LazyNode get(const HeapLocation& location) const
+ {
+ LazyNode result = getImpl(location);
+#if !defined(NDEBUG)
+ ASSERT(result == m_debugImpureData.get(location));
+#endif
+ return result;
+ }
+
+ void clobber(AbstractHeap heap)
+ {
+ switch (heap.kind()) {
+ case World: {
+ clear();
+ break;
+ }
+ case SideState:
+ break;
+ case Stack: {
+ ASSERT(!heap.payload().isTop());
+ ASSERT(heap.payload().value() == heap.payload().value32());
+ m_abstractHeapStackMap.remove(heap.payload().value32());
+ clobber(m_fallbackStackMap, heap);
+ break;
+ }
+ default:
+ clobber(m_heapMap, heap);
+ break;
+ }
+#if !defined(NDEBUG)
+ m_debugImpureData.removeIf([heap](const HashMap<HeapLocation, LazyNode>::KeyValuePairType& pair) -> bool {
+ return heap.overlaps(pair.key.heap());
+ });
+ ASSERT(m_debugImpureData.size()
+ == (m_heapMap.size()
+ + m_abstractHeapStackMap.size()
+ + m_fallbackStackMap.size()));
+
+ const bool verifyClobber = false;
+ if (verifyClobber) {
+ for (auto& pair : m_debugImpureData)
+ ASSERT(!!get(pair.key));
+ }
+#endif
+ }
+
+ void clear()
+ {
+ m_heapMap.clear();
+ m_abstractHeapStackMap.clear();
+ m_fallbackStackMap.clear();
+#if !defined(NDEBUG)
+ m_debugImpureData.clear();
+#endif
+ }
+
private:
- AbstractHeap m_heap;
+ typedef HashSet<std::unique_ptr<ImpureDataSlot>, ImpureDataSlotHash> Map;
+
+ const ImpureDataSlot* addImpl(const HeapLocation& location, const LazyNode& node)
+ {
+ switch (location.heap().kind()) {
+ case World:
+ case SideState:
+ RELEASE_ASSERT_NOT_REACHED();
+ case Stack: {
+ AbstractHeap abstractHeap = location.heap();
+ if (abstractHeap.payload().isTop())
+ return add(m_fallbackStackMap, location, node);
+ ASSERT(abstractHeap.payload().value() == abstractHeap.payload().value32());
+ auto addResult = m_abstractHeapStackMap.add(abstractHeap.payload().value32(), nullptr);
+ if (addResult.isNewEntry) {
+ addResult.iterator->value.reset(new ImpureDataSlot {location, node, 0});
+ return nullptr;
+ }
+ if (addResult.iterator->value->key == location)
+ return addResult.iterator->value.get();
+ return add(m_fallbackStackMap, location, node);
+ }
+ default:
+ return add(m_heapMap, location, node);
+ }
+ return nullptr;
+ }
+
+ LazyNode getImpl(const HeapLocation& location) const
+ {
+ switch (location.heap().kind()) {
+ case World:
+ case SideState:
+ RELEASE_ASSERT_NOT_REACHED();
+ case Stack: {
+ ASSERT(location.heap().payload().value() == location.heap().payload().value32());
+ auto iterator = m_abstractHeapStackMap.find(location.heap().payload().value32());
+ if (iterator != m_abstractHeapStackMap.end()
+ && iterator->value->key == location)
+ return iterator->value->value;
+ return get(m_fallbackStackMap, location);
+ }
+ default:
+ return get(m_heapMap, location);
+ }
+ return LazyNode();
+ }
+
+ static const ImpureDataSlot* add(Map& map, const HeapLocation& location, const LazyNode& node)
+ {
+ auto result = map.add<ImpureDataTranslator>(location);
+ if (result.isNewEntry) {
+ (*result.iterator)->value = node;
+ return nullptr;
+ }
+ return result.iterator->get();
+ }
+
+ static LazyNode get(const Map& map, const HeapLocation& location)
+ {
+ auto iterator = map.find<ImpureDataTranslator>(location);
+ if (iterator != map.end())
+ return (*iterator)->value;
+ return LazyNode();
+ }
+
+ static void clobber(Map& map, AbstractHeap heap)
+ {
+ map.removeIf([heap](const std::unique_ptr<ImpureDataSlot>& slot) -> bool {
+ return heap.overlaps(slot->key.heap());
+ });
+ }
+
+ Map m_worldMap;
+ Map m_heapMap;
+ Map m_sideStateMap;
+
+ // The majority of impure stack slots are unique per value.
+ // This is very useful for fast clobber(): we can just remove the slot addressed by AbstractHeap
+ // in O(1).
+ //
+ // When there is a conflict, any additional HeapLocation is added to the fallback map.
+ // This works well because m_fallbackStackMap remains tiny.
+ //
+ // One cannot assume a unique ImpureData is in m_abstractHeapStackMap. It may have been
+ // a duplicate in the past and now only lives in m_fallbackStackMap.
+ //
+ // Obviously, TOP always goes into m_fallbackStackMap since it does not have a unique value.
+ HashMap<int32_t, std::unique_ptr<ImpureDataSlot>, DefaultHash<int32_t>::Hash, WTF::SignedWithZeroKeyHashTraits<int32_t>> m_abstractHeapStackMap;
+ Map m_fallbackStackMap;
+
+#if !defined(NDEBUG)
+ HashMap<HeapLocation, LazyNode> m_debugImpureData;
+#endif
};
-inline void clobber(ImpureMap& map, AbstractHeap heap)
-{
- ClobberFilter filter(heap);
- map.removeIf(filter);
-}
-
class LocalCSEPhase : public Phase {
public:
LocalCSEPhase(Graph& graph)
@@ -131,6 +334,9 @@
void write(AbstractHeap heap)
{
+ if (heap.kind() == SideState)
+ return;
+
for (unsigned i = 0; i < m_impureLength; ++i) {
if (heap.overlaps(m_impureMap[i].key.heap()))
m_impureMap[i--] = m_impureMap[--m_impureLength];
@@ -192,7 +398,7 @@
void write(AbstractHeap heap)
{
- clobber(m_impureMap, heap);
+ m_impureMap.clobber(heap);
}
Node* addPure(PureValue value, Node* node)
@@ -208,17 +414,16 @@
return m_impureMap.get(location);
}
- LazyNode addImpure(HeapLocation location, LazyNode node)
+ LazyNode addImpure(const HeapLocation& location, const LazyNode& node)
{
- auto result = m_impureMap.add(location, node);
- if (result.isNewEntry)
- return nullptr;
- return result.iterator->value;
+ if (const ImpureDataSlot* slot = m_impureMap.add(location, node))
+ return slot->value;
+ return LazyNode();
}
private:
HashMap<PureValue, Node*> m_pureMap;
- HashMap<HeapLocation, LazyNode> m_impureMap;
+ ImpureMap m_impureMap;
};
template<typename Maps>
@@ -327,7 +532,7 @@
m_changed = true;
}
- void def(HeapLocation location, LazyNode value)
+ void def(const HeapLocation& location, const LazyNode& value)
{
LazyNode match = m_maps.addImpure(location, value);
if (!match)
@@ -461,10 +666,8 @@
void write(AbstractHeap heap)
{
- clobber(m_impureData->availableAtTail, heap);
+ m_impureData->availableAtTail.clobber(heap);
m_writesSoFar.add(heap);
- if (verbose)
- dataLog(" Clobbered, new tail map: ", mapDump(m_impureData->availableAtTail), "\n");
}
void def(PureValue value)
@@ -579,8 +782,6 @@
dataLog(" It strictly dominates.\n");
DFG_ASSERT(m_graph, m_node, data.didVisit);
DFG_ASSERT(m_graph, m_node, !match);
- if (verbose)
- dataLog(" Availability map: ", mapDump(data.availableAtTail), "\n");
match = data.availableAtTail.get(location);
if (verbose)
dataLog(" Availability: ", match, "\n");
@@ -639,7 +840,7 @@
if (verbose)
dataLog(" Adding at-tail mapping: ", location, " -> ", value, "\n");
auto result = m_impureData->availableAtTail.add(location, value);
- ASSERT_UNUSED(result, result.isNewEntry);
+ ASSERT_UNUSED(result, !result);
return;
}
Modified: trunk/Source/_javascript_Core/dfg/DFGHeapLocation.h (198375 => 198376)
--- trunk/Source/_javascript_Core/dfg/DFGHeapLocation.h 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/_javascript_Core/dfg/DFGHeapLocation.h 2016-03-18 04:06:59 UTC (rev 198376)
@@ -154,12 +154,6 @@
} // namespace WTF
-namespace JSC { namespace DFG {
-
-typedef HashMap<HeapLocation, LazyNode> ImpureMap;
-
-} } // namespace JSC::DFG
-
#endif // ENABLE(DFG_JIT)
#endif // DFGHeapLocation_h
Modified: trunk/Source/_javascript_Core/dfg/DFGLazyNode.h (198375 => 198376)
--- trunk/Source/_javascript_Core/dfg/DFGLazyNode.h 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/_javascript_Core/dfg/DFGLazyNode.h 2016-03-18 04:06:59 UTC (rev 198376)
@@ -110,9 +110,10 @@
unsigned hash() const
{
- if (asValue())
- return WTF::PtrHash<FrozenValue*>::hash(asValue());
- return WTF::PtrHash<Node*>::hash(m_node);
+ void* toHash = m_node;
+ if (FrozenValue* value = asValue())
+ toHash = value;
+ return WTF::PtrHash<void*>::hash(toHash);
}
bool operator==(const LazyNode& other) const
Modified: trunk/Source/_javascript_Core/dfg/DFGValidate.cpp (198375 => 198376)
--- trunk/Source/_javascript_Core/dfg/DFGValidate.cpp 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/_javascript_Core/dfg/DFGValidate.cpp 2016-03-18 04:06:59 UTC (rev 198376)
@@ -29,6 +29,7 @@
#if ENABLE(DFG_JIT)
#include "CodeBlockWithJITType.h"
+#include "DFGClobberize.h"
#include "DFGClobbersExitState.h"
#include "DFGMayExit.h"
#include "JSCInlines.h"
@@ -298,6 +299,47 @@
validateSSA();
break;
}
+
+ // Validate clobbered states.
+ struct DefLambdaAdaptor {
+ std::function<void(PureValue)> pureValue;
+ std::function<void(HeapLocation, LazyNode)> locationAndNode;
+
+ void operator()(PureValue value) const
+ {
+ pureValue(value);
+ }
+
+ void operator()(HeapLocation location, LazyNode node) const
+ {
+ locationAndNode(location, node);
+ }
+ };
+ for (BasicBlock* block : m_graph.blocksInNaturalOrder()) {
+ for (Node* node : *block) {
+ clobberize(m_graph, node,
+ [&] (AbstractHeap) { },
+ [&] (AbstractHeap heap)
+ {
+ // CSE assumes that HEAP TOP is never written.
+ // If this assumption is weakened, you need to update clobbering
+ // in CSE accordingly.
+ if (heap.kind() == Stack)
+ VALIDATE((node), !heap.payload().isTop());
+ },
+ DefLambdaAdaptor {
+ [&] (PureValue) { },
+ [&] (HeapLocation location, LazyNode)
+ {
+ VALIDATE((node), location.heap().kind() != SideState);
+
+ // More specific kinds should be used instead.
+ VALIDATE((node), location.heap().kind() != World);
+ VALIDATE((node), location.heap().kind() != Heap);
+ }
+ });
+ }
+ }
}
private:
Modified: trunk/Source/WTF/ChangeLog (198375 => 198376)
--- trunk/Source/WTF/ChangeLog 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/WTF/ChangeLog 2016-03-18 04:06:59 UTC (rev 198376)
@@ -1,3 +1,13 @@
+2016-03-17 Benjamin Poulain <[email protected]>
+
+ [JSC] Make CSE's ImpureData faster when dealing with large blocks
+ https://bugs.webkit.org/show_bug.cgi?id=155594
+
+ Reviewed by Filip Pizlo.
+
+ * wtf/HashSet.h:
+ (WTF::V>::removeIf):
+
2016-03-17 Saam barati <[email protected]>
Implement SmallPtrSet and integrate it into the Parser
Modified: trunk/Source/WTF/wtf/HashSet.h (198375 => 198376)
--- trunk/Source/WTF/wtf/HashSet.h 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/WTF/wtf/HashSet.h 2016-03-18 04:06:59 UTC (rev 198376)
@@ -101,6 +101,8 @@
bool remove(const ValueType&);
bool remove(iterator);
+ template<typename Functor>
+ void removeIf(const Functor&);
void clear();
ValueType take(const ValueType&);
@@ -251,6 +253,13 @@
}
template<typename T, typename U, typename V>
+ template<typename Functor>
+ inline void HashSet<T, U, V>::removeIf(const Functor& functor)
+ {
+ m_impl.removeIf(functor);
+ }
+
+ template<typename T, typename U, typename V>
inline void HashSet<T, U, V>::clear()
{
m_impl.clear();
Modified: trunk/Source/WTF/wtf/HashTraits.h (198375 => 198376)
--- trunk/Source/WTF/wtf/HashTraits.h 2016-03-18 02:11:31 UTC (rev 198375)
+++ trunk/Source/WTF/wtf/HashTraits.h 2016-03-18 04:06:59 UTC (rev 198376)
@@ -85,6 +85,13 @@
static bool isDeletedValue(T value) { return value == std::numeric_limits<T>::max() - 1; }
};
+template<typename T> struct SignedWithZeroKeyHashTraits : GenericHashTraits<T> {
+ static const bool emptyValueIsZero = false;
+ static T emptyValue() { return std::numeric_limits<T>::min(); }
+ static void constructDeletedValue(T& slot) { slot = std::numeric_limits<T>::max(); }
+ static bool isDeletedValue(T value) { return value == std::numeric_limits<T>::max(); }
+};
+
// Can be used with strong enums, allows zero as key.
template<typename T> struct StrongEnumHashTraits : GenericHashTraits<T> {
using UnderlyingType = typename std::underlying_type<T>::type;